Determined AI 0.14.3

Run the Determined AI deep-learning training platform in the cloud. Build models more easily with GPUs and out-of-the-box experiment tracking. No need to manage cloud instances.
Config Version: 7.0.0
Cost per hour*: $0.009
Cost per month*: $6.57

Recommended minimal configuration
CPU: 0.5 vCPU
RAM: 0.75 GB
Volume: 5 GB

Optional resources
GPU: Quarter of NVIDIA A100, $0.40 per GPU-hour

You can request any amount of CPU and RAM during the installation process.

About this App
Support

Note: Puzl does not offer support for this solution. Please refer to the community support options mentioned above.

Description

An open-source deep-learning training platform. Train models faster and more reliably with out-of-the-box support for distributed training across multiple GPUs, experiment tracking, log management, metrics, checkpoints and more, without changing your model code.

There's no need to manage virtual machines or pay for excess infrastructure. With Puzl, you can flexibly configure CPU, RAM and GPU requests for the Pods running your experiments. You pay only for the resources your experiments actually use.

How To Deploy Determined AI

Install Determined AI in your Puzl cloud dashboard

Puzl provides you with Determined installed over Kubernetes. There are some limitations.

Click the 'Install' button at the top of this page and set up a Volume for your experiments. Each Volume you configure will be mounted into every experiment's Pod at the path /media/<volume_name>.

Note that Determined AI can launch multiple experiments. A separate Pod is created to run each experiment, and that Pod is terminated when its task has finished.

You can pick either a persistent fast NVMe-based Data Storage or a Shared File System Data Storage.

⚠️ If you're going to run several experiments at the same time, consider Shared File System Data Storage. It is slower than NVMe-based Data Storage, but it can be mounted to several Pods within your namespace at the same time.

Checkpoint object storage configuration - AWS S3 object storage used to store experiment and trial metadata. If you want to set up an S3 bucket later, or to use a bucket that is not hosted on AWS, you can do so in the experiment configuration.

⚠️ It's highly recommended to set up an S3 bucket. Otherwise, experiments will fail with the following error:

ERROR: Checkpoint storage validation failed: Parameter validation failed: Invalid bucket name
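
As a minimal sketch of setting the bucket in the experiment configuration instead, the snippet below writes a checkpoint_storage section to a file. The field names (type, bucket, access_key, secret_key, endpoint_url) are the ones Determined's experiment configuration reference lists for S3 storage; check them against the reference for your version. The file name, bucket name, credentials and endpoint are placeholders, not values provided by Puzl.

```shell
# Write a checkpoint_storage fragment to merge into an experiment config.
# All values below are placeholders (assumptions), not Puzl defaults.
cat > checkpoint-storage.yaml <<'EOF'
checkpoint_storage:
  type: s3
  bucket: my-determined-checkpoints      # hypothetical bucket name
  access_key: REPLACE_WITH_ACCESS_KEY
  secret_key: REPLACE_WITH_SECRET_KEY
  # Only needed for S3-compatible storage outside AWS:
  endpoint_url: https://s3.example.com   # hypothetical endpoint
EOF

# Show the fragment so it can be copied into the experiment configuration.
cat checkpoint-storage.yaml
```

Leaving endpoint_url out should be enough when the bucket is hosted on AWS itself.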

Advanced settings:

  1. Experiment configuration - Determined AI can launch multiple experiments, and a Pod is created to run each one. Here you can set the RAM, CPU and GPU requests for your experiments. You can also change the amount of resources an individual task is allowed to use in the Determined dashboard.

  2. PostgreSQL - Configure a Volume for your database. Determined requires a PostgreSQL database to work.

  3. Determined AI - configuration for the primary, permanently running Pod that Determined AI needs in order to operate.

‼️ WARNING: In Determined, notebook state is not persistent by default. If a failure occurs (e.g., the agent hosting the notebook crashes), the content of the notebook will not be saved.

Getting started after installing Determined AI

Installation can take up to a few minutes.

On your Determined AI page, find the Access section. It contains a link to your Determined dashboard, along with a username and password. The Determined AI app opens in a new tab. To access the interface, enter the username and password that were created for you.

Initially, a Determined installation has two user accounts: admin and determined. Both accounts have the same password, which is generated for you during the installation process.

  • The admin user has the sole privilege to create users, change other users’ passwords, and activate/deactivate users.

  • The determined user is designed for ease of use on a single-user installation.

To access a previously mounted Volume from a Jupyter Notebook, run the following in the Notebook's terminal:

cd /media/<volume_name>
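
Before starting a long run, it can help to confirm that the Volume is actually available. The helper below is only a sketch: check_volume is a hypothetical function name, and it assumes nothing beyond the /media/<volume_name> mount path described above. It checks that the directory exists and is writable, which is not a full mount check.

```shell
# Report whether a Volume directory exists under /media and is writable.
# check_volume is a hypothetical helper; the volume name is its only argument.
check_volume() {
  volume_dir="/media/$1"
  if [ -d "$volume_dir" ] && [ -w "$volume_dir" ]; then
    echo "ok: $volume_dir exists and is writable"
  else
    echo "not available: $volume_dir" >&2
    return 1
  fi
}
```

For example, check_volume my_volume && cd /media/my_volume, where my_volume stands in for your own Volume name.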

You can easily check resource consumption and estimate costs:

(Screenshot: Determined AI Pods table in the Puzl dashboard)

Copy your dataset to Volume

To copy your dataset to the Volume, you can use the Puzl SSH server app.

Install Determined CLI

Determined includes a command-line tool called det that may be used to interact with Determined.

After the CLI has been installed, it should be configured to connect to the Determined master at the appropriate IP address. This can be accomplished by setting the DET_MASTER environment variable:

export DET_MASTER=https://<determined_host>

To access the master locally, from inside the cluster (e.g. from a Jupyter Notebook):

export DET_MASTER=http://<internal_host>:<internal_port>
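
Once DET_MASTER is set, a quick way to confirm the connection is to list experiments. The sketch below assumes the external HTTPS address; determined.example.com is a placeholder for the host shown in the Access section, and the det call runs only if the CLI is already on your PATH.

```shell
# Placeholder host (assumption): use the host shown in the Access section.
DETERMINED_HOST="determined.example.com"
export DET_MASTER="https://${DETERMINED_HOST}"
echo "DET_MASTER is set to ${DET_MASTER}"

# List experiments only if the det CLI is installed; otherwise say so.
if command -v det >/dev/null 2>&1; then
  det experiment list || echo "could not reach the master at ${DET_MASTER}" >&2
else
  echo "det CLI not found on PATH; install it first" >&2
fi
```

Because the accounts on this installation have a generated password, you will probably need to authenticate first, for example with det user login <username>, which prompts for the password.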

This app is based on the following software

Software: Determined AI
License: Apache 2.0