Quarter of NVIDIA A100
$0.40 per GPU-hour
You can request any amount of CPU and RAM during the installation process.
Note: Puzl does not offer support for this solution. Please refer to the community support options mentioned above.
An open-source deep-learning training platform. Train models faster and more reliably with out-of-the-box support for distributed training across multiple GPUs, experiment tracking, log management, metrics, checkpoints, and more, all without changing your model code.
There's no need to manage virtual machines or pay for excess infrastructure. With Puzl, you can flexibly configure CPU, RAM and GPU requests for the Pods running your experiments. You pay only for the resources your experiments actually use.
Puzl provides you with Determined installed on top of Kubernetes. There are some limitations.
Click the 'Install' button at the top of this page and set up a Volume for your experiments. Each such Volume will be mounted into every experiment's Pod at the path /media/volume_name.
Note that Determined AI can launch multiple experiments: a Pod is created to run each experiment, and that Pod is terminated when its task finishes.
You can pick either a persistent fast NVMe-based Data Storage or a Shared File System Data Storage.
⚠️ If you're going to run several experiments at the same time, it may be better to opt for Shared File System Data Storage. It is slower than NVMe-based Data Storage, but it can be mounted to several Pods within your namespace at the same time.
Checkpoint object storage configuration - an AWS S3 bucket used to store experiment and trial metadata. If you want to set up the S3 bucket later, or to use a non-AWS S3-compatible bucket, you can do so in the experiment configuration.
⚠️ It's highly recommended to set up an S3 bucket. Otherwise, an experiment will fail with the following error:
ERROR: Checkpoint storage validation failed: Parameter validation failed: Invalid bucket name
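If you set the bucket up later, checkpoint storage lives in the experiment configuration. A minimal sketch, with placeholder bucket name and credentials (endpoint_url is only needed for non-AWS, S3-compatible storage):

```yaml
checkpoint_storage:
  type: s3
  bucket: my-checkpoint-bucket          # placeholder bucket name
  access_key: <access_key>              # placeholder credentials
  secret_key: <secret_key>
  endpoint_url: https://s3.example.com  # only for non-AWS S3-compatible storage
```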
Experiment configuration - Determined AI can launch multiple experiments, and a Pod is created to run each one. Here you can set up the RAM, CPU and GPU requests for your experiments. It's also possible to change the amount of resources an individual task is allowed to use in the Determined dashboard.
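Per-trial resource limits can also be set in the experiment configuration itself, under the resources section; a sketch (values are examples):

```yaml
resources:
  slots_per_trial: 1   # number of GPUs each trial may use
  max_slots: 2         # optional cap on total GPUs for the experiment
```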
PostgreSQL - Configure a Volume for your database. Determined requires a PostgreSQL database to work.
Determined AI - configuration for the primary permanent Pod required to keep Determined AI running.
‼️ WARNING: In Determined, notebook state is not persistent by default. If a failure occurs (e.g., the agent hosting the notebook crashes), the content of the notebook will not be saved.
Installation can take up to a few minutes.
On your Determined AI page, find the Access section. There you will find a link to your Determined dashboard, along with a username and password. Your Determined AI app will open in a new tab; to access the interface, enter the username and password that were created for you.
Initially, a Determined installation has two user accounts: admin and determined. Both accounts share the same password, generated for you during the installation process. The admin user has the sole privilege to create users, change other users’ passwords, and activate/deactivate users. The determined user is designed for ease of use on a single-user installation.
To access previously mounted Volume from Jupyter Notebook, in the Notebook's terminal run:
cd /media/<volume_name>
You can easily check resource consumption and estimate costs.
To copy your dataset to a Volume, you can use the Puzl SSH server app.
Determined includes a command-line tool called det that may be used to interact with Determined.
After the CLI has been installed, it should be configured to connect to the Determined master at the appropriate IP address. This can be accomplished by setting the DET_MASTER environment variable:
export DET_MASTER=https://<determined_host>
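For example (the host below is a placeholder; use the link from your app's Access section):

```shell
# Hypothetical master URL -- replace with the address from the Access section.
export DET_MASTER=https://determined.example.com

# Every subsequent det command talks to this master, e.g.:
#   det experiment list
echo "$DET_MASTER"
```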
To access locally (e.g. from Jupyter Notebook):
export DET_MASTER=http://<internal_host>:<internal_port>
| Software | License |
|---|---|
| Determined AI | Apache 2.0 |