Fine-Tuning (coming soon)
Guide to creating, managing, and monitoring serverless fine-tuning jobs
This guide will walk you through the creation, management, and monitoring of fine-tuning jobs.
What is fine-tuning?
Fine-tuning is the process of taking a pre-trained machine learning model and training it further on a specific dataset to adapt it to a particular task or domain. In the context of this service, fine-tuning allows you to customise foundation models using your own data, improving performance on your unique workloads.
A typical fine-tuning workflow consists of:
- Selecting a base model: Choose a pre-trained model as the starting point.
- Providing training and validation datasets: Supply data that represents your target use case.
- Launching a fine-tuning job: Start the process, which will train the model on your data.
- Monitoring job status: Track progress, review logs, and handle errors.
- Retrieving the tuned model: Once complete, access your custom model for inference.
Prerequisites
- JWT: Grab a long-lived JWT from the Nscale CLI
- Your organization’s ID: Find your organization ID using the Nscale CLI
- A dataset on which to train your new model
Step 1 - Select a base model
The base models we currently support are detailed below.
You can also retrieve the current list of supported models programmatically:
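As a rough sketch only (the base URL, endpoint path, and response shape below are placeholder assumptions, not the documented API; substitute the values from your Nscale API reference), an authenticated request might look like:

```python
import os
import requests

# Placeholder base URL and endpoint path -- check the Nscale API
# reference for the real values.
BASE_URL = "https://api.nscale.com/v1"
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}

resp = requests.get(f"{BASE_URL}/fine-tuning/models", headers=headers)
resp.raise_for_status()
print(resp.json())  # assumed to contain the list of supported base models
```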
Step 2 - Create a fine-tuning job
Once you’ve chosen a base model to train on, the next step is deciding whether or not to use LoRA. LoRA (Low-Rank Adaptation) is a technique for fine-tuning large language models more efficiently by injecting small, trainable adapter layers into the model rather than updating all of the model’s parameters. Enabling LoRA lets you fine-tune large models with less hardware and cost, while disabling it gives you full control and flexibility at the expense of higher resource usage.
Payload without LoRA:
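As an illustrative sketch (the endpoint path, environment variable names, and every payload field name here are assumptions; consult the API reference for the real schema), a job-creation request with LoRA disabled might look like:

```python
import os
import requests

BASE_URL = "https://api.nscale.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}

# Illustrative payload -- every field name here is an assumption.
payload = {
    "organization_id": os.environ["NSCALE_ORG_ID"],
    "base_model": "<base-model-id>",
    "training_dataset_id": "<dataset-id>",
    "validation_dataset_id": "<validation-dataset-id>",
    "use_lora": False,
}

resp = requests.post(f"{BASE_URL}/fine-tuning/jobs", json=payload, headers=headers)
resp.raise_for_status()
print(resp.json())
```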
Payload with LoRA:
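Again a sketch under the same assumed schema; the adapter settings (rank, alpha, dropout) are typical LoRA hyperparameters, not confirmed field names:

```python
# Same request as above, but with LoRA enabled and hypothetical
# adapter settings (field names are assumptions).
payload = {
    "organization_id": os.environ["NSCALE_ORG_ID"],
    "base_model": "<base-model-id>",
    "training_dataset_id": "<dataset-id>",
    "validation_dataset_id": "<validation-dataset-id>",
    "use_lora": True,
    "lora_config": {"rank": 16, "alpha": 32, "dropout": 0.05},
}
```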
A successful response will look something like this:
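For illustration only, here is a plausible response shape as a Python dict; every field name is a guess at a typical job object, not the documented schema:

```python
# A plausible (assumed) shape for resp.json() -- field names are guesses.
job = {
    "id": "ftjob-abc123",
    "status": "queued",
    "base_model": "<base-model-id>",
    "created_at": "2025-01-01T12:00:00Z",
}
```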
Step 3 - Check the status of your jobs
You can check the status of your jobs using the list endpoint. Jobs can be filtered by status: `queued`, `running`, `completed`, or `failed`.
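For example, under the same assumed base URL and paths as above (a sketch, not the documented API), filtering for running jobs might look like:

```python
import os
import requests

BASE_URL = "https://api.nscale.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}

# Hypothetical list endpoint with a status filter query parameter.
resp = requests.get(
    f"{BASE_URL}/fine-tuning/jobs",
    params={"status": "running"},
    headers=headers,
)
resp.raise_for_status()
for job in resp.json()["jobs"]:  # assumed response shape
    print(job["id"], job["status"])
```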
Step 4 - Monitor training metrics for your job
After launching a fine-tuning job, you can monitor its progress and evaluate its performance by retrieving training and evaluation metrics. These metrics help you understand how well your model is learning and when it might be finished or require intervention.
How to use these metrics
- Monitor training progress: Watch `train_loss` and `eval_loss` to ensure your model is learning and not overfitting.
- Compare experiments: Use metrics to compare different jobs, hyperparameters, or datasets.
- Poll regularly: Regularly polling this endpoint during training allows you to make data-driven decisions, such as stopping training early if the model converges or adjusting hyperparameters for future runs; a minimal polling sketch follows this list.
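A minimal polling sketch, assuming a metrics sub-resource on the job and the same placeholder paths and response shapes used above (all of which are assumptions):

```python
import os
import time
import requests

BASE_URL = "https://api.nscale.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}
job_id = "<your-job-id>"

while True:
    # Hypothetical metrics sub-resource; assumed to return a list of
    # per-step records carrying train_loss / eval_loss fields.
    metrics = requests.get(
        f"{BASE_URL}/fine-tuning/jobs/{job_id}/metrics", headers=headers
    ).json().get("metrics", [])
    if metrics:
        latest = metrics[-1]
        print(f"step {latest['step']}: "
              f"train_loss={latest['train_loss']} eval_loss={latest.get('eval_loss')}")
    # Stop polling once the job reaches a terminal state.
    job = requests.get(f"{BASE_URL}/fine-tuning/jobs/{job_id}", headers=headers).json()
    if job.get("status") in ("completed", "failed"):
        break
    time.sleep(60)
```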
Step 5 - Use your fine-tuned model
Once your job is complete, you can either export your model to Hugging Face or download it directly for local use.
Option 1: Export to Hugging Face
We support uploading your model directly to Hugging Face. This allows you to start using your model straight away or share it with others on the Hugging Face platform.
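As a sketch only (the export endpoint and body fields are assumptions; the service will need a Hugging Face write token and a target repository name in some form, but the exact schema is in the API reference):

```python
import os
import requests

BASE_URL = "https://api.nscale.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}

# Hypothetical export request -- field names are assumptions.
payload = {
    "hf_token": os.environ["HF_TOKEN"],           # a Hugging Face write token
    "repo_id": "<your-hf-username>/<repo-name>",  # target repository on the Hub
}
resp = requests.post(
    f"{BASE_URL}/fine-tuning/jobs/<your-job-id>/export/huggingface",
    json=payload,
    headers=headers,
)
resp.raise_for_status()
```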
Option 2: Download your model
To download your new model and start using it straight away, you can call this endpoint, which will redirect you to the download URL. From there, you can download the tar.gz archive.
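A download sketch under the same assumed paths (`requests` follows the redirect to the download URL automatically, so the archive can be streamed straight to disk):

```python
import os
import requests

BASE_URL = "https://api.nscale.com/v1"  # placeholder base URL
headers = {"Authorization": f"Bearer {os.environ['NSCALE_JWT']}"}

# Hypothetical download endpoint that redirects to the archive URL.
with requests.get(
    f"{BASE_URL}/fine-tuning/jobs/<your-job-id>/download",
    headers=headers,
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("fine_tuned_model.tar.gz", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```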