This guide will walk you through the creation, management, and monitoring of fine-tuning jobs.

What is fine-tuning?

Fine-tuning is the process of taking a pre-trained machine learning model and training it further on a specific dataset to adapt it to a particular task or domain. In the context of this service, fine-tuning allows you to customise foundation models using your own data, improving performance on your unique workloads.

A typical fine-tuning workflow consists of:

  • Selecting a base model: Choose a pre-trained model as the starting point.
  • Providing training and validation datasets: Supply data that represents your target use case.
  • Launching a fine-tuning job: Start the process, which will train the model on your data.
  • Monitoring job status: Track progress, review logs, and handle errors.
  • Retrieving the tuned model: Once complete, access your custom model for inference.

Prerequisites

  1. JWT: Grab a long-lived JWT from the Nscale CLI
  2. Your organization’s ID: Find your organization ID from the Nscale CLI
  3. A dataset on which to train your new model
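
All of the curl examples in this guide assume these two values are exported as environment variables:

# Exported once, reused by every curl example below
export NSCALE_API_TOKEN="<your-long-lived-JWT>"
export ORGANIZATION_ID="<your-organization-id>"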

Step 1 - Select a base model

The base models we currently support are listed below.

models/meta-llama/Meta-Llama-3-8B-Instruct
models/meta-llama/Meta-Llama-3-8B
models/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
models/mistralai/Mistral-7B-Instruct-v0.2
models/mistralai/Mistral-7B-Instruct-v0.1

You can also retrieve the current list of supported models using:

curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/base-models?offset=0&limit=10" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Accept: application/json'
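
If you have jq installed, it helps when skimming the response; once you’ve picked a model, keeping its ID in a shell variable makes the next step easier. A minimal sketch:

# Browse the catalogue (jq pretty-prints the JSON response)
curl -s "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/base-models?offset=0&limit=10" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Accept: application/json' | jq .

# Keep the chosen ID handy for the job payload in Step 2
BASE_MODEL_ID="models/meta-llama/Meta-Llama-3-8B-Instruct"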

Step 2 - Create a fine-tuning job

Once you’ve chosen a base model to train on, the next step is deciding whether or not to use LoRA. LoRA stands for Low-Rank Adaptation, a technique for fine-tuning large language models more efficiently by injecting small, trainable adapter layers into the model rather than updating all of the model’s parameters. Enabling LoRA lets you fine-tune large models with less hardware and cost, while disabling it gives you full control and flexibility at the expense of higher resource usage.

Without LoRA:

curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{
    "name": "job name",
    "base_model_id": "<Base_Model_ID>",
    "dataset": {
      "id": "<Dataset_ID>",
      "prompt_column": "prompts_column",
      "answer_column": "answer_column"
    },
    "hyperparameters": {
      "n_epochs": 3,
      "evaluation_epochs": 1,
      "warmup_epochs": 0,
      "batch_size": 32,
      "learning_rate": 0.0002,
      "weight_decay": 0.0,
      "best_checkpoints": true,
      "lora": {
        "enabled": false
      }
    },
    "dry_run": false
  }'

With LoRA:

curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{
    "name": "job name",
    "base_model_id": "<Base_Model_ID>",
    "dataset": {
      "id": "<Dataset_ID>",
      "prompt_column": "prompts_column",
      "answer_column": "answer_column"
    },
    "hyperparameters": {
      "n_epochs": 3,
      "evaluation_epochs": 1,
      "warmup_epochs": 0,
      "batch_size": 32,
      "learning_rate": 0.0002,
      "weight_decay": 0.0,
      "best_checkpoints": true,
      "lora": {
        "enabled": false,
        "r": 8,
        "alpha": 16,
        "dropout": 0.05,
        "trainable_modules": ["v_proj"]
      }
    },
    "dry_run": false
  }'

A successful response will look something like this:

{
    "dry_run": false,
    "job": {
        "id": "227b5607-cca4-42d1-8ac8-ad1f6c9b44ef",
        "name": "Example",
        "base_model": "DeepSeek-R1-Distill-Qwen-1.5B",
        "dataset": {
            "id": "cd9d82c1-e4ba-4cba-b8c6-b0c3e031d8c2",
            "prompt_column": "sample_question",
            "answer_column": "sample_answer"
        },
        "hyperparameters": {
            "n_epochs": 1,
            "evaluation_epochs": 1,
            "warmup_epochs": 0,
            "batch_size": 8,
            "learning_rate": 0.00001,
            "weight_decay": 0.01,
            "best_checkpoints": true,
            "lora": {
                "enabled": true,
                "r": 8,
                "alpha": 8,
                "dropout": 0,
                "trainable_modules": [
                    "up_proj"
                ]
            }
        },
        "status": "queued",
        "started_at": null,
        "completed_at": null
    },
    "estimated_usage": {
        "tokens": 0,
        "cost": 0
    }
}
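
The estimated_usage block pairs with the dry_run flag: setting "dry_run": true is intended to return these token and cost estimates without queueing a real job.

Later steps reference the job by its ID. If you have jq installed, you can capture it straight from the create response. A minimal sketch, assuming you have saved one of the payloads above to a file named job-payload.json (a name chosen here for illustration):

# Create the job and capture its ID (the .job.id path matches the
# response shown above)
JOB_ID=$(curl -s -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d @job-payload.json | jq -r '.job.id')

echo "Created job: $JOB_ID"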

Step 3 - Check the status of your jobs

You can check the status of your jobs using the list endpoint. Jobs can be filtered by status: queued, running, completed, or failed.

curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs?offset=0&limit=10&status=running" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Accept: application/json'
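
To block until a job finishes, you can poll this endpoint. A minimal sketch, assuming the list response wraps job objects like the one shown in Step 2 in a top-level jobs array (adjust the jq path if the actual payload differs):

# Poll every 30 seconds until the job leaves the queued/running states
while true; do
  STATUS=$(curl -s "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs?offset=0&limit=100" \
    -H "Authorization: Bearer $NSCALE_API_TOKEN" \
    -H 'Accept: application/json' | jq -r --arg id "$JOB_ID" '.jobs[] | select(.id == $id) | .status')
  echo "Job $JOB_ID status: $STATUS"
  case "$STATUS" in
    completed|failed) break ;;
  esac
  sleep 30
done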

Step 4 - Monitor training metrics for your job

After launching a fine-tuning job, you can monitor its progress and evaluate its performance by retrieving training and evaluation metrics. These metrics help you understand how well your model is learning and when it might be finished or require intervention.

curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs/$JOB_ID/metrics" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Accept: application/json'

How to use these metrics

  • Monitor training progress: Watch train_loss and eval_loss to ensure your model is learning and not overfitting.
  • Compare experiments: Use metrics to compare different jobs, hyperparameters, or datasets.
  • Make data-driven decisions: Poll this endpoint regularly during training so you can, for example, stop a run early once the model converges or adjust hyperparameters for future runs; a shell sketch follows this list.
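
For a quick look at the loss curves from the shell, you can flatten the metrics into tab-separated rows. A minimal sketch, assuming the response contains a metrics array with epoch, train_loss, and eval_loss fields (the exact payload shape isn’t shown here, so adjust the jq filter to match):

# Dump epoch-by-epoch losses as TSV (field names assumed)
curl -s "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs/$JOB_ID/metrics" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Accept: application/json' \
  | jq -r '.metrics[] | [.epoch, .train_loss, .eval_loss] | @tsv'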

Step 5 - Use your fine-tuned model

Once your job is complete, you can either export your model to Hugging Face or download it directly for local use.

Option 1: Export to Hugging Face

We support uploading your model directly to Hugging Face. This lets you start using your model straight away or share it with others on the Hugging Face platform.

curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs/$JOB_ID/exports/hugging-face" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{
    "huggingface_user_id": "your-hf-username",
    "huggingface_access_token": "hf_xxx",
    "huggingface_repository_id": "your-hf-username/your-model-repo"
  }'
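
Once the export completes, the repository behaves like any other model repo on the Hugging Face Hub, so you can pull it with git, for example:

# Clone the exported model repository (private repos will prompt for
# credentials; use your Hugging Face access token as the password)
git clone https://huggingface.co/your-hf-username/your-model-repo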

Option 2: Download your model

To download your new model for local use, call this endpoint; it will redirect you to a download URL from which you can fetch the tar.gz archive.

curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/jobs/{job_id}/download" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN"