
# Datasets 

> This guide walks you through creating, managing, and deleting datasets for fine-tuning foundation language models.

<Warning>
  **Fine-tuning service going offline**

  As of May 20, 2026, the existing fine-tuning service is offline while we prepare the next version. If you still need to export models or datasets, [contact support](mailto:support@nscale.com).

  [Learn more <Icon icon="arrow-up-right" size="11" />](/docs/faqs/deprecations)
</Warning>

## What is a dataset?

A dataset is a collection of files used in the fine-tuning process. It consists of a mandatory training file and an optional validation file.

* **Training File (Required):** Contains the data used to teach the model. The model learns from these examples and adjusts its internal weights accordingly.
* **Validation File (Optional):** Contains data not present in the training set. It is used to gauge how well the new model performs on unseen data, which helps detect overfitting. A validation file is highly recommended for robust model evaluation.

<Note>
  If you don't provide a validation file, the fine-tuning service randomly selects 1% of your training data, uses it to evaluate the fine-tuned model at the end of
  the fine-tuning process, and reports the resulting evaluation metrics.
</Note>
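A separate validation file is recommended, and the automatic 1% sample only applies when you don't provide one. If all your examples are in a single CSV, the following is one way to hold out a validation split yourself before uploading. It is a hypothetical shell helper, not part of the Nscale service: it uses `awk`, copies the header row into both output files, and assumes each record fits on one line (no field contains an embedded newline).

```bash
# Hypothetical helper (not part of the Nscale API): randomly split one CSV into
# training and validation files, copying the header row into both.
# Assumes one record per line, i.e. no field contains an embedded newline.
# Usage: split_csv all.csv train.csv validation.csv 10   # about 10% of rows to validation
split_csv() {
  local src="$1" train="$2" valid="$3" pct="${4:-10}"
  awk -v train="$train" -v valid="$valid" -v pct="$pct" '
    BEGIN { srand() }                                   # seed from the time of day
    NR == 1 { print > train; print > valid; next }      # header goes to both files
    { if (rand() * 100 < pct) print > valid; else print > train }
  ' "$src"
}
```

Because the split is random, the exact number of validation rows varies from run to run; pass a larger percentage for small datasets so the validation file is not empty.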

### Data Formatting Requirements

**1. File Format**

The fine-tuning service accepts files only in **CSV (Comma-Separated Values)** format.

**2. Column Structure**

Your CSV files must contain the following columns:

* `prompt` **(Optional):** This column should contain the input text, instruction, or question for the model.
* `answer` **(Required):** This column must contain the desired output or response from the model.
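Since `answer` is the only required column, a quick local check before uploading can catch a misnamed header. The function below is a hypothetical sketch, not an Nscale tool; it assumes the header is the first line of the file and that column names are unquoted and contain no surrounding spaces.

```bash
# Hypothetical pre-upload check: returns 0 if the CSV header row has an 'answer' column.
# Assumes the header is the first line and column names are unquoted.
has_answer_column() {
  # Take the header, drop Windows line endings, put one column name per line,
  # then look for a line that is exactly 'answer'.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n' | grep -qx 'answer'
}

# Example:
#   has_answer_column train.csv || echo "train.csv has no answer column" >&2
```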

Here is an example of valid CSV content for fine-tuning, shown as a table:

| prompt                                            | answer                                                                                                                                       |
| :------------------------------------------------ | :------------------------------------------------------------------------------------------------------------------------------------------- |
| What is the capital of France?                    | The capital of France is Paris.                                                                                                              |
| Who wrote "To Kill a Mockingbird"?                | Harper Lee wrote "To Kill a Mockingbird".                                                                                                    |
| Explain the theory of relativity in simple terms. | The theory of relativity, developed by Albert Einstein, describes how gravity is a property of spacetime, and how space and time are linked. |
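Using the `prompt` and `answer` column names described above, the same example rows look like this as raw CSV. The command writes them to a file named `train.csv`, which is only an example name. Fields that contain commas or double quotes are wrapped in double quotes, and embedded double quotes are doubled, following the common RFC 4180 convention; this guide does not say which CSV dialect the service parses, so treat the quoting rules as an assumption.

```bash
# Write the example rows to train.csv (example file name).
# Fields with commas or double quotes are quoted, and inner quotes are doubled.
cat > train.csv <<'EOF'
prompt,answer
What is the capital of France?,The capital of France is Paris.
"Who wrote ""To Kill a Mockingbird""?","Harper Lee wrote ""To Kill a Mockingbird""."
Explain the theory of relativity in simple terms.,"The theory of relativity, developed by Albert Einstein, describes how gravity is a property of spacetime, and how space and time are linked."
EOF
```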

### Dataset Overview

A dataset is the artifact that a fine-tuning job consumes. The diagram below shows how a dataset relates to its training and validation files, and which columns each file contains.

```mermaid theme={null}
graph TD
    A[Dataset] --> B["Training File <br/>(Required)"]
    A --> C["Validation File <br/>(Optional)"]

    B --> F["prompt column <br/>(Optional)"]
    B --> G["answer column <br/>(Required)"]

    C --> H["prompt column <br/>(Optional)"]
    C --> I["answer column <br/>(Required)"]

    style A fill:#1565c0,stroke:#0d47a1,stroke-width:2px,color:#fff
    style B fill:#2e7d32,stroke:#1b5e20,stroke-width:2px,color:#fff
    style C fill:#2e7d32,stroke:#1b5e20,stroke-width:2px,color:#fff

    classDef required stroke:#ffffff,stroke-width:3px,stroke-dasharray: 8 4,color:#fff,fill:#2e7d32
    classDef optional stroke:#ff9800,stroke-width:2px,stroke-dasharray: 5 5,color:#fff,fill:#f57c00
    classDef answerCol stroke:#7b1fa2,stroke-width:2px,color:#fff,fill:#9c27b0

    class B required
    class C,F,H optional
    class G,I answerCol
```

## Dataset Management Workflow

Once you have prepared your training and validation files in the required CSV format, you can create a dataset to start fine-tuning your model.

### Prerequisites

Before you begin, ensure you have the following:

* **Service Token (JWT):** A valid JWT is required to authenticate your requests. Please see our guide on [how to create a service token](/docs/manage/service-tokens).
* **Organization ID:** You can find your Organization ID by navigating to **Settings → Organisation** in the Nscale platform.
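The `curl` commands in this guide read your Organization ID and service token from the `ORGANIZATION_ID` and `NSCALE_API_TOKEN` shell variables. A minimal setup is shown below; the values are placeholders you replace with your own.

```bash
# Placeholder values: replace them with your own Organization ID and service token.
export ORGANIZATION_ID="<YOUR_ORGANIZATION_ID>"
export NSCALE_API_TOKEN="<YOUR_SERVICE_TOKEN>"

# Stop with an error message if either variable is unset or empty.
: "${ORGANIZATION_ID:?ORGANIZATION_ID is not set}"
: "${NSCALE_API_TOKEN:?NSCALE_API_TOKEN is not set}"
```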

## Create Dataset

### Step 1: Upload Your Files

Upload your training file and, if you have one, your validation file in separate requests. Each successful upload returns a response containing a unique file `id`. Save each `id`, because you need them in the next step to create your dataset.

```bash theme={null}
curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/files" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: multipart/form-data' \
  -H 'Accept: application/json' \
  -F 'file=@"<PATH_TO_FILE>"'
```
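To avoid copying each `id` by hand, you can capture it in a variable. The function below is a hypothetical sketch that sends the same upload request and prints the file `id`. It assumes `ORGANIZATION_ID` and `NSCALE_API_TOKEN` are exported and that the first `"id"` key in the JSON response is the file ID; the exact response schema is not documented here, so check it against a real response. If `jq` is installed, piping the response to `jq -r '.id'` is a more robust way to read a top-level `id`.

```bash
# Hypothetical helper: upload one CSV file and print the file id from the response.
# Assumes ORGANIZATION_ID and NSCALE_API_TOKEN are exported, and that the first
# "id" key in the JSON response is the file id (a text match, not a JSON parser).
upload_file() {
  # curl sets the multipart/form-data Content-Type itself when -F is used.
  curl -sS -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/files" \
    -H "Authorization: Bearer $NSCALE_API_TOKEN" \
    -H 'Accept: application/json' \
    -F "file=@$1" \
    | grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | cut -d'"' -f4
}

# Usage:
#   TRAINING_FILE_ID=$(upload_file train.csv)
#   VALIDATION_FILE_ID=$(upload_file validation.csv)   # only if you have one
```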

### Step 2: Create a New Dataset

Once you have the file `id` of your training file (and of your validation file, if you uploaded one), you can create a dataset. A dataset groups these files under a single ID that you'll use to start a fine-tuning job.

To create a dataset, provide a **name**, the file `id` of your **training file**, and, optionally, the file `id` of your **validation file**.

```bash theme={null}
# validation_file_id is optional. To omit it, delete that line and the trailing comma on the line before it.
curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets" \
  -H "Authorization: Bearer $NSCALE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -d '{
    "name": "example_dataset",
    "training_file_id": "682d47e8-6d65-4c9a-a9fe-0d695c610366",
    "validation_file_id": "4df01235-360e-4b7c-816e-da3e370de6c2"
  }'
```
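Because JSON does not allow comments, a script has to decide whether to include `validation_file_id` at all. The hypothetical helper below builds the request body and adds that field only when a validation file ID is passed. It performs no JSON escaping, so it assumes the name and IDs contain no double quotes or backslashes.

```bash
# Hypothetical helper: print the JSON body for the create-dataset request.
# Arguments: dataset name, training file id, optional validation file id.
# No JSON escaping is done: values must not contain double quotes or backslashes.
dataset_payload() {
  local name="$1" training_id="$2" validation_id="${3:-}"
  if [ -n "$validation_id" ]; then
    printf '{"name": "%s", "training_file_id": "%s", "validation_file_id": "%s"}\n' \
      "$name" "$training_id" "$validation_id"
  else
    printf '{"name": "%s", "training_file_id": "%s"}\n' "$name" "$training_id"
  fi
}

# Usage (sends the request; requires network access):
#   curl -X POST "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets" \
#     -H "Authorization: Bearer $NSCALE_API_TOKEN" \
#     -H 'Content-Type: application/json' \
#     -d "$(dataset_payload example_dataset "$TRAINING_FILE_ID" "${VALIDATION_FILE_ID:-}")"
```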

A successful request creates the dataset and returns its details, including the new dataset `id`. With your dataset created, you're ready to start fine-tuning. See the [**Fine-Tuning**](/docs/ai-services/fine-tuning) guide for the next steps.

## List All Datasets

To retrieve a list of all datasets in your organization, use:

```bash theme={null}
curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets" \
-H "Authorization: Bearer $NSCALE_API_TOKEN"
```

## Get a Dataset

To retrieve a specific dataset by its ID, use:

```bash theme={null}
curl -X GET "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets/$DATASET_ID" \
-H "Authorization: Bearer $NSCALE_API_TOKEN"
```

## Delete a Dataset

To delete a dataset, use:

```bash theme={null}
curl -X DELETE "https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets/$DATASET_ID" \
-H "Authorization: Bearer $NSCALE_API_TOKEN" \
-H 'Content-Type: application/json' \
-H 'Accept: application/json'
```
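If you call these endpoints often, a small wrapper keeps the base URL and authorization header in one place. This is a hypothetical convenience function, not part of any Nscale tooling. It sends the same methods and URLs as the list, get, and delete commands above, assumes `ORGANIZATION_ID` and `NSCALE_API_TOKEN` are exported, and refuses a `DELETE` without a dataset ID so it cannot target the collection URL by mistake.

```bash
# Hypothetical wrapper around the list, get, and delete dataset requests.
# Assumes ORGANIZATION_ID and NSCALE_API_TOKEN are exported.
#   datasets_api GET                 # list all datasets
#   datasets_api GET <DATASET_ID>    # get one dataset
#   datasets_api DELETE <DATASET_ID> # delete one dataset
datasets_api() {
  local method="$1" dataset_id="${2:-}"
  local url="https://fine-tuning.api.nscale.com/api/v1/organizations/$ORGANIZATION_ID/datasets"
  if [ -n "$dataset_id" ]; then
    url="$url/$dataset_id"
  elif [ "$method" = "DELETE" ]; then
    echo "datasets_api: DELETE requires a dataset ID" >&2
    return 1
  fi
  curl -sS -X "$method" "$url" \
    -H "Authorization: Bearer $NSCALE_API_TOKEN" \
    -H 'Accept: application/json'
}
```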
