
# Chat

> This guide will walk you through integrating a chat model into your application using Nscale’s API. With our serverless architecture, you can focus on building your application without worrying about infrastructure management.

## Prerequisites

1. **Service Token:** Sign up on the Nscale platform to create your [**service token**](/docs/manage/service-tokens).
2. **Model Selection:** Choose a chat model from Nscale’s library.
   * Example: Llama 3.1 8B Instruct `meta-llama/Llama-3.1-8B-Instruct`

## Step 1: Set up your environment

Before making requests, ensure you have the necessary tools installed for your language of choice:

For Python, install the `openai` library:

```bash theme={null}
pip install openai
```

For TypeScript, install the `openai` library:

```bash theme={null}
npm install openai
```

For cURL, ensure cURL is installed on your system (it is usually pre-installed on Unix-based systems).

## Step 2: Sending an inference request

Let's walk through an example that summarises a blog post in 100 words.

**Request structure**

Each request to the Nscale Chat Completions API endpoint should include the following:

1. Headers:
   * `"Authorization": "Bearer <SERVICE-TOKEN>"`
   * `"Content-Type": "application/json"`
2. Payload:
   * `"model"`: `"<model id e.g., meta-llama/Llama-3.1-8B-Instruct>"`
   * `"messages"`: `"<array of messages to send to the model>"`
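To make the request structure concrete, here is a minimal sketch that assembles the headers and payload listed above using only the Python standard library. The helper name `build_chat_request` is hypothetical; the endpoint URL is the one used in the cURL example in this guide. Nothing is sent over the network, the request object is only built and inspected.

```python
import json
import os
import urllib.request

# Endpoint taken from the cURL example in this guide.
NSCALE_CHAT_URL = "https://inference.api.nscale.com/v1/chat/completions"


def build_chat_request(token: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Hypothetical helper: assemble the headers and JSON payload (nothing is sent)."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        NSCALE_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    # Falls back to a placeholder when the environment variable is not set.
    token=os.getenv("NSCALE_SERVICE_TOKEN", "<SERVICE-TOKEN>"),
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the resulting object to `urllib.request.urlopen` would send it, although the OpenAI client shown in the examples that follow takes care of these details for you.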

Example use case: Summarise a blog post

<CodeGroup>
  ```python Python theme={null}
  import os
  import openai

  nscale_service_token = os.getenv("NSCALE_SERVICE_TOKEN")
  nscale_base_url = "https://inference.api.nscale.com/v1"

  client = openai.OpenAI(
      api_key=nscale_service_token,
      base_url=nscale_base_url
  )

  blog_text = "Serverless inference simplifies access to AI models..."

  response = client.chat.completions.create(
      model="meta-llama/Llama-3.1-8B-Instruct",
      messages=[
          {"role": "system", "content": "Provide a summary of the blog post in 100 words."},
          {"role": "user", "content": blog_text}
      ]
  )

  print(response.choices[0].message.content)
  ```

  ```typescript TypeScript theme={null}
  import OpenAI from 'openai';

  const nscaleServiceToken = process.env.NSCALE_SERVICE_TOKEN;
  const nscaleBaseUrl = "https://inference.api.nscale.com/v1";

  const client = new OpenAI({
      apiKey: nscaleServiceToken,
      baseURL: nscaleBaseUrl
  });

  const blogText = "Serverless inference simplifies access to AI models...";

  const response = await client.chat.completions.create({
      model: "meta-llama/Llama-3.1-8B-Instruct",
      messages: [
          { role: "system", content: "Provide a summary of the blog post in 100 words." },
          { role: "user", content: blogText }
      ]
  });

  console.log(response.choices[0].message.content);
  ```

  ```bash cURL theme={null}
  curl https://inference.api.nscale.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $NSCALE_SERVICE_TOKEN" \
    -d '{
      "model": "meta-llama/Llama-3.1-8B-Instruct",
      "messages": [
          { "role": "system", "content": "Provide a summary of the blog post in 100 words." },
          { "role": "user", "content": "Serverless inference simplifies access to AI models..." }
      ]
    }'
  ```
</CodeGroup>

## Step 3: Understanding the response

The API returns a JSON object containing the model's output and token usage.

Example response:

```json theme={null}
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "In this article, the author discusses the challenges of deploying Artificial Intelligence (AI) models in real-world applications..."
      }
    }
  ],
  "usage": {
    "completion_tokens": 175,
    "prompt_tokens": 1172,
    "total_tokens": 1347
  }
}
```

Key Fields:

* `choices`: An array of choice objects, each holding a `message` with the model's output.
* `usage`: An object containing the number of input (`prompt_tokens`), output (`completion_tokens`), and total (`total_tokens`) tokens used.
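The key fields above can be read from the raw JSON as in the following sketch. It works on a trimmed copy of the example response, and the helper name `extract_reply` is hypothetical. When you use the OpenAI client instead, the same values are available as attributes, for example `response.choices[0].message.content`.

```python
import json

# Trimmed copy of the example response shown above.
raw_response = """
{
  "choices": [
    {"message": {"role": "assistant", "content": "In this article, the author discusses..."}}
  ],
  "usage": {"completion_tokens": 175, "prompt_tokens": 1172, "total_tokens": 1347}
}
"""


def extract_reply(response: dict) -> tuple[str, dict]:
    """Hypothetical helper: return the first choice's text and the usage counters."""
    content = response["choices"][0]["message"]["content"]
    usage = response.get("usage", {})
    return content, usage


reply, usage = extract_reply(json.loads(raw_response))
print(reply)
print(
    f"{usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion "
    f"= {usage['total_tokens']} total tokens"
)
```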

## Step 4: Using the CLI for Chat Inferencing

You can also use the Nscale CLI to interact with chat models. This is a convenient way to test models or build command-line applications.

### Prerequisites

* Ensure you have the [Nscale CLI installed](/docs/cli/overview)

### Examples

Here are some examples of using the CLI for chat inferencing:

```bash theme={null}
# Generate a single response
nscale chat "What is machine learning?" -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct

# Start an interactive chat session
nscale chat -i -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct

# Use a custom system message
nscale chat --message "system:You are a creative storyteller" -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct

# Get usage statistics in JSON format
nscale chat "Explain quantum computing" --stats -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct

# Limit the response length
nscale chat "Write a story" --max-tokens 100 -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct

# Supply chat history
nscale chat -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct \
  --message "system:You are a helpful assistant" \
  --message "user:What is your name?"

# Start interactive mode with chat history
nscale chat -i -t $NSCALE_SERVICE_TOKEN -m meta-llama/Llama-3.1-8B-Instruct \
  --message "system:You are a helpful assistant" \
  --message "user:Hello" \
  --message "assistant:Hello! How can I help?"

# Use service token from environment variable
export NSCALE_SERVICE_TOKEN=your_service_token
nscale chat "What is machine learning?" -m meta-llama/Llama-3.1-8B-Instruct
```
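If you script the CLI from Python, the chat-history examples above map naturally onto an argument list with one `--message "role:content"` flag per entry. The sketch below only builds and prints that command. The helper name `build_chat_command` is hypothetical, it uses only flags that appear in the examples above, and it assumes the token is supplied through the `NSCALE_SERVICE_TOKEN` environment variable.

```python
import shlex


def build_chat_command(
    model: str,
    messages: list[tuple[str, str]],
    interactive: bool = False,
) -> list[str]:
    """Hypothetical helper: build an argv list for `nscale chat` from (role, content) pairs."""
    argv = ["nscale", "chat"]
    if interactive:
        argv.append("-i")
    argv += ["-m", model]
    for role, content in messages:
        # Each history entry becomes one --message flag in "role:content" form.
        argv += ["--message", f"{role}:{content}"]
    return argv


command = build_chat_command(
    "meta-llama/Llama-3.1-8B-Instruct",
    [("system", "You are a helpful assistant"), ("user", "What is your name?")],
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

With the Nscale CLI installed, the list can be passed to `subprocess.run(command)` to run it.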

For more details on CLI usage, refer to the [CLI documentation](/docs/cli/overview).

## Step 5: Monitoring and scaling

Nscale scales automatically based on traffic patterns, so no manual intervention is needed. Use the Nscale Console to monitor:

* API usage by model
* Spend breakdowns

For custom models or high-throughput applications on dedicated endpoints, contact Nscale Support.

## Troubleshooting

Common status codes and their meanings:

| Status | Description                           | Response Format                             |
| ------ | ------------------------------------- | ------------------------------------------- |
| 200    | Success (synchronous)                 | `application/json` response with completion |
| 201    | Success (streaming)                   | `text/event-stream` with delta updates      |
| 401    | Invalid service token or unauthorized | Error object                                |
| 404    | Model not found or unavailable        | Error object                                |
| 429    | Insufficient credit                   | Error object                                |
| 500    | Internal server error                 | Error object                                |
| 503    | Service temporarily unavailable       | Error object                                |
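One way to act on the table above is to sort status codes into a few coarse outcomes. The sketch below is an illustration, not official guidance: the function name `classify_status` and the category labels are hypothetical, and treating 500 and 503 as worth retrying is an assumption. Note that 429 is described in the table as insufficient credit, so this sketch does not retry it.

```python
def classify_status(status: int) -> str:
    """Hypothetical helper: map a status code from the table to a coarse action."""
    if status in (200, 201):
        # 200 is a synchronous completion, 201 a streaming response.
        return "success"
    if status in (500, 503):
        # Assumption: server-side failures may be transient, so a retry can help.
        return "retry"
    if status in (401, 404, 429):
        # Token, model id, or credit problems need a change before resending.
        return "fix_request"
    return "unknown"


for code in (200, 201, 401, 404, 429, 500, 503):
    print(code, classify_status(code))
```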

### Success Response Format (200)

```json theme={null}
{
  "id": "cmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 56,
    "completion_tokens": 31,
    "total_tokens": 87
  }
}
```

### Error Response Format

```json theme={null}
{
  "error": {
    "code": "TOO_MANY_REQUESTS",
    "message": "You have insufficient credit to run this request",
    "param": null,
    "error_type": "INSUFFICIENT_CREDIT"
  }
}
```

For the full list of error codes and how to handle them, see the [error code page](https://nscale.mintlify.app/docs/faqs/error-codes).

By following this guide, you'll be able to easily integrate chat models into your application using Nscale's serverless inference service.

<Card title="Contact Support" icon="headset" iconType="solid" href="mailto:helpdesk@nscale.com">
  Need assistance? Get help from our support team
</Card>
