Chat
This guide will walk you through integrating a chat model into your application using Nscale’s API. With our serverless architecture, you can focus on building your application without worrying about infrastructure management.
Prerequisites
- API Key: Sign up on the Nscale platform to get your API key.
- Model Selection: Choose a chat model from Nscale’s library.
  - Example: Llama 3.1 8B Instruct (`meta-llama/Llama-3.1-8B-Instruct`)
Step 1: Set up your environment
Before making requests, ensure you have the necessary tools installed for your language of choice:
- For Python: install the `openai` library (`pip install openai`).
- For TypeScript: install the `openai` library (`npm install openai`).
- For cURL: ensure cURL is installed on your system (it’s usually pre-installed on most Unix-based systems).
Step 2: Sending an inference request
Let’s walk through an example where we summarise a blog post into 100 words.
Request structure
Each request to the Nscale Chat Completions API endpoint should include the following:
- Headers:
  - `"Authorization": "Bearer <API-KEY>"`
  - `"Content-Type": "application/json"`
- Payload:
  - `"model"`: the model id, e.g. `meta-llama/Llama-3.1-8B-Instruct`
  - `"messages"`: an array of messages to send to the model
Example use case: Summarise a blog post
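Putting the request structure together, here is a sketch in Python using only the standard library, so the headers and payload are fully visible. The endpoint URL shown is an assumption; confirm the exact Chat Completions URL in your Nscale console or API reference.

```python
import json
import os
import urllib.request

# Assumed endpoint URL -- confirm the exact value in your Nscale console.
API_URL = "https://inference.api.nscale.com/v1/chat/completions"

def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Construct the HTTP POST request with the headers and payload described above."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

blog_post = "..."  # the blog post text to summarise
req = build_request(
    os.environ.get("NSCALE_API_KEY", "<API-KEY>"),
    "meta-llama/Llama-3.1-8B-Instruct",
    [{"role": "user",
      "content": f"Summarise the following blog post in 100 words:\n\n{blog_post}"}],
)
# To send it (requires a valid API key):
# body = json.loads(urllib.request.urlopen(req).read())
```

If you installed the `openai` Python client (v1+) in Step 1, the same request can be made with `OpenAI(base_url=..., api_key=...)` followed by `client.chat.completions.create(model=..., messages=...)`.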
Step 3: Understanding the response
The API will return a JSON object containing the model’s output and token usage:
Example Response:
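A representative response body is shown below. The field values are illustrative; the exact `id`, timestamp, and token counts will differ per request.

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1727000000,
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The post argues that ... (100-word summary)"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 812,
    "completion_tokens": 128,
    "total_tokens": 940
  }
}
```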
Key Fields:
- `choices`: An array of message objects containing the model’s output.
- `usage`: An object containing the input (`prompt_tokens`), output (`completion_tokens`), and total (`total_tokens`) number of tokens used.
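In Python, extracting the summary text and token usage from the parsed JSON is a matter of indexing into the fields described above (a sketch with illustrative values):

```python
import json

# A minimal parsed response containing the fields documented above
# (values are illustrative).
response = json.loads("""
{
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "A 100-word summary..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 812, "completion_tokens": 128, "total_tokens": 940}
}
""")

summary = response["choices"][0]["message"]["content"]
used = response["usage"]["total_tokens"]
print(summary)  # -> A 100-word summary...
print(used)     # -> 940
```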
Step 4: Using the CLI for chat inferencing
You can also use the Nscale CLI to interact with chat models. This is a convenient way to test models or build command-line applications.
Prerequisites
- Ensure you have the Nscale CLI installed
Examples
Here are some examples of using the CLI for chat inferencing:
For more details on CLI usage, refer to the CLI documentation.
Step 5: Monitoring and scaling
Nscale handles scaling automatically based on traffic patterns—no manual intervention needed! Use the Nscale Console to monitor:
- API usage by model
- Spend breakdowns
For custom models or high-throughput applications on dedicated endpoints, contact Nscale Support.
Troubleshooting
Common status codes and their meanings:
| Status | Description | Response Format |
| --- | --- | --- |
| 200 | Success (synchronous) | `application/json` response with completion |
| 201 | Success (streaming) | `text/event-stream` with delta updates |
| 401 | Invalid API key or unauthorized | Error object |
| 404 | Model not found or unavailable | Error object |
| 429 | Insufficient credit | Error object |
| 500 | Internal server error | Error object |
| 503 | Service temporarily unavailable | Error object |
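The status codes above can be folded into a small dispatch helper. This is an illustrative sketch for your own error handling, not part of any Nscale SDK:

```python
# Illustrative mapping of the status codes above to a coarse next step.
# Not part of any Nscale SDK -- adapt to your application's needs.
RETRYABLE = {500, 503}  # transient server-side failures: retry with backoff
ACTIONS = {
    401: "check your API key",
    404: "verify the model id is available",
    429: "insufficient credit: top up in the Nscale Console",
}

def classify(status: int) -> str:
    """Map an HTTP status from the Chat Completions API to a next step."""
    if status in (200, 201):
        return "ok"
    if status in RETRYABLE:
        return "retry"
    return ACTIONS.get(status, "inspect the error object")
```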
Success Response Format (200)
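The 200 body follows the chat-completion shape described in Step 3. In outline (placeholder values, not literal output):

```json
{
  "id": "<completion id>",
  "object": "chat.completion",
  "created": "<unix timestamp>",
  "model": "<model id>",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "<model output>"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
}
```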
Error Response Format
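Error bodies typically follow the OpenAI-compatible error-object shape; the sketch below shows the general structure, and the error code page has the authoritative schema:

```json
{
  "error": {
    "message": "<human-readable description>",
    "type": "<error type>",
    "code": "<error code>"
  }
}
```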
For the extensive list of error codes and handling, see the error code page.
By following this guide, you’ll be able to easily integrate chat models into your application using Nscale’s serverless inference service.
Contact Support
Need assistance? Get help from our support team