March 2025

Serverless Inference

Our fully managed, pay-per-request runtime puts a pool of GPUs behind a single OpenAI-compatible endpoint. Instead of capacity planning, container images and infra dashboards, you call https://inference.api.nscale.com/v1/* and get deterministic, low-latency responses from today’s best open-source models, all billed per token and delivered from data-sovereign, 100% renewable data centres.

Features

  • OpenAI-compatible endpoints. Drop-in support for Llama, Qwen, DeepSeek and other leading models makes migration a copy-paste job (see the sketch after this list)
  • Pay-as-you-go billing. Pricing is per 1 million tokens (input plus output) for Chat, Multimodal, Language and Code models; Image model pricing is based on image size and step count
  • 80% lower cost & 100% renewable. Our vertically-integrated stack slashes TCO versus hyperscalers while guaranteeing data privacy: requests are never logged or reused
  • $5 free credits to get started. Every new account includes starter credits so you can ship to production in minutes
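
Because the endpoints are OpenAI-compatible, the official openai Python client works once you change the base URL and key. A minimal sketch, assuming the openai package (v1+) and the NSCALE_KEY environment variable used in the quick start below:

import os
from openai import OpenAI

# Point the stock OpenAI client at the Nscale endpoint.
client = OpenAI(
    base_url="https://inference.api.nscale.com/v1",
    api_key=os.environ["NSCALE_KEY"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-Instruct",
    messages=[{"role": "user", "content": "Hello world"}],
)
print(response.choices[0].message.content)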

Under the hood

  • API surface. Drop-in equivalents for GET /models, POST /chat/completions and POST /images, with optional stream: true for SSE (text/event-stream). Why it matters: migrate from OpenAI by changing only the base URL and key.
  • Model library. The launch set covers Meta Llama-4 Scout 17B, Qwen-3 235B, Mixtral-8×22B, DeepSeek-R1 distills, SD-XL 1.0 and more (text, code, vision). Why it matters: teams can A/B models or mix modalities without provisioning extra infra.
  • Elastic runtime. “Zero rate limits, no cold starts.” Traffic is sharded over thousands of MI300X, MI250X and H100 GPUs, spun up on demand by our orchestration layer. Why it matters: bursty workloads stay under 200 ms tail latency without over-allocating GPUs.
  • Cost model. Tokens in, tokens out, billed per 1M tokens (see the sketch after this list); images billed per megapixel. Every account starts with $5 of free credit. Why it matters: fine-grained, deterministic spend that is easy to embed in metered SaaS.
  • Security / privacy. End-to-end TLS, org-scoped API keys and full tenant isolation; we never log or train on user prompts or outputs. Why it matters: meets GDPR, HIPAA and most vendor-assessment checklists out of the box.
  • Sustainability. All compute runs in hydro-powered facilities, and the vertically-integrated stack is 80% cheaper per token than hyperscalers. Why it matters: lower carbon (and budget) emissions per request.
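
Per-1M-token billing makes spend a pure function of token counts. A back-of-envelope sketch of that arithmetic; the prices below are hypothetical placeholders, not published rates:

def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_1m: float, out_price_per_1m: float) -> float:
    """Cost in dollars for one request under per-1M-token billing."""
    return (input_tokens * in_price_per_1m +
            output_tokens * out_price_per_1m) / 1_000_000

# Hypothetical example: 1,200 prompt tokens and 350 completion tokens
# at $0.10 in / $0.30 out per 1M tokens.
print(estimate_cost(1_200, 350, 0.10, 0.30))  # 0.000225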

Quick start

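# Streamed chat completion; -N keeps curl from buffering the SSE stream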
curl -N -X POST \
  https://inference.api.nscale.com/v1/chat/completions \
  -H "Authorization: Bearer $NSCALE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-4-Scout-17B-Instruct",
        "messages": [{"role":"user","content":"Hello world"}],
        "stream": true
      }'
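
The same streamed call from Python, again assuming the official openai client: with stream=True the SDK yields the SSE chunks as they arrive, so you can print tokens incrementally.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.api.nscale.com/v1",
    api_key=os.environ["NSCALE_KEY"],
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-Instruct",
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
)
# Each chunk carries an incremental delta; print tokens as they stream in.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()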