Changelog
News and improvements from the Nscale team.
March 2025
Serverless Inference
Our fully managed, pay-per-request runtime puts a pool of GPUs behind a single OpenAI-compatible endpoint. Instead of capacity planning, container images, and infra dashboards, you call https://inference.api.nscale.com/v1/*
and get deterministic, low-latency responses from today's best open-source models, all billed per token and delivered from data-sovereign, 100% renewable data centres.
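Because the endpoint is OpenAI-compatible, a request is just a standard chat-completions POST pointed at the Nscale base URL. A minimal sketch using only the Python standard library (the model name and the `NSCALE_API_KEY` environment variable are illustrative assumptions, not confirmed identifiers):

```python
import json
import os
import urllib.request

def chat_request(model, messages, base_url="https://inference.api.nscale.com/v1"):
    """Build an OpenAI-style chat-completions request against the Nscale base URL."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            # API-key env var name is an assumption for this sketch.
            "Authorization": f"Bearer {os.environ.get('NSCALE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

# Model name here is illustrative; send with urllib.request.urlopen(req)
# once a real key is set.
req = chat_request(
    "meta-llama/Llama-4-Scout-17B",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)  # https://inference.api.nscale.com/v1/chat/completions
```

Migrating from OpenAI is then a matter of swapping the base URL and the key; the request and response shapes stay the same.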
Features
- OpenAI-compatible endpoints. Drop-in support for Llama, Qwen, DeepSeek and other leading models makes migration a copy-paste job.
- Pay-as-you-go billing. Pricing is per 1 million tokens (input plus output) for Chat, Multimodal, Language, and Code models; Image model pricing is based on image size and step count.
- 80% lower cost & 100% renewable. Our vertically integrated stack slashes TCO versus hyperscalers while guaranteeing data privacy: requests are never logged or reused.
- $5 free credits to get started. Every new account includes starter credits so you can ship to production in minutes.
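Per-token billing makes spend easy to estimate up front. A quick sanity check on the arithmetic (the $0.50-per-million rate below is a made-up placeholder, not a published Nscale price):

```python
def estimate_cost(input_tokens, output_tokens, usd_per_million):
    """Cost of a request under per-1M-token pricing covering input + output."""
    return (input_tokens + output_tokens) / 1_000_000 * usd_per_million

# 200k input + 50k output tokens at a hypothetical $0.50 per 1M tokens:
cost = estimate_cost(200_000, 50_000, 0.50)
print(f"${cost:.3f}")  # $0.125
```

At that placeholder rate, the $5 starter credit would cover roughly 10 million tokens, which is why the same formula slots easily into metered SaaS billing.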
Under the hood
| Area | What it looks like | Why it matters |
|---|---|---|
| API surface | Drop-in equivalents for `GET /models`, `POST /chat/completions`, `POST /images`, with optional `stream: true` for SSE (`text/event-stream`). | Migrate from OpenAI by changing only the base URL and key. |
| Model library | Launch set covers Meta Llama-4 Scout 17B, Qwen-3 235B, Mixtral-8×22B, DeepSeek-R1 distills, SD-XL 1.0 and more (text, code, vision). | Lets teams A/B models or mix modalities without provisioning extra infra. |
| Elastic runtime | "Zero rate limits, no cold starts." Traffic is sharded over thousands of MI300X/MI250X/H100 GPUs, spun up on demand by our orchestration layer. | Bursty workloads stay under 200 ms tail latency without you over-allocating GPUs. |
| Cost model | Tokens in, tokens out: billed per 1M tokens; images billed per megapixel. Every account starts with $5 free credit. | Fine-grained, deterministic spend; easy to embed in metered SaaS. |
| Security / privacy | End-to-end TLS, org-scoped API keys, full tenant isolation; we never log or train on user prompts or outputs. | Meets GDPR, HIPAA and most vendor-assessment checklists out of the box. |
| Sustainability | All compute runs in hydro-powered facilities; the vertical stack is 80% cheaper per token than hyperscalers. | Fewer carbon (and budget) emissions per request. |
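With `stream: true`, the API delivers tokens as server-sent events in the usual OpenAI chunk shape: each event is a `data: {...}` line and the stream ends with `data: [DONE]`. A minimal consumer of that wire format might look like this (the sample events are invented for illustration):

```python
import json

def collect_stream(lines):
    """Reassemble streamed text from OpenAI-style SSE chat-completion chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

# Invented sample events, shaped like OpenAI streaming chunks:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream(sample))  # Hello
```

In production the lines would come from the `text/event-stream` response body rather than a list, but the parsing logic is the same.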