inferencing, infer
Subcommands
- list-models — List available models
- list-endpoints — List inferencing endpoints
- chat — Send a chat completion request
list-models
Returns a list of all models available on the serverless platform.
Flags
| Flag | Description |
|---|---|
| --json | Emit the full JSON payload (mutually exclusive with -q) |
| -q, --query stringArray | jq filter for value extraction (see Query output with -q) |
Example
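The flags above can be exercised from a script. The following is a minimal sketch, not the documented example: it assumes the CLI binary is named `nscale` (this page does not confirm the name), and `list_models_argv` is a hypothetical helper. It builds the argument list for `list-models` and enforces the rule from the flags table that `--json` and `-q` cannot be combined.

```python
from typing import List, Optional, Sequence

# Assumption: the CLI binary is named "nscale"; this page does not confirm it.
CLI = "nscale"


def list_models_argv(json_output: bool = False,
                     queries: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for `inferencing list-models` (hypothetical helper)."""
    filters = list(queries or [])
    if json_output and filters:
        # The flags table marks --json and -q as mutually exclusive.
        raise ValueError("--json and -q are mutually exclusive")
    argv = [CLI, "inferencing", "list-models"]
    if json_output:
        argv.append("--json")
    for jq_filter in filters:
        # Assumption: as a stringArray flag, -q is passed once per filter.
        argv += ["-q", jq_filter]
    return argv


if __name__ == "__main__":
    print(list_models_argv(json_output=True))
    print(list_models_argv(queries=["."]))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues with jq filters.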
list-endpoints
Returns a list of all model endpoints available for use by the specified organization.
Flags
| Flag | Description |
|---|---|
| --org string | Organization ID |
| --json | Emit the full JSON payload (mutually exclusive with -q) |
| -q, --query stringArray | jq filter for value extraction (see Query output with -q) |
Example
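One way to consume the `--json` payload from a script is sketched below. It is an illustration under stated assumptions rather than the documented example: the binary name `nscale`, the payload being written to stdout, and the helper `list_endpoints` are all assumptions, and the payload's shape is not described on this page, so the function returns whatever JSON value it decodes.

```python
import json
import subprocess
from typing import Any, Callable

# Assumption: the CLI binary is named "nscale" and prints the payload to stdout.
CLI = "nscale"


def list_endpoints(org_id: str,
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Any:
    """Run `inferencing list-endpoints --org <id> --json` and decode its output.

    `run` is injectable so the function can be tested without the CLI installed.
    """
    argv = [CLI, "inferencing", "list-endpoints", "--org", org_id, "--json"]
    # check=True raises CalledProcessError if the command exits non-zero.
    result = run(argv, capture_output=True, text=True, check=True)
    # The payload schema is not documented here, so no shape is assumed.
    return json.loads(result.stdout)
```

Passing a stub for `run` makes it possible to check the constructed command in unit tests before pointing the function at a real organization ID.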
chat
Send a chat completion request to the inference API using a configuration file. Supports both batch and interactive modes.
Flags
| Flag | Description |
|---|---|
| --config string | Path to a chat configuration file (JSON) |
| --messages string | Path to a JSON+LD file containing additional messages |
| --ui | Launch interactive chat TUI |
Reasoning content
When you use the interactive TUI (--ui) with a model that supports reasoning, the model’s thought process appears in a separate “Thought Process” bubble above the response. The reasoning streams live as the model works through the problem, and the final answer appears in the standard response bubble once reasoning is complete.
This gives you visibility into how the model arrives at its answer without cluttering the final response.
Examples
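The sketch below shows one way a script might assemble a `chat` invocation from the flags listed earlier. It is a hedged illustration, not the documented example: the binary name `nscale` and the helper `chat_argv` are assumptions, and `--config` is treated as required only because the command description says the request uses a configuration file. The configuration schema is not described on this page, so the helper checks only that the file parses as JSON.

```python
import json
from pathlib import Path
from typing import List, Optional

# Assumption: the CLI binary is named "nscale"; this page does not confirm it.
CLI = "nscale"


def chat_argv(config: str,
              messages: Optional[str] = None,
              ui: bool = False) -> List[str]:
    """Build the argument list for `inferencing chat` (hypothetical helper)."""
    # Fail early if the config file is missing or is not valid JSON.
    json.loads(Path(config).read_text(encoding="utf-8"))
    argv = [CLI, "inferencing", "chat", "--config", config]
    if messages is not None:
        # Path to the file holding additional messages.
        argv += ["--messages", messages]
    if ui:
        # Launch the interactive TUI instead of running in batch mode.
        argv.append("--ui")
    return argv
```

Validating the file before launching the command surfaces a malformed configuration as a Python exception instead of a CLI error.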
Related
Models
Learn about available models on the Nscale platform.
Chat Use Case
End-to-end guide for chat inferencing.