Call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash) via the TokenHub unified `/chat/completions` endpoint. Fully OpenAI-compatible: the official `openai` SDK works out of the box. Streaming, tool use, and thinking mode (with `reasoning_content`) are all supported. 1M-token context window, 384K max output tokens, per-token billing at DeepSeek catalog prices.
API keys have the form `th-xxxxxxxxxxxx...`. Base URL: `https://tokenhub.store/api/v1`. Pass the API key in the `Authorization` header:

```
Authorization: Bearer th-your-api-key
```

`POST /chat/completions`: Chat completion. Same schema as OpenAI `/v1/chat/completions`, with streaming, tools, JSON mode and DeepSeek-specific thinking fields.
Pricing is per 1 million tokens (USD), based on the DeepSeek catalog list price (no promotional discount applied). Both the canonical ID and the `deepseek/*` alias are accepted. Billing uses the `completion_tokens` value returned by the upstream, which already includes `reasoning_tokens`.
| Tier | Model ID | Input | Output | Notes |
|---|---|---|---|---|
| V4-Pro | deepseek-v4-pro | $1.80 | $3.60 | 2026 top-tier flagship. Best reasoning & coding quality. |
| V4-Flash | deepseek-v4-flash | $0.15 | $0.30 | Ultra cost-effective flagship, ~12× cheaper than Pro; great default for production. |
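The per-token billing above is easy to estimate from a response's `usage` object. A minimal sketch (the price table is from this page; the helper itself is illustrative, not part of the TokenHub API):

```python
# Prices per 1M tokens (USD), from the pricing table above.
PRICES = {
    "deepseek-v4-pro":   {"input": 1.80, "output": 3.60},
    "deepseek-v4-flash": {"input": 0.15, "output": 0.30},
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD; completion_tokens already includes reasoning_tokens."""
    # Both "deepseek-v4-flash" and the "deepseek/..." alias are accepted.
    prices = PRICES[model.removeprefix("deepseek/")]
    return (prompt_tokens * prices["input"]
            + completion_tokens * prices["output"]) / 1_000_000

# e.g. 10K prompt + 2K output tokens on V4-Flash:
# 10_000 * 0.15/1e6 + 2_000 * 0.30/1e6 = 0.0015 + 0.0006 = 0.0021 USD
print(request_cost("deepseek/deepseek-v4-flash", 10_000, 2_000))
```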
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Required | — | DeepSeek V4 model ID. Example: "deepseek/deepseek-v4-flash". |
| messages | array | Required | — | Chat history. Each item is { role, content }. role ∈ system | user | assistant | tool. |
| max_tokens | integer | Optional | upstream default | Max output tokens. If omitted, DeepSeek uses its upstream default (up to 384K). In thinking mode the counter INCLUDES reasoning tokens — do not set it too small. |
| temperature | number | Optional | 1.0 | Sampling temperature, 0.0–2.0. Lower = more deterministic. DeepSeek recommends 0.0 for code, 1.3 for creative writing. |
| top_p | number | Optional | 1.0 | Nucleus sampling. Use temperature OR top_p, not both. |
| stream | boolean | Optional | false | If true, returns Server-Sent Events (SSE) deltas. |
| thinking | object | Optional | {type:'enabled'} | DeepSeek-specific. Pass { type: 'disabled' } via extra_body to skip the reasoning phase for faster/cheaper responses. Default: enabled. |
| reasoning_effort | string | Optional | medium | Thinking depth: low | medium | high. Higher = more reasoning tokens, better quality, higher cost. |
| tools | array | Optional | — | List of tool/function definitions for tool use (function calling). |
| tool_choice | string|object | Optional | auto | Control tool selection: auto | none | required | { type:'function', function:{ name } }. |
| response_format | object | Optional | — | JSON mode: { "type": "json_object" } forces the model to return valid JSON. |
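To make the `tools` and `tool_choice` shapes above concrete, here is a sketch of a request payload in the OpenAI function-calling schema (the `get_weather` tool is a hypothetical example, not part of this API):

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": tools,
    # "auto" lets the model decide; force a specific tool with
    # {"type": "function", "function": {"name": "get_weather"}}.
    "tool_choice": "auto",
}
print(json.dumps(payload, indent=2))
```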
**cURL**

```bash
curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'
```

**Python**

```python
from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)
resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)
msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
```

**TypeScript**

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});
const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});
// reasoning_content is a DeepSeek-specific field not in the SDK types
const msg = resp.choices[0].message as any;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);
```

DeepSeek V4 opens a dedicated reasoning phase before writing the final answer. Here's what you need to know:
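When you don't need the reasoning phase, it can be turned off, or bounded, with the `thinking` and `reasoning_effort` fields from the parameter table; with the official SDKs, send these non-standard fields via `extra_body`. A minimal sketch of the raw request bodies:

```python
import json

# Request body that skips the reasoning phase for a faster, cheaper response.
body = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Ping?"}],
    "thinking": {"type": "disabled"},  # pass via extra_body when using an SDK
}

# Or keep thinking enabled but bound its depth (low | medium | high).
deep_body = {**body, "thinking": {"type": "enabled"}, "reasoning_effort": "low"}

print(json.dumps(body))
```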
Sign up for TokenHub and start calling DeepSeek V4 via our OpenAI-compatible API.