DeepSeek API Guide

DeepSeek V4 on TokenHub — OpenAI-Compatible Thinking Models

Call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash) via the TokenHub unified /chat/completions endpoint. Fully OpenAI-compatible — the official openai SDK works out of the box. Streaming, tool use, and thinking mode (with reasoning_content) are all supported. 1M context window, 384K max output, per-token billing at DeepSeek catalog pricing.

OpenAI-Compatible · Streaming · Thinking Mode · Tool Use · 1M Context · 384K Output

1. Get Your API Key

  1. Visit tokenhub.store and register an account (GitHub / Google sign-in supported)
  2. Go to Dashboard → API Keys, click "Create New Key"
  3. Go to Dashboard → Billing to add Credits (1 Credit = $1 USD)
  4. Copy the API Key (format: th-xxxxxxxxxxxx...)
⚠️ The API Key is shown only once upon creation. Save it securely; if lost, create a new one.

2. API Overview

Base URL

https://tokenhub.store/api/v1

Authentication

Pass the API Key in the Authorization header:

Header
Authorization: Bearer th-your-api-key

Endpoint (OpenAI-compatible)

POST
/chat/completions

Chat completion. Same schema as OpenAI /v1/chat/completions, with streaming, tools, JSON mode and DeepSeek-specific thinking fields.

Drop-in with the official openai SDK — just point base_url to TokenHub and use your TokenHub API key. No other code changes required.
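Because the endpoint is plain HTTPS + JSON, it can also be exercised without any SDK. A minimal sketch using only the Python standard library (the key value is a placeholder; the live call itself is omitted since it needs a real key):

```python
import json
import urllib.request

BASE_URL = "https://tokenhub.store/api/v1"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST to the chat completions endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("th-your-api-key", {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "ping"}],
})
print(req.full_url)  # https://tokenhub.store/api/v1/chat/completions
# urllib.request.urlopen(req) would send it; a real API key is required.
```

The same Authorization header shape applies to every client, SDK or not.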

3. Models & Pricing

Pricing is per 1 million tokens (USD), based on DeepSeek catalog list price (no promotional discount applied). Both the canonical ID and the deepseek/* alias are accepted. Billing uses the completion_tokens returned by the upstream (which already includes reasoning_tokens).

Tier     | Model ID          | Input | Output | Notes
V4-Pro   | deepseek-v4-pro   | $1.80 | $3.60  | 2026 top-tier flagship. Best reasoning & coding quality.
V4-Flash | deepseek-v4-flash | $0.15 | $0.30  | Ultra cost-effective flagship, ~12× cheaper than Pro; a great default for production.
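To predict spend, a request's cost can be derived directly from the table above and the usage object the API returns. A small sketch (the prices are the catalog list prices quoted above; remember completion_tokens already includes reasoning tokens):

```python
# Per-1M-token USD prices from the pricing table above.
PRICING = {
    "deepseek-v4-pro":   {"input": 1.80, "output": 3.60},
    "deepseek-v4-flash": {"input": 0.15, "output": 0.30},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost for one request.

    completion_tokens should be taken straight from usage.completion_tokens,
    which already includes reasoning tokens.
    """
    prices = PRICING[model.removeprefix("deepseek/")]  # the deepseek/* alias is accepted too
    return (prompt_tokens * prices["input"]
            + completion_tokens * prices["output"]) / 1_000_000

# Example: 10K prompt tokens in, 2K completion tokens out on V4-Flash.
print(estimate_cost("deepseek/deepseek-v4-flash", 10_000, 2_000))  # 0.0021
```

Feeding usage.prompt_tokens and usage.completion_tokens from a real response into this function reproduces the billed amount.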

4. Request Parameters

model (string, required): DeepSeek V4 model ID. Example: "deepseek/deepseek-v4-flash".
messages (array, required): Chat history. Each item is { role, content }; role ∈ system | user | assistant | tool.
max_tokens (integer, optional; default: upstream default): Max output tokens. If omitted, DeepSeek uses its upstream default (up to 384K). In thinking mode the counter INCLUDES reasoning tokens, so do not set it too small.
temperature (number, optional; default 1.0): Sampling temperature, 0.0–2.0. Lower = more deterministic. DeepSeek recommends 0.0 for code and 1.3 for creative writing.
top_p (number, optional; default 1.0): Nucleus sampling. Use temperature OR top_p, not both.
stream (boolean, optional; default false): If true, returns Server-Sent Events (SSE) deltas.
thinking (object, optional; default { type: 'enabled' }): DeepSeek-specific. Pass { type: 'disabled' } via extra_body to skip the reasoning phase for faster, cheaper responses.
reasoning_effort (string, optional; default medium): Thinking depth: low | medium | high. Higher = more reasoning tokens, better quality, higher cost.
tools (array, optional): List of tool/function definitions for tool use (function calling).
tool_choice (string | object, optional; default auto): Controls tool selection: auto | none | required | { type: 'function', function: { name } }.
response_format (object, optional): JSON mode: { "type": "json_object" } forces the model to return valid JSON.
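The tools and tool_choice parameters follow the OpenAI function-calling schema. A sketch of one round trip, under stated assumptions: get_weather is a hypothetical tool defined here for illustration, and the live call runs only when a TOKENHUB_API_KEY environment variable (our naming, not an official one) is set:

```python
import json
import os

# Hypothetical tool definition; the schema follows the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def run_tool_call(message) -> str:
    """Dispatch the model's first tool call to a local stub implementation."""
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    if call.function.name == "get_weather":
        return f"Sunny in {args['city']}"  # stub; a real app would query a weather API
    raise ValueError(f"unknown tool: {call.function.name}")

if os.environ.get("TOKENHUB_API_KEY"):  # live call only when a key is configured
    from openai import OpenAI  # requires `pip install openai`
    client = OpenAI(api_key=os.environ["TOKENHUB_API_KEY"],
                    base_url="https://tokenhub.store/api/v1")
    resp = client.chat.completions.create(
        model="deepseek/deepseek-v4-flash",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[WEATHER_TOOL],
        tool_choice="auto",
    )
    print(run_tool_call(resp.choices[0].message))
```

After executing the tool, append a { role: "tool", content, tool_call_id } message and call the API again so the model can write its final answer.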

5. curl Examples

bash
curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'

6. Python Example

python
from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)

msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
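With stream=True, the answer arrives as SSE deltas instead of one message. A sketch of consuming such a stream, under an assumption: reasoning deltas are read from delta.reasoning_content, mirroring the non-streaming message.reasoning_content field (the live call runs only when a TOKENHUB_API_KEY environment variable, our naming, is set):

```python
import os

def collect_stream(chunks):
    """Split streamed deltas into (thinking, answer) strings.

    Assumes reasoning deltas arrive on delta.reasoning_content, mirroring
    the non-streaming message.reasoning_content field.
    """
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            thinking.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(thinking), "".join(answer)

if os.environ.get("TOKENHUB_API_KEY"):  # live call only when a key is configured
    from openai import OpenAI  # requires `pip install openai`
    client = OpenAI(api_key=os.environ["TOKENHUB_API_KEY"],
                    base_url="https://tokenhub.store/api/v1")
    stream = client.chat.completions.create(
        model="deepseek/deepseek-v4-flash",
        messages=[{"role": "user", "content": "Explain CAP theorem in 3 bullets."}],
        stream=True,
    )
    thinking, answer = collect_stream(stream)
    print("Answer:", answer)
```

In a chat UI you would typically render the thinking deltas in a collapsible panel and the content deltas as the visible reply.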

7. JavaScript / Node.js Example

typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});

// reasoning_content is not yet in the SDK's response types, hence the cast
const msg = resp.choices[0].message as any;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);

8. Thinking Mode Deep Dive

DeepSeek V4 runs a dedicated reasoning phase before writing the final answer. Here's what you need to know:

  • reasoning_content is returned as a SEPARATE field on the assistant message (not inside content). Don't feed it back on follow-up turns.
  • completion_tokens in usage already INCLUDES reasoning_tokens — that's also what we bill on. Check completion_tokens_details.reasoning_tokens to see how much went to thinking.
  • Setting max_tokens too low in thinking mode leads to empty content (all tokens are consumed by reasoning). Leave it unset, or allow at least ~2,000 tokens.
  • Disable via extra_body: { thinking: { type: 'disabled' } } for latency-sensitive scenarios (chat, classification, simple extraction).
  • reasoning_effort: 'low' | 'medium' | 'high' controls how much the model thinks. 'high' gives best results on math/coding; 'low' is faster.
  • Prompt caching: if you reuse the same system prompt, DeepSeek returns prompt_cache_hit_tokens separately. TokenHub currently bills uniformly at the miss rate (a small over-charge in exchange for predictable pricing).
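The two thinking knobs above can be packaged as keyword arguments for client.chat.completions.create. A sketch, with one assumption: both fields are passed through extra_body, since they are DeepSeek-specific and not standard OpenAI SDK parameters (newer SDK versions may accept reasoning_effort as a top-level argument):

```python
def thinking_kwargs(enabled: bool = True, effort: str = "medium") -> dict:
    """Keyword arguments controlling DeepSeek's reasoning phase.

    Both fields ride in extra_body because they are DeepSeek-specific
    extensions, not standard OpenAI parameters.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"invalid reasoning_effort: {effort}")
    if not enabled:
        # Skip the reasoning phase entirely: faster and cheaper.
        return {"extra_body": {"thinking": {"type": "disabled"}}}
    return {"extra_body": {"reasoning_effort": effort}}

# Latency-sensitive classification: no thinking.
fast = thinking_kwargs(enabled=False)
# Hard math/coding task: maximum thinking depth.
deep = thinking_kwargs(effort="high")
```

Usage: client.chat.completions.create(model=..., messages=..., **thinking_kwargs(enabled=False)).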

9. FAQ

Ready to Start?

Sign up for TokenHub and start calling DeepSeek V4 via our OpenAI-compatible API.