DeepSeek API Guide

DeepSeek V4 on TokenHub — OpenAI-Compatible Thinking Models

Call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash) via the TokenHub unified /chat/completions endpoint. Fully OpenAI-compatible — the official openai SDK works out of the box. Streaming, tool use, and thinking mode (with reasoning_content) are all supported. 1M context window, 384K max output, per-token billing at DeepSeek catalog pricing.

OpenAI-Compatible · Streaming · Thinking Mode · Tool Use · 1M Context · 384K Output

1. Get Your API Key

  1. Visit tokenhub.store and register an account (GitHub / Google sign-in supported)
  2. Go to Dashboard → API Keys, click "Create New Key"
  3. Go to Dashboard → Billing to add Credits (1 Credit = $1 USD)
  4. Copy the API Key (format: th-xxxxxxxxxxxx...)
⚠️ The API Key is shown only once upon creation. Save it securely; if lost, create a new one.

2. API Overview

Base URL

https://tokenhub.store/api/v1

Authentication

Pass the API Key in the Authorization header:

Header
Authorization: Bearer th-your-api-key

Endpoint (OpenAI-compatible)

POST
/chat/completions

Chat completion. Same schema as OpenAI /v1/chat/completions, with streaming, tools, JSON mode and DeepSeek-specific thinking fields.

Drop-in with the official openai SDK — just point base_url to TokenHub and use your TokenHub API key. No other code changes required.
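Because the endpoint is plain HTTPS + JSON, it can also be exercised without any SDK. A minimal sketch using only the Python standard library (the key value is a placeholder; the live call itself is omitted since it needs a real key):

```python
import json
import urllib.request

BASE_URL = "https://tokenhub.store/api/v1"

def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST to the chat completions endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("th-your-api-key", {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "ping"}],
})
print(req.full_url)  # https://tokenhub.store/api/v1/chat/completions
# urllib.request.urlopen(req) would send it; a real API key is required.
```

The same Authorization header shape applies to every client, SDK or not.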

3. Models & Pricing

Pricing is per 1 million tokens (USD), based on DeepSeek catalog list price (no promotional discount applied). Both the canonical ID and the deepseek/* alias are accepted. Billing uses the completion_tokens returned by the upstream (which already includes reasoning_tokens).

Tier     | Model ID          | Input | Output | Notes
V4-Pro   | deepseek-v4-pro   | $1.80 | $3.60  | 2026 top-tier flagship. Best reasoning & coding quality.
V4-Flash | deepseek-v4-flash | $0.15 | $0.30  | Ultra cost-effective flagship, ~12× cheaper than Pro; a great default for production.
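To predict spend, a request's cost can be derived directly from the table above and the usage object the API returns. A small sketch (the prices are the catalog list prices quoted above; remember completion_tokens already includes reasoning tokens):

```python
# Per-1M-token USD prices from the pricing table above.
PRICING = {
    "deepseek-v4-pro":   {"input": 1.80, "output": 3.60},
    "deepseek-v4-flash": {"input": 0.15, "output": 0.30},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate USD cost for one request.

    completion_tokens should be taken straight from usage.completion_tokens,
    which already includes reasoning tokens.
    """
    prices = PRICING[model.removeprefix("deepseek/")]  # the deepseek/* alias is accepted too
    return (prompt_tokens * prices["input"]
            + completion_tokens * prices["output"]) / 1_000_000

# Example: 10K prompt tokens in, 2K completion tokens out on V4-Flash.
print(estimate_cost("deepseek/deepseek-v4-flash", 10_000, 2_000))  # 0.0021
```

Feeding usage.prompt_tokens and usage.completion_tokens from a real response into this function reproduces the billed amount.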

4. Request Parameters

model (string, required): DeepSeek V4 model ID. Example: "deepseek/deepseek-v4-flash".
messages (array, required): Chat history. Each item is { role, content }; role ∈ system | user | assistant | tool.
max_tokens (integer, optional; default: upstream default): Max output tokens. If omitted, DeepSeek uses its upstream default (up to 384K). In thinking mode the counter INCLUDES reasoning tokens, so do not set it too small.
temperature (number, optional; default 1.0): Sampling temperature, 0.0–2.0. Lower = more deterministic. DeepSeek recommends 0.0 for code and 1.3 for creative writing.
top_p (number, optional; default 1.0): Nucleus sampling. Use temperature OR top_p, not both.
stream (boolean, optional; default false): If true, returns Server-Sent Events (SSE) deltas.
thinking (object, optional; default { type: 'enabled' }): DeepSeek-specific. Pass { type: 'disabled' } via extra_body to skip the reasoning phase for faster, cheaper responses.
reasoning_effort (string, optional; default medium): Thinking depth: low | medium | high. Higher = more reasoning tokens, better quality, higher cost.
tools (array, optional): List of tool/function definitions for tool use (function calling).
tool_choice (string | object, optional; default auto): Controls tool selection: auto | none | required | { type: 'function', function: { name } }.
response_format (object, optional): JSON mode: { "type": "json_object" } forces the model to return valid JSON.
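The tools and tool_choice parameters follow the OpenAI function-calling schema. A sketch of one round trip, under stated assumptions: get_weather is a hypothetical tool defined here for illustration, and the live call runs only when a TOKENHUB_API_KEY environment variable (our naming, not an official one) is set:

```python
import json
import os

# Hypothetical tool definition; the schema follows the OpenAI function-calling format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def run_tool_call(message) -> str:
    """Dispatch the model's first tool call to a local stub implementation."""
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    if call.function.name == "get_weather":
        return f"Sunny in {args['city']}"  # stub; a real app would query a weather API
    raise ValueError(f"unknown tool: {call.function.name}")

if os.environ.get("TOKENHUB_API_KEY"):  # live call only when a key is configured
    from openai import OpenAI  # requires `pip install openai`
    client = OpenAI(api_key=os.environ["TOKENHUB_API_KEY"],
                    base_url="https://tokenhub.store/api/v1")
    resp = client.chat.completions.create(
        model="deepseek/deepseek-v4-flash",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=[WEATHER_TOOL],
        tool_choice="auto",
    )
    print(run_tool_call(resp.choices[0].message))
```

After executing the tool, append a { role: "tool", content, tool_call_id } message and call the API again so the model can write its final answer.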

5. curl Examples

bash
curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'

6. Python Example

python
from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)

msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
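With stream=True, the answer arrives as SSE deltas instead of one message. A sketch of consuming such a stream, under an assumption: reasoning deltas are read from delta.reasoning_content, mirroring the non-streaming message.reasoning_content field (the live call runs only when a TOKENHUB_API_KEY environment variable, our naming, is set):

```python
import os

def collect_stream(chunks):
    """Split streamed deltas into (thinking, answer) strings.

    Assumes reasoning deltas arrive on delta.reasoning_content, mirroring
    the non-streaming message.reasoning_content field.
    """
    thinking, answer = [], []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            thinking.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(thinking), "".join(answer)

if os.environ.get("TOKENHUB_API_KEY"):  # live call only when a key is configured
    from openai import OpenAI  # requires `pip install openai`
    client = OpenAI(api_key=os.environ["TOKENHUB_API_KEY"],
                    base_url="https://tokenhub.store/api/v1")
    stream = client.chat.completions.create(
        model="deepseek/deepseek-v4-flash",
        messages=[{"role": "user", "content": "Explain CAP theorem in 3 bullets."}],
        stream=True,
    )
    thinking, answer = collect_stream(stream)
    print("Answer:", answer)
```

In a chat UI you would typically render the thinking deltas in a collapsible panel and the content deltas as the visible reply.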

7. JavaScript / Node.js Example

typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});

// reasoning_content is not yet in the SDK's response types, hence the cast
const msg = resp.choices[0].message as any;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);

8. Thinking Mode Deep Dive

DeepSeek V4 runs a dedicated reasoning phase before writing the final answer. Here's what you need to know:

  • reasoning_content is returned as a SEPARATE field on the assistant message (not inside content). Don't feed it back on follow-up turns.
  • completion_tokens in usage already INCLUDES reasoning_tokens — that's also what we bill on. Check completion_tokens_details.reasoning_tokens to see how much went to thinking.
  • Setting max_tokens too low in thinking mode leads to empty content (all tokens are consumed by reasoning). Leave it unset, or allow at least ~2,000 tokens.
  • Disable via extra_body: { thinking: { type: 'disabled' } } for latency-sensitive scenarios (chat, classification, simple extraction).
  • reasoning_effort: 'low' | 'medium' | 'high' controls how much the model thinks. 'high' gives best results on math/coding; 'low' is faster.
  • Prompt caching: if you reuse the same system prompt, DeepSeek returns prompt_cache_hit_tokens separately. TokenHub currently bills uniformly at the miss rate (a small over-charge in exchange for predictable pricing).
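The two thinking knobs above can be packaged as keyword arguments for client.chat.completions.create. A sketch, with one assumption: both fields are passed through extra_body, since they are DeepSeek-specific and not standard OpenAI SDK parameters (newer SDK versions may accept reasoning_effort as a top-level argument):

```python
def thinking_kwargs(enabled: bool = True, effort: str = "medium") -> dict:
    """Keyword arguments controlling DeepSeek's reasoning phase.

    Both fields ride in extra_body because they are DeepSeek-specific
    extensions, not standard OpenAI parameters.
    """
    if effort not in ("low", "medium", "high"):
        raise ValueError(f"invalid reasoning_effort: {effort}")
    if not enabled:
        # Skip the reasoning phase entirely: faster and cheaper.
        return {"extra_body": {"thinking": {"type": "disabled"}}}
    return {"extra_body": {"reasoning_effort": effort}}

# Latency-sensitive classification: no thinking.
fast = thinking_kwargs(enabled=False)
# Hard math/coding task: maximum thinking depth.
deep = thinking_kwargs(effort="high")
```

Usage: client.chat.completions.create(model=..., messages=..., **thinking_kwargs(enabled=False)).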

9. FAQ

Ready to Start?

Sign up for TokenHub and start calling DeepSeek V4 via our OpenAI-compatible API.