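DeepSeek API Guide

DeepSeek V4 on TokenHub: an OpenAI-compatible thinking model

Call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash) through TokenHub's unified /chat/completions endpoint. The API is fully OpenAI-compatible, so the official openai SDK works as is. It supports streaming, tool use, and a thinking mode that returns reasoning_content. Both models offer a 1M-token context window and up to 384K output tokens, billed per token at DeepSeek catalog prices.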

OpenAI-Compatible · Streaming · Thinking Mode · Tool Use · 1M Context · 384K Output

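1. Get your API key

Go to tokenhub.store and sign up (GitHub and Google login are supported)
Go to Dashboard → API Keys and click "Create New Key"
Go to Dashboard → Billing and top up Credits (1 Credit = $1 USD)
Copy the API key (format: th-xxxxxxxxxxxx...)

⚠️ The API key is shown only once, when it is created. Store it safely; if you lose it, create a new one.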

2. API overview

Base URL

https://tokenhub.store/api/v1

Authentication

Pass your API key in the Authorization header:

Header

Authorization: Bearer th-your-api-key
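
Because the key is shown only once and should stay out of source control, one option is to read it from the environment when building the header. The sketch below is illustrative only: the environment variable name `TOKENHUB_API_KEY` and the helper `auth_headers` are assumptions, not names defined by TokenHub.

```python
import os


def auth_headers(env_var: str = "TOKENHUB_API_KEY") -> dict:
    """Build request headers from an API key stored in the environment.

    The variable name is a hypothetical convention chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your TokenHub API key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers of any HTTP client call against the base URL.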

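Endpoint (OpenAI-compatible)

POST /chat/completions

Chat completions. Uses the same schema as OpenAI's /v1/chat/completions and supports streaming, tools, JSON mode, and the DeepSeek-specific thinking fields.

The official openai SDK works directly: point base_url at TokenHub and use your TokenHub API key. No other code changes are needed.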

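3. Models and pricing

Prices are in USD per 1 million tokens at DeepSeek catalog list prices (no promotional discounts applied). Both canonical IDs and deepseek/* aliases are accepted. Output is billed on the completion_tokens value returned by the upstream API, which already includes reasoning_tokens.

Tier	Model ID	Input	Output	Notes
V4-Pro	deepseek-v4-pro	$1.80	$3.60	Top-tier 2026 flagship. Best reasoning and code quality.
V4-Flash	deepseek-v4-flash	$0.15	$0.30	High-value flagship at one twelfth of Pro's price; a good production default.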
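
To make the per-token arithmetic concrete, here is a minimal cost estimator built from the list prices in the table. The function name `estimate_cost` and the alias handling (stripping a leading `deepseek/`) are assumptions for illustration; actual invoices depend on TokenHub's billing, not on this sketch.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "deepseek-v4-pro": (Decimal("1.80"), Decimal("3.60")),
    "deepseek-v4-flash": (Decimal("0.15"), Decimal("0.30")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the list-price cost in USD of a single request.

    completion_tokens should be the value from usage, which already
    includes reasoning tokens.
    """
    # Accept the "deepseek/..." alias form as well as the canonical ID.
    canonical = model.removeprefix("deepseek/")
    input_price, output_price = PRICES[canonical]
    return (prompt_tokens * input_price + completion_tokens * output_price) / MILLION
```

For example, `estimate_cost("deepseek/deepseek-v4-flash", 12_000, 3_000)` comes to $0.0027: 12,000 input tokens cost $0.0018 and 3,000 output tokens cost $0.0009.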

4. Request parameters

Parameter	Type	Required	Default	Description
model	string	Required	none	DeepSeek V4 model ID. Example: "deepseek/deepseek-v4-flash".
messages	array	Required	none	Conversation history. Each item is { role, content }, where role is one of system | user | assistant | tool.
max_tokens	integer	Optional	upstream default	Maximum output tokens. If omitted, DeepSeek applies its upstream default (up to 384K). In thinking mode the count includes reasoning tokens, so do not set it too low.
temperature	number	Optional	1.0	Sampling temperature, 0.0–2.0. Lower values are more deterministic. DeepSeek recommends 0.0 for code and 1.3 for creative writing.
top_p	number	Optional	1.0	Nucleus sampling. Set either temperature or top_p, not both.
stream	boolean	Optional	false	If true, the response is delivered as Server-Sent Events (SSE) deltas.
thinking	object	Optional	{type:'enabled'}	DeepSeek-specific. Pass { type: 'disabled' } via extra_body to skip the reasoning phase for faster, cheaper responses.
reasoning_effort	string	Optional	medium	Thinking depth: low | medium | high. Higher values use more reasoning tokens, which improves quality and raises cost.
tools	array	Optional	none	Tool/function definitions for tool use (function calling).
tool_choice	string|object	Optional	auto	Controls tool selection: auto | none | required | { type:'function', function:{ name } }.
response_format	object	Optional	none	JSON mode: { "type": "json_object" } forces the model to return valid JSON.
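
Several of these parameters interact, so it can help to assemble them in one place. The following is a sketch, not an official client: `build_request` is a hypothetical helper, rejecting a request that sets both temperature and top_p is this sketch's own choice, and sending reasoning_effort through extra_body (rather than as a named SDK argument) is an assumption made so the code does not depend on the SDK version.

```python
from typing import Any, Optional


def build_request(
    model: str,
    messages: list,
    *,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    thinking: bool = True,
    reasoning_effort: Optional[str] = None,
    json_mode: bool = False,
) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if reasoning_effort not in (None, "low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")

    kwargs: dict = {"model": model, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    if json_mode:
        kwargs["response_format"] = {"type": "json_object"}

    # Fields outside the OpenAI schema travel in extra_body, which the
    # openai SDK merges into the JSON request body.
    extra_body: dict = {}
    if not thinking:
        extra_body["thinking"] = {"type": "disabled"}
    if reasoning_effort is not None:
        extra_body["reasoning_effort"] = reasoning_effort
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs
```

With an openai client, the result would be used as `client.chat.completions.create(**build_request(...))`.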

5. curl example

bash

curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'

6. Python example

python

from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)

msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
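
The example above reads the whole response at once. With stream=True the SDK yields chunks instead, and the thinking text and the answer have to be accumulated separately. The helper below is a sketch under one assumption that the source does not confirm: that streamed deltas carry the reasoning text in a `reasoning_content` attribute, mirroring the non-streaming message. `collect_stream` is a hypothetical name.

```python
from typing import Any, Iterable, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, str]:
    """Accumulate streamed deltas into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing chunk that carries only usage
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is not part of the OpenAI schema, so it is read
        # defensively; the attribute name on deltas is an assumption.
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning.append(thought)
        text = getattr(delta, "content", None)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


# Usage with the client defined earlier (requires network access):
# stream = client.chat.completions.create(
#     model="deepseek/deepseek-v4-flash",
#     messages=[{"role": "user", "content": "Explain CAP theorem in 3 bullets."}],
#     stream=True,
# )
# thinking, answer = collect_stream(stream)
```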

7. JavaScript / Node.js example

typescript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});

const msg: any = resp.choices[0].message;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);

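8. Thinking mode in depth

DeepSeek V4 runs a dedicated reasoning phase before producing its final answer. Key points:

reasoning_content is returned as a separate field on the assistant message (not inside content). Do not send it back in later turns.
completion_tokens in usage already includes reasoning_tokens, and billing is based on it. Check completion_tokens_details.reasoning_tokens to see how many tokens went to thinking.
Setting max_tokens too low in thinking mode can leave content empty, because reasoning consumes every token. Leave it unset or allow at least 2,000.
To disable thinking, pass { thinking: { type: 'disabled' } } via extra_body. This suits latency-sensitive workloads such as chat, classification, and simple extraction.
reasoning_effort: 'low' | 'medium' | 'high' controls how much the model thinks. 'high' works best for math and programming; 'low' is faster.
Prompt caching: when you reuse the same system prompt, DeepSeek additionally returns prompt_cache_hit_tokens. TokenHub currently bills all input at the cache-miss rate, a slight markup in exchange for predictable pricing.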
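
Two of the points above lend themselves to small helpers: dropping reasoning_content before a message goes back into the history, and checking how much of completion_tokens was spent on thinking. This is a minimal sketch that works on plain dictionaries; the function names are hypothetical, and converting SDK objects to dictionaries is left to the caller.

```python
from typing import Any, Dict


def to_history_message(assistant_message: Dict[str, Any]) -> Dict[str, Any]:
    """Copy an assistant message without reasoning_content for the next turn."""
    return {k: v for k, v in assistant_message.items() if k != "reasoning_content"}


def reasoning_share(usage: Dict[str, Any]) -> float:
    """Fraction of completion_tokens that was spent on reasoning."""
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning / completion if completion else 0.0
```

A share close to 1.0 together with an empty content field suggests that max_tokens was too low for the reasoning phase to finish.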

9. FAQ

Ready to get started?

Sign up for TokenHub and start calling DeepSeek V4 through our OpenAI-compatible API.
