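DeepSeek API Guide

DeepSeek V4 on TokenHub: an OpenAI-compatible thinking model

Call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash) through TokenHub's unified /chat/completions endpoint. The API is fully OpenAI-compatible, so the official openai SDK works out of the box. It supports streaming, tool calling, and a thinking mode that returns reasoning_content. The models offer a 1M-token context window and up to 384K output tokens, with per-token billing based on DeepSeek's catalog pricing.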

OpenAI-Compatible · Streaming · Thinking Mode · Tool Use · 1M Context · 384K Output

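1. Get your API key

Visit tokenhub.store and create an account (GitHub and Google sign-in are supported).
Go to Dashboard → API Keys and click "Create New Key".
Go to Dashboard → Billing and top up Credits (1 Credit = $1 USD).
Copy the API key (format: th-xxxxxxxxxxxx...).

⚠️ The API key is shown only once, at creation time. Store it securely; if you lose it, create a new one.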

2. API overview

Base URL

https://tokenhub.store/api/v1

Authentication

Put your API key in the Authorization request header:

Header

Authorization: Bearer th-your-api-key

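Endpoint (OpenAI-compatible)

POST /chat/completions

Chat completions. Uses the same schema as OpenAI's /v1/chat/completions and supports streaming, tools, JSON mode, and the DeepSeek-specific thinking field.

Works as a drop-in with the official openai SDK: point base_url at TokenHub and use your TokenHub API key. No other code changes are needed.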
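For readers not using the SDK, the base URL, endpoint path, and Authorization header described above fit together roughly as follows. This is a minimal standard-library sketch that builds the request without sending it. The helper name build_chat_request and the environment variable TOKENHUB_API_KEY are illustrative assumptions, not names that TokenHub defines.

```python
import json
import os
import urllib.request

BASE_URL = "https://tokenhub.store/api/v1"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# TOKENHUB_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("TOKENHUB_API_KEY", "th-your-api-key")
req = build_chat_request(api_key, {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "ping"}],
})
# Sending it would look like: json.load(urllib.request.urlopen(req))
```

Reading the key from the environment keeps it out of source control, which matters because the key is displayed only once.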

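3. Models and pricing

Prices are in USD per 1 million tokens and follow DeepSeek's catalog list price (promotional discounts excluded). Both the canonical IDs and the deepseek/* aliases are accepted. Billing uses the completion_tokens value returned upstream, which already includes reasoning_tokens.

Tier	Model ID	Input (USD / 1M tokens)	Output (USD / 1M tokens)	Notes
V4-Pro	deepseek-v4-pro	$1.80	$3.60	Top-tier 2026 flagship. Best reasoning and coding quality.
V4-Flash	deepseek-v4-flash	$0.15	$0.30	High-value flagship at about one-twelfth the price of Pro; a good default for production.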
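To make the per-token arithmetic concrete, here is a small estimator that uses the list prices from the table. It is a sketch only: estimate_cost is a hypothetical helper, and it assumes a deepseek/* alias maps to its canonical ID by dropping the prefix. Your actual bill is whatever TokenHub computes from the usage it records.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "deepseek-v4-pro": (1.80, 3.60),
    "deepseek-v4-flash": (0.15, 0.30),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request.

    completion_tokens should be the value reported in usage, which
    already includes any reasoning tokens.
    """
    canonical = model.removeprefix("deepseek/")  # assumed alias-to-ID mapping
    input_price, output_price = PRICES[canonical]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


cost = estimate_cost("deepseek/deepseek-v4-flash", 12_000, 3_000)
print(f"${cost:.6f}")
```

Because reasoning tokens count as output tokens, a thinking-mode request can cost noticeably more than the visible answer length suggests.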

4. Request parameters

Parameter	Type	Required	Default	Description
model	string	Yes	-	DeepSeek V4 model ID. Example: "deepseek/deepseek-v4-flash".
messages	array	Yes	-	Chat history. Each item is { role, content }, where role is one of system | user | assistant | tool.
max_tokens	integer	No	upstream default	Maximum output tokens. If omitted, the upstream DeepSeek default applies (up to 384K). In thinking mode the count includes reasoning tokens, so do not set it too low.
temperature	number	No	1.0	Sampling temperature, 0.0–2.0. Lower values are more deterministic. DeepSeek recommends 0.0 for code and 1.3 for creative writing.
top_p	number	No	1.0	Nucleus sampling. Adjust either temperature or top_p, not both.
stream	boolean	No	false	If true, the response is returned as Server-Sent Events (SSE) deltas.
thinking	object	No	{type:'enabled'}	DeepSeek-specific. Set { type: 'disabled' } to skip the reasoning phase for faster, cheaper responses; with the openai Python SDK, pass it via extra_body.
reasoning_effort	string	No	medium	Thinking depth: low | medium | high. Higher values use more reasoning tokens, giving higher quality at higher cost.
tools	array	No	-	List of tool/function definitions for tool calling.
tool_choice	string|object	No	auto	Controls tool selection: auto | none | required | { type:'function', function:{ name } }.
response_format	object	No	-	JSON mode: { "type": "json_object" } forces the model to return valid JSON.
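The constraints in the table can be checked client-side before a request goes out. The following is one possible sketch; build_payload is a hypothetical helper, and the checks reflect only the rules stated in the table, not any validation the server is known to perform. It assumes the thinking field sits at the top level of the JSON body.

```python
def build_payload(model, messages, *, temperature=None, top_p=None,
                  max_tokens=None, reasoning_effort=None,
                  thinking=True, json_mode=False):
    """Assemble a /chat/completions request body, omitting unset options."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if reasoning_effort not in (None, "low", "medium", "high"):
        raise ValueError("reasoning_effort must be low, medium, or high")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "reasoning_effort": reasoning_effort,
    }
    # Leave unset options out so the upstream defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if not thinking:
        payload["thinking"] = {"type": "disabled"}
    if json_mode:
        payload["response_format"] = {"type": "json_object"}
    return payload


payload = build_payload(
    "deepseek/deepseek-v4-flash",
    [{"role": "user", "content": "Label this ticket as bug or feature."}],
    temperature=0.0,
    thinking=False,
)
```

Omitting max_tokens here is deliberate: the table warns that a low cap in thinking mode can starve the final answer.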

5. curl example

bash

curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'

6. Python example

python

from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)

msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
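When stream is true, the response arrives as incremental deltas instead of a single message. The sketch below shows one way to reassemble them. It assumes that streamed deltas carry reasoning_content in the same way the non-streamed message does, which this guide does not confirm, and it uses hand-made stand-in chunks so that it runs offline. collect_stream and fake_chunk are hypothetical helpers.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate streamed deltas into (reasoning, answer) strings."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        # Either field may be missing or None on a given delta.
        reasoning.append(getattr(delta, "reasoning_content", None) or "")
        answer.append(getattr(delta, "content", None) or "")
    return "".join(reasoning), "".join(answer)


def fake_chunk(**delta):
    """Stand-in for an SDK chunk object; for offline illustration only."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


demo = [
    fake_chunk(reasoning_content="CAP = "),
    fake_chunk(reasoning_content="C, A, P."),
    fake_chunk(content="Pick "),
    fake_chunk(content="two."),
    SimpleNamespace(choices=[]),
]
thinking, answer = collect_stream(demo)
```

With a real client, the same function would take the iterator returned by client.chat.completions.create(..., stream=True).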

7. JavaScript / Node.js example

typescript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});

const msg: any = resp.choices[0].message;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);

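8. Thinking mode in depth

DeepSeek V4 runs a dedicated reasoning phase before it generates the final answer. Here is what you need to know:

reasoning_content is returned as a separate field on the assistant message (not inside content). Do not send it back in later turns.
completion_tokens in usage already includes reasoning_tokens, and billing is based on that value. Check completion_tokens_details.reasoning_tokens to see how many tokens went to thinking.
Setting max_tokens too low in thinking mode can leave content empty because reasoning consumes the whole budget. Leave it unset, or allow at least 2,000 tokens.
For low-latency workloads (chat, classification, simple extraction), disable thinking via extra_body: { thinking: { type: 'disabled' } }.
reasoning_effort: 'low' | 'medium' | 'high' controls how hard the model thinks. 'high' works best for math and coding; 'low' is faster.
Prompt caching: if you reuse the same system prompt, DeepSeek reports prompt_cache_hit_tokens separately. TokenHub currently bills all prompt tokens at the cache-miss rate (a slight markup in exchange for predictable pricing).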
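Two of the points above, dropping reasoning_content before the next turn and reading the reasoning share out of usage, can be sketched as plain functions. The helper names are hypothetical and the usage numbers are made up for illustration; only the field names come from this guide.

```python
def strip_reasoning(messages):
    """Return a copy of the history with reasoning_content removed."""
    return [
        {k: v for k, v in message.items() if k != "reasoning_content"}
        for message in messages
    ]


def split_completion_tokens(usage: dict):
    """Split completion_tokens into (reasoning, answer) token counts."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning, usage["completion_tokens"] - reasoning


history = [
    {"role": "user", "content": "What is 17 * 23?"},
    {
        "role": "assistant",
        "content": "391",
        "reasoning_content": "17*20=340, 17*3=51, sum 391.",
    },
]
next_turn = strip_reasoning(history) + [
    {"role": "user", "content": "And 391 / 17?"}
]

# Illustrative numbers, not real API output.
usage = {
    "prompt_tokens": 20,
    "completion_tokens": 150,
    "completion_tokens_details": {"reasoning_tokens": 120},
}
reasoning_tokens, answer_tokens = split_completion_tokens(usage)

# Disabling thinking with the openai Python SDK would look like:
# client.chat.completions.create(..., extra_body={"thinking": {"type": "disabled"}})
```

If answer_tokens comes back as zero while reasoning_tokens is close to max_tokens, the budget was probably spent entirely on thinking and should be raised.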

9. FAQ
