DeepSeek API Guide

DeepSeek V4 on TokenHub: an OpenAI-Compatible Thinking Model

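TokenHub's unified /chat/completions endpoint lets you call DeepSeek's 2026 flagship V4 series (V4-Pro and V4-Flash). The endpoint is fully OpenAI-compatible, so the official openai SDK works as-is. Streaming, tool use, and thinking mode (which returns reasoning_content) are all supported. The models offer a 1M-token context window and up to 384K output tokens, and billing is per token at DeepSeek catalog prices.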

OpenAI-Compatible · Streaming · Thinking Mode · Tool Use · 1M Context · 384K Output

1. Get an API Key

Visit tokenhub.store and register an account (GitHub / Google sign-in supported)
Go to Dashboard → API Keys and click "Create New Key"
Add Credits under Dashboard → Billing (1 Credit = $1 USD)
Copy the API Key (format: th-xxxxxxxxxxxx...)

⚠️ The API Key is shown only once, at creation. Store it securely. If you lose it, you must create a new one.

2. API Overview

Base URL

https://tokenhub.store/api/v1

Authentication

Pass the API Key in the Authorization header:

Header

Authorization: Bearer th-your-api-key

Endpoints (OpenAI-compatible)

POST /chat/completions

Chat completion. Uses the same schema as OpenAI's /v1/chat/completions and supports streaming, tools, JSON mode, and the DeepSeek-specific thinking field.

The official openai SDK works unchanged: point base_url at TokenHub and use your TokenHub API key. No other code changes are needed.
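
Where the openai SDK is not available, the base URL, Authorization header, and endpoint path described above can be combined by hand. The following is a minimal sketch using only the Python standard library; the helper names build_chat_request and send_chat_request are hypothetical and not part of any TokenHub client.

```python
import json
import urllib.request

BASE_URL = "https://tokenhub.store/api/v1"  # base URL given in this guide


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Assemble (but do not send) a POST request to /chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer scheme, as shown above
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call would look like send_chat_request(build_chat_request("th-your-api-key", {"model": "deepseek/deepseek-v4-flash", "messages": [{"role": "user", "content": "Hello"}]})). Keeping construction separate from sending makes the headers and body easy to inspect before any credits are spent.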

3. Models and Pricing

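Prices are in USD per 1M tokens and follow DeepSeek catalog list prices (promotional discounts are not applied). Both the canonical IDs and the deepseek/* aliases are accepted. Billing uses the completion_tokens value returned by the upstream, which already includes reasoning_tokens.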

Tier	Model ID	Input	Output	Notes
V4-Pro	deepseek-v4-pro	$1.80	$3.60	Top-tier 2026 flagship. Best reasoning and coding quality.
V4-Flash	deepseek-v4-flash	$0.15	$0.30	Highly cost-efficient flagship, about 12x cheaper than Pro. A good default for production.
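
To make the per-token pricing concrete, here is a small sketch that estimates the cost of one request from the usage counts. The prices are copied from the table above; the function name estimate_cost and the alias handling (dropping a leading "deepseek/") are assumptions made for illustration, not TokenHub's billing code.

```python
# USD per 1M tokens, copied from the pricing table in this guide.
PRICES = {
    "deepseek-v4-pro": {"input": 1.80, "output": 3.60},
    "deepseek-v4-flash": {"input": 0.15, "output": 0.30},
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    completion_tokens is expected to already include reasoning tokens,
    which is how the guide describes the billed value.
    """
    # Accept the "deepseek/..." alias form by dropping the prefix (assumption).
    model_id = model.split("/", 1)[-1]
    price = PRICES[model_id]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000
```

For example, a request with 2,000 prompt tokens and 1,500 completion tokens comes to about $0.00075 on V4-Flash and about $0.009 on V4-Pro, which matches the roughly 12x gap noted in the table.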

4. Request Parameters

Parameter	Type	Required	Default	Description
model	string	required	-	DeepSeek V4 model ID, e.g. "deepseek/deepseek-v4-flash".
messages	array	required	-	Conversation history. Each item has the form { role, content }, with role ∈ system | user | assistant | tool.
max_tokens	integer	optional	upstream default	Maximum number of output tokens. If omitted, the upstream default applies (up to 384K). In thinking mode the count includes reasoning tokens, so avoid setting it too low.
temperature	number	optional	1.0	Sampling temperature, 0.0–2.0. Lower values are more deterministic. DeepSeek recommends 0.0 for coding and 1.3 for creative writing.
top_p	number	optional	1.0	Nucleus sampling. Set either temperature or top_p, not both.
stream	boolean	optional	false	If true, the response is returned as Server-Sent Events (SSE) deltas.
thinking	object	optional	{type:'enabled'}	DeepSeek-specific. Passing { type: 'disabled' } via extra_body skips the reasoning step for faster, cheaper responses. Default: enabled.
reasoning_effort	string	optional	medium	Thinking depth: low | medium | high. Higher values use more reasoning tokens, improving quality at higher cost.
tools	array	optional	-	List of tool/function definitions for tool use (function calling).
tool_choice	string|object	optional	auto	Controls tool selection: auto | none | required | { type:'function', function:{ name } }.
response_format	object	optional	-	JSON mode: { "type": "json_object" } forces the model to return valid JSON.
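
The sketch below shows one way the optional parameters in the table might be combined into keyword arguments for client.chat.completions.create(**kwargs) in the openai Python SDK. The helper build_chat_kwargs is hypothetical. Sending reasoning_effort through extra_body alongside thinking is a choice made here so the sketch does not depend on the SDK version; the guide itself only states that thinking goes through extra_body.

```python
from typing import Optional

ALLOWED_EFFORT = {"low", "medium", "high"}  # values listed in the table above


def build_chat_kwargs(
    prompt: str,
    model: str = "deepseek/deepseek-v4-flash",
    thinking: bool = True,
    reasoning_effort: str = "medium",
    json_mode: bool = False,
    max_tokens: Optional[int] = None,
) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")

    kwargs: dict = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        # Thinking is on by default, so only the depth is sent.
        kwargs["extra_body"] = {"reasoning_effort": reasoning_effort}
    else:
        # DeepSeek-specific field: skip the reasoning step entirely.
        kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
    if json_mode:
        kwargs["response_format"] = {"type": "json_object"}
    if max_tokens is not None:
        # In thinking mode this budget also covers reasoning tokens.
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Typical use would be client.chat.completions.create(**build_chat_kwargs("Classify this ticket", thinking=False, json_mode=True)) for a fast structured response, or reasoning_effort="high" with a generous max_tokens for harder problems.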

5. curl Example

bash

curl https://tokenhub.store/api/v1/chat/completions \
  -H "Authorization: Bearer th-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain CAP theorem in 3 bullets."}
    ],
    "temperature": 0.3
  }'

6. Python Example

python

from openai import OpenAI

client = OpenAI(
    api_key="th-your-api-key",
    base_url="https://tokenhub.store/api/v1",
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    temperature=0.3,
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain CAP theorem in 3 bullets."},
    ],
)

msg = resp.choices[0].message
# DeepSeek V4 returns the chain-of-thought in a separate field
print("Thinking:", getattr(msg, "reasoning_content", None))
print("Answer:  ", msg.content)
print("Usage:   ", resp.usage)
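
The example above waits for the full response. With stream=True the SDK yields chunks instead, and the pieces need to be accumulated. The following sketch assumes that streamed deltas carry reasoning_content next to content, mirroring the non-streaming message; this guide does not state that explicitly, so treat it as an assumption. The helper collect_stream is hypothetical, and the demo uses stand-in objects rather than a live call.

```python
from types import SimpleNamespace  # used only to fake chunks in the offline demo


def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a trailing chunk that carries only usage
            continue
        delta = chunk.choices[0].delta
        reasoning_piece = getattr(delta, "reasoning_content", None)
        if reasoning_piece:
            reasoning_parts.append(reasoning_piece)
        content_piece = getattr(delta, "content", None)
        if content_piece:
            answer_parts.append(content_piece)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**delta_fields):
    """Build an object shaped like an SDK stream chunk (demo only)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning_content="Consider C, A, P. ", content=None),
    fake_chunk(content="Pick two "),
    fake_chunk(content="of three."),
]
reasoning, answer = collect_stream(demo)
print("Thinking:", reasoning)
print("Answer:  ", answer)
```

Against the real API the same function would be called as collect_stream(client.chat.completions.create(model=..., messages=..., stream=True)).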

7. JavaScript / Node.js Example

typescript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "th-your-api-key",
  baseURL: "https://tokenhub.store/api/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek/deepseek-v4-flash",
  temperature: 0.3,
  messages: [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Explain CAP theorem in 3 bullets." },
  ],
});

const msg: any = resp.choices[0].message;
console.log("Thinking:", msg.reasoning_content);
console.log("Answer:  ", msg.content);
console.log("Usage:   ", resp.usage);

8. Thinking Mode in Depth

DeepSeek V4 runs a dedicated reasoning phase before writing the final answer. Points to keep in mind:

reasoning_content is returned as a separate field on the assistant message (it is not embedded in content). Do not send it back in subsequent turns.
completion_tokens in usage already includes reasoning_tokens, and billing is based on it. To see how much went to reasoning, check completion_tokens_details.reasoning_tokens.
In thinking mode, setting max_tokens too low leaves content empty because every token is consumed by reasoning. Leave it unset or set it to at least 2000.
For latency-sensitive scenarios (chat, classification, simple extraction), disable thinking with extra_body: { thinking: { type: 'disabled' } }.
reasoning_effort: 'low' | 'medium' | 'high' controls how much the model thinks. 'high' gives the best results for math and coding; 'low' is faster.
Prompt caching: when the same system prompt is reused, DeepSeek reports prompt_cache_hit_tokens separately. TokenHub currently bills all prompt tokens at the cache-miss rate (this may cost slightly more, in exchange for predictable pricing).
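
Two of the points above lend themselves to small helpers: dropping reasoning_content before an assistant message goes back into the history, and checking how much of completion_tokens went to reasoning. This is a sketch operating on plain dictionaries; the function names are hypothetical, and the field names are the ones used in this guide.

```python
def to_history_message(assistant_message: dict) -> dict:
    """Return a copy of an assistant message without reasoning_content."""
    return {
        key: value
        for key, value in assistant_message.items()
        if key != "reasoning_content"
    }


def reasoning_share(usage: dict) -> float:
    """Fraction of completion_tokens that was spent on reasoning (0.0 to 1.0)."""
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning / completion if completion else 0.0


# Multi-turn use: append the cleaned message, then the next user turn.
history = [{"role": "user", "content": "What is 2 + 2?"}]
reply = {"role": "assistant", "content": "4", "reasoning_content": "Add 2 and 2."}
history.append(to_history_message(reply))
history.append({"role": "user", "content": "And times 3?"})
```

With the openai SDK the response objects can usually be converted first, for example msg.model_dump(exclude_none=True) and resp.usage.model_dump(). A reasoning share close to 1.0 together with an empty content is a sign that max_tokens was too small for thinking mode.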
