DeepSeek API Pricing: Direct vs OpenRouter vs Together vs XALEN

By Abhishek Raj · Updated May 20, 2026 · Our methodology

DeepSeek V3.1 input pricing ranges from $0.05/1M tokens direct from DeepSeek ($0.005/1M on cache hits) to $0.07/1M on OpenRouter. The cheapest way to run DeepSeek depends on your workload: the direct API for the lowest per-token cost with cache benefits, Together AI for the fastest open-source inference, XALEN for batch processing (50% off, or $0.03/1M input), and OpenRouter for convenience if you also need other models. This guide breaks down each provider's pricing with worked cost calculations.

DeepSeek Model Lineup (May 2026)

DeepSeek has become one of the most cost-effective model families in the market. Their Mixture-of-Experts architecture delivers frontier-level performance at a fraction of the cost of dense models. Before comparing providers, let us establish what each model does:

DeepSeek V3.1 (685B MoE, ~37B active): The general-purpose workhorse. Excellent for coding, analysis, content generation, and most production tasks. 128K context. This is what most teams should default to.
DeepSeek R1 (671B MoE): The reasoning model. Uses chain-of-thought processing for complex math, science, and multi-step reasoning. More expensive per token because reasoning tokens (the thinking process) are billed as output. 128K context.
DeepSeek V3 (671B MoE): The previous-generation general model. Still available on some providers at lower prices. Good for cost-sensitive workloads where V3.1 improvements are not needed.
DeepSeek Coder V2 (236B MoE): Specialized for code generation and analysis. Smaller and faster than V3.1 for coding tasks.

Provider-by-Provider Pricing

Direct from DeepSeek

DeepSeek operates its own API at api.deepseek.com. Pricing is the baseline against which all providers are measured. DeepSeek also offers aggressive prompt caching: if 64+ tokens of your prompt prefix match a previous request, the cached portion is billed at a 90% discount.

Model	Input $/1M	Cached Input $/1M	Output $/1M
DeepSeek V3.1	$0.05	$0.005	$0.10
DeepSeek R1	$0.08	$0.008	$0.15
DeepSeek V3	$0.04	$0.004	$0.08
DeepSeek Coder V2	$0.03	$0.003	$0.06

The cache discount is significant. If your application uses a consistent system prompt (which most do), roughly 70-90% of your input tokens can hit the cache after the first request. At a 90% discount on cached tokens, that reduces input costs by roughly 63-81%.
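
The arithmetic behind the cache discount can be sketched in a few lines. This is a minimal illustration that assumes the V3.1 list price and the 90% discount from the table above; the function name `blended_input_price` is hypothetical, not part of any SDK.

```python
def blended_input_price(list_price: float, hit_rate: float,
                        cache_discount: float = 0.90) -> float:
    """Effective $/1M input tokens when `hit_rate` of them are served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_price = list_price * (1.0 - cache_discount)
    return (1.0 - hit_rate) * list_price + hit_rate * cached_price


# DeepSeek V3.1 direct: $0.05/1M input (price taken from the table above).
for rate in (0.7, 0.8, 0.9):
    price = blended_input_price(0.05, rate)
    saving = 1.0 - price / 0.05
    print(f"hit rate {rate:.0%}: ${price:.4f}/1M input ({saving:.0%} saving)")
```

The saving applies to input tokens only; output tokens are billed at the normal rate regardless of cache hits.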

Gateway Provider Pricing Comparison

DeepSeek V3.1 pricing across major gateways (per 1M tokens):

Provider	Input $/1M	Output $/1M	Markup vs Direct	Cache Support
DeepSeek (direct)	$0.05	$0.10	baseline	Yes (90% discount)
Together AI	$0.055	$0.10	+10% input	No
XALEN	$0.06	$0.10	+20% input	No (batch: 50% off)
Fireworks	$0.06	$0.10	+20% input	No
OpenRouter	$0.07	$0.14	+40% input, +40% output	No
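
The markup column can be reproduced from the listed prices. The sketch below assumes the prices in the table above and uses a hypothetical helper, `markup_pct`, to express each gateway price relative to the direct price.

```python
# Per-1M-token prices for DeepSeek V3.1, copied from the table above (USD).
DIRECT = {"input": 0.05, "output": 0.10}
GATEWAYS = {
    "Together AI": {"input": 0.055, "output": 0.10},
    "XALEN": {"input": 0.06, "output": 0.10},
    "Fireworks": {"input": 0.06, "output": 0.10},
    "OpenRouter": {"input": 0.07, "output": 0.14},
}


def markup_pct(price: float, baseline: float) -> int:
    """Markup over the baseline price, rounded to a whole percent."""
    return round((price / baseline - 1.0) * 100)


for name, prices in GATEWAYS.items():
    in_markup = markup_pct(prices["input"], DIRECT["input"])
    out_markup = markup_pct(prices["output"], DIRECT["output"])
    print(f"{name}: +{in_markup}% input, +{out_markup}% output")
```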

Real-World Cost Scenarios

Per-token prices alone do not tell the full story: caching and batch discounts change the math significantly. Here are the numbers for three common workloads on DeepSeek V3.1, each processing 1 million requests per month:

Scenario 1: Chatbot (500 input + 300 output tokens)

Provider	Monthly Cost	Notes
DeepSeek direct (no cache)	$55/mo	Baseline
DeepSeek direct (80% cache hit)	$37/mo	Typical with consistent system prompt
Together AI	$58/mo	No cache, fast inference
XALEN (real-time)	$60/mo	No cache
OpenRouter	$77/mo	No cache, routing margin
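
The monthly figures follow directly from the per-token prices. The sketch below shows one way to compute them; the `Price` class and `monthly_cost` function are illustrative names, and the prices are the V3.1 figures listed earlier in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Price:
    input_per_m: float                          # $ per 1M input tokens
    output_per_m: float                         # $ per 1M output tokens
    cached_input_per_m: Optional[float] = None  # None means no cache support


def monthly_cost(price: Price, requests: int, in_tokens: int, out_tokens: int,
                 cache_hit_rate: float = 0.0) -> float:
    """Monthly cost in USD for `requests` calls of a fixed token shape."""
    if cache_hit_rate > 0 and price.cached_input_per_m is None:
        raise ValueError("this provider has no cached-input price")
    in_m = requests * in_tokens / 1_000_000     # millions of input tokens
    out_m = requests * out_tokens / 1_000_000   # millions of output tokens
    cached = price.cached_input_per_m or price.input_per_m
    input_rate = (1 - cache_hit_rate) * price.input_per_m + cache_hit_rate * cached
    return in_m * input_rate + out_m * price.output_per_m


DIRECT = Price(0.05, 0.10, cached_input_per_m=0.005)
TOGETHER = Price(0.055, 0.10)
OPENROUTER = Price(0.07, 0.14)

# Scenario 1: 1M requests, 500 input + 300 output tokens each.
print(monthly_cost(DIRECT, 1_000_000, 500, 300))
print(monthly_cost(DIRECT, 1_000_000, 500, 300, cache_hit_rate=0.8))
print(monthly_cost(TOGETHER, 1_000_000, 500, 300))
print(monthly_cost(OPENROUTER, 1_000_000, 500, 300))
```

Changing the token counts to 2000/500 or 5000/1000 gives the document-processing and code-analysis workloads.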

Scenario 2: Document Processing Batch (2000 input + 500 output tokens)

Provider	Monthly Cost	Notes
DeepSeek direct	$150/mo	No cache (unique docs)
Together AI	$160/mo	No cache
XALEN batch (50% off)	$85/mo	24h delivery, half price
XALEN real-time	$170/mo	No cache
OpenRouter	$210/mo	Routing margin on both input/output

Scenario 3: Code Analysis (5000 input + 1000 output tokens)

Provider	Monthly Cost	Notes
DeepSeek direct (60% cache)	$215/mo	System prompt + framework docs cached
Together AI	$375/mo	No cache
XALEN batch	$200/mo	50% off, 24h delivery
XALEN real-time	$400/mo	No cache
OpenRouter	$490/mo	Routing margin

Understanding DeepSeek's Cache Pricing

DeepSeek's prompt caching is more aggressive than most providers'. The cache works at the prefix level: if the first 64+ tokens of your prompt match a previous request, those tokens are served from cache at a 90% discount. This means your system prompt, few-shot examples, and any consistent context in the prompt prefix cost one tenth of the normal input price after the first request.

The catch: this cache only works on DeepSeek's direct API. Gateway providers (OpenRouter, Together AI, XALEN, Fireworks) route requests through their own infrastructure and do not implement DeepSeek's prefix caching. If your workload has high cache hit rates (chatbots with consistent system prompts, RAG with fixed context), the direct API can be substantially cheaper than any gateway's real-time pricing. In the chatbot scenario above, direct with an 80% cache hit rate costs $37/mo against $58-$77/mo through a gateway, roughly 36-52% less.
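
Because the cache matches on the prompt prefix, the order of content in a request matters: static material should come first and per-request material last. The sketch below illustrates that ordering. The prompt text, the `build_messages` helper, and the `shared_prefix_len` check are all hypothetical, and the check counts identical leading messages rather than tokens, so it only approximates what the provider's cache would see.

```python
# Hypothetical static content that is identical on every request.
SYSTEM_PROMPT = "You are a support assistant. Answer using only the provided context."
FEW_SHOT = [
    {"role": "user", "content": "Context: Plan A costs $10.\n\nQuestion: What does Plan A cost?"},
    {"role": "assistant", "content": "Plan A costs $10."},
]


def build_messages(question: str, retrieved_context: str) -> list:
    """Put static content first and per-request content last."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *FEW_SHOT,
        {"role": "user",
         "content": f"Context: {retrieved_context}\n\nQuestion: {question}"},
    ]


def shared_prefix_len(a: list, b: list) -> int:
    """Number of identical leading messages in two requests."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


first = build_messages("How do I reset my password?", "Passwords reset via email.")
second = build_messages("What is the refund window?", "Refunds within 30 days.")
print(shared_prefix_len(first, second), "of", len(first), "messages are identical")
```

Placing timestamps, user IDs, or retrieved documents ahead of the system prompt would break the shared prefix and forfeit the discount.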

However, DeepSeek's direct API has limitations: availability can be inconsistent (especially from regions outside China), rate limits are lower than gateway providers, and there is no SLA for enterprise use. Some teams use the direct API for development and a gateway for production as a pragmatic compromise.

Batch Processing: XALEN's Hidden Advantage

For workloads that do not need real-time responses (document processing, content generation, data extraction, evaluation pipelines), XALEN's batch processing at 50% off across all models changes the math significantly. At $0.03/1M input tokens for DeepSeek V3.1 batch, XALEN undercuts DeepSeek's direct input price (without cache) by 40%.

The trade-off is latency: batch jobs are delivered within 24 hours. If your pipeline can tolerate this delay (most document processing, content generation, and evaluation workloads can), batch is the cheapest way to access DeepSeek models through any provider.
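
Whether batch or the cached direct API is cheaper depends on the cache hit rate you can achieve. The sketch below solves for the break-even hit rate, assuming the V3.1 prices quoted in this guide and that the 50% batch discount applies to both input and output tokens; the function names are illustrative.

```python
def direct_cost(in_m: float, out_m: float, hit_rate: float,
                inp: float = 0.05, cached: float = 0.005, out: float = 0.10) -> float:
    """Direct API cost in USD for in_m / out_m million tokens at a cache hit rate."""
    return in_m * ((1 - hit_rate) * inp + hit_rate * cached) + out_m * out


def batch_cost(in_m: float, out_m: float,
               inp: float = 0.06, out: float = 0.10, discount: float = 0.5) -> float:
    """XALEN batch cost in USD, with the discount applied to input and output."""
    return (in_m * inp + out_m * out) * (1 - discount)


def break_even_hit_rate(in_m: float, out_m: float) -> float:
    """Cache hit rate at which direct cost equals batch cost.

    Direct cost falls linearly with hit rate, so interpolate between 0% and 100%.
    A result above 1.0 means direct never reaches the batch price.
    """
    no_cache = direct_cost(in_m, out_m, 0.0)
    full_cache = direct_cost(in_m, out_m, 1.0)
    return (no_cache - batch_cost(in_m, out_m)) / (no_cache - full_cache)


# Millions of tokens per month for 1M requests of each workload shape.
for label, in_m, out_m in [("chatbot", 500, 300),
                           ("document batch", 2000, 500),
                           ("code analysis", 5000, 1000)]:
    print(f"{label}: break-even hit rate {break_even_hit_rate(in_m, out_m):.0%}")
```

Output-heavy workloads favor batch because the cache discount never touches output tokens, while the batch discount does.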

DeepSeek R1: Reasoning Model Pricing

DeepSeek R1 deserves separate attention because reasoning models consume more tokens than they appear to. R1 generates "thinking tokens" as part of its chain-of-thought process. These tokens are billed as output even though they are not part of the final answer. A request that generates 500 visible output tokens might consume 1,500-5,000 total output tokens, depending on the reasoning complexity.

This means R1's effective output cost per visible token is 3-10x its listed output price for complex reasoning tasks. Always budget for reasoning tokens when estimating R1 costs. For simple tasks that do not require chain-of-thought, use V3.1 instead of R1.

Provider	R1 Input $/1M	R1 Output $/1M	Effective Cost (3x reasoning)
DeepSeek direct	$0.08	$0.15	~$0.53/1M
Together AI	$0.08	$0.15	~$0.53/1M
XALEN	$0.08	$0.15	~$0.53/1M
OpenRouter	$0.10	$0.22	~$0.76/1M

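Effective cost is the input price plus three times the output price per 1M tokens: it assumes total billed output is 3x the visible output (for example, 500 visible output tokens + 1,000 reasoning tokens per request). OpenRouter's markup is most painful for reasoning models because it applies to all output tokens, including reasoning.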
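
The effective-cost column can be reproduced with a short calculation. This sketch assumes the R1 prices in the table above; the function names and the 1,000-token input size in the per-request example are illustrative assumptions, not figures from any provider.

```python
def effective_price_per_m(input_per_m: float, output_per_m: float,
                          output_multiplier: float = 3.0) -> float:
    """Input price plus output price scaled by (total output / visible output)."""
    return input_per_m + output_multiplier * output_per_m


def request_cost(input_tokens: int, visible_tokens: int, reasoning_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """USD cost of one request; reasoning tokens are billed as output."""
    billed_output = visible_tokens + reasoning_tokens
    return (input_tokens * input_per_m + billed_output * output_per_m) / 1_000_000


print(f"direct:     ~${effective_price_per_m(0.08, 0.15):.2f}/1M")
print(f"OpenRouter: ~${effective_price_per_m(0.10, 0.22):.2f}/1M")

# One request: 1,000 input (assumed), 500 visible + 1,000 reasoning output tokens.
print(f"per request: ${request_cost(1000, 500, 1000, 0.08, 0.15):.6f}")
```

Raising `output_multiplier` toward 10 shows how quickly heavy reasoning workloads move away from the listed price.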

Availability and Regional Considerations

DeepSeek's direct API is served from infrastructure in China. Depending on your region, latency and availability can vary. Developers in the US and Europe have reported occasional connectivity issues, especially during peak hours. DeepSeek does not offer an SLA for API availability, which makes it risky as a sole provider for production workloads.

Gateway providers (Together AI, XALEN, Fireworks) host their own copies of DeepSeek models on infrastructure in the US and other regions. This eliminates the cross-Pacific latency and provides more consistent availability. The trade-off is a slight pricing markup. For production applications serving users outside Asia, using a gateway provider for DeepSeek models is generally the more reliable choice.

Some enterprises also have compliance requirements that restrict data from passing through infrastructure in certain jurisdictions. If your compliance team has concerns about data routing through China, a gateway provider that hosts DeepSeek models domestically can address this concern while serving the same model weights.

Which Provider Should You Use?

Lowest per-token cost: DeepSeek direct API with cache. If your prompts have consistent prefixes, nothing beats the 90% cache discount on input tokens.
Lowest batch cost: XALEN batch at 50% off. Cheaper than uncached direct DeepSeek for non-real-time workloads.
Fastest inference: Together AI. They optimize open-source model inference on their own hardware.
Need other models too: OpenRouter (widest catalog) or XALEN (200+ models + domain computation). Pay the gateway margin for convenience.
Maximum reliability: Gateway provider (XALEN, Together AI, OpenRouter). DeepSeek's direct API has availability concerns from some regions.
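
For teams that want the direct API's pricing with a gateway as a safety net, one common pattern is to try providers in order of preference and fall back on failure. The sketch below is provider-agnostic: `first_successful` is a hypothetical helper, and the two stand-in functions simulate a direct call that times out and a gateway call that succeeds. Real callables would wrap actual SDK requests.

```python
from typing import Callable, List, Sequence, Tuple


def first_successful(providers: Sequence[Tuple[str, Callable[[str], str]]],
                     prompt: str) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return the first successful result."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-ins for real API calls (assumptions for illustration only).
def flaky_direct(prompt: str) -> str:
    raise TimeoutError("connection timed out")


def stable_gateway(prompt: str) -> str:
    return f"answer to: {prompt}"


used, answer = first_successful(
    [("direct", flaky_direct), ("gateway", stable_gateway)],
    "Summarize this document.",
)
print(used, "->", answer)
```

Ordering the list cheapest-first keeps average cost close to the direct price while bounding the impact of an outage.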

DeepSeek vs Other Budget Models

DeepSeek V3.1 is not the only affordable frontier-class model. For context, here is how it stacks up against other budget options:

Model	Input $/1M	Output $/1M	Strength
DeepSeek V3.1	$0.05	$0.10	All-around, coding
GPT-4.1 Mini	$0.03	$0.08	Best value frontier
Claude Sonnet 4	$0.03	$0.15	Reasoning quality
Qwen 3 235B	$0.06	$0.10	Multilingual
Llama 4 Scout	$0.05	$0.08	512K context

DeepSeek V3.1 competes well on raw price, but its unique advantage is the cache discount on the direct API. Without cache, GPT-4.1 Mini is cheaper per token on both input and output. With an 80% cache hit rate, DeepSeek's effective input price falls to about $0.014/1M, the lowest in this table, while its output price stays at $0.10/1M. The right choice depends on whether your workload is input-heavy and benefits from prefix caching.

Code Example: Using DeepSeek on XALEN

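from openai import OpenAI

# Point the standard OpenAI SDK at XALEN's endpoint.
client = OpenAI(
    base_url="https://api.xalen.io/v1",
    api_key="xln_test_YOUR_KEY",  # replace with your own key
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to merge two sorted lists."},
    ],
    temperature=0.3,   # low temperature for more deterministic code output
    max_tokens=1000,   # caps billed output tokens for this request
)

# Print the text of the first (and only) completion.
print(response.choices[0].message.content)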
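
To tie a response back to the pricing tables, the token counts reported with each completion can be multiplied by the per-token prices. The sketch below assumes the gateway returns the standard OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`; `estimate_cost` is a hypothetical helper, and a `SimpleNamespace` stands in for `response.usage` so the example runs without a network call.

```python
from types import SimpleNamespace

# Per-1M-token prices for DeepSeek V3.1 on XALEN, copied from this guide (USD).
XALEN_V31 = {"input": 0.06, "output": 0.10}


def estimate_cost(usage, prices: dict, batch_discount: float = 0.0) -> float:
    """Estimate the USD cost of one response from its token usage."""
    input_cost = usage.prompt_tokens * prices["input"] / 1_000_000
    output_cost = usage.completion_tokens * prices["output"] / 1_000_000
    return (input_cost + output_cost) * (1.0 - batch_discount)


# Stand-in for `response.usage`; a real call would pass response.usage directly.
fake_usage = SimpleNamespace(prompt_tokens=500, completion_tokens=300)

print(f"real-time: ${estimate_cost(fake_usage, XALEN_V31):.6f}")
print(f"batch:     ${estimate_cost(fake_usage, XALEN_V31, batch_discount=0.5):.6f}")
```

Logging this estimate per request makes it straightforward to check actual spend against the scenario tables earlier in this guide.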

For more on gateway options, see our API gateway comparison and full pricing guide.

Run DeepSeek on XALEN

DeepSeek V3.1 at $0.06 input. Batch at 50% off. Plus 200+ other models.

Last updated: May 20, 2026. Pricing from official provider documentation. DeepSeek direct pricing may differ by region. XALEN is both an API gateway and model provider; see our methodology. This guide is updated monthly.