Key Takeaways
- 1 trillion parameters, 37B active: DeepSeek V4 uses a Mixture-of-Experts architecture that activates only ~37B parameters per token — keeping inference costs comparable to V3 despite 50% more total parameters.
- 81% SWE-Bench Verified: V4 edges past Claude Opus 4.5's previous best of 80.9% on the coding benchmark.
- Engram memory is the architectural breakthrough: A new conditional memory system that provides O(1) knowledge lookup, achieving 97% accuracy on Needle-in-a-Haystack at million-token scale.
- 10x cheaper than Western competitors: At $0.30/M input tokens, V4 undercuts GPT-5.4 ($2.50) and Claude ($3-15) by an order of magnitude.
- Open-source under Apache 2.0: Full model weights available for local deployment, fine-tuning, and commercial use — the only frontier-class model with this level of openness.
DeepSeek V4: The Open-Source Model That's Rewriting the Economics of AI
DeepSeek has done it again. After V3 proved that a Chinese lab could build frontier-class models at a fraction of Western costs, V4 raises the stakes to a level that demands attention from every developer, startup, and enterprise making AI infrastructure decisions.
One trillion parameters. Million-token context. Native multimodal. 81% SWE-Bench Verified. And all of it open-source under Apache 2.0 at 10-40x lower inference costs than Western competitors.
Whether these claims fully hold up under independent scrutiny is still being determined. But the architecture innovations — particularly Engram memory — represent genuine advances that will influence model design across the industry regardless.
Here's everything we know as of March 2026.
Release Timeline
DeepSeek V4's path to release was bumpy, with multiple missed release windows:
| Date | Event |
|---|---|
| January 2026 | Engram paper published — conditional memory architecture |
| February 2026 (early) | Original release target — missed |
| February 2026 (mid) | Second release window — also missed |
| Early March 2026 | Full V4 model launched |
| March 9, 2026 | "V4 Lite" appeared on DeepSeek's website |
| March 2026 (ongoing) | Independent benchmarking and community validation |
The delayed timeline actually increased anticipation. By the time V4 launched, the Engram paper had already been widely discussed, and expectations were sky-high.
Architecture Deep Dive
Mixture-of-Experts at Trillion Scale
DeepSeek V4 continues the MoE architecture that made V3 so efficient, but scales it dramatically:
| Metric | DeepSeek V3 | DeepSeek V4 |
|---|---|---|
| Total Parameters | 671B | ~1T |
| Active Parameters | ~37B | ~37B |
| Context Window | 128K | 1M |
| Architecture | MoE | MoE + Engram |
| Multimodal | Text only | Text + Image + Video |
| License | Apache 2.0 | Apache 2.0 |
The key insight: total parameters increased by 50%, but active parameters per token stayed constant at ~37B. This means V4 has access to far more knowledge and capability without proportionally increasing inference costs.
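The arithmetic behind that insight is simple enough to sketch. This uses the parameter counts from the table above; the per-token cost framing is illustrative, since real inference cost also depends on attention, routing overhead, and memory bandwidth.

```python
# Illustrative MoE sparsity arithmetic using the table's figures.
# Real inference cost also depends on attention, routing, and memory
# bandwidth -- this only shows the active-weight fraction per token.
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of the model's weights touched per token."""
    return active_params_b / total_params_b

v3 = active_fraction(671, 37)   # V3: ~5.5% of weights active per token
v4 = active_fraction(1000, 37)  # V4: ~3.7% -- sparser, so per-token FLOPs stay flat

print(f"V3 active fraction: {v3:.1%}")
print(f"V4 active fraction: {v4:.1%}")
```

The takeaway: V4 got *sparser*, not denser, which is why a 50% jump in total parameters doesn't translate into a 50% jump in serving cost.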
Engram: The Memory Revolution
Engram is the most architecturally significant innovation in V4. Detailed in DeepSeek's January 2026 paper ("Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models"), it addresses a fundamental limitation of Transformers.
The Problem: Traditional Transformers treat every piece of knowledge the same way — through computation. Whether the model needs to recall that "Paris is the capital of France" (a static fact) or reason about a complex code refactor (dynamic computation), it uses the same attention mechanism. This is wasteful.
Engram's Solution: Add a separate memory system for static, deterministic knowledge. Instead of computing the answer to "What is the capital of France?" through multiple attention layers, Engram provides O(1) deterministic lookup — essentially a learned hash table for factual knowledge.
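DeepSeek has not published Engram's exact mechanism, which is learned end-to-end rather than a literal dictionary. As a minimal sketch of the *idea*, though, the contrast with attention is just constant-time keyed retrieval:

```python
# Toy sketch of conditional memory as O(1) lookup. The real Engram
# module is a learned structure inside the network; this stand-in only
# illustrates why lookup beats recomputation for static facts.
class ToyEngram:
    def __init__(self):
        self.table = {}  # key -> stored value (stands in for an embedding row)

    def write(self, key: str, value: str) -> None:
        self.table[key] = value

    def read(self, key: str):
        # Average-case O(1) hash lookup -- no attention layers involved.
        return self.table.get(key)

mem = ToyEngram()
mem.write("capital_of_france", "Paris")
print(mem.read("capital_of_france"))  # -> Paris
```

Retrieving a static fact this way costs the same whether the table holds a thousand entries or a billion, which is the property standard attention lacks.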
The Key Finding — Sparsity Allocation Law: DeepSeek's research revealed that under a fixed sparse parameter budget, the optimal split is approximately 20-25% memory (Engram) and 75-80% computation (MoE). This ratio maximizes both recall accuracy and reasoning capability.
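Applied to V4's rough parameter budget, the reported optimum implies a split along these lines. The 22% figure below is a hypothetical point inside the paper's 20-25% range; DeepSeek has not published V4's actual allocation.

```python
# Hypothetical application of the Sparsity Allocation Law to a 1T budget.
# The 22% memory share is an assumed point in the paper's 20-25% range.
def split_budget(total_b: float, memory_share: float):
    """Split a sparse parameter budget between Engram memory and MoE compute."""
    return total_b * memory_share, total_b * (1 - memory_share)

engram_b, moe_b = split_budget(1000, 0.22)
print(f"Engram memory: ~{engram_b:.0f}B params, MoE experts: ~{moe_b:.0f}B params")
```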
Performance Impact: Engram achieves 97% Needle-in-a-Haystack accuracy at million-token context, addressing the retrieval degradation that plagues standard Transformer architectures, whose accuracy typically drops below 80% at 1M tokens.
DeepSeek Sparse Attention (DSA)
Beyond Engram, V4 introduces DeepSeek Sparse Attention — an attention mechanism that dynamically allocates compute based on input complexity. Simple passages get lightweight attention; complex reasoning passages get full attention depth.
This is what makes the million-token context window practical. Without DSA, processing 1M tokens would be prohibitively expensive even at DeepSeek's low costs. With it, most of the context window is processed efficiently, with full compute reserved for the parts that need it.
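DSA's internals are unpublished, but the control flow it describes can be caricatured as a complexity gate over context segments. Everything below is invented for illustration: the unique-token heuristic and the two budget tiers are stand-ins, not DeepSeek's scoring function.

```python
# Toy illustration of complexity-gated attention budgets. DeepSeek has not
# published DSA's internals; the heuristic and tiers here are assumptions.
def attention_budget(segment: str, full_tokens: int = 4096, light_tokens: int = 256) -> int:
    """Assign a per-segment attention budget from a crude complexity score."""
    tokens = segment.split()
    complexity = len(set(tokens)) / max(len(tokens), 1)  # unique-token ratio
    return full_tokens if complexity > 0.7 else light_tokens

print(attention_budget("the the the the the"))                      # repetitive -> light budget
print(attention_budget("refactor parse_ast into visitor pattern"))  # varied -> full budget
```

The real mechanism presumably scores segments with learned signals rather than a token-ratio heuristic, but the economics are the same: most of a 1M-token context gets the cheap path.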
Manifold-Constrained Hyper-Connections
The third architectural innovation is Manifold-Constrained Hyper-Connections — a technique that improves gradient flow during training. The practical result is more stable training at trillion-parameter scale, which partly explains how DeepSeek trained V4 at a fraction of Western costs.
Benchmark Analysis
The Numbers
| Benchmark | DeepSeek V4 | Claude Opus 4.5 | GPT-5.4 | Notes |
|---|---|---|---|---|
| SWE-Bench Verified | 81% | 80.9% | ~82% | V4 edges past Opus 4.5's prior best |
| HumanEval | 90% | ~88% | ~90% | Code generation |
| Context (NIAH) | 97% @ 1M | 95% @ 200K | 96% @ 1M | Engram advantage |
| Multimodal | Native | N/A | Native | Text + Image + Video |
Caveat: Independent Verification
It's important to note that as of late March 2026, many of these numbers come from internal benchmarks. Until third-party evaluations from organizations like Artificial Analysis, LMSYS, or independent researchers fully confirm the claims, treat the exact percentages as aspirational rather than definitive.
That said, V3's benchmarks were largely confirmed by independent testing, which lends credibility to the claim that these V4 numbers are in the right ballpark.
Pricing: The Cost Revolution Continues
DeepSeek V4's pricing is its most disruptive feature:
| Model | Input Price (per M tokens) | Output Price (per M tokens) | Cache Hit Price |
|---|---|---|---|
| DeepSeek V4 | $0.30 | $0.50 | $0.03 |
| GPT-5.4 | $2.50 | $15.00 | N/A |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Opus 4.6 | $15.00 | $75.00 | $1.50 |
The cache hit pricing is particularly compelling: if your prompts share a common prefix (which they almost always do in production applications), cached input tokens cost only $0.03 per million — a 90% discount.
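Here is that discount worked through with the table's prices. The 80% cache-hit rate and 20M output tokens are assumed workload figures for illustration:

```python
# Worked example of V4's cache-hit discount, using the pricing table above.
# The 80% prefix-hit rate and 20M output tokens are assumed workload figures.
PRICE_IN, PRICE_CACHED, PRICE_OUT = 0.30, 0.03, 0.50  # $ per million tokens

def monthly_cost(input_m: float, cached_share: float, output_m: float) -> float:
    cached = input_m * cached_share
    fresh = input_m - cached
    return fresh * PRICE_IN + cached * PRICE_CACHED + output_m * PRICE_OUT

no_cache = monthly_cost(100, 0.0, 20)     # 100M input, no shared prefixes
with_cache = monthly_cost(100, 0.80, 20)  # same workload, 80% prefix hits
print(f"no caching: ${no_cache:.2f}/mo, with caching: ${with_cache:.2f}/mo")
```

With heavy prefix reuse, input spend falls by more than half, which is why production chat and agent workloads see the largest savings.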
What This Means in Practice
For a typical app builder processing 100M tokens per month:
| Provider | Monthly Cost |
|---|---|
| DeepSeek V4 | ~$40-80 |
| GPT-5.4 | ~$500-1,500 |
| Claude Sonnet 4.6 | ~$600-1,800 |
| Claude Opus 4.6 | ~$3,000-9,000 |
This 10-40x cost advantage is why DeepSeek matters for the broader AI ecosystem. It makes frontier-class AI accessible to indie developers, small startups, and cost-sensitive enterprise teams.
Platforms like ZBuild can integrate DeepSeek V4 as a backend model option, passing these dramatic cost savings directly to users building AI-powered applications.
Native Multimodal: Text, Image, and Video
Unlike V3 (text-only), V4 is natively multimodal. As reported by the Financial Times, V4 integrates text, image, and video generation during pre-training rather than bolting on vision as a separate module.
This matters because:
- Cross-modal reasoning is more coherent — the model understands relationships between text descriptions and visual content natively
- Image and video understanding — V4 can analyze screenshots, diagrams, and video frames alongside text
- Generation capabilities — early reports suggest text-to-image and text-to-video generation, though quality assessments are still emerging
For developers building applications that process visual content — document analysis, UI design, video summarization — native multimodal support eliminates the need for separate vision APIs.
Practical Multimodal Use Cases
The native multimodal integration opens several practical workflows:
- Code from Screenshots: Provide a screenshot of a UI design and V4 generates the corresponding code — HTML/CSS, React components, or SwiftUI views
- Diagram Understanding: Feed architecture diagrams, flowcharts, or database schemas and V4 explains the design, identifies issues, or generates implementation code
- Document Processing: Extract structured data from scanned documents, invoices, and forms without a separate OCR pipeline
- Video Summarization: Process video frames to generate summaries, transcripts, or highlight key moments
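For the screenshot-to-code workflow, a request might look like the sketch below. This assumes V4 keeps V3's OpenAI-compatible chat format and extends it with `image_url` content parts; the model name `deepseek-v4` and the multimodal schema are assumptions, so check DeepSeek's API docs before relying on them.

```python
# Hypothetical screenshot-to-code request payload. Assumes an
# OpenAI-compatible chat schema with image_url content parts; the model
# name and multimodal fields are unverified assumptions.
import json

payload = {
    "model": "deepseek-v4",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Generate a React component matching this mockup."},
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<...>"}},
        ],
    }],
}

# POST this JSON body to the chat completions endpoint.
print(json.dumps(payload, indent=2)[:120])
```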
For app builders like ZBuild, native multimodal means users can upload mockups and screenshots directly as part of the app creation workflow — the AI understands visual context without additional tooling.
Open-Source Impact
DeepSeek V4's Apache 2.0 license is arguably more significant than its benchmark scores. Here's what it enables:
Self-Hosting
Organizations with data sovereignty requirements can run V4 on their own infrastructure. No API calls, no data leaving the building, no vendor dependency. The ~37B active parameters per token make it runnable on high-end enterprise GPU clusters.
Fine-Tuning
The open weights allow full-weight domain-specific fine-tuning — medical, legal, financial, or any specialized vertical. Closed-weight models from OpenAI and Anthropic offer at most limited API-based fine-tuning, never direct access to the weights.
Research
The full architecture details and training methodology enable the research community to build on DeepSeek's innovations. Engram memory, DSA, and Manifold-Constrained Hyper-Connections are all available for study and improvement.
Cost Control
Even beyond DeepSeek's already-low API prices, self-hosting at scale can reduce per-token costs further. For high-volume applications processing billions of tokens monthly, self-hosting V4 can be 100x cheaper than proprietary API pricing.
DeepSeek V4 vs. V3: Should You Upgrade?
For existing DeepSeek V3 users, here's the upgrade calculus:
| Feature | V3 | V4 | Upgrade Impact |
|---|---|---|---|
| Context Window | 128K | 1M | High — enables codebase-scale analysis |
| SWE-Bench | 69% | 81% | High — 12-point improvement |
| Multimodal | Text only | Text + Image + Video | Medium — depends on use case |
| Engram Memory | No | Yes | High — dramatically better retrieval |
| API Price | $0.27/M input | $0.30/M input | Low — minimal cost increase |
| Architecture | MoE | MoE + Engram + DSA | High — fundamentally better |
Verdict: Upgrade. The cost increase is negligible, and the capability improvements — especially Engram memory and million-token context — are substantial. The only reason to stay on V3 is if you have production workloads that require the exact behavioral consistency of your current model.
How DeepSeek V4 Fits the Developer Ecosystem
For Indie Developers and Startups
V4's pricing makes frontier-class AI accessible at startup budgets. Combined with Apache 2.0 licensing, you can build and deploy production applications without worrying about API cost scaling. Tools like ZBuild that integrate multiple model providers let you leverage DeepSeek V4's cost advantage while maintaining the option to route specific tasks to other models when needed.
For Enterprise Teams
The self-hosting option addresses data sovereignty, compliance, and cost concerns simultaneously. Fine-tuning capability means you can build domain-specific models that outperform general-purpose alternatives in your specific vertical.
For Researchers
The open architecture is a goldmine. Engram memory alone opens multiple research directions — conditional memory architectures, sparsity allocation optimization, and hybrid retrieval-computation systems.
For the AI Industry
V4 puts pressure on every frontier model provider to justify their pricing. When an open-source model matches or exceeds proprietary benchmarks at 10x lower cost, the value proposition of closed models shifts from "better performance" to "better integration, support, and reliability."
Risks and Uncertainties
Benchmark Verification
The 81% SWE-Bench claim needs independent confirmation. DeepSeek's V3 benchmarks held up under scrutiny, but trillion-parameter models are harder to evaluate consistently. Wait for Artificial Analysis and LMSYS results before making infrastructure decisions based on exact numbers.
Geopolitical Risk
DeepSeek is a Chinese company, and US-China tech tensions are ongoing. Export controls, API access restrictions, or political pressure could affect availability for Western developers. Self-hosting with open weights mitigates but doesn't eliminate this risk.
Multimodal Quality
The multimodal capabilities are the least-tested aspect of V4. Image and video understanding quality needs real-world validation beyond internal benchmarks.
Support and Reliability
Open-source means community support, not enterprise SLAs. If your production application depends on V4, you're responsible for uptime, scaling, and debugging. DeepSeek's API service has been reliable, but it doesn't offer the enterprise support infrastructure of OpenAI or Anthropic.
The Bottom Line
DeepSeek V4 is the most important open-source AI model released in 2026 so far. Its combination of trillion-parameter scale, Engram memory innovation, million-token context, native multimodal capabilities, and aggressively low pricing under an Apache 2.0 license makes it a genuine alternative to proprietary frontier models.
The caveats are real — benchmark verification is ongoing, geopolitical risks exist, and enterprise support is limited. But for developers and organizations willing to navigate those uncertainties, V4 offers frontier-class capabilities at a fraction of the cost.
Whether you access it through DeepSeek's API, self-host it on your infrastructure, or use it through platforms like ZBuild that integrate multiple model providers, DeepSeek V4 deserves a place in your AI toolkit.
Frequently Asked Questions
Can I self-host DeepSeek V4 on consumer hardware?
Not practically. While the model activates only ~37B parameters per token, hosting the full 1T-parameter MoE model requires enough GPU memory to hold every expert's weights, since any expert can be routed to on any token. You'll need enterprise-grade GPU clusters (multiple A100s or H100s). For most developers, DeepSeek's API at $0.30/M input tokens is far more cost-effective than self-hosting unless you're processing billions of tokens monthly.
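A back-of-envelope estimate makes the scale concrete. This counts weights only, at an assumed 8-bit quantization; real deployments also need memory for the KV cache, activations, and framework overhead, so treat the GPU count as a floor.

```python
# Rough GPU-memory floor for self-hosting a 1T-parameter model.
# Weights only, at an assumed 8-bit (1 byte/param) quantization --
# KV cache, activations, and framework overhead come on top.
def weights_gib(params_b: float, bytes_per_param: int) -> float:
    """GiB needed just to hold the weights."""
    return params_b * 1e9 * bytes_per_param / 2**30

fp8 = weights_gib(1000, 1)  # ~931 GiB for 1T params at 8-bit
print(f"1T params @ 8-bit: {fp8:.0f} GiB -> at least {fp8 / 80:.0f} x 80GiB GPUs")
```

Even at aggressive quantization, that's a dozen datacenter GPUs before serving a single request, which is why the API route wins for all but the highest-volume users.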
How does V4 Lite differ from the full V4 model?
DeepSeek V4 Lite appeared on DeepSeek's website on March 9, 2026, but no official specifications have been published. Based on DeepSeek's naming patterns with V3, "Lite" likely refers to a distilled or smaller variant optimized for speed and cost at the expense of some capability. Expect it to be faster and cheaper but with reduced performance on complex reasoning tasks.
Is DeepSeek V4 censored for certain topics?
Like all Chinese AI models, DeepSeek V4 has content filtering for politically sensitive topics, particularly those related to Chinese politics and governance. For general development, coding, and technical use cases, the filtering has minimal impact. For applications involving sensitive political content or unrestricted generation, this is a legitimate consideration.
What programming languages does V4 handle best?
Based on SWE-Bench results (which primarily test Python, JavaScript, and Java), V4 excels at mainstream languages. Community reports suggest strong performance across Python, JavaScript/TypeScript, Java, Go, Rust, and C++. Less common languages like Haskell, Elixir, or Zig likely have weaker support due to training data distribution.
How does DeepSeek V4 compare to Llama 4 for self-hosting?
Both are open-source and available under permissive licenses. DeepSeek V4's MoE architecture with ~37B active parameters per token offers better performance-per-compute than dense models. Llama 4's advantage is Meta's larger ecosystem and community support. For pure capability per dollar, V4 likely wins. For community tooling and fine-tuning ecosystem, Llama may be more accessible.
Sources
- DeepSeek V4: Engram Architecture Revealed
- DeepSeek V4: What's Next — Architecture, DSA, Engram & More
- Introl: DeepSeek V4's 1-Trillion Parameter Architecture
- ByteIota: DeepSeek V4 Targets 80.9% SWE-Bench Record
- CyberNews: DeepSeek V4 Review
- Evolink: DeepSeek V4 Release Date
- PromptZone: DeepSeek V4 Status Report March 2026
- VERTU: DeepSeek V4 Engram Architecture
- Kili Technology: DeepSeek V4 Guide
- Evermx: DeepSeek V4 Multimodal Launch
- RecodeChina: DeepSeek's Next Move
- DeepSeek V4 Status and Leaks