DeepSeek V3.2

Name: DeepSeek V3.2
Author: DeepSeek

License: MIT

DeepSeek · 685B (37B active) · Mixture of Experts

State-of-the-art MoE — 37B active params

HuggingFace · Ollama

359.8K downloads · 1.3K likes · 2025-12 · 128K context

Use Cases

chat · code · reasoning

Mixture of Experts

Total experts: 256

Active experts: 8

Active params: 37.0B
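
To make the expert counts above concrete, here is a minimal sketch of how a Mixture-of-Experts layer can pick 8 of 256 experts for a single token. It assumes generic top-k routing with a softmax over the selected experts; the function name `route_token` and the random stand-in logits are hypothetical, and this is not necessarily how DeepSeek's router works.

```python
import math
import random

NUM_EXPERTS = 256  # "Total experts" from the listing
TOP_K = 8          # "Active experts" from the listing


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_index: gate_weight}.

    Generic softmax-over-selected-experts gating, assumed for illustration;
    not necessarily DeepSeek's router.
    """
    # Indices of the top_k largest router scores.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Softmax over only the chosen experts so the gate weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router output
gates = route_token(logits)

print(len(gates), "of", NUM_EXPERTS, "experts active")
print("fraction of experts used per token:", TOP_K / NUM_EXPERTS)
```

Only 8/256 (about 3%) of the experts run per token, while 37B of 685B parameters (about 5%) are active. The gap is plausibly explained by weights that every token uses regardless of routing, such as attention and embeddings, though the listing does not break this down.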

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q2_K	2	219.8 GB	low	—
Q3_K_M	3	307.5 GB	moderate	—
Q4_K_M	4	351.4 GB	good	—
Q5_K_M	5	439.1 GB	good	—
Q6_K	6	526.8 GB	excellent	—
Q8_0	8	702.3 GB	excellent	—
F16	16	1404 GB	lossless	—
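
As a rough sanity check on the table, the memory needed for the weights alone can be estimated as parameters × bits / 8. The sketch below uses the 685B total from the listing and the nominal values from the Bits column; the helper name `raw_weight_gb` is hypothetical, and the calculation ignores KV cache, activations, and runtime overhead.

```python
PARAMS = 685e9  # total parameter count from the listing (685B)

# Nominal bits per weight, copied from the "Bits" column.
NOMINAL_BITS = {"Q2_K": 2, "Q3_K_M": 3, "Q4_K_M": 4, "Q5_K_M": 5,
                "Q6_K": 6, "Q8_0": 8, "F16": 16}


def raw_weight_gb(params, bits):
    """Size of the weights alone (params * bits / 8 bytes), in decimal GB."""
    return params * bits / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name:7s} {raw_weight_gb(PARAMS, bits):8.1f} GB")
```

The table's figures run higher than these raw numbers: by roughly 2.5% from Q4_K_M upward, and by considerably more for Q2_K and Q3_K_M. That pattern suggests the listing adds some overhead and uses effective rather than nominal bit widths for the smaller quants, but this is an inference; the page does not state its formula.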

About this model

DeepSeek V3.2

DeepSeek-V3.2 is a model that combines high computational efficiency with strong reasoning and agent performance. According to the DeepSeek team, the approach rests on three key technical breakthroughs:

DeepSeek Sparse Attention (DSA): An efficient attention mechanism that substantially reduces computational complexity while preserving model performance; it is optimized specifically for long-context scenarios.
Scalable Reinforcement Learning Framework: With a robust RL protocol and scaled-up post-training compute, DeepSeek-V3.2 is reported to perform comparably to GPT-5.
Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, the DeepSeek team developed a novel synthesis pipeline that systematically generates training data at scale. This enables scalable agentic post-training and improves compliance and generalization in complex interactive environments.
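
The sparse-attention item above does not say how DSA chooses which tokens to attend to, so the following is only a generic illustration of the idea: each query attends to a small subset of keys instead of all of them. It is a minimal sketch of plain top-k sparse attention for one query, with the hypothetical helper name `topk_sparse_attention`; it should not be read as DSA itself.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(query, keys, values, k):
    """Attend to only the k keys with the highest scores for this query.

    Generic top-k sparse attention used as a stand-in; the source does not
    describe how DSA selects tokens.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    # Keep the k best-scoring positions and ignore the rest.
    kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in kept])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, kept)) for d in range(dim)]
    return out, kept


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
out, kept = topk_sparse_attention([1.0, 0.0], keys, values, k=2)
print(sorted(kept), out)
```

With k fixed, the softmax and value aggregation cost grows with k rather than with the full sequence length. Note that this toy version still scores every key in order to make its selection, so a practical scheme would need a cheaper selector, which is not shown here.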

Reference

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models