back

Nemotron 3 Nano 30B

Name: Nemotron 3 Nano 30B
Author: NVIDIA

NVIDIA Open

NVIDIA · 30B (3B active) · Mixture of Experts

MoE with 1M context and 3B active

HuggingFace Ollama LM Studio

1.0M downloads 655 likes 2025-06 1024K context

Use Cases

chat reasoning

Mixture of Experts

Total experts: 128

Active experts: 6

Active params: 3.0B

Quantization Options

Quant	Bits	VRAM	Quality	Status
Q2_K	2	10.1 GB	low	—
Q3_K_M	3	13.9 GB	moderate	—
Q4_K_M	4	15.9 GB	good	—
Q5_K_M	5	19.7 GB	good	—
Q6_K	6	23.6 GB	excellent	—
Q8_0	8	31.2 GB	excellent	—
F16	16	62 GB	lossless	—

About this model

Nemotron 3 Nano 30B

ollama run nemotron-3-nano:30b

Ollama’s Cloud

ollama run nemotron-3-nano:30b-cloud

Model Dates:

September 2025 - December 2025

Data Freshness:

The post-training data has a cutoff date of November 28, 2025.
The pre-training data has a cutoff date of June 25, 2025.

What is Nemotron?

NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

Nemotron 3 Nano is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model’s reasoning capabilities can be configured through a flag in the chat template. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.

The model employs a hybrid Mixture-of-Experts (MoE) architecture, consisting of 23 Mamba-2 and MoE layers, along with 6 Attention layers. Each MoE layer includes 128 experts plus 1 shared expert, with 6 experts activated per token. The model has 3.5B active parameters and 30B parameters in total.

The supported languages include: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen.

Reasoning Benchmark Evaluations

Task	NVIDIA-Nemotron-3-Nano-30B-A3B-BF16	Qwen3-30B-A3B-Thinking-2507	GPT-OSS-20B
General Knowledge
MMLU-Pro	78.3	80.9	75.0
Reasoning
AIME25 (no tools)	89.1	85.0	91.7
AIME25 (with tools)	99.2	-	98.7
GPQA (no tools)	73.0	73.4	71.5
GPQA (with tools)	75.0	-	74.2
LiveCodeBench (v6 2025-08–2025-05)	68.3	66.0	61.0
SciCode (subtask)	33.3	33.0	34.0
HLE (no tools)	10.6	9.8	10.9
HLE (with tools)	15.5	-	17.3
MiniF2F pass@1	50.0	5.7	12.1
MiniF2F pass@32	79.9	16.8	43.0
Agentic
Terminal Bench (hard subset)	8.5	5.0	6.0
SWE-Bench (OpenHands)	38.8	22.0	34.0
TauBench V2 (Airline)	48.0	58.0	38.0
TauBench V2 (Retail)	56.9	58.8	38.0
TauBench V2 (Telecom)	42.2	26.3	49.7
TauBench V2 (Average)	49.0	47.7	48.7
BFCL v4	53.8	46.4*	-
Chat & Instruction Following
IFBench (prompt)	71.5	51.0	65.0
Scale AI Multi Challenge	38.5	44.8	33.8
Arena-Hard-V2 (Hard Prompt)	72.1	49.6*	71.2*
Arena-Hard-V2 (Creative Writing)	63.2	66.0*	25.9&
Arena-Hard-V2 (Average)	67.7	57.8	48.6
Long Context
AA-LCR	35.9	59.0	34.0
RULER-100@256k	92.9	89.4	-
RULER-100@512k	91.3	84.0	-
RULER-100@1M	86.3	77.5	-
Multilingual
MMLU-ProX (avg over langs)	59.5	77.6*	69.1*
WMT24++ (en->xx)	86.2	85.6	83.2

License/Terms of Use

Governing Terms: Use of this model is governed by the NVIDIA Open Model License Agreement.