Versions

Overview of available model variants

Model	Release	Strengths	Weaknesses	Status
Nemotron 3 Ultra (550B-A55B)	4 June 2026	550B total parameters, ~55B active (hybrid LatentMoE: Mamba-2 + MoE + Attention) Up to 1M token context window Frontier reasoning over code, mathematics and science Open weights on Hugging Face Available via NVIDIA NIM on build.nvidia.com	Very high hardware requirements for self-hosting (e.g. 8x B200/GB200, 16x H100 or 8x H200)	Current
Nemotron 3 Super (120B-A12B) Recommended	March 2026 (GTC)	120B total parameters, ~12B active (hybrid Mamba-Transformer MoE) Strong agentic, reasoning and tool-calling capabilities Optimized for multi-agent systems and high-throughput workloads Open weights on Hugging Face	Medium to high resource requirements for self-hosting	Current
Nemotron 3 Nano (30B-A3B) Recommended	15 December 2025	~31.6B total parameters, ~3.6B active (hybrid Mamba-Transformer MoE) 1M token context window Highly efficient – up to 4x higher throughput than Nemotron Nano 2 Self-hostable on consumer hardware (vLLM, SGLang, LM Studio, llama.cpp, Ollama) Open weights on Hugging Face	Smallest variant – less suited for highly complex reasoning tasks	Current
Nemotron 3 Nano Omni (30B-A3B)	28 April 2026	Multimodal: unifies vision, audio and language 256K token context window Optimized for efficient multimodal AI agents	Newer model – ecosystem support still maturing	Current
Llama Nemotron Ultra (253B)	2025	Reasoning variant based on Llama 3.1 128K token context Open weights, established Llama ecosystem	Previous generation – superseded by Nemotron 3	Current

Technical Details

API, features and capabilities

API & Availability

Availability Public

Features & Capabilities

Tool Use Function Calling Structured Output Reasoning Mode

Training & Knowledge

Knowledge Cutoff May 2026 (Ultra), February 2026 (Super)

Fine-Tuning Available (LoRA, Full, PEFT)

Language Support

Best Quality English

Supported Multilingual (incl. German, Spanish, French, Italian, Japanese)

German is supported; best quality in English. Ultra supports 10 languages beyond English.

Hosting & Compliance

GDPR-compliant hosting options and licensing

GDPR-Compliant Hosting Options

License & Hosting

License NVIDIA Open Model License (permissive, commercial use allowed)

Security Filters Customizable (Nemotron Safety / Guard models available)

Enterprise Support Yes

SLA Available Yes

On-Premise Edge-capable

As of June 2026: NVIDIA has completed its open model family Nemotron 3 with the three reasoning sizes Nano, Super and Ultra. With the release of Nemotron 3 Ultra (550B) on 4 June 2026, NVIDIA offers its first frontier-scale model with open weights. All models are released under the NVIDIA Open Model License and can be freely downloaded, modified and operated commercially. innFactory AI Consulting from Germany advises on GDPR-compliant deployment.

What is NVIDIA Nemotron?

Nemotron is NVIDIA’s family of open models (open weights) built specifically for agentic AI and reasoning. Unlike purely proprietary models, NVIDIA openly publishes weights, training data and training techniques – so the community can run, customize and further train the models for their own purposes. The current Nemotron 3 generation uses an efficient hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture, activating only a fraction of the parameters per request.

The Nemotron 3 Model Variants

NVIDIA offers three reasoning sizes that scale with task complexity and hardware:

Nemotron 3 Nano (30B-A3B)

~31.6B parameters total, ~3.6B active (hybrid Mamba-Transformer MoE)
1M token context window
Highly efficient: up to 4x higher throughput than Nemotron Nano 2
Self-hostable on moderate hardware (vLLM, SGLang, LM Studio, llama.cpp, Ollama)
Release: 15 December 2025

Nemotron 3 Super (120B-A12B)

120B parameters total, ~12B active (hybrid Mamba-Transformer MoE)
Strong agentic, reasoning and tool-calling capabilities
Optimized for multi-agent systems and high-throughput workloads (e.g. IT ticket automation)
Release: GTC, March 2026

Nemotron 3 Ultra (550B-A55B)

550B parameters total, ~55B active (hybrid LatentMoE: Mamba-2 + MoE + Attention, with Multi-Token Prediction)
Up to 1M token context for long-context analysis
Frontier reasoning over code, mathematics and science
Release: 4 June 2026 (Computex)

Additional models: NVIDIA also maintains Nemotron Nano Omni (multimodal, vision/audio/language, 256K context), retriever/RAG models, Nemotron Safety/Guard for content safety, plus OCR and speech models. The older Llama Nemotron line (Nano 8B, Super 49B, Ultra 253B) is based on Llama 3.1 with 128K context.

Key Strengths

Efficiency through Hybrid MoE

The Mamba-Transformer MoE architecture activates only a fraction of total parameters per token. This significantly reduces inference costs – according to NVIDIA, Nemotron 3 Nano achieves up to 4x higher throughput than the previous generation and reduces the number of reasoning tokens.

Agentic & Tool Use

All Nemotron 3 models are built for agentic workflows: native tool calling, function calls and structured output for multi-agent systems.

Open Weights & Permissive License

The NVIDIA Open Model License is permissive and allows use, modification, distribution and commercial deployment – without an attribution requirement. NVIDIA also publishes training datasets and tools (NeMo Gym, NeMo RL, NeMo Evaluator).

Note: Some individual model cards on Hugging Face list slightly different license names (e.g. OpenMDW). We verify the specific license per model and version as part of our consulting.

EU Availability & GDPR Compliance

Because Nemotron is released as open weights, the cleanest sovereignty path is self-hosting on EU infrastructure – all data stays under your control.

Self-Hosting (recommended for sovereignty)

Run on your own hardware or with an EU cloud provider (e.g. in Frankfurt)
Full GDPR compliance, no dependency on US APIs
Nano can already run on moderate hardware; Ultra requires multiple high-end GPUs (e.g. 8x B200/GB200, 16x H100 or 8x H200)

Managed via Hyperscalers (EU Regions)

AWS: Amazon Bedrock / AWS Marketplace – Nemotron 3 listed (verify EU region)
Microsoft Foundry: Nemotron 3 listed (verify EU region)
Google: Gemini Enterprise Agent Platform (formerly Vertex AI – rebranded in April 2026 at Cloud Next) in the Model Garden

Directly via NVIDIA

build.nvidia.com and NVIDIA NIM microservices for hosted endpoints or containerized self-service deployments
Hosted inference also via providers such as Together AI, Fireworks, DeepInfra, OpenRouter, Baseten

For sensitive data we recommend self-hosting in the EU. For a quick start, EU regions of the hyperscalers or NIM containers in your own cloud environment are suitable.

Integration with CompanyGPT

Thanks to open weights, Nemotron integrates excellently into our GDPR-compliant solution CompanyGPT. This lets you run a powerful reasoning and agent model entirely within your own or an EU-hosted environment – without company data flowing to third countries. innFactory AI Consulting handles selection, deployment and fine-tuning of the right Nemotron model.

Our Recommendation

Nemotron is one of the strongest open model families for agentic and reasoning applications – and particularly attractive for sovereign, GDPR-compliant deployments thanks to its open weights.

For most enterprises, we recommend:

Nemotron 3 Nano for efficient, cost-effective agents and RAG – self-hostable on moderate hardware
Nemotron 3 Super as a balanced choice for demanding multi-agent systems
Nemotron 3 Ultra for frontier reasoning at maximum requirements (given appropriate GPU infrastructure)

We are happy to advise you on model selection, self-hosting in the EU, and integration into existing workflows.

Cost estimation for this model

For up-to-date token pricing, model variants and EU availability, see our sister project ai-prices.eu. It helps you compare and estimate the operational cost of leading AI models for your specific use case.

Compare prices on ai-prices.eu

ai-prices.eu is a project by innFactory AI Consulting GmbH and provides transparent cost estimates for leading AI models.

NVIDIA Nemotron