Skip to main content
9 – 17 UHR +49 8031 3508270 LUITPOLDSTR. 9, 83022 ROSENHEIM
DE / EN
LLM NVIDIA USA

NVIDIA Nemotron

NVIDIA Nemotron 3 (Nano, Super, Ultra) - open models with open weights for agentic AI. GDPR-compliant via self-hosting. AI consulting from Germany advises on secure usage.

License NVIDIA Open Model License (permissive, commercial use allowed)
GDPR Hosting Available
Context 128K-1M Tokens
Modality Text, Code → Text, Code

Versions

Overview of available model variants

ModelReleaseEUStrengthsWeaknessesStatus
Nemotron 3 Ultra (550B-A55B)
4 June 2026
550B total parameters, ~55B active (hybrid LatentMoE: Mamba-2 + MoE + Attention) Up to 1M token context window Frontier reasoning over code, mathematics and science Open weights on Hugging Face Available via NVIDIA NIM on build.nvidia.com
Very high hardware requirements for self-hosting (e.g. 8x B200/GB200, 16x H100 or 8x H200)
Current
Nemotron 3 Super (120B-A12B) Recommended
March 2026 (GTC)
120B total parameters, ~12B active (hybrid Mamba-Transformer MoE) Strong agentic, reasoning and tool-calling capabilities Optimized for multi-agent systems and high-throughput workloads Open weights on Hugging Face
Medium to high resource requirements for self-hosting
Current
Nemotron 3 Nano (30B-A3B) Recommended
15 December 2025
~31.6B total parameters, ~3.6B active (hybrid Mamba-Transformer MoE) 1M token context window Highly efficient – up to 4x higher throughput than Nemotron Nano 2 Self-hostable on consumer hardware (vLLM, SGLang, LM Studio, llama.cpp, Ollama) Open weights on Hugging Face
Smallest variant – less suited for highly complex reasoning tasks
Current
Nemotron 3 Nano Omni (30B-A3B)
28 April 2026
Multimodal: unifies vision, audio and language 256K token context window Optimized for efficient multimodal AI agents
Newer model – ecosystem support still maturing
Current
Llama Nemotron Ultra (253B)
2025
Reasoning variant based on Llama 3.1 128K token context Open weights, established Llama ecosystem
Previous generation – superseded by Nemotron 3
Current

Use Cases

Typical applications for this model

Agentic Workflows & Multi-Agent Systems
Reasoning over Code, Mathematics & Science
Tool Calling & Function Calls
RAG & Knowledge Retrieval
Self-Hosted Deployments on EU Infrastructure
Cost-Efficient Inference (Nano)

Technical Details

API, features and capabilities

API & Availability
Availability Public
Features & Capabilities
Tool Use Function Calling Structured Output Reasoning Mode
Training & Knowledge
Knowledge Cutoff May 2026 (Ultra), February 2026 (Super)
Fine-Tuning Available (LoRA, Full, PEFT)
Language Support
Best Quality English
Supported Multilingual (incl. German, Spanish, French, Italian, Japanese)
German is supported; best quality in English. Ultra supports 10 languages beyond English.

Hosting & Compliance

GDPR-compliant hosting options and licensing

GDPR-Compliant Hosting Options
Self-Hosted
Own Infrastructure (EU)
Open weights - full control, cleanest sovereignty path
AWS
Frankfurt (eu-central-1)
Amazon Bedrock / AWS Marketplace - Nemotron 3 available (verify region)
Azure
West Europe
Microsoft Foundry - Nemotron 3 listed (verify region)
Google Cloud
Frankfurt (europe-west3)
Gemini Enterprise Agent Platform (formerly Vertex AI, rebrand April 2026) - Model Garden
License & Hosting
License NVIDIA Open Model License (permissive, commercial use allowed)
Security Filters Customizable (Nemotron Safety / Guard models available)
Enterprise Support Yes
SLA Available Yes
On-Premise Edge-capable

Benchmarks

Performance comparison with standardized tests

AIME (Ultra, no tools)
88.6
GPQA (Ultra)
87.0
SWE-Bench Verified (Ultra)
70.7
LiveCodeBench v6 (Ultra)
89.0
MMLU-Pro (Ultra)
86.8

As of June 2026: NVIDIA has completed its open model family Nemotron 3 with the three reasoning sizes Nano, Super and Ultra. With the release of Nemotron 3 Ultra (550B) on 4 June 2026, NVIDIA offers its first frontier-scale model with open weights. All models are released under the NVIDIA Open Model License and can be freely downloaded, modified and operated commercially. innFactory AI Consulting from Germany advises on GDPR-compliant deployment.

What is NVIDIA Nemotron?

Nemotron is NVIDIA’s family of open models (open weights) built specifically for agentic AI and reasoning. Unlike purely proprietary models, NVIDIA openly publishes weights, training data and training techniques – so the community can run, customize and further train the models for their own purposes. The current Nemotron 3 generation uses an efficient hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture, activating only a fraction of the parameters per request.

The Nemotron 3 Model Variants

NVIDIA offers three reasoning sizes that scale with task complexity and hardware:

Nemotron 3 Nano (30B-A3B)

  • ~31.6B parameters total, ~3.6B active (hybrid Mamba-Transformer MoE)
  • 1M token context window
  • Highly efficient: up to 4x higher throughput than Nemotron Nano 2
  • Self-hostable on moderate hardware (vLLM, SGLang, LM Studio, llama.cpp, Ollama)
  • Release: 15 December 2025

Nemotron 3 Super (120B-A12B)

  • 120B parameters total, ~12B active (hybrid Mamba-Transformer MoE)
  • Strong agentic, reasoning and tool-calling capabilities
  • Optimized for multi-agent systems and high-throughput workloads (e.g. IT ticket automation)
  • Release: GTC, March 2026

Nemotron 3 Ultra (550B-A55B)

  • 550B parameters total, ~55B active (hybrid LatentMoE: Mamba-2 + MoE + Attention, with Multi-Token Prediction)
  • Up to 1M token context for long-context analysis
  • Frontier reasoning over code, mathematics and science
  • Release: 4 June 2026 (Computex)

Additional models: NVIDIA also maintains Nemotron Nano Omni (multimodal, vision/audio/language, 256K context), retriever/RAG models, Nemotron Safety/Guard for content safety, plus OCR and speech models. The older Llama Nemotron line (Nano 8B, Super 49B, Ultra 253B) is based on Llama 3.1 with 128K context.

Key Strengths

Efficiency through Hybrid MoE

The Mamba-Transformer MoE architecture activates only a fraction of total parameters per token. This significantly reduces inference costs – according to NVIDIA, Nemotron 3 Nano achieves up to 4x higher throughput than the previous generation and reduces the number of reasoning tokens.

Agentic & Tool Use

All Nemotron 3 models are built for agentic workflows: native tool calling, function calls and structured output for multi-agent systems.

Open Weights & Permissive License

The NVIDIA Open Model License is permissive and allows use, modification, distribution and commercial deployment – without an attribution requirement. NVIDIA also publishes training datasets and tools (NeMo Gym, NeMo RL, NeMo Evaluator).

Note: Some individual model cards on Hugging Face list slightly different license names (e.g. OpenMDW). We verify the specific license per model and version as part of our consulting.

EU Availability & GDPR Compliance

Because Nemotron is released as open weights, the cleanest sovereignty path is self-hosting on EU infrastructure – all data stays under your control.

Self-Hosting (recommended for sovereignty)

  • Run on your own hardware or with an EU cloud provider (e.g. in Frankfurt)
  • Full GDPR compliance, no dependency on US APIs
  • Nano can already run on moderate hardware; Ultra requires multiple high-end GPUs (e.g. 8x B200/GB200, 16x H100 or 8x H200)

Managed via Hyperscalers (EU Regions)

  • AWS: Amazon Bedrock / AWS Marketplace – Nemotron 3 listed (verify EU region)
  • Microsoft Foundry: Nemotron 3 listed (verify EU region)
  • Google: Gemini Enterprise Agent Platform (formerly Vertex AI – rebranded in April 2026 at Cloud Next) in the Model Garden

Directly via NVIDIA

  • build.nvidia.com and NVIDIA NIM microservices for hosted endpoints or containerized self-service deployments
  • Hosted inference also via providers such as Together AI, Fireworks, DeepInfra, OpenRouter, Baseten

For sensitive data we recommend self-hosting in the EU. For a quick start, EU regions of the hyperscalers or NIM containers in your own cloud environment are suitable.

Integration with CompanyGPT

Thanks to open weights, Nemotron integrates excellently into our GDPR-compliant solution CompanyGPT. This lets you run a powerful reasoning and agent model entirely within your own or an EU-hosted environment – without company data flowing to third countries. innFactory AI Consulting handles selection, deployment and fine-tuning of the right Nemotron model.

Our Recommendation

Nemotron is one of the strongest open model families for agentic and reasoning applications – and particularly attractive for sovereign, GDPR-compliant deployments thanks to its open weights.

For most enterprises, we recommend:

  • Nemotron 3 Nano for efficient, cost-effective agents and RAG – self-hostable on moderate hardware
  • Nemotron 3 Super as a balanced choice for demanding multi-agent systems
  • Nemotron 3 Ultra for frontier reasoning at maximum requirements (given appropriate GPU infrastructure)

We are happy to advise you on model selection, self-hosting in the EU, and integration into existing workflows.

Cost estimation for this model

For up-to-date token pricing, model variants and EU availability, see our sister project ai-prices.eu. It helps you compare and estimate the operational cost of leading AI models for your specific use case.

Compare prices on ai-prices.eu

ai-prices.eu is a project by innFactory AI Consulting GmbH and provides transparent cost estimates for leading AI models.

Consultation for this model?

We help you select and integrate the right AI model for your use case.