As of June 2026: NVIDIA has completed its open model family Nemotron 3 with the three reasoning sizes Nano, Super and Ultra. With the release of Nemotron 3 Ultra (550B) on 4 June 2026, NVIDIA offers its first frontier-scale model with open weights. All models are released under the NVIDIA Open Model License and can be freely downloaded, modified and operated commercially. innFactory AI Consulting from Germany advises on GDPR-compliant deployment.
What is NVIDIA Nemotron?
Nemotron is NVIDIA’s family of open models (open weights) built specifically for agentic AI and reasoning. Unlike purely proprietary models, NVIDIA openly publishes weights, training data and training techniques – so the community can run, customize and further train the models for their own purposes. The current Nemotron 3 generation uses an efficient hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture, activating only a fraction of the parameters per request.
The Nemotron 3 Model Variants
NVIDIA offers three reasoning sizes that scale with task complexity and hardware:
Nemotron 3 Nano (30B-A3B)
- ~31.6B parameters total, ~3.6B active (hybrid Mamba-Transformer MoE)
- 1M token context window
- Highly efficient: up to 4x higher throughput than Nemotron Nano 2
- Self-hostable on moderate hardware (vLLM, SGLang, LM Studio, llama.cpp, Ollama)
- Release: 15 December 2025
Nemotron 3 Super (120B-A12B)
- 120B parameters total, ~12B active (hybrid Mamba-Transformer MoE)
- Strong agentic, reasoning and tool-calling capabilities
- Optimized for multi-agent systems and high-throughput workloads (e.g. IT ticket automation)
- Release: GTC, March 2026
Nemotron 3 Ultra (550B-A55B)
- 550B parameters total, ~55B active (hybrid LatentMoE: Mamba-2 + MoE + Attention, with Multi-Token Prediction)
- Up to 1M token context for long-context analysis
- Frontier reasoning over code, mathematics and science
- Release: 4 June 2026 (Computex)
Additional models: NVIDIA also maintains Nemotron Nano Omni (multimodal, vision/audio/language, 256K context), retriever/RAG models, Nemotron Safety/Guard for content safety, plus OCR and speech models. The older Llama Nemotron line (Nano 8B, Super 49B, Ultra 253B) is based on Llama 3.1 with 128K context.
Key Strengths
Efficiency through Hybrid MoE
The Mamba-Transformer MoE architecture activates only a fraction of total parameters per token. This significantly reduces inference costs – according to NVIDIA, Nemotron 3 Nano achieves up to 4x higher throughput than the previous generation and reduces the number of reasoning tokens.
Agentic & Tool Use
All Nemotron 3 models are built for agentic workflows: native tool calling, function calls and structured output for multi-agent systems.
Open Weights & Permissive License
The NVIDIA Open Model License is permissive and allows use, modification, distribution and commercial deployment – without an attribution requirement. NVIDIA also publishes training datasets and tools (NeMo Gym, NeMo RL, NeMo Evaluator).
Note: Some individual model cards on Hugging Face list slightly different license names (e.g. OpenMDW). We verify the specific license per model and version as part of our consulting.
EU Availability & GDPR Compliance
Because Nemotron is released as open weights, the cleanest sovereignty path is self-hosting on EU infrastructure – all data stays under your control.
Self-Hosting (recommended for sovereignty)
- Run on your own hardware or with an EU cloud provider (e.g. in Frankfurt)
- Full GDPR compliance, no dependency on US APIs
- Nano can already run on moderate hardware; Ultra requires multiple high-end GPUs (e.g. 8x B200/GB200, 16x H100 or 8x H200)
Managed via Hyperscalers (EU Regions)
- AWS: Amazon Bedrock / AWS Marketplace – Nemotron 3 listed (verify EU region)
- Microsoft Foundry: Nemotron 3 listed (verify EU region)
- Google: Gemini Enterprise Agent Platform (formerly Vertex AI – rebranded in April 2026 at Cloud Next) in the Model Garden
Directly via NVIDIA
- build.nvidia.com and NVIDIA NIM microservices for hosted endpoints or containerized self-service deployments
- Hosted inference also via providers such as Together AI, Fireworks, DeepInfra, OpenRouter, Baseten
For sensitive data we recommend self-hosting in the EU. For a quick start, EU regions of the hyperscalers or NIM containers in your own cloud environment are suitable.
Integration with CompanyGPT
Thanks to open weights, Nemotron integrates excellently into our GDPR-compliant solution CompanyGPT. This lets you run a powerful reasoning and agent model entirely within your own or an EU-hosted environment – without company data flowing to third countries. innFactory AI Consulting handles selection, deployment and fine-tuning of the right Nemotron model.
Our Recommendation
Nemotron is one of the strongest open model families for agentic and reasoning applications – and particularly attractive for sovereign, GDPR-compliant deployments thanks to its open weights.
For most enterprises, we recommend:
- Nemotron 3 Nano for efficient, cost-effective agents and RAG – self-hostable on moderate hardware
- Nemotron 3 Super as a balanced choice for demanding multi-agent systems
- Nemotron 3 Ultra for frontier reasoning at maximum requirements (given appropriate GPU infrastructure)
We are happy to advise you on model selection, self-hosting in the EU, and integration into existing workflows.
