
NVIDIA Nemotron 3 Ultra Released as 550B-Parameter MoE Model for Agent Workflows

NVIDIA has released Nemotron 3 Ultra, a 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters, so roughly one tenth of the weights are used at a time. It is designed to orchestrate complex, long-running agent workflows, combining frontier-level reasoning with high throughput and domain adaptability. Its architecture includes hybrid Mamba-Transformer layers, NVFP4 quantization for deployment across NVIDIA GPU architectures, LatentMoE for efficient expert routing, and multi-token prediction for faster generation. The model was trained on large datasets using NVIDIA's open NeMo RL and Gym libraries, and it is fully open source: weights, data, and recipes are all released.
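To make the "550B total, 55B active" figure concrete, here is a minimal sketch of generic top-k expert routing, the basic idea behind MoE models. It is not NVIDIA's LatentMoE, whose details are not described here; the expert count, `k`, and the function names `top_k_route` and `active_fraction` are hypothetical choices for illustration only.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs, with weights renormalized
    over the selected experts. Generic top-k gating, NOT LatentMoE.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i],
                    reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def active_fraction(total_params_b, active_params_b):
    """Share of parameters in use, given sizes in billions."""
    return active_params_b / total_params_b


# Figures from the article: 550B total, 55B active.
frac = active_fraction(550, 55)
print(f"Active share of parameters: {frac:.0%}")

# Toy router: 8 hypothetical experts, 2 selected for this token.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(8)]
routes = top_k_route(logits, k=2)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for a given token, which is why a model of this kind can hold far more parameters than it computes with at each step.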

Why it matters

This release gives developers an open-source model built for complex AI agent tasks, one that can be deployed across multiple NVIDIA GPU architectures. Because the weights, data, and recipes are all available, teams can adapt the model and deploy it in their own domain-specific workflows.
