On Wednesday, Nvidia released Nemotron 3 Super, the second model in its open-weight Nemotron 3 family — just before its GTC conference kicks off in San Jose next week.
The model is fast and efficient enough to manage complex agentic AI systems at scale, reports TNS Senior Editor for AI Frederic Lardinois. As Lardinois points out in his story, the benchmarks appear to back up Nvidia’s claims. Here’s one from benchmarking firm Artificial Analysis: Nemotron 3 Super clocks in at 478 output tokens per second, which is almost twice as fast as OpenAI’s open-weight model.
The new model is now available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. Enterprises will also be able to access it through Google Cloud’s Vertex AI, Oracle Cloud Infrastructure, and, soon, Amazon Bedrock and Microsoft Azure, as well as on platforms like CoreWeave, Crusoe, Nebius, and Together AI.
There’s still no word on the debut of Nemotron 3 Ultra, the 500-billion-parameter sibling that Nvidia teased last year. (Maybe GTC will bring answers.)
Go deeper: Nvidia launches Nemotron 3 Super, a 120B open model for large-scale AI systems
More stories for your Thursday:
⦾ Tetrate launches open source marketplace to simplify Envoy adoption
⦾ Microsoft’s VS Code team moved to weekly releases after 10 years of monthly — and credits AI for making it possible
⦾ JetBrains names the debt AI agents leave behind
⦾ “Self-healing” IT? HPE research explores how AI-trained models can catch silent infrastructure failures
⦾ The 2 failures with AI coding that are creating security bottlenecks
⦾ Publish your data, AI techniques, and agentic engineering work on Towards Data Science