Nvidia’s New AI Chips Slash Training Times for Massive AI Models

Nvidia’s latest generation of AI chips is making significant advances in training some of the world’s largest artificial intelligence systems, according to new benchmark data released on Wednesday by MLCommons, a nonprofit organization that tracks AI system performance.

The results show a dramatic drop in the number of chips required to train large language models (LLMs), highlighting Nvidia’s growing technological lead in this critical area of AI development. While much of the financial market’s current focus is on the booming sector of AI inference—where AI models answer user queries—training remains a core competitive battleground, especially for developing next-generation models with trillions of parameters.

Blackwell Chips Outperform Previous Generations

Nvidia’s new Blackwell chips outperformed its previous Hopper generation. In tests involving Meta Platforms’ open-source Llama 3.1 405B model, which is large enough to stand in for some of the most demanding AI training workloads, Blackwell chips were more than twice as fast per chip as Hopper.

In one benchmark, a system using 2,496 Blackwell chips completed the training run in just 27 minutes. By comparison, it took more than three times as many Hopper chips in previous tests to post a faster time, an advantage that came from sheer scale rather than per-chip efficiency.

Nvidia and its partners were the only entrants to submit data for a model of this size, leaving its results unchallenged at the 405-billion-parameter scale and reinforcing its lead in large-model training.

Changing Industry Trends in AI Training

Chetan Kapoor, chief product officer of CoreWeave, which collaborated with Nvidia on the results, noted that AI companies are moving away from building vast, homogeneous data centers with 100,000 or more identical chips. Instead, they are increasingly assembling smaller, specialized subsystems that handle different aspects of the training process. This modular approach allows companies to shorten training times and manage extremely large models more efficiently.

“Using a methodology like that, they’re able to continue to accelerate or reduce the time to train some of these crazy, multi-trillion parameter model sizes,” Kapoor explained at a press briefing.

Global Competition Also Heating Up

While Nvidia maintains a dominant position, competitors around the world are also pushing for breakthroughs. For example, China’s DeepSeek has recently claimed it can create competitive chatbots while using far fewer chips than many U.S. rivals, adding to the growing international race for AI supremacy.

MLCommons’ report also included results from Advanced Micro Devices (AMD) and others, though Nvidia’s Blackwell system stood out in the training category.