AI

Blackwell is Nvidia's latest MLPerf star

A lot of GPUs are needed for AI training and inference, and that often means a lot of money out of customers’ pockets and into Nvidia’s. But if buyers are looking for a silver lining as they prepare to write big checks for Nvidia’s newest (and more expensive) Blackwell GPU packages, at least they now know how many fewer Blackwells it will take, compared with Hopper H100s, to get the LLM job done.

Nvidia’s analysis this week of its latest MLPerf benchmark results, the first MLPerf showcase to include Blackwell in preview mode running large language models, observed, “Taking advantage of larger, higher-bandwidth HBM3e memory, just 64 Blackwell GPUs were able to run in the GPT-3 [175B] LLM benchmark without compromising per-GPU performance. The same benchmark run using Hopper needed 256 GPUs.”
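Taken at face value, the quoted GPU counts imply a simple reduction factor. A quick sketch, using only the two numbers from Nvidia's analysis above:

```python
# Illustrative arithmetic only, based on the GPU counts Nvidia quoted
# for the GPT-3 175B MLPerf benchmark run.
hopper_gpus = 256    # H100s needed for the benchmark run
blackwell_gpus = 64  # Blackwells needed for the same benchmark

reduction_factor = hopper_gpus / blackwell_gpus
print(reduction_factor)  # 4.0 -- one Blackwell stands in for four Hoppers here
```

That 4x figure applies to this one benchmark at this scale; it is not a general Blackwell-to-Hopper conversion rate.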

As GPU performance metrics climb, counting the number of chips needed one by one may not be the best form of comparison. Nvidia has for a while been moving away from the notion of a GPU as an individual chip and toward GPUs as part of a full rack or system solution. This makes sense given how GPUs are deployed and used in the market for AI training and inference.

“Very few people are training [AI] on single nodes,” Dave Salvator, director of accelerated computing products at Nvidia, said in a briefing arranged to discuss the MLPerf results. “It is mostly a multi-node configuration that's being used in the cutting-edge models. Scale clusters can range anywhere from hundreds or thousands to even tens of thousands of GPUs to get this work done, and then over time, we will see that go to even the next order of magnitude, into the hundreds of thousands.”

Still, it is impressive to see, as customers move from H100 GPUs to Blackwell GPUs, just how much the higher per-GPU compute performance and improved efficiency are driving down the total number of GPUs needed, even as demand and the size of large language models continue to grow.

Meanwhile, the Hopper family itself continues to improve through optimization and software enhancements. Nvidia’s analysis stated, “In this round of MLPerf training submissions, Hopper delivered a 1.3x improvement on GPT-3 175B per-GPU training performance since the introduction of the benchmark. Nvidia also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with Nvidia NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and Nvidia Quantum-2 InfiniBand networking. Nvidia Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26% using the same number of Hopper GPUs, reflecting continued software enhancements.”
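The quoted gains can be read as throughput multipliers. A rough sketch of what they mean for a fixed workload, assuming "performance" refers to training throughput (the job length below is a made-up example, not a figure from Nvidia):

```python
# Rough interpretation of Nvidia's quoted Hopper gains as throughput
# multipliers on a fixed workload. The 100-hour job is hypothetical.
gpt3_per_gpu_gain = 1.3  # GPT-3 175B per-GPU gain since the benchmark debuted
llama_lora_gain = 1.26   # Llama 2 70B LoRA gain on the same GPU count (+26%)

old_hours = 100.0  # hypothetical fine-tuning job at the old throughput
new_hours = old_hours / llama_lora_gain
print(round(new_hours, 1))  # ~79.4 hours for the same job, same hardware
```

In other words, a 26% throughput gain on unchanged hardware trims roughly a fifth off the wall-clock time of the same fine-tuning run.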