Latest MLPerf performance benchmarks show generative AI advances

It’s time to talk about MLPerf benchmark results again, and while the big names battle over who did what better than everyone else, the overall conclusion to draw is that processors from an array of companies are quickly advancing in their ability to handle the training of large language models (LLMs).

MLCommons this week released MLPerf Version 3.1 Training benchmark results, which measure the performance of training machine learning models, as well as results for the MLPerf High Performance Computing v3.0 benchmark suite for supercomputers, which measures the performance of training machine learning models on scientific applications and data.

Regarding the industry’s ability to continue increasing training performance, MLCommons said the most recent results for the GPT-3 benchmark showed a 2.8x performance gain compared to results from last June, when the LLM benchmark first became part of the program. Overall, this shows how quickly the industry is responding to the growing popularity of generative AI.

“GPT-3 is among the fastest growing benchmarks we’ve launched,” said David Kanter, Executive Director, MLCommons, in a statement. “It’s one of our goals to ensure that our benchmarks are representative of real-world workloads and it’s exciting to see 2.8X better performance in mere months.”

The MLPerf benchmarks have become an occasion for a growing number of companies to tout the performance of their systems, and Nvidia this week did so with its typical energy. The company said its new Eos AI supercomputer, armed with 10,752 Nvidia H100 Tensor Core GPUs and Quantum-2 InfiniBand networking, completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in a record 3.9 minutes.

That figure is down from the 10.9 minutes the same benchmark took when the model was added to the suite five months ago, roughly a 2.8x speedup. The result demonstrates how quickly Nvidia is responding to the demands of generative AI, and how much cost, time, and energy consumption could be saved in the process, said Dave Salvator, director of product marketing for accelerated cloud computing at Nvidia.

“The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service that, by extrapolation, Eos could now train in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs,” Salvator stated in a blog post touting Nvidia’s latest MLPerf achievements.

Intel, meanwhile, talked up the MLPerf Version 3.1 results achieved by its Habana Labs Gaudi2 AI accelerator chip, stating, “Gaudi2 demonstrated a 2x performance leap with the implementation of the FP8 data type on the v3.1 training GPT-3 benchmark, reducing time-to-train by more than half compared to the June MLPerf benchmark, completing the training in 153.58 minutes on 384 Intel Gaudi2 accelerators.”

In its own media posting, Intel described the Gaudi2 as “the only viable alternative to Nvidia’s H100 for AI compute needs, delivering significant price-performance.”

Karl Freund, Founder and Principal Analyst at Cambrian-AI Research LLC, who has become one of the top analyst voices when it comes to translating MLPerf results, took a closer look at the latest MLPerf results posted by Nvidia, Intel, and Google in a contributed article for Forbes.

It is worth noting that the latest benchmark results come just a few days before the SC23 supercomputing conference kicks off, playing host to a new round of product announcements from these companies and others.