MLPerf v2.1 showcases record number of results as AI use grows

The MLCommons MLPerf AI and machine learning inference benchmark program used to be a showcase for Nvidia, with hardly any other companies participating. The most recent version of the program – MLPerf v2.1 – finds that it is still something of a showcase for Nvidia, but with many more companies seeing the value of the benchmarking, both as a sales tool and as a measuring stick for their own progress.

Nvidia was again the only company to submit systems in every benchmark category, and its solutions, particularly the new H100 Tensor Core GPU, performed well and beat the competition in many cases. However, the more intriguing news this time around was that the benchmark program posted new records of more than 5,300 performance results and 2,400 power measurement results, according to Vijay Janapa Reddi, vice president and founding member of MLCommons.

“We want to nurture innovation by having systematic benchmarks that are reflective of what inference capabilities these systems have,” he said. “We want them to be fair and useful measurements for everybody in the ecosystem. Both the commercial and research communities should be aligned equally because this is a still evolving field.”

He added that the best way for the benchmarking to be of value is for it to mimic real-world inference workload types, such as medical imaging, natural language processing, speech recognition, image classification and recommendation. MLPerf v2.1 also included a new workload – the RetinaNet large object detection model. The benchmarks cover both data center and edge products across different use cases and usage scenarios.
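MLPerf's official harness (LoadGen) drives these scenarios, but the core idea of the single-stream case – send one query at a time and report a tail latency – can be illustrated with a minimal sketch. This is not the MLPerf tooling; `dummy_model` and the helper below are hypothetical stand-ins for a real inference call:

```python
import time
import statistics

def dummy_model(sample):
    # Stand-in for a real inference call (e.g., ResNet-50 or RetinaNet);
    # a trivial computation keeps the sketch self-contained and runnable.
    return sum(sample) % 256

def single_stream_latencies(model, samples, warmup=5):
    """Issue one query at a time and record per-query latency,
    mirroring the spirit of the single-stream scenario."""
    for s in samples[:warmup]:
        model(s)  # warm-up queries are not timed
    latencies = []
    for s in samples:
        start = time.perf_counter()
        model(s)
        latencies.append(time.perf_counter() - start)
    return latencies

samples = [[i, i + 1, i + 2] for i in range(100)]
lat = single_stream_latencies(dummy_model, samples)
# Single-stream results are typically summarized by a tail latency;
# here we take roughly the 90th percentile.
p90 = statistics.quantiles(lat, n=10)[-1]
print(f"p90 latency over {len(lat)} queries: {p90 * 1e6:.1f} microseconds")
```

Other scenarios (offline, server, multi-stream) vary how queries are batched and issued, which is why the same hardware can rank differently across them.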

The full results for the data center inference benchmark results can be found here on the MLCommons website, and the full edge inference benchmark results can be found here.

Nvidia, as in previous MLPerf benchmarks, posted strong results across the board for all of its submissions, particularly pertaining to the H100, the A100 GPU and the Jetson AGX Orin module aimed at AI-powered robotics applications. The H100 appeared in a “preview” category designed for products that are not yet generally available.

“The big takeaway from this round is this is the first run where we're publishing measured data on the H100,” said Dave Salvator, director of AI inference benchmarking and cloud at Nvidia. “H100 came in and really kind of brought the thunder with some very impressive results including up to a four-and-a-half times speed up versus the previous generation GPU benchmark.”

Salvator added that the A100 won more benchmark categories than any other data center GPU submitted. In the edge category, the Orin-based Jetson AGX edged out the competition on more tests than any other low-power system-on-a-chip. Just as important, Salvator said, is that the Orin module performed five times better, and with twice the power efficiency, than a prior Jetson module benchmarked earlier in the year.

Salvator lauded growing industry participation in the MLPerf benchmarking, saying the benchmarking is a necessity as AI use cases in the enterprise continue to grow, and enterprises need ways to sort through the complexity.

“The number of applications and areas of application for AI are almost without limit,” he said. “It continues to be this growing technology being used in more and more places, and the complexity of the technology and the demands of the technology also continue to grow. The datasets are doubling every couple of months, and the networks themselves in terms of their size and complexity also continue to grow. There are now networks out there that have hundreds of billions of parameters. We're heading towards a world where there will be a trillion-plus parameters, even tens of trillions of parameters… How do we put those to work? The short answer is you need a ton of compute because first of all, you have to be able to train it all in a reasonable amount of time. And then once you deploy, it also needs to run in a way that is responsive.”

Meanwhile, Qualcomm again performed well, particularly with the AI 100 in the edge category, where it beat all comers on the ResNet-50 single-stream benchmark. Also, an increasing number of Qualcomm partners, such as Foxconn and Inventec, submitted for benchmarks, which is worth noting because Nvidia has always had a strong MLPerf showing through its large number of partner submissions.

Karl Freund, founder and principal analyst at Cambrian-AI Research LLC, noted in a Forbes post that Qualcomm’s showing in power efficiency benchmarks should help its sales prospects, adding, “the Qualcomm platform is still the most energy efficient AI accelerator in the industry for image processing.”

In addition to Nvidia and Qualcomm, among the other MLPerf submitters were Alibaba, ASUSTeK, Azure, Biren, Dell, Fujitsu, GIGABYTE, H3C, HPE, Inspur, Intel, Krai, Lenovo, Moffett, Nettrix, Neural Magic, OctoML, SAPEON, and Supermicro.

Like Nvidia, Intel had a submission in the preview category – its next-generation Xeon CPU, which has been going by its codename, Sapphire Rapids. Intel recently has talked a lot about how Sapphire Rapids will be a major accelerator for AI in the data center, though the next-generation product has been delayed multiple times and is not due out until next year.