Nvidia touts latest MLPerf AI inference results, while the rest of the sector remains silent

The latest MLPerf benchmark results are out, and, as has become typical, Nvidia dominated the AI inference test program while participation from other companies lagged, making it difficult to use the results for direct comparisons between competing products. Even participants with positive results worth touting, such as Qualcomm, chose not to heavily promote their performance to analysts and media.

By contrast, Nvidia gathered a large number of analysts and media members on a Microsoft Teams presentation for a lengthy, detailed overview of its performance in the recent MLPerf 2.0 program. Across the board, Nvidia and its partners, such as Microsoft Azure, submitted more products for more individual benchmark tests than any other company.

Dave Salvator, senior manager for product marketing with Nvidia’s Accelerated Computing group, told attendees that the company’s low-power Orin system-on-chip, based on its Ampere architecture and available in the Nvidia Jetson AGX Orin developer kit, was a particular highlight.

In edge AI, a pre-production version of Orin led in five of six performance tests run during the MLPerf program. “It made quite a splash in terms of the results,” Salvator said. “It's delivering right up to 5x more performance on inference, and a little bit over 2x in terms of energy efficiency. It ran up to 5x faster than our previous-generation Jetson AGX Xavier, while delivering an average of 2x better energy efficiency.”

He added that performance criteria such as latency and power efficiency are becoming increasingly important. “A latency target is often used in real time applications where you have to not only get the right answer, but you have to get the right answer right now. Suddenly, acceleration goes from being a nice-to-have to being a must-have. As we head into the future, I believe that actually acceleration for AI inference essentially becomes table stakes in order to run these larger workloads in real time. If you’ve got a latency target, whatever throughput you generate doesn't really mean anything if you can't generate it in the targeted latency.” 
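Salvator’s point about latency-bounded throughput can be illustrated with a rough sketch. The snippet below is illustrative only: the 1,000-query count, the 15 ms target, the 99th-percentile bound and the run_inference callable are assumptions for the example, not MLPerf’s actual harness or rules. The idea is simply that measured throughput only “counts” if queries also come back within the latency target.

```python
import time
import statistics

def latency_bounded_throughput(run_inference, num_queries=1000, latency_target_s=0.015):
    """Illustrative sketch: measure throughput, but treat it as valid
    only if the 99th-percentile latency stays under the target.
    `run_inference` is a hypothetical callable that executes one query."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_queries):
        t0 = time.perf_counter()
        run_inference()                      # one inference query (stand-in)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    throughput = num_queries / elapsed                 # queries per second
    p99 = statistics.quantiles(latencies, n=100)[98]   # ~99th-percentile latency
    meets_target = p99 <= latency_target_s
    # Throughput "doesn't mean anything" unless the latency bound is met.
    return (throughput if meets_target else 0.0), p99, meets_target
```

Roughly speaking, MLPerf’s latency-constrained scenarios work in this spirit: the load generator drives queries at a given rate, and a result only stands if the latency constraint is still satisfied at that rate.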

In regard to power efficiency, Qualcomm’s Cloud AI 100 beat Nvidia’s submissions on performance per watt in the ResNet-50 and SSD-Large benchmark categories for data center inference, which Salvator acknowledged. Still, Nvidia beat Qualcomm on overall performance in those two categories and in the other four data center categories, though Qualcomm did not submit results in some of them, and neither did any other company.

And that has been the knock against MLPerf. While the number of submissions and submitters has risen across the benchmarking rounds held over the last year (MLPerf 1.0, 1.1 and now 2.0), many categories still draw little or no competition.

During the Q&A portion of Nvidia’s presentation, Karl Freund, founder and principal analyst at Cambrian-AI Research, reiterated a criticism of MLPerf that he has raised on earlier Nvidia calls touting MLPerf results.

“We're seeing less and less participation by chip vendors and more and more participation by system vendors, which is good,” he said, before adding, “It's time to just be plain: Where the hell is everybody? I mean, even Intel CPUs didn't submit… Nobody is submitting.” 

He then asked Salvator, “What do you think that foretells about the future of MLPerf?”

Salvator answered at length, “I've been around the benchmarking world for close to 30 years, and… industry standard benchmarks [have] always been important to our industry across multiple domains, because they become a common measuring stick against which you can directly compare platforms. One of the problems we've seen over the last several years really since the inception of deep learning… is that the results tend to be posted in sort of Wild West fashion, which is to say they're not really comparable.”

For example, Salvator said, one company may claim a particular performance level on a workload like ResNet-50, while another company claims to beat that performance, but neither discloses details of how its score was achieved. “What MLPerf does is it provides a common yardstick against which platforms can be directly compared, which has huge value for customers.”

More results and details about the testing for data center, edge, mobile and TinyML inference can be found through the following links:

https://mlcommons.org/en/inference-datacenter-20/
https://mlcommons.org/en/inference-edge-20/
https://mlcommons.org/en/inference-mobile-20/
https://www.mlcommons.org/en/inference-tiny-07/

RELATED: Nvidia scores big at MLPerf, but where’s the competition