Nvidia, Google celebrate MLPerf performance

The latest MLPerf benchmark testing results are out, and Nvidia and Google each are separately touting the performance of their processors in AI training scenarios.

But of possibly greater significance than the test results themselves is the claim by Paresh Kharya, senior director of product management and marketing at Nvidia, that “MLPerf is starting to show up in RFI criteria of organizations that are choosing their AI infrastructure.”

As for Nvidia’s performance in the latest round of benchmarks, Kharya said that seven of the company’s partners submitted at least 12 commercially available systems for benchmarking, delivering up to 3.5x greater performance than last year’s scores. The results show that systems using Nvidia’s A100 Tensor Core GPUs can train AI faster than systems using other companies’ processors, he said.

In addition, the Selene supercomputer, based on Nvidia’s DGX SuperPOD architecture, set performance records across all eight benchmarks in the commercially available submissions category.

Overall, submissions by Nvidia and its partners (Dell, Fujitsu, GIGABYTE, Inspur, Lenovo, Nettrix and Supermicro) accounted for more than three-quarters of all MLPerf submissions, and ran all eight workloads that were subject to benchmarking.

Past MLPerf benchmarks have been dominated by Nvidia and its partners, with comparatively little participation from other companies, which has led to questions about the ultimate value of MLPerf to prospective customers. For example, an executive from a semiconductor company that hasn’t participated in MLPerf recently told Fierce Electronics that MLPerf doesn’t adequately address the wide variety of AI use cases or reflect how AI processors function in a broader system scenario.

Kharya said AI use cases are indeed growing quickly in number and variety, but that they still can be adequately represented in a handful of applications used for benchmarking. He added, referring to his earlier comment about RFIs, that “the important thing to note from all this is that MLPerf is now getting adopted by customers.”

Google done good

Still, Nvidia and its partners were not the only companies that made strong showings in the latest MLPerf results. Google in a blog post touted the performance of its TPU v4 supercomputers, submitted in the preview category.

Google said its TPU v4 Pod was designed, in part, to satisfy expansive AI training needs, “and TPU v4 Pods set performance records in four of the six MLPerf benchmarks Google entered using TensorFlow and JAX [software].” Google also said that TPU v4 Pods, already deployed in Google data centers, will be available via Google Cloud later this year.

Regarding the overall results, Karl Freund, principal analyst at research firm Cambrian AI, said by email, "Nvidia won once again on a chip-to-chip and cluster-to-cluster basis. However, Google TPUv4, available later this year, and Graphcore also demonstrated excellent performance, scalability, and price/performance. The competition is heating up, but Nvidia's advantage in broad acceptance and breadth of software will keep them in the lead for the near future." He also noted that with a 3.5x performance improvement over last year, Nvidia is not exactly "standing still."