Nvidia breaks AI speed records again with shipping chips

Nvidia broke 16 records for AI performance with its A100 GPUs and DGX supercomputers in the latest round of MLPerf benchmark results, released Wednesday.

For eight different benchmarks, the A100 Tensor Core GPU was the fastest, while a massive cluster of more than 2,000 A100s in Nvidia’s DGX SuperPOD system, connected with HDR InfiniBand, set another eight performance milestones.

Previously, Nvidia set six benchmark records in December 2018 and eight in July 2019. MLPerf is an industry benchmarking consortium of more than 70 companies and researchers, created in 2018. In this latest round, Nvidia provided the only commercially available products for testing. The full MLPerf spreadsheet of results is available online, along with a separate short press release.

Nvidia has been touting its commercially available AI capabilities in recent months. On Tuesday, auto supplier Continental said that since early 2020 it has been using more than 50 networked Nvidia DGX units for the simulation and deep learning work needed to develop assisted, automated and autonomous driving. The simulations reduce the need for actual road tests. Continental called the cluster the “fastest supercomputer in the auto industry,” based on the TOP500 supercomputer list.

Independent analyst Karl Freund at Moor Insights & Strategy noted via email that no Nvidia competitor is publishing its benchmark results, at least not yet. “Nobody can compete with Nvidia, even after two to three years of expectations,” he said.

In the MLPerf benchmarks, a supercomputer or GPU (such as the A100s from Nvidia) is judged by how fast it can train different models to a set metric. Image classification, object detection, translation, recommendation and reinforcement learning are among the training tasks. For reinforcement learning in the latest benchmark round, a full-sized MiniGo game was used. Under the rules, software agents were trained until they could rival human players at the game, reaching a 50% win rate.

Reinforcement learning requires an AI program to learn from its experience through inference while also training itself for future moves and games. The AI generates training data through exploration instead of relying on a predetermined data set, using self-play between agents to produce that data. The latest MLPerf benchmark used a full-sized Go board, which increased complexity.
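The self-play loop described above can be sketched in a few lines. Everything below is an illustrative toy, not Nvidia’s or MLPerf’s actual MiniGo code: the made-up game, the random stand-in policy and the outcome labeling are all hypothetical, chosen only to show how self-play turns exploration into labeled training data.

```python
import random

def play_game(policy, size=3):
    """Two copies of one policy alternately claim cells; the player
    whose claimed cells carry the higher total hidden value wins."""
    cells = size * size
    values = [random.random() for _ in range(cells)]  # hidden cell values
    board = [0] * cells          # 0 = empty, +1 / -1 = the two players
    history, player = [], 1
    for _ in range(cells):
        empties = [i for i, v in enumerate(board) if v == 0]
        move = policy(board, empties)        # inference: pick a move
        board[move] = player
        history.append((tuple(board), move, player))
        player = -player
    score = sum(v if owner == 1 else -v
                for v, owner in zip(values, board))
    return history, (1 if score > 0 else -1)

def random_policy(board, empties):
    # Stand-in for a trained network; exploration here is pure chance.
    return random.choice(empties)

# Self-play generates the training set through exploration rather than
# a predetermined data set: every position is labeled with the game's
# eventual outcome, and a real system would retrain the policy on it.
training_data = []
for _ in range(100):
    history, winner = play_game(random_policy)
    for state, move, player in history:
        training_data.append((state, move, 1.0 if player == winner else 0.0))

print(len(training_data))  # -> 900 (100 games x 9 positions each)
```

In a real system such as the MiniGo benchmark, the inference side (choosing moves) and the training side (updating the network on the self-play data) run in a continual loop, which is why Nvidia could split the two workloads across separate groups of chips.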

Using reinforcement learning, engineers can create a broad array of applications for robotics and optimization tasks. In one example, an industrial robot can be trained to work alongside humans in a factory or other setting. In that case, there is no pre-existing data set on which to train the robot, so it learns from its various encounters, perhaps even training on a video simulation of its future work. In similar fashion, reinforcement learning can be applied to optimizing dozens or hundreds of traffic signals working as a system to reduce congestion.

RELATED: BMW and Nvidia train logistics robots with AI to move car parts

“With reinforcement learning with MiniGo, there was a lot of inference and training going on back and forth between the two,” said Paresh Kharya, senior director of product management at Nvidia, during an online interview with reporters. He wrote a separate blog describing the MLPerf results.

Kharya said Nvidia was able to complete the MiniGo benchmark in under 18 minutes, 3.5 times faster than before. Nvidia’s A100 chips can be used for both inference and training work, and in the MiniGo example one group of A100s handled training tasks while another group handled inference.

“Delivering exceptional performance on AI is really hard,” Kharya said. Successful AI work requires not just custom silicon but also a broad ecosystem of software and components. “Nvidia has been investing billions and working on this for almost a decade.”

In the latest benchmark round, no other company submitted commercially available chips; other entries included Huawei’s Ascend and Google’s TPUv3. The latest MLPerf round also added two new tests, for recommendation systems and for conversational AI using BERT, a neural network model.

Nvidia took the MLPerf results as an opportunity to mention customer use cases, including Alibaba, which used Nvidia GPUs to support the recommendation system behind $38 billion in Singles Day sales last November.

Only nine companies submitted results in this MLPerf round, and seven used Nvidia GPUs, including Alibaba Cloud, Google Cloud, Tencent Cloud and server makers Dell, Fujitsu and Inspur. Separately, Google Cloud, Intel and the Shenzhen Institutes of Advanced Technology offered submissions.

Nvidia’s performance on the MLPerf benchmarks is important, said Freund, the senior analyst at Moor Insights & Strategy. “These benchmarks demonstrate to all buyers that Nvidia is the fastest and also the fastest at every popular AI task, which should help shorten purchase cycle time,” he said. “Cost is important, but developers need to train neural networks in hours, not days. As Facebook once said, there are three key buying criteria for AI accelerators: performance, performance and performance.”