Nvidia is changing itself into a full-stack computing force

Is Nvidia still a chip company?

A brand new GPU architecture and a CPU “Superchip” unveiled by the firm at its GTC Spring event this week may resoundingly affirm that it is. But the drivers behind those unveilings, and the long list of other announcements, including software innovations, that Nvidia rattled off at GTC, suggest the company is transforming from a fabless chip designer into much more of a full-stack computing company, one addressing every angle of high-performance computing and AI adoption across multiple industries.

“Nvidia is well past being a chip supplier and is becoming a full-service system provider, including offering several important software platforms,” said Jack Gold, principal analyst at J. Gold Associates. “It’s moving from components to solutions, based on its heavy emphasis on highly parallel systems, high speed connectivity, AI/ML expertise, and solutions-centric designs for various workloads.”

AI, and its roles in both real-world industrial enterprise applications and the emerging metaverse, continues to be Nvidia’s primary muse. As Nvidia CEO Jensen Huang said in his GTC keynote speech, data centers essentially are becoming “AI factories” with massive computing resources dedicated to AI training models.

Paresh Kharya, senior director of product management and marketing for accelerated computing at Nvidia, said during a briefing on the GTC announcements that Transformer AI models, critical building blocks in deep learning for neural networks, are rapidly growing in size and complexity beyond what current processors can efficiently handle.

“The computing requirements to train large transformer models has been exploding,” he said. “Training these giant models still takes months. Even on one of the world's fastest AI supercomputers–Nvidia’s Selene–training the Megatron 530 model [Megatron-Turing NLG 530B, the largest natural language processing training model, at 530 billion parameters] would take one and a half months.”
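Kharya’s month-and-a-half figure can be roughly reproduced with the widely used “~6 × parameters × tokens” rule of thumb for training FLOPs. The token count and sustained throughput below are illustrative assumptions for a Selene-class system, not numbers Nvidia published:

```python
# Back-of-envelope training-time estimate for a 530B-parameter model.
# The "6 * params * tokens" FLOPs rule of thumb is a common approximation;
# the token count and sustained throughput here are illustrative assumptions.
params = 530e9            # Megatron-Turing NLG 530B parameters
tokens = 270e9            # assumed training-token count
train_flops = 6 * params * tokens

peak_flops = 1.4e18       # assumed peak mixed-precision throughput, Selene-class system
utilization = 0.2         # assumed sustained fraction of peak at this scale
sustained = peak_flops * utilization

days = train_flops / sustained / 86400
print(f"~{days:.0f} days")  # in the same ballpark as the quoted "one and a half months"
```

Under these assumptions the estimate comes out to roughly five weeks; nudging the utilization or token count moves it toward the month-and-a-half Kharya cited, which is the point: at this scale, training time is dominated by sheer FLOPs.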

Kharya added, “A key challenge to reducing the time to train is the performance gains start to decline as you increase the number of GPUs in a data center. Wouldn't it be great if you can innovate to both dramatically increase the performance at smaller scale, as well as continue to scale up the performance over 1000s of GPUs?”

Hopper Pops Up

That’s what the company’s new Hopper GPU architecture, the successor to its two-year-old Ampere GPU architecture, proposes to do. It is launching with a new chip, the 80-billion transistor H100, that employs a new Transformer Engine and Nvidia NVLink interconnection capabilities that Kharya said accelerate Transformer model network functions by 6x without sacrificing training accuracy. 

The H100, built using TSMC’s 4N process, supports nearly 5 terabytes per second of external connectivity, and Nvidia claims it is the first GPU to support PCIe Gen5 and HBM3, enabling 3TBps of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world’s internet traffic, the company claimed.

The Hopper architecture also supports: 

  • 2nd-Generation Secure Multi-Instance GPU technology, which allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs.

  • Confidential Computing, which protects AI models and customer data while they are being processed, and also can be applied to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures. 

  • 4th-Generation NVIDIA NVLink, combining a new external NVLink Switch to extend NVLink as a scale-up network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth–900GB/s–versus the previous generation using NVIDIA HDR Quantum InfiniBand. 

  • DPX Instructions to accelerate dynamic programming, which is used in a broad range of algorithms, including route optimization and genomics, by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. 
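Dynamic programming, the target of those DPX instructions, solves a problem by building up answers to overlapping subproblems. As a plain-Python illustration of the route-optimization case (no Hopper hardware involved), here is the canonical Floyd-Warshall all-pairs shortest-path algorithm, a classic example of the kind of kernel DPX is meant to accelerate:

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths; dist[i][j] is the edge weight (INF if no edge)."""
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input matrix is untouched
    # DP recurrence: allow intermediate nodes 0..k, improving paths as k grows.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# A small 4-node directed graph (weights are illustrative).
graph = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
print(floyd_warshall(graph)[0][2])  # → 5 (via node 1: 3 + 2)
```

The triple loop is exactly the sort of regular, data-dependent min/add pattern that maps onto dedicated instructions, which is where the claimed speedups over CPUs come from.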

The Confidential Computing capability caught Gold’s eye. He observed, “CC provides for data to be protected by enabling all processing within a trusted execution environment. This is a major step forward as CC was previously offered only on CPUs. With the increased amount of AI being run with sensitive data (e.g., personally identifiable data), the need to assure security and privacy means organizations should all be adopting CC as a way to protect against hacking and data exfiltration. CC will be a required capability in many systems in the near future.”

Nvidia said the H100, to be available later in the third quarter, can be deployed in all types of data centers, including on-premises, cloud, hybrid-cloud and edge. 

The H100 also is the basis for Nvidia’s fourth-generation DGX system, DGX H100, which features eight H100 GPUs providing 32 petaflops of AI performance at new FP8 precision. The company’s NVSwitch allows all eight of the H100 GPUs to connect over NVLink, and an external NVLink Switch can network up to 32 DGX H100 nodes in the next-generation Nvidia DGX SuperPOD supercomputers.
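Those figures check out arithmetically: eight GPUs at 32 petaflops per node implies 4 petaflops of FP8 throughput per H100, and 32 such nodes put a SuperPOD at roughly an exaflop of FP8 AI performance:

```python
# Sanity check of the DGX H100 / DGX SuperPOD figures quoted above.
node_pflops = 32          # FP8 AI performance per DGX H100
gpus_per_node = 8         # H100 GPUs in one DGX H100
nodes_per_pod = 32        # DGX H100 nodes in a next-generation DGX SuperPOD

per_gpu = node_pflops / gpus_per_node       # PFLOPS of FP8 per H100
pod_pflops = node_pflops * nodes_per_pod    # ~1 exaflop of FP8 per SuperPOD
print(per_gpu, pod_pflops)
```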

Then There’s Grace

Hopper extends Nvidia’s penchant for naming products after computing pioneer Grace Hopper, and comes a year after Nvidia unveiled the Grace CPU. While Hopper took top billing at GTC Spring this year, Grace was not left too far behind: Nvidia unveiled the Grace CPU Superchip, which it said leverages the Arm Neoverse-based discrete data center CPU architecture and comprises two CPU chips connected coherently over NVLink-C2C, a new high-speed, low-latency, chip-to-chip interconnect.

The Grace CPU Superchip complements the CPU-GPU integrated module, the Grace Hopper Superchip, announced last year, the company said. Both superchips share the same underlying CPU architecture, as well as the NVLink-C2C interconnect.

Gold said it’s important for Nvidia to keep innovating in both CPU and GPU arenas. “Both have their place in the computing world,” he said. “GPUs are very good at highly parallel processing (like graphics, AI, ML, etc.) while not very good at running non-parallel general purpose processing functions (e.g., simple data analysis, mathematical functions, things like word processing, databases, etc.). So you really need both CPU and GPU for most workloads, and workloads are often optimized for a particular CPU or GPU capability.”

Any company that claims to have a full-service computing platform really needs to have both processor types, he added.  

But Wait, There’s More

As Gold indicated, Nvidia is increasingly demonstrating its capabilities in the software realm, and this week released more than 60 updates to its CUDA-X collection of libraries, tools and technologies to accelerate work in quantum computing and 6G research, cybersecurity, genomics and drug discovery. 

Nvidia also announced updates to its AI software suite to improve support for workloads around speech, recommender systems, hyperscale inference and more. The company also announced the Nvidia AI Accelerated program, which helps to ensure performance and reliability of AI applications developed by Nvidia’s software and solution partners. Adobe, Red Hat and VMware are among the more than 100 partners participating at launch.

“Nvidia AI is the software toolbox of the world’s AI community — from AI researchers and data scientists, to data and machine learning operations teams,” said Huang, in a statement. “Our GTC 2022 release is massive. Whether it’s creating more engaging chatbots and virtual assistants, building smarter recommendations to help consumers make better purchasing decisions, or orchestrating AI services at the largest scales, your superpowered gem is in Nvidia AI.” 

Expanding Omniverse

Finally, but not least (seriously, Nvidia announced a lot more at GTC, some of which we’ll get into in other stories this week), Nvidia made several announcements expanding the capabilities of its Omniverse platform, which it has been positioning as the development foundation for metaverses and related applications like digital twins. Key among the latest news is Omniverse Cloud, which puts the platform’s capabilities in the cloud to make them more widely available.

“This is something that's going to allow people that don't have an RTX based system to have full access to Omniverse and run it as if it was local,” said Richard Kerris, vice president of the Omniverse platform at Nvidia. “That means whether you're running a Chromebook or Mac or tablet or any other type of device that you connect to the cloud service, you'll have access to running Omniverse and be able to work with other people using the platform locally or globally. This is an answer to a huge demand that we've had from a number of customers that have really wanted access to this groundbreaking platform, but have been limited because of the platform they're on or the access that they have to RTX solutions. So this is going to bring Omniverse to everyone.”

Gold said Omniverse Cloud should be yet another proof point for Nvidia’s claim that it is becoming more than just a chip company.

“Omniverse is a major initiative for them to move from a chip company to a full-scale platform company delivering a complete computing function to enable multiple workloads,” he said. “Components are great, but solutions are better – and that is where they want to go.”