Nvidia ramps up Hopper GPU production, adds AI large language model services

Nvidia likes to describe modern enterprise data centers as “AI factories” with a seemingly insatiable appetite for computing power, and the company keeps feeding the beast. At its GTC Fall event this week, Nvidia founder and CEO Jensen Huang announced that the firm’s Hopper H100 Tensor Core GPU, its latest product aimed at keeping those AI factories producing, has entered full production, with partners including Amazon Web Services, Google, Microsoft Azure, HP, Dell, and Cisco Systems set to offer their own products and services built on the new chip architecture in the coming months.

The Hopper architecture was unveiled last spring at Nvidia’s GTC Spring event, and its main aim is to do more with less: deliver 3.5 times the power efficiency of Nvidia’s previous-generation A100 GPUs while using five times fewer server nodes, for a total cost of ownership three times lower than the prior generation. The architecture leans on Nvidia’s Transformer Engine technology and NVLink interconnect to accomplish this.
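The Transformer Engine pairs Hopper’s hardware support for 8-bit floating point (FP8) math with software that manages scaling so the reduced precision can be used without wrecking accuracy. As a rough illustration only, here is a minimal sketch of what driving that FP8 path might look like with Nvidia’s open-source transformer_engine library for PyTorch; the module and recipe names follow the library’s published quick-start, but exact options vary by version, so treat this as an assumption-laden sketch rather than production code.

```python
# Minimal sketch (not production code) of FP8 execution with Nvidia's
# open-source Transformer Engine library on an H100-class GPU.
# Names follow the library's quick-start; exact signatures may differ by version.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single Transformer-style linear layer provided by the library.
model = te.Linear(768, 3072, bias=True)
inp = torch.randn(2048, 768, device="cuda")

# FP8 scaling recipe: E4M3 format with delayed scaling-factor updates.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Run the forward pass with FP8 autocasting enabled for the TE module.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # gradients flow as in ordinary PyTorch training
```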

Enterprises also will be able to buy a DGX H100 platform that includes eight H100 GPUs delivering a combined 32 petaflops of performance. That kind of muscle is needed to handle ever larger and more complex AI frameworks and models, including large language models, chatbots, vision AI, AI-powered recommendation engines, and more, Nvidia officials said.

Initially, H100 server products will be available from Dell and others as early as next month, but because AI workloads are increasingly migrating to the cloud, the cloud giants mentioned above will start rolling out H100 capabilities early next year. H100 systems also will be used in several supercomputing centers, including the Barcelona Supercomputing Center, Los Alamos National Laboratory, the Swiss National Supercomputing Centre (CSCS), the Texas Advanced Computing Center, and the University of Tsukuba.

Large language models

As mentioned above, large language models (LLMs) will be one of the chief AI use cases for the H100. Traditionally, these models have been used to absorb and translate information to power chatbots and other AI-based applications, but they are increasingly being used in more complex applications, such as automated code generation.

“We see large language models exploding everywhere,” said Ian Buck, general manager and vice president of accelerated computing at Nvidia. “They are being used for things outside of human language like coding and helping software developers write software faster, more efficiently, and with fewer errors…. We’re also seeing these models applied to the language of chemistry and biology to predict the properties of materials or for drugs. Hopper was explicitly designed to help accelerate these kinds of models.”

But Nvidia is not limiting its support for LLM workloads to the chip level. The company also announced at GTC Fall that, as early as next month, it will offer early access to two cloud-based LLM services: the Nvidia NeMo Large Language Model Service and the Nvidia BioNeMo LLM Service.

These platforms essentially will give developers cloud-based access to LLMs such as Megatron 530B, one of the world’s largest, and allow them to customize and tune their own models to speed development of applications such as content generation, text summarization, chatbots, and code development, as well as protein structure and biomolecular property prediction.
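Nvidia did not spell out the services’ programming interface in the announcement, but hosted LLM services of this kind are typically exposed as an authenticated HTTP endpoint that accepts a prompt and returns generated text. The sketch below is purely illustrative: the URL, model identifier, and request fields are hypothetical stand-ins, not the documented NeMo LLM Service API.

```python
# Hypothetical illustration of calling a hosted large-language-model service.
# The endpoint URL, model name, and request fields are invented for
# illustration and are NOT the documented NeMo LLM Service API.
import requests

API_URL = "https://llm.example-hosted-service.com/v1/completions"  # hypothetical
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

payload = {
    "model": "megatron-530b",    # assumed model identifier
    "prompt": "Summarize this support ticket in two sentences:\n<ticket text>",
    "max_tokens": 120,           # cap on generated length
    "temperature": 0.2,          # low value for focused, deterministic-ish output
}

resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # generated completion, in whatever schema the service returns
```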

“Large language models hold the potential to transform every industry,” said Huang. “The ability to tune foundation models puts the power of LLMs within reach of millions of developers who can now create language services and power scientific discoveries without needing to build a massive model from scratch.”

This is just a sampling of news and product unveilings from GTC Fall. Nvidia also announced the new Thor system-on-a-chip for the automotive market. Look for other stories to follow detailing more news from the event.