Generative AI, large language model applications and high-performance computing workloads seem to know no limits. They keep raising the bar for processor memory and performance, so semiconductor makers need to keep pushing past their own previous limits.
Nvidia announced at the SC23 supercomputing conference in Denver that its latest answer to these growing demands is the HGX H200 platform. The platform is based on Nvidia's Hopper architecture and features the H200 Tensor Core GPU, the first GPU to support HBM3e memory. The H200 delivers 141GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x the bandwidth of the earlier Nvidia A100, according to Nvidia officials who briefed media ahead of this week's event.
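The capacity and bandwidth comparison is simple arithmetic, and a quick sketch makes it easy to see where "nearly double" and "2.4x" come from. The H200 figures are Nvidia's; the A100 baseline (the 80GB SXM part at roughly 2,039 GB/s) is an assumption, since Nvidia did not specify which A100 variant it compared against.

```python
# Back-of-the-envelope check of the H200 vs. A100 memory claims.
# H200 figures come from Nvidia's SC23 announcement. The A100 baseline
# (80GB SXM variant, ~2,039 GB/s) is an ASSUMPTION: the comparison
# does not say which A100 model was used.

H200_CAPACITY_GB = 141
H200_BANDWIDTH_GBPS = 4_800   # 4.8 TB/s expressed in GB/s

A100_CAPACITY_GB = 80         # assumed: A100 80GB
A100_BANDWIDTH_GBPS = 2_039   # assumed: A100 80GB SXM peak bandwidth


def ratio(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


capacity_gain = ratio(H200_CAPACITY_GB, A100_CAPACITY_GB)
bandwidth_gain = ratio(H200_BANDWIDTH_GBPS, A100_BANDWIDTH_GBPS)

# Capacity comes out a little under 2x ("nearly double");
# bandwidth comes out at roughly 2.35x, which rounds to about 2.4x.
print(f"capacity:  {capacity_gain:.2f}x")
print(f"bandwidth: {bandwidth_gain:.2f}x")
```

Against an A100 variant with different memory specs, the ratios would shift accordingly.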
Support for HBM3e in the HGX H200 follows Nvidia's announcement earlier this year that its GH200 Grace Hopper Superchip would also support HBM3e.
“The integration of faster and more extensive memory will dramatically improve and accelerate HPC and AI applications,” said Dion Harris, director of accelerated data center products and solutions at Nvidia. “The reason we're doing this should come as no surprise when you look at sort of what's happening in the market. Model sizes are rapidly expanding, demanding increased computational power, as well as overall faster and stronger memory subsystems.”
He added that the new platform is expected to deliver greater performance on massive large language models (LLMs), including nearly doubling inference speed on Llama 2 70B, a 70-billion-parameter LLM, compared with the H100. Nvidia expects further performance improvements for the H200 through future software updates.
Harris said Nvidia is lining up server manufacturer partners and cloud service providers that will be able to obtain the platform starting in the second quarter of 2024. According to Nvidia, it will be suitable for all types of data centers, including on-premises, cloud, hybrid-cloud and edge deployments, and current server partners such as ASRock Rack, ASUS, Dell Technologies, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Supermicro, Wistron and Wiwynn can update their existing systems with an H200. Initial cloud providers expected to deploy H200 instances next year include Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure, along with CoreWeave, Lambda and Vultr.
The H200 will be available in Nvidia HGX H200 server boards in four- and eight-way configurations. These boards are compatible with both the hardware and the software of HGX H100 systems, including software such as the open-source Nvidia TensorRT-LLM library. An eight-way configuration provides more than 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory, the company said.
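The board-level figures follow from the per-GPU numbers. The sketch below multiplies the quoted 141GB per GPU by the GPU count and divides the quoted 32-petaflop eight-way figure back down to a per-GPU value. It assumes decimal units (1TB = 1,000GB); the helper names are illustrative only.

```python
# Board-level arithmetic for HGX H200 configurations.
# Per-GPU capacity (141 GB) and the 8-way FP8 figure (32 PF) are the
# numbers Nvidia quoted; decimal units (1 TB = 1,000 GB) are assumed.

H200_HBM_GB = 141           # per-GPU HBM3e capacity
EIGHT_WAY_FP8_PFLOPS = 32   # "more than 32 petaflops" for an 8-way board


def aggregate_memory_tb(num_gpus: int, per_gpu_gb: int = H200_HBM_GB) -> float:
    """Total HBM across a board, in decimal terabytes."""
    return num_gpus * per_gpu_gb / 1_000


# Implied FP8 throughput per GPU from the 8-way board figure.
fp8_pflops_per_gpu = EIGHT_WAY_FP8_PFLOPS / 8

for gpus in (4, 8):
    print(f"{gpus}-way HGX H200: {aggregate_memory_tb(gpus):.2f} TB of HBM3e")
print(f"implied FP8 per GPU: ~{fp8_pflops_per_gpu} PF")
```

The eight-way result of about 1.13TB matches the 1.1TB figure Nvidia quoted, and a four-way board works out to roughly half that.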
Going to Jupiter
The H200 unveiling is not Nvidia's only big piece of news from SC23. The company also announced that its GH200 Grace Hopper Superchip will be deployed in a node configuration as part of the Jupiter supercomputer system hosted at the Forschungszentrum Jülich facility in Germany. Development of that system has been contracted to Eviden and ParTec, and it is being built in collaboration with Nvidia and SiPearl with the aim of accelerating the creation of foundational AI models in climate and weather research, materials science, drug discovery, industrial engineering and quantum computing.
This GH200 Grace Hopper Superchip node configuration will be based on Eviden's BullSequana XH3000 liquid-cooled architecture, with a booster module comprising close to 24,000 Nvidia GH200 Superchips interconnected with Nvidia's Quantum-2 InfiniBand networking platform. Jupiter aims to deliver more than 90 exaflops of performance for AI training, about 45x more than Jülich's previous JUWELS Booster system, along with 1 exaflop for HPC applications, while consuming only 18.2 megawatts of power. Each quad-GH200 node provides 288 Arm Neoverse cores and can achieve 16 petaflops of AI performance with up to 2.3 terabytes of high-speed memory. The four GH200s in each node are linked through a high-speed Nvidia NVLink connection.
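The node-level and system-level Jupiter figures can be cross-checked with simple arithmetic. The sketch below uses only the article's numbers, except for the per-superchip memory split (480GB LPDDR5X plus 96GB HBM3), which is an assumption based on the original GH200 configuration; the article gives only the 2.3TB node total.

```python
# Rough consistency check of the Jupiter booster-module figures.
# From the article: ~24,000 GH200 superchips, 4 per node, and 288 cores
# plus 16 PF of AI performance per node.
# ASSUMPTION: 480 GB LPDDR5X + 96 GB HBM3 per superchip (original GH200
# configuration); the article states only the ~2.3 TB node total.

SUPERCHIPS = 24_000          # "close to 24,000"
SUPERCHIPS_PER_NODE = 4
NODE_CORES = 288
NODE_AI_PFLOPS = 16

ASSUMED_LPDDR5X_GB = 480
ASSUMED_HBM3_GB = 96

nodes = SUPERCHIPS // SUPERCHIPS_PER_NODE                 # roughly 6,000 nodes
cores_per_superchip = NODE_CORES // SUPERCHIPS_PER_NODE   # Grace CPU cores each
pflops_per_superchip = NODE_AI_PFLOPS / SUPERCHIPS_PER_NODE
node_memory_tb = SUPERCHIPS_PER_NODE * (ASSUMED_LPDDR5X_GB + ASSUMED_HBM3_GB) / 1_000
system_ai_exaflops = SUPERCHIPS * pflops_per_superchip / 1_000

print(f"nodes:                {nodes}")                 # 6000
print(f"cores per superchip:  {cores_per_superchip}")   # 72
print(f"AI PF per superchip:  {pflops_per_superchip}")  # 4.0
print(f"memory per node (TB): {node_memory_tb:.1f}")    # 2.3
print(f"system AI (EF):       {system_ai_exaflops}")    # 96.0
```

About 24,000 superchips at roughly 4 petaflops each gives about 96 exaflops, in line with the "more than 90 exaflops" target. This is only a peak-figure sanity check, not a measure of delivered performance.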
Harris touted Jupiter's exascale performance and software ecosystem and what they will provide for users, but also said Jupiter will be groundbreaking in another way. "What's really interesting and unique is this really is a pivotal point in terms of how supercomputers are being designed," he said. "It's no longer just being designed with the sense of how can I build it to optimize a top 500 run [a reference to the TOP500 list of the world's fastest supercomputers]? It's thinking, 'How can I get real applications out of this?' It's, 'How can I treat AI as a first-class citizen in terms of how I design and develop these systems to take advantage and fully leverage of this capability?'"