In less than a year, generative AI has become a dominant influence in enterprise computing, so processor innovations need to move quickly, too. Less than three months after announcing its new GH200 Superchip, basis for its DGX GH200 supercomputing system, Nvidia already is giving the GH200 “a boost,” according to Nvidia co-founder, president, and CEO Jensen Huang, who at SIGGRAPH 2023 this week unveiled a “next-generation” version of the chip with more robust memory capabilities.
Setting the stage for the announcement, Huang was his typical expansive and philosophical self, saying during the keynote, “After 12 years of working on artificial intelligence, something gigantic happened: The generative AI era is upon us, the iPhone moment of AI, if you will, where all of the technologies of artificial intelligence came together in such a way that it is now possible for us to enjoy AI in so many different applications. The revolutionary transformer model allows us to learn from a large amount of data across large spans of space and time to find patterns and relationships to learn the representation of almost anything with structure… we can generate almost anything that we can learn from structure, and we can guide it with our human natural language.”
He added, “Billions of dollars are being invested into companies in just about every single domain, and every single industry that are pursuing ideas based on generative AI.”
Many of these developments, driven by ever-larger AI models, are going to require ever-greater memory performance.
While the Nvidia GH200 Grace Hopper Superchip announced back in May at Computex, and now in production, uses HBM3 memory, the latest version, due to be in production in the second quarter of 2024, uses the faster HBM3e memory. Huang said during the SIGGRAPH keynote, “We're gonna give this processor a boost with the world's fastest memory called HBM3e.”
HBM3e (the "e" stands for "evolutionary") follows on HBM3 Gen2 technology, which several companies are advancing and which is expected to see wide adoption in the coming years to address the memory needs of growing AI models. The Nvidia announcement in fact comes just a couple of weeks after Micron Technology announced an HBM3 Gen2 memory update, an unveiling that drew a quote from Ian Buck, vice president of Hyperscale and HPC Computing at Nvidia, who said at the time, “We have a long history of collaborating with Micron across a wide range of products and are eager to be working with them on HBM3 Gen2 to supercharge AI innovation.”
Asked if Nvidia is working with Micron directly to enable HBM3e in the next-generation GH200, a company spokesperson said via email, "We partner closely with leading memory suppliers. We’re not disclosing specific vendors today."
Huang stated that HBM3e memory is 50% faster than current HBM3 and delivers a total of 10TB/sec of combined bandwidth. That allows the new GH200 platform, with 282GB of HBM3e memory, to run models 3.5x larger than the previous version could, while improving performance with 3x higher memory bandwidth.
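For a rough sense of what 282GB means in terms of model size, the back-of-envelope sketch below estimates how many parameters would fit in that much memory. It rests on assumptions that are not from Nvidia: it counts model weights only (no activations, caches, or runtime overhead), treats 1GB as 10^9 bytes, and uses common bytes-per-parameter figures for each numeric precision. The function name is purely illustrative.

```python
# Back-of-envelope sketch: weight-only model size that fits in a given memory pool.
# Assumptions (not from the article): weights only, 1 GB = 10**9 bytes,
# and the bytes-per-parameter values listed below.

GB = 10**9

def max_params_billions(memory_gb: float, bytes_per_param: float) -> float:
    """Largest weight-only model, in billions of parameters, that fits in memory_gb."""
    return memory_gb * GB / bytes_per_param / 1e9

HBM3E_CAPACITY_GB = 282  # HBM3e capacity quoted for the new GH200 platform

for label, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1)]:
    estimate = max_params_billions(HBM3E_CAPACITY_GB, nbytes)
    print(f"{label}: roughly {estimate:.1f} billion parameters")
```

Real deployments would fit fewer parameters than this estimate suggests, since inference and training both need memory beyond the weights themselves.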
The GH200 also can be connected with additional Superchips by Nvidia NVLink, allowing them to work together to deploy the giant models used for generative AI. This high-speed, coherent technology gives the GPU full access to the CPU memory, providing a combined 1.2TB of fast memory when in dual configuration, according to a company statement. The new version is fully compatible with the Nvidia MGX server specification, also unveiled in May at Computex, meaning manufacturers can easily add it to many different server variations.