Nvidia, AWS deepen partnership at re:Invent with generative AI in mind

Nvidia and Amazon Web Services this week issued an array of announcements at the AWS re:Invent conference that will drive the companies’ ongoing partnership deeper into the world of generative AI and large language models.

These announcements, which include the addition of Nvidia GH200 Grace Hopper Superchips to Amazon Elastic Compute Cloud (Amazon EC2), come after an initial series of moves by Nvidia and AWS last spring to upgrade Amazon EC2 performance for the rapidly growing demands of AI.

The new aspects coming to the decade-long collaboration between the two companies, which were announced live on stage at re:Invent by AWS CEO Adam Selipsky and “surprise guest” Nvidia CEO and founder Jensen Huang, include the following:

  • AWS will be the first cloud provider to bring Nvidia’s GH200 Superchips with new multi-node NVLink technology to the cloud. The Nvidia GH200 NVL32 multi-node platform connects 32 GH200 Superchips with NVLink and NVSwitch technologies into one instance in a liquid-cooled rack architecture. Joint customers of the two companies will be able to scale to thousands of GH200 Superchips through Amazon EC2 instances connected with Amazon’s Elastic Fabric Adapter (EFA) network interface, supported by AWS Nitro System advanced virtualization and Amazon EC2 UltraClusters hyper-scale clustering technology.

The GH200 NVL32 platform features 4.5 TB of HBM3e memory, a 7.2x increase compared to the current generation of H100-powered EC2 P5 instances, allowing customers to run larger models while improving training performance. Additionally, the CPU-to-GPU memory interconnect provides up to 7x higher bandwidth than PCIe, enabling chip-to-chip communications that extend the total memory available to applications.
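Scaling a training job across many such EC2 instances generally follows the standard multi-node data-parallel pattern. The sketch below is a minimal, generic illustration of that pattern, assuming a plain PyTorch setup launched with torchrun; it is not the specific software stack AWS and Nvidia will ship with the GH200-based instances.

```python
# Minimal multi-node data-parallel training sketch (PyTorch DDP), launched
# with torchrun on every node. Illustrative only: it shows the general
# scale-out pattern, not the stack shipped with the GH200-based instances.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).to(f"cuda:{local_rank}")
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 4096, device=f"cuda:{local_rank}")
        loss = model(x).square().mean()
        loss.backward()  # NCCL all-reduces gradients: NVLink within a node,
                         # the cluster fabric (e.g. EFA) between nodes
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Each node would run a command along the lines of `torchrun --nnodes=<number of nodes> --nproc_per_node=<GPUs per node> train.py`; instance selection, cluster networking and NCCL tuning are deployment details beyond the scope of this sketch.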

  • The partners also will host Nvidia’s DGX Cloud AI-training-as-a-service platform, its so-called “AI factory in the cloud,” on AWS. It will be the first DGX Cloud deployment featuring GH200 NVL32.

In a media briefing prior to the on-stage unveiling, Ian Buck, vice president of hyperscale and HPC at Nvidia, described the significance of that platform, saying each of the 32 GPUs is “a 600-gigabyte GPU, effectively, for a total of 20 terabytes of fast memory across the entire rack. It's an amazingly powerful architecture, with over 57.6 terabytes per second memory bandwidth and a total of 120 petaflops of AI just in one rack alone. The benefit of connecting those 32 GPUs together is performance and cost. Both are achievable. By connecting GPUs together and building a larger, super GPU, we can train trillion-parameter large language models with 2x less cost, and we can double the inference performance of giant large language models.”

David Brown, vice president of AWS compute and networking services, added, “This is not the normal DGX that you've seen to date. This is a version of DGX Cloud that we will be innovating on together with Nvidia to actually use the GH200 with multi-node NVLink to bring our customers the next generation of large language model training and generative AI performance. We’re very excited about it.” 

  • Another bit of teamwork between Nvidia and AWS is Project Ceiba, which aims to build the world’s fastest GPU-powered AI supercomputer–an at-scale system with GH200 NVL32 and Amazon EFA interconnect hosted by AWS for Nvidia’s own R&D team. Featuring 16,384 GH200 Superchips and capable of processing 65 exaflops of AI, the result is being touted as a “first of its kind” supercomputer that will be used by Nvidia to propel its next wave of generative AI innovation.

  • AWS also will introduce three new Amazon EC2 instance types: P5e instances, powered by Nvidia’s latest-generation H200 Tensor Core GPUs, for large-scale and cutting-edge generative AI and HPC workloads; and G6 and G6e instances, powered by Nvidia L4 GPUs and L40S GPUs, respectively, for applications like AI fine-tuning, inference, graphics and video workloads. G6e instances are particularly suitable for developing 3D workflows, digital twins and other applications using Nvidia Omniverse, a platform for connecting and building generative AI-enabled 3D applications.

  • Nvidia also announced NeMo Retriever, a generative AI microservice with retrieval-augmented generation (RAG) capabilities that lets enterprises connect custom large language models to enterprise data to deliver highly accurate responses for their AI applications. It is now part of the Nvidia AI Enterprise software platform, available in AWS Marketplace. (A generic sketch of the RAG pattern appears after this list.)

  • AWS also announced that a team of AWS scientists and developers creating Amazon Titan foundation models for Amazon Bedrock, a generative AI service for foundation models, has been using Nvidia’s NeMo framework over the past several months.
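For readers unfamiliar with the retrieval-augmented generation pattern that NeMo Retriever is built around, the sketch below shows the general idea in generic Python: embed the enterprise documents, retrieve the passages most similar to a query, and place them in the model’s prompt. The toy bag-of-words embedding and the prompt wording are illustrative assumptions, not the NeMo Retriever API.

```python
# Generic retrieval-augmented generation (RAG) sketch; not the NeMo Retriever
# API. A toy bag-of-words embedding keeps the example self-contained where a
# production system would call a neural embedding model and an LLM endpoint.
import numpy as np


def embed(texts, vocab):
    # Toy embedding: word-count vectors over a fixed vocabulary.
    return np.array([[t.lower().count(w) for w in vocab] for t in texts],
                    dtype=float)


def retrieve(query, documents, k=2):
    # Rank documents by cosine similarity to the query and keep the top k.
    vocab = sorted({w for d in documents for w in d.lower().split()})
    doc_vecs = embed(documents, vocab)
    q_vec = embed([query], vocab)[0]
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-9)
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]


def build_prompt(query, documents):
    # Retrieved passages go into the prompt so the model answers from
    # enterprise data rather than from its pretraining alone.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Our return policy allows returns within 30 days of delivery.",
    "Standard shipping is free on orders over $25.",
    "Warranty claims require the original order number.",
]
print(build_prompt("What is the return policy?", docs))
# In a real deployment the prompt would be sent to the custom LLM for a reply.
```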

In addition to these and a few other announcements, AWS and Nvidia also discussed some of the other ways the partners have been working together to infuse Amazon’s online buyer and seller tools with generative AI.

For example, Buck described a feature through which users “provide a simple product description and using this tool Amazon can automatically generate a compelling Amazon catalog description to engage users.” The capability relies on H100 GPUs and Nvidia’s NeMo LLM software.

Brown added, “Of 600 million products [listed on Amazon], half of them are actually from third-party sellers, so they’re not actually Amazon products, and those sellers have to put in their own descriptions and photographs in their listings. Using generative AI to improve that not only makes their products look more appealing, but actually improves the end-user customer experience for folks shopping on Amazon.com as well.”
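At its core, the listing feature Buck and Brown describe is prompt-driven text generation. The sketch below illustrates only that general pattern; the prompt wording is invented for illustration, and generate() is a hypothetical stand-in for the hosted inference call, whose actual interface the companies did not detail.

```python
# Illustrative prompt pattern for turning a seller's short description into a
# fuller catalog listing. The prompt text is an assumption for illustration;
# generate() is a hypothetical stand-in for the hosted LLM inference call.
def build_listing_prompt(seller_description: str) -> str:
    return (
        "You are writing an e-commerce product listing.\n"
        f"Seller's short description: {seller_description}\n"
        "Write a concise, engaging catalog description with a title and "
        "three bullet-point highlights."
    )


def generate(prompt: str) -> str:
    # Placeholder: a real deployment would send the prompt to the LLM service
    # (the article names Nvidia NeMo software running on H100 GPUs).
    return "<model-generated listing text>"


print(generate(build_listing_prompt(
    "Stainless steel water bottle, 750 ml, leak-proof lid")))
```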