Oracle Cloud deploying 'tens of thousands' of Nvidia GPUs

Oracle announced at its CloudWorld conference this week that it is adding “tens of thousands” of Nvidia GPUs to Oracle Cloud Infrastructure (OCI) to increase support for a growing array of AI training and deep learning inference needs.

The GPU deployment, which will involve both Nvidia A100 GPUs and the upcoming H100 GPUs leveraging the new Hopper architecture, is at the center of a newly expanded partnership between the two companies. The partnership also calls for OCI to eventually feature more Nvidia offerings, including the full Nvidia AI platform and AI Enterprise software, RAPIDS GPU acceleration software libraries, and the Clara healthcare AI and HPC application framework for medical imaging, genomics, natural language processing and drug discovery.

Leo Leung, vice president of OCI and Oracle technology, said during a briefing on the announcement, “There has been a tremendous demand for GPUs, as well as GPU clusters across specific industries. And customers are really basically leveraging our technologies like a supercomputer. They've been able to build AI supercomputers in the cloud, so we're going to expand the investment there.”

Officials from Nvidia and Oracle said that the addition of tens of thousands of GPUs means that individual customers will be able to access clusters of up to 512 GPUs. “We've done a lot of work to build the ecosystem around the processing power,” Leung said. “And that includes the ability to cluster these GPU machines in large clusters… very much the ability to pull all these machines together into a server. Within that, as well, we're able to provide bare metal, no virtualization, and nothing getting in the way between the customer and what they're trying to do with the infrastructure.”

Beyond the GPUs, Pat Lee, head of strategic enterprises at Nvidia, said bringing the full Nvidia AI stack to OCI is also significant. “We see great opportunity for bringing rich software to take advantage of the accelerated computing infrastructure of the GPUs because it's the full stack required to deliver what customers need for AI.”

Regarding Nvidia RAPIDS, Oracle is now offering early access to RAPIDS-driven acceleration for Apache Spark data processing on the OCI Data Flow fully managed Apache Spark service. That service gives customers the ability to manage huge data sets without requiring a huge infrastructure investment. Leung said that using RAPIDS to accelerate Spark processing means that there will be no code changes for developers. “It will just magically get faster for them,” he said.
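For readers wondering how a Spark job can speed up without code changes: the open-source RAPIDS Accelerator for Apache Spark is generally turned on through Spark configuration rather than in the job itself. The sketch below is illustrative only. The script name `etl_job.py` and the helper `spark_submit_args` are hypothetical, the accelerator jar is assumed to be on the classpath already, and how OCI Data Flow actually exposes these settings was not described in the announcement.

```python
# Sketch only: the RAPIDS Accelerator is enabled via Spark configuration,
# so the job script itself is identical in the CPU and GPU cases.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",  # loads the accelerator plugin
    "spark.rapids.sql.enabled": "true",             # lets SQL/DataFrame ops run on GPU
}


def spark_submit_args(script, extra_conf=None):
    """Build a spark-submit argument list for a (hypothetical) job script."""
    args = ["spark-submit"]
    for key, value in sorted((extra_conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(script)
    return args


cpu_args = spark_submit_args("etl_job.py")               # plain CPU run
gpu_args = spark_submit_args("etl_job.py", RAPIDS_CONF)  # same script, GPU-accelerated
print(" ".join(gpu_args))
```

The only difference between the two invocations is the pair of `--conf` settings; the job script is untouched, which is the sense in which developers would see no code changes.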

In addition, he said, incorporating the Clara framework means that OCI will be able to deliver faster imaging and scanning results in medical use cases, and accelerate work with genomics to help increase the speed of new drug discovery.