AI

AMD GPUs join Nvidia's in Oracle Cloud Infrastructure

Oracle is snapping up new GPUs as fast as its chip giant partners can produce them. In the company’s latest move, it is putting thousands of AMD Instinct MI300X GPUs into Oracle Cloud Infrastructure (OCI) Compute bare metal instances.

OCI Supercluster with AMD Instinct MI300X accelerators offers a high-throughput, ultra-low-latency RDMA cluster network architecture for up to 16,384 MI300X GPUs, according to statements from Oracle and AMD. With 192GB of memory per accelerator, the AMD Instinct MI300X can run a 66-billion-parameter Hugging Face OPT transformer large language model (LLM) on a single GPU, the companies claimed.
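The single-GPU claim is plausible on a back-of-envelope basis: 66 billion parameters at two bytes each fit comfortably within 192GB. A minimal sketch of that arithmetic follows; FP16 weights are an assumption here, and real deployments also need memory for the KV cache and activations, which this ignores.

```python
# Rough check of the 66B-parameter single-GPU claim (FP16 is an assumption;
# KV cache and activations add further, batch-dependent overhead).
params = 66e9          # 66-billion-parameter OPT model
bytes_per_param = 2    # FP16 weights
weights_gb = params * bytes_per_param / 1e9   # weight footprint in GB
mi300x_hbm_gb = 192    # memory capacity per MI300X accelerator

print(f"weights: {weights_gb:.0f} GB of {mi300x_hbm_gb} GB")  # weights: 132 GB of 192 GB
print(f"headroom: {mi300x_hbm_gb - weights_gb:.0f} GB")       # headroom: 60 GB
```

At lower precisions (e.g. 8-bit quantization) the footprint roughly halves, leaving even more room for inference state.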

The deployment news follows extensive performance testing of the MI300X that Oracle said it conducted earlier this year. Back in May, Microsoft became the first cloud provider to deploy the new AMD GPUs, and at the time AMD told Fierce Electronics that Oracle was among the other companies “gearing up” to deploy the MI300X.

“The inference capabilities of AMD Instinct MI300X accelerators add to OCI’s extensive selection of high-performance bare metal instances to remove the overhead of virtualized compute commonly used for AI infrastructure,” said Donald Lu, senior vice president, software development, Oracle Cloud Infrastructure. “We are excited to offer more choice for customers seeking to accelerate AI workloads at a competitive price point.”

The announcement comes just a couple of weeks after Oracle announced availability of its "first zettascale" OCI Supercluster, powered by Nvidia's Blackwell GPUs. That cluster can scale up to 131,072 Blackwell GPUs, using Nvidia's ConnectX-7 NICs for RoCEv2 or Quantum-2 InfiniBand networking, and supports 2.4 zettaflops for AI computing. OCI has used earlier models of both Nvidia GPUs and AMD CPUs, and Oracle Founder Larry Ellison has suggested over the past year or so that his company is increasing its spending with both suppliers.
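Dividing the headline figures gives a rough sense of the per-GPU number behind the zettascale claim. A quick hedged check; the assumption here is that 2.4 zettaflops is a peak low-precision AI figure rather than FP64:

```python
# Sanity check of the zettascale headline numbers (assumption: 2.4
# zettaflops is a peak low-precision AI-compute figure, not FP64).
total_flops = 2.4e21   # 2.4 zettaflops across the full cluster
gpu_count = 131_072    # maximum Blackwell GPUs per Supercluster

per_gpu_pflops = total_flops / gpu_count / 1e15
print(f"{per_gpu_pflops:.1f} PFLOPS per GPU")  # 18.3 PFLOPS per GPU
```

Roughly 18 petaflops per GPU is in the range Nvidia quotes for Blackwell's low-precision peak throughput, which is consistent with reading the 2.4-zettaflop figure as an AI-compute maximum.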