Meta adds more muscle from Nvidia GPUs, Azure cloud to target AI

Facebook parent Meta, which earlier this year unveiled a project to build a massive AI Research SuperCluster stuffed with thousands of Nvidia GPUs, this week announced it is expanding its use of Nvidia GPUs with an additional cloud-based supercomputing cluster on Microsoft Azure.

“Meta is expanding their consumption of our technology,” an Nvidia spokeswoman said in an email to Fierce Electronics. “In addition to our announcement earlier this year that they’re building one of the world’s largest DGX SuperPOD clusters for AI research, they’re now creating a dedicated supercomputer cluster in the cloud, specifically on MSFT Azure.”

The move, outlined in an announcement from Eric Boyd, corporate vice president of Azure AI for Microsoft, involves building for Meta AI “a dedicated Azure cluster of 5,400 GPUs using the latest virtual machine (VM) series in Azure (NDm A100 v4 series, featuring NVIDIA A100 Tensor Core 80GB GPUs) for some of their [Meta’s] large-scale AI research workloads.”

Meta started using Azure VMs last year, employing Nvidia A100 80GB GPUs for large-scale AI research. Azure claims to offer Meta “four times the GPU-to-GPU bandwidth between VMs compared to other public cloud offerings,” enabling faster AI training.

The announcement added, “Meta used this, for example, to train their recent OPT-175B language model. The NDm A100 v4 VM series on Azure also gives customers the flexibility to configure clusters of any size automatically and dynamically from a few GPUs to thousands, and the ability to pause and resume during experimentation. Now, the Meta AI team is expanding their usage and bringing more cutting-edge ML [machine learning] training workloads to Azure to help further advance their leading AI research.”

Microsoft Azure is also building new development accelerators for the PyTorch machine learning framework to help developers move more quickly from experimentation to production.

This week’s announcement comes after the January launch of Meta’s effort to build its AI Research SuperCluster, a project involving Nvidia, Pure Storage and Penguin Computing. That supercluster, which Meta claimed will be the fastest AI supercomputer in the world when it is finished later this year, uses 760 Nvidia DGX A100 systems as its compute nodes, for a total of 6,080 GPUs, with plans to eventually expand to 16,000 GPUs.
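The headline numbers are easy to sanity-check: each Nvidia DGX A100 system contains eight A100 GPUs, so 760 systems yield exactly the 6,080 GPUs Meta cites. A quick back-of-envelope sketch (the eight-GPU-per-system count is standard for DGX A100; the aggregate memory total is our own arithmetic, not a figure from Meta or Microsoft):

```python
# Back-of-envelope check of the GPU figures cited in this article.

DGX_A100_GPUS_PER_SYSTEM = 8  # each Nvidia DGX A100 packs eight A100 GPUs

# Meta's AI Research SuperCluster: 760 DGX A100 compute nodes
rsc_gpus = 760 * DGX_A100_GPUS_PER_SYSTEM
print(rsc_gpus)  # 6080, matching the total Meta cites

# Azure cluster: 5,400 A100 80GB GPUs -> aggregate GPU memory in terabytes
azure_gpu_memory_tb = 5_400 * 80 / 1_000  # 80 GB of HBM per A100
print(azure_gpu_memory_tb)  # 432.0 TB of combined GPU memory
```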

A Meta blog post stated at that time, “We expect such a step function change in compute capability to enable us not only to create more accurate AI models for our existing services, but also to enable completely new user experiences, especially in the metaverse.”

RELATED: Meta, with metaverse in mind, unveils AI Research SuperCluster