What does Microsoft's AI chip news mean for Nvidia, others?

This week’s Microsoft Ignite event started as a showcase of sorts for all the resources, from processor hardware to software, that Nvidia is bringing to bear for Microsoft as the software giant ramps up its investment in AI. But then it turned into a launch event for Microsoft’s own custom-built AI chips.

The juxtaposition of this week’s announcements could have proven awkward, though for now it remains unclear how directly the custom AI chip movement driven by hyperscalers like Microsoft and Google will affect traditional chip designers and manufacturers.

As for Microsoft’s big announcement, the company unveiled two custom-designed chips. The Microsoft Azure Maia AI Accelerator is aimed specifically at generative AI and other AI workloads, while the Azure Cobalt CPU is optimized for general-purpose computing workloads on the Microsoft Cloud. The unveilings came as no surprise and, if anything, were overdue.

“Microsoft’s long awaited AI accelerator is finally here,” wrote Dylan Patel, chief analyst at SemiAnalysis. “They are the last of the big 4 US hyperscalers (Amazon, Google, Meta, Microsoft) to unveil their product.”

These announcements came on the heels of Nvidia and Microsoft jointly announcing that Nvidia will provide resources through Azure, including its AI foundry service for creating and managing customized large language models and its DGX Cloud AI training-as-a-service supercomputing platform. Microsoft also said it is deploying Nvidia’s H100 GPU platform in the form of new virtual machine instances to boost Azure’s AI performance, and that it plans to deploy Nvidia’s recently announced H200 Tensor Core GPU platform with HBM3e memory.

As an Nvidia blog detailing the announcements put it:

“Microsoft announced its new NC H100 v5 VM series for Azure, the industry’s first cloud instances featuring NVIDIA H100 NVL GPUs. This offering brings together a pair of PCIe-based H100 GPUs connected via NVIDIA NVLink, with nearly 4 petaflops of AI compute and 188GB of faster HBM3 memory. The NVIDIA H100 NVL GPU can deliver up to 12x higher performance on GPT-3 175B over the previous generation and is ideal for inference and mainstream training workloads.

“Additionally, Microsoft announced plans to add the NVIDIA H200 Tensor Core GPU to its Azure fleet next year to support larger model inferencing with no increase in latency. This new offering is purpose-built to accelerate the largest AI workloads, including LLMs and generative AI models. The H200 GPU brings dramatic increases both in memory capacity and bandwidth using the latest-generation HBM3e memory. Compared to the H100, this new GPU will offer 141GB of HBM3e memory (1.8x more) and 4.8 TB/s of peak memory bandwidth (a 1.4x increase).”
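As a quick sanity check on those multipliers, here is a minimal back-of-the-envelope sketch in Python. It assumes the commonly cited H100 SXM baseline of 80GB of HBM3 and roughly 3.35 TB/s of peak memory bandwidth; those baseline figures do not appear in the quote above, so treat them as assumptions to verify against Nvidia’s published specs:

```python
# Back-of-the-envelope check of the H200-vs-H100 multipliers quoted above.
# H100 baseline figures are ASSUMPTIONS (80 GB HBM3, ~3.35 TB/s, SXM variant);
# the H200 figures come from the Nvidia blog post quoted in this article.

h100_memory_gb = 80        # assumed H100 SXM memory capacity
h100_bandwidth_tbs = 3.35  # assumed H100 SXM peak memory bandwidth

h200_memory_gb = 141       # per the Nvidia blog
h200_bandwidth_tbs = 4.8   # per the Nvidia blog

print(f"Memory ratio:    {h200_memory_gb / h100_memory_gb:.2f}x")
print(f"Bandwidth ratio: {h200_bandwidth_tbs / h100_bandwidth_tbs:.2f}x")
```

Run as written, the ratios come out to roughly 1.76x and 1.43x, consistent with the rounded 1.8x and 1.4x figures in Nvidia’s post.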

General media reaction to the Microsoft news jumped on the notion that Microsoft’s custom chips could compete with Nvidia’s in the future, and that Nvidia’s early AI chip dominance could prove short-lived. But such reactions may misconstrue what the hyperscalers are up to with their custom chip efforts, according to Jack Gold, president and principal analyst at J. Gold Associates.

“This is really more about Microsoft optimizing AI chips for their own environment and needs (just like AWS and GCP [Google Cloud Platform] do),” Gold said in an email to Fierce Electronics. “It’s really about controlling their own destiny, and optimizing what they see as the most important aspects for their needs. By doing their own thing, they have the ability to add into the hardware special acceleration/control/interface functions that are not available on commodity chips from Nvidia, Intel, etc. It also gives them a price-competitive way to offer services on Azure that they can use for a wider variety of use cases (Nvidia chips are mostly used at the very high end for training and are costly to utilize by customers, but lots of activity in lower end training and inference are also required, and need a more reasonable cost of operation). It could even enable them to offer edge services more effectively.”

Patel’s analysis noted that Microsoft may also use AI chips from AMD in addition to Nvidia’s processors. More broadly, the hyperscalers’ custom chip plans have also raised concerns about Intel.

Gold added, “I don’t see this as a huge impact on Nvidia or Intel. They both have their place in the ecosystem and all the hyperscalers will continue to offer their devices for use where it fits best. This just gives Microsoft another option for offering its customers AI at a certain price/performance. If I equate it to cars, it’s a little like Toyota also having Lexus for high end, Honda having Acura, etc.”