This story has been updated since its original publication on March 17 with comments from analyst Dylan Patel.
A question likely to come up for CEO Jensen Huang at Nvidia’s global GTC AI conference starting Monday is how well his company is keeping up with demand for semiconductors, mainly GPUs, for the training and inference functions used in a cascade of applications, especially to sate the seemingly endless appetite for chatbots like OpenAI’s GPT-4-powered ChatGPT.
While GTC is explicitly not a conference about Nvidia financials, even the average genius garage developer wants to know if the chips are going to be there in sufficient numbers to justify the time, sweat and dollars invested in developing new applications. (And this includes little developers who contract with huge companies that can afford to buy lots of expensive hardware.)
Gamers are also justifiably concerned about a possible graphics card shortage if a prolonged GPU shortage materializes, after being burned by the high demand for crypto-mining some time ago. But any possible shortage appears, so far, to be related primarily to high-end GPUs, the kind a massive cloud outfit would use. What’s happening appears to be more than simply a shortage scare, justified or not. Some analysts are more declarative, arguing the shortage is already here, big time.
"There is a huge supply shortage of Nvidia GPUs and networking equipment from Broadcom and Nvidia due to a massive spike in demand," wrote SemiAnalysis analyst Dylan Patel to Fierce Electronics. "While Nvidia and Broadcom are ramping up production quickly, there is still a big gap."
To dig into what’s happening, here’s some insight, meager as it may seem:
Earlier in 2023, financial analyst Timothy Arcuri at UBS wrote that 10,000 Nvidia GPUs were used for the training function of GPT-3, leaving it unclear whether double that many GPUs, or more, would be required for the inference functions, which are clearly more advanced in the recently announced GPT-4, just to take one example.
Huang has talked to various media, including CNBC, about how fortunate he and Nvidia engineers were in prepping for AI more than a decade ago, although he was wise to tell Katie Tarasov at CNBC that “some serendipity” has been involved. (Has there ever been a more refreshing CEO in tech?)
In a funny kind of apparent response to Huang’s modest brag about seeing the future, Microsoft recently published a blog post, “How Microsoft’s bet on Azure unlocked an AI revolution.” Well, really, what matters more? Having a massive global cloud operation and investing billions in OpenAI, or making the chip-brains at the heart of it all? (Answer: It’s obvious that it all matters.)
The same day, Microsoft Azure touted scalable virtual machines to speed up generative AI, noting it is using Nvidia H100 Tensor Core GPUs, along with Nvidia Quantum-2 InfiniBand networking, in its ND H100 v5 VM series.
Some of these GPUs can run…wait for it…$10,000 apiece, so the chatbot craze is like striking gold, mainly for Nvidia, although AMD is making a show in commercial GPUs as well. (Intel, Google, Cerebras and, soon, company X-chip married to your cousin will be in it as well.) With questions over possible supply constraints for the hottest GPUs, prices could soar, putting thousands of startup AI companies and even big cloud providers in the same shoes worn by US automakers in 2021, when they had to park finished pickup trucks in empty lots for want of a vital chip.
Some of these worries about supplies of GPUs and related hardware are out in the open, while others are not. There’s a lot of disagreement over whether a shortage is coming, and some analysts believe one is already here. As stated above, Dylan Patel has already declared a "huge supply shortage of Nvidia GPUs" as well as of networking gear from Broadcom and Nvidia due to "a massive spike in demand."
In a recent tweet, Patel pointed out that Microsoft and OpenAI are “consuming a lot of GPUs for inference.” He noted the two companies are “so far ahead on deploying AI to consumers and businesses,” giving them a first-mover advantage.
Patel suggested Fierce Electronics ask whether it could take a year or more for Nvidia to catch up to demand for H100 GPUs. Nvidia has not commented, but the explosion of interest in AI apps like ChatGPT points to strong demand, probably above what anybody expected three months ago, and three months is a short time frame for a vast chip supply chain to start chugging to full production.
Meanwhile, Google has been developing in-house silicon since about 2013, while Nvidia built AI into its GPUs in 2017 and is now in its third generation, having followed A100 with H100. In fact, Google had a more scalable version of NVLink in 2021 with TPU v4 than Nvidia has in 2023 with H100, in Patel’s opinion.
“All of this doesn’t matter at all of course, because Google didn’t commercialize it properly,” Patel said in another tweet. Google’s product is “harder to program, harder to use, harder to get good utilization and harder to get access to. Great for internal workloads, but external they dropped the ball.”
Back to Nvidia: the big question for the market is whether any temporary shortage of H100 or other GPUs could provide a significant opening for AMD to grow in commercial GPUs, let alone for other contenders. The answer depends on the duration of any shortage, no doubt. Patel is equally worried about networking gear, not just GPUs.
"Microsoft cannot deploy their new Office Copilot with a year's worth of GPUs, or rather, they would compromise heavily on model size and quality," Patel said.
With a "big gap" in meeting a massive spike in demand, Patel said some firms are turning to AMD GPUs and the Cerebras WSE to make up for a shortage of Nvidia hardware. Cerebras first introduced its WSE-2 in 2021 and calls it the fastest AI processor on Earth, although analysts are well aware there are many elements involved in training and inference work.
“With a short supply in many products, it’s definitely an opening for others to make market inroads, assuming they can offer at least competitive products. We’ve already seen some price wars breaking out in GPUs for PCs and Intel has priced pretty aggressively particularly at the lower end,” said Jack Gold, an analyst at J. Gold Associates.
Gold added: “It’s highly likely that the hyperscalers will offer Intel based high end GPUs to their clients, and how that service is priced may determine how well Intel high end GPUs compete with Nvidia for AI and other HPC tasks.
“High end Nvidia AI-based products still need a CPU to work with, and Nvidia has a working relationship with Intel, especially with their new Xeon Scalable processors, which are powering the CPU functions in the Nvidia high end H100 series.”
Time will tell on the question of supplies and pricing for GPUs and related hardware, although one thing is sure: demand for generative AI is upon us.