ISC: Growing HPC/AI convergence has Intel adjusting its gameplan

Intel provided updates at the International Supercomputing Conference in Hamburg, Germany this week, discussing how it is adjusting its technology roadmap to respond to an increasingly dynamic market for high-performance computing processors and the rapidly converging worlds of HPC and AI.

“We're seeing a convergence between HPC and AI, and they benefit from each other,” said Jeff McVeigh, Intel corporate vice president and general manager of the Super Compute Group. “But the question is, how are these systems being designed to date? If we look at the top 500 [HPC] system rankings [from November 2022] we see about two-thirds of those systems being CPU-only, the others being accelerated with GPUs and a little bit less than half of those have less than four GPUs per node. The [rest of those] have greater than four GPUs per node. There's a diversity of architectures used for HPC systems.”

He continued, “Conversely, large-scale AI systems [are] much more heavily weighted to greater than four GPUs per node–typically eight is what we see… and we see CPU-only systems, but a much smaller slice than we did in the HPC world. So, as these two worlds collide, what do we expect the future to look like? …There will be diversity, and there also needs to be flexibility.”

Further clarifying Intel’s strategy, McVeigh said the rapidly changing market environment and growing diversity of workloads have influenced Intel to hold off a bit on its vision for integrating CPUs and GPUs into an “XPU” architecture, a concept the company has discussed since at least 2019.

McVeigh compared that grand integration scheme to attempting to summit a mountain. “It's important to think about the idea that there are multiple routes [to the] summit [the CPU-only route, GPU, etc.] The other thing is you want to go for the summit when the right window is available. If the weather is turning bad, you’re not feeling right, you don't push to the summit, just because it’s there. You push for it when you're ready, when the ecosystem’s ready, when the climate is ready.”

He went on to describe his own previous championing of the integrated XPU vision as “premature,” adding that the current market climate is “a much more dynamic market than we thought even just a year ago,” given the new types of commercial and scientific workloads emerging with the sudden push into generative AI, large language models, and other factors.

“When the workloads are fixed and when you have really good clarity on them, that they're not going to be changing dramatically, integration is great… It helps with cost and to drive down power,” McVeigh said. “But… we just feel like our rails are reckoning with where the market is today, and that it's not time to integrate.”

And all of that philosophizing goes a long way toward explaining why Intel now plans to make its Falcon Shores processor not an XPU but a next-generation GPU targeting both HPC and evolving AI applications.

Meanwhile, Intel also provided updates on its CPU products. The fourth-generation Xeon processor, Sapphire Rapids, which launched earlier this year after several delays, now has more than 400 design wins, McVeigh said. The fifth-generation Xeon, called Emerald Rapids, will be available later this year, and the next generation, Granite Rapids, is scheduled for 2024. (Intel also touted performance achievements for its CPUs and GPUs in comparison to competitors’ products, claims you can read about in Intel’s ISC press release.)

Additionally, Intel provided an update on its work with Argonne National Laboratory on the Aurora supercomputer, which had reportedly been held up by the Sapphire Rapids delays. This week, Intel said it has completed the physical delivery of more than 10,000 blades for the Aurora system, which is being built using HPE Cray EX supercomputers and eventually will comprise 63,744 GPUs, 21,248 CPUs, and 1,024 DAOS storage nodes, all connected by the HPE Slingshot high-performance Ethernet network. Aurora is expected to deliver more than 2 exaflops of peak double-precision compute performance when it launches this year, Intel said.

“Early results show leading performance on real-world science and engineering workloads, with up to 2x performance over AMD MI250 GPUs, 20% improvement over [Nvidia] H100 on the QMCPACK quantum mechanical application, and near linear scaling up to hundreds of nodes,” Intel stated in its press release.