Intel details GPU Flex Series, next-gen Xeon accelerators

Following its chiplets announcement and CEO Pat Gelsinger’s industry pep talk at the Hot Chips event this week, Intel had more to say about boosting GPU performance and flexibility, and about how it has primed its Xeon CPUs with accelerator technology to address AI workload demands in data centers.

First up, Intel unveiled details of its Data Center GPU Flex Series, code-named Arctic Sound-M when it was first discussed in early 2022.

“Flex” in this case refers to a single GPU’s capability to handle a growing variety of data center workloads–media delivery, cloud gaming, AI, metaverse and other emerging visual cloud use cases–at optimum performance, instead of forcing customers to relegate some of those workloads to more siloed, discrete solutions.

“We are in the midst of a pixel explosion driven by more consumers, more applications and higher resolutions,” said Jeff McVeigh, Intel vice president and general manager of the Super Compute Group. “Today's data center infrastructure is under intense pressure to compute, encode, decode, move, store and display visual information.”

He described the Flex Series GPU, based on Intel’s Xe-HPG architecture, as having “a breakthrough design” with what he called the industry's first hardware-based AV1 encoder in a data center GPU, enabling five times the media transcode throughput and two times the decode throughput at half the power of competitive solutions, along with a 30% bandwidth improvement and lower total cost of ownership. It all leads to more flexible scaling of AI inference workloads–from media analytics to smart cities to medical imaging–between CPUs and GPUs without locking developers into proprietary software, he said.

All of this arrives in a world with an insatiable demand for streamed gaming, video and audio content. To that end, Intel said a single Flex Series 170 GPU can achieve up to 68 streams of 720p30, while a single Flex Series 140 GPU can achieve up to 46 streams of 720p30 (performance measured on select game titles, the company said). Scaled to six Flex Series 140 GPU cards, a system can achieve up to 216 streams of 720p30.

The Flex Series relies on a software stack enabled by the oneAPI unified programming standard supporting accelerated computing. With oneAPI, developers can avoid programming language “lock-in,” and create open, portable code that will take maximum advantage of various combinations across Intel CPUs and GPUs, the company said.

CPU Acceleration for AI

The Flex Series GPU announcement came the same day that Intel hosted a “Chalk Talk” discussion about its various computing acceleration efforts in play as media, AI and more reshape the data center and require ever-greater performance from CPUs as well.

Sailesh Kottapalli, chief datacenter CPU architect at Intel, said during the session that the Xeon server processor continues to be a general-purpose computing workhorse for the data center, but that the company is looking at AI trends to influence how it designs accelerators to augment performance.

“The data center landscape continues to evolve across different kinds of workloads, usage models, and deployment models,” he said. “The one constant that we see is the ever increasing need for general purpose computing that is delivered with the highest level of efficiency and sustainability.”

He said that notion guided Intel as it developed Sapphire Rapids, the fourth-generation Xeon processor. “In Sapphire Rapids, we architected the processor to make sure that it can deliver significant performance out of the box,” he said, adding that Intel also wanted to make sure “that we look at some of these growing and emerging usages, and identify the predominant functions that are at the center of all of those, and build architecture capabilities that actually provide multifold improvements. We did that using both instruction set architecture acceleration capabilities, as well as dedicated engines.”

Kottapalli added that Intel took a software-first approach to make sure that, as it developed acceleration techniques, they would be easy to integrate into existing infrastructure. Among the new accelerators are Advanced Matrix Extensions (AMX) and a Data Streaming Accelerator (DSA). The former “massively speeds up some of the tensor operations which are at the heart of most of the deep learning solutions,” he said, while the latter “helps offload and speed up a lot of the common data movement operations that you see in the data center, across CPU caches and memory as well as from the CPU to IO, including storage and networking. And what we see is essentially a multifold speed up in some of these [data movement] functions, which actually helped the overall workload performance.”
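For readers unfamiliar with what a matrix accelerator like AMX actually speeds up: deep learning inference is dominated by matrix multiplies, which hardware engines execute on small tiles of the operands rather than one element at a time. The sketch below, in plain Python for readability only (it is not Intel's implementation, and the tiny tile size is chosen for the demo), shows the tiled access pattern; on AMX-equipped Xeons the inner tile product is performed by dedicated tile-multiply instructions instead of scalar loops.

```python
# Illustrative sketch of the tiled matrix-multiply pattern that matrix
# accelerators such as Intel AMX speed up in hardware. Plain Python for
# clarity only: a real engine would compute each inner tile product
# (e.g., low-precision inputs accumulated into wider results) in a
# single hardware operation rather than with scalar loops.

TILE = 2  # toy tile size for readability; real hardware tiles are far larger

def tiled_matmul(a, b):
    """Multiply matrices a (m x k) and b (k x n) tile by tile,
    accumulating partial products into the result, as a matrix
    engine would do one tile product at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, TILE):              # walk over output tiles
        for j0 in range(0, n, TILE):
            for k0 in range(0, k, TILE):      # accumulate tile products
                # Inner tile product: the part hardware replaces with
                # a dedicated tile-multiply instruction.
                for i in range(i0, min(i0 + TILE, m)):
                    for j in range(j0, min(j0 + TILE, n)):
                        for kk in range(k0, min(k0 + TILE, k)):
                            c[i][j] += a[i][kk] * b[kk][j]
    return c

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(tiled_matmul(a, b))  # [[19, 22], [43, 50]]
```

The point of the tiling is locality: each tile of the operands is reused many times while it sits in fast storage, which is why offloading the tile product to a dedicated engine yields the “multifold” speedups Kottapalli describes.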

Unfortunately, prospective Sapphire Rapids adopters will have to wait a while longer to put these acceleration techniques into play: the new processor, originally scheduled to be available in 2021 and later delayed to this year, reportedly has now slipped to next year.