AMD, Intel, others propose UALink to connect, scale AI chips

AI accelerator chips are expected to flood data centers in the years to come, and they will be most powerful and useful when linked together for greater scale and computing power. That's why Nvidia offers NVLink, which enables GPU interconnectivity within data centers. But what if you don't use Nvidia chips?

A throng of big-name companies, namely AMD, Broadcom, Cisco, Google, Hewlett Packard Enterprise (HPE), Intel, Meta, and Microsoft, is working on the answer to that question. These firms have agreed to create a new open method for supporting high-speed, low-latency links between AI accelerators: an Ethernet-based potential industry standard called Ultra Accelerator Link (UALink). As the group put it in a statement, it is working on "a specification to define a high-speed, low-latency interconnect for scale-up communications between accelerators and switches in AI computing pods."

Jack Gold, founder and principal analyst of J. Gold Associates, observed via email, “AI is VERY bandwidth intensive, especially with memory subsystems. So anything that can speed up the interconnect bandwidth helps you process more AI packets and accelerate the overall workload.”

This positions UALink as a challenger to Nvidia's NVLink, and to some degree to InfiniBand, which links servers and storage devices and aspects of which Nvidia leveraged in NVLink after acquiring InfiniBand player Mellanox. UALink will also be part of a universe of intra-data-center connectivity options that includes technologies like the PCI Express (PCIe) serial expansion bus standard and the Compute Express Link (CXL) protocol, viewed mostly as a method for pooling memory and connecting CPUs.

“The bottom line for all of this is really Proprietary (Nvidia) vs. Industry Standard (UA Link),” Gold said. “Most of the companies out there building infrastructure don’t want to go NVLink because Nvidia controls that tech.” Gold also described NVLink as “expensive tech” that “requires a fair amount of power.”

The UALink 1.0 specification, which its promoters expect to be available sometime in the third quarter of this year, will enable the connection of up to 1,024 accelerators within an AI computing pod and allow for direct loads and stores between the memory attached to accelerators, such as GPUs, in the same pod.

Forrest Norrod, executive vice president and general manager of AMD’s Data Center Solutions Group, said UALink would most likely be used to connect AI accelerators of the same make within a single server environment, rather than to link accelerators of different makes in a “heterogeneous” way or to span an entire data center or multiple data centers.

“We would not anticipate this being used across a data center or certainly across the oceans,” he said. “It’s for a tightly optimized domain… a relatively local domain… but also not necessarily just one rack.”

Norrod added, “UA link is very tightly defined to be extremely efficient, both in terms of communications and also the silicon area and to be a power-efficient mechanism for interconnecting accelerators. So it doesn't have a lot of the features and functionality that you would find in CXL or PCI Express. It is tightly targeting this particular use case, as we think it allows for much more efficient scaling of accelerators.”

The companies that came together to announce UALink call themselves the UALink Promoter Group. They have also formed the UALink Consortium, which will make the UALink specification available to current and future consortium members.

They announced UALink in a video call featuring a group of officials from different firms that one is otherwise unlikely to see assembled on the same screen. But something that at least AMD and Intel have in common is that they are living in the shadow of Nvidia’s AI market dominance, at least for now. Still, the UALink Consortium does not plan to exclude Nvidia from its ranks. Asked if the group would pick up the phone if Nvidia came calling, Norrod answered, “Of course. An open standard is just that: it’s open for other folks who want to join. All members of this promoter group have been pretty clear for some time about supporting open standards here, and I think there are active conversations across the ecosystem about that, and we just leave it at that.”

Gold concluded, “The standards body [the UALink Consortium] is hoping to build on top of industry standard Ethernet that will make the tech less expensive, and since it’s an open technology it will be able to have multiple suppliers (also a cost/power advantage in that there is competition). And they think they can compete at the speed of NVLink or even beyond eventually (especially as they move to optical interconnect). I expect that at some point in the next few years, NVLink will fade and the primary interconnect will move to UALink or its successors as system makers adopt an open industry standard.”