
CXL gains Big Mo as memory chokes on AI workloads

The emergence of the Compute Express Link (CXL) interconnect predates the recent artificial intelligence (AI) boom, but it may be AI that accelerates its adoption in the data center.

Introduced in 2019, CXL has already gone through three major generations, which means those wishing to leverage it are drinking from a fire hose of features. Widely embraced by industry, CXL is a cache-coherent interconnect for processors, memory expansion and accelerators that enables resource sharing, particularly of memory.

Like the more mature Non-Volatile Memory Express (NVMe) protocol, CXL uses PCI Express (PCIe) as its foundation with a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or the alternative CXL transaction protocols.
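The negotiation itself happens in hardware during link training, but its outcome is visible to software: a CXL device advertises itself through a PCIe Designated Vendor-Specific Extended Capability (DVSEC) carrying the CXL Consortium’s vendor ID. Below is a minimal sketch, assuming a Linux host and a hypothetical device address; reading the full 4 KiB of config space generally requires root.

```c
/* Sketch: walk a PCIe function's extended capability list looking for
 * the CXL DVSEC. Assumes a little-endian host and a hypothetical BDF;
 * error handling is minimal for brevity. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_START   0x100   /* extended capabilities begin here */
#define EXT_CAP_DVSEC   0x0023  /* Designated Vendor-Specific Ext. Cap. */
#define CXL_VENDOR_ID   0x1E98  /* CXL Consortium */

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1]
        : "/sys/bus/pci/devices/0000:3a:00.0/config";  /* hypothetical BDF */
    uint8_t cfg[4096] = {0};

    FILE *f = fopen(path, "rb");
    if (!f) { perror("fopen"); return 1; }
    size_t n = fread(cfg, 1, sizeof cfg, f);  /* non-root may get 256 B */
    fclose(f);

    for (uint32_t off = EXT_CAP_START; off && off + 8 <= n; ) {
        uint32_t hdr;
        memcpy(&hdr, cfg + off, 4);
        if ((hdr & 0xFFFF) == EXT_CAP_DVSEC) {
            uint32_t dvsec1;
            memcpy(&dvsec1, cfg + off + 4, 4);   /* DVSEC header 1 */
            if ((dvsec1 & 0xFFFF) == CXL_VENDOR_ID)
                printf("CXL DVSEC at offset 0x%03x\n", off);
        }
        off = hdr >> 20;  /* bits 31:20 hold the next capability offset */
    }
    return 0;
}
```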

CXL has three sub-protocols: CXL.io, which carries I/O instructions; CXL.cache; and CXL.memory. In CXL 1.0, memory can be directly attached to a single host, while 2.0 added the ability to pool memory among multiple processors, allowing for the use of storage-class or persistent memory, or tiers of memory with different performance and cost characteristics.
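In practice, a Linux kernel typically exposes directly attached CXL memory as a CPU-less NUMA node, so tier placement can be expressed with ordinary NUMA APIs. Here is a minimal sketch, assuming the expander has been onlined as node 1 (a hypothetical topology) and linking against libnuma (-lnuma):

```c
/* Sketch: place a buffer on a CXL-backed NUMA node. Node 1 is an
 * assumption -- check your topology with `numactl --hardware`. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA support not available\n");
        return 1;
    }

    const int cxl_node = 1;          /* hypothetical CXL memory node */
    const size_t len = 1UL << 30;    /* 1 GiB */

    void *buf = numa_alloc_onnode(len, cxl_node);
    if (!buf) { perror("numa_alloc_onnode"); return 1; }

    memset(buf, 0, len);             /* fault the pages in on that node */
    printf("placed 1 GiB on node %d\n", cxl_node);

    numa_free(buf, len);
    return 0;
}
```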

CXL 3.0 enables further disaggregation by adding advanced switching and fabric capabilities, efficient peer-to-peer communication, and fine-grained resource sharing across multiple compute domains. Most recently, 3.2 was released with additional optimizations for CXL memory device monitoring and management, plus extended security through the Trusted Security Protocol (TSP), including expanded integrity and data encryption (IDE) protection.

One reason CXL leapt from 1.0 to 3.2 so quickly is that a bevy of features was envisioned for the interconnect from day one, but those guiding the specification opted to layer on functionality iteratively, knowing that device makers and systems builders needed time to adopt the protocol effectively.

Jim Handy, principal analyst with Objective Analysis, said the approach to building up the CXL specification reflects the reality that it’s hard to get every stakeholder on the same page right away. “What they did was they took small steps.”

He said the end goal for CXL was always to have multiple layers of switches, but CXL 1.0 didn’t have any, and CXL 2.0 had only one. It wasn’t until 3.0 that this vision became reality, and that’s what many people were waiting for, Handy said. “One of the big applications for CXL is called memory pooling and you can't do memory pooling with it without at least a switch, so you need at least CXL 2.0 to do that.”

CXL lacks software

Because it relies on PCIe, CXL is not all that complex to implement from a hardware perspective, Handy said, as there are processors to support it, and the big players such as Intel and AMD are providing them. The missing piece needed for CXL to take off is software. “There isn't really any software that supports it, so this is going to be a big hyperscale data center play in its first few years,” he said.

Broader adoption could take as long as five years, Handy added, with the immediate interest being CXL’s memory pooling capabilities, because they can help access underutilized, “stranded” memory. “The whole idea with memory pooling is that data centers will be able to get by with less memory.”
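The arithmetic behind that claim is simple enough to sketch with invented numbers: if every server must be sized for its own worst-case demand, the fleet buys far more DRAM than it ever uses at once, while a shared pool only needs to cover the aggregate average plus headroom for the few peaks that coincide.

```c
/* Back-of-the-envelope sketch of pooled vs. dedicated DRAM provisioning.
 * All figures are invented for illustration. */
#include <stdio.h>

int main(void)
{
    const int servers  = 16;
    const int peak_gib = 512;   /* worst-case demand per server */
    const int avg_gib  = 256;   /* typical demand per server */

    /* Dedicated: each server sized for its own peak. */
    int dedicated = servers * peak_gib;

    /* Pooled: aggregate average, plus headroom for (say) four
     * simultaneous peaks -- assuming peaks rarely coincide. */
    int pooled = servers * avg_gib + 4 * (peak_gib - avg_gib);

    printf("dedicated: %d GiB, pooled: %d GiB (%.0f%% less)\n",
           dedicated, pooled, 100.0 * (dedicated - pooled) / dedicated);
    return 0;
}
```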

Handy said the conundrum is whether this capability is a threat to memory sales, or whether it will prompt people to buy huge memories. There is also the question of how it fits into the memory/storage hierarchy, in that it may be easier to use more slow, cheap memory at one tier than expensive, fast memory at another.

Compounding that conundrum is that CXL adds latency, which has the effect of slowing down a memory such as DRAM, because a CXL controller must sit between the memory and the processor, Handy said, which also adds cost.
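That penalty is easy to observe with a dependent pointer chase, in which each load must complete before the next can issue, so the average time per load approximates memory latency. A rough sketch, meant to be run twice under numactl, once bound to local DRAM and once to a hypothetical CXL node:

```c
/* Sketch: pointer-chase latency probe. Build a random cycle so the
 * prefetchers can't help, then time N dependent loads.
 * Usage (node numbers are assumptions):
 *   numactl --membind=0 ./chase   # local DRAM
 *   numactl --membind=1 ./chase   # CXL-attached memory
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 26)  /* 64M entries, ~512 MiB: far larger than the caches */

int main(void)
{
    size_t *next = malloc(N * sizeof *next);
    size_t *perm = malloc(N * sizeof *perm);
    if (!next || !perm) return 1;

    /* Fisher-Yates shuffle; rand() is crude but fine for a sketch. */
    for (size_t i = 0; i < N; i++) perm[i] = i;
    srand(42);
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < N; i++)
        next[perm[i]] = perm[(i + 1) % N];   /* one big cycle */
    free(perm);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; i++)
        p = next[p];                         /* serially dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-use latency: %.1f ns (checksum %zu)\n", ns / N, p);

    free(next);
    return 0;
}
```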

Software will solve some of the latency issues in the long run, he said, just as software addressed performance issues with SSDs so that they weren’t held back by hard drive paradigms.

Expansive standard can be overwhelming

Even though CXL is an open standard, achieving interoperability can be challenging because the specification is so expansive, encompassing memory pooling and sharing, symmetric coherency, and multi-level switching, among other capabilities, Gary Ruggles, senior staff manager for the Synopsys solutions group, told Fierce Electronics in an interview.

Synopsys provides a controller, PHY, security modules, and verification IP for CXL 3.x and previous iterations of the specification, aimed at supporting all CXL devices, from accelerators to memory expanders and smart I/O products, built on its existing PCIe IP. Because of the nature of the IP it provides, Synopsys must always be ahead of the curve in testing and validation to ensure interoperability.

Along with Teledyne LeCroy, Synopsys delivered what it said is the world’s first CXL 3.1 multi-vendor interoperability demonstration at SC24, showcasing how a Teledyne LeCroy Summit M616 Protocol Exerciser emulated a CXL host connected to a Synopsys CXL physical layer device (PHY) and controller while communicating over CXL 3.1 without the assistance of an interposer.


Ruggles said the CXL Consortium, which governs the interconnect protocol, is following the path of PCIe with compliance workshops, but has yet to hold one for CXL 3.0, just as the group shepherding PCIe has yet to do so for PCIe 6.0.

He said CXL 2.0 focused on adding switching capabilities, while CXL 3.0 supports fabrics, which means memory attached to any device on the fabric can be shared. “In theory you get close to 100% memory utilization instead of having every device having its own memory.”

The ability to leverage additional resources that are networked together supports the concept of distributed computing, added Ron Lowman, product manager for the Synopsys solutions group, with CXL enabling memory sharing capabilities. 

From an adoption perspective, storage providers have been developing solutions on PCIe 5.0, which aligns with CXL 2.0, but Synopsys customers are choosing CXL 3.0, which already enables distributed computing, Lowman said.

He said CXL provides the capability to reach out across a network to leverage a little more memory, a little more bandwidth, and more resources: not just memory, but the compute resources of a nearby node.

AI workloads are driving CXL adoption, Lowman added. “The thing that's unique about AI workloads is they are always memory constrained. You get these bottlenecks where you've just run out of memory on a single monolithic SoC.”
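A quick calculation with invented numbers shows how such a bottleneck arises; the embedding tables of a large recommendation model, for instance, can dwarf the memory available on a single device:

```c
/* Sketch: embedding-table footprint vs. on-device memory.
 * All figures are invented for illustration. */
#include <stdio.h>

int main(void)
{
    const double rows      = 2e9;  /* embedding rows across all tables */
    const int    dim       = 128;  /* embedding dimension */
    const int    bytes_per = 4;    /* fp32 */

    double tables_gib = rows * dim * bytes_per / (double)(1UL << 30);
    double device_gib = 80.0;      /* memory on one accelerator */

    printf("embedding tables: %.0f GiB vs. %.0f GiB on-device (%.0fx);\n"
           "the overflow has to live somewhere, e.g. CXL-attached memory\n",
           tables_gib, device_gib, tables_gib / device_gib);
    return 0;
}
```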

AI will drive CXL adoption in 2025

The initial excitement around CXL has been somewhat overshadowed by the onset of AI. In an interview with Fierce Electronics, Thad Omura, chief business officer at Astera Labs, said there’s tremendous interest in the industry and that CXL will truly start to ramp up in 2025. “You're going to see large customers deploying the technology,” he said. “A lot of activity continues to happen right now in terms of qualifications and getting the technology ready for mass production and deployment.”

As a controller company, Omura said, Astera is biased in that it sees CXL performance gains as heavily tied to the controller selected, since it affects the reliability, availability, and serviceability of the entire platform for CXL memory. “A lot of what Astera has focused on is to make sure that reliability is there for the CXL attached memory that we are adding into systems,” he said.

Controller technology will heavily influence new CXL configurations, including how much memory density can be added, Omura said, and which configurations deliver the best total cost of ownership. Astera’s Leo CXL Smart Memory Controller supports memory expansion, sharing, and pooling.

Omura said initial applications will involve memory expansion, and that CXL is starting to have a positive impact on AI workloads such as inference for deep learning recommendation models. “You'll start to see some more activity on AI with CXL starting to also emerge either later this year and into next year.”