AI

Marvell addresses AI's intra-data center connectivity challenges

AI-driven data center modernization is affecting a wide range of areas, including connectivity: not just connectivity between data centers, but also connectivity inside them, where clusters of processors such as GPUs and AI accelerators are being combined into more powerful AI computing engines.

Inside data centers, connectivity is needed both to carry AI workloads between AI servers and to support the computing fabric inside those servers, where the processor clusters live. Here, the connections between servers are short, 5 meters or less, but the need for higher-speed connectivity is growing. To meet the increasing demands of AI, connectivity links are migrating toward 100 Gbps per lane and eventually will move to serial 200 Gbps. The need for more of these high-speed connections will play out over the next couple of years as data centers require more processor clusters to tackle more AI model training and inference.

However, the passive direct-attach copper cables (DACs) often used in data centers, which can carry 50 Gbps links over 5 meters, can support the much higher 200G-per-lane capacity for only a meter or so. More active electrical cables (AECs), which offer longer reach over thinner, more flexible cables, are needed, along with the digital signal processors (DSPs) that enable the communications between AI elements.

Marvell Technology, via its electro-optical products division, is tackling the challenge with its new Marvell Alaska A 1.6T PAM4 DSP for AECs, which is designed specifically to address emerging 200 Gbps/lane accelerated infrastructure architectures supporting AI and machine learning applications. The 1.6 Tbps capacity translates to support for eight 200G SerDes lanes in an AEC cable, with I/O interfaces on AI accelerators, GPUs, NICs and switches, and connections at distances of greater than three meters. That allows not only maximum connectivity, and thus maximum computing power, in AI clusters, but also more flexibility in how clusters and, more broadly, data centers are designed.
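For readers keeping score, the headline 1.6 Tbps figure is simply the per-lane rate multiplied by the number of SerDes lanes in the cable. Here is a back-of-the-envelope sketch of that arithmetic in Python; the function name is purely illustrative and not drawn from any Marvell documentation, and the figures are those cited in this article.

```python
# Illustrative lane math for copper cable links (figures from the article).

def aggregate_bandwidth_gbps(lanes: int, gbps_per_lane: int) -> int:
    """Aggregate cable bandwidth = number of SerDes lanes x per-lane rate."""
    return lanes * gbps_per_lane

print(aggregate_bandwidth_gbps(8, 50))    # 400 Gbps: today's common passive-DAC rate
print(aggregate_bandwidth_gbps(8, 100))   # 800 Gbps: the 100 Gbps/lane generation
print(aggregate_bandwidth_gbps(8, 200))   # 1600 Gbps, i.e. 1.6 Tbps: the generation the Alaska A targets
```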

“The higher compute performance coming from the Nvidias and AMDs of the world, and all of the custom compute silicon as well, is driving the need for higher bandwidth connectivity inside data centers because these compute elements are all interconnected and clustered together to process model training and enable generative AI outputs,” said Nigel Alvares, vice president of global marketing at Marvell Technology. “As the interconnects become faster, they need to start to move from passive interconnections to active interconnects… and as these compute clusters are getting larger and larger, there are more and more elements that you have to connect as well.”

The recent “AEC Quarterly Market and Long-Term Forecast Report” from research firm 650 Group suggested that the AEC silicon market for devices like the new Marvell Alaska DSP is expected to grow at 64% per year to reach $1 billion by 2028, with shipments of the DSPs that power AECs reaching nearly 40 million units per year. “AI is driving the need for short-reach copper connectivity at 1.6T,” said Alan Weckel, 650 Group co-founder.

Marvell’s new DSP is looking to catch that wave of growth soon, as it will begin sampling to partners and customers next month, Alvares said. Marvell partners like TE Connectivity, Molex, and Amphenol already appear to be on board, as Vishwas Rao, vice president for product management at TE Connectivity, noted, “200G/lane will be the speed of choice for next generation AI data centers, with copper cables playing a critical role for short reach intra-rack connectivity. The combination of the Marvell Alaska A 1.6T PAM4 AEC DSP with TE's advanced cabling solutions will deliver the bandwidth needed for next-generation, intra-rack AI copper interconnect.”

Alvares said the migration to higher-speed links (1.6 Tbps, or 200 Gbps per lane) over AEC cables fits in with the broader movement toward the modernization of data centers. “Every hyperscaler has their own unique architectural and design requirements,” he said, and as they rethink their data centers to determine how to add more AI processors, clusters, and servers to handle growing AI model demands, they will also need connectivity options that give them greater flexibility and more bandwidth so they can keep ramping to more compute power and efficiency.