As Arm continues to enjoy an AI-driven run-up on the stock market, the U.K. company is focusing on improving the aspects of chip architecture that make the AI revolution possible, and that means support not only for AI training, but for AI inference as well.
The company this week announced its two latest Arm Neoverse Compute Subsystems (CSS) built on “brand new third-generation Neoverse IP,” according to Mohamed Awad, senior vice president and general manager for Arm’s Infrastructure Line of Business.
“In infrastructure, we are continuing that transformation to more sophisticated warehouse-scale computing,” Awad said in a briefing prior to this week’s announcement. “It’s no longer about chip or server or rack. It’s about the entire data center. In 2023, we saw that transition accelerate as the world embraced generative AI. And, in 2024 and beyond, we expect massive innovation as AI permeates things like education, employment, manufacturing, healthcare, and transportation.”
Part of that infrastructure transformation is the trend of hyperscale firms designing some of their own chips; with that occurring, Awad said, a pre-built subsystem that tightly integrates hardware and software makes abundant sense, giving customers a head start on those projects.
“They’re redesigning systems from the ground up, starting with custom SoCs,” he said of Arm’s hyperscale customers like AWS and Microsoft. “This works because they know their workloads better than anyone else, which means they can fine-tune every aspect of the system, including the networking, acceleration, and even general-purpose compute, specifically to optimize for efficiency, performance and, ultimately, total cost of ownership.”
This week’s Neoverse IP upgrade should please both the big customers that have been using Arm IP to build their own custom chips and the big enterprise clients who are ready for an AI transition from training to inferencing, said Jack Gold, president and principal analyst at J. Gold Associates.
“Arm sees the server market, and especially the extensions into AI as its next big market area,” Gold said. “It’s been doing server chips (Neoverse) for a couple of years, with so far limited success, but the hyperscalers have been doing some of their own chip designs on the Arm IP, and this newer version is an improvement over older generations.”
As for the market transition from AI training to inference, Gold added, “As edge use of AI takes off, the Neoverse acceleration has a major opportunity. Of course Arm is working with Nvidia on the high end Grace Hopper, where Arm IP is running the CPU side [of the CPU+GPU ‘Superchip’]. But there is a whole market area in the inference space that Nvidia doesn’t/can’t cover, and this is a prime area for [Arm] to exploit.”
The new Neoverse CSS products extend the company’s N-Series and V-Series Neoverse lines. The Neoverse CSS N3 beefs up performance and power efficiency, delivering 20% higher performance-per-watt than the earlier CSS N2. Meanwhile, the new Neoverse CSS V3 marks the first extension of the Neoverse CSS architecture to the company’s higher-performance V-Series portfolio, bringing a 50% performance-per-socket improvement over CSS N2.
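To make those relative claims concrete, here is a minimal back-of-the-envelope sketch in Python. The baseline values are arbitrary placeholders, not real benchmark figures; only the 20% and 50% ratios come from Arm’s announcement.

```python
# Back-of-the-envelope math for Arm's stated uplifts.
# Baseline figures are hypothetical placeholders; only the
# 20% (N3 vs. N2, perf/watt) and 50% (V3 vs. N2, perf/socket)
# ratios come from the announcement.

n2_perf_per_watt = 100.0      # arbitrary baseline units for CSS N2
n2_perf_per_socket = 100.0    # arbitrary baseline units for CSS N2

n3_perf_per_watt = n2_perf_per_watt * 1.20       # CSS N3: +20% vs. N2
v3_perf_per_socket = n2_perf_per_socket * 1.50   # CSS V3: +50% vs. N2

print(f"CSS N3 perf/watt vs. N2:   {n3_perf_per_watt / n2_perf_per_watt:.2f}x")
print(f"CSS V3 perf/socket vs. N2: {v3_perf_per_socket / n2_perf_per_socket:.2f}x")
```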
Dermot O’Driscoll, vice president of product solutions for Arm’s Infrastructure Line of Business, explained, “The first instantiation of CSS N3 offers 32 cores in as low as a 40-watt TDP. It’s highly scalable, covering a range of applications like telecommunications, networking, and DPUs. We’re also looking at scale-out cloud configurations.” He also noted that the CSS N3 is the first to bring Armv9.2 features to the N-Series, including a 2MB-per-core private L2 cache and support for the latest PCIe and CXL I/O standards, as well as the UCIe chiplet standard.
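For teams planning inference deployments on such hardware, a first practical step is checking which Armv9-era extensions a given server actually exposes. The sketch below, assuming a Linux aarch64 host, reads the kernel’s reported CPU feature flags; the flags it checks (sve, sve2, i8mm, bf16) are illustrative inference-relevant extensions, not an exhaustive Armv9.2 feature list.

```python
# Minimal sketch: report which Armv9-era, inference-relevant CPU features
# a Linux aarch64 host exposes via /proc/cpuinfo. The flags checked here
# are illustrative examples, not an exhaustive Armv9.2 list.

def arm_features(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.lower().startswith("features"):
                # e.g. "Features : fp asimd sve sve2 i8mm bf16 ..."
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    feats = arm_features()
    for flag in ("sve", "sve2", "i8mm", "bf16"):
        print(f"{flag:5s} {'present' if flag in feats else 'absent'}")
```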
Customers who want those CSS advantages but with higher performance may opt for the Neoverse CSS V3. “CSS V3 can scale up to 128 cores per socket and supports the latest high-speed memory and I/O standards. CSS V3 is built on our new Neoverse V3 core, which is Arm’s highest single-thread-performance Neoverse core ever,” O’Driscoll said.
Circling back to Gold’s observation about the inference market opportunity, O’Driscoll added, “Much of the focus has been on training large language models, but we know that focus will shift to inference as gen AI is applied to real-world business problems and applications. Some analysts already estimate that as much as 80% of deployed AI servers are dedicated to inference, and that number is expected to rise. This shift will mean finding the right models and model configurations, training them, and then deploying them to the most cost-effective computing infrastructure.”
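That last point, picking the most cost-effective infrastructure for a given inference workload, ultimately reduces to simple unit economics. The sketch below shows the shape of that comparison; every price and throughput figure is a made-up placeholder, not a measured number for any real instance type.

```python
# Hypothetical sketch of the "most cost-effective infrastructure" comparison
# O'Driscoll describes. All prices and throughputs below are made-up
# placeholders, not measured figures for any real instance type.

candidates = {
    "cpu-only (Arm) instance": {"usd_per_hour": 1.20, "inferences_per_sec": 400},
    "gpu-accelerated instance": {"usd_per_hour": 4.80, "inferences_per_sec": 2500},
}

for name, c in candidates.items():
    # Cost per million inferences = hourly price / inferences per hour * 1e6.
    per_million = c["usd_per_hour"] / (c["inferences_per_sec"] * 3600) * 1e6
    print(f"{name}: ${per_million:.2f} per million inferences")
```

The right answer depends entirely on the workload’s throughput and utilization, which is exactly why per-workload cost modeling, rather than raw performance, tends to drive these deployment decisions.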