AI

IBM accelerates AI for the mainframe market

It might seem like all AI processing is happening on clouds full of GPUs, but some companies do AI inferencing on their own data center mainframe systems, perhaps for data privacy or security reasons. IBM knows this group well, and just announced new chips to bring them more AI acceleration capabilities.

During this week’s Hot Chips 2024 conference in Palo Alto, California, Big Blue introduced its Telum II processor and the Spyre AI Accelerator.

The Telum II arrives on the scene almost exactly three years after IBM launched the first Telum processor to bring AI inference to applications like fraud detection in the financial sector. The Telum II offers eight high-performance cores running at 5.5GHz, according to an IBM blog post, which added, “Telum II will include a 40% increase in on-chip cache capacity, with the virtual L3 and virtual L4 growing to 360MB and 2.88GB respectively. The processor integrates a new data processing unit (DPU) specialized for IO acceleration and the next generation of on-chip AI acceleration.”

The addition of on-chip AI acceleration helps the Telum II achieve 24 trillion operations per second (TOPS), the company said.

Meanwhile, the Spyre Accelerator has an architecture similar to that of the on-chip AI accelerator integrated into the Telum II chip. It contains 32 AI accelerator cores, and IBM said multiple Spyre Accelerators can be connected into the I/O Subsystem of its IBM Z mainframe computers via PCIe. “Combining these two technologies can result in a substantial increase in the amount of available acceleration,” the blog post stated.

AI acceleration for mainframe computers might not be high on the list of targets for a company like Nvidia, but IBM is well known as a premier and long-standing mainframe supplier, and there is customer interest, too, according to Jack Gold, president and principal analyst at J. Gold Associates.

“Many enterprises want to run AI inference workloads on existing data center systems,” Gold told Fierce Electronics. “These chips are not meant to compete with the high-end GPUs from Nvidia in training situations. It's more closely aligned to what Intel is doing with adding AI acceleration components to its Xeon cores running in data center servers.”

He added, “Many IBM mainframe clients run their own data centers since they consider the cloud not secure enough, or not compliant with regulatory requirements [such as in financial organizations or Wall Street or government agencies]. Once a model is developed, running it on a local mainframe is safer for them, or at least more comforting knowing they control all the data that never leaves their premises.”

Gold explained that training such a model probably entails using much more generic data from outside the company. “So, running the model on the mainframe, albeit at a lower capability, relatively limited TOPS compared to chips like Nvidia, still gives them enough performance and complete vaulted processing infrastructure,” he said. “And they don’t need to reprogram all their software as it's still IBM compatible. It will be attractive to existing IBM clients in highly regulated industries and the government who are already large IBM customers.”

The AI use cases of these customers are evolving rapidly. IBM said that both the Telum II and the Spyre Accelerator are designed to support a broader, larger set of models with “ensemble AI method use cases.” Ensemble AI leverages the strengths of multiple AI models to improve the overall performance and accuracy of a prediction as compared to individual models, the company said.
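To illustrate the ensemble idea in general terms, here is a minimal sketch of score averaging across several models for a fraud-style prediction. The models, field names, and threshold are hypothetical stand-ins for illustration only, not IBM's actual software or APIs:

```python
# Ensemble AI sketch: combine the outputs of several independent models
# so the aggregate prediction is more robust than any single model.
# The three "models" below are toy heuristics, not real fraud models.

def ensemble_predict(models, transaction, threshold=0.5):
    """Average the risk scores from several models and apply a threshold."""
    scores = [model(transaction) for model in models]
    avg_score = sum(scores) / len(scores)
    return avg_score, avg_score >= threshold

# Each toy model scores a transaction in [0, 1].
models = [
    lambda tx: 0.9 if tx["amount"] > 10_000 else 0.1,                # amount rule
    lambda tx: 0.8 if tx["country"] != tx["home_country"] else 0.2,  # geography rule
    lambda tx: 0.7 if tx["hour"] < 6 else 0.3,                       # time-of-day rule
]

tx = {"amount": 15_000, "country": "XX", "home_country": "US", "hour": 3}
score, flagged = ensemble_predict(models, tx)
print(f"ensemble score={score:.2f}, flagged={flagged}")  # → ensemble score=0.80, flagged=True
```

Averaging is only one combination strategy; weighted voting or stacking a meta-model on top of the individual scores are common variants of the same ensemble principle.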

Both the Telum II and the Spyre are expected to be available in 2025.