AI computing power demands liquid cooling

Computing needs to chill out.

As artificial intelligence (AI) devours bandwidth, faster processors, denser storage, and lightning-fast connectivity alone cannot scale compute capacity to the levels required. Keeping everything cool is critical, and traditional air-cooling methods are not enough; liquid-based solutions are essential.

AI is just one of the drivers of more advanced cooling technologies, according to recent research released by IDTechEx. Cloud computing and crypto mining have also driven up the power density of data center racks, which puts more pressure on thermal management and cooling. In an interview with Fierce Electronics, IDTechEx technology analyst Yulin Wang said traditional air-cooling methods struggle to meet the cooling demands of data centers brimming with densely packed servers.

Component makers, including the vendors of the chips inside the servers, can make their parts more energy efficient, but there is only so much that can be done to reduce waste heat, Wang said, especially as the thermal design power of GPUs has risen over the past two decades from 150 watts to 500 or 600 watts. “That represents a lot of heat being generated which requires a more efficient cooling technology when it comes to optimizing the performance of the data.”
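To put those figures in perspective, here is a minimal back-of-the-envelope sketch of how rack-level heat scales with GPU thermal design power. The GPU count per rack and the overhead factor for CPUs, memory, networking, and power conversion are illustrative assumptions, not figures from IDTechEx.

```python
# Illustrative estimate of rack heat load as GPU TDP climbs.
# The GPU count and overhead factor are assumptions for illustration only.

def rack_heat_kw(gpu_tdp_w: float, gpus_per_rack: int, overhead: float = 1.5) -> float:
    """Estimate total rack heat in kW; `overhead` covers CPUs, memory,
    networking, and power-conversion losses on top of the GPUs."""
    return gpu_tdp_w * gpus_per_rack * overhead / 1000.0

# Using the TDP range cited in the article, for a hypothetical 32-GPU rack:
print(rack_heat_kw(150, 32))  # ~7.2 kW per rack at 150 W per GPU
print(rack_heat_kw(600, 32))  # ~28.8 kW per rack at 600 W per GPU
```

At tens of kilowatts per rack, the heat that room-level air conditioning must remove grows several-fold, which is the pressure Wang describes.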

There are two broad categories of cooling: active and passive. With passive cooling, air simply flows through on its own, much like natural light, Wang said. “The cooling efficiency is extremely low.”

Active cooling has several subcategories, including air cooling and liquid cooling. Air cooling is the most widely adopted and means multiple server racks are placed in an air-conditioned room. “That is how the heat is dissipated,” Wang said. But with the increased density of computing driven by AI, it is not efficient enough, so liquid immersion and cold plate cooling are becoming more appealing.

Immersion cooling is still in its early stages, but its cooling performance is high, Wang said. The key downsides, however, are material compatibility, the choice of coolant, and how to integrate immersion cooling into existing data centers. He said there are capital costs across the supply chain, and whether each player is willing to pay those costs is unclear.

Wang said the appeal of cold plates is that there is no need to retrofit the entire server room, just the server rack. Cold plate cooling transfers heat from devices to a liquid coolant that flows to a remote heat exchanger, where the heat is dissipated.
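For a sense of the quantities involved, the sketch below applies the basic heat-balance relation Q = ṁ·c_p·ΔT to estimate how much coolant a cold plate loop would need to circulate. The heat load and allowed temperature rise are illustrative assumptions, not figures from Wang or any vendor.

```python
# Rough estimate of the coolant flow a cold plate loop needs.
# The heat load and temperature rise below are assumed for illustration.

WATER_CP = 4186.0      # specific heat of water, J/(kg*K)
WATER_DENSITY = 997.0  # kg/m^3 at roughly room temperature

def coolant_flow_lpm(heat_load_w: float, delta_t_k: float) -> float:
    """Liters per minute of water needed to absorb heat_load_w with a
    delta_t_k temperature rise, from Q = m_dot * c_p * delta_T."""
    mass_flow_kg_s = heat_load_w / (WATER_CP * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY * 1000.0 * 60.0

# Example: a 600 W GPU cold plate allowed a 10 K coolant temperature rise
print(round(coolant_flow_lpm(600.0, 10.0), 2))  # ~0.86 L/min per GPU
```

Less than a liter per minute per GPU is easy to route through rack-level manifolds, which is part of why cold plates avoid a room-wide retrofit.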

Air and liquid cooling will co-exist

IDTechEx’s research suggests that designing data centers with both air-cooling and liquid-cooling infrastructure will allow for a future transition to liquid cooling, although it acknowledges that building facilities from scratch with redundant features such as liquid cooling manifolds and piping may not always be viable on a limited budget. The other, more common approach over the short and mid-term is to integrate liquid cooling into existing air-cooled facilities, transitioning some of the capacity of air systems to liquid cooling systems.

The cooling challenge remains the same no matter the size of the data center; even a smaller data center at the edge can be just as power hungry, Wang said. “It still depends on the total power consumption. It does not make a fundamental difference.”

In the meantime, innovation in other industries may inspire advances in cooling computing environments, such as the automotive sector, where cold plate technology is already used to cool battery packs.

Advances in cooling may also come from chipmakers and other component suppliers. Wang noted that Nvidia is participating in the COOLERCHIPS program created by the United States Department of Energy’s Advanced Research Projects Agency-Energy (ARPA-E). Nvidia was awarded $5 million out of a total $40 million in grants to fund research into concepts that address cooling within a computer’s chassis.

Jeremy Rodriguez, Nvidia’s senior director of data center engineering, led the team that applied for the government program. The team is responsible for mechanical, electrical, and plumbing solutions for the company, as well as research and development in power and cooling technologies that help scale data centers.

He said the heat load of microchips has been steadily increasing over time and has hit an inflection point where air cooling is no longer optimal, which is why cold plate technology looks to be the most efficient way to cool a chip. “Air isn't the greatest cooling medium,” Rodriguez said. “When you have something like a fluid, you can transfer a lot more heat into that fluid and do heat exchange more efficiently.”
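Rodriguez's point can be illustrated with a quick comparison of volumetric heat capacities. The property values below are approximate room-temperature textbook figures, used here only to show the order-of-magnitude gap between air and a liquid such as water.

```python
# Compare how much heat a given volume of coolant carries per degree of
# temperature rise. Values are approximate room-temperature figures.

coolants = {
    # name: (density in kg/m^3, specific heat in J/(kg*K))
    "air":   (1.2,   1005.0),
    "water": (997.0, 4186.0),
}

for name, (density, specific_heat) in coolants.items():
    volumetric_heat_capacity = density * specific_heat  # J/(m^3*K)
    print(f"{name:>5}: {volumetric_heat_capacity / 1000:.1f} kJ per m^3 per K")

# Output is roughly 1.2 kJ/(m^3*K) for air versus about 4,170 kJ/(m^3*K)
# for water: on the order of 3,500 times more heat moved per unit volume,
# which is why liquid loops need far less flow than air to carry the same heat.
```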

Coinciding with hotter chips have been concerted efforts to densify computing, because density reduces networking-related costs and improves performance at the cluster level. Rodriguez said everyone who is part of the data center has a vested interest in making it as efficient as possible, hence Nvidia’s participation in the COOLERCHIPS program.

Rodriguez said Nvidia recognizes the concerns around water use and refrigerants, as well as chemicals labeled as “forever chemicals,” or per- and polyfluoroalkyl substances (PFAS). “We don't want to introduce those as solutions in the data center.”

Cooling research is heating up

Nvidia is collaborating with several companies that contribute different types of expertise to solve the cooling problem, including BOYD Corp., Durbin Group, Honeywell and Vertiv. Its initial concept combines liquid and immersion cooling.

Nvidia's collaboration to solve cooling includes a concept that combines liquid and immersion cooling. (Nvidia)

Given that the data center has many building blocks, it makes sense that cooling solutions require an ecosystem of vendors. British liquid cooling company Iceotope Technologies is collaborating with Hewlett Packard Enterprise, Intel, and nVent. Its recently announced Kul Ran is an ultra-resilient, highly energy-efficient precision liquid-cooled server solution that addresses extreme edge deployment challenges. Kul Ran is based on the company’s precision liquid cooling technology, which Iceotope said delivers power savings of up to 40% compared to other edge servers in its class and removes nearly all the heat generated by a server’s electronic components.

Iceotope’s cooling technology also eliminates all water consumption. It eases the transition to liquid cooling by using the same rack-based architecture as air-cooled systems, so it fits into existing deployed infrastructure. Iceotope offers a rack-mounted chassis that pumps coolant directly to the CPU, GPU, memory, and hard disks, while a device's motherboard sits beneath a thin layer of coolant, eliminating the need for fans.

Iceotope's cooling approach eliminates all water consumption and uses the same rack-based architecture as air-cooled systems, meaning it fits into existing infrastructure. (Iceotope)

Intel, meanwhile, has efforts underway to tackle cooling, having supported immersion cooling for more than a decade. Among the solutions company researchers are exploring are 3D vapor chambers – sealed, flat metal pockets filled with fluid – that spread boiling capacity in minimal space, along with improved boiling enhancement coatings. The coatings aim to reduce thermal resistance by promoting a high density of nucleation sites.

Intel is also exploring arrays of fluid jets to cool the highest-power devices. These cooling jets direct fluid straight at the surface of the device. The thermal lid that contains the jets can be attached directly to the top of a standard lidded package, eliminating thermal interface material and reducing thermal resistance.

Intel is looking at a number of immersion cooling methods, including using a tank filled with synthetic, non-electrically conductive oil to hold Xeon-based servers. (Intel)

The devices themselves have a role in contributing to a cooler data center. Rodriguez said Nvidia will keep making chips that run hot, but the company makes them as efficient as possible as dies shrink.

Collaboration on cooling research leaves the planet

Rodriguez sees participation in the COOLERCHIPS program bringing the industry together, while other industries can be sources of inspiration, including the space program, which has had to deal with rejecting heat and keeping chips cool in extreme conditions. “There's a cross-pollination of cooling technologies that can be applied across to any given industry,” Rodriguez said.

Research into cooling technologies for electronics is, in fact, happening in space. In late summer, the crew of the International Space Station (ISS) was set to conduct experiments to determine whether microgravity holds the key to preventing the overheating of advanced electronics. The goal is to see if it’s possible to improve the efficiency of heat transfer devices used in various technologies – everything from laptops to NASA’s Hubble Space Telescope.

The experiments build on previous research done on the ISS and leverage microgravity to better understand the vapor-liquid interfaces of organic mixtures used in heat pipes, which could improve their efficiency. Heat pipes rely on the complex interplay between the vapor and liquid phases within a sealed system; the motion and dynamics of the interface between the two phases can significantly affect the performance of heat pipes and similar systems.

Researchers will study the liquid-to-vapor phase change and the shape of the vapor-liquid interface in microgravity, which lacks buoyancy-driven convection; there, the reduction in surface tension caused by the higher temperature at the heated end of the pipe significantly impacts heat pipe dynamics and thermal performance.

Back on Earth, there are many vendors and OEMs providing a variety of servers in many configurations with differing levels of quality. IDTechEx’s Wang said such a complex ecosystem makes tackling cooling more challenging. A smartphone maker like Apple has much more control over the supply chain and the assembly of the final product, so it is easier to optimize the workflow, he said.

Within the data center industry, Wang said, segmentation means many players operate in isolation. “There is an increasing collaboration across the supply chain, but more collaboration is needed,” he said. “Everyone is doing their own thing without caring too much about the whole picture.”