Memory is key to future AI and ML performance

Memory remains one of the most critical technologies for enabling continued advances in artificial intelligence/machine learning (AI/ML) processing.

From the rapid development of PCs in the 1990s, to the explosion of gaming in the 2000s, to the emergence of mobile and cloud computing in the 2010s, memory has played an integral role in enabling each new computing paradigm. The memory industry has kept pace with these demands over the last 30 years, and is being called upon again to continue innovating as we enter a new age of AI/ML.

PCs drove an increase in memory bandwidth and capacity as users processed growing amounts of data with applications like Word, Excel, and PowerPoint. Graphical user interfaces, the Web, and gaming pushed performance even higher, giving rise to a new type of memory, Graphics DDR (GDDR), designed to meet increased bandwidth demands.

Mobile phones and tablets ushered in an era of on-the-go computing, and the need for long battery life drove the memory industry to create new mobile-specific memories tailored to these markets. Today, cloud computing continues to drive increases in capacity and performance to tackle ever-larger workloads from connected devices.

Looking forward, AI/ML applications are driving the need for better memory performance, capacity, and power efficiency, challenging memory system designers on multiple fronts at once. According to OpenAI, the compute used for AI/ML training increased by a factor of 300,000 between 2012 and 2019, a doubling every 3.43 months. AI/ML models and training sets are growing in size as well, with the largest models now exceeding 170 billion parameters and even larger models on the horizon.
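As a rough back-of-envelope check on what that growth rate implies, the conversion below is simple arithmetic on the figures cited above, not additional data from OpenAI:

```python
# Convert the cited doubling time into an annualized growth factor, and
# count how many doublings a 300,000x overall increase represents.
from math import log2

doubling_months = 3.43                       # doubling period cited above
annual_growth = 2 ** (12 / doubling_months)  # ~11x more training compute per year
doublings = log2(300_000)                    # ~18 doublings to reach 300,000x

print(f"~{annual_growth:.0f}x per year, ~{doublings:.0f} doublings overall")
```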

Part of the improvement in performance and model size has come from silicon advances driven by Moore’s Law, but Moore’s Law is slowing, making it harder to sustain these kinds of gains. Another crucial contributor to system performance has been improved memory systems.

When discussing the role of memory, it’s important to understand that AI/ML applications are composed of two tasks, training and inference, and each has its own requirements that drive the choice of the best memory solution.

Training is the process of “teaching” a task to a neural network, and it often requires large data sets to be presented to the network so that it can learn that task; this can take days or weeks for the network to become proficient. Inference is the use of a trained neural network on data it hasn’t seen before, with inference accuracy determined by how well the network was trained.
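As a minimal illustration of these two phases, here is a toy sketch in PyTorch; the framework choice, the tiny model, and the random data are assumptions made purely for illustration, not anything described in the article:

```python
# Toy illustration of training vs. inference (hypothetical model and data).
import torch
from torch import nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: repeatedly present labeled examples and update the weights.
for _ in range(100):
    x = torch.randn(64, 16)            # a batch of training data
    y = torch.randint(0, 2, (64,))     # its labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Inference: apply the trained weights to data the model has not seen.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16)).argmax(dim=1)
```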

Cloud computing infrastructure allows the training of neural network models to be split across multiple servers, each running in parallel, to improve training time and to handle extremely large training sets. Additionally, because of the value created by these training workloads, a powerful time-to-market incentive makes completing training runs quickly a priority.

Moving data to the AI model can become a bottleneck, driving the need for higher memory bandwidth. Because this training is taking place in data centers that are facing increasing space and power constraints, there’s an increased need for power-efficient and compact solutions to reduce cost and improve ease of adoption.
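A crude back-of-envelope estimate shows why bandwidth becomes the limiter; the parameter count, precision, and bandwidth below are illustrative assumptions, and the calculation ignores parallelism across devices, caching, and data reuse:

```python
# Time just to stream a large model's weights from memory once.
params = 170e9                 # parameters, in the range of the largest models above
bytes_per_param = 2            # assuming FP16 weights
bandwidth_gb_s = 400           # assumed per-device memory bandwidth, GB/s
seconds = params * bytes_per_param / (bandwidth_gb_s * 1e9)
print(f"~{seconds:.2f} s to read every weight once")   # ~0.85 s per full pass
```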

Once an AI/ML model has been successfully trained, it can be deployed in the data center, at the network edge, or, increasingly, in IoT devices. AI/ML inferencing requires memory with both high bandwidth and low latency to produce answers in real time. Because inferencing is appearing in an increasingly wide range of devices, cost becomes more of a priority for this hardware than it is for hardware intended for the data center.

There are also market-specific needs that must be addressed. In the case of Advanced Driver-Assistance Systems (ADAS), the memory must meet stringent automotive qualification requirements and tolerate extreme temperatures that could otherwise lead to dangerous failures. As 5G continues its rollout and autonomous cars move closer to deployment, the market will see a growing number of AI-powered devices performing complicated inferencing.

AI/ML solutions typically choose among three types of memory depending on their needs: on-chip memory, High Bandwidth Memory (HBM), and GDDR memory. HBM and GDDR are the two highest-performing external memories, and both have evolved over multiple generations to continue meeting the needs of the most demanding applications.

On-chip memory is the fastest memory available and has the best power efficiency, but is severely limited in capacity by comparison. On-chip memory is typically a good choice for systems with smaller neural networks that only do inferencing. HBM is ideally positioned for training, and GDDR6 is well suited for both training and inference for large neural network models and training sets.

Introduced in 2013, HBM is a high-performance, 3D-stacked DRAM architecture that provides the needed high bandwidth and capacity in a very small footprint. Furthermore, by keeping data rates low and the memory close to the processor, it improves overall system power efficiency.

The newest version of HBM, HBM2E, is compelling for many applications, but it is more complex and more expensive to implement than traditional DRAMs like GDDR. For many systems, however, its unmatched combination of bandwidth, capacity, power efficiency, and footprint outweighs those concerns, making HBM a great solution for AI/ML training.
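The appeal of the wide-and-slow approach is easy to see with a quick calculation; the interface width and per-pin rate below are commonly published HBM2E figures and are assumptions here, not numbers taken from the article:

```python
# Approximate per-stack HBM2E bandwidth from a wide, moderately clocked interface.
interface_bits = 1024          # very wide bus, enabled by 2.5D/3D stacking (assumed)
pin_rate_gbps = 3.2            # modest per-pin data rate (assumed)
bandwidth_gb_s = interface_bits * pin_rate_gbps / 8
print(f"~{bandwidth_gb_s:.0f} GB/s per HBM2E stack")    # ~410 GB/s
```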

GDDR SDRAM, originally designed for the gaming and graphics market, has evolved steadily since its introduction over two decades ago. The current generation, GDDR6, supports data rates of up to 16 Gbps, allowing a single DRAM device to deliver 64 GB/s of bandwidth.
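That bandwidth figure follows directly from the per-pin rate and the device's interface width; the 16 Gbps rate and 64 GB/s result come from the text above, while the 32-bit device width is the standard GDDR6 configuration and is assumed here:

```python
# GDDR6 device bandwidth: a narrow interface running at a very high per-pin rate.
pin_rate_gbps = 16             # maximum per-pin data rate cited above
interface_bits = 32            # standard GDDR6 device width, two 16-bit channels (assumed)
bandwidth_gb_s = pin_rate_gbps * interface_bits / 8
print(f"{bandwidth_gb_s:.0f} GB/s per GDDR6 device")    # 64 GB/s
```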

While HBM provides better performance and power efficiency for AI/ML, GDDR6 can reduce cost and implementation complexity because it leverages existing infrastructure and processes. The main challenge for GDDR6 implementations is maintaining good signal integrity at the much higher data rates used between the SoC and the DRAM. For AI/ML inferencing in edge and IoT devices (such as autonomous vehicles), GDDR6 offers a strong combination of performance, power efficiency, capacity, cost, and reliability, while maintaining a more familiar and lower-cost implementation approach.

It’s still early days in this next chapter of the AI/ML revolution, and there are no signs of slowing in the demand for more computing performance. Improvements to every aspect of computing hardware and software will be needed to maintain the historic rate of progress we’ve witnessed over the past five years, and memory will remain critical to achieving those gains. HBM2E and GDDR6 provide best-in-class performance for AI/ML training and inference, and the memory industry is continuing to innovate to meet the future needs of these systems.

Steven Woo is Fellow and Distinguished Inventor at Rambus, a silicon IP and chip provider. He earned a PhD in electrical engineering from Stanford University.