How to decide on an Ethernet protocol for your data center

Modern applications are pushing storage technology to be faster, more efficient and extremely scalable so data can be processed as quickly and as seamlessly as possible.

In the data center world, these goals often come down to the networking protocol used to move data between applications and environments. Different protocols have different speeds and capabilities that dictate how fast, reliable or efficient data movement will be.

One such technology that has transformed the landscape of storage and storage protocols is Non-Volatile Memory Express (NVMe™). NVMe has not only solved critical challenges within the data center but has also become the go-to standard for modern storage solutions.

It often makes sense to take advantage of what NVMe-attached storage provides, namely high performance, efficiency and scaling, while separating storage from compute using NVMe over Fabrics (NVMe-oF™). This disaggregated approach allows storage to be shared across multiple servers for more flexibility, better resource utilization, and more efficient scaling overall. However, to gain the most benefit from disaggregation, how do you know which storage protocol is best for a particular application or use case?

There are multiple NVMe-oF protocols to choose from that extend NVMe functions over a network fabric. Fibre Channel and InfiniBand™ are options, but this article will compare two of the most common Ethernet fabrics: RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) and NVMe/TCP (NVMe over Transmission Control Protocol).  

RoCE is emerging as one of the dominant network protocols for disaggregated storage in the data center, but advancements in TCP/IP (Internet Protocol) performance and scalability, combined with its ubiquity in the networking world, are making NVMe/TCP an intriguing choice for organizations looking for a plug-and-play option.

Both can play a role in meeting the dynamic storage needs of modern applications and can help organizations create open, composable pools of storage in the data center that can be allocated on demand as needed. This ability to create pools of shared storage and the flexibility to support multiple protocols is key to enabling modern applications, ensuring they have quick, seamless access to the data they need without adding complexity to the IT stack.

RDMA with RoCE

In the enterprise sector, Ethernet is by far the most popular transport technology.

RDMA is a technology that allows data to be transferred directly between the memory of two systems without involving either system's CPU or operating system in the data path, reducing latency and CPU overhead in data transfer. NVMe-oF over RDMA uses RDMA protocols, such as RoCE, to enable high-performance, low-latency access to NVMe storage devices over the network. It is suitable for applications that require ultra-low latency and high throughput, such as high-performance computing (HPC), data analytics, and real-time databases.

However, implementing RDMA with RoCEv2 may require upgrading the network infrastructure to support certain lossless mechanisms, which can add cost and complexity.

The term "lossless" means setting up the network in such a way that there is no packet loss during data transmissions. Lossless transmission is critical in environments where RDMA is utilized because RDMA's efficiency is highly sensitive to packet loss. To achieve a lossless RDMA network, technologies like Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) are employed.

NVMe/TCP

NVMe/TCP uses standard TCP/IP networking protocols to transport NVMe commands and data over existing Ethernet networks – effectively extending the high-performing NVMe protocol across the entire data center using existing infrastructure and standard drivers.

NVMe/TCP can be used at every scale, even outside the data center (such as for edge deployments), with few or no changes to the network configuration. The flexibility of TCP makes NVMe/TCP a compelling transport option. Users may see additional microseconds of latency, but will that matter for workloads where slight delays in data processing or communication do not have critical or immediate consequences?
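One rough way to frame that question is to ask what share of an operation's total time the extra fabric latency represents. The sketch below is illustrative only: the function name `added_latency_share` and the microsecond figures are hypothetical assumptions, not measurements of either protocol.

```python
def added_latency_share(operation_us: float, extra_fabric_us: float) -> float:
    """Return the fraction of total time spent on extra fabric latency.

    operation_us    -- time the operation takes without the extra latency
    extra_fabric_us -- additional transport latency (hypothetical value)
    """
    if operation_us < 0 or extra_fabric_us < 0:
        raise ValueError("latencies must be non-negative")
    total = operation_us + extra_fabric_us
    return extra_fabric_us / total if total > 0 else 0.0


# Hypothetical workloads, each paying the same assumed 20 us of extra latency.
batch_share = added_latency_share(operation_us=5000.0, extra_fabric_us=20.0)
realtime_share = added_latency_share(operation_us=100.0, extra_fabric_us=20.0)

print(f"batch step:     {batch_share:.2%} of total time")
print(f"real-time step: {realtime_share:.2%} of total time")
```

Under these assumed numbers, the extra latency is well under 1% of the longer operation but roughly a sixth of the shorter one, which is why the same overhead can be irrelevant for one workload and significant for another.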

TCP is also ubiquitous across the data center. It’s well understood and it’s highly scalable. In addition, there is a huge ecosystem of vendors in the TCP world that have market incentives to continue to make major investments in improving its performance capabilities. And while RoCE provides better performance and lower latency today, that gap is likely to shrink significantly as new innovations in NVMe/TCP hit the market.

Protocol of Choice

Choosing a protocol isn’t necessarily straightforward and depends on multiple factors. The specific requirements of the storage environment, including performance, latency, cost, and interoperability with existing infrastructure, all play a role.

Here is a high-level side-by-side comparison of the advantages of each. For more information, there is also this whitepaper, which goes into more detail.

[Figure: Ethernet protocols compared in chart form]
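The trade-offs discussed in this article can also be written down as a rough decision helper. This is a minimal sketch, not a definitive selection method: the `Workload` fields and the `suggest_fabric` function are hypothetical names, and the rules simply encode the qualitative points made here (RoCE for ultra-low latency where a lossless network is available, NVMe/TCP for existing infrastructure and edge deployments).

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a storage workload (illustrative fields)."""
    needs_ultra_low_latency: bool     # e.g. HPC or real-time databases
    lossless_network_available: bool  # PFC/ECN in place, or an upgrade is budgeted
    spans_edge_sites: bool            # traffic leaves the data center


def suggest_fabric(workload: Workload) -> str:
    """Suggest an Ethernet NVMe-oF transport from coarse, assumed rules."""
    # TCP runs with few or no network changes, including at the edge.
    if workload.spans_edge_sites:
        return "NVMe/TCP"
    # RoCE pays off when latency is critical and the network can be lossless.
    if workload.needs_ultra_low_latency and workload.lossless_network_available:
        return "RoCE"
    # Otherwise, reuse the existing TCP/IP infrastructure.
    return "NVMe/TCP"


hpc_cluster = Workload(True, True, False)
edge_cache = Workload(False, False, True)
print(suggest_fabric(hpc_cluster))
print(suggest_fabric(edge_cache))
```

A real evaluation would weigh cost, throughput and operational skills as well; the point of the sketch is only that the decision turns on a small number of explicit requirements.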

Choosing the Right Storage Solution to Match

So, how do you prepare for the future while meeting the storage needs of today? Choose a vendor that offers NVMe solutions and storage platforms that agnostically support RoCE and NVMe/TCP so you have a choice. This will allow you to extend the high performance of NVMe-oF in shared storage environments without having to rip and replace existing TCP infrastructure.

Providing storage resources in an open composable infrastructure environment across either of these two protocols allows data to be quickly and seamlessly shared between applications – effectively meeting the performance, efficiency and scalability needs of today’s modern applications. The flexibility of a protocol-agnostic storage strategy allows organizations to meet today’s needs while keeping open the option to meet the needs of tomorrow’s applications.

Niall Macleod is director of applications engineering at Western Digital focused on data center storage and specializing in NVMe-oF disaggregated storage architectures.