AMD touts VCK5000 performance improvements, claims it TOPS Nvidia for AI workloads

AMD is claiming a 3x performance improvement for its VCK5000 Versal development card for data center AI, along with boasting that it has virtually eliminated the “dark silicon” issue that limits what can be achieved with AI chips.

The performance improvement and the card’s ability to use more of its available operations capacity than competing products mean the previously launched VCK5000 is ideal for increasingly important AI inference needs, according to Nick Ni, director of product marketing for AI and software solutions for the Adaptive and Embedded Computing Group at AMD, which includes former Xilinx operations. (AMD recently completed its acquisition of Xilinx, though the VCK5000 was launched by Xilinx long before the deal closed.)

Regarding the company’s ability to triple the performance of the VCK5000, Ni told Fierce Electronics, “It's actually part of the why we have an adaptive computing platform, so we can continue to improve the hardware architecture as well as software runtime to achieve more on the same network. We triple the performance, but at the same power and our same price.” 

At the same time, Ni said the VCK5000 is getting close to achieving “zero dark silicon,” or the notion of using closer to 100% of the chip’s performance capacity to ensure maximum performance and computing efficiency in AI ResNet-50 neural network and inference applications.

Chip performance often is measured in trillions of operations per second (TOPS), although there has been some question, as raised by a 2020 VentureBeat story, whether it is really the best way of measuring AI chip performance, especially at a time when efficiency metrics like performance-per-watt and performance-per-dollar are becoming more important.

Ni said his group has been able to improve the VCK5000’s TOPS efficiency mark to 90% for real AI workloads, meaning that only 10% of the card’s computing capability is not being leveraged. He said this is well ahead of the TOPS efficiency of Nvidia’s A100, A30, A10 and T4 products. For example, the A100 achieves actual TOPS equal to 42% of its peak, meaning that 58% of the GPU’s computing capability is not used, according to a slide Ni displayed while talking to Fierce Electronics.
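The arithmetic behind those percentages is simple enough to sketch. The short Python example below assumes TOPS efficiency is defined as achieved TOPS divided by peak TOPS, which matches how the figures are described above. The function names are illustrative and not taken from AMD's materials, and the article does not give absolute TOPS figures, so the 50-of-100 example is purely hypothetical.

```python
def tops_efficiency(achieved_tops: float, peak_tops: float) -> float:
    """Return the fraction of peak TOPS actually delivered on a workload."""
    if peak_tops <= 0:
        raise ValueError("peak_tops must be positive")
    return achieved_tops / peak_tops


def unused_fraction(efficiency: float) -> float:
    """Return the share of compute capability left idle at a given efficiency."""
    return 1.0 - efficiency


# Efficiency figures as quoted from Ni's slide in the article.
quoted = {"VCK5000": 0.90, "A100": 0.42}

for name, eff in quoted.items():
    print(f"{name}: {eff:.0%} used, {unused_fraction(eff):.0%} unused")

# Hypothetical absolute numbers, only to show the ratio itself.
example = tops_efficiency(achieved_tops=50.0, peak_tops=100.0)
print(f"hypothetical card: {example:.0%} efficient")
```

The unused share is just the complement of the efficiency, which is why 90% efficiency corresponds to 10% unused and 42% to 58%.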

Steve Leibson, principal analyst and partner at Tirias Research, saw the same slide, and noted in an email to Fierce Electronics that “the GPUs in the table are not fully loaded by the ResNet-50 workload being run. I think users care more about how fast the application runs overall rather than how much of the silicon is being exercised, but efficiency contributes to superior performance/watt and performance/dollar figures… Because of their programmable logic fabrics, FPGAs and ACAPs can achieve something closer to their maximum potential when properly tailored for a specific workload like ResNet-50.”

Asked about the validity of TOPS as a metric, Leibson added, “The very best metric or benchmark for any processor, AI or otherwise, is the application or workload you will run on the device. That means that standardized benchmarks can give you a relative feel for performance, but you only know for sure when you run your target application on the processor.”

He said that while TOPS is used as an easily derived proxy for performance, “Peak TOPS is likely to not be the same as actual TOPS for real workloads. Different customers prefer different metric preferences. One may prefer the fastest execution for specific workload(s), another may prefer the best performance/watt, and yet another may prefer the best performance/dollar.”

Regarding per-watt and per-dollar performance, Leibson said AMD has shown that the VCK5000 “really delivers in the performance/watt and performance/dollar categories against the specified competitors, but it's not the fastest overall solution at any price” from among the products Ni compared it to. “The VCK5000's relatively low power consumption and price give it an advantage when combined with its good performance.”
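Leibson's point that different buyers rank the same hardware differently can be illustrated with a small Python sketch. The two cards and all of their numbers below are entirely hypothetical and do not represent the VCK5000 or any Nvidia product; they are chosen only so that the fastest card is not the one with the best performance/watt or performance/dollar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    throughput: float  # e.g. images per second on the target workload
    watts: float       # board power while running that workload
    price: float       # purchase price in dollars

    @property
    def perf_per_watt(self) -> float:
        return self.throughput / self.watts

    @property
    def perf_per_dollar(self) -> float:
        return self.throughput / self.price


# Hypothetical cards with made-up figures, for illustration only.
cards = [
    Card("card_a", throughput=10000.0, watts=400.0, price=10000.0),
    Card("card_b", throughput=6000.0, watts=100.0, price=3000.0),
]

fastest = max(cards, key=lambda c: c.throughput)
best_per_watt = max(cards, key=lambda c: c.perf_per_watt)
best_per_dollar = max(cards, key=lambda c: c.perf_per_dollar)

print("fastest:", fastest.name)
print("best performance/watt:", best_per_watt.name)
print("best performance/dollar:", best_per_dollar.name)
```

With these made-up inputs, the card with the highest raw throughput loses on both efficiency metrics, which is the kind of trade-off Leibson describes.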

Dark silicon defined

It should be noted that AMD’s use of the term “dark silicon” does not exactly track with the industry’s typical use of the term.

The Wikipedia definition of dark silicon, Leibson pointed out, “is the amount of circuitry of an integrated circuit that cannot be powered-on at the nominal operating voltage for a given thermal design power (TDP) constraint.”

Leibson added, “That Wikipedia definition is consistent with the way I use the term. It's a way to ensure that the chip doesn't melt down by making sure that all of the functions etched into the chip are never powered up at the same time. This is not the way AMD-Xilinx have used the term for the VCK5000 launch.”

Ni acknowledged this, saying that “‘dark silicon’ was typically used more in a thermal power type of context… You pay for this much silicon, but maybe half the functions are turned off. That is the origin of dark silicon and that terminology is really well accepted in the industry. Now, we're kind of piggybacking on that terminology to apply this to AI… where everyone is trying to get maximum TOPS.”
