Groq TSP blows away single chip record at 1 PetaOp/s

Groq believes silicon built on its TSP architecture is many times faster than chips on the market today. Customers are currently testing it. (Groq)

Start-up Groq has announced a new processor architecture that it says is capable of 1 quadrillion operations per second on a single chip when implemented, which would mark a record. That’s a 1 followed by 15 zeroes.

The Mountain View, California, company said Thursday that its new Tensor Streaming Processor (TSP) supports up to 250 trillion floating point operations per second (FLOPS).
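
For readers who want to check the scale of the two quoted figures, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only, and the article does not say which data types or precisions each figure refers to, so the ratio below compares the two peak numbers as quoted and nothing more.

```python
# Unit prefixes as powers of ten (standard SI definitions).
TERA = 10**12
PETA = 10**15

# Peak figures as quoted in the announcement (names are illustrative).
peak_ops_per_s = 1 * PETA    # 1 PetaOp/s, i.e. 1 quadrillion operations per second
peak_flops = 250 * TERA      # 250 trillion floating point operations per second

# Count the zeroes that follow the leading 1 in one quadrillion.
zeroes = len(str(peak_ops_per_s)) - 1

# Ratio between the two quoted peak figures.
ratio = peak_ops_per_s / peak_flops

print(f"zeroes: {zeroes}, ops-to-FLOPS ratio: {ratio}")
```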

The TSP is designed to meet the deep learning inference requirements of computer vision, machine learning and artificial intelligence workloads. All of these figure in edge computing and IoT applications, especially those that involve safety and demand a high degree of accuracy.

Jonathan Ross, co-founder of Groq, said the architecture is many multiples faster than anything else available for inference work in terms of low latency and inferences per second. While top GPU companies have talked about 1 PetaOp/s (one peta operations per second) coming in a few years, Groq has already announced it and shipped samples to customers, he added in a statement.

Customers who have tested samples of silicon running the architecture have reported top performance, Ross said. Those customers are running both x86 and non-x86 systems. A key feature of the TSP is that execution planning is done in software, rather than spending valuable silicon real estate on dynamic instruction execution.

As a result, Groq claimed its architecture for TSP chips is less complex than traditional architectures for CPUs, GPUs and Field Programmable Gate Arrays.
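
The general idea of planning execution in software can be illustrated with a toy static scheduler. This is a minimal sketch under stated assumptions, not Groq's compiler or method: the function `plan_schedule`, the four-operation graph and the latency values are all hypothetical, and the sketch assumes every operation has a fixed, known latency and that there is no limit on how many operations can run at once.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_schedule(ops, deps, latency):
    """Assign every operation a fixed start cycle before anything runs.

    ops:     list of operation names
    deps:    dict mapping an operation to the set of operations it waits on
    latency: dict mapping an operation to its fixed, known cycle count
    """
    graph = {op: deps.get(op, set()) for op in ops}
    start = {}
    # Visit operations so that every dependency is planned before its users.
    for op in TopologicalSorter(graph).static_order():
        # An operation starts as soon as its slowest input has finished.
        start[op] = max((start[d] + latency[d] for d in graph[op]), default=0)
    return start


# Hypothetical workload: two loads feed a matrix multiply, then a store.
ops = ["load_a", "load_b", "matmul", "store"]
deps = {"matmul": {"load_a", "load_b"}, "store": {"matmul"}}
latency = {"load_a": 2, "load_b": 3, "matmul": 5, "store": 1}

schedule = plan_schedule(ops, deps, latency)
for op in ops:
    print(op, "starts at cycle", schedule[op])
```

Because every start cycle is fixed ahead of time, hardware following such a plan would not need logic to decide at run time which instruction goes next, which is the kind of trade-off the company describes.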

Groq has posted a whitepaper describing the Tensor Streaming architecture.

The company is showing the architecture at Supercomputing Denver Nov. 17-21.