Groq TSP blows away single-chip record at 1 PetaOp/s

By Matt Hamblen Nov 14, 2019 4:08pm

Start-up Groq has announced a new processor architecture that it says can deliver 1 quadrillion operations per second on a single chip, which the company claims is a record. That’s a 1 followed by 15 zeroes.

The Mountain View, California, company said Thursday that its new Tensor Streaming Processor (TSP) supports up to 250 trillion floating point operations per second (FLOPS).
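For readers who want to see how the two headline figures relate, the arithmetic can be sketched in a few lines of Python. The numbers come straight from the announcement; the variable names are only illustrative, and the article does not say which number formats each figure applies to, so the ratio below is a comparison of the two quoted peaks and nothing more.

```python
# Unit prefixes used in the announcement.
PETA = 10**15  # one quadrillion
TERA = 10**12  # one trillion

# Peak figures as quoted in the article (illustrative variable names).
peak_ops_per_s = 1 * PETA      # 1 PetaOp/s
peak_flops = 250 * TERA        # 250 trillion floating point operations per second

# How many times larger the operations figure is than the floating point figure.
ratio = peak_ops_per_s // peak_flops

print(f"{peak_ops_per_s:,} operations per second")
print(f"{peak_flops:,} floating point operations per second")
print(f"ratio of the two quoted peaks: {ratio}")
```

On these figures, the quoted operations rate is four times the quoted floating point rate.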

The chip is designed to meet the deep learning inference requirements of computer vision, machine learning and artificial intelligence workloads, which are common in edge computing and IoT applications, especially those that demand safety and a high degree of accuracy.

Jonathan Ross, co-founder of Groq, said the architecture is many multiples faster than anything else available for inference work in terms of low latency and inferences per second. While top GPU companies have talked about 1 PetaOp/s (one peta operations per second) coming in a few years, Groq has already announced it and shipped samples to customers, he added in a statement.

Customers have tested sample silicon built on the architecture and have reported the top performance, Ross said. Those customers are running both x86 and non-x86 systems. A key feature of the TSP is that execution planning is done in software, rather than spending valuable silicon real estate on hardware for dynamic instruction execution.

As a result, Groq claimed its architecture for TSP chips is less complex than traditional architectures for CPUs, GPUs and field-programmable gate arrays (FPGAs).
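The general idea of planning execution in software can be illustrated with a toy example. The sketch below is not Groq's compiler or instruction set; the operation names, functional units and latencies are all hypothetical, and the technique shown is a simple greedy static scheduler. It assigns every operation a fixed start cycle before anything runs, which is the kind of decision that dynamically scheduled hardware would otherwise make at run time.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operation graph: name -> (functional unit, latency in cycles, dependencies).
OPS = {
    "load_a": ("memory", 2, []),
    "load_b": ("memory", 2, []),
    "matmul": ("matrix", 4, ["load_a", "load_b"]),
    "relu":   ("vector", 1, ["matmul"]),
    "store":  ("memory", 2, ["relu"]),
}


def plan(ops):
    """Assign each operation a fixed start cycle ahead of time (toy model)."""
    # Visit operations so that every dependency is scheduled first.
    deps_only = {name: deps for name, (_, _, deps) in ops.items()}
    order = TopologicalSorter(deps_only).static_order()

    unit_free = {}  # unit -> first cycle at which the unit is idle
    finish = {}     # operation -> cycle at which its result is available
    schedule = {}   # operation -> planned start cycle

    for name in order:
        unit, latency, deps = ops[name]
        # An operation may start once its inputs exist and its unit is idle.
        inputs_ready = max((finish[d] for d in deps), default=0)
        start = max(inputs_ready, unit_free.get(unit, 0))
        schedule[name] = start
        finish[name] = start + latency
        unit_free[unit] = start + latency  # toy assumption: units are not pipelined
    return schedule


schedule = plan(OPS)
for name, cycle in sorted(schedule.items(), key=lambda item: item[1]):
    print(f"cycle {cycle:2d}: issue {name}")
```

Because every start cycle is known in advance under these assumptions, the toy "hardware" would need no logic to reorder or arbitrate instructions at run time, which is the trade-off the article describes.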

Groq has posted a whitepaper describing the Tensor Streaming architecture.

The company is showing the architecture at the Supercomputing conference in Denver, Nov. 17-21.
