Nvidia on Thursday announced a third-generation artificial intelligence (AI) system with 5 petaflops of compute power. Argonne National Lab is the initial user, applying the system to study the spread of the COVID-19 virus and to explore treatments and vaccines.
The performance of the new DGX A100 systems will help researchers at Argonne do “years’ worth of AI accelerated work in months or days,” said Rick Stevens, associate lab director for computing at Argonne, in a statement. The first DGX A100s were delivered to Argonne in early May.
Also, the University of Florida will receive DGX A100 systems to use across its entire curriculum, according to a statement from university president Kent Fuchs. Other early adopters include the Center for Biomedical AI in Germany, Chulalongkorn University in Thailand, Element AI in Montreal and several others.
Nvidia said thousands of previous-generation DGX systems are already in use around the globe for autonomous vehicle AI, natural speech research and the recommendation AI used by retailers and online search engines.
The DGX A100 has eight A100 Tensor Core GPUs that together provide 5 petaflops of compute power, with 320 GB of total GPU memory and 12.4 terabytes per second of bandwidth.
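To put the system totals in per-GPU terms, the quoted figures can be divided across the eight GPUs. The following is a minimal sketch that assumes the totals split evenly; the function name `per_gpu_share` is illustrative, and the results are derived from the article's numbers rather than taken from an Nvidia specification.

```python
# Aggregate figures quoted in the article for one DGX A100 system.
NUM_GPUS = 8
TOTAL_PETAFLOPS = 5.0
TOTAL_MEMORY_GB = 320
TOTAL_BANDWIDTH_TBPS = 12.4  # terabytes per second


def per_gpu_share(total: float, num_gpus: int = NUM_GPUS) -> float:
    """Split a system-wide total evenly across GPUs (an assumption, not a spec)."""
    return total / num_gpus


# Convert petaflops to teraflops (1 petaflop = 1,000 teraflops) for readability.
print(f"Compute per GPU:   {per_gpu_share(TOTAL_PETAFLOPS) * 1000:.0f} teraflops")
print(f"Memory per GPU:    {per_gpu_share(TOTAL_MEMORY_GB):.0f} GB")
print(f"Bandwidth per GPU: {per_gpu_share(TOTAL_BANDWIDTH_TBPS):.2f} TB/s")
```

Under that even-split assumption, each GPU accounts for 40 GB of memory and roughly one-eighth of the compute and bandwidth totals.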
There are also six Nvidia NVSwitch interconnect fabrics, along with nine Nvidia Mellanox ConnectX-6 network interfaces that provide a total of 3.6 terabits per second of bidirectional bandwidth. Nvidia completed the purchase of Mellanox in April for $7 billion. Nvidia also uses the Mellanox in-network acceleration engines for high performance in the DGX A100.
Nvidia also provides the software for AI and data science workloads.
In a briefing with reporters, CEO Jensen Huang noted that the DGX is the first system to provide all the elements of machine learning, from data analytics to training to inference work. Many small workloads can be accelerated by partitioning the DGX A100 into as many as 56 computing instances, allowing enterprises and researchers to run different AI functions on demand for diverse workloads.
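The 56-instance figure lines up with the eight GPUs in the system: an even split works out to seven instances per GPU. The sketch below only illustrates that arithmetic. The `(gpu, slot)` numbering and both function names are hypothetical and are not part of any Nvidia tool or API.

```python
NUM_GPUS = 8
MAX_INSTANCES = 56  # upper limit quoted in the article


def instances_per_gpu(total_instances: int = MAX_INSTANCES,
                      num_gpus: int = NUM_GPUS) -> int:
    """Number of instances each GPU hosts, assuming an even split."""
    if total_instances % num_gpus != 0:
        raise ValueError("instances do not divide evenly across GPUs")
    return total_instances // num_gpus


def enumerate_slots(total_instances: int = MAX_INSTANCES,
                    num_gpus: int = NUM_GPUS) -> list[tuple[int, int]]:
    """List every (gpu_index, slot_index) pair, a hypothetical labeling scheme."""
    per_gpu = instances_per_gpu(total_instances, num_gpus)
    return [(gpu, slot) for gpu in range(num_gpus) for slot in range(per_gpu)]


slots = enumerate_slots()
print(f"Instances per GPU: {instances_per_gpu()}")
print(f"Total slots:       {len(slots)}")
```

Each slot in this listing could host one small workload, which is the sense in which many small jobs can run side by side on a single system.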
Huang said the entire system starts at $199,000, noting that Dell Technologies, IBM and three other storage providers plan to integrate the DGX A100 into their products.
Huang illustrated how a single rack of DGX A100 systems costing $1 million could replace an entire data center for AI training and inference costing $11 million, putting the rack at less than 10% of the cost. The DGX A100s would also use just 4% of the space of the data center and just 5% of the electricity.
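The cost comparison can be checked with simple arithmetic. This is a minimal sketch using only the dollar figures quoted above; the variable names are illustrative, and the space and power percentages are repeated as stated rather than derived.

```python
# Dollar figures quoted from Huang's comparison.
DGX_RACK_COST_USD = 1_000_000
DATA_CENTER_COST_USD = 11_000_000

# Percentages as stated in the article (not derived here).
SPACE_FRACTION = 0.04
POWER_FRACTION = 0.05

cost_fraction = DGX_RACK_COST_USD / DATA_CENTER_COST_USD
savings_usd = DATA_CENTER_COST_USD - DGX_RACK_COST_USD

print(f"Rack cost as share of data center: {cost_fraction:.1%}")
print(f"Up-front savings:                  ${savings_usd:,}")
print(f"Space used (as quoted):            {SPACE_FRACTION:.0%}")
print(f"Electricity used (as quoted):      {POWER_FRACTION:.0%}")
```

The ratio comes to about 9.1%, consistent with the "less than 10%" figure.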
“This is unquestionably the first time that the unified workload of an entire data center has been brought into one rack for video analytics, voice, and data processing,” Huang said. “The amount of savings is actually off the charts.”
At the University of Florida, the DGX A100 arrives as the university implements a plan to advance AI research and teaching. UF plans to hire 100 faculty specifically focused on AI in addition to 500 new faculty hired across disciplines, many of whom will then integrate AI into their teaching and research, the university said in a statement.
UF is already conducting AI research in areas such as the Intelligent ICU, which relies on video and image analysis of intensive care patients to recognize pain and sleep disruption. The university is also using AI to enable drug design for the COVID-19 virus and is researching an adaptive, personalized online education system.