Microsoft Azure extols virtues of Nvidia A100 chips for AI work

Microsoft Azure on Wednesday announced its first use of the Nvidia A100 GPU, appearing in a virtual machine series that offers up to 20 times more performance for artificial intelligence supercomputing work.

AI brawn has led to recent innovations, including natural language generation with enough creativity to write poetry. At Microsoft, natural language models already power various tasks in Bing, Word, Outlook and Dynamics.

In a blog, Microsoft called the new ND A100 v4 VM series “our most powerful and massively scalable AI VM, available on-demand from eight to thousands of interconnected Nvidia GPUs across hundreds of virtual machines.”

Google Cloud made the A100 available for customers in early July, and more than 50 A100-powered servers were announced in June from ASUS, Cisco, Dell Technologies, Hewlett Packard Enterprise and others.

Not to be outdone, Azure said its new A100 v4 clusters can scale to thousands of GPUs with an “unprecedented 1.6 Tb/s of interconnect bandwidth per virtual machine.” That means thousands of GPUs can work together as part of a Mellanox InfiniBand HDR cluster “to achieve any level of AI ambition,” exclaimed Ian Finder, senior program manager for accelerated HPC infrastructure at Azure. (Editor’s Note: Any level, really?)
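The 1.6 Tb/s figure is consistent with a simple back-of-the-envelope reading of the topology: eight GPUs per VM, each with its own HDR InfiniBand connection (HDR tops out at 200 Gb/s per port). This sketch assumes one dedicated link per GPU, which the article does not state explicitly:

```python
# Back-of-the-envelope check of the per-VM interconnect bandwidth.
# Assumption: one dedicated HDR InfiniBand link per GPU.
GPUS_PER_VM = 8
HDR_LINK_GBPS = 200  # HDR InfiniBand port speed in Gb/s

total_gbps = GPUS_PER_VM * HDR_LINK_GBPS
print(f"{total_gbps} Gb/s = {total_gbps / 1000} Tb/s per VM")
# prints "1600 Gb/s = 1.6 Tb/s per VM"
```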

Most customers will see a boost of up to 3x in compute performance over the previous generation, Finder added. A boost of up to 20x is possible with new A100 features such as multi-precision Tensor Cores with sparsity acceleration and Multi-Instance GPU (MIG).
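The sparsity acceleration Finder mentions targets a 2:4 structured pattern: in every group of four weights, two are zeroed, and the A100's Tensor Cores skip the zeros at compute time. A minimal stdlib-Python sketch of pruning a weight row to that pattern (the function and data here are illustrative, not any framework's API; real toolchains do this on GPU tensors):

```python
def prune_2_to_4(weights):
    """Zero the two smallest-magnitude values in each group of four.

    Mimics the 2:4 structured-sparsity pattern that A100 Tensor Cores
    can accelerate: at most two nonzero values per group of four.
    """
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude entries in this group.
        keep = sorted(range(len(group)),
                      key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned

print(prune_2_to_4([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]))
# prints [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the zero positions follow a fixed pattern, the hardware can store the sparse matrix compactly and halve the multiply work, which is where part of the claimed speedup comes from.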

He noted that the advantage of such large models is that they need to be trained only once, with massive amounts of data on AI supercomputing infrastructure, then fine-tuned for different domains with smaller datasets. In natural language generation, for example, that approach yields a model that understands language well enough to summarize a document, or answer questions about it, the first time it sees it, Finder said.

Azure has been supporting OpenAI in its quest to build safe artificial general intelligence. In May, OpenAI showed that its GPT-3 model could handle tasks it was not specifically trained for, such as writing poetry or translation. GPT-3 has even been shown to generate samples of news articles that human evaluators have difficulty distinguishing from articles written by humans. (NOTE: This article was written by a human.)

OpenAI CEO Sam Altman said the AI quest toward general intelligence requires powerful systems that train more capable models. “The computing capability required was just not possible until recently,” he said in a statement, crediting Azure AI with helping accelerate OpenAI’s progress.

Azure’s ND A100 v4 VM series and clusters are in preview and will become a standard offering, although no timeline was announced.

RELATED: Nvidia names 12 companies to offer servers with A100 GPUs