Huang posits Nvidia has moved well beyond GPUs, but isn't a CSP

At GTC2024, Nvidia introduced new chips, software and important advances in how AI works, moving the nearly 31-year-old company, originally devoted to 3D graphics (mainly for gaming), toward a broader AI platform.

Nvidia CEO Jensen Huang loves to speak in metaphors, and while he is not a modern-day Dylan Thomas, he is always engaging and sufficiently professorial. His keynote before 11,000 developers in person at the SAP Center was almost a rock concert. He also made revealing remarks in a separate wide-ranging press conference the next day with hundreds of international reporters and in another interview with Jim Cramer on CNBC. Some of his commentary offered new definitions of technology terms, delivered with his trademark precision, and clarified where Nvidia sits in the emerging AI world.

1. Nvidia is a software company, Huang said. Or more precisely, he told Cramer, “Everything we do starts with software, in the service of software developers and solving these really difficult problems.” In his keynote, he said, “We are effectively an AI foundry,” drawing on the metaphor of melting metals into something stronger and better (and providing a little nudge at Intel Foundry, perhaps?). “Today, our killer app is AI.”

Huang argues that software creates the need for all the GPUs and other components Nvidia makes, and that this is what allows the company to lead the market for AI chip accelerators. This is perhaps counter-intuitive, but software is essential to everything being done with hardware. Many reporters and analysts in the AI space prefer not to cover software, maybe because it is often soft-edged and flexible, while a computer processor is hard-edged, concrete, defined by speeds and feeds and dimensions in space, and something you can hold in the hand.

So no, Nvidia is not just a software company. It is far bigger, offering hardware platforms like the powerful new Blackwell platform with its various GPU versions, augmented by services like the new NIM, which is available across hardware. Nvidia is more properly a “computing platform company,” he said at various points at GTC. Still, as a reminder, Huang harkened back to CUDA, a parallel computing platform and programming model for GPUs that Nvidia introduced all the way back in 2006.

But there are some things Nvidia is not…

2. Nvidia is not a cloud service provider. This statement might seem obvious, but maybe not to everyone. If the hyperscalers are designing custom AI accelerator chips, which is also what Nvidia does, could it be that Nvidia is positioning itself to be a cloud provider? That is what one reporter asked Huang. His response, boiled down for clarity, went like this: “We work with the cloud service providers to put the Nvidia cloud in their cloud. We’re not a cloud computing company. Our cloud is called DGX Cloud, but we’re actually in their cloud…Our goal is to bring customers to their cloud.”

He elaborated that in addition to HGX for Dell servers, Nvidia is bringing DGX Cloud to CSPs like Azure. “We will develop the software and we will develop and cultivate developers and we will create demand for the CSPs that use our architecture. And it has nothing to do with anybody’s chips. It has everything to do with Nvidia being a computing platform company. A computing platform company has to develop its own developers. That’s the reason why GTC exists: a developers conference.”

Just in case you were wondering.

(Some analysts also see the creation of the NIM runtime, which runs on Nvidia hardware, as a play to collect fees for software, bringing Nvidia perhaps closer to some of what a CSP does.)

3. Huang's take on tokens and data centers. Like a good science teacher, Huang stopped in his keynote to describe what’s meant by a token with a fun example. But he also redefined “data center.” In the shift from general-purpose computing to accelerated computing, he said, generative AI has come about: “Because of generative AI, a new type of instrument has emerged and this new instrument is an AI generator. Some people call it a data center, but a data center is used by a whole lot of people…But in the case of generative AI, it is doing one thing. It’s processing one thing for one person, one company and it’s producing AI. It’s producing tokens.

“When you’re interacting with ChatGPT, that’s revolutionary AI and when you’re interacting with ChatGPT, it’s generating tokens. It’s generating floating point numbers and these floating point numbers become words, images, or sounds in the future. Proteins, chemicals, computer-animated machines and robots animating a machine are no different than speaking. If a computer can speak, why can’t it animate a machine? That’s why we say there is an industrial revolution happening, because it’s new. And this new industry creates rooms, these buildings. I call them AI factories.”

In computer programs, a token is usually defined as the smallest element in a program that is meaningful to the functioning of a compiler, potentially every punctuation mark or word you come across. In large language models, a token is a unit of text, often a fragment of a word, that the model reads and generates. Huang gave a couple of examples, meant partly to be amusing, that seem to show he describes a token a little differently. “Three tokens is about a word. ‘Space: the final frontier’ is 80 tokens,” he said. He then smiled and cracked that maybe the audience of 11,000 developers and others didn’t relate to the older Trekkie reference.
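To make the compiler-style definition concrete, here is a minimal Python sketch that splits a phrase into words and punctuation marks and counts them. The function name `lexical_tokens` is made up for this illustration. It is not how a large language model tokenizes text: those systems typically use learned subword vocabularies, so their counts will differ from this one.

```python
import re

def lexical_tokens(text: str) -> list[str]:
    """Split text into word tokens and single punctuation tokens.

    Illustrative only: this mimics a simple lexer, not an LLM tokenizer.
    """
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = lexical_tokens("Space: the final frontier")
print(tokens)       # ['Space', ':', 'the', 'final', 'frontier']
print(len(tokens))  # 5
```

Under this simple word-and-punctuation definition the phrase comes to five tokens, which is one way to see that Huang's figures describe a different kind of token.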

4. For many people it is hard to see how generative AI can be applied to robots. Huang explained a critical point in how a generative AI tool can move from responding to a human’s question to becoming the means of animating a robot or other machine. “If a computer can speak, why can’t it animate a machine?” he asked. In other words, if a computer is trained on many millions of books written in English at the rate of one book ingested in a split second, what it has done is convert that text into a digital form, a group of numbers. The same kind of digits that encode words, sentences and logical thought can, in a different sequence, control the movement of a robot arm or the steering in an autonomous car.
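The idea that words and machine movements both reduce to sequences of numbers can be sketched in a few lines of Python. Everything here is a toy assumption for illustration: the helper names `build_vocab` and `encode`, the sample sentence and the gesture names are invented, and real systems use far larger learned vocabularies rather than first-seen integer IDs.

```python
def build_vocab(items):
    """Assign each distinct item an integer ID in first-seen order."""
    vocab = {}
    for item in items:
        vocab.setdefault(item, len(vocab))
    return vocab

def encode(sequence, vocab):
    """Replace every item in the sequence with its integer ID."""
    return [vocab[item] for item in sequence]

# A sentence and a (hypothetical) series of robot-arm gestures.
sentence = "make coffee then make an omelet".split()
gestures = ["reach", "grasp", "lift", "pour", "reach", "release"]

word_ids = encode(sentence, build_vocab(sentence))
gesture_ids = encode(gestures, build_vocab(gestures))

print(word_ids)     # [0, 1, 2, 0, 3, 4]
print(gesture_ids)  # [0, 1, 2, 3, 0, 4]
```

Once encoded, the two lists have the same form, a sequence of integers, and nothing in the numbers themselves says which one came from language and which from motion.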

The transition from an LLM that ingests a book to help answer inquiries at the inference stage to one that serves as the basis for controlling a robot is complex, but here’s how Huang explained the process when asked: “The large language model is operating in a completely unconstrained world. It’s the unstructured world, which is one of the problems. It learned from a lot of text. So the ability for large language models, these foundation models, to generalize is the magic.”

He then described how a user with an LLM might give a model context on how to help a robot prep an omelet. “You specify the problem; you specify the context. These are the only utensils you can work with, and you don’t have butter, ok? And…everything is in the fridge. You describe the context just like you’re doing with large language models and this robot should be able to generalize sufficiently if you apply some of the magic that you’ve already seen with ChatGPT. That’s what I mean by the ChatGPT moment might be around the corner. There’s still a lot of great [things] that have to be solved, but you can see the extrapolation of it and that this robot can generate tokens and the tokens that the robots generate with the largest language model.”

He continued to explain that computer scientists will tokenize all the gestures a robot might make. “Once they tokenize all of these gestures, they will generalize it just as you’ve tokenized the words, generalized it, contextualized it. And then the last part is grounding it [with] human feedback in ChatGPT. You would give it a whole bunch of examples, Q&A, and examples of appropriate answers in philosophy, in chemistry, in math and just human-appropriate Q and A pairs that are really, really well-crafted and they’re not normal. It’s really hard work what they’ve done in ChatGPT. Now, what’s the analogy here?

“Human examples. So, let me show you how to make coffee. It’s a very well-articulated example that the robot then says, oh, I get it. Well, let me generalize that. Do you mean if I move this a little bit here that is still same activity? Yep. Make coffee.

“The only reason why we can’t see the [analogy of ChatGPT to robotic movements] is because somehow in our brain we can’t disassociate the difference between words and robotics movements. That’s the only reason. That’s the only blockage. And if I told you, to the computer, they’re both just numbers; it doesn’t know the difference, not even a little bit. Then all of a sudden you say, wow, that’s interesting. It might be possible.”

5. Potential perils of AI did not come up in Huang’s keynote, but one reporter wondered if the Blackwell platform might be so powerful that it could help speed up the arrival of artificial general intelligence (AGI), which Huang has previously said could arrive in five years. Jim Cramer called Huang the modern-day da Vinci, but the reporter wondered whether Huang could also be the modern-day Oppenheimer. Huang’s response was classic, but read past his initial retort for his insight:

“Oppenheimer was building a bomb. We’re not doing that. First of all define AGI, because right now…I’m certain everybody is having a hard time trying to do that.” He went on to say that if AGI is defined in a very specific way, computers could reach it “probably” in five years.

Here’s the context he provided:

“I would like you to define an AGI specifically so that every one of us knows when we have arrived. Like for example, define where is Santa Clara? Its geospatial location is so specific. All of you know when you've arrived. Define New Year. All of us know when New Year arrives even based on our time zone that we know we have arrived. Do you guys agree? Okay, but AGI is a little different. However, if I specified, if we specified AGI to be something very specific, meaning a large collection of tests, math tests, reading tests, reading comprehension tests, logic tests, medical exams, bar exams, economy tests, can you guys see this? Okay, GMATs, SATs, you name it—a bunch of tests. If I take a bunch of tests, I collect up all these tests, and I say the definition of AGI is when this set of tests a software program can do very well, meaning let's pick 80% and better than most people, better than in fact almost all the people.

“If I say that, do you think a computer can do that in five years? The answer is probably yes. And so that specification of AGI, every time I answer this question, I specify it, but every time it gets reported, nobody specifies it. And so it just depends on what your goals are. My goal is to communicate with you. Your goal is to figure out what story you want to tell. Okay? And so, I believe that AGI, as I specified it, is probably within five years. AGI as in the three words, artificial general intelligence… I have no idea how we specify each other. That's why we have so many different words for each other's intelligence.”
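Huang's test-based specification amounts to a simple pass/fail rule, which can be sketched as follows. This is one possible reading, not his definition: the function names are invented, the 80% figure is the one he picked as an example, and the scores are made-up placeholders rather than results from any real system. The sketch also leaves out the "better than most people" comparison, which would need human baseline data.

```python
def failing_tests(scores, threshold=0.80):
    """Return the names of tests scored below the threshold, in sorted order."""
    return sorted(name for name, score in scores.items() if score < threshold)

def meets_test_based_spec(scores, threshold=0.80):
    """True only if there is at least one test and every score meets the threshold."""
    return bool(scores) and not failing_tests(scores, threshold)

# Placeholder scores invented for illustration only.
placeholder_scores = {
    "math": 0.91,
    "reading_comprehension": 0.88,
    "logic": 0.79,
    "bar_exam": 0.84,
}

print(meets_test_based_spec(placeholder_scores))  # False
print(failing_tests(placeholder_scores))          # ['logic']
```

The point of the sketch is the one Huang makes: with a fixed list of tests and a fixed threshold, everyone can tell when the bar has been cleared, which is not true of the three words "artificial general intelligence" on their own.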
