Nvidia’s robotic Project GROOT pairs Apptronik humanoid with gen AI

By Matt Hamblen Mar 22, 2024 5:01pm

Nvidia GTC2024 was alive with an expo hall full of vendors in the San Jose Convention Center offering variations on AI. Among the exhibits were humanoid robots, including one being piloted as a warehouse assistant at a Mercedes-Benz factory.

Apptronik, based in Austin, Texas, showed off a 160-pound, 5-foot-8 humanoid called Apollo. Mercedes-Benz is piloting Apollo at a factory in Hungary, where it brings parts to the production line for workers to assemble and also inspects the components, according to an official at GTC. Apollo stood at the edge of Nvidia’s large booth in the expo hall, alongside another humanoid called Digit from Agility Robotics.

Neither of the two humanoids was moving or walking on the show floor, but Nvidia named both, along with several other humanoids, as coming from companies ready to take the next step toward Nvidia’s ambitious Project GROOT, which is designed to help humanoids quickly learn new tasks from human demonstrations.

Onstage in his GTC2024 keynote, Nvidia CEO Jensen Huang separately introduced Project GROOT as a means of adding generative AI components to the company’s existing hardware and software platform for robots. The software includes significant upgrades to the Isaac robotics platform (available in the second quarter) and works with a powerful Jetson Thor computer based on the Thor SoC. Jetson Thor will have sufficient horsepower to help humanoids perform complex tasks and interact with people and machines. The upgrades to Isaac include generative AI foundation models and tools for simulation and AI workflow infrastructure, Nvidia said. (Thor is also used in autonomous vehicle development, Huang said.)

GROOT features are designed to work with any robot embodiment in any environment, Nvidia said. They include support for reinforcement learning, in which the software is trained to make better decisions.

GROOT stands for Generalist Robot 00 Technology. Robots will use it to understand natural language and to emulate movements by observing human actions, learning the coordination, dexterity and other skills needed to navigate and adapt to the real world. The Jetson Thor SoC includes a GPU based on the Nvidia Blackwell architecture with a transformer engine that delivers 800 teraflops of 8-bit floating point AI performance to run multimodal generative AI models like GROOT. It also has a CPU cluster and 100 GB of Ethernet bandwidth.

Nvidia said it is building a comprehensive AI platform for various humanoid companies including 1X Technologies, Agility Robotics, Apptronik, Boston Dynamics, Figure AI, Fourier Intelligence, Sanctuary AI, Unitree Robotics and XPENG Robotics.

Apptronik CEO Jeffrey Cardenas with Apollo at GTC2024 (Matt Hamblen)

In a press release, Apptronik said GROOT will be integrated with Apollo to “enable developers to take text, video and human demonstrations as task prompts, learn generalizable skills like coordination and dexterity and generate actions as output on the robot hardware. Instead of simply repeating the actions in the training data, Apollo will recognize the environment and predict what to do next to achieve its goal.”

Toward the end of Huang’s keynote, in a short video (at 1:51 into the keynote), Apollo operated a juicer and prepped the juice to serve to a person, skills it had learned. Apollo and other humanoids will learn via models and simulations in Omniverse Isaac Sim, which is scaled out with Nvidia Osmo, a compute orchestration service that coordinates workflows across DGX systems for training and OVX systems for simulation. The humanoids will also learn via human demonstrations of tasks.

Connecting GROOT to a large language model allows it to generate motions by following natural language instructions. In one short example in the same video, an engineer verbally asks a humanoid called GR-1 to give him a high five, and the robot complies, responding, “Sure thing, let’s high five.” (Nvidia engineers Yuke Zhu and Jim Fan, who helped develop the technology, appeared at the Nvidia booth.)

Huang described on several occasions at GTC how generative AI will leap from producing drawings, video, text and code in response to human instructions given to an LLM to producing instructions that robots can recognize and carry out.

“If a computer can speak, why can’t it animate a machine?” he asked in a session with reporters. To a computer, there’s no difference between offering up words and offering up robotic movements. “To the computer, they’re both just numbers; it doesn’t know the difference, not even a little bit.”

More robots

Robots of other varieties were on display at GTC2024, including quadrupeds and mobile robots that would assume the role of forklifts in warehouse operations. Nvidia engineer David Hoeller directed a quadruped robot dog to climb over a three-foot-high box.
