Nvidia says it has a quicker, cheaper way to train robotic hands

While robots can increasingly walk, run and climb over obstacles with an ease that is both amazing and disconcerting, it has been much harder for robotic systems to imitate the fine motor skills of the human hand, which most of us take for granted.

Deep reinforcement learning (RL) techniques, which train a neural network by trial and error to control the joints of a robot, offer one way to advance these capabilities. However, the human hand has 27 joints, and deep RL for a task of that complexity requires billions of training samples, making it impractical to gather that experience on a physical robot.
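For readers unfamiliar with the approach, the sketch below shows the basic trial-and-error loop deep RL uses to train a control policy, with a toy 27-joint stand-in for a real hand environment. The network size, reward, and placeholder environment are illustrative assumptions, not details of Nvidia's setup.

```python
# Schematic of the deep RL trial-and-error loop: a policy network proposes
# joint actions, the environment scores them, and the policy is nudged toward
# higher-reward behavior. The toy environment and reward are placeholders.
import torch
import torch.nn as nn

NUM_JOINTS = 27           # roughly the joint count of a human hand
OBS_DIM = NUM_JOINTS * 2  # e.g. joint positions + velocities

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, NUM_JOINTS),
        )
        self.log_std = nn.Parameter(torch.zeros(NUM_JOINTS))

    def forward(self, obs):
        mean = self.net(obs)
        return torch.distributions.Normal(mean, self.log_std.exp())

def toy_env_step(action):
    # Placeholder physics: rewards actions close to zero joint effort.
    obs = torch.randn(OBS_DIM)
    reward = -action.pow(2).mean()
    return obs, reward

policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

obs = torch.randn(OBS_DIM)
for step in range(1000):        # real training needs billions of samples
    dist = policy(obs)
    action = dist.sample()
    obs, reward = toy_env_step(action)
    # REINFORCE-style update: raise the log-probability of rewarded actions.
    loss = -(dist.log_prob(action).sum() * reward)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```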

Nvidia believes Isaac Gym, which the company has described as a GPU-accelerated “physics simulation environment for reinforcement learning,” can help. The simulation environment is related to the company’s Isaac Sim robotics simulator, and Nvidia said this week that Isaac Gym, now available in preview release, enables robots to be trained inside a simulated universe that can run more than 10,000 times faster than the real world.
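As a rough idea of what setting up such a GPU-resident simulation looks like, here is a minimal sketch based on the Isaac Gym preview API. The environment count, spacing, and exact option values are assumptions drawn from the public preview release, not from the DeXtreme code.

```python
# Minimal sketch of a GPU-accelerated simulation with the Isaac Gym preview
# release; parameter values are illustrative and option names may differ
# slightly between preview versions.
from isaacgym import gymapi

gym = gymapi.acquire_gym()

sim_params = gymapi.SimParams()
sim_params.dt = 1.0 / 60.0          # simulation timestep
sim_params.use_gpu_pipeline = True  # keep simulation state in GPU memory
sim_params.physx.use_gpu = True     # run PhysX on the GPU

# Compute and graphics both on GPU 0, with PhysX as the physics backend.
sim = gym.create_sim(0, 0, gymapi.SIM_PHYSX, sim_params)
gym.add_ground(sim, gymapi.PlaneParams())

# Thousands of lightweight environments are stepped in parallel on one GPU.
spacing = 1.0
lower = gymapi.Vec3(-spacing, -spacing, 0.0)
upper = gymapi.Vec3(spacing, spacing, spacing)
envs = [gym.create_env(sim, lower, upper, 64) for _ in range(4096)]

gym.prepare_sim(sim)   # required before stepping with the GPU pipeline

for _ in range(60):    # advance all environments together
    gym.simulate(sim)
    gym.fetch_results(sim, True)
```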

Researchers from Nvidia launched the DeXtreme project to use Isaac Gym to teach a robotic hand to manipulate a cube so that it matches a given target position and orientation, or pose. The neural network “brain” learned to do this entirely within the Isaac Gym simulation environment before being put to work on the physical robotic hand in the real world.
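The policy’s objective is to bring the cube to the commanded pose. The snippet below sketches one plausible form of such a pose-matching reward, a weighted sum of position error and orientation error; the weights, function name, and exact error terms are illustrative assumptions, not the reward DeXtreme actually uses.

```python
# Schematic reward for matching a cube's target pose: penalize position error
# and orientation error (angle between current and target quaternions). This
# is an illustrative stand-in, not the reward used by DeXtreme.
import torch

def pose_matching_reward(pos, quat, target_pos, target_quat,
                         pos_weight=1.0, rot_weight=0.5):
    pos_err = torch.norm(pos - target_pos, dim=-1)
    # Angle between orientations: 2 * acos(|q1 . q2|), clamped for stability.
    dot = torch.sum(quat * target_quat, dim=-1).abs().clamp(max=1.0)
    rot_err = 2.0 * torch.acos(dot)
    return -(pos_weight * pos_err + rot_weight * rot_err)
```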

Nvidia noted that OpenAI researchers have accomplished something similar, but observed that OpenAI’s project required a supercomputing cluster of hundreds of computers for training, a far more sophisticated and expensive robot hand, and a cube tricked out with precise motion-control sensors. The robotic hand Nvidia used, an Allegro Hand, was chosen for its affordability and simplicity: it has four fingers instead of five and no moving wrist. The company made these attributes a priority because it wants as many other researchers as possible to be able to affordably replicate the DeXtreme project’s results.

An Nvidia blog post noted that the GPU-accelerated simulation costs far less than running the same simulation on a CPU cluster. The post added that the company’s PhysX software “simulates the world on the GPU, and results stay in GPU memory during the training of the deep learning control policy network. As a result, training can happen on a single Omniverse OVX server. Training a good policy takes about 32 hours on this system, equivalent to 42 years of a single robot's experience in the real world.”
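Those two figures line up with the speed-up quoted earlier; a quick back-of-the-envelope check using only the numbers in the post:

```python
# Back-of-the-envelope check of the figures quoted above: 32 wall-clock hours
# of training equated to 42 years of single-robot experience implies an
# aggregate simulation throughput of a bit over 10,000x real time.
hours_of_training = 32
years_of_experience = 42
hours_per_year = 24 * 365
aggregate_speedup = years_of_experience * hours_per_year / hours_of_training
print(f"Implied aggregate speed-up: {aggregate_speedup:,.0f}x real time")
# -> roughly 11,500x, consistent with "more than 10,000 times faster"
```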