Researchers Make Robotic Smart Picking Accessible

As another step toward enabling robots to work effectively in complex environments, robotics researchers from NVIDIA have developed a novel deep learning-based system that allows a robot to perceive household objects in its environment so that it can grasp and interact with them. With this technique, the robot can perform simple pick-and-place operations on known household objects, such as handing an object to a person or grasping an object out of a person’s hand.


The research allows robots to precisely infer the pose of objects around them from a standard RGB camera. Knowing the 3D position and orientation of objects in a scene, often referred to as 6-DoF (degrees of freedom) pose, is critical, as it allows robots to manipulate objects even when those objects are not in the same place every time.
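To make the idea of a 6-DoF pose concrete, here is a minimal sketch, not the researchers' code, of how a pose (three degrees of freedom for position, three for orientation) maps a point on an object into the camera's frame. The function names, the pose values, and the grasp point are all hypothetical, and for brevity the rotation is limited to a single axis; a full pose would allow rotation about all three.

```python
import math

def rotation_z(yaw_rad):
    """Return a 3x3 rotation matrix for a rotation about the z axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def transform_point(rotation, translation, point):
    """Map a point from the object frame to the camera frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Hypothetical pose: object turned 90 degrees about z, sitting 0.30 m to the
# side of and 0.50 m in front of the camera (values invented for illustration).
rotation = rotation_z(math.pi / 2)
translation = (0.30, 0.00, 0.50)

# Hypothetical grasp point, 5 cm along the object's own x axis.
grasp_in_object = (0.05, 0.0, 0.0)
grasp_in_camera = transform_point(rotation, translation, grasp_in_object)
print(grasp_in_camera)
```

Because the grasp point is defined relative to the object, the same point can be reused wherever the object ends up; only the estimated pose changes.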



The algorithm aims to solve a disconnect between computer vision and robotics, namely, that most robots currently do not have the perception they need to handle disturbances in the environment. This work is important because it is the first time in computer vision that an algorithm trained only on synthetic data (generated by a computer) can beat a state-of-the-art network trained on real images for object pose estimation on several objects of a standard benchmark. Synthetic data has the advantage over real data in that it is possible to generate an almost unlimited amount of labeled training data for deep neural networks.


“We want robots to be able to interact with their environment in a safe and skillful manner,” said Stan Birchfield, a Principal Research Scientist at NVIDIA. “With our algorithm, and a single image, a robot can infer the 3D pose of an object for grasping and manipulating it.”


“Most industrial robots being sold today lack perception; they don’t really have a sense of the world around them,” Birchfield explained. “We’re laying the groundwork for the next generation robot, and we’re a step closer to collaborative robots with this work.”


Using NVIDIA Tesla V100 GPUs on a DGX Station, with the cuDNN-accelerated PyTorch deep learning framework, the researchers trained a deep neural network on synthetic data generated by a custom plugin developed by NVIDIA for Unreal Engine. This plugin is publicly available for other researchers to use.


“Specifically, we use a combination of non-photorealistic domain randomized (DR) data and photorealistic data to leverage the strengths of both,” the researchers stated in their paper. “These two types of data complement one another, yielding results that are much better than those achieved by either alone. Synthetic data has an additional advantage in that it avoids overfitting to a particular dataset distribution, thus producing a network that is robust to lighting changes, camera variations, and backgrounds.”

Example images from the domain randomized (left) and photorealistic (right) datasets used for training.
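The idea of combining the two kinds of synthetic data can be sketched as follows. This is only an illustration under stated assumptions: the article does not give the mixing ratio or the sampling scheme the researchers used, so the function `build_training_mix`, its `photo_fraction` parameter, and the placeholder sample lists are all hypothetical.

```python
import random

def build_training_mix(dr_samples, photo_samples, size, photo_fraction=0.5, seed=0):
    """Draw a shuffled training set from two synthetic sources.

    photo_fraction is an assumed knob; the article does not state the ratio
    of photorealistic to domain randomized data used in the actual work.
    """
    if not 0.0 <= photo_fraction <= 1.0:
        raise ValueError("photo_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_photo = round(size * photo_fraction)
    n_dr = size - n_photo
    # Sample with replacement from each source, then interleave by shuffling.
    mix = rng.choices(photo_samples, k=n_photo) + rng.choices(dr_samples, k=n_dr)
    rng.shuffle(mix)
    return mix

# Placeholder samples standing in for rendered images and their pose labels.
dr = [("dr", i) for i in range(100)]
photo = [("photo", i) for i in range(100)]

mix = build_training_mix(dr, photo, size=10, photo_fraction=0.4)
print(mix)
```

In a real training pipeline the tuples would be replaced by image and label pairs, and the fraction would be tuned experimentally.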


The NVIDIA team consisted of researchers Jonathan Tremblay, Thang To, Balakumar Sundaralingam, Yu Xiang, Dieter Fox, and Stan Birchfield. More insights are available in their paper, “Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects,” and in the accompanying video.

