Using a robot as a last-mile delivery vehicle may become a reality if the robot can find the door. Standard robotic navigation approaches involve mapping an area ahead of time, then using algorithms to guide a robot toward a specific goal or GPS coordinate on the map. While this approach might make sense for exploring specific environments, it can become unwieldy in the context of last-mile delivery.
Now, MIT engineers have developed a navigation method that doesn’t require mapping an area in advance. Instead, their approach enables a robot to use clues in its environment to plan out a route to its destination, which can be described in general semantic terms, such as “front door” or “garage,” rather than as coordinates on a map. For example, asking a robot to deliver a package to someone’s front door would require recognizing a sidewalk, a driveway, and then a front door. The technique can greatly reduce the time a robot spends exploring a property before identifying its target, and it doesn’t rely on maps of specific residences.
“We wouldn’t want to have to make a map of every building that we’d need to visit,” said Michael Everett, a graduate student in MIT’s Department of Mechanical Engineering, in an article appearing on the university’s website. “With this technique, we hope to drop a robot at the end of any driveway and have it find a door.”
Everett will present the group’s results this week at the International Conference on Intelligent Robots and Systems. The paper, which is co-authored by Jonathan How, professor of aeronautics and astronautics at MIT, and Justin Miller of the Ford Motor Company, is a finalist for “Best Paper for Cognitive Robots.”
The researchers are building on a recent trend of using natural, semantic language to train robotic systems to recognize objects and locations. The approach leverages preexisting algorithms that extract features from visual data to generate a new map of the same scene, represented as semantic clues, or context. For instance, a door would be visually processed as a door and not simply as a solid, rectangular obstacle.
In their case, the researchers used an algorithm to create a map of the environment as the robot moved around, using the semantic labels of each object and a depth image. This algorithm is called semantic SLAM (Simultaneous Localization and Mapping).
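As a rough illustration of how semantic labels and depth readings can be combined into a map, the sketch below writes each observed label into the grid cell where its camera ray ends. It is a simplified assumption, not the researchers’ implementation: the function name, the label strings, and the ray format are hypothetical, and the robot’s pose is taken as known, whereas a real SLAM system estimates the pose at the same time as it builds the map.

```python
import math

def update_semantic_map(sem_map, pose, bearings, depths, labels, cell_size=1.0):
    """Write each observed label into the map cell where its ray ends.

    sem_map  : dict mapping (col, row) grid cells to a label string
    pose     : (x, y, heading) in meters and radians; ASSUMED known here,
               although a real SLAM system would estimate it
    bearings : ray angles relative to the heading, in radians
    depths   : measured distance along each ray, in meters
    labels   : semantic label seen along each ray (hypothetical strings)
    """
    x, y, heading = pose
    for bearing, depth, label in zip(bearings, depths, labels):
        # End point of the ray in world coordinates.
        wx = x + depth * math.cos(heading + bearing)
        wy = y + depth * math.sin(heading + bearing)
        # Discretize the end point into a grid cell.
        cell = (math.floor(wx / cell_size), math.floor(wy / cell_size))
        sem_map[cell] = label
    return sem_map

# Toy example: one ray straight ahead and one to the left.
toy_map = update_semantic_map(
    {},
    pose=(0.5, 0.5, 0.0),
    bearings=[0.0, math.pi / 2],
    depths=[2.0, 3.0],
    labels=["driveway", "front door"],
)
```

Calling the function again from new poses would gradually fill in the map as the robot moves, which is the general idea the article describes.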
While other semantic algorithms have enabled robots to recognize and map objects in their environment for what they are, they haven’t allowed a robot to decide in the moment, while navigating a new environment, which path to a specific semantic destination is most efficient.
The researchers developed a new “cost-to-go estimator,” an algorithm that converts a semantic map created by preexisting SLAM algorithms into a second map, representing the likelihood of any given location being close to the goal.
“This was inspired by image-to-image translation, where you take a picture of a cat and make it look like a dog,” Everett said. “The same type of idea happens here where you take one image that looks like a map of the world, and turn it into this other image that looks like the map of the world but now is colored based on how close different points of the map are to the end goal.”
This cost-to-go map is rendered in grayscale, with darker regions marking locations far from the goal and lighter regions marking locations close to it. The researchers trained the algorithm on satellite images from Bing Maps containing 77 houses from one urban and three suburban neighborhoods. The system converted a semantic map into a cost-to-go map, then mapped out the most efficient path to the goal by following the lighter regions of the map.
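To make the idea of a cost-to-go map concrete, here is a minimal sketch on a toy grid. It is not the researchers’ method: their estimator is a learned image-to-image translation, while this example assumes the whole yard is already known and computes distances to the goal with a breadth-first search, then follows lighter cells. The grid, the label characters, and the function names are all hypothetical.

```python
from collections import deque

# Hypothetical semantic labels for a toy yard, one character per cell:
# '#' = obstacle, 'S' = sidewalk, 'D' = driveway, 'P' = path, 'G' = goal (front door).
GRID = [
    "SSSSS",
    "#D###",
    "#D###",
    "#PPG#",
]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def cost_to_go(grid, goal="G", blocked="#"):
    """Breadth-first search outward from the goal; returns {(row, col): steps}."""
    rows, cols = len(grid), len(grid[0])
    dist = {(r, c): 0 for r in range(rows) for c in range(cols) if grid[r][c] == goal}
    queue = deque(dist)
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != blocked and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def to_grayscale(dist, rows, cols):
    """Scale distances to 0..255 so lighter means closer to the goal.
    Unreachable cells, and the farthest reachable cell, are 0 (black)."""
    max_d = max(dist.values()) or 1
    img = [[0] * cols for _ in range(rows)]
    for (r, c), d in dist.items():
        img[r][c] = round(255 * (1 - d / max_d))
    return img

def follow_lighter(img, start):
    """Greedily step to the lightest 4-neighbor until the goal (255) is reached."""
    rows, cols = len(img), len(img[0])
    path, (r, c) = [start], start
    while img[r][c] < 255:
        neighbors = [(r + dr, c + dc) for dr, dc in STEPS
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
        nr, nc = max(neighbors, key=lambda p: img[p[0]][p[1]])
        if img[nr][nc] <= img[r][c]:
            break  # no lighter neighbor; stop instead of looping forever
        r, c = nr, nc
        path.append((r, c))
    return path

dist = cost_to_go(GRID)
image = to_grayscale(dist, len(GRID), len(GRID[0]))
path = follow_lighter(image, start=(0, 4))  # start at the far end of the sidewalk
```

In this toy yard the path runs along the sidewalk, down the driveway, and across the path to the door. The point of a learned estimator is to predict a map like `image` from a partial view, when distances cannot be computed directly because most of the yard has not been seen yet.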
During training, the team also applied masks to each image to mimic the partial view that a robot’s camera would likely have as it traverses a yard. “Part of the trick to our approach was [giving the system] lots of partial images,” How said. “So, it really had to figure out how all this stuff was interrelated. That’s part of what makes this work robustly.”
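The masking step can be sketched in a few lines. The example below hides everything outside a rectangular window of a toy semantic map, then draws several random windows to produce partial views. The rectangular mask shape, the `?` label for unobserved cells, and the function names are assumptions made for illustration; the article does not say what kind of masks the team used.

```python
import random

UNKNOWN = "?"  # hypothetical label for cells the robot has not observed

def mask_outside_window(grid, top, left, height, width):
    """Keep cells inside the window; replace every other cell with UNKNOWN."""
    masked = []
    for r, row in enumerate(grid):
        masked.append("".join(
            ch if top <= r < top + height and left <= c < left + width else UNKNOWN
            for c, ch in enumerate(row)
        ))
    return masked

def random_partial_views(grid, n, height, width, seed=0):
    """Generate n partial views by placing the window at random positions."""
    rng = random.Random(seed)  # seeded so the views are reproducible
    rows, cols = len(grid), len(grid[0])
    views = []
    for _ in range(n):
        top = rng.randint(0, rows - height)   # randint is inclusive on both ends
        left = rng.randint(0, cols - width)
        views.append(mask_outside_window(grid, top, left, height, width))
    return views

yard = ["SSS", "#D#", "#G#"]  # toy map: sidewalk, driveway, goal, obstacles
corner_view = mask_outside_window(yard, top=0, left=0, height=2, width=2)
views = random_partial_views(yard, n=5, height=2, width=2)
```

Each view would be paired with the cost-to-go map of the full yard during training, so the system has to infer where the door is likely to be from whatever fragment it can see.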
The researchers then tested their approach in simulation on an image of an entirely new house, outside the training dataset. They first used the preexisting SLAM algorithm to generate a semantic map, then applied their new cost-to-go estimator to generate a second map and a path to the goal, in this case the front door.
The cost-to-go technique found the front door far faster than classical navigation algorithms, which do not take context or semantics into account.
Everett said the results illustrate how robots can use context to efficiently locate a goal, even in unfamiliar, unmapped environments.
“Even if a robot is delivering a package to an environment it’s never been to, there might be clues that will be the same as other places it’s seen,” Everett said. “So, the world may be laid out a little differently, but there’s probably some things in common.”