Bringing pedestrian detection for autos to the next level

Studies from the National Highway Traffic Safety Administration (NHTSA) show that human error (e.g., speeding, fatigue, and drunk or distracted driving) causes 94 to 96 percent of all motor vehicle accidents. Equipping cars with advanced driver-assistance systems (ADAS) that anticipate and mitigate those driving errors will therefore likely reduce the number of traffic fatalities.

In pursuit of ever more powerful ADAS features, industry and academia are exploring a variety of sensor technologies and sensor-suite configurations, including video, ultrasound, radar, and lidar. But since no single sensor covers all needs, scenarios, and traffic/weather conditions, the next generation of powerful ADAS will likely combine several of these technologies.

In this article, Jan Aelterman and David Van Hamme from IPI (Imec’s Image Processing and Interpretation research group at Ghent University, Belgium) report on two of the latest breakthroughs that bring automotive pedestrian detection to the next level: radar-video sensor fusion and automatic tone mapping.

Radar and video: a particularly interesting sensor fusion match

The ability of tomorrow’s cars to detect road users and obstacles rapidly and accurately will be instrumental in reducing the number of traffic fatalities. Yet, no single sensor or perceptive system covers all needs, scenarios, and traffic/weather conditions.

Cameras, for instance, do not work well at night or in dazzling sunlight, while radar can get confused by reflective metal objects. But when combined, their respective strengths and weaknesses complement one another very well. Enter radar-video sensor fusion.

Sensor fusion enables the creation of an improved perceptive (3D) model of a vehicle’s surroundings, using a variety of sensory inputs. Based on that information and leveraging deep learning approaches, detected objects are classified into categories (e.g., cars, pedestrians, cyclists, buildings, sidewalks, etc.). In turn, those insights form the basis of the ADAS’ intelligent driving and anti-collision decisions.
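
As a purely illustrative sketch, the snippet below shows what a single entry in such a fused perceptive model might look like; the field names, units, and categories are assumptions made for the sake of example, not imec’s actual data model.

```python
from dataclasses import dataclass

# Hypothetical record for one object in the fused 3D model of the
# vehicle's surroundings (illustrative field names, not imec's API).
@dataclass
class FusedObject:
    position_m: tuple    # (x, y, z) in metres, relative to the ego vehicle
    velocity_mps: float  # radial velocity in m/s, contributed mainly by the radar
    bbox_px: tuple       # (left, top, right, bottom) image-plane box, from the camera
    category: str        # e.g. "pedestrian", "cyclist", "car", "building"
    confidence: float    # classifier confidence in [0, 1]

# Downstream ADAS logic would consume a list of such objects to take
# its intelligent driving and anti-collision decisions.
```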

Cooperative radar-video sensor fusion: the new kid on the block

Today’s most popular type of sensor fusion is called late fusion. It only fuses sensor data after each sensor has performed object detection and has made its own ‘decisions’ based on its own, limited collection of data. The main drawback of late fusion is that every sensor throws away all the data it deems irrelevant, so much of the potential of sensor fusion is lost. In practice, it might even cause a car to run into an object that stayed below each individual sensor’s detection threshold.
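
A minimal late-fusion sketch, assuming each sensor delivers only a thresholded list of 2D detections and that fusion boils down to nearest-neighbour matching (all names and values below are hypothetical):

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Detection:
    x: float      # lateral position (m)
    y: float      # longitudinal position (m)
    score: float  # the sensor's own confidence, in [0, 1]
    label: str    # e.g. "pedestrian"

def late_fusion(camera_dets, radar_dets, max_dist=1.5):
    """Fuse two per-sensor object lists by nearest-neighbour matching.
    Anything a sensor already discarded (because it fell below that
    sensor's own threshold) never reaches this function at all."""
    fused, matched = [], set()
    for cam in camera_dets:
        best = min(range(len(radar_dets)),
                   key=lambda i: hypot(cam.x - radar_dets[i].x, cam.y - radar_dets[i].y),
                   default=None)
        if best is not None and hypot(cam.x - radar_dets[best].x,
                                      cam.y - radar_dets[best].y) <= max_dist:
            # seen by both sensors: average positions, keep the higher confidence
            r = radar_dets[best]
            fused.append(Detection((cam.x + r.x) / 2, (cam.y + r.y) / 2,
                                   max(cam.score, r.score), cam.label))
            matched.add(best)
        else:
            fused.append(cam)  # camera-only detection
    fused += [r for i, r in enumerate(radar_dets) if i not in matched]  # radar-only
    return fused
```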

In contrast, early fusion (or low-level data fusion) combines all low-level data from every sensor in one intelligent system that sees everything. Consequently, however, it requires a great deal of computing power and massive bandwidth, including high-bandwidth links from every sensor to the system’s central processing engine.
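
By way of contrast, an early-fusion pipeline might stack the raw sensor data into a single input for one central detector. The sketch below assumes, purely for illustration, that the radar return map has already been projected onto the camera’s image plane:

```python
import numpy as np

# Hypothetical raw inputs: an RGB frame and a radar return map that has been
# projected onto the same image plane (an assumption made for illustration).
camera_frame = np.zeros((720, 1280, 3), dtype=np.float32)  # H x W x RGB
radar_map = np.zeros((720, 1280, 1), dtype=np.float32)     # H x W x reflection intensity

# Early fusion: stack everything into one tensor and feed a single detector,
# so no sensor gets to discard data beforehand. The price is bandwidth: the
# full radar map must reach the central processing engine for every frame.
fused_input = np.concatenate([camera_frame, radar_map], axis=-1)  # H x W x 4
# detections = central_detector(fused_input)  # one model sees all the raw data
```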

Fig 1: Late fusion: sensor data are fused after each individual sensor has performed object detection and has drawn its own ‘conclusions’. Source: imec.

Fig 2: Early fusion builds on all low-level data from every sensor – and combines those in one intelligent system that sees everything. Source: imec.


In response to these shortcomings, the concept of cooperative radar-video sensor fusion has been developed. It features a feedback loop, with the different sensors exchanging low-level or mid-level information to influence each other’s detection processing. If a car’s radar system suddenly picks up a strong reflection, for instance, the detection threshold of the on-board cameras is automatically adjusted to compensate. As such, a pedestrian who would otherwise be hard to detect is effectively spotted, without the system becoming overly sensitive and prone to false positives.
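
The snippet below sketches that feedback idea with hypothetical names and threshold values: a strong radar reflection locally relaxes the camera detector’s confidence threshold, so a weak camera candidate in that region survives while the rest of the image keeps its stricter cut-off.

```python
# Cooperative-fusion sketch (hypothetical names and values): the radar's
# low-level output feeds back into the camera pipeline and locally lowers
# its decision threshold, instead of each sensor deciding alone.

BASE_THRESHOLD = 0.60     # nominal camera confidence cut-off (assumed value)
RELAXED_THRESHOLD = 0.35  # cut-off inside regions the radar flags (assumed value)

def camera_threshold(region, radar_hits):
    """Return the confidence threshold to apply in an image region.
    `radar_hits` are regions where the radar reports strong reflections."""
    return RELAXED_THRESHOLD if region in radar_hits else BASE_THRESHOLD

def cooperative_filter(camera_candidates, radar_hits):
    """Keep camera candidates that clear the (possibly relaxed) local threshold."""
    return [c for c in camera_candidates
            if c["score"] >= camera_threshold(c["region"], radar_hits)]

# A weak (score 0.40) pedestrian candidate survives only because the radar
# flagged the same region; elsewhere the stricter threshold still applies,
# keeping false positives in check.
candidates = [{"region": "left", "score": 0.40, "label": "pedestrian"},
              {"region": "right", "score": 0.40, "label": "pedestrian"}]
print(cooperative_filter(candidates, radar_hits={"left"}))
```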

A 15% accuracy improvement over late fusion in challenging traffic and weather conditions

Studies conducted over the course of last year already showed that cooperative sensor fusion outperforms the late fusion approach commonly used today. On top of that, it is easier to implement than early fusion, as it does not suffer from the same bandwidth requirements and practical implementation limitations.

Concretely, when evaluated on a dataset of complex traffic scenarios in a European city center, cooperative sensor fusion was shown to track pedestrians and cyclists 20% more accurately than a camera-only system. What is more, it detected them a quarter of a second earlier than competing approaches.

And over the past months, the system has been fine-tuned further, improving its pedestrian detection accuracy even more, particularly in challenging traffic and weather conditions.

When applied to easy scenarios (i.e., daytime, no occlusions, and scenes of limited complexity), the cooperative sensor fusion approach now delivers a 41% accuracy improvement over camera-only systems and a 3% accuracy improvement over late fusion.

But perhaps even more important is the progress made in cases of poor illumination, pedestrians emerging from occluded areas, crowded scenes, and so on. After all, these are the instances in which pedestrian detection systems really have to prove their worth. In such difficult circumstances, the gains brought by cooperative radar-video sensor fusion are even more impressive, with a 15% improvement over late fusion.


Fig 3: Comparing the F2 scores of various pedestrian detection methods, both in easy scenarios (top graph) and in more challenging circumstances (bottom graph). The F2 score allows the systems’ accuracy to be assessed objectively, with a high weight attributed to misses (false negatives). In both scenarios, cooperative radar-video sensor fusion outperforms its camera-only and late fusion contenders. Source: imec.
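
For reference, the F2 score is the F-beta measure with beta = 2, which weights recall (i.e., not missing pedestrians) more heavily than precision. A minimal computation from raw detection counts could look like this (the example numbers are illustrative, not imec’s results):

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from detection counts: true positives, false positives,
    and false negatives (misses). With beta = 2, recall (avoiding misses)
    weighs more heavily than precision in the final score."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Purely illustrative counts, not imec's results:
print(f"F2 = {f_beta(tp=80, fp=10, fn=20):.3f}")
```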


Significant latency improvements

When it comes to minimizing latency, or tracking delay, a lot of progress has been made as well. In difficult weather and traffic conditions, for example, the cooperative fusion system achieves a latency of 411 ms: an improvement of more than 40% over both camera-only systems (725 ms) and late fusion (696 ms).
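
For completeness, the quoted improvement follows directly from those latencies; a quick check with the values from the text:

```python
# Relative latency reduction of cooperative fusion (411 ms) versus the
# camera-only (725 ms) and late-fusion (696 ms) baselines quoted above.
coop, camera_only, late = 0.411, 0.725, 0.696
print(f"vs camera-only: {(camera_only - coop) / camera_only:.1%}")  # ~43.3%
print(f"vs late fusion: {(late - coop) / late:.1%}")                # ~40.9%
```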