A self-driving car must accurately track the movement of bicycles, pedestrians, and other vehicles around it to ensure safety. A new technique developed at Carnegie Mellon University (CMU) could help train such tracking systems more effectively.
Tracking systems produce better results when trained on large amounts of road and traffic data, and researchers at CMU have discovered a method that unlocks a wealth of autonomous driving data for this purpose.
Our method is much more robust than previous methods because we can train on much larger datasets.
Himangi Mittal, Research Intern, Carnegie Mellon University
Mittal works with David Held, an assistant professor in CMU’s Robotics Institute.
Most autonomous vehicles navigate primarily using a sensor known as lidar, a laser device that generates 3D data about the environment around the car.
This 3D information is not a set of images but a cloud of points. The vehicle interprets this data through a technique called scene flow, which calculates the speed and trajectory of every 3D point. Groups of points that move together are then interpreted as vehicles, pedestrians, or other moving objects.
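Conceptually, scene flow attaches a 3D displacement vector to every lidar point, and points that share similar motion can be grouped into moving objects. Below is a minimal numpy sketch of that grouping idea; the greedy threshold grouping and the toy data are purely illustrative assumptions, not the method an actual vehicle uses:

```python
import numpy as np

def group_by_flow(flow, tol=0.05):
    """Greedily group points whose flow (motion) vectors are nearly identical.

    flow: (N, 3) array of per-point displacement vectors between two frames.
    Returns an (N,) array of integer group labels.
    """
    labels = -np.ones(len(flow), dtype=int)
    next_label = 0
    for i in range(len(flow)):
        if labels[i] != -1:
            continue  # already assigned to a group
        labels[i] = next_label
        for j in range(i + 1, len(flow)):
            # Points moving in nearly the same direction and speed are grouped.
            if labels[j] == -1 and np.linalg.norm(flow[i] - flow[j]) < tol:
                labels[j] = next_label
        next_label += 1
    return labels

# Toy example: two points moving right together, one point standing still.
flow = np.array([[1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]])
print(group_by_flow(flow))  # the first two points share a label; the third does not
```

Real systems use far more sophisticated clustering, but the principle is the same: shared motion is the cue that a set of points belongs to one object.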
Previously, state-of-the-art techniques for training such a system required labeled datasets: sensor data annotated so that each 3D point can be tracked over time. Manually labeling such datasets is costly and laborious, so, unsurprisingly, very little labeled data exists.
Consequently, scene flow training is instead usually carried out with simulated data, which is less effective, and then fine-tuned with the small amount of labeled real-world data that does exist.
Mittal, Held, and Brian Okorn, a PhD student in robotics, took a different approach: using unlabeled data for scene flow training. Because unlabeled data is comparatively easy to collect, by mounting a lidar on a car and driving around, there is no shortage of it.
The key to their method was designing a way for the system to detect its own errors in scene flow. At each instant, the system tries to predict where each 3D point is moving and how fast.
At the next instant, it measures the distance between the point's predicted position and the nearest actual point to that predicted position. This distance forms one type of error to be minimized.
The system then reverses the process: starting from the predicted position, it works backward to map to where the point originated. It then measures the distance between this estimated origin and the point's actual starting position; the resulting distance forms the second type of error. The system then trains itself to reduce both errors.
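The two error signals described above can be sketched in a few lines of numpy. The function names and toy data below are illustrative assumptions, not the authors' implementation: `nearest_neighbor_loss` penalizes predicted points that land far from any real point in the next frame, and `cycle_loss` penalizes points that fail to return to their origin after a forward-then-backward flow estimate.

```python
import numpy as np

def nearest_neighbor_loss(predicted, next_frame):
    """Mean distance from each predicted point to its nearest real point
    in the next lidar frame (the first error signal)."""
    # Pairwise distances between predicted points and next-frame points.
    dists = np.linalg.norm(predicted[:, None, :] - next_frame[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def cycle_loss(points, forward_flow, backward_flow):
    """Mean distance between each point's true origin and its position after
    being flowed forward and then backward (the second error signal)."""
    round_trip = points + forward_flow + backward_flow
    return np.linalg.norm(round_trip - points, axis=1).mean()

# Toy frames: two points whose true motion is a small shift.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
true_flow = np.array([[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
next_frame = points + true_flow

# A perfect forward estimate lands exactly on next-frame points...
print(nearest_neighbor_loss(points + true_flow, next_frame))  # 0.0
# ...and a backward estimate that exactly undoes it closes the cycle.
print(cycle_loss(points, true_flow, -true_flow))  # 0.0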
It turns out that to eliminate both of those errors, the system actually needs to learn to do the right thing, without ever being told what the right thing is.
David Held, Assistant Professor, Robotics Institute, Carnegie Mellon University
This might sound complicated, but Okorn found that it worked well. The team measured scene flow accuracy at only 25% when the system was trained on synthetic data alone. When the synthetic training was fine-tuned with a small amount of real-world labeled data, accuracy improved to 31%. When a large amount of unlabeled data was added to train the system using their method, scene flow accuracy jumped to 46%.
The researchers will present their technique at the Computer Vision and Pattern Recognition (CVPR) conference, which will be held online from June 14th to 19th. This study was financially supported by the CMU Argo AI Center for Autonomous Vehicle Research, with additional support from a NASA Space Technology Research Fellowship.