Improving Visual Place Recognition Performance with Continuous Place-Descriptor Regression
Category: Computer Science. Tuesday, May 23, 2023, 04:15 UTC
Researchers at Delft University of Technology (TU Delft) have recently introduced a new approach to improve the performance of deep learning algorithms used in visual place recognition (VPR) applications. The proposed method, Continuous Place-Descriptor Regression (CoPR), replaces the similarity-based image retrieval of VPR with more consistent descriptors, producing reliable pose estimates even when the weather or time of day changes. CoPR has been tested on various datasets, showing significantly better results than existing VPR algorithms.
Visual place recognition (VPR) is the task of identifying the location where specific images were taken. Computer scientists have recently developed various deep learning algorithms that could effectively tackle this task, letting users know where within a known environment an image was captured.
A team of researchers at Delft University of Technology (TU Delft) recently introduced a new approach to enhance the performance of deep learning algorithms for VPR applications. Their proposed method, outlined in a paper in IEEE Transactions on Robotics, is based on a new model dubbed continuous place-descriptor regression (CoPR).
"Our study originated from a reflection on the fundamental bottlenecks in VPR performance, and on the related visual localization approaches," Mubariz Zaffar, first author of the study, told Tech Xplore.
"First, we were talking about the problem of 'perceptual aliasing,' i.e., distinct areas with similar visual appearances. As a simple example, imagine we collect reference images with a vehicle driving on the rightmost lane of a highway. If we later drive on the leftmost lane of the same highway, the most accurate VPR estimate would be to match these nearby reference images. However, the visual content might incorrectly match a different highway section where reference images were also collected on the leftmost lane."
One possible way to overcome this limitation of VPR approaches identified by Zaffar and his colleagues could be to train the so-called image descriptor extractor (i.e., the component of a VPR model that extracts descriptive elements from images) to analyze images similarly irrespective of the driving lane in which they are taken. However, this would reduce the extractor's ability to effectively determine the place where an image was taken.
"We thus wondered: is VPR only possible if we collect images on all lanes for each mapped highway or if we only drive in the exact same lane? We wanted to extend VPR's simple but effective image retrieval paradigm to handle such practical problems," Zaffar said.
"Second, we realized that even the pose estimate of a perfect VPR system would be limited in accuracy, as the finite size of the reference images and their poses meant that the map cannot contain a reference with the exact same pose for every possible query. We therefore considered that it might be more important to address this sparsity, rather than trying to build even better VPR descriptors."
When reviewing previous literature, Zaffar and his colleagues also realized that VPR models are often used as part of a larger system. For instance, visual simultaneous localization and mapping (SLAM) techniques can benefit from VPR approaches to detect so-called loop closures, while coarse-to-fine localization approaches can achieve sub-meter localization accuracy by refining the coarse pose estimates of VPR.
"Compared to these more complex systems, the VPR step scales well to large environments and is easy to implement, but its pose estimate is not that accurate, as it can only return the pose(s) of the previously seen image(s) that best visually match the query," Zaffar said.
"Still, SLAM and re-localization approaches depend on the ability of VPR to quickly and accurately estimate the pose of a query image. Due to old images, local weather conditions or autonomous vehicles driving faster than the previous dataset was collected, the pose estimation could be quite far off from what is desired for a successful application. We expected most of these methods to work well on top of a VPR system that would not suffer from such limitations, but other solutions weren't available, so we decided to do something about it ourselves!"
The team set out to develop a VPR model that can accurately estimate the pose of a query image by using continuously learned descriptors. The CoPR model maps data points onto latent descriptor spaces, and then uses learned region-pairings to predict poses. As they describe in their paper, the strength of this method lies in its descriptor learning mechanism, which yields better and more consistent descriptors than the similarity-based image retrieval of VPR alone.
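To make the retrieval paradigm and the sparsity problem concrete, here is a minimal Python sketch. It is not the authors' implementation: the one-dimensional "road", the `toy_descriptor` function, and the use of linear interpolation as a stand-in for a learned descriptor regressor are all assumptions made for illustration. It shows that plain retrieval can only return the pose of an existing reference image, and that adding descriptors at intermediate poses lets the same nearest-neighbor search return a closer pose.

```python
import numpy as np

def toy_descriptor(x):
    """Hypothetical descriptor that varies smoothly with position x (meters)."""
    return np.array([np.cos(x / 50.0), np.sin(x / 50.0)])

def nearest_pose(query_desc, descs, poses):
    """Plain VPR retrieval: return the pose of the closest descriptor in the map."""
    return poses[np.argmin(np.linalg.norm(descs - query_desc, axis=1))]

def densify_map(ref_poses, ref_descs, step):
    """Add descriptors at intermediate poses by linear interpolation.

    Assumption: interpolation stands in for a learned descriptor regressor.
    """
    dense_poses = np.arange(ref_poses[0], ref_poses[-1] + step / 2, step)
    dense_descs = np.stack(
        [np.interp(dense_poses, ref_poses, ref_descs[:, k])
         for k in range(ref_descs.shape[1])],
        axis=1,
    )
    return dense_poses, dense_descs

# Sparse reference map: one image every 10 m along a 100 m road.
ref_poses = np.arange(0.0, 101.0, 10.0)
ref_descs = np.stack([toy_descriptor(x) for x in ref_poses])

# Densified map: one (interpolated) descriptor every 1 m.
dense_poses, dense_descs = densify_map(ref_poses, ref_descs, step=1.0)

# Query taken between two reference images.
query_pose = 34.0
query_desc = toy_descriptor(query_pose)

sparse_err = abs(nearest_pose(query_desc, ref_descs, ref_poses) - query_pose)
dense_err = abs(nearest_pose(query_desc, dense_descs, dense_poses) - query_pose)
print(f"sparse map error: {sparse_err:.1f} m, densified map error: {dense_err:.1f} m")
```

In this toy setting the sparse map cannot do better than the distance to the nearest reference image, however good the descriptor is, while the densified map narrows that gap without collecting new images. Real descriptors are high-dimensional and far less well behaved, so the sketch only conveys the intuition.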
The team tested the model on various datasets, showing significantly better results than existing VPR algorithms. Their experiments show that CoPR is less affected by changes in weather or time of day, which can compromise the performance of existing VPR algorithms.
"An interesting aspect of our work is that it is not just an additional feature of a VPR model, or an improved descriptor," Zaffar said. "It addresses a fundamental problem of VPR performance that was never fully faced before. By replacing the similarity-based image retrieval of VPR with more consistent descriptors, the CoPR model gives reliable pose estimates, even when the weather or time of day changes."
Not only does the CoPR model improve accuracy, but it also retains the scalability and easy implementation of VPR, making it well suited to real-world scenarios. The team also believes that their model could help autonomous vehicle navigation, patient monitoring, natural resource exploration and many other AI-related applications.
"We are currently further exploring ways to scale up the CoPR model to larger datasets, as well as to extend its capabilities for robots operating in dynamic or otherwise challenging settings," Zaffar concluded. "We are also aiming to integrate the CoPR model with more complex localization and mapping systems, such as SLAM solutions, which can benefit from VPR and its ability to quickly estimate poses."