
Large-Scale Localization

Large-scale image-based localization

Hyperpoints and Fine Vocabularies for Large Scale Location Recognition

Structure-based localization is the task of finding the absolute pose of a given query image w.r.t. a pre-computed 3D model. While this is almost trivial at small scale, special care must be taken as the size of the 3D model grows, because straightforward descriptor matching becomes ineffective due to the large memory footprint of the model, as well as the strictness of the ratio test in 3D. Recently, several authors have tried to overcome these problems, either by a smart compression of the 3D model or by clever sampling strategies for geometric verification. Here we explore an orthogonal strategy, which uses all the 3D points and standard sampling, but performs feature matching implicitly, by quantization into a fine vocabulary. We show that although this matching is ambiguous and gives rise to 3D hyperpoints when matching each 2D query feature in isolation, a simple voting strategy, which enforces the fact that the selected 3D points shall be co-visible, can reliably find a locally unique 2D-3D point assignment. Experiments on two large-scale datasets demonstrate that our method achieves state-of-the-art performance, while the memory footprint is greatly reduced, since only visual word labels but no 3D point descriptors need to be stored.


  • T. Sattler, M. Havlena, F. Radenovic, K. Schindler, M. Pollefeys, Hyperpoints and Fine Vocabularies for Large Scale Location Recognition, ICCV 2015 [PDF]
  • Supplementary Material [PDF]
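
The co-visibility voting idea described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function name `covisibility_vote`, the dictionaries `word_to_points` and `point_to_images`, and the choice to vote over single database images are all assumptions made for illustration. Each query feature is represented only by its visual word, all 3D points sharing that word form an ambiguous "hyperpoint", and a vote over database images picks the candidates that are seen together.

```python
from collections import Counter

def covisibility_vote(query_words, word_to_points, point_to_images):
    """Pick at most one 3D point per query feature by voting over database images.

    query_words:     visual word id of each query feature (hypothetical input)
    word_to_points:  visual word id -> list of 3D point ids (the "hyperpoint")
    point_to_images: 3D point id -> set of database images observing the point
    """
    votes = Counter()
    for word in query_words:
        # Collect every database image that sees any candidate point of this word.
        images = set()
        for pid in word_to_points.get(word, ()):
            images.update(point_to_images[pid])
        # Each query feature votes at most once per database image.
        votes.update(images)
    if not votes:
        return None, {}

    best_image, _ = votes.most_common(1)[0]

    assignment = {}
    for idx, word in enumerate(query_words):
        visible = [pid for pid in word_to_points.get(word, ())
                   if best_image in point_to_images[pid]]
        # Keep the match only if it is unique within the winning image.
        if len(visible) == 1:
            assignment[idx] = visible[0]
    return best_image, assignment

# Toy example: words 7 and 8 are each ambiguous between two 3D points.
word_to_points = {7: [0, 10], 8: [1, 11], 9: [2]}
point_to_images = {0: {"A"}, 1: {"A"}, 2: {"A"}, 10: {"B"}, 11: {"C"}}
best_image, assignment = covisibility_vote([7, 8, 9], word_to_points, point_to_images)
print(best_image, assignment)
```

In this toy setup only visual word labels and visibility lists are stored, with no descriptors, which mirrors the memory argument made in the abstract. The resulting 2D-3D assignment would still need geometric verification before a pose is accepted.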

Camera Pose Voting for Large-Scale Image-Based Localization

Image-based localization approaches aim to determine the camera pose from which an image was taken. Finding correct 2D-3D correspondences between query image features and 3D points in the scene model becomes harder as the size of the model increases. Current state-of-the-art methods therefore combine elaborate matching schemes with camera pose estimation techniques that are able to handle large fractions of wrong matches. In this work we study the benefits and limitations of spatial verification compared to appearance-based filtering. We propose a voting-based pose estimation strategy that exhibits O(n) complexity in the number of matches and thus makes it possible to consider many more matches than previous approaches, whose complexity grows at least quadratically. This new outlier rejection formulation enables us to evaluate pose estimation for 1-to-many matches and to surpass the state-of-the-art. At the same time, we show that using more matches does not automatically lead to better performance.


  • B. Zeisl, T. Sattler, M. Pollefeys, Camera Pose Voting for Large-Scale Image-Based Localization, ICCV 2015, oral [PDF]
  • Supplementary Material [ZIP]
  • Project Page
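
To give a feel for why voting can run in time linear in the number of matches, here is a heavily simplified sketch. It is not the method from the paper: it assumes that the camera orientation and the camera height above ground are already known, so that every 2D-3D match determines exactly one candidate camera position in the ground plane. The function name `vote_camera_position`, the grid cell size, and the input layout are hypothetical.

```python
from collections import defaultdict
import math

def vote_camera_position(matches, height, cell=0.5):
    """Hough-style vote for the camera's (x, y) position in one pass over the matches.

    matches: list of (X, d), where X is a 3D point and d the viewing direction
             of the matched 2D feature, both given in world coordinates
             (assumes the camera orientation is known).
    height:  assumed camera height (world z coordinate).
    cell:    side length of a grid cell in the ground plane.
    """
    bins = defaultdict(list)
    for i, (X, d) in enumerate(matches):
        if abs(d[2]) < 1e-6:
            continue  # near-horizontal ray: position along the ray is unconstrained
        # The camera centre c satisfies X = c + t * d with c_z = height.
        t = (X[2] - height) / d[2]
        if t <= 0:
            continue  # the point would lie behind the camera
        cx, cy = X[0] - t * d[0], X[1] - t * d[1]
        bins[(math.floor(cx / cell), math.floor(cy / cell))].append(i)
    if not bins:
        return None, []
    # The fullest cell gives the position hypothesis; its voters are the inliers.
    best = max(bins, key=lambda k: len(bins[k]))
    return best, bins[best]

# Toy example: three matches consistent with a camera at (1.25, 3.25, 1.5),
# one outlier, and one match that would place the point behind the camera.
matches = [
    ((4.25, 3.25, 5.5), (0.6, 0.0, 0.8)),
    ((1.25, 6.25, 5.5), (0.0, 0.6, 0.8)),
    ((1.25, 3.25, 3.5), (0.0, 0.0, 1.0)),
    ((10.0, 10.0, 2.5), (0.0, 0.0, 1.0)),
    ((0.0, 0.0, 3.0), (0.0, 0.0, -1.0)),
]
print(vote_camera_position(matches, height=1.5))
```

Each match is touched once and casts a single vote, so the cost grows linearly with the number of matches, unlike schemes that test pairs or larger subsets of matches. In practice the inliers of the winning cell would be passed to a standard pose solver for refinement, and votes near cell borders may need smoothing across neighbouring cells.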

Get Out of My Lab: Large-scale, Real-Time Visual-Inertial Localization

Accurately estimating a robot’s pose relative to a global scene model and precisely tracking the pose in real-time is a fundamental problem for navigation and obstacle avoidance tasks. Due to the computational complexity of localization against a large map and the memory consumed by the model, state-of-the-art approaches are either limited to small workspaces or rely on a server-side system to query the global model while tracking the pose locally. The latter approaches face the problem of smoothly integrating the server’s pose estimates into the trajectory computed locally to avoid temporal discontinuities. In this paper, we demonstrate that large-scale, real-time pose estimation and tracking can be performed on mobile platforms with limited resources without the use of an external server. This is achieved by employing map and descriptor compression schemes as well as efficient search algorithms from computer vision. We derive a formulation for integrating the global pose information into a local state estimator that produces much smoother trajectories than current approaches. Through detailed experiments, we evaluate each of our design choices individually and document its impact on the overall system performance, demonstrating that our approach outperforms state-of-the-art algorithms for localization at scale.


  • S. Lynen, T. Sattler, M. Bosse, J. Hesch, M. Pollefeys, R. Siegwart, Get Out of My Lab: Large-scale, Real-Time Visual-Inertial Localization, RSS 2015, Best Systems Paper Award Finalist [PDF]
  • Poster [PDF]
  • Video [MP4]

© CVG, ETH Zürich