Visual Image Localization in Mountainous Areas
We address the problem of localizing any given photograph (of a mountainous landscape) using vision techniques only. We propose an automated approach for very large scale visual localization that can efficiently exploit visual information and geometric constraints at the same time. We validate the system on the scale of a whole country (Switzerland, 40'000 km²) using a new dataset of more than 200 landscape query pictures with ground truth.
- Image Based Geo-Localization in the Alps (IJCV 2015)
- Large Scale Visual Geo-Localization of Images in Mountainous Terrain (ECCV 2012 oral)
The appearance of natural scenes changes dramatically with seasons, time of day and weather conditions. This makes the use of traditional patch-based features impractical. We instead use the horizon as a recognizable feature, since it remains stable over very long timespans.
We take smaller parts of the horizon curve, which after smoothing and downsampling constitute local feature descriptors. These are quantized to obtain visual words.
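The descriptor extraction described above can be sketched as follows. This is an illustrative implementation, not the paper's exact parameters: segment length, smoothing width, descriptor dimension and stride are assumed values chosen for the example.

```python
import numpy as np

def horizon_descriptors(horizon, seg_len=64, desc_dim=10, step=16):
    """Cut a horizon curve (array of per-column heights) into overlapping
    segments; smooth and downsample each segment into a small descriptor.
    All parameter values here are illustrative, not the paper's."""
    descriptors = []
    kernel = np.ones(5) / 5.0                       # simple box smoothing
    for start in range(0, len(horizon) - seg_len + 1, step):
        seg = horizon[start:start + seg_len].astype(float)
        seg = seg - seg.mean()                      # remove vertical offset
        seg = np.convolve(seg, kernel, mode="same") # smooth
        idx = np.linspace(0, seg_len - 1, desc_dim).astype(int)
        descriptors.append(seg[idx])                # downsample
    return np.array(descriptors)
```

The resulting descriptors would then be quantized against a learned codebook (e.g. by nearest-centroid assignment) to obtain visual words.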
We propose a novel bag-of-words scheme that votes for both location and viewing direction simultaneously. This enables a rough geometric consistency check already at the voting stage.
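The joint voting idea can be sketched as below: each matched visual word carries an azimuth within the query image and an azimuth within a database panorama, so their difference implies a viewing direction, and votes accumulate over (location, direction) pairs rather than locations alone. Function names, the index layout and the bin count are assumptions for illustration.

```python
from collections import defaultdict

def vote(query_words, inverted_index, n_dir_bins=32):
    """Vote for (location, viewing direction) simultaneously.
    `query_words`: list of (word_id, azimuth_in_image) pairs.
    `inverted_index`: word_id -> list of (location_id, azimuth_in_panorama).
    A sketch of the scheme; parameters are illustrative."""
    votes = defaultdict(int)
    for word, q_az in query_words:
        for loc, p_az in inverted_index.get(word, ()):
            direction = (p_az - q_az) % 360.0       # implied viewing direction
            d_bin = int(direction / 360.0 * n_dir_bins) % n_dir_bins
            votes[(loc, d_bin)] += 1
    # strongest joint (location, direction) hypothesis, or None
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None
```

Because votes are split across direction bins, matches that agree on a location but imply inconsistent viewing directions no longer reinforce each other, which is the rough geometric consistency check mentioned above.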
Our approach localizes 88% of the query images correctly within 1 km of the ground truth and estimates the full 3D orientation of the camera.
Robustness to Tilt
Our algorithm uses the fact that landscape images usually are not subject to extreme tilt angles. In this experiment, we virtually rotate the extracted horizon of the query images by various angles in order to simulate camera tilt and observe how recognition performance is affected.
We designed our feature descriptor to be robust with respect to camera tilt. It turns out that we still get over 60% recognition even at 30° tilt, which is substantial, given that the horizon usually lies roughly level with the camera rather than far above or below it.
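The virtual tilt applied in this experiment can be simulated with a plain 2D rotation of the extracted horizon points about the image center, as sketched below (the real pipeline operates on extracted curves; this helper only illustrates the geometric perturbation).

```python
import numpy as np

def tilt_horizon(xs, ys, angle_deg, center=(0.0, 0.0)):
    """Rotate horizon points (image coordinates) about `center` by
    `angle_deg` to simulate camera tilt. Illustrative helper only."""
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    pts = np.stack([xs - center[0], ys - center[1]])
    rx, ry = R @ pts
    return rx + center[0], ry + center[1]
```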
Robustness to Field-of-View
The field-of-view (FoV) extracted from the EXIF data may not always be fully accurate. This experiment studies the effect of slight inaccuracies. We modify the FoV obtained from the EXIF by various percentages and plot the result against recognition performance on the entire query set.
In the paper, we state that we only need to know an approximate value for the field-of-view. Here, we see that even if that value is off by ±5%, we still get 70-80% recognition.
Estimation of Intrinsics
We can even obtain a rough estimate of the camera intrinsics by hypothesizing different values for the field-of-view (FoV) and retaining the best one.
Every frame of the animation displays the matching costs arising from a different assumed FoV. For FoVs between 25° and 45°, the optimum (in blue) travels northwest along the camera's viewing direction. The last frame shows, for each location, the best matching cost over all FoVs. The minimum corresponds to the best combination of location and FoV.
This plot shows the matching cost of the best location as a function of the FoV. For FoVs around 33° (the value from the EXIF tag), the matching cost is lower and varies more smoothly than further away.
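The FoV sweep amounts to a one-dimensional search: hypothesize candidate FoVs, evaluate the best matching cost for each, and keep the minimum. A minimal sketch, where `match_cost` is a stand-in for the full localization pipeline's cost at the best location and the sweep range is an assumption:

```python
def estimate_fov(query_horizon, match_cost, fovs=range(20, 61)):
    """Return the candidate FoV (degrees) with the lowest best matching
    cost. `match_cost(horizon, fov)` is a placeholder for the pipeline;
    the 20-60 degree sweep range is illustrative."""
    return min(fovs, key=lambda fov: match_cost(query_horizon, fov))
```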
We thank Simon Wenner for his help with rendering the DEMs. We also thank Hiroto Nagayoshi, José Henrique Brito, Lionel Heng and the Panoramio users bp_meier, fourpier, JGAlarcon, loamvalley, tompon and tressy for contributing photographs to the queryset. This work has been supported through SNF grant 127224 by the Swiss National Science Foundation.