Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models

Published in AAAI Fall Symposium Series (FSS-25), 2025

Geo-localization is the task of identifying where an image was taken using visual cues alone. Vision-Language Models (VLMs) are increasingly capable image geo-locators, which creates significant privacy risks, including stalking and surveillance.

We present a comprehensive black-box evaluation of 25 state-of-the-art generative VLMs, spanning open-source, open-weight and closed-source systems, on four benchmark datasets capturing diverse environments. Our findings indicate that current VLMs perform poorly on generic street-level images yet achieve notably high accuracy (61%) on images resembling social media content, raising significant and urgent privacy concerns.
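This summary does not say how accuracy is defined. A common convention in geo-localization work is to count a prediction as correct when it falls within a fixed great-circle distance of the true location. The sketch below illustrates that convention only; the function names, the Earth-radius constant, and the 25 km threshold in the usage note are illustrative assumptions, not details taken from the paper.

```python
import math

# Mean Earth radius in kilometres (assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(predictions, ground_truth, threshold_km):
    """Fraction of predictions within threshold_km of the true location.

    Both arguments are equal-length sequences of (lat, lon) pairs in degrees.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not predictions:
        return 0.0
    hits = sum(
        1
        for (p_lat, p_lon), (t_lat, t_lon) in zip(predictions, ground_truth)
        if haversine_km(p_lat, p_lon, t_lat, t_lon) <= threshold_km
    )
    return hits / len(predictions)
```

For example, `accuracy_within(model_coords, true_coords, 25.0)` would report the share of images placed within 25 km of where they were taken, where `model_coords` and `true_coords` are hypothetical lists of coordinates parsed from model output and dataset labels.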

Key contributions:

  • Benchmarked 25 state-of-the-art generative VLMs on four diverse geo-localization datasets
  • Analyzed the broader societal and ethical risks associated with VLM-based geo-localization, offering insights for future research and policy

Recommended citation: Grainge, O.*, Waheed, S.*, Stilgoe, J., Milford, M., & Ehsan, S. (2025). "Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models." AAAI Fall Symposium Series (FSS-25), 161-168.