Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
Published in AAAI Fall Symposium Series (FSS-25), 2025
Geo-localization is the task of identifying the location of an image using visual cues alone. Vision-Language Models (VLMs) are becoming increasingly accurate image geo-locators, which brings significant privacy risks, including those related to stalking and surveillance.
We present a comprehensive black-box evaluation of 25 state-of-the-art generative VLMs (including open-source, open-weight and closed-source systems) on four benchmark datasets capturing diverse environments. Our findings indicate that current VLMs perform poorly on generic street-level images yet achieve notably high accuracy (61%) on images resembling social media content, raising significant and urgent privacy concerns.
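This summary does not state how accuracy is scored, so the following is only a minimal sketch of one common convention in geo-localization benchmarks: a prediction counts as correct when its great-circle distance to the true location falls within a chosen threshold. The function names `haversine_km` and `accuracy_within`, the threshold value, and the sample coordinates are all illustrative assumptions and are not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(predictions, ground_truth, threshold_km):
    """Fraction of predictions within threshold_km of the true location.

    Both arguments are equal-length sequences of (lat, lon) tuples in degrees.
    The threshold is a free parameter here (hypothetical, not the paper's value).
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    hits = sum(
        haversine_km(p[0], p[1], g[0], g[1]) <= threshold_km
        for p, g in zip(predictions, ground_truth)
    )
    return hits / len(predictions)


if __name__ == "__main__":
    # Illustrative coordinates only: one near-exact guess, one far-off guess.
    preds = [(48.8566, 2.3522), (0.0, 0.0)]
    truth = [(48.8600, 2.3500), (51.5074, -0.1278)]
    print(accuracy_within(preds, truth, threshold_km=25.0))
```

Under this convention, the reported score depends heavily on the threshold chosen, so the same model can look weak at street-level precision and strong at city-level precision.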
Key contributions:
- Benchmarked 25 state-of-the-art generative VLMs on four diverse geo-localization datasets
- Analyzed the broader societal and ethical risks associated with VLM-based geo-localization, offering insights for future research and policy
Recommended citation: Grainge, O.*, Waheed, S.*, Stilgoe, J., Milford, M., & Ehsan, S. (2025). "Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models." AAAI Fall Symposium Series (FSS-25), 161-168.
