TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition

Published in arXiv preprint, 2025

What’s inside

  • Dynamic accuracy–efficiency trade-offs. A top-k activation mask lets you dial computation down by up to 40% at run time with less than 1% loss in Recall@1 (see the sparsity/quantization sketch after this list).
  • Extreme compression. Ternary (2-bit) weight quantization shrinks the backbone five-fold, freeing memory on micro-UAV and embedded SLAM stacks.
  • Distillation keeps performance high. A two-stage, token-level distillation from a full-precision DINOv2-BoQ teacher preserves descriptor quality despite the aggressive quantization (a sketch of a token-level distillation loss also follows the list).
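
For intuition, here is a minimal PyTorch-style sketch of the two mechanisms above: ternary weight quantization and a runtime top-k activation mask. The threshold rule, per-tensor scaling, and feature-wise masking granularity are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def ternary_quantize(w: torch.Tensor, delta_scale: float = 0.7) -> torch.Tensor:
    """Map full-precision weights to {-1, 0, +1} * alpha (illustrative TWN-style rule)."""
    delta = delta_scale * w.abs().mean()           # threshold separating zeros from +/-1
    q = torch.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    nonzero = q != 0
    alpha = w[nonzero].abs().mean() if nonzero.any() else w.new_tensor(0.0)  # per-tensor scale
    return q * alpha                               # storable as 2-bit codes plus one fp scale

def topk_activation_mask(tokens: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep only the highest-magnitude activations per token; keep_ratio dials compute at run time."""
    k = max(1, int(keep_ratio * tokens.shape[-1]))
    _, topk_idx = tokens.abs().topk(k, dim=-1)
    mask = torch.zeros_like(tokens).scatter_(-1, topk_idx, 1.0)
    return tokens * mask

# Example: quantize one projection and run tokens through it at a 60% activation budget.
W = torch.randn(768, 768)
x = torch.randn(1, 197, 768)                       # (batch, tokens, dim), ViT-style
y = topk_activation_mask(x, keep_ratio=0.6) @ ternary_quantize(W).t()
```

Lowering keep_ratio trades accuracy for compute without touching the (already ternary) weights, which is what lets one checkpoint serve both desktop and on-board deployments.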
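
Similarly hedged, a sketch of what a token-level distillation objective can look like: quantized student tokens are pulled toward the full-precision DINOv2-BoQ teacher's tokens. The specific loss form (cosine plus L2) and the shapes are assumptions for illustration; the paper's two-stage schedule is not reproduced here.

```python
import torch
import torch.nn.functional as F

def token_distill_loss(student_tokens: torch.Tensor,
                       teacher_tokens: torch.Tensor,
                       cos_weight: float = 1.0,
                       l2_weight: float = 1.0) -> torch.Tensor:
    """Token-level distillation: align each student token with the matching teacher token.

    Both inputs are (batch, num_tokens, dim); the teacher (e.g. a full-precision
    DINOv2-BoQ model) is run with gradients disabled elsewhere in the training loop.
    """
    cos_term = 1.0 - F.cosine_similarity(student_tokens, teacher_tokens, dim=-1).mean()
    l2_term = F.mse_loss(student_tokens, teacher_tokens)
    return cos_weight * cos_term + l2_weight * l2_term

# Usage inside a training step (shapes are illustrative):
student_tokens = torch.randn(8, 197, 768, requires_grad=True)
with torch.no_grad():
    teacher_tokens = torch.randn(8, 197, 768)
loss = token_distill_loss(student_tokens, teacher_tokens)
loss.backward()
```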

Key benchmarks

  • Pitts30k: Near full-accuracy Recall@1 (within about 1% of the dense model) with up to 40% less compute.
  • SVOX condition splits: Outperforms convolutional baselines under snow, rain, and night conditions while using an order of magnitude less memory.

TL;DR

TAT-VPR shows that extreme quantization and adaptive sparsity aren't mutually exclusive: a single model can scale from high-accuracy desktop runs down to ultra-efficient on-board inference without retraining.

Recommended citation: Grainge, O., Milford, M., Bodala, I., Ramchurn, S. D., & Ehsan, S. (2025). "TAT-VPR: Ternary Adaptive Transformer for Dynamic and Efficient Visual Place Recognition." arXiv preprint, arXiv:2505.16447.
Download Paper | Download Slides