Oliver Grainge
AI Engineer & Researcher
Specializing in production ML optimization and deployment. I take models from research to production, delivering measurable performance improvements on edge and cloud platforms through model compression, custom kernel development, and hardware-software co-design.
4 publications (IEEE RAL, AAAI) · PhD candidate, Southampton · CUDA / Triton / ARM NEON · Open source contributor
Experience
- Contract Researcher — Performance Engineering · Arm · Remote · Feb 2025 – Present
  Architected 6 hands-on tutorials for the Arm Total Performance toolkit covering memory optimization, library acceleration (APL, KleidiAI), and automated porting for AWS Graviton instances.
- Research Assistant · University College London · London, UK · Jun 2025 – Nov 2025
  Engineered 1.58-bit precision pipeline for Stable Diffusion (4x memory reduction, 95% quality retention). Designed custom CUDA and Triton kernels for bit-packed tensor operations, delivering 30% speedup over PyTorch baseline. (The quantizer behind 1.58-bit weights is sketched after this list.)
- Visiting Researcher · Queensland University of Technology · Brisbane, Australia · Aug 2024 – Jan 2025
  Engineered speculative decoding for vision-language transformers achieving 2.5x inference speedup for sub-100ms robotic navigation. Implemented training data filtering methods achieving equivalent accuracy with 38% less data. (Speculative decoding is sketched after this list.)
- Research Fellow · AI Security Institute · Remote · Jan 2024 – Jan 2025
  Built automated benchmarking framework evaluating 25+ VLMs across 26k geo-tagged images with 99.9% reliability over 500k+ API calls. Developed privacy-preserving techniques reducing geolocation accuracy by 40%, with interactive demo attracting 5k+ users.
- Contract Researcher — AI Inference Optimization · Arm · Remote · Nov 2024 – Jan 2025
  Engineered demonstrations achieving 40% latency reduction via SIMD/INT8 on mobile and 2.1x throughput on cloud instances. Built Hyperopt-based per-layer precision optimizer demonstrating 22% memory reduction on GPT models.
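The 1.58-bit work above centers on ternary weight quantization. A minimal sketch of the absmean quantizer popularized by BitNet b1.58, not the actual pipeline code (the function name and per-tensor scaling are illustrative choices):

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} plus one FP scale.

    Follows the absmean scheme from BitNet b1.58: divide by the mean
    absolute weight, round, and clip to the ternary set.
    """
    scale = w.abs().mean().clamp(min=eps)      # per-tensor absmean scale
    w_q = (w / scale).round().clamp_(-1, 1)    # values in {-1, 0, +1}
    return w_q, scale                          # dequantize as w_q * scale
```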
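Speculative decoding, as used in the QUT navigation work, pairs a cheap draft model with the expensive target model: the draft proposes a few tokens autoregressively, the target verifies all of them in a single forward pass, and the longest agreeing prefix is kept. A greedy, batch-size-1 sketch assuming Hugging Face-style models that return `.logits` (function and parameters are illustrative):

```python
import torch

@torch.no_grad()
def speculative_decode(target, draft, ids, k=4, max_new=64):
    """Greedy speculative decoding: draft proposes k tokens, target
    verifies them in one pass; keep the agreeing prefix plus target's
    correction at the first mismatch."""
    start = ids.shape[1]
    while ids.shape[1] < start + max_new:
        # 1) Draft k tokens autoregressively with the small model.
        prop = ids
        for _ in range(k):
            nxt = draft(prop).logits[:, -1].argmax(-1, keepdim=True)
            prop = torch.cat([prop, nxt], dim=1)
        # 2) One target pass scores every drafted position at once.
        tgt = target(prop).logits[:, ids.shape[1] - 1 : -1].argmax(-1)
        drafted = prop[:, ids.shape[1]:]
        # 3) Accept the longest prefix where draft and target agree,
        #    then append target's token at the first disagreement.
        n_ok = int((tgt == drafted).cumprod(dim=1).sum())
        ids = torch.cat([ids, drafted[:, :n_ok], tgt[:, n_ok : n_ok + 1]], dim=1)
    return ids
```

A production implementation would reuse KV caches and sample rather than take the argmax; the sketch only shows the propose/verify/accept structure that yields the speedup.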
Selected Publications
- AAAI 2025 · First comprehensive benchmark of VLM geolocation capabilities across 4 datasets
- IEEE RAL · 65% memory reduction and 35% latency reduction for VPR transformers
- IEEE RAL · Deployment guidelines for extreme quantization on embedded devices
- IEEE RAL · Channel pruning achieving 21% latency and 16% memory reduction with <1% accuracy loss
Open Source Projects
- High-performance ternary matrix multiplication library with multi-backend support (C++ · CUDA · ARM NEON · AVX2). 16x memory reduction via 2-bit weight packing (sketched after this list) with kernels outperforming PyTorch FP32 on edge devices.
- Quantization-aware training toolkit (PyTorch) with drop-in BitLinear layers (BitNet, TWN, ParetoQ 1.58-bit). Train-to-deploy workflow with optional BitOps acceleration for 8x inference memory savings. (A minimal BitLinear sketch appears after this list.)
- Interactive chat with 1.58-bit BitNet models (Python · Gradio · CUDA). 24x speedup and 80% memory reduction on ARM M4 vs PyTorch FP32, with backend switching and streaming responses.
- Pure-Python stereo SLAM pipeline, KITTI-compatible (Python · OpenCV · NumPy), with feature tracking, stereo matching, PnP/ICP motion estimation, and bundle adjustment.
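The ternary matmul library's 16x memory reduction comes from storing each {-1, 0, +1} weight in 2 bits instead of 32. A PyTorch sketch of the packing and unpacking step (names are illustrative; the library itself does this in C++/CUDA):

```python
import torch

def pack_ternary(w_q: torch.Tensor) -> torch.Tensor:
    """Pack ternary weights {-1, 0, +1} into 2 bits each, 4 per byte:
    16x smaller than FP32 storage."""
    codes = (w_q.flatten() + 1).to(torch.uint8)        # {-1,0,1} -> {0,1,2}
    pad = (-codes.numel()) % 4
    codes = torch.cat([codes, codes.new_zeros(pad)]).view(-1, 4)
    b0, b1, b2, b3 = codes.unbind(dim=1)
    return b0 | (b1 << 2) | (b2 << 4) | (b3 << 6)      # one byte per 4 weights

def unpack_ternary(packed: torch.Tensor, n: int) -> torch.Tensor:
    """Inverse of pack_ternary: recover the first n ternary weights."""
    shifts = torch.tensor([0, 2, 4, 6], dtype=torch.uint8)
    codes = (packed.unsqueeze(1) >> shifts) & 3        # (bytes, 4) 2-bit codes
    return codes.flatten()[:n].to(torch.int8) - 1      # back to {-1, 0, +1}
```

Fast kernels consume the packed bytes directly rather than unpacking to a dense tensor; the unpacker is shown here only to make the encoding concrete.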
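The QAT toolkit's drop-in layers follow the standard fake-quantization pattern: quantize weights in the forward pass while gradients flow to the full-precision master weights via the straight-through estimator. A minimal sketch of the BitNet-style ternary case only (the toolkit's actual BitLinear supports several schemes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """nn.Linear drop-in that trains against ternary fake-quantized weights.

    Forward uses quantized weights; backward updates the FP32 master
    weights via the straight-through estimator (STE).
    """
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-5)            # absmean scale
        w_q = (w / scale).round().clamp(-1, 1) * scale    # fake-quantize
        w_q = w + (w_q - w).detach()                      # STE: identity grad
        return F.linear(x, w_q, self.bias)
```

Swapping nn.Linear for BitLinear and training normally yields weights that survive ternary quantization at deployment time.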
Technical Skills
Languages
Python · C/C++ · CUDA · Triton · Bash · SQL
ML Frameworks
PyTorch · TensorFlow · Hugging Face · vLLM · llama.cpp · ONNX Runtime · TensorRT · OpenCV
Model Optimization
Quantization (INT8/INT4/ternary) · QAT / PTQ / GPTQ / AWQ · Pruning · Knowledge Distillation · LoRA / QLoRA · Custom CUDA Kernels · FlashAttention · SIMD (NEON, AVX2)
Infrastructure & MLOps
AWS (EC2, Graviton) · Docker · Kubernetes · SLURM · Ray · MLflow · W&B · Triton Inference Server · FastAPI
Education
PhD (iPhD) in Machine Intelligence
University of Southampton · Oct 2022 – Present
Thesis: Efficient Resource-Constrained Visual Place Recognition
BEng Electronics and Electrical Engineering — First Class Honours (83%)
University of Southampton · Sept 2019 – Jul 2022
Interested in collaborating on efficient ML research or production AI deployment?
