The Secret to Context Engineering for Spatial AI is Accuracy Transfer
Creating a compelling Spatial AI demo is a fairly manageable task given the quality of LLMs available today. The next step, creating a reliable MCP for that service that works globally, is a much more challenging leap. The problem of bridging the gap from “cheap demo” to “magical product” has lately come to be called “context engineering”. More specifically:
Context Engineering is the discipline of designing and building dynamic systems that provide the right information and tools, in the right format, at the right time, to give an LLM everything it needs to accomplish a task.
This really resonated with our work enabling Spatial AI. Our focus isn’t so much building a new LLM as providing the right spatial data and tools, in a structure that lets LLMs perform orientation and navigation tasks in real time. The key to doing this well is providing high-accuracy map data and imagery that is well aligned with the localization of your device, then providing an interface that lets the LLM consume that data fluidly.
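To make that concrete, here is a minimal sketch of the kind of tool an LLM could call through an interface like MCP. The `Place` record, function names, and field-of-view filter are illustrative assumptions rather than our actual API; the point is that the tool hands the model a pre-oriented slice of the map instead of raw coordinates.

```python
from dataclasses import dataclass
from math import atan2, cos, degrees, radians, sin

@dataclass
class Place:
    """Hypothetical place record; field names are illustrative."""
    name: str
    lat: float
    lon: float

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    y = sin(dlon) * cos(phi2)
    x = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlon)
    return (degrees(atan2(y, x)) + 360.0) % 360.0

def places_in_view(lat, lon, heading, places, fov_deg=60.0):
    """Return places inside the device camera's horizontal field of view,
    with each place's signed angular offset from the view center
    (negative = left of center). This pre-oriented list is what the LLM consumes."""
    visible = []
    for p in places:
        offset = (bearing_deg(lat, lon, p.lat, p.lon) - heading + 180.0) % 360.0 - 180.0
        if abs(offset) <= fov_deg / 2.0:
            visible.append({"name": p.name, "offset_deg": round(offset, 1)})
    return visible
```

Because the filtering and orientation math happens in the tool, the model only has to reason over “what is in front of the user,” not raw geometry.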
The Accuracy Challenge
We’ve seen that LLMs do quite well when we provide real-time localization of what a user is looking at, and with GNSS augmentation and some sensor fusion magic our localization can be quite precise. By way of example, we benchmarked our localization accuracy against Google’s ARCore VPS and were within 7 degrees with a 99.5% heading correlation.
When you combine this with GNSS accuracy improvements you get a very competitive localization service without the VPS baggage (e.g. battery drain, an open camera, expensive 3D mapping and compute). The downside of sensor-based localization is that you need geospatially accurate vector map data to localize against. While buildings and roads largely match reality, the places (a.k.a. points of interest) we do commerce with are far less accurate. We did a series of posts (#1 and #2) where we tested different techniques for improving the positions of those places.
Further complicating the problem, open street-level imagery often suffers from poor positioning and camera pose data derived from subpar smartphone measurements. This includes dedicated street-level projects like Mapillary and KartaView as well as user-generated images of places taken with smartphones. Ideally we want high-accuracy position and pose for both street-level imagery and user-generated photos.
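For context, the position and pose that ship with a typical smartphone photo live in its EXIF tags, and pulling them out makes the quality problem easy to inspect. Below is a minimal sketch using Pillow; the file path and error handling are illustrative, and GPSImgDirection (the compass heading) is frequently coarse or missing entirely.

```python
from PIL import Image
from PIL.ExifTags import GPSTAGS, TAGS

def photo_position_and_heading(path):
    """Read GPS position and compass heading from a photo's EXIF tags.
    Returns (lat, lon, heading_deg); heading is NaN when the tag is absent."""
    exif = Image.open(path)._getexif() or {}
    gps_raw = next((v for k, v in exif.items() if TAGS.get(k) == "GPSInfo"), None)
    if not gps_raw:
        return None  # no GPS block at all
    gps = {GPSTAGS.get(k, k): v for k, v in gps_raw.items()}

    def to_deg(dms, ref):
        d, m, s = (float(x) for x in dms)  # degrees/minutes/seconds rationals
        return (-1.0 if ref in ("S", "W") else 1.0) * (d + m / 60.0 + s / 3600.0)

    lat = to_deg(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = to_deg(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    heading = float(gps.get("GPSImgDirection", float("nan")))
    return lat, lon, heading
```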
Accuracy Transfer
The motivation to generate better foundational accuracy for data collected with smartphones is to help facilitate accuracy transfer. A big reason roads and buildings are so accurate is that their real-world position is derived from aerial and satellite imagery, which has rigorous orthorectification and absolute accuracy specifications. Unfortunately, this georectification is totally lacking for “places”. We can potentially bridge this gap with street-level imagery and photos of places by reframing the georectification problem for vector map data in a novel way. In this process we take the improved accuracy of a smartphone’s position and pose when it takes a photo, and transfer that accuracy to the vector map data (e.g. places) we generate from those photos.
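As a rough illustration of the transfer step, the sketch below projects a feature spotted in a photo onto the map using only the camera’s position and heading plus a range estimate. The function name and inputs are hypothetical; the point is that any error in the camera’s position and pose flows straight into the derived place coordinate, which is why the foundational accuracy matters.

```python
from math import asin, atan2, cos, degrees, radians, sin

EARTH_RADIUS_M = 6_371_000.0

def transfer_position(cam_lat, cam_lon, cam_heading_deg, pixel_offset_deg, range_m):
    """Project a feature seen in a photo onto the map.

    cam_lat/cam_lon/cam_heading_deg come from the smartphone's (improved)
    position and pose; pixel_offset_deg is the feature's horizontal angle
    from the image center (from camera intrinsics); range_m is an estimated
    distance to the feature. Uses the standard great-circle destination formula."""
    bearing = radians((cam_heading_deg + pixel_offset_deg) % 360.0)
    phi1, lam1 = radians(cam_lat), radians(cam_lon)
    d = range_m / EARTH_RADIUS_M  # angular distance on the sphere
    phi2 = asin(sin(phi1) * cos(d) + cos(phi1) * sin(d) * cos(bearing))
    lam2 = lam1 + atan2(sin(bearing) * sin(d) * cos(phi1),
                        cos(d) - sin(phi1) * sin(phi2))
    return degrees(phi2), degrees(lam2)
```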
Traditionally, the mapping of street-level imagery has been done with “structure from motion” (SfM) pipelines that turn 2D photos into 3D point clouds that can then be geo-registered (RIP Pixel8earth). This approach has been popular for creating 3D feature databases to power city-scale augmented reality (AR). The problem is that these 3D feature databases for AR don’t play particularly well with vector maps or the data we’d like to feed an LLM.
Geographically Anchored Visual Data
Instead of point clouds derived from images, we want geographically anchored visual data. For Spatial AI, we only need to anchor the contents of an image in the real world, rather than the centimeter-scale feature geometries computed in an SfM process. While an SfM reconstruction can be aligned geographically to let us back out the position and orientation of each image, we’ve found it to be an unnecessarily heavy lift. At its best, this approach can decompose the pixels into a precise 3D geographic representation of what’s in the picture. However, doing this with crowdsourced images taken with different cameras, in different conditions, at different times can be quite brittle.
Precise absolute position and pose from a smartphone allow us to avoid SfM complexity and computational cost when triangulating map features, and to avoid the VPS battery drain when localizing against the map on device. The core of this process was covered in our relocalization blog posts, but there is an additional bundle adjustment step in order to do fun things like map entrances. This approach also lets us explicitly link visual data with vector map data through GERS IDs, and better conflate places with buildings in the process. Below is an example of work we’ve done in our test area to geographically anchor images while conflating their context with Overture places and buildings through GERS.
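As a simplified sketch of the triangulation behind that kind of anchoring (not our actual pipeline, which adds the bundle adjustment step above), the code below intersects bearing rays from two anchored images in a local east-north frame to position a feature such as an entrance. The coordinates are illustrative, as is the GERS linkage noted in the final comment.

```python
import math

def ray_intersection_enu(p1, brg1_deg, p2, brg2_deg):
    """Intersect two bearing rays in a local east-north plane (meters).

    p1 and p2 are (east, north) camera positions from two anchored images;
    the bearings are compass headings to the same feature. Returns the
    feature's (east, north) position, or None if the rays are parallel."""
    # Compass bearing -> unit direction vector in (east, north).
    d1 = (math.sin(math.radians(brg1_deg)), math.cos(math.radians(brg1_deg)))
    d2 = (math.sin(math.radians(brg2_deg)), math.cos(math.radians(brg2_deg)))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None  # parallel rays: no well-defined intersection
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

# Two photos of the same entrance, taken ~20 m apart along a street:
entrance = ray_intersection_enu((0.0, 0.0), 45.0, (20.0, 0.0), 315.0)
# entrance ≈ (10.0, 10.0); the triangulated point can then be attached to
# the matching building/place record via its GERS ID.
```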
Conclusion
In the buzz surrounding generative AI we often get focused on foundational models and our need for one specific to the geospatial domain. While this is critical work, I think we are missing the equally important piece of “context engineering” needed to geospatially enable existing LLMs. Andrej Karpathy summarizes this process of productionizing LLM capabilities well.
For the geospatial use case the challenge multiplies because, in addition to the standard LLM problems, you are also bridging spatial operations with linguistic relationships. The key to executing this well is accurate and aligned data coming from both the device and the LLM/RAG. Stay tuned for an MCP to help facilitate this goal, and for more content on how we can better geographically anchor visual data.