SatCamp Hot Take: Diseconomies of Scale and the Case for Embeddings

Another fun-filled and positivity-refueling SatCamp is in the books. There were lots of engaging talks and wonderful conversations, but one panel has really stuck with me. Chloe Hampton's collection of luminaries for the "Embeddings & GeoAI" panel was a real "aha" moment. First, though, a little background to tee it up.

Back when I worked at DigitalGlobe, a group of us worked tirelessly trying to solve the puzzle of how to scale ML (machine learning) models for satellite imagery. The engineering problem ended up being very tractable, but scaling the science+business model was far more elusive.

The business problem was that each customer's use case was bespoke. There was just enough novelty in the use cases that you could seldom reuse ML models. Even when the same use case did repeat, it would be in a different geographic location, which meant lots of new training data and model tuning. In short, ML models for satellite imagery were not generalizable. The upstream impact was that you couldn't build a high-margin, repeatable business on top of infrastructure like GBDX. GBDX was actually a far better data access platform, but that is a story for another day.

Diseconomies of Scale

Back to the excellent SatCamp embedding panel. Jeff Albrecht gave one of the best encapsulations of the problem we ran into at DigitalGlobe: ML models for satellite imagery run into "diseconomies of scale". The bigger you scale the business, the more it costs to run the operation and the more profit margins shrink. This is the opposite of most software, which benefits from "economies of scale": the bigger the operation, the cheaper it is to produce each additional unit of the good, so profit margins grow as the business grows.

For example, King Inc. spent "$x" to make Candy Crush, and there is very little cost for each additional copy of Candy Crush sold in the app store. As a result, the average cost per download of Candy Crush keeps falling as sales grow. This makes for a really scalable and profitable business. To date, building ML models has been the opposite of this well-worn economic advantage of software and digital goods.

Since the vast majority of customer use cases for ML models are different, you need a new or updated model each time. That means human R&D time to create or update the model. You also need additional training data and likely annotations, which is an additional fixed cost. Then you need engineering time to deploy and scale the model. Even when a new customer wants the exact same model as a previous use case, unless it is in the same geography you still need to retrain it. Every vector of growth requires additional sunk cost and a more complex model or system of interconnected models.

Arguably this is why so many satellite and associated AI/ML companies end up as defense tech plays. Defense contracts love a "butts in seats" business model. That is one of the few ways you can scale this classic approach to building ML models for satellite imagery. So how can embeddings unravel this Gordian knot?

Embeddings: The Path to Generalizability

In essence, an embedding is a way of representing information, like an image, a patch of terrain, or even an entire city block, as a vector in a high-dimensional space. Two images that are visually or semantically similar (say, two types of farmland) will have embeddings that are close together in that space, even if those images come from different continents, sensors, or seasons.
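To make that concrete, here is a minimal sketch of the similarity idea. It assumes you already have embedding vectors for a few patches saved to disk; the file paths and the farmland-versus-open-water framing are hypothetical, purely for illustration. Cosine similarity is the usual way to measure how close two vectors sit in that space.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # 1.0 means the vectors point the same way; values near 0 mean unrelated.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Hypothetical precomputed embeddings from a pretrained geospatial encoder.
    farmland_kansas = np.load("embeddings/kansas_patch.npy")
    farmland_kenya = np.load("embeddings/kenya_patch.npy")
    open_water = np.load("embeddings/lake_patch.npy")

    # If the embedding space captures semantics, the two farmland patches
    # should score close to 1.0, while farmland vs. open water scores lower.
    print(cosine_similarity(farmland_kansas, farmland_kenya))
    print(cosine_similarity(farmland_kansas, open_water))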

This property is powerful because it lets models trained in one geography generalize to others. Instead of starting from scratch every time you move from Kansas to Kenya, embeddings can capture the underlying relationships between features in the imagery, like textures, shapes, spectral patterns, and spatial context. These features remain consistent across the globe. A model built on top of these embeddings doesn’t need to relearn what a road, river, or building “looks like” in each new dataset. It can recognize those features because their embeddings occupy similar regions in this learned space.

The net result is that embeddings flatten the cost curve. Once a general embedding model has been trained on a broad corpus of satellite imagery, it can be reused everywhere. New geographies require far less bespoke data collection or tuning, dramatically reducing the marginal cost of deploying ML models across the globe. This is the transition from bespoke science projects to scalable software systems, the same shift that turned natural language processing from handcrafted rule sets into general-purpose foundation models like GPT.
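A small sketch of what "reuse everywhere" could look like in practice, assuming the embeddings have already been computed once by a shared encoder (the file names and the five-neighbor vote are hypothetical choices, not a specific product's workflow): rather than retraining for a new geography, you label it by looking up the nearest labeled embeddings.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Hypothetical precomputed outputs of a shared, pretrained encoder.
    labeled_vecs = np.load("embeddings/labeled_patches.npy")  # (n, d) embeddings with known classes
    label_ids = np.load("embeddings/label_ids.npy")           # (n,) integer class ids, e.g. 0=road, 1=river
    new_region = np.load("embeddings/new_geography.npy")      # (m, d) embeddings from an unseen geography

    # Instead of collecting new training data and retraining for the new
    # geography, find the closest labeled embeddings and take a majority vote.
    index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(labeled_vecs)
    _, neighbors = index.kneighbors(new_region)
    predictions = [np.bincount(label_ids[idx]).argmax() for idx in neighbors]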

Extensibility and the End of Reinventing the Wheel

Embeddings don’t just make models more generalizable; they make them more extensible. Because the embeddings capture a shared understanding of imagery, you can use the same underlying model for entirely different applications. For instance, the same model could be used for deforestation monitoring, flood mapping, and urban growth detection by simply retraining a lightweight classifier or fine-tuning a small layer on top. Today this is a stretch for geospatial foundation models, but it is the direction these approaches hope to open up.

This creates a modular workflow: instead of building a dozen specialized models from scratch, you maintain a single foundational embedding model that serves as the base for many related use cases. Each downstream task becomes faster and cheaper to develop because it leverages the same learned spatial and spectral representations.
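As a rough illustration of that modular workflow, assuming one file of frozen embeddings and a separate label set per use case (all names here are hypothetical), each new application reduces to fitting a small head on top of the same vectors:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One set of embeddings from the shared foundation model, reused as the
    # input features for every downstream task (hypothetical file paths).
    X = np.load("embeddings/global_patches.npy")  # (n, d) frozen embeddings

    tasks = {
        "deforestation": np.load("labels/deforestation.npy"),  # (n,) 0/1 labels per patch
        "flood_extent": np.load("labels/flood_extent.npy"),
        "urban_growth": np.load("labels/urban_growth.npy"),
    }

    # Each use case gets its own lightweight classifier; the expensive
    # encoder is never retrained.
    heads = {name: LogisticRegression(max_iter=1000).fit(X, y) for name, y in tasks.items()}

In this framing the heavy lifting happens once, and every additional use case is a cheap fit on top of existing vectors rather than a new end-to-end model build.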

In practice, this means an imagery/model company can go from a one-off, project-based business to a product-based one. The same global embedding model can support a long tail of customer needs, with each fine-tuned slice serving a niche but all powered by the same core. This turns "diseconomies of scale" into "economies of scope": the ability to address more problems, faster, with less incremental cost.

The only downside is that right now both "generalizability" and "extensibility" through embeddings for satellite imagery are still a postulation, not a reality. That said, the work discussed on the panel by Ash Hoover, Alex Kovac, Leo Thomas, and Jeff Albrecht represents exciting steps in that direction. Thanks to everyone for making the three days such an invigorating and mind-opening experience.


Mikel Maron

Product Lead at Earth Genome

3w

That's a great hot take explainer. Sorry I missed SatCamp and that panel, would really like to hear the discussion. The potential is definitely there ... we've seen dataset creation which would have required an order of magnitude more effort with traditional model training ... the global cattle feedlot search with Climate TRACE has been greatly accelerated. But yes, we don't know how truly generalizable embeddings are. Different tasks still require different resolutions. The myriad models have strengths and weaknesses as well. One example: we have a project on seagrass mapping, and most models are not particularly trained on the peculiarities of coastal areas. That said, it's super important to lean into open sharing of embeddings, so we can reduce duplication and leverage each other's work. Earth Genome has been advocating for open distribution of embeddings https://github.com/Element84/vector-embeddings-catalog-whitepaper, and sharing what we generate on source.coop https://docs.source.coop/case-studies/earth-genome

Ashley Deaner

Senior Staff Software Engineer at Maxar Technologies

3w

well summarized! SatCamp was a blast and the AI panel left me with hope.

Alex Diamond

Director of Products and Engineering at Carbon Mapper

3w

I’d love to see the promise of embeddings fulfilled. It’s still early, but I’m a bit skeptical since deforestation in Brazil doesn’t always look like deforestation in Sumatra, and a well pad in the Permian can look pretty different from one in the Marcellus, etc., etc.

sophia parafina

Rock and Roll Hoochie Koo

3w

Thank you! I was struggling to articulate the limitations of machine learning applied to early earthquake warning systems due to variations in subsurface conditions. The use of transformers in seismology is still in its early stages, but your post neatly frames a possible direction.

Bruno Sanchez-Andrade Nuño

Building AI for Earth, with Clay and LGND.

3w

I wish I had listened to that panel! I could only join the Happy Hour on the last day (in Boulder for family reasons), and several people referenced it. Kudos to the speakers and organizers — and thanks, Sean, for this thoughtful write-up. Sad I missed chatting with you too! What your post sparks for me is that geo-embeddings aren’t really about economies or diseconomies of scale. They’re about a new (?) "planetary-scale economies". That there’s a level of EO data-scale big enough, AND a semantic vocabulary useful enough — where the reusability beats the cost of complexity. As projects and requests pile up, you’re only serving one-offs AND you must scaffold to understand reusability. The path of scale is really painful. And Earth is large, BUT not infinite, and less so are useful embeddings. It’s not just the volume of data that matters, but the capacity to learn value during scaling. Makes sense? A mental exercise I like: try to describe to anyone any EO image using only 1,000 words of your choice. It gets easier the more you do it — not because you run out of things to say, but because you’ve learned the most common words to use. https://xkcd.com/1133/ That’s the promise of embeddings at planetary scale. That's LGND AI, Inc.
