I don't think local models will win

Building AI agents taught me that the real moat is not the model itself, but ownership of the memory, context, and operating layer around it.

By Luis Borges | ai infrastructure agents


When I first started building AI agents on a Mac Mini at home, I wasn't thinking about geopolitics. I was thinking about privacy. I wanted a Gmail assistant that could help me process information without handing more of my personal context to a vendor I had no leverage over. Running a local model felt like the obvious answer. It was mine. It stayed mine.

The Fable 5 situation pulled me out of that frame. A small product team gets caught in a public fight, their infrastructure provider gets pressured, and suddenly the conversation isn't about model quality at all. It's about who can turn off the lights, and how quickly. That is not a model problem. That is an ownership problem.

Why I started building locally

The first agents I built were embarrassingly small. A summarizer. A label-router. A weekly digest. None of them needed a frontier model. What they needed was uninterrupted access to my data and a stable place to live. Running them on hardware in my apartment meant the work survived a price change, a policy change, or a vendor deciding my use case was no longer welcome.

I expected that to feel like a technical preference. It started to feel like a posture.

What the Fable 5 situation actually revealed

The headline version is a content moderation story. The version that matters is quieter. A product gets built on a stack of services, each of which seemed neutral at the time. Then the wind changes, and the dependencies stop being neutral. The model provider is one node. The hosting provider is another. The payments processor, the email vendor, the identity layer, the analytics pipe. Every one of them is a place where someone else can decide your product no longer exists.

The Fable team did not lose because their model was worse. They lost because the operating layer underneath them was never theirs.

The model is rarely where the leverage lives. The leverage lives in everything wrapped around it.

The model is the cheapest part of the stack

This is the part I keep coming back to. The model is becoming a commodity. Open weights are catching up faster than most incumbents want to admit, and the gap that remains is narrowing on the workloads most teams actually run. The expensive, defensible part of an AI product is no longer the weights. It is the memory the agent accumulates, the context it has earned, the identity it operates under, the permissions it holds, the routing logic that decides which model handles which task, the scheduling that keeps it cheap, and the workflows it has been tuned against. That is the operating layer. That is what is hard to rebuild on a Saturday because a vendor changed terms on a Friday.

Local models, by themselves, do not solve that. You can run a fully local model on top of a stack you do not own and still be one policy change away from a dead product.

What ownership looks like in practice

Ownership is not a single decision. It is a pattern of small ones. Keeping memory in a store you control, even when the model lives elsewhere. Writing the orchestration layer in code you can read, not in a vendor's drag-and-drop canvas. Preferring open-source components for the pieces you would not want to rewrite in a crisis. Designing workflows so they can move between providers without the business logic having to come with them. Sometimes it just means understanding, with honesty, which dependencies you have already accepted and which ones you have not priced in yet.
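Two of those small decisions, keeping memory in a store you control and keeping business logic independent of the provider, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular product: `MemoryStore`, `run_task`, and the two stub providers are hypothetical names, and a real provider would wrap an actual model call behind the same one-function signature.

```python
import sqlite3
from typing import Callable, List

# Hypothetical provider signature: any function that takes a prompt and returns text.
Provider = Callable[[str], str]


class MemoryStore:
    """Agent memory kept in a SQLite database you control, separate from any model vendor."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memory "
            "(id INTEGER PRIMARY KEY, note TEXT NOT NULL)"
        )

    def remember(self, note: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO memory (note) VALUES (?)", (note,))

    def recall(self, limit: int = 5) -> List[str]:
        rows = self.conn.execute(
            "SELECT note FROM memory ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
        # Return the most recent notes in chronological order.
        return [row[0] for row in reversed(rows)]


def run_task(task: str, memory: MemoryStore, provider: Provider) -> str:
    """Business logic: build a prompt from owned memory, call whichever provider is passed in."""
    context = "\n".join(memory.recall())
    prompt = f"Context:\n{context}\n\nTask: {task}"
    answer = provider(prompt)
    memory.remember(f"{task} -> {answer}")
    return answer


# Stub providers standing in for real model backends (assumptions for illustration).
def stub_provider_a(prompt: str) -> str:
    return "A:" + prompt.splitlines()[-1]


def stub_provider_b(prompt: str) -> str:
    return "B:" + prompt.splitlines()[-1]


if __name__ == "__main__":
    store = MemoryStore()
    print(run_task("label inbox", store, stub_provider_a))
    # Swap the provider; the memory and the workflow stay where they are.
    print(run_task("weekly digest", store, stub_provider_b))
    print(store.recall())
```

The design choice worth noticing is that `run_task` never imports a vendor SDK. Changing providers means passing a different function, while the accumulated memory stays in a database you hold.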

A hybrid posture is usually the realistic answer. Frontier model for the hard reasoning, smaller local model for the work that touches sensitive data, your own glue holding it together. The point is that the glue is yours.
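As a rough sketch of that glue, the routing rule can be a few lines of code you own. Everything here is an assumption made for illustration: the `Task` fields, the `route` policy, and the stub backends are hypothetical, and a real setup would decide sensitivity and difficulty with its own criteria.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Task:
    prompt: str
    sensitive: bool = False  # touches personal or private data
    hard: bool = False       # needs heavier reasoning


def route(task: Task) -> str:
    """Pick a backend name. Sensitivity wins over difficulty in this sketch."""
    if task.sensitive:
        return "local"
    if task.hard:
        return "frontier"
    return "local"  # default to the model you run yourself


def dispatch(task: Task, backends: Dict[str, Callable[[str], str]]) -> str:
    """Send the prompt to whichever backend the routing rule selects."""
    return backends[route(task)](task.prompt)


# Stub backends standing in for a local model and a hosted frontier model.
def local_stub(prompt: str) -> str:
    return "[local] " + prompt


def frontier_stub(prompt: str) -> str:
    return "[frontier] " + prompt


BACKENDS = {"local": local_stub, "frontier": frontier_stub}

if __name__ == "__main__":
    print(dispatch(Task("summarize my inbox", sensitive=True), BACKENDS))
    print(dispatch(Task("plan a data migration", hard=True), BACKENDS))
```

Because the policy lives in `route` and not inside a vendor's tooling, replacing either backend is a one-line change to `BACKENDS`.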

The real question

I don't think local models will win. I think ownership will. The Fable 5 situation felt less like a warning about one company and more like a reminder that the question is no longer whether we use AI. The question is which parts of that future we are comfortable renting.
