Yvo

Why your AI pilot is stuck (and what to do about it)

2025.11.22 · 3 min read
Strategy · Adoption · Production
Revised once, last on 2026.01.14
  • 2026.01.14: Added the 'ninety-day rule' to the closing after a client engagement made it clear that stuck for three months is where the non-technical interventions become load-bearing.

The most common call I get starts with some variation of: "We ran a pilot six months ago, it looked promising, but now everything's stalled and the team doesn't know what to do next."

This is pilot purgatory. It's where AI initiatives go to slowly suffocate without anyone noticing, and it's not because the technology stopped working.

The three stages of a stuck pilot

Every stuck pilot I've seen is in one of three places.

Stage 1: The demo is great, production is silent. The pilot runs against a curated dataset, delivers impressive numbers, and then... nothing. Nobody knows how to run it against real data, because nobody owns the data pipeline, and the team that built the pilot doesn't have access to the systems that would let them productionize it.

Stage 2: Production works, nobody trusts it. The thing runs. It answers questions. It generates reports. And every single one of those outputs gets manually reviewed by the humans it was supposed to replace, because nobody has decided what "good enough" means. You're paying for two workflows instead of one.

Stage 3: Trust exists, budget doesn't. The business unit wants to scale but the CFO is asking why the last pilot cost €80k and produced a Slack bot. This is the most salvageable of the three — it's a narrative problem, not a tech problem.

How to unstick each one

Stage 1 needs an owner for the data pipeline. Not a project manager. Not a consultant. A named engineer from the data team whose performance review includes "shipped this to production." If you can't get that, the pilot is a vanity project and you should kill it.

Stage 2 needs a defined quality contract. Write down — literally, on paper — the five failure modes you care about and the acceptable rate for each. Then build evals that measure those five things and nothing else. When the evals pass, you stop manually reviewing. This is the single highest-leverage intervention in AI adoption.
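One way to make that contract concrete is to treat it as data: a fixed list of failure modes, each with an agreed maximum rate, and a single gate that decides when manual review stops. This is a minimal sketch, not a prescription; the failure-mode names and rates below are hypothetical placeholders, and in practice each rate would be measured by its own eval over real production outputs.

```python
# Hypothetical quality contract: five failure modes, each with an
# agreed maximum acceptable rate (fractions of outputs affected).
QUALITY_CONTRACT = {
    "hallucinated_figure": 0.01,  # at most 1% of outputs invent a number
    "missing_citation":    0.05,
    "wrong_currency":      0.00,  # zero tolerance
    "stale_data_used":     0.02,
    "tone_violation":      0.05,
}

def evals_pass(observed_rates: dict[str, float]) -> bool:
    """Gate for ending manual review: True only if every failure mode
    in the contract stays at or under its agreed rate. A mode with no
    measurement defaults to 1.0, i.e. an unmeasured mode fails."""
    return all(
        observed_rates.get(mode, 1.0) <= limit
        for mode, limit in QUALITY_CONTRACT.items()
    )

# Example: four modes within limits, one over -> keep manual review.
observed = {
    "hallucinated_figure": 0.004,
    "missing_citation":    0.030,
    "wrong_currency":      0.000,
    "stale_data_used":     0.060,  # over the 2% limit
    "tone_violation":      0.010,
}
print(evals_pass(observed))  # False: stale_data_used exceeds its limit
```

The point of the shape is that the gate measures exactly the five things written down and nothing else, so "when the evals pass, stop reviewing" is a mechanical decision rather than a judgment call.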

Stage 3 needs a business case that matches the audience. CFOs don't want to hear about prompt engineering. They want to know what this replaces, what it accelerates, and what the unit economics look like at scale. I've rewritten a dozen of these and they're all the same template: one paragraph of context, three numbers, one decision.

The ninety-day rule

If a pilot has been stuck for more than ninety days, the problem is never the model. It's ownership, quality definition, or narrative. Pick the one that's broken and fix it specifically. Do not "re-evaluate the approach." Do not "explore alternatives." Specific unstucks for specific problems. Everything else is theatre.