Notes from the Coalface
Most of what gets written about AI and automation is written from a distance. Strategy decks. Analyst reports. Case studies polished until the friction has been removed.
This isn't that.
Workhorse is an inventory and order management system for UK product businesses — the kind of operations that run on spreadsheets, supplier relationships, and people who've been doing this long enough to know where the bodies are buried. We're building AI automation into that environment right now, and we're writing down what we find.
The question that started this series was deceptively simple: if automation is supposed to remove labour, why does the labour keep showing up?
Not everywhere. The routine work does disappear. Orders that used to be processed manually get handled without anyone touching them. But the people who used to do that work aren't sitting idle — they're busier, in some cases — doing something slightly different. Checking. Deciding. Catching the things the system flagged, or worse, didn't flag. The headcount problem doesn't solve itself. It moves.
What we've found, piece by piece, is that this isn't a technology problem. It starts with a distinction that sounds technical but turns out to matter more than almost anything else about how operational software is built: a system that records what happened and a system that owns what happens next are different products. Most operational software is only the first one.
From there, the problems compound. Fragmented stacks — a purchasing tool, a forecasting tool, an ERP underneath all of it — move decisions between systems without any of them owning the outcome. Operational systems routinely create executable state without ever declaring whether execution is actually permitted — and the humans who catch the ones that shouldn't go out aren't a safety feature. They're evidence that the system never defined what execution requires. That pattern, it turns out, isn't accidental and it isn't fixable with better tooling. It's structural.
Which is where things get uncomfortable for anyone selling AI as the answer. Better forecasts and higher confidence scores don't reduce the review load — because the review was never about whether the record is correct. It's about whether the system is permitted to act on it. Authority is the bottleneck, and intelligence doesn't move it.
What makes this harder to fix than it looks is that the volume doesn't stay flat. As automation increases, more records arrive at the permission boundary — and the humans absorbing that load aren't checking arithmetic, they're carrying commercial risk the system was never authorised to carry. Accuracy makes it worse, not better: a 95% success rate doesn't produce 95% less work, because the system still can't tell you which 95% are safe. Until it can, you check everything. The pressure compounds precisely because the system is getting smarter.
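The arithmetic behind that claim can be sketched in a few lines. All the numbers here are hypothetical, chosen only to show the shape of the problem: without a per-record uncertainty signal, review volume tracks total volume, not error rate.

```python
# Illustrative sketch (all numbers hypothetical): why a 95% success
# rate does not produce 95% less checking.
orders_per_day = 1_000
error_rate = 0.05  # 5% of records are wrong

# The system cannot say WHICH records are the wrong ones, so every
# record that reaches the permission boundary gets a human check.
reviews_without_uncertainty = orders_per_day  # all 1,000

# If the system could flag its own uncertain records (say it marks 8%
# as uncertain, and the flags are trustworthy), only the flagged
# subset would need a human.
flagged_fraction = 0.08
reviews_with_uncertainty = int(orders_per_day * flagged_fraction)

print(reviews_without_uncertainty, reviews_with_uncertainty)  # 1000 80
```

Note that the review load in the first case is independent of `error_rate`: halving the errors changes nothing until the system can localise them.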
The automation boundary in an operational workflow isn't placed where someone decided to stop. It forms after an irreversible mistake reaches a customer — a wrong shipment, a duplicated order, a disputed invoice — and it lands at the last point the error was still fixable. After that, every order gets checked. The boundary doesn't move because nothing in the system has changed that would prevent the same event from recurring undetected.
The instinct, when errors keep appearing, is to reduce them — to invest in accuracy until the checking becomes optional. But accuracy isn't what drives the verification load. A team that took their error rate from 5% to 2% found the checking unchanged: the system still couldn't identify which orders it might have got wrong, so every order still got checked. The exit from universal verification isn't a higher score. It's a system that can tell you where it's uncertain.
There's a second problem running underneath the authority question, and in some ways it's harder to see. The review gate assumes records wait safely until someone looks at them — but in most operational systems, an unreviewed order is already participating in execution the moment it exists. Stock availability adjusts, demand signals update, warehouse teams start planning — all against a record no one has approved. By the time a reviewer opens it, the downstream decisions are already made.
The gate-is-already-leaking problem assumed that records at least meant what they said — that the ambiguity was about timing, not content. That turns out to be only half of it. In any live operational environment, the sources the AI draws from will always contain records that are individually accurate but collectively contradictory — configuration notes, JIRA tickets, code documentation that each reflect a different moment in a client's history. When an engineer corrects the AI's answer, she's resolving that conflict from knowledge the sources don't contain, and nothing about that resolution gets back to the system. Next time the same query arrives, the same conflicting sources produce the same confident wrong answer.
We're still in the middle of it. The series follows the evidence one step at a time, and the evidence keeps moving. What's below is what we've established so far.