Oversight Without Approval
Once a gate is actually running, the operator's job is no longer to approve what the machine produced; it's to govern the rule it ran on. Where a person is still signing off each output, authority hasn't arrived yet.
The sales order parser has been running long enough now that the change it made has settled into something we can see plainly, and it isn't the change we set out to make. We thought the win was taking the typing off someone's desk — the part where a document comes in and a person has to workout which customer account it belongs to, reading the address or the contact or just knowing, because they've seen this customer a hundred times. The system does that now. It scores each account it could mean against the others, and if one is far enough clear of the rest it goes through; if nothing is, the order is held for a person to settle. That part works, and it's removed a lot of reading.
But the bit that actually changed the work is further along, and it took us a while to notice it for what it was.
The operator doesn't look at orders any more.
The ones that go through, they never see — those aren't on a list waiting for a nod. What lands in front of them is the held ones, the orders where the system couldn't get clear of the others. And here's the part that matters: when they settle one of those, they're not clearing that order so it can proceed. They're fixing the thing that made it ambiguous — the account that was set up twice, the contact that was missing, the rule about how this customer's POs come in — and that fix changes what the gate does the next time an order like it arrives. They're not approving the order. They're shaping the thing that approves orders.
Because the obvious thing to call all this is oversight, and the obvious shape for oversight is a person in the loop — the machine produces something, a human looks at it before it goes out, waves it through or pulls it back. That's the answer everyone reaches for when you ask how you keep AI safe in an operation. Keep a human on the outputs. It feels like the responsible thing.
It's also the exact thing we spent the back half of last year pulling out. A person signing off each record is what we had before any of this — the review step that sat after creation and turned every automated record into something waiting on a human. If that's what oversight means, then nothing has been governed. You've automated the typing and left the deciding exactly where it was, on someone's desk, one record at a time. The labour didn't go anywhere. It just got renamed oversight.
So the line we've ended up holding is this. Oversight isn't looking at the outputs. It's governing the rule the outputs come out of. On the order surface that's a real, specific thing — the operator owns where the line sits, what counts as clear enough, what happens to a held order, and they change those when they're wrong. They do not reach into a held case and clear it on its own terms. That one matters more than it looks: settle a single order by lowering the bar just for it, and you haven't made one exception, you've moved the bar for every order behind it. The rule quietly becomes whatever you last let through. So a held case is a reason to change the rule, or to leave it exactly where it is — never to make a quiet exception to it. The discipline of not reaching in is most of what makes the gate worth having.
Now hold that next to the other two places we've been working, where it comes out differently.
On the support side we've got the RAG system answering customer questions, and a while back a client asked whether they could override a customer's credit limit for a single order. The system answered it correctly and completely — no, there's no per-order override, the limit lives on the account, you raise it there. Nothing wrong with the answer. The analyst didn't send it anyway, because a flat no was going to land badly with this particular client, and what went back was the same fact with some room around it and a half-promise to look at a proper option. There is no rule that makes that call. Whether a bare no needs softening for this client, what we'll stand behind for them — that doesn't live in a document, and it doesn't live in a threshold. Soon this surface there's nothing for an operator to govern yet, because the thing they add can't be written as a rule the system runs. Which means the human is still in the loop the old way, reading each answer before it goes.That isn't oversight. It's the analyst still carrying authority the system hasn't been given.
The reporting side lands in a third place again. We've got clients asking questions of their own data in plain English and getting a number back. The number's usually right, but when it's wrong it's wrong invisibly — a join that didn't complete, a filter that quietly dropped a category — and the working that would tell you sat back where the query ran instead of coming forward with the figure. Until that working travels with the number, there's nothing for any rule to read at the moment the number's used. So there's nothing to govern. There's just a person looking at a figure and deciding whether to believe it — the same sign-off as before, over a number instead of an order. Get the working to arrive with the number and there's finally something a rule could read. You haven't written that rule yet. But before, there was nothing to write one against — just the number, and someone deciding whether to trust it.
Put the three side by side and the divide is the same each time: either there's a rule for the operator to govern, or there isn't, and a person is approving outputs in its place. On orders there's a rule, and the operator governs it. On support there's no rule that can carry what the analyst adds, so they still read each answer. On reporting nothing arrives for a rule to read, so someone still believes each number. Where a person is still approving outputs one at a time, authority isn't there — not because the people are slow or the answers are bad, but because approving each output is what's left when there's no rule yet to govern. It isn't oversight of the machine. It's a person standing in for an authority that isn't there yet.
So that's where we've landed. The test of whether we've actually governed a surface was never how good the answers got. It's whether anyone's still saying yes to them one by one. Where they are, we're not finished — however sharp the answers look. Where they're not — where the only thing left for a person to do is change the rule when the rule turns out to be wrong — that's the whole of it. It's not a quieter way of checking everything. It's the end of checking things one at a time.
