A Man Without a Built-In LLM

Every generation gets to feel old about something. My father felt old when cars stopped having carburetors. His father felt old when tractors replaced the kind of farming where you understood each plant individually. I feel old, specifically, about error messages.

This is not where I expected to end up.

When Errors Were Instructive

When a computer went wrong in 1998, it told you something about itself as it failed. The error message was an artifact of the system’s actual internals: a stack trace that meant something, an exception from a specific line in a specific file, a failure mode that corresponded to a real place in the code.

Modern AI systems fail differently. They fail with confidence. They fail in complete sentences. They fail in ways that are indistinguishable from success until you check the output against reality, which you increasingly do not, because the output sounds so certain.

I have started calling this the authoritative hallucination problem, which is not original — everyone has noticed it — but I want to describe the specific texture of it in an enterprise context.

The CIO Problem

My job involves making decisions about technology adoption for a large organization. When a new tool arrives, I evaluate it. The evaluation used to come down to three questions: what does it do, how does it fail, and what do the failures tell you?

With AI systems, the failure taxonomy is unfamiliar. The system does not fail in ways that correspond to its internals. It fails in ways that correspond to the distribution of text it was trained on, which is a completely different kind of thing, and a kind that my twenty years of systems experience did not prepare me to recognize quickly.

I am adapting. But I want to register that this is a real adaptation, not a trivial one. The people saying that experienced engineers will quickly get good at prompting are right, and they are also understating how different the mental model has to become.

What I Actually Think

The technology is real. The capability improvement is real. The thing I am skeptical of is the speed at which people have decided they understand it.

My grandmother’s samogon apparatus had a specific failure mode: if the pressure fitting on the condensing coil developed a small leak, the output would smell faintly of solder. Once you knew that, you checked for it. The system taught you how to use it by failing in legible ways.

We do not yet know, at a societal level, what the legible failure modes of LLMs are. We are discovering them through incidents, which is how we always discover failure modes, and which is uncomfortable at the scale these systems are being deployed.

I have a US patent on anticipating customer calls before they happen. The invention is about prediction from behavioral signals. I think about that work when I think about what it would mean to have a reliable failure-prediction signal for AI systems in production.

We will get there. We are not there yet. That is my considered position as a man without a built-in LLM, which is what everyone was, just two years ago.