Digital Economy Dispatch #289 -- AI Tokenmaxxing: When the Meter Becomes the Metric

Tokenmaxxing, the search for AI's returns, and a lesson I first learned counting lines of code.

When I started as a software developer more than thirty years ago, the numbers that were supposed to matter were lines of code, test coverage, defect density, and code churn. Each was meant to capture something real about the quality of the work and the productivity of the person producing it. What’s more, they had the great merit of being easily countable.

What became clear before long, though, was that they bore little relation to the only question that truly mattered: Was any of this improving the experience of the people who actually had to use the software I was building? I have found myself thinking about those early measures a great deal lately, because we appear to be in danger of making the same measurement mistakes with AI.

In recent months, the surest status symbol in Silicon Valley was not your title or your stock grant. It was how many AI tokens you burned last month. Engineers compared their monthly token counts the way a previous generation compared lines of code or defects fixed. Some firms built internal leaderboards to crown the heaviest users, and a few began treating "token budgets" as a form of compensation. The practice acquired a name, borrowed from the internet: tokenmaxxing. The premise was simple. The more AI you consume, the more productive you must be.

Rather than simply dismissing this, I want to take it seriously, because the joke and the warning are the same thing. Tokenmaxxing looks like a curiosity from the engineering fringe. It is, in fact, a near-perfect illustration of the question hanging over every boardroom this year: Are we actually getting a return on what we spend on AI, or have we simply found a more sophisticated way to mistake activity for value?

When the meter becomes the scoreboard

A token is the basic unit an AI model reads and writes, and every provider meters it because metering is how they bill their users. That makes tokens one of the few things in an AI workflow that can be counted precisely. Therein lies the trap.

Goodhart's Law, named for the economist Charles Goodhart, holds that when a measure becomes a target, it stops being a good measure. Token consumption is an input. It tells you how hard the machine worked, not whether the work was any good, whether it survived review, or whether a customer was better served at the end of it. The moment that input becomes a scoreboard, people optimise the scoreboard. They run agents in parallel, pad their prompts, and automate consumption for its own sake. The number goes up. Whether anything of value was produced is a separate question that the metric was never designed to answer.

This is precisely the trap those early code metrics fell into. Lines of code and churn were easy to tally and reassuring to report, yet a developer could push every one of them in the right (or wrong) direction while shipping software that was slower, more brittle, and harder for anyone to use. Tokenmaxxing is that same confusion reborn in a faster and far more expensive form. The meter is more precise than ever, which only strengthens the temptation to mistake it for value.

What the meter is hiding

This matters now because patience with AI is beginning to run out. After several years of experimentation, boards and investors have shifted from asking what AI might do to demanding evidence of what it has actually returned. The pressure is real, and it is documented: in one large survey, around three in five senior leaders said they felt more pressure to prove a return on AI than they had a year earlier. Research from MIT's NANDA initiative, in its study of AI in business, found that roughly 95% of enterprise GenAI pilots had produced no measurable P&L impact. Similarly, Forrester's analysts have noted that only a small minority of decision-makers can point to a real earnings lift, and fewer than a third can tie AI spending to a change in the bottom line.

The phrase doing the rounds is "AI sticker shock": the ballooning bill arrives long before the proven benefit. The correction is already underway. By late spring, Microsoft had cancelled internal AI coding subscriptions in several divisions over cost, Meta had quietly removed its tokenmaxxing leaderboard, and Uber had admitted to burning through its entire annual token budget in the first four months of the year. The unease is not confined to finance directors, either. A Nature Machine Intelligence editorial recently urged firms to stop tokenmaxxing and deploy AI sensibly instead.

Set the two trends side by side and the absurdity becomes clear. At the precise moment finance directors are demanding to see returns, the culture has produced a metric that rewards inflating the cost. ROI is a fraction. Value over spend. Tokenmaxxing optimises the denominator in the wrong direction and calls the result success. It is the ROI crisis in miniature, lived out one leaderboard at a time.

In fairness, there is a serious counter-argument to consider. Nvidia's Jensen Huang has reportedly said he expects a top engineer to get through around $250,000 of tokens a month, and the optimistic reading is that the missing returns are a matter of timing rather than absence. From this perspective, the genuinely valuable agentic workflows are still being built, the heavy consumption today is the necessary investment, and the P&L impact will follow once those systems mature. That may prove partly true. But it is an argument for patient, governed investment with a clear value hypothesis attached. It is not an argument for a leaderboard. The honest version measures what the spending has changed. The vanity version simply measures the spending.

Why Britain should be watching

For a UK audience there is a further dimension. When token consumption becomes the badge of being serious about AI, that consumption flows overwhelmingly to a small number of providers, almost all of them American.

The dependence is not hypothetical: the US "Big Three" of AWS, Microsoft Azure and Google Cloud already supply cloud services to more than 90% of UK public sector organisations, and the AI layer is being built on top of exactly that base. Every token burned for show, rather than for value, is a small transfer of money and capability offshore. The ROI question is usually framed as a corporate one, a matter for a single company's accounts. At national scale it is also a question of sovereignty and of the balance of payments. Uncontrolled, status-driven demand is the opposite of what I have argued we need, which is to consolidate demand so that we understand and shape it, and to diversify supply so that we are never captive to a single meter.

Sovereignty by default does not mean spending less on AI. It means refusing to let spend become a proxy for progress, and insisting that demand is governed, routed intelligently, and spread across meaningful alternatives. A smart buyer asks a different question from the tokenmaxxer. Not "how much did we use", but "what changed, and what did it cost per outcome we actually accepted". That single shift, from input to outcome, is the whole of the argument.
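To make the shift from input to outcome concrete, here is a minimal Python sketch that ranks the same two teams twice: once by tokens consumed, as a leaderboard would, and once by cost per accepted outcome. The team names and every figure in it (tokens, spend, accepted outcomes) are invented for illustration, not data from any organisation, and what counts as an "accepted outcome" is an assumption each organisation would have to define for itself.

```python
from dataclasses import dataclass


@dataclass
class TeamMonth:
    """One team's AI usage for a month (hypothetical, illustrative figures)."""
    name: str
    tokens_used: int        # what the meter reports
    spend_gbp: float        # what the bill says
    outcomes_accepted: int  # work that survived review and was actually used


def cost_per_accepted_outcome(t: TeamMonth) -> float:
    """Spend divided by accepted outcomes; infinite if nothing was accepted."""
    if t.outcomes_accepted == 0:
        return float("inf")
    return t.spend_gbp / t.outcomes_accepted


# Hypothetical teams: A burns six times the tokens of B.
teams = [
    TeamMonth("Team A", tokens_used=900_000_000, spend_gbp=18_000.0,
              outcomes_accepted=60),
    TeamMonth("Team B", tokens_used=150_000_000, spend_gbp=3_000.0,
              outcomes_accepted=50),
]

# The tokenmaxxing leaderboard: most consumption first.
by_tokens = sorted(teams, key=lambda t: t.tokens_used, reverse=True)
# The outcome view: cheapest accepted outcome first.
by_cost = sorted(teams, key=cost_per_accepted_outcome)

print("Leaderboard by tokens:", [t.name for t in by_tokens])
print("Ranking by cost per accepted outcome:", [t.name for t in by_cost])
for t in teams:
    print(f"{t.name}: GBP {cost_per_accepted_outcome(t):,.0f} per accepted outcome")
```

With these made-up numbers, Team A tops the token leaderboard while paying GBP 300 per accepted outcome against Team B's GBP 60. The two rankings are exact opposites, which is the point: the meter and the measure can disagree completely.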

What matters now

So, before the next dashboard lands on your desk, here are three questions worth asking of your own organisation.

First, where are you already counting activity and quietly hoping it stands in for value? Second, if you replaced "tokens consumed" or "tools adopted" with "cost per accepted outcome", which of your AI initiatives would still look like a success? And third, for those of us thinking about the country and not only the company: if AI spend is becoming a measure of ambition, who exactly is on the receiving end of it, and what are we building here at home in return?

The meter will keep running either way. The only choice is whether we let it tell us a flattering story or insist that it earns its keep.