AI agents and bad productivity metrics

Here’s a little bit of snark from developer John Crickett on X:

Software engineers: Context switching kills productivity. Also software engineers: I’m now managing 19 AI agents and doing 1,800 commits a day.

Crickett’s quip lands perfectly because it’s not really a joke. It’s a preview of the next management fad, wherein we replace one bad productivity proxy (lines of code) with an even worse one (agent output), then act surprised when quality collapses.

And yes, I know, nobody is doing 1,800 meaningful commits. But that’s the point. The metric is already being gamed, and agents make gaming easy. If your organization starts celebrating “commit velocity” in the agent era, you aren’t measuring productivity. You’re measuring how quickly your team can manufacture liability.

The great promise of generative artificial intelligence was that it would finally clear our backlogs. Coding agents would churn out boilerplate at superhuman speeds, and teams would finally ship exactly what the business wants. The reality, as we settle into 2026, is far more uncomfortable. Artificial intelligence is not going to save developer productivity because writing code was never the bottleneck in software engineering. The real bottleneck is validation. Integration. Deep system understanding. Producing code without a rigorous validation framework is not engineering. It’s simply mass-producing technical debt.

So what do we change?

Thinking correctly about code

First, as I argued recently, we need to stop thinking about code as an asset in isolation. Every single line of code is surface area that must be secured, observed, maintained, and stitched into everything around it. As such, making code cheaper to write doesn’t reduce the total amount of work but instead increases it, because you end up manufacturing more liability per hour.

For years, we treated developers like highly paid Jira ticket translators. The assumption was that you could take a well-defined requirement, convert it to syntax, and ship it. Crickett rightfully points out that if this is all you’re doing, then you’re entirely replaceable. A machine can do basic translation, and a machine is perfectly happy to do it all day without complaining.

What a machine cannot do, however, is understand critical business context. AI cannot feel the financial cost of a compliance mistake or look at a customer workflow and instinctively recognize that the underlying requirement is fundamentally wrong. For this we need people, and we need people to thoughtfully consider exactly what they want AI to do.

Crickett frames this transition as a necessary move toward spec-driven development. He’s right, but we need to be extremely clear about what a specification means in the agent era. It’s not just another Jira ticket but, rather, a set of constraints tight enough to ensure an LLM can’t escape them. In other words, it’s an executable definition of done, backed entirely by tests, API contracts, and strict production signals. This is exactly the kind of foundational work we have underinvested in for decades because it doesn’t look like raw output; it looks like process. You know, that “boring stuff” that slows you down.
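To make "executable definition of done" concrete, here is a minimal sketch in Python: the spec is a set of assertions the agent's output must pass, not prose in a ticket. The function name (`apply_discount`) and its business rules are hypothetical, invented for illustration.

```python
# A spec as executable constraints. The agent is asked to produce an
# implementation; these tests define "done". All names and rules here
# are hypothetical examples.

def apply_discount(total_cents: int, code: str) -> int:
    """Reference implementation of the kind an agent would be asked to write."""
    if code == "SAVE10":
        return total_cents - total_cents // 10
    return total_cents

def test_spec() -> None:
    # Constraint 1: a discount can never produce a negative total.
    assert apply_discount(0, "SAVE10") >= 0
    # Constraint 2: unknown codes change nothing.
    assert apply_discount(1999, "BOGUS") == 1999
    # Constraint 3: SAVE10 takes 10% off, rounded in the customer's favor.
    assert apply_discount(1999, "SAVE10") == 1800

test_spec()
```

The point is that the constraints, not the implementation, are the durable artifact: an LLM can regenerate the function body endlessly, but it cannot "escape" the assertions.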

You can see the friction playing out in real time just by looking at the comments on Crickett’s tweet. You’ll find people desperately trying to square the circle of agentic development. One commenter tries to reframe the chaos by calling it architecture versus engineering. Another insists that managing 19 agents is actually orchestrating, not context switching. A third bluntly states that running more than five agents concurrently starts to look like vibe coding, which is merely a polite phrase for gambling with production systems. They’re all highlighting the core issue: You haven’t eliminated the work. You’ve just moved it from implementation to supervision and review.

The more you parallelize your code generation, the more “review debt” you create.

Observability to the rescue

This is where Charity Majors, the co-founder and CTO of Honeycomb, becomes frustrated. Majors has argued for years that you can’t really know if code works until you run it in production, under real load, with real users and real failure modes. When you use AI agents, the burden of development shifts entirely from writing to validating. Humans are notoriously bad at validating code simply by reading large pull requests. We validate systems by observing their behavior in the wild.

Now take that idea one step further into the agent era. For decades, one of the most common debugging techniques was entirely social. A production alert goes off. You look at the version control history, find the person who wrote the code, ask them what they were trying to accomplish, and reconstruct the architectural intent. But what happens to that workflow when nobody actually wrote the code? What happens when a human merely skimmed a 3,000-line agent-generated pull request, hit merge, and moved on to the next ticket? When an incident happens, where is the deep knowledge that used to live inside the author?

This is precisely why rich observability is not a nice-to-have feature in the agent era. It’s the only viable substitute for the missing human. In the agent era, we need instrumentation that captures intent and business outcomes, not just generic logs that say something happened. We need distributed traces and high-cardinality events rich enough that we can answer exactly what changed, what it affected, and why it failed. Otherwise, we’re trying to operate a black box built by another black box.
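A rough sketch, using only the Python standard library, of what an intent-carrying, high-cardinality event might look like compared to a generic log line. The field names (`intent`, `generated_by`, `pr_id`, and so on) are hypothetical; a real system would emit these as span attributes through a tracing library such as OpenTelemetry.

```python
import json
import time

# Hypothetical structured-event emitter: instead of "something happened",
# the event records provenance, business dimensions, and intent, so an
# on-call engineer can ask what changed and why without a human author.

def emit_event(**fields):
    event = {"timestamp": time.time(), **fields}
    return json.dumps(event)  # in production: ship to your trace pipeline

line = emit_event(
    service="checkout",
    intent="apply promotional discount at cart review",  # the why, not just the what
    generated_by="agent",     # provenance: there is no human author to ask
    pr_id="PR-1234",          # link back to the (skimmed) pull request
    user_tier="enterprise",   # high-cardinality business dimension
    duration_ms=42,
    outcome="error",
    error="negative total after discount",
)
print(line)
```

Every field here is queryable, so "which agent-generated change broke enterprise checkouts?" becomes a filter rather than an archaeology project.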

Majors also offers critical operational advice: Deploy freezes are a complete hack. The common human instinct when change feels risky is to stop deploying. But if you keep merging agent-generated code while not deploying it, you’re simply batching risk, not reducing it. When you finally execute a deploy, you’ll have absolutely no idea which specific AI hallucination just took down your payment gateway. So if you want to freeze anything, freeze merges. Better yet, make the merge and the deploy feel like one single atomic action. The faster that loop runs, the less variance you have, and the easier it is to pinpoint exactly what broke.

Golden paths are the way

The fix for this impending chaos is not to rely on heroic engineers. As Majors points out, resilient engineering requires a commitment to platform engineering and golden paths (something I’ve also argued). Such golden paths make the right behavior incredibly easy and the wrong behavior incredibly hard. The most productive teams of the next decade won’t be the ones with the most freedom to use whatever framework an agent suggests, but instead those that operate safely within the best constraints.
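One concrete shape a golden-path constraint can take is a CI-time policy check that rejects dependencies outside the platform's blessed set, so an agent (or a human) cannot quietly pull the team off the path. The approved list and function below are hypothetical examples, not a real platform's policy.

```python
# Sketch of a golden-path guardrail: reject off-path dependencies before
# they reach the main branch. APPROVED is a hypothetical platform list.

APPROVED = {"fastapi", "sqlalchemy", "pydantic", "httpx"}

def check_dependencies(requested: list[str]) -> list[str]:
    """Return the dependencies an agent tried to add that are off the path."""
    return sorted(set(requested) - APPROVED)

violations = check_dependencies(["fastapi", "leftpad-ai", "httpx"])
if violations:
    print(f"Blocked: {violations} are off the golden path")
```

The design choice matters: the check runs where merges happen, so the safe behavior is the default and the unsafe one requires deliberately fighting the platform.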

So how do you measure success in the agentic era?

The metrics that matter are still the boring ones because they measure actual business outcomes. The DORA metrics remain the best sanity check we have because they tie delivery speed directly to system stability. They measure deployment frequency, lead time for changes, change failure rate, and time to restore service. None of those metrics cares about the number of commits your agents produced today. They only care about whether your system can absorb change without breaking.
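For readers who want to see how little ceremony the four DORA metrics require, here is a minimal sketch that computes them from deploy records. The record shape (`commit_at`, `deployed_at`, `failed`, `restored_at`) and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical deploy log: each record marks when the change was committed,
# when it reached production, and whether/when a failure was restored.
deploys = [
    {"commit_at": datetime(2026, 1, 5, 9), "deployed_at": datetime(2026, 1, 5, 11),
     "failed": False, "restored_at": None},
    {"commit_at": datetime(2026, 1, 6, 9), "deployed_at": datetime(2026, 1, 6, 10),
     "failed": True, "restored_at": datetime(2026, 1, 6, 10, 30)},
    {"commit_at": datetime(2026, 1, 7, 9), "deployed_at": datetime(2026, 1, 7, 12),
     "failed": False, "restored_at": None},
]

days_observed = 3
deployment_frequency = len(deploys) / days_observed         # deploys per day
lead_time = sum((d["deployed_at"] - d["commit_at"] for d in deploys),
                timedelta()) / len(deploys)                 # commit -> production
failures = [d for d in deploys if d["failed"]]
change_failure_rate = len(failures) / len(deploys)
time_to_restore = sum((d["restored_at"] - d["deployed_at"] for d in failures),
                      timedelta()) / len(failures)

print(deployment_frequency, lead_time, change_failure_rate, time_to_restore)
```

Notice that commit count never appears: the inputs are when changes shipped, whether they broke, and how fast you recovered.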

So, yes, use coding agents. Use them aggressively! But don’t confuse code generation with productivity. Productivity is what happens after code generation, when code is constrained, validated, observed, deployed, rolled back, and understood. That’s the key to business safety and developer productivity.
