It’s been a while since I’ve written about AI. The last time I wrote, the catalyst was a divergence between my own thinking and the Hacker News zeitgeist. I think things are getting divergent again, so time to share.
Two quick “calibrating” data points so you know where I’m coming from:
- On the “Agentic Adoption” curve, I’m probably between 6 and 7 as far as my main, production-path coding goes. Almost exclusively Claude Code, but I mix it up fairly often. I’m one of the apparently dwindling number who still use Cursor autocomplete. I don’t have “swarms of AI agents” coding stuff – I review everything AI writes (to varying levels). I wouldn’t be surprised if that changed this year, but as you’ll find in this post, I have some considerable, durable concerns at this point.
- My company, Truss, is doing quite a bit with AI. We wrote our first commit two days after ChatGPT was released in November 2022. A week later ChatGPT had 1 million users. We had zero 🤣. Are we “AI native?” Well, here are two data points, and you can judge if we’re “AI native”: (1) our AI bill is about as large as our AWS bill, and (2) at this point, if there’s an AI outage, about half of our major features would significantly degrade.
(Also, a random but ironic point. I’m laboriously typing this with my thumbs on a plane with bad WiFi. No AI. No slop. — any AI cliches are mine alone — sadly 😉)
Ok, here are some of my observations about where we are with AI, in no particular order:
The best AI thing I’ve read in this space so far
I’m a big fan of Thomas Ptacek. His article on the Fly.io blog may have escaped your notice, because it doesn’t even have the word AI in it: The Design & Implementation of Sprites. Read it. If you think it doesn’t have to do with AI, please re-read. It’s a piece of great engineering, but more importantly it’s clear that Fly.io deeply understands what agentic stuff is going on, where the pressure points are, and how to provide services that anticipate those needs. If you’re one of those “SaaS is dead” people, the service Thomas is describing and its value proposition are what a strong counterpoint looks like.
The worst thing I’ve read so far
Well, actually you should read it. Steve Yegge’s “Welcome to Gas Town.” It’s highly entertaining, so worth the read from that perspective alone. But it’s also the worst thing I’ve read, because it’s the highest-quality example I know of in the genre of delusional AI-agent-hype fever-dream posts that I see echoed on LinkedIn all the time.
Claude Opus 4.6 is smarter than me
I hate admitting this, but I’m surprised I haven’t seen more people acknowledging it, because it has some important consequences (see Kernighan’s Law below).
AI is officially smarter than me:
1. It can solve problems that I can’t.
2. It debugs faster than me, and can find more subtle bugs (this in particular stings, I consider myself to be a good debugger).
3. And crucially, it writes fewer bugs – particularly basic logic errors (if-else logic missing an elsif case, corner cases, writing regex). Its debugging ability is different from mine, though – it’s strongest on bugs where it benefits from being able to spread its attention evenly over larger spans of code, whereas my attention is much narrower but perhaps a bit deeper at its best. I realize this seems contradictory to other parts of this post, but that’s part of why I’m writing this. I need to figure this out.
Vibe coding has two deadly weaknesses: Dunning-Kruger and Kernighan’s Law
Let me start off by saying that I really, really love it when Claude elegantly solves a gnarly problem I’ve been wrestling with, sometimes for months. In these moments, I too think anything is possible. This is what drives the AI hype on Hacker News, and I get it, and yes it is exciting.
But I cannot ignore the other reality too: I regularly – like at least 50% of the time – have to throw out Claude’s code or do a major redirect.
Basically, Claude Code sometimes lacks taste. And to paraphrase Steve Jobs: “When I say Claude lacks taste, I don’t mean that in a small way, I mean it in a big way.”
Let me be specific about the failure cases to see if they resonate or if maybe it’s evidence that my AGENTS.md is garbage:
- Writing too much code. At least twice in the past few weeks, I’ve looked at Claude’s “done, verified” code and redirected it to a solution that was, in one case, 40 lines down to 2 lines, and in another, several hundred lines down to 50. In both cases the original code worked, at least superficially. I can’t begin to describe how unnerving this is. Claude has absolutely no incentive to write maintainable code, from what I can tell.
- Not finding an existing pattern/design that is directly applicable. I actually subconsciously save myself from this even more often because I include a reference of where to find the pattern in my prompt. Now to Claude’s credit, part of its magic is how often it DOES find the right pattern, unprompted – that feels great. But when it doesn’t find it, it feels like a big liability.
- Claude does not optimize for human understanding. I regularly have to eat humble pie and tell Claude “look, I’m sure what you wrote works and everything, but I have no idea how this works…wouldn’t it be simpler if you just looped over the array twice instead of doing this complicated regex cache thing that works in one pass?” To which Claude replies. “Absolutely, that’s a good call. This was premature optimization, let’s rewrite it.” Maybe I could put “don’t optimize prematurely” in the prompt, but I have a sneaking feeling that won’t help.
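A contrived miniature of that last failure mode (my own toy example, not actual Claude output): a “clever” one-pass extractor built around a lazily-populated regex cache, next to the boring two-pass version I’d rather maintain. Both work; only one is obvious at review time.

```python
import re

LINES = ["user=alice level=3", "user=bob", "user=carol level=7"]

# The "clever" one-pass version: a lazily-built regex cache keyed by field name.
# It works, but a reviewer has to reverse-engineer the caching logic first.
_pattern_cache: dict[str, re.Pattern] = {}

def extract_clever(lines, field):
    if field not in _pattern_cache:
        _pattern_cache[field] = re.compile(rf"{field}=(\S+)")
    pat = _pattern_cache[field]
    return [m.group(1) for line in lines if (m := pat.search(line))]

# The "boring" version: two obvious passes, no shared state, nothing to explain.
def extract_boring(lines, field):
    relevant = [line for line in lines if f"{field}=" in line]            # pass 1: filter
    return [line.split(f"{field}=")[1].split()[0] for line in relevant]   # pass 2: extract

print(extract_clever(LINES, "level"))  # → ['3', '7']
print(extract_boring(LINES, "level"))  # → ['3', '7']
```

At this scale the cache buys nothing, and on real inputs it would take a profiler run to justify. That’s the “premature optimization” conversation I keep having to initiate.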
Personal observation: the frequency with which I make one of the redirects above is WAY higher in code I know well. This really bothers me, because the unavoidable conclusion feels like a modified version of Dunning-Kruger: I seem to be overestimating Claude’s ability in areas I know poorly. This is the main reason I haven’t moved to hands-off orchestration yet – if I don’t understand what’s being written, I’m really not confident, at least with the current state of the art, that what’s being written is any good. The only thing I’ll know is whether it works, and unfortunately, even that is true only in a very narrow sense.
And the second concern with this is Kernighan’s Law: debugging is twice as hard as writing the code in the first place. So if you write maximally clever code, you are, by definition, not smart enough to debug it.
Isn’t this exactly what we’re doing with AI written PRs that we LGTM and merge in? Is this going to be a cycle where we continuously NEED better frontier models, just to debug the crap the previous frontier model wrote?
Simon Willison wrote an excellent article yesterday where one point was “code is cheap.” That’s true. And yet, while cheap code probably does change the cost of building, hasn’t maintaining code always been the larger cost, long term – the one driving engineering decisions? And from what I’m seeing, if we aren’t heavily factoring that in during planning, aren’t we going to end up skipping over the hard but necessary design questions?
It feels like the discourse around the changes that “code is cheap” will bring is all positively framed. But what about:
- The fact that code is cheap now leads to almost unbelievable pressure to “just Claude” whatever random feature pops into an executive or PM’s (or engineer’s!) head. This is at least not a wholly good thing, even from the business perspective.
- It used to be that up-front development costs were a flawed but somewhat proportional proxy for future maintainability costs. How does this look-ahead happen, now that up front code costs are 2-10x lower?
- If only AI is smart enough to maintain the code it’s written, and Kernighan’s Law is true, how dependent are we making ourselves on AI capability continuing to go up?
I don’t need smarter frontier models – I need a model that’s better at saying “no”
For my use cases, AI doesn’t need to be smarter or score 110% on AIME or SWE-bench or whatever – it needs to learn how to say no. The recent advances mostly serve narrow STEM and true research (which is exciting).
I need a model that can say “no” at the right times.
You’ve probably worked at places where engineering never said no. It does not end well for the business.
And now with things like OpenClaw, everyone has their own yes-man engineer.
Why is it suddenly acceptable that we’re letting AI get away with never saying no?
I need to get to the point where AI can say “look, I could do this, but we’ve tried 3-4 solutions now, and the solution sucks – I don’t think the business is going to get more valuable by implementing this exact thing, we need to go back to the drawing board and come back to this later.” My engineering team routinely does this, and it drives my CEO and product crazy, but it’s probably one of the most valuable things we do.
There’s a corollary to this: the only way to avoid the Lethal Trifecta, which is the main blocker in my opinion to agentic assistants doing any meaningful actions that require even an ounce of responsibility, is if an Agent can reliably say no. To you. To the prompt injection. Currently we are far, far away from this. Maybe we need an always-on “discernment-toolcall” that allows AI to phone a friend. This just isn’t happening though. Then again, no one has owned this guy yet, from what I can see!
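What might that “discernment tool call” even look like? Here’s a purely hypothetical sketch – every name in it is made up for illustration, and a real version would route the check to a second model or a human rather than a keyword list. The point is the shape: consequential actions pass through a gate whose default answer is “no.”

```python
# Hypothetical "discernment" gate: before any consequential action, the agent
# must route the proposed action through a check that can veto it.
# All names here are illustrative, not a real API.
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    reason: str

# Crude stand-in for real risk assessment (a second model, policy engine, or human)
RISKY_MARKERS = ("delete", "transfer", "send email", "rm -rf")

def discernment_check(proposed_action: str, came_from_untrusted_input: bool) -> Verdict:
    # Lethal-trifecta heuristic: irreversible action + untrusted provenance → refuse
    if came_from_untrusted_input and any(m in proposed_action.lower() for m in RISKY_MARKERS):
        return Verdict(False, "irreversible action originating in untrusted content")
    return Verdict(True, "low-risk action")

print(discernment_check("send email with API keys", came_from_untrusted_input=True))
```

A keyword list is obviously not the answer – the hard part is making the “no” reliable against adversarial prompts, which is exactly what we don’t have yet.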
To wrap it up: AI tech you probably shouldn’t use in 2026
- Don’t build MCP into your product. What?! Yes. There are better ways to do this now that will generalize and scale as the complexity grows. If you did MCP already it’s fine – look, we’re all experimenting here.
- Don’t use OpenClaw. At least not in production. It is exciting and hilarious to watch it honeybadger a problem, and the future is bright. But for now, the security concerns are terrifying, and given all the ways Claude is deficient (see above), the idea of an agent self-modifying its own code seems like it’s not going to have a happy ending. We have some work to do.
- Don’t use vector search. Unless you’re building Bing. Tool calling with regex has completely won. This is sad, because vector search is a really cool concept (cosine similarity over these byte-array text representations maps onto conceptual closeness – that’s craaazy!)
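For anyone who hasn’t played with the concept, here’s the whole idea in a few lines – with tiny made-up 3-d vectors standing in for real embeddings, which have hundreds of dimensions and come from a model:

```python
import math

def cosine(a, b):
    # cos(theta) = (a · b) / (|a| |b|): 1.0 means "pointing the same way"
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 3-d "embeddings" (illustrative numbers, not real model output)
docs = {
    "cat sat on the mat": [0.9, 0.1, 0.0],
    "kitten on a rug":    [0.8, 0.2, 0.1],
    "quarterly earnings": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "cat on a mat"

best = max(docs, key=lambda d: cosine(query, docs[d]))
print(best)  # → cat sat on the mat
```

No string in the query needs to appear in the document – geometric closeness stands in for meaning. That’s the part that’s craaazy, even if regex-backed tool calling is what actually ships.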
- Don’t fine-tune. This is the Bitter Lesson applied at a small scale. You spend money fine-tuning to save money, then OpenAI drops the price on mini or Gemini drops the price on Flash. The rate of cost/model change is too great for now. Of course there are counterexamples, but they are very niche.
- Don’t use agentic frameworks. The truth is, no one has the architecture figured out, even a little. For now, roll it yourself.
Let me know your thoughts!

