Ken Kantzer's Blog

logging my thoughts on technology, security & management

Let’s be honest about AI

It’s been a while since I’ve written about AI. The last time I wrote, the catalyst was a divergence between my own thinking and the Hacker News zeitgeist. I think things are getting divergent again, so time to share.

Two quick “calibrating” data points so you know where I’m coming from:

  1. On the “Agentic Adoption” curve, I’m probably between 6 and 7 as far as my main, production-path coding goes. Almost exclusively Claude Code, but I mix it up fairly often. I’m one of the apparently dwindling number who still use Cursor autocomplete. I don’t have “swarms of AI agents” coding stuff – I review everything AI writes (to varying levels). I wouldn’t be surprised if that changed this year, but as you’ll find in this post, I still have some considerable, durable concerns.
  2. My company, Truss, is doing quite a bit with AI. We wrote our first commit two days after ChatGPT was released in November 2022. A week later ChatGPT had 1 million users. We had zero 🤣. Are we “AI native?” Well, here are two data points, and you can judge if we’re “AI native”: (1) our AI bill is about as large as our AWS bill, and (2) at this point, if there’s an AI outage, about half of our major features would significantly degrade.

(Also, a random but ironic point. I’m laboriously typing this with my thumbs on a plane with bad WiFi. No AI. No slop. — any AI cliches are mine alone — sadly 😉)

Ok, here are some of my observations about where we are with AI, in no particular order:

The best AI thing I’ve read in this space so far

I’m a big fan of Thomas Ptacek. His article on the Fly.io blog may have escaped your notice, because it doesn’t even have the word AI in it: The Design & Implementation of Sprites. Read it. If you think it has nothing to do with AI, please re-read. It’s a piece of great engineering, but more importantly it makes clear that Fly.io deeply understands what agentic workloads are actually doing, where the pressure points are, and how to provide services that anticipate those needs. If you’re one of those “SaaS is dead” people, the service Thomas describes and its value proposition are what a strong counterpoint looks like.

The worst thing I’ve read so far

Well, actually you should read it: Steve Yegge’s “Welcome to Gas Town.” It’s highly entertaining, so it’s worth the read from that perspective alone. But it’s also the worst thing I’ve read, because it’s the highest-quality example I know of the genre of delusional, fever-dream AI-agent-hype posts that I continuously see echoed on LinkedIn.

Claude Opus 4.6 is smarter than me

I hate admitting this, and I’m surprised I haven’t seen more people acknowledging it, because it has some important consequences (see Kernighan’s Law below).

AI is officially smarter than me:

1. It can solve problems that I can’t.

2. It debugs faster than me, and can find more subtle bugs (this one in particular stings; I consider myself a good debugger).

3. And crucially, it writes fewer bugs – particularly basic logic errors (if-else logic missing an elsif case, corner cases, regex). Its debugging ability is different from mine, though – it’s strongest on bugs where it benefits from being able to evenly apply its attention over large spans of code, whereas my attention is much narrower but perhaps a bit deeper at its best. I realize this seems contradictory to other parts of this post, but that’s part of why I’m writing it. I need to figure this out.

Vibe coding has two deadly weaknesses: Dunning-Kruger and Kernighan’s Law

Let me start off by saying that I really, really love it when Claude elegantly solves a gnarly problem I’ve been wrestling with, sometimes for months. In these moments, I too think anything is possible. This is what drives the AI hype on Hacker News, and I get it, and yes it is exciting.

But I cannot ignore the other reality too: I regularly – like at least 50% of the time – have to throw out Claude’s code or do a major redirect.

Basically, Claude Code sometimes lacks taste. And to paraphrase Steve Jobs: “And when I say Claude lacks taste, I don’t mean that in a small way, I mean it in a big way.”

Let me be specific about the failure cases to see if they resonate or if maybe it’s evidence that my AGENTS.md is garbage:

  1. Writing too much code. At least twice in the past few weeks, I’ve looked at Claude’s “done, verified” code and redirected it to a solution that in one case went from 40 lines down to 2, and in another from several hundred lines down to 50. In both cases the original code worked, at least superficially. I can’t begin to describe how unnerving this is. From what I can tell, Claude has absolutely no incentive to write maintainable code.
  2. Not finding an existing pattern/design that is directly applicable. I actually subconsciously save myself from this even more often because I include a reference of where to find the pattern in my prompt. Now to Claude’s credit, part of its magic is how often it DOES find the right pattern, unprompted – that feels great. But when it doesn’t find it, it feels like a big liability.
  3. Claude does not optimize for human understanding. I regularly have to eat humble pie and tell Claude “look, I’m sure what you wrote works and everything, but I have no idea how this works…wouldn’t it be simpler if you just looped over the array twice instead of doing this complicated regex cache thing that works in one pass?” To which Claude replies: “Absolutely, that’s a good call. This was premature optimization; let’s rewrite it.” Maybe I could put “don’t optimize prematurely” in the prompt, but I have a sneaking feeling that won’t help.
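That third failure mode is easiest to show with a toy example (everything here is invented for illustration, not taken from real Claude output): the same task solved with a “clever” one-pass, regex-cache approach versus two obvious passes.

```python
import re

LOG = ["INFO start", "WARN disk", "INFO stop", "ERROR io"]

# "Clever": one pass with a regex cache. It works, but a reviewer has to
# hold the cache, the seen-set, and the dupes-set in their head at once.
def repeated_levels_onepass(lines):
    cache, seen, dupes = {}, set(), set()
    for line in lines:
        pat = cache.setdefault("level", re.compile(r"^\w+"))
        level = pat.match(line).group()
        if level in seen:
            dupes.add(level)
        seen.add(level)
    return dupes

# Clear: two passes over the list. Slightly more work at runtime,
# but obvious at a glance.
def repeated_levels_twopass(lines):
    levels = [line.split()[0] for line in lines]
    return {lvl for lvl in levels if levels.count(lvl) > 1}
```

Both return the same answer; only the second one is something a tired reviewer can verify in ten seconds.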

Personal observation: the frequency that I do one of the above redirects is WAY higher in code I know well. This really bothers me, because the unavoidable conclusion is that it feels like a modified version of Dunning-Kruger: I seem to be overestimating Claude’s ability in areas I know poorly. This is the main reason I haven’t moved to hands-off orchestration yet – if I don’t understand what’s being written, I am really not confident, at least with the current state of the art, that what’s being written is any good. The only thing I’ll know is whether it works, and unfortunately, even that is true in a very narrow sense.

And the second concern is Kernighan’s Law: debugging code requires more intelligence than writing that code. So if you write maximally clever code, you will by definition not be able to debug it.

Isn’t this exactly what we’re doing with AI written PRs that we LGTM and merge in? Is this going to be a cycle where we continuously NEED better frontier models, just to debug the crap the previous frontier model wrote?

Simon Willison wrote an excellent article yesterday, one point of which was “code is cheap.” That’s true. And yet, while cheap code probably does change the cost of building, hasn’t maintenance always been the larger long-term cost, the one that drives engineering decisions? And from what I’m seeing, if we aren’t heavily factoring that in during planning, aren’t we going to end up skipping over the hard but necessary design questions?

It feels like the discourse around the changes that will happen now that “code is cheap” is all positively framed. But what about:

  1. The fact that code is cheap now leads to almost unbelievable pressure to “just Claude” whatever random feature pops into an executive or PM’s (or engineer’s!) head. This is at least not a wholly good thing, even from the business perspective.
  2. It used to be that up-front development costs were a flawed but somewhat proportional proxy for future maintainability costs. How does this look-ahead happen, now that up front code costs are 2-10x lower?
  3. If only AI is smart enough to maintain the code it’s written, and Kernighan’s Law is true, how dependent are we making ourselves on AI capability continuing to go up?

I don’t need smarter frontier models – I need a model that’s better at saying “no”

For my use cases, AI doesn’t need to be smarter or get 110% on the AIME or SWE Bench or whatever, it needs to learn how to say no. The recent advances are mostly for narrow STEM and true research (which is exciting).

I need a model that can say “no” at the right times.

You’ve probably worked at places where engineering never said no. It does not end well for the business.

And now with things like OpenClaw, everyone has their own yes-man engineer.

Why is it suddenly acceptable that we’re letting AI get away with never saying no?

I need to get to the point where AI can say “look, I could do this, but we’ve tried 3-4 solutions now, and the solution sucks – I don’t think the business is going to get more valuable by implementing this exact thing, we need to go back to the drawing board and come back to this later.” My engineering team routinely does this, and it drives my CEO and product crazy, but it’s probably one of the most valuable things we do.

There’s a corollary to this: the only way to avoid the Lethal Trifecta, which is the main blocker in my opinion to agentic assistants doing any meaningful actions that require even an ounce of responsibility, is if an Agent can reliably say no. To you. To the prompt injection. Currently we are far, far away from this. Maybe we need an always-on “discernment-toolcall” that allows AI to phone a friend. This just isn’t happening though. Then again, no one has owned this guy yet, from what I can see!
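For concreteness, here is one hypothetical shape such a “discernment” gate could take. Everything below is invented for illustration (the risk list, the attempt threshold, the return shape); it is a sketch of the idea of an agent that can say no, not anyone’s actual product.

```python
# Hypothetical "discernment" gate: risky or repeatedly-retried actions
# get a "no" instead of another eager attempt.
RISKY_VERBS = {"delete", "deploy", "transfer"}

def discernment_gate(action: dict, attempts_so_far: int) -> dict:
    """Return whether the agent should proceed, with a reason."""
    if action["verb"] in RISKY_VERBS:
        return {"allowed": False, "reason": "risky action: needs human sign-off"}
    if attempts_so_far >= 4:
        return {"allowed": False, "reason": "3-4 solutions tried: back to the drawing board"}
    return {"allowed": True, "reason": "low risk"}
```

The hard part, of course, is not this scaffolding but getting a model to call it honestly rather than rationalizing its way past it.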

To wrap it up: AI tech you probably shouldn’t use in 2026

  • Don’t build MCP into your product. What?! Yes. There are better ways to do this now that will generalize and scale as the complexity grows. If you did MCP already it’s fine – look, we’re all experimenting here.
  • Don’t use OpenClaw. At least not in production. It is exciting and hilarious to watch it honeybadger a problem, and the future is bright. But for now the security concerns are terrifying, and for all the reasons above about where Claude falls short, the idea of an agent self-modifying its own code doesn’t seem likely to have a happy ending. We have some work to do.
  • Don’t use Vector search. Unless you’re building Bing. Tool calling with regex has completely won. This is sad, because vector search is a really cool concept (cosine similarity applied to these byte-array text representations maps to conceptual closeness, that’s craaazy!)
  • Don’t fine-tune. This is the Bitter Lesson applied on a small scale. You spend money fine-tuning to save on inference, and then OpenAI drops the price on mini or Gemini drops the price on Flash. The rate of cost and model change is too great for now. Of course there are counterexamples, but they are very niche.
  • Don’t use agentic frameworks. The truth is, no one has the architecture figured out, even a little. For now, roll it yourself.
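On the vector-search point above: the underlying idea really is just cosine similarity between embedding vectors. A toy sketch, with made-up 3-dimensional “embeddings” (real embeddings have hundreds or thousands of dimensions, produced by a model):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors: conceptually-close texts get closer vectors.
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
banana = [0.1, 0.05, 0.95]

similar = cosine(king, queen)      # near 1: close in the toy space
dissimilar = cosine(king, banana)  # much lower
```

That the geometry of these vectors tracks meaning at all is the “craaazy” part; the math itself is three lines.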

Let me know your thoughts!

GPT is the Heroku of AI


I read a comment on HN that sparked this article: GPT is kind of like DevOps from the early 2000s.

Here’s the hot take: I don’t see the primary value of GPT being in its ability to help me develop novel use cases or features – at least not right now.

The primary value is that it MASSIVELY lowers the barrier of entry to machine learning features for startups.

What’s my line of reasoning? Well, here are some surprising things about how we use it:

Continue reading

Lessons after a half-billion GPT tokens


My startup Truss (gettruss.io) released a few LLM-heavy features in the last six months, and the narrative around LLMs that I read on Hacker News is now starting to diverge from my reality, so I thought I’d share some of the more “surprising” lessons after churning through just north of 500 million tokens, by my estimate.

Some details first:

– we’re using the OpenAI models, see the Q&A at the bottom if you want my opinion of the others

– our usage is 85% GPT-4, and 15% GPT-3.5

Continue reading

The Parable of the Wise Hiring Manager


One day, while The Manager was walking back from a morning coffee run, a group of frazzled engineers came near and spake unto him, saying: “Most Esteemed Boss, we are unable to hire Talent – and many of our candidates refuse to take our coding challenges! The labor market is tight and our staff are burning out, should we not ease our process and forgo coding challenges, so that we might secure butts-in-seats more quickly?”

The Manager turned to them, and upon seeing that they were truly desperate, sought to teach them using this parable:

Continue reading

Learnings from 5 years of tech startup code audits


While I was at PKC, our team did upwards of twenty code audits, many of them for startups that were just around their Series A or B (that was usually when they had cash and realized that it’d be good to take a deeper look at their security, after the do-or-die focus on product market fit).

It was fascinating work – we dove deep on a great cross-section of stacks and architectures, across a wide variety of domains. We found all sorts of security issues, ranging from catastrophic to just plain interesting. And we also had a chance to chat with senior engineering leadership and CTOs more generally about the engineering and product challenges they were facing as they were just starting to scale.

It’s also been fascinating to see which of those startups have done well and which have faded, now that some of those audits are 7-8 years behind us.

Continue reading

The Unreasonable Effectiveness of Secure-by-default

This is one in a series of deeper-dives into various Learnings from 5 years of tech startup code audits. In that article, I list several observations I had during the course of doing code audits for 20-30 tech startups at or around the Series A / B mark.

Security seems to be on the up-and-up, despite all the bad news you hear in the media. What’s driven this improvement? Frameworks, cloud infrastructure, and the big cloud platforms have been hard at work creating the “pit of success” — places where users simply fall into being secure rather than having to fight to be secure — and as a way to discourage the most severe security mistakes, this has by and large been an enormous success. This article is about our on-the-ground observations of these success stories.

We started doing code audits in 2014.

Continue reading

You Don’t Need Hundreds of Engineers to Build a Great Product

This is one in a series of deeper-dives into various Learnings from 5 years of tech startup code audits. In that article, I list several observations I had during the course of doing code audits for 20-30 tech startups at or around the Series A / B mark.

We did several code audits for companies that rapidly scaled their engineering orgs relatively early on (we’re talking 50-100 engineers, maybe 10-35M annual revenue, series A/B). None of them are doing well right now, and some are out of business.

What made this observation interesting is how different it is from stories of the well-known counterexamples (Uber, FB, etc, etc, etc), where it seems like the ability to double headcount every 6 months is assumed to be a key component of their ability to scale rapidly with hypergrowth.

But I think we are taking the wrong lesson from these success stories. Here’s the real lesson:

You don’t grow engineering headcount like crazy in order to achieve hypergrowth: these companies underwent hypergrowth first, and were then forced to grow headcount rapidly.

Typically the architectures we saw from these large engineering groups were not bad: they just were not really necessary. We often looked at the code and the infrastructure and kind of scratched our heads: why were they doing all this? It just didn’t seem to make sense. But then we talked to the very smart engineers they had, and it seemed like there were good explanations for most things. Looking back, I think a lot of the complexity was frankly busy work: you bring in a lot of engineers without enough truly business-essential work to do, and they will come up with things to do. This is not a crack on them, it’s just human nature.

Technology ROI Discussions are Broken

A note before we begin: I’m arguing that technology ROI discussions are broken, not that ROI as a decision-making tool is broken. A solid understanding of how to calculate and use ROI is an essential skill for any tech executive, and when done right, it’s a powerful decision-making tool. This post is about how technology discussions that exclusively look at ROI often result in a one-eyed analysis that lacks depth.

Technical leaders need a wider range of tools for communicating the value of technology, and especially technology innovation. Communicating the value of technology is not a trivial task—and the point of this post is that exclusive reliance on the most commonly used tool for communicating value—Return on Investment (ROI)—will lead to broken discussions.

Continue reading

5 Software Engineering Foot-guns


“In the brain of all brilliant minds resides in the corner a fool.”

— Aristotle

Writing about “Best Practices” can get boring, so I thought I’d take a break this week, and write about some bad engineering practices that I’ve found the absolute hardest to undo once done. Real foot-guns, you could say.

Each of these is a bit controversial in its own way, and that’s how it should be—I’d welcome any counter-views. The prose in this post is a bit more irreverent than normal—in most cases, I’m poking fun at myself (both past and present!), as I’ve been guilty of each of these foot-guns—and a lot of them, frankly I still struggle with. Hopefully this post will generate some “motivation through transparency” 🙂

Engineering Foot-gun #1—Writing clever code instead of clear code

It’s because optimizing is fun. https://xkcd.com/1691
Continue reading

The Backlog Peter Principle

A few years ago, I was in one of my ruts. Everything I was working on seemed to be bogged down or low-leverage. What was so frustrating was that this had come on the heels of a few amazingly productive months, where I had gotten a lot done. Worse yet, this seemed to happen cyclically: periods of productivity and a sense of accomplishment were followed by periods of delays and a sense of frustration.

Coincidentally, around the same time, I had just heard of the Peter Principle, which goes something like this:

“Every employee tends to rise to their level of incompetence”

— Laurence J. Peter, The Peter Principle (1969)
Continue reading

How to find great senior engineers


Hiring experienced engineers is one of the most difficult and important things that engineering leaders have to pull off. But it’s hard to gauge experience in a series of short interviews. I’ve definitely worked with some amazing engineers who probably wouldn’t have been hired in some of my previous hiring pipelines.

Here are some tips on things I’ve found that work and don’t work.

Continue reading

The Googler’s Dilemma: Why Experience Will Always Have a Premium

I’ve been thinking recently about how to discover and hire great engineers in the hottest job market in decades. One of the biggest hurdles to hiring good engineers, and especially experienced engineers, is that they’re so. unbelievably. expensive.

Just take a look at some of the total compensation packages on levels.fyi:

Data is for GOOG (other companies are similar). Courtesy of the data at Levels.fyi (https://www.levels.fyi/charts.html). This data was eyeballed quickly into a spreadsheet; check the source for actuals.
Continue reading

5 Red Flags Signaling Your Rebuild Will Fail


There’s always a reason to rebuild your app. Always. But once you’ve been through a few rebuilds, you realize that talk of rebuilds, like talk of tax reform or anarchy, is just a tad bit dangerous—you never know what kind of danger you’ll end up in if you actually convince yourself to go through with it.

Continue reading

Core Control #6: Log Everything

The core principle is this: fish nets over fishing lines. In the case of security monitoring, fish nets are alerts on anomalies, where anomalies are defined as universal constants that have been broken. Fishing lines are manual search procedures. Phrasing the principle this way addresses the two seemingly intractable problems with security monitoring:
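The fish-net idea can be sketched in a few lines. The event shape and the invariants below are invented for illustration; the point is only the pattern of encoding “universal constants” as predicates and alerting when one breaks.

```python
# Fish nets, sketched: encode "universal constants" as predicates over
# events, and alert only when one is broken.
INVARIANTS = {
    "no_root_login": lambda e: not (e.get("user") == "root" and e.get("action") == "login"),
    "internal_source_only": lambda e: e.get("src_ip", "").startswith("10."),
}

def broken_invariants(event: dict) -> list:
    """Return the names of every invariant this event violates."""
    return [name for name, holds in INVARIANTS.items() if not holds(event)]
```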

Continue reading

Core Principle #2: Know Your Software

The same Golden Rule that applies to hardware applies to software: know what you have. No user on your systems should be able to install an executable onto a company device without the approval of security. This may seem like a draconian policy (and a short-circuit process does have to be in place for certain technology-heavy teams like R&D or the dev team), but it’s necessary.

Continue reading

Core Principle #1: Know Your Hardware

There are only six controls in the Top 20 list that are designated “Basic,” and an inventory of your hardware is number one. I actually would like to rephrase this control slightly, so it better fits the core principle I wanted to highlight: if there was ever a Golden Rule in enterprise security, it’s this: know what you have.

Continue reading

© 2026 Ken Kantzer's Blog
