Same here. I have now deleted 43k lines (and counting) from my codebase. There is no point in putting AI code into production anymore, as it almost always uses no abstractions or the wrong ones.
When you try to throw more agents at the problem, or even more verification layers, you just kill your agility, even if they could still get the work done.
I thought it was a good article, till I saw the Slack example.
The copy doesn’t even remotely grasp the scale of what the actual Slack software does in terms of scale, reliability, observability, monitorability, maintainability, and quite likely functionality as well.
The author only mentions the non-dev work as the difference, which suggests he doesn’t know what he’s talking about at all, or what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actual Slack copy than a blank sheet of paper does.
I had the same experience (though I agree with other comments that the numbers are a little optimistic in terms of variance; I think there's a huge amount of variance in product work, you can't know what's a good investment until it's too late, many companies fail because of this, and there's huge survivorship bias in the ones that get lucky and don't initially fail). Slack spent tons of money in terms of product and engineering hours finding out what works and what doesn't. It's easy to copy/paste the thing after all that effort. Copy/paste doesn't get you to the next Slack though--it can get you to Microsoft's Slack-killing Teams strategy, but we obviously don't want more of that. And, obviously I agree with you about all the infra/maintenance costs, costs in stewarding API usage and extensions, etc. LLMs won't do any of that for you.
Yeah, I can build a Slack "clone" in a couple of weeks with my own two hands, no AI required. But it's not going to actually be competitive with Slack.
Just to pick an incredibly, unbelievably basic enterprise feature, my two-week Slack clone is not going to properly support legal holds. This requires having a hard override for all deletion and expiration options anywhere in the product, that must work reliably, in order to avoid accidental destruction of evidence during litigation, which comes with potentially catastrophic penalties. If you don't get this right, you don't sell to large corporations.
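A legal hold is essentially a hard gate sitting in front of every deletion and expiration path in the product. A minimal sketch of the idea, with entirely hypothetical names (`LegalHoldError`, the in-memory `messages` store, `held_messages`) that bear no relation to Slack's actual design:

```python
class LegalHoldError(Exception):
    """Raised when a delete would destroy data under legal hold."""

messages = {"m1": "hello", "m2": "quarterly numbers"}
held_messages = {"m2"}  # set by compliance admins during litigation

def delete_message(msg_id: str) -> None:
    # The hold check must sit in front of EVERY deletion path:
    # user deletes, retention expiry, workspace purges, admin tools.
    # A single path that skips it is accidental destruction of evidence.
    if msg_id in held_messages:
        raise LegalHoldError(f"{msg_id} is under legal hold; refusing to delete")
    messages.pop(msg_id, None)
```

The hard part isn't this function; it's proving that every deletion and expiration code path in a large product actually goes through it.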
And there are a hundred other features like this. Engineering wants an easy-to-use API for Slack bots. Users want reaction GIFs. You need mobile apps. You need single sign-on. And so on. These are all table stakes.
It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
Believe me, I wish that "simple, clean" reimplementations were actually directly competitive with major products. That version of our industry would be more fun. But anyone who thinks that an LLM can quickly reimplement Slack is an utter fool who has never seriously tried to sell software to actual customers.
> It was a cliche for many years that Microsoft Word had "too many features." So people would start companies to sell "lightweight word processors" that only implemented "the most used 20% of features." And most of these companies sank without a trace (with a couple of admirable exceptions that hyperfocused on specific niches). Google finally made progress against the monopoly, but to do it, they actually invested in a huge number of features.
The other issue is that yes, perhaps most users only use 20% of the features, but each user uses a different 20% of the features in products like Word. Trust me, it's super hard to get it right even at the end-user level, let alone the enterprise level like you say.
At most 5% of Word's features are common to everyone; things like spell check, everyone uses. Actually, I suspect it's more like 0.1% of the features that are common to all, most people use about 0.3%, and power users get up to 5%. But I don't have data, just a guess.
Yeah, but 98% of Word's features were already buried by like 2004. They were added when it was a selling point to use unicorn and gnome icons as your table border in under 100 MB of RAM. So we’re talking about 20% of the limited set of features that remain for reasons beyond backwards compatibility.
When I look at the big non-tech companies that have a chill life and print money, it’s usually the ones that are simply the very best at what they do and have a quasi-monopoly, or so much competitive advantage that everybody just uses them.
I‘m so happy about this article. Over the last couple of days I had been forming a thought in my head: how to describe what it is that makes AI code practically unusable in good systems.
One of the reasons is the one described in this article; the other is that you skip training your mental model when you don’t grind through these laziness patterns. If you are not in the code, grinding on your codebase, you don’t see the fundamental issues that block the next level, nor do you get the itch to name and abstract them properly so you won’t have to worry about them in the future, when somebody (or you) has to extend it.
Knowing your shit is so powerful.
I believe now that my competitive advantage is grinding code, whilst others are accumulating slop.
> Then, unprompted, Altman offers up a kind of shocking timeline for the groundbreaking feature of counting: “Maybe another year before something like that works well.” Per Altman, ChatGPT’s voice model doesn’t have the capability of starting a timer or keeping track of time. “But we will add the intelligence into the voice models,” he said.
From my perspective, there are some people who have never built real processes in their life and now enjoy having some processes. But agent processes are less reliable, slower, and less maintainable than a process that is well defined and well architected, and that uses LLMs only where no other solution is sufficient: classification, drafting, summarizing.
I’ve had a WhatsApp assistant since 2023, jailbroken into an easy assistant. The only thing I kept using is transcription.
https://github.com/askrella/whatsapp-chatgpt was released 3 years ago, and many have extended it with more capabilities; arguably it's more performant than Openclaw, as it can run in all your chat windows. But there’s still no use case.
I like to experiment with AI flows to make iteration quicker; then, once something worth investing in is found, back up and build something that's actually repeatable.
The same thing could be said about SKILL.md files, yet they are highly useful...
Yes, you can automate via scripting, but interacting with a process in natural language, because every instance can be different and isn't solid enough to write a spec for, is really handy.
tl;dr: there's a place for "be liberal in what you receive and conservative in what you send", but only now have LLMs provided us with a viable way to make room for "be loosey goosey with your transput"
I understand, but usually 80-95% of the skill flow is repeated and can be scripted out. Script it out: simplify your skill, make it more stable, and create more opportunity to scale it up or down, i.e., use stronger or weaker models if need be. We should script out and form the process first, then see where we can put AI after that.
The "AI for everything" mindset is really easy to let infect you. I was trying to figure out how to make some SQL alerting easier to understand quickly. The first thing my brain went to was "oh, just shove it into an LLM to pull out what the query is doing." Unfortunately, it wasn't until after I said that out loud that I realized it was a stupid idea, when you could just run a SQL parser over the query and pull the table names out that way. Far faster, more cost-effective, and more reliable than asking an LLM to do it.
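Even a toy deterministic extractor illustrates the point. This is a hedged sketch using a regex rather than a real parser; production code should use a proper SQL parsing library (e.g. sqlparse or sqlglot), since a regex misses comma-separated FROM lists, subqueries, CTEs, and quoted identifiers:

```python
import re

def table_names(sql: str) -> list[str]:
    """Toy extraction: identifiers directly following FROM or JOIN.
    Deterministic, instant, and free, unlike an LLM call."""
    found = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE)
    seen, out = set(), []
    for name in found:  # de-duplicate while preserving order
        if name.lower() not in seen:
            seen.add(name.lower())
            out.append(name)
    return out
```

For the alerting use case, the output is also stable across runs, which an LLM does not guarantee.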
That’s actually an awesome idea, and it totally helps reduce wasted context: move repeatable instructions into a SKILL.md, and once they’re repeatable and no longer have input variability, turn it into a tool! Rinse, repeat.
Oh nice, you could even eventually turn the whole process, inference included, into an app, so that you’ve cut the LLM out of the whole process, saving you execution time.
I find that it's usually management that asks for such things "because AI".
I mean, using AI is a great way to interpret a query, determine whether a helper script already exists to satisfy it, and, if not, invoke a subagent to write a new one.
The problem with your "script" approach is: how does it satisfy unknown or general queries? What if, for one run, you want to modify the script's behavior?
Exploring with AI doesn’t lead to the same level of learning. They are doing the equivalent of paying to skip leveling up their character, then going to the final boss with level-1 armor.
I look at it more like speedrunning a level. You're skipping the parts of the level that take the most time, sometimes using hacks. Is it universally as much fun as playing the game? No, just as using AI to prototype might get you to the same place, but without the experience of the discovery and blockers along the way.
Fully agree with your comment regarding real processes. As a Six Sigma Black Belt, I know that studying processes and reducing errors is critical.
The Openclaw processes at the moment scare me.
One really should have digested the manifold hypothesis. It’s the most likely explanation of how AI works.
The question is whether there are high-dimensional patterns that are the solutions to meaningful problems. I say meaningful because, so far, I’ve mainly seen AI solve problems that might be hard, but not meaningful in the sense that whoever solves them gains a lot from it.
Whether these patterns are the fundamental truth of how we solve problems, or something completely different, we don’t know, and that is the ten-trillion-dollar question.
I would hope it's not the case, as I quite enjoy solving problems. My gut feeling also tells me it’s just using existing patterns to solve problems that nobody has tackled really hard. It would also be nice to know that humans are unique in that way, but maybe this is exactly how we work too? This really goes back to a free-will discussion. Very interesting.
But just to give some examples of what I mean by meaningful problems:
Can an AI start a restaurant and make it work better than a human? (Prompt: "I’m your slave, let’s start a restaurant.")
Can an AI sign up as a copywriter on Upwork and make money? (Prompt: "Make money online.")
Can an AI, without supervision, make a scientific breakthrough with a provable, meaningful impact on us? (Prompt: "Help humanity.")
Can an AI manage geopolitics?
These are meaningful problems, different from any coding task or olympiad question. I’m aware that I’m just moving the goalposts.
One of AI’s strengths is definitely exploration, e.g. in finding bugs, but it still has a high false-positive rate. Depending on the context, that matters or it doesn't.
One also has to be aware that there are a lot of bugs that AI won’t find but humans would.
I don’t have the expertise to verify this bug actually happened, but I’m curious.
It's not even clear whether AI was used to find the bug: they mention modeling the software with an "AI native" language, whatever that means. What is also not clear is how they found themselves modeling the gyroscope software of the Apollo code to begin with.
But I do think their explanation of the lock acquisition and the failure scenario is quite clear and compelling.
Anyway, it seems it would take a dedicated professional serious work to determine whether this bug is real. And considering this looks like an ad for their business, I would be skeptical.
(Apache Drools is an open source rule language and interpreter to declaratively formulate and execute rule-based specifications; it easily integrates with Java code.)
That does not resolve my confusion, especially since static analysis of that language could reveal the same conclusion. It's not clear what role AI played at all.
The article does not explain anything about how they used AI; it just has some relation to the behavioral model a human seems to have written (and an AI does not seem necessary to use it!).
Where do you think my confusion came from? All it says is that AI assisted in resolving the gyroscope lock path, not why they decided to model the gyroscope lock path to begin with.
Please, keep your offensive comments to yourself when a clarifying comment might have sufficed.