Likely, and via Vertex on GCP (or whatever they are calling it this year).
Which also means that if you are a big, boring AWS or GCP shop and have a spend commitment with either as part of a long-term partnership, it will count towards that. And you likely won't have to commit to a spend with OpenAI if you want EU data residency, for instance. And likely a bit more transparency around infra provisioning and reserved capacity vs. OpenAI. All substantial improvements over the current ways to use OpenAI in real production.
For sure, but these days product management mistakes can be rectified more easily. Before, if we invested 4 months in building something that did not land, we'd be quite reluctant to jettison it and start fresh. Egos, career considerations, sunk cost, etc. I think I will soon be able to say "not any more", since doing a U-turn can be cheaper than pretending the bad choice is the best choice. "Oops, let's redo this" vs. 6 months of executive squabbling about whose fault it is that we wasted $3M in development costs on something that clearly does not perform.
Also, give it time. Real adoption in boring companies started in Q1. Q2 is, I think, the settling-in phase, with people learning how to do their work and manage their responsibilities. Q3/Q4 is when I expect to start seeing higher velocities across all the IT-adjacent products I use.
As a CTO I can say that this is not my experience.
My experience these days is fighting corporate bureaucracy and inertia to make sure we reap the benefits of faster coding. Feeding agents with work is not a problem. Building teams that use those tools effectively is the problem. (Say, shall we merge product and engineering teams? Do we start getting rid of people who refuse to use AI? What do we do with pentests? How do we strengthen the tools that do code analysis and weed out lazy devs who can now more easily pretend to be invested in their work? Stuff like this keeps me busy.)
As a CTO, this has been my experience as well. I would add: every non-technical C-suite member aiming to use AI as some magic lever to avoid prioritizing projects or engaging in real critical thinking. Too many people are offloading their cognitive decision-making to some magic box, thinking it has all the answers, because its output appears magical and complete.
After 25 years in programming I think I’ll finally start that farm ;)
> Do we start getting rid of people who refuse to use AI?
I don't even think the bigger companies are going to waste time figuring out how to retrain; they're just going to do industrial-scale layoffs and then rebuild from the ground up with people who won't get past interviews without demonstrating hard skills in this area.
There is a shocking gap growing right now, it's a Wile E. Coyote not realizing he already walked off the cliff type of situation for a lot of people.
Ultimately the shareholders want to see the money. They don't give a crap about what you think or what the poster above thinks - you're both accountable to the shareholders, who do not employ you for fun. They employ you for the sole purpose of making them wealthier. All this incremental spend on tokens either shows up positively in the financials or it doesn't.
> Ultimately the shareholders want to see the money.
Seems like we're saying the same thing?
> All this incremental spend on tokens shows up in the financials positively or it doesn't.
Right, and we're talking about the staff failing to spend the incremental tokens at all, thus failing to discover whether or not they'll show up positively. I'm just saying investors are probably going to roll the dice on a complete staffing rebuild rather than wait for the existing corporate culture to adapt, because they're going to get FOMO. Arguably it's already happening.
A neutral hobbyist on a $20 budget will build something and immediately bump into quotas. It's not going to be an enjoyable experience.
A negatively predisposed pro who only dabbles in AI gets to the first disappointment, smiles, thinks "yeah, about what I expected", and quits.
To learn those new tools one needs to not be stingy. Invest as much as needed into tokens and subscriptions, and - maybe most importantly - invest the time. Spend time building various things. Try out various models, not just for coding, but as parts of the apps being built. For bonus points, meaningfully experiment with local models. I try to avoid discussions with sceptics who have not put at least a few months of effort into learning those tools. It's like discussing driving with my mother-in-law, who has spent maybe 20 hours behind the wheel in her whole life (and is very, very opinionated!).
In my opinion it's a complete waste of time and money to learn something that is gated by a company that might disappear tomorrow.
It's akin to company courses that teach something specific to that company: of course you do them on the job; there is no point in doing them if you don't work there.
Similarly, what's the point of trying 300 different models if any job will decide for you which one they approve the use of, and you are liable to get fired and asked to pay damages if you let anything else access company intellectual property?
What I see in my backyard: coding now takes significantly less time, but it's just coding. Before anyone gets to building, there are squabbles between business and product people. Testing takes just as long as it used to. Since nice-to-haves are easy to add and product people are beginning to take that for granted, the product cycles don't get shorter.
Give it time. Right now it's just coding, but procedural AI will come for product development, architecture, and then whatever is left of management.
The best people can not only envision products but also possess great judgement without needing data. For AI to even come close, it would need an insane amount of nuanced, subtle data - and by the time the AI has obtained all the necessary data and made sense of it, the human is long gone, working on something else.
My experience as well, unfortunately. I am really looking forward to reading, in a few years, a proper history of the wild west years of AI scaling. What is happening inside those companies at the moment must be truly fascinating. How is it possible, for instance, that I never, ever had an instance of not being able to use Claude, despite its runaway success and - I'd guess - exponential increase in infra needs? When I run production workloads on Vertex or Bedrock I am routinely confronted with quotas; here, it always works.
We have already shipped 3 things this year built using Claude. The biggest one was porting two native apps into one React Native app - originally estimated as a 6-7 month project for a 9 FTE team, it ended up being a 2-month project with 2 people. To me, the economic value of a Claude subscription used right is in the range of 10-40k EUR, depending on the type of work and the developer driving it. If Anthropic jacked the prices 100x today, I'd still buy the licenses for my guys.
Edit: OK, if they charged 20k per month per seat I'd also start benchmarking the alternatives and local models, but for my business case, running a 700M budget, Claude brings disproportionate benefits - not just in developer time saved, but also faster shipping times, reduced friction between various product and business teams, and so on. For the first time we generally say 'yes' to whichever frivolities our product teams come up with, and that's a nice feeling.
Who's going to review that output for accuracy? We'll leave performance and security aside as unnecessary luxuries in this day and age.
In my experience, even Claude 4.6's output can't be trusted blindly: it will write flawed code and then write tests that test that flawed code, giving a false sense of confidence and accomplishment, only to be revealed on closer inspection later.
Additionally, it's an age-old fact that code is always easier to write than it is to read and understand (even prior to AI, and even if you were the original author yourself) - reading is tenfold harder. So I'm not so sure that this much generative output from probabilistic models will be so flawless that nobody needs to read and understand that code.
I am not sure how others are doing this, but here is our process:
- meaningful test coverage
- internal software architecture was explicitly baked into the prompts, and we try not to go wild with vibing; rather, we spec it well and keep Claude on a short leash
- each feature built is followed by a round of refactoring (with Claude, but with the oversight of an opinionated human). We spend at least 50% of the time building and 50% refactoring; sometimes it feels more like 30/70. Code quality matters to us, as those codebases are large, and skipping this leads to a very noticeable drop in Claude's perceived 'intelligence'.
- performance tests as per usual - designed by our infra engineers, not vibed
- static code analysis, plus a hierarchical system of guardrails (a small claude.md plus lots of files referenced there for various purposes - see the sketch after this list). Not quite fond of how that works; Claude has always been very keen to ignore instructions and go its own way (see: "short leash, refactor often").
- pentests with regular human beings
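For illustration, a minimal sketch of what that hierarchy can look like (the file names here are made up, not our actual setup; claude.md pulls in the referenced files via Claude Code's @-import syntax):

```markdown
# claude.md - kept deliberately small

@docs/architecture.md   <- module boundaries, allowed dependencies
@docs/code-style.md     <- naming, error handling, logging conventions
@docs/testing.md        <- what "meaningful test coverage" means for us

## Hard rules
- Never touch anything under legacy/ without asking first.
- Run the linters and static analysis before declaring a task done.
```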
The one project I mentioned - 2 months for a complete rewrite - was about a week of working on the code and almost 2 months spent on reviews and tests; of course some of that time was wasted, as we were doing this for the first time on such a large codebase. The rewritten app has been doing fine in production for a while now.
I can only compare the outputs to those of our regular engineering teams. It compares fine vs. good dev teams, IMHO.
The part about refactoring is very interesting and reassuring. I sometimes think I'm holding it wrong when I end up refactoring most of the agent's code towards our "opinionated" style, even after laying it out in md files. Thank you very much for this insight.
Thanks! In our limited experience, Claude does not focus that much on guardrails and code quality when building a feature - but it can be pretty focused on code quality and architecture when asked to do just that. So: a few hours to iterate on a feature, a few hours to refactor. Rinse and repeat.
Very nice insight - that's where the value is. Even with a lot of time spent refactoring, testing, and reviewing, the coding phase is so compressed (gzipped, even) that it's still worth using an imperfect LLM. Even with humans we have all those post-phases, so good structure around the code generation leads to a lot of gains.
It depends on industries and what’s being developed for sure
I don't want to defend LLM-written code, but this is true regardless of whether code is written by a person or a machine. There are engineers who will put in the time to learn, optimize their code for performance, and focus on security, and there are others who won't. That has nothing to do with AI writing code. There is a reason why most software is so buggy and all software has identified security vulnerabilities, regardless of who wrote it.
I remember what website security was like before frameworks like Django and RoR added default security features. I think we will see something similar with coding agents: they will just run skills/checks/MCPs/... that have performance, security, resource management, ... built in.
I have done this myself. For all the apps I build, I have linters, static code analyzers, etc. running at the end of each session - everything at its cheapest defaults, but in a very strict mode. It cleans up most of the obvious stuff almost for free.
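If it helps anyone, a minimal sketch of such an end-of-session runner (the tools and commands are just examples; substitute whatever your stack uses):

```ts
// lint-session.ts - run the cheap static checks at the end of a session.
// Tool choice is illustrative; swap in your own linters and analyzers.
import { execSync } from "node:child_process";

const checks = [
  "npx eslint . --max-warnings 0", // strict mode: warnings count as failures
  "npx tsc --noEmit",              // type-check without emitting output
];

for (const cmd of checks) {
  console.log(`$ ${cmd}`);
  try {
    execSync(cmd, { stdio: "inherit" });
  } catch {
    // Fail loudly so the agent (or a human) fixes it before moving on.
    process.exit(1);
  }
}
```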
> For all apps I build I have linters, static code analyzers, etc running at the end of each session.
I think this is critically underrated. At least in the TypeScript world, linters are seen as kind of a joke ("oh, you used tabs instead of spaces"), but they can definitely prevent bugs if you spend some time - even vibe-coding some basic code-smell rules (exhaustive deps in React hooks is one such thing).
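For example, a minimal flat-config sketch (the rule selection is illustrative, not a recommendation):

```ts
// eslint.config.mjs - a sketch; pick rules that match your codebase.
import tseslint from "typescript-eslint";
import reactHooks from "eslint-plugin-react-hooks";

export default tseslint.config(...tseslint.configs.recommended, {
  plugins: { "react-hooks": reactHooks },
  rules: {
    "react-hooks/rules-of-hooks": "error",
    "react-hooks/exhaustive-deps": "error", // catches stale-closure bugs
    "no-fallthrough": "error",              // a classic code-smell check
    "@typescript-eslint/no-explicit-any": "warn",
  },
});
```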
Well, it's all tradeoffs, right? 6 months for 9 FTEs is 54 man-months. 2 months for 2 FTEs is 4 man-months. Even if one FTE spent two extra months perusing and reviewing every line of code, that's still 6 man-months - almost 10x the speed.
Let's say you don't review. Those two extra months probably turn into four extra months of finding bugs and stuff. Still 8 man-months vs. 54.
Of course this is all assuming that the original estimates were correct. IME building stuff using AI in greenfield projects is gold. But using AI in brownfield projects is only useful if you primarily use AI to chat with your codebase and to make specific scoped changes, not to make large changes.
I do greenfield work in fluid dynamics, and Claude doesn't help there: I need to be able to justify each line of my code (the physics part), and using Claude doesn't help with that.
On the UI side Claude helps a lot, so for me I'd say I get a 25% productivity increment. I work like this: I put the main architecture of the code in place by hand, to get a "feel" for it. Once that is done, I ask Claude to make incremental changes, and I review them. Very often, Claude does an OK job.
What I have a hard time with is getting Claude to automatically understand my class architectures: more often than not, it tries to guess information about objects in the app by querying the GUI instead of the data model. Odd.
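To illustrate the pattern (all names invented for this sketch):

```ts
// Hypothetical sketch of the failure mode described above.
const simulation = { fluid: { viscosity: 1.0e-3, density: 998.0 } }; // data model
const viscosityInput = { value: "0.001" }; // stand-in for a GUI text field

// What the class architecture intends: read from the data model.
const viscosity = simulation.fluid.viscosity;

// What Claude tends to generate: round-trip the value through the GUI.
const guessedViscosity = Number(viscosityInput.value);

console.log(viscosity, guessedViscosity);
```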
Your estimate of a "6-7 month project for a 9 FTE team" was probably waaay off. I mean, what is this mobile app? Without even seeing your app, I would say 2 months TOPS with 2 devs. So the "AI" version is really not that much better, and probably even worse.
You copied two human-coded native apps into a vibe-coded React Native app? If the vibe coding is so good, why wouldn't you keep the native apps and vibe-code on top of them, instead of spending a bunch of money to reach feature parity with a worse version?
Boring corporate AI will surely come, but hey, let's enjoy the wild west while it lasts. I am grateful to see Boris come here to address the problems people face. I'm 100% sure nobody is making him - he has one of the coolest jobs in the world.
So that means we just eject any critical thinking when it comes to companies, especially when there is no liability or obligation for them (Boris or Anthropic) to be honest?
Don’t like Anthropic? Use a competing service. At this point the sheer volume of your commentary is not particularly complimentary to your own critical thinking skills. It’s not your job to correct the internet or to convince randoms of the rightness of your position. Of all the things in the world to be pissed at so insistently, this seems to be a pretty minor one.
I do M&As at my company - as a CTO. I have seen lots of successful companies' codebases, and literally none of them was elegant. Including very profitable companies with good, loved products.
The only good code I know is in the open-source domain and in the demoscene. Commercial code is mostly crap - and it still makes money.