Doctors make errors all the time though, so the real argument is about the error rate. If the AI's is lower, then it's safer (though I recognise that's a hard conversation to have).
Besides, this article was about diagnosis, not prescribing. It's pretty obvious, I think, that diagnosis is one area where AI will perform extremely well in the long run.
I think there are two metrics. The first is outright misdiagnosis, which studies put between 5 and 8% in the US/Europe. That's a meaningful number to tackle.
The second is overdiagnosis: where a doctor says, on balance, it could be X for a hard-to-diagnose but dangerous condition (usually cancer). The impact of overdiagnosis is significant in terms of resources, mental health, cost, etc.
Do you believe the issue is that they don't have enough technicians to diagnose, or that they don't have enough X-ray machines?
Or, in an ER environment, how would an AI speed things up in a real way that improves patients' lives?
We just minted the term "cognitive debt" for software engineers that cannot keep up with what the AI spits out. How would that apply to ER doctors, or any other kind of doctor?
I'm not talking in particular about the X-rays. It's about a general lack of hospitals, equipment and doctors.
In Europe, there are some rich cities which have on average one doctor per hundred people. And there are large areas in Eastern Europe that have a tenth of that.
If you have some unusual symptoms or a little pain somewhere and no access to doctors you will most likely ignore it.
If you can get any diagnosis it can help you e.g. decide to travel to get treatment.
And the locally available alternative to AI diagnosis is a doctor you can get to in a few months, who works 80 hours a week and has 10 minutes per patient.
For AI to be valuable, it really doesn't need to be better than the average physician at a top American clinic.
AI is also excellent at reverse engineering specs from existing code, so you can also ask it to reflect simple iterative changes to the code back into the spec, and use that to guide further development. That doesn't have much of an equivalent in the old Waterfall.
Yeah, if done right. In my experience, such a reimplementation is often lossy, if tests don’t enforce presence of all features and nonfunctional requirements. Maybe the primary value of the early versions is building up the test system, allowing an ideal implementation with that in place.
Or put this way: We’re brute forcing (nicer term: evolutionizing) the codebase to have a better structure. Evolutionary pressure (tests) needs to exist, so things move in a better direction.
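To make the "evolutionary pressure" idea concrete, here's a minimal sketch of a test suite pinning both a functional and a nonfunctional requirement, so a lossy reimplementation fails loudly instead of quietly regressing. The `slugify` example, its behaviour, and the timing threshold are all hypothetical, not from the original discussion.

```python
# Sketch: tests as the "evolutionary pressure" that keeps a rewrite
# from silently dropping features. Example function is illustrative.
import re
import time

def slugify(title: str) -> str:
    """Current implementation; any rewrite must still pass the tests."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Functional requirement: behaviour is pinned, not implementation.
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"

# Nonfunctional requirement: also pinned, so a "cleaner" but much
# slower rewrite fails rather than quietly regressing performance.
def test_slugify_fast_enough():
    start = time.perf_counter()
    for _ in range(10_000):
        slugify("Some Fairly Long Article Title 123")
    assert time.perf_counter() - start < 1.0
```

With tests like these in place, each regenerated version of the code is free to change shape, but only variants that keep the pinned behaviour survive.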
What ultimately matters is that the system achieves your goals. The clearer you can be about that, the less the implementation detail actually matters.
For example: do you care if the UI has a purple theme or a blue one? Or whether it's React or Vue? If you do, that's part of your goals; if not, it doesn't really matter if V1 is blue and React but V4 ends up purple and Vue.
I just feel this is a great example of someone falling into the common trap of treating an LLM like a human.
They are vastly less intelligent than a human, and logical leaps that make sense to you make no sense to Claude. It has no concept of aesthetics or, of course, any vision.
All that said, it got pretty close even with those impediments! (It got worse because the writer tried to force it to act more like a human would.)
I think a better approach would be to write a tool to compare screenshots, identify misplaced items, and output that as a text finding/failure state. Claude will work much better because you're dodging the bits that are too interpretive (the bits humans rock at and LLMs don't).
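As a sketch of what that tool could look like: a minimal screenshot diff using Pillow that reports differences as text an agent can act on. The function name, tolerance, and report format are all illustrative assumptions, not an established API.

```python
# Sketch of the idea: diff two screenshots and emit a textual
# finding/failure state an LLM can act on. Tolerance is illustrative.
from PIL import Image, ImageChops

def compare_screenshots(expected_path: str, actual_path: str,
                        tolerance: int = 10) -> str:
    """Return a text finding describing where the screenshots differ."""
    expected = Image.open(expected_path).convert("RGB")
    actual = Image.open(actual_path).convert("RGB")

    if expected.size != actual.size:
        return (f"FAIL: size mismatch, expected {expected.size}, "
                f"got {actual.size}")

    # Per-pixel absolute difference, then mask out near-identical
    # pixels below the tolerance so compression noise doesn't trigger.
    diff = ImageChops.difference(expected, actual)
    mask = diff.convert("L").point(lambda p: 255 if p > tolerance else 0)

    bbox = mask.getbbox()  # bounding box of all differing pixels
    if bbox is None:
        return "PASS: screenshots match within tolerance"
    return (f"FAIL: region {bbox} differs; check elements positioned "
            f"in that bounding box")
```

The point is that the output is plain text ("FAIL: region (10, 10, 20, 20) differs"), which plays to what the model is good at, instead of asking it to eyeball two images.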
I meant that framing very deliberately. Use of the word AI is misleading people into thinking that LLMs are intelligent.
They model what looks like intelligence but with very hard limits. The two advantages they have over human brains are perfect recall and data storage. They are also faster.
But the brain is vastly more intelligent:
- It can learn concepts (e.g. language) from an order of magnitude less information
- It responds in parallel to multiple formats of stimuli (e.g. sight/sound)
- It can generalise in ways LLMs cannot
- It interprets and understands what it experiences
That's just the tip of the iceberg. Don't get me wrong: I use AI, it is by far some of the most impressive tech we have built so far, and it has potential to advance society significantly.
But it is definitely, vastly, less intelligent than us.
The blog frequently refers to the LLM as "him" instead of "it" which somehow feels disturbing to me.
I love to anthropomorphize things like rocks or plants, but something about doing it to an AI that responds in human-like language enters an uncanny valley, or otherwise upsets me.
From the CDC report [1], it's pretty clear that rabies was not considered for the donor until after the donee died and rabies was confirmed. Possibly because the donor had been scratched by a skunk and not bitten. The report says the scratch had been noted in the donor risk assessment interview (DRAI), but that skunks are not considered a reservoir for rabies in his area.
If a manager is handling (almost) all disputes of all sorts, then they will fundamentally lack authority to enforce an outcome on a real dispute. They simply are too involved because resolution requires you to take some sort of side.
If my children won't speak to each other, I will refuse to be the go-between, because I become a proxy for one to the other. If one then punches the other, they won't respect my perspective that this was wrong, because I've set myself up as the proxy for the other's feelings.
If you need a manager to resolve the above example, the org is broken and the engineers are poor engineers.
> If a manager is handling (almost) all disputes of all sorts, then they will fundamentally lack authority to enforce an outcome on a real dispute. They simply are too involved because resolution requires you to take some sort of side.
Bullshit. Being a routine mediator makes you a better mediator when big things come up, not a worse one. It means you are in tune with the particular needs and idiosyncrasies of the people involved, and assuming you are any good at it, it means you have the trust of all parties to mediate fairly.
> If my children won't speak to each other I will refuse to be the go between because I become a proxy for one to the other.
First of all, managing adults and parenting children are two radically different things. Second, being a go between is not handling a dispute, if anything it facilitates the dispute. Kids can't agree on whose turn it is to play with a toy? Toy gets taken away with the understanding they'll get it back when they agree to a system - that's conflict resolution.
> If one then punches the other they won't respect my perspective that this was wrong because I've set myself up as the proxy for the other's feelings.
What?
> If you need a manager to resolve the above example, the org is broken and the engineers are poor engineers.
The fact there is this conflict to resolve is evidence that the org is broken and the engineers are poor engineers, but given that there is a conflict, the manager should be the one resolving it, because, again, that is their job.
You may not mean it, but I do think framing it this way sometimes implies that leading and managing require less ability (it's a skill in its own right).
What I think is true is that people cap out their technical competency and look to shift their skill set, and, globally, we are bad at a) training them to be good managers (because of the wrong assumption that it's an innate skill) and b) weeding out the many who also lack the ability to be a manager.
Agree, it’s a skill, it can be learned and improved, and of course some people have some natural ability.
But for every skill there’s a floor and a ceiling. The floor for managers is imo far lower than it is for tech ICs. Incompetent managers have many options to hide their misdeeds. That doesn’t say anything about the average or the ceiling.
I suspect this is written by someone who stepped into managing a team and no further.
My overall reflection: he's probably heard of servant leadership but not understood it? It's not about sweeping problems away but a mindset that your role is to empower. I feel strongly that all new managers should embrace and get good at this, because it instills the mindset that the best leaders ultimately only succeed through their team.
A servant leader who becomes overworked is either not doing their job well (delegation isn't contrary to the mindset!) or, more likely, has a poor leader themselves.
I actually love the concept of transparent leadership but sadly I can't see it come through in his points. They are all things a good leader, a good servant leader, should also do.
For me transparent leadership becomes more critical as you move up the stack. Once you get to multiple teams or teams of teams leaders must pivot strongly to strategy setting, and in this your servant leadership comes in painting a clear destination for everyone to get to.
At this point I believe the best leaders are genuinely transparent and the worst keep secrets. One of my most respected mentors framed it as deliberately over-sharing. Which I love, even if I get into trouble for it constantly!
(I do like the writer's anarchic streak; the best leaders are radicals)
Yeah, I think you're right, and I'm also finding that larger apps built using SDD steadily get harder to extend.
> For large existing codebases, SDD is mostly unusable.
I don't really agree with the overall blog post (my view is that all of these approaches have value, and we are still too early on to find the One True Way), but that point is very true.
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
Full system specs are non-trivial for current AI agents. Hand-prompting every step is time-consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLM's love of context, and, more importantly, what I think we all know: you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot, but I've discovered that was just me not being so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspace account (and you should restrict logins to my Workspace domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity, let's create a record for user accounts when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
The plan step is overly focused on the accidental complexity of the project. While the `Specify` part does a good job of defining the scope, the `Plan` part is just complicating it. Why? The choice of technology is usually the first step in introducing accidental complexity in a project. That's why it's often recommended to go with boring technology (so the cost of this technical debt is known), or otherwise with something that is already used by the company (if it's a side project, do whatever). If you go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.
The whole point of code is to be reliable and to help do something that we'd rather not do. Not to exist on its own. Every decision (even little) needs to be connected to a specific need that is tied to the project and the team. It should not be just a receptacle for wishes.
I wouldn't call that accidental complexity? It's just a set of preferences.
Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve that with optimal efficiency of construction, but a lot of people call that gold plating.
The setup these prompts leave you with is boring, standard, and surely something I could do in a couple of hours. You might even skeleton it, right? The thing is, the AI can do it faster in elapsed time, but it also reduces my time to writing two prompts (<2 minutes) and some review (10-15 minutes, perhaps).
Also remember this was a simple example; once we get to real business logic, the efficiencies grow.
It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current preferences. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you will get bogged down with the whims of the platform and technology (breaking changes, bugs, ...). It may not be relevant to you if you're planning on abandoning it (switching jobs, a side project you no longer care about, ...).
Something boring and standard is something that keeps going with minimal intervention while getting better each time.
I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.
I'm struggling to see what you'd choose to do differently here?
Edit: actually, I'll go further and say I'm guarding against accidental complexity. For example, Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth - it's not better and it is definitely not boring technology!
Your card doesn't know the balance, it doesn't work like that.
Offline transactions mostly died off when the UK contactless limit was raised to £100. At £20/£30 (the original limits), issuers/merchants accepted the risk of some payments not being valid (and the total you could spend before you had to chip-and-PIN was fairly low too).
And it's worth saying: the merchant has some control over the terminal, but mostly the offline/online decision is down to the issuer and configured on the card.
Some debit cards don't allow offline transactions, usually when the cardholder isn't allowed to be in debt.
In the olden days, you'd get a Visa Electron or Solo debit card in the UK if you were under 18 or had a poor credit history.
Visa Electron and Solo were online authorisation-only card brands (also known as "immediate authorisation").
If you didn't have enough money in your account, the transaction would be declined. Visa Electron cards didn't have embossed numbers on the front, so couldn't be used with the old-fashioned card imprinters.
Visa Electron and Solo have been discontinued now, so people with poor credit can get a Visa Debit or MasterCard debit card, but with offline authorisation disabled.
That does mean those cards can't work in some places (e.g. on aeroplanes or trains).
Credit cards almost always support offline authorisation.