
> What this fails entirely to capture is that doing something to increase your odds of survival, damn the consequences, is an individual choice.

What you're failing to capture is that this is a hard problem precisely because it's both an individual choice and a collective one. Those "terrible side effects" might actually end up killing someone. You're choosing between a high-chance lottery on a small population and a low-chance lottery on a far larger one. It's not that simple.
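
To make that concrete, here is a back-of-the-envelope sketch (every number below is invented, purely to illustrate the shape of the trade-off): an intervention that clearly helps a small high-risk group can still cause more total harm once it's applied to a much larger population.

    # All numbers are hypothetical, chosen only to illustrate the trade-off.
    high_risk_group = 1_000           # people who clearly benefit
    deaths_avoided_per_person = 0.20  # hypothetical absolute risk reduction

    screened_population = 1_000_000   # low-risk people exposed to the side effects
    serious_harm_rate = 0.0005        # hypothetical rate of serious complications

    lives_saved = high_risk_group * deaths_avoided_per_person   # 200
    people_harmed = screened_population * serious_harm_rate     # 500

    print(f"Expected lives saved in the high-risk group: {lives_saved:.0f}")
    print(f"Expected serious harms across the screened population: {people_harmed:.0f}")
    # With these invented numbers, the choice that looks obviously right for
    # one individual produces net harm at the population level.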


> Maybe the right answer isn't to do a biopsy, but to monitor the area with follow-up scans?

Doctors have already thought of this. Several issues with it:

* Monitoring still causes anxiety and mental health issues, which come with real effects on patients' quality of life. It's not "harmless".

* It's unclear when to monitor and when to treat. It's also really hard to gather enough data to characterize these early, nonspecific findings well enough to be confident about what to do.

* Monitoring via MRI might be just as useful as monitoring via symptoms or any other "passive" methods that do not require a previous scan.


> This is much more sensible than just not testing at all and letting people die from cancer.

This is not what happens. You're assuming that if the cancer does not get detected by the screening then it never gets detected. What actually happens is that the test gives information that might be redundant and obtainable in a less risky way. What the studies show is that waiting until there are other, more specific signs and symptoms of the prostate cancer results in the same survival rates.


Interesting. Do you have a source for that?


See https://pubmed.ncbi.nlm.nih.gov/38926075/. I was not aware of the ERSPC which came out late last year and gives better outcomes for screening, but overall the evidence is not super clear yet. There are possibly certain groups that can benefit from PSA screening more than others. Also, modern, more effective treatments might allow for later diagnosis with the same clinical results.


> If the test detects cancer that doesn't need treatment, don't treat it!!

How do you know which ones to treat and which ones to leave?


When the result is above a chosen threshold (which may depend on other factors like family history etc.).


Unfortunately that "chosen threshold" is really hard to pin down, especially if you want to balance individual and population-level needs.
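
As a rough illustration of why (the distributions, prevalence and thresholds below are all made up), here's a toy simulation where the test scores of sick and healthy people overlap. Whatever cutoff you pick, you're trading missed cancers against healthy people flagged for treatment:

    # Toy simulation; every number here is invented for illustration.
    import random

    random.seed(0)
    N = 100_000
    PREVALENCE = 0.02  # hypothetical

    population = []
    for _ in range(N):
        has_cancer = random.random() < PREVALENCE
        # Scores of sick and healthy people overlap -- that's the real difficulty.
        score = random.gauss(6.0, 2.0) if has_cancer else random.gauss(3.0, 2.0)
        population.append((has_cancer, score))

    for threshold in (3.0, 4.0, 5.0, 6.0):
        missed = sum(1 for sick, s in population if sick and s < threshold)
        flagged_healthy = sum(1 for sick, s in population if not sick and s >= threshold)
        print(f"threshold={threshold}: missed cancers={missed}, healthy flagged={flagged_healthy}")
    # No threshold makes both numbers small at once; "just pick the right
    # cutoff" hides a population-level trade-off.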


> With my change: 95% of people who are shown scans have cancer and are treated earlier. Without my change: many of those 95% die

Why? What happens if the cancer still doesn't respond to treatment even when detected early? Or, conversely, if the cancer still responds to treatment once it becomes symptomatic?

That's why we have studies to understand if screening is a good practice or not. It's not that clear cut.


> it's likely this result in innovations that would drive down costs, improve accuracy, as well as producing a much larger corpus of data with which to guide diagnosis and reduce false positives.

Why is it likely? We already have a lot of MRI data. There are already a lot of incidental findings. It might also be an issue of the MRI not being able to produce enough information to discriminate.

> To use a software analogy, if your downtime detection system kept producing false negatives, would your solution be to just turn it off? You'd get a better night's sleep, but you'd pay for it when the system really went down and you had no idea.

The analogy is rather something like this: your downtime detector is not just a "ping" but a full web browser that tests everything, and it sometimes flags things that are not actually issues. So you don't turn it off, but you only use it when another signal indicates that something might be going wrong.
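
Something like this, as a sketch (the function names are placeholders, not any real monitoring API):

    def cheap_signal_fired() -> bool:
        """Placeholder for a low-noise trigger: error-rate spike, user report, etc."""
        return False  # stub

    def full_browser_check() -> list[str]:
        """Placeholder for the expensive "full web browser" suite.

        Very sensitive, but it flags things that are not real issues."""
        return []  # stub

    def monitor() -> list[str]:
        # Analogous to "scan only when there are other signs and symptoms":
        # the noisy detector is gated behind a prior indication.
        if not cheap_signal_fired():
            return []
        # Findings are now interpreted in the context of a known complaint,
        # which makes false positives far easier to dismiss.
        return full_browser_check()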


> Why is it likely? We already have a lot of MRI data. There are already a lot of incidental findings. It might also be an issue of the MRI not being able to produce enough information to discriminate.

This is the main reason. Well, technically the opposite of the main reason, but it amounts to the same thing. MRIs are extremely high fidelity nowadays, and as a result it's really, really hard to read an MRI. Every person is different and there are a lot of variations and weird quirks. You get all the data rather than clearly identified problem areas like you get with, say, a CT with contrast, etc.

That's actually exactly why it's important to have MRIs more frequently to be able to establish baselines and identify trends as they develop.


> That's actually exactly why it's important to have MRIs more frequently to be able to establish baselines and identify trends as they develop.

How? How do you establish baselines? How do you build a classification of incidental findings? It's very possible that you'll find a lot of types and not a lot of representatives of each type. And then you have to correlate that to actual clinical results, but the population will be so heterogeneous that it'll be really hard to find an actual result.

It's not just "let's throw more data at the problem".


When I say "establish baselines", what I mean is establishing baselines for the individual.

If you have records of the locations and sizes of various atypical structures and forms throughout the body going back for years and all of a sudden one of them starts changing in size at a rate disproportionate to its history, that's probably cause to dig a little deeper.

It's certainly not "throw more data at the problem". Instead it's about giving the data a time axis with some decent fidelity.
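
As a rough sketch of what I mean (the sizes, history length and cutoff below are invented): flag a tracked finding only when its latest change is far outside its own historical variation.

    # Minimal per-person baselining sketch; all numbers are invented.
    from statistics import mean, stdev

    def growth_is_disproportionate(sizes_mm: list[float], z_cutoff: float = 3.0) -> bool:
        """True if the most recent size change is an outlier vs. this lesion's own history."""
        if len(sizes_mm) < 4:
            return False  # not enough history to define a baseline
        deltas = [b - a for a, b in zip(sizes_mm, sizes_mm[1:])]
        history, latest = deltas[:-1], deltas[-1]
        spread = stdev(history) or 0.5  # floor for measurement noise (invented value)
        return (latest - mean(history)) / spread > z_cutoff

    # Stable for years, then a jump between the last two scans:
    print(growth_is_disproportionate([4.0, 4.1, 3.9, 4.0, 4.2, 6.5]))  # True
    print(growth_is_disproportionate([4.0, 4.1, 3.9, 4.0, 4.2, 4.4]))  # False
    # Choosing the noise floor and the cutoff is exactly where this gets hard
    # in practice (positioning, different machines, and so on).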


> and all of a sudden one of them starts changing in size at a rate disproportionate to its history, that's probably cause to dig a little deeper.

That sentence is doing a lot of heavy lifting.

- What's "disproportionate to its history"? Obviously something going from 1mm to 10cm is worth checking out, but what about something going from 1mm to 2mm? Might be a tumor, might be that the position is just slightly different.

- What about other, less measurable factors? For example, border features. Those are harder to measure, and things like movement or different machines can change how the borders of a feature look. How do you know what's baseline and what's not?

- How frequently do you run these scans? It's likely that if something "starts changing in size" suddenly, it will start causing symptoms before your next scheduled scan.

> It's certainly not "throw more data at the problem". Instead it's about giving the data a time axis with some decent fidelity.

It's definitely throwing more data at the problem, and you're assuming that it's viable to give "a time axis with decent fidelity". MRIs are much more complicated to interpret than people think, and screening is a much harder problem too. There are a lot of studies testing MRI imaging as a screening technique (among other techniques) and they don't always show an increase in survival rates.


We do not have a lot of MRI data. The average person probably gets a couple MRIs in their lifetime, and this is biased because we wait until something is clearly wrong to get the MRI. If you want to find an MRI scan of an early stage asymptomatic cancer, the only data on that will be the exceedingly rare case that someone has something else unrelated wrong with them in the same general area and gets an MRI for that, and then just by chance also has the early stage cancer at the same time.


> we wait until something is clearly wrong to get the MRI. If you want to find an MRI scan of an early stage asymptomatic cancer, the only data on that will be the exceedingly rare case that someone has something else unrelated wrong

Not always. There are a bunch of studies of MRI screening in high-risk populations for specific cancers. There are scoring systems for a lot of them based on imaging features, and they do find asymptomatic cancers.

In fact, if you add low-risk populations to the studies used to design imaging scores, you might end up adding more noise and making the study more difficult and the scoring less accurate.
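
A related, simpler effect, as a rough illustration (the sensitivity and specificity values below are invented): a score that works well in a high-risk cohort produces mostly false positives once the prevalence in the scanned group drops.

    # Positive predictive value vs. prevalence; test characteristics are invented.
    sensitivity = 0.90
    specificity = 0.90

    for prevalence in (0.10, 0.01, 0.001):  # high-risk cohort down to general population
        true_pos = prevalence * sensitivity
        false_pos = (1 - prevalence) * (1 - specificity)
        ppv = true_pos / (true_pos + false_pos)
        print(f"prevalence={prevalence:.3f}: positive predictive value={ppv:.1%}")
    # With these numbers, about half of the positives are real in the high-risk
    # cohort, but fewer than 1% are real at a prevalence of 0.1%.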


> We already have a lot of MRI data.

That's true but not in a useful way for improving MRI screening.

What we have is lots of data from people who were sent to get an MRI because they had a complaint.

That's a very different group than people doing screening.


And the fact that they have a complaint (or have known risks) makes it easier to classify, compare and understand the data.


The argument for better screening would require that finding those asymptomatic cancers actually improves survival rates. There are several reasonable scenarios where early screening doesn't improve it:

* The cancer is aggressive and resistant to treatment. Chemo/radiation only pause the growth for a bit, but ultimately the cancer keeps growing and the total survival time is the same (except that the patient spends more time knowing they have cancer).

* The cancer is susceptible enough to treatment that it's still curable when it becomes symptomatic and found through other means.

* The cancer is slow enough that the patient dies from other causes first.

Early screening brings benefits only when the cancer ends up causing issues and responds differently to treatment between the "early screening detection" time and the "normal detection" time.

It's impossible to know beforehand which of these scenarios carries more weight, especially because we have very little data on what happens well before cancer is detected via the usual methods. We need better studies on this, and for now the evidence doesn't really point to these large, indiscriminate screenings being helpful.
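
The first scenario is essentially lead-time bias. A toy calculation (the ages below are invented) shows how screening can inflate "survival after diagnosis" without changing lifespan at all:

    # Lead-time bias illustration; all ages are hypothetical.
    death_age = 74.0            # fixed by the tumour's biology in this scenario
    screen_detected_at = 66.0   # found early by a hypothetical screening scan
    symptom_detected_at = 71.0  # found later, once symptoms appear

    survival_with_screening = death_age - screen_detected_at      # 8 years
    survival_without_screening = death_age - symptom_detected_at  # 3 years

    print(f"Apparent survival with screening:    {survival_with_screening:.0f} years")
    print(f"Apparent survival without screening: {survival_without_screening:.0f} years")
    # The patient dies at the same age either way; screening only moves the
    # moment of diagnosis, which is what trials have to control for.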


> Swap out "AI" for any other group and see how that sounds.

- AIs should not take issues that are designed to onboard first time contributors

- Experienced matplotlib maintainers should not take issues that are designed to onboard first time contributors

Sounds about the same


Honestly, I've been using the frontier models and I'm not sure where people are seeing these massive improvements. It's not that they're bad, it's just that I don't see that much of an improvement over the last 6 months. They're so inconsistent that it's hard to have a clear idea of what's happening. I usually switch between models and I don't see those massive differences either. Not to mention that sometimes models regress in certain aspects (e.g., I've seen later models that tend to "think" more and end up at the same result, but taking far more time and tokens).


> This was the equivalent of a "weekend project", and it's amazing

I mean, $20k in tokens, plus the supervision by the author to keep things running, plus the number of people that got involved according to the article... doesn't look like "a weekend project".

> Building a C compiler which can correctly compile (maybe not link) the modern linux kernel is damn hard.

Is it correctly compiling it? Several people have pointed out that the compiler will not emit errors for clearly invalid code. What code is it actually generating?

> Building a C compiler which can correctly compile sqlite and pass the test suite at any speed is damn hard.

It's even harder to have a C compiler that "correctly compiles SQLite and passes the test suite" while the resulting SQLite binary fails to execute certain queries (see https://github.com/anthropics/claudes-c-compiler/issues/74).

> which, in comparison with a correct modern C compiler, is far less performance critical, complex, broad, etc.

That code might be less complex for us, but more complex for an LLM if it has to deal with lots of domain-specific context, without the benefit of a test suite that has been developed for 40 years.

Also, if the end result of the LLM has the same problem that Anthropic concedes here, which is that the project is so fragile that bug fixes or improvements are really hard/almost impossible, that still matters.

> it really seems that the complaints here aren't about the LLMs themselves, or the agents, but about what people/organizations do with them, which is then a complaint about people, but not the technology

It's a discussion about what LLMs can actually do and how people represent those achievements. We're pointing out that LLMs, without human supervision, generate bad code: code that's hard to change, with modifications made specifically to address failing tests without challenging the underlying assumptions, code that's inconsistent and hard to understand even for the LLMs themselves.

But some people are taking whatever the LLM outputs at face value, and then claiming capabilities of the models that aren't really there. They're still not viable for use without human supervision, and because the AI labs are focusing on synthetic benchmarks, they're creating models that are better at pushing through crappy code to achieve a goal.


