sophiebits · 2026-04-08T00:46:46 1775609206

Presumably they mean they could make user code trigger a write out of bounds to kernel memory, but they couldn’t figure out how to escalate privileges in a “useful” way.

LiamPowell · 2026-04-08T00:53:35 1775609615

They should show this then, to demonstrate that it's not something that has already been fully considered. Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation, because the program can simply never be in the state that would make those vulnerabilities exploitable. This even happens in formally verified code where there are literally proven preconditions on subprograms that show a given state can never be reached.

As an example, I have taken a formally verified bit of code from [1] and stripped out all the assertions, which are only used to prove the code is valid. I then gave this code to Claude with some prompting towards there being a buffer overflow, and it told me there's a buffer overflow [2]. I don't have access to Opus right now, but I'm sure it would do the same thing if you push it in that direction.

For anyone wondering about this alleged vulnerability: Natural is defined by the standard as a subtype of Integer, so what Claude is saying is simply nonsense. Even if a compiler is allowed to use a different representation here (which I think is disallowed), Ada guarantees that the base type for a non-modular integer includes negative numbers IIRC.

[1]: https://github.com/AdaCore/program_proofs_in_spark/blob/fsf/...

[2]: https://claude.ai/share/88d5973a-1fab-4adf-8d29-8a922c5ac93a

SpicyLemonZest · 2026-04-08T03:49:16 1775620156

They've promised that they will show this once the responsible disclosure period expires, and pre-published SHA3 hashes for (among others) four of the Linux kernel disclosures they'll make.

> Running LLMs over projects that I'm very familiar with will almost always have the LLM report hundreds of "vulnerabilities" that are only valid if you look at a tiny snippet of code in isolation because the program can simply never be in the state that would make those vulnerabilities exploitable.

Their OpenBSD bug shows why this is not so simple. (We should note of course that this is an example they've specifically chosen to present as their first deep dive, and so it may be non-representative.)

> Mythos Preview then found a second bug. If a single SACK block simultaneously deletes the only hole in the list and also triggers the append-a-new-hole path, the append writes through a pointer that is now NULL—the walk just freed the only node and left nothing behind to link onto. This codepath is normally unreachable, because hitting it requires a SACK block whose start is simultaneously at or below the hole's start (so the hole gets deleted) and strictly above the highest byte previously acknowledged (so the append check fires).

Do you think you would be able to identify, in a routine code review or vulnerability analysis with nothing to prompt your focus on this particular paragraph, how this normally unreachable codepath enables a DoS exploit?
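
For readers unfamiliar with the pre-published hashes mentioned above: publishing a digest of a report before disclosure works as a simple commitment, since anyone can later hash the released report and check that it matches. A minimal sketch, assuming SHA3-256 (the thread only says "SHA3") and using hypothetical names `commit` and `verify`; the report text is made up for illustration and is not from any real disclosure:

```python
import hashlib
import hmac


def commit(report: bytes) -> str:
    """Return the hex digest to publish ahead of disclosure.

    SHA3-256 is an assumption; the thread only says "SHA3".
    """
    return hashlib.sha3_256(report).hexdigest()


def verify(report: bytes, published_digest: str) -> bool:
    """Check a later-released report against the earlier digest."""
    return hmac.compare_digest(commit(report), published_digest)


# Hypothetical report text, for illustration only.
report = b"example vulnerability write-up"
digest = commit(report)

print(verify(report, digest))
print(verify(report + b" (edited)", digest))
```

A commitment like this only proves that the report existed unchanged at publication time; it says nothing about whether the described bug is real or exploitable.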

LiamPowell · 2026-04-08T03:56:39 1775620599

I agree they found at least some real vulnerabilities. What I think is nonsense is the claim of finding thousands of real critical vulnerabilities and claims that they've found other Linux vulnerabilities that they simply can't exploit.

Notably, there are no SHA-3 sums for any of their out-of-bounds write Linux vulnerabilities, which would be the most interesting ones.

SpicyLemonZest · 2026-04-08T04:00:09 1775620809

Sure. I guess it's a question of whether this is the worst they found or a representative case among thousands. It sounds like you'd know better than me, so I'm going to provisionally hope you're right...

tptacek · 2026-04-08T04:23:52 1775622232

Why is that nonsense? Do you think they exhausted all their compute finding just the few big vulnerabilities they've already discussed, and don't have a budget to just keep cranking the machine to generate more?

They're not publishing SHAs for things that aren't confirmed vulnerabilities. They're doing exactly the thing you'd want them to do: they claim to have vulnerabilities when they have actual vulnerabilities.

SpicyLemonZest · 2026-04-08T04:36:04 1775622964

If I understand Anthropic's statements correctly, they've been cranking for a while, and what they have now is the results of Mythos-enabled vulnerability scans on every important piece of software they could find. (I do want to acknowledge how crazy it is that "vulnerability scan all important software repos in the world" is even an operation that can be performed.)

tptacek · 2026-04-08T04:44:52 1775623492

We talked to Nicholas Carlini on SCW and did not at all get the impression that they've hit everything they can possibly hit. They're still proving the concept one target at a time, last I heard.

0123456789ABCDE · 2026-04-08T16:56:58 1775667418

which statement, specifically, led you to this interpretation?

SpicyLemonZest · 2026-04-08T17:15:28 1775668528

> Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities (that is, flaws that were previously unknown to the software’s developers), many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software.

They don’t explicitly rule out, I suppose, that these were only limited partial scans they did to find the vulnerabilities. But I don’t know why they’d do it that way, it’s not like they don’t have the resources to scan the entire Linux kernel.

0123456789ABCDE · 2026-04-08T17:34:02 1775669642

i was trying to map "vulnerability scan all important software repos in the world" to an actual quote in their writing, but "every major operating system and every major web browser, along with a range of other important pieces of software" is not the same.

tptacek · 2026-04-08T17:40:21 1775670021

Important to understand it's not one-and-done; you can't "Mythos" Chrome and then put a checkmark next to it. It's a continuous process.

SpicyLemonZest · 2026-04-08T17:46:53 1775670413

Can't you? My understanding is that that's exactly how security scans usually work - you run an analysis, find all the vulnerabilities, and then the continuous process is only there to check against the introduction of new vulnerabilities. Is that not the right mental model?

tptacek · 2026-04-08T18:19:17 1775672357

No, you cannot.

(A "security scanner" is a one-and-done proposition because it's deterministic and is going to find what it finds the first time you run and nothing more. But a software security assessment project you run every year on the same target with different teams will turn up different stuff every year. I'm at pains to remind people how totally lame source code security scanners are. People keep saying "static analyzers already do this" and like, nobody in security takes those tools seriously.)

SpicyLemonZest · 2026-04-08T19:28:01 1775676481

Interesting. Thanks for the info, I’m going to have to read up on this at some point.

sophiebits · 2026-01-29T13:19:09 1769692749

(Wrong thread; think you meant to post this on https://news.ycombinator.com/item?id=46809069.)

sophiebits · 2025-12-24T16:39:36 1766594376

https://webkit.org/blog/7846/concurrent-javascript-it-can-wo...

sophiebits · 2025-12-05T06:20:46 1764915646

“30% of viewing” I think clearly means either time played or items played. I’ve never worked with a data team that would possibly write that and mean users.

If it was a stat about users they’d say “of users”, “of members”, “of active watchers”, or similar. If they wanted to be ambiguous they’d say “has reached 30% adoption” or something.

0manrho · 2025-12-05T06:29:56 1764916196

Agreed, but this is the internet, the ultimate domain of pedantry, and they didn't say it explicitly, so I'm not going to put words in their mouth just to have a circular discussion about why I'm claiming they said something they didn't technically say, which is why I asked "Where did it say that" at the very top.

Also, either way, my point was and still stands: it doesn't say 30% of devices have hardware encoding.

csdreamer7 · 2025-12-05T16:29:34 1764952174

I am not in data science so I cannot validate your comment, but I would assume "30% of viewing" means users or unique/discrete viewing sessions and not watched minutes. I would appreciate it if Netflix would clarify.

sophiebits · 2025-12-03T19:48:01 1764791281

The endpoint is not whatever the client asks for. It's marked specifically as exposed to the user with "use server". Of course the people who designed this recognize that this is designing an RPC system.

A similar bug could be introduced in the implementation of other RPC systems too. It's not entirely specific to this design.

(I contribute to React but not really on RSC.)

cluckindan · 2025-12-03T21:34:57 1764797697

“use server” is not required for this vulnerability to be exploitable.

sysguest · 2025-12-04T08:09:59 1764835799

wait I'm only using React for SPA (no server rendering)

am I also vulnerable??????

cluckindan · 2025-12-04T08:45:13 1764837913

Only if you are running a vulnerable version of Next.js server.

__jonas · 2025-12-04T13:02:21 1764853341

No, unless you run the React Server Component runtime on your server, which you wouldn't do with a SPA, you would just serve a static bundle.

brown9-2 · 2025-12-03T21:48:56 1764798536

so any package could declare some modules as “use server” and they’d be callable, whether the RSC server owner wanted them to or not? That seems less than ideal.

cluckindan · 2025-12-04T08:47:19 1764838039

The vulnerability exists in the transport mechanism in affected versions. Default installs without custom code are also vulnerable even if they do not use any server components / server functions.

sophiebits · 2025-11-19T20:30:44 1763584244

ZDR is a risk thing for them. They want to make sure you're a legitimate company and have monitoring in place on your side to reduce the chance you're using them for illegal things.

sophiebits · 2025-10-22T03:27:10 1761103630

“360 degree peer review” isn’t a thing, the whole idea is that a 360 includes feedback from both your manager and your peers, that’s what distinguishes it from a 180!

:)

boesboes · 2025-10-22T13:44:02 1761140642

Tell that to the HR people!

I was once 'asked' to rate all my colleagues in an Excel sheet so HR had 'something to base their evaluation on' smh

sophiebits · 2025-09-29T20:18:45 1759177125

You need to enable the new code interpreter mode: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...

mrheosuper · 2025-09-30T07:06:32 1759215992

Interesting. Enable that setting and the Claude on claude.ai becomes Claude Code, and it tries to run everything in the Claude container like it owns the machine. I don't want that.

sophiebits · 2025-09-09T21:56:15 1757454975

Website says "Up to 27 hours video playback", which is apparently 7–8 hours more than the iPhones 13–15 and 4–5 more than the 13–15 Pro. Also normally their battery estimates are conservative.

lostmsu · 2025-09-10T21:49:33 1757540973

These days the question is more about continuous use of Gmaps.

sophiebits · 2025-08-24T18:37:43 1756060663

TIL, thanks! I know Postgres and MySQL don’t include an equivalent.