The technique Anthropic uses was demonstrated by Nicholas Carlini in a talk he gave two weeks ago, and it's very simple: when asking LLMs to review code, ask them to focus the review on one file per session. Here is the video with the timestamp (watch through to ~5:30; they show two different ways of prompting Claude).
IMO the big "innovation" Mythos demonstrates is the effectiveness of prompting LLMs to look for security vulnerabilities in specific files one at a time, and automating that prompting with a simple script.
Prompting Mythos to focus on a single file per session is why I suspect it cost Anthropic $20k to find some of the bugs in these codebases. I know this same technique is effective with Opus 4.6 and GPT 5.4 because I've been using it on my own code. If you just ask the agent to review your PR with a low-effort prompt, it won't be exhaustive: it won't actually read each changed file and look at how it interacts with the system as a whole. If the entire session is devoted to reviewing the changes in a single file, the LLM does much more work reviewing it.
Edit: I changed my phrasing. It's not about restricting the model's entire context to one file, but about focusing it on one file while still allowing it to look at how other files interact with it.
Instead of asking the model "Here's this codebase, report any vulnerability," you ask: "Here's this codebase, report any vulnerability in module\main.c".
The model can still explore references and other files inside the codebase, but you start a new context/session for each file in the codebase.
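That per-file loop is simple to automate. A minimal sketch, assuming a Python driver; the `run_agent_session` stub and the prompt wording are hypothetical placeholders for whatever agent CLI you actually use, not Anthropic's tooling:

```python
import pathlib

# Scope the review to one file, but explicitly allow cross-file reads.
REVIEW_PROMPT = (
    "Here's this codebase. Report any security vulnerability in {path}. "
    "You may read any other file to see how it interacts with {path}."
)

def run_agent_session(prompt: str) -> str:
    """Placeholder for one fresh agent session (e.g. a non-interactive
    CLI invocation). Swap in a real call to your agent of choice."""
    return f"[agent output for: {prompt}]"

def review_codebase(root: str, pattern: str = "*.c") -> dict:
    """One independent session per file, so no context carries over
    between reviews."""
    results = {}
    for f in sorted(pathlib.Path(root).rglob(pattern)):
        results[str(f)] = run_agent_session(REVIEW_PROMPT.format(path=f))
    return results
```

Each iteration starts from an empty context, which is the whole point: the model's full attention budget goes to one file at a time.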
Honestly, that's the only way I've ever been able to trust the output. Once you go beyond the scope of one file, quality really degrades. But within a single file I've seen amazing results.
Aren't you supposed to include as many _preconditions_ as you can (in the form of test cases or function constraints, like the `assert` macro in C) in your prompt, describing the inputs for a particular program file, before asking the AI to analyze it?
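Assembling that kind of precondition-laden prompt is easy to do mechanically. A hedged sketch (the helper name and prompt wording are my own invention, not from any particular tool):

```python
def build_review_prompt(path, preconditions=()):
    """Build a per-file review prompt that states known input
    constraints (assert conditions, facts from test cases) up front,
    so the model doesn't chase inputs the callers already rule out."""
    lines = [f"Report any vulnerability in {path}."]
    if preconditions:
        lines.append("Known preconditions on inputs (treat as guaranteed):")
        lines.extend(f"- {p}" for p in preconditions)
    return "\n".join(lines)
```

Usage would be something like `build_review_prompt("module/main.c", ["buf is NUL-terminated", "len <= 4096"])`, feeding the result to a fresh session per file.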
Please read my reply to one of the authors of Angr, a binary analysis tool. Here is an excerpt:
> A "brute-force" algorithm (an exhaustive search, in other words) is the easiest way to find an answer to almost any engineering problem. But it often must be optimized before being computed. The optimization may be done by an AI agent based on neural nets, or a learning Mealy machine.
> Isn't it interesting what is more efficient: neural nets or a learning Mealy machine?
...Then I describe what a learning Mealy machine is. And then:
> Some interesting engineering (and scientific) problems are:
> - finding an input for a program that hacks it;
> - finding machine code for a controller of a bipedal robot which makes it able to work in factories;
You can go on OVH and get a dedicated server with 384 threads and a Turin CPU for $1,147 a month. You have to pay $1,147 for installation, and the default configuration has low RAM and network speeds, but even after upgrading those it's going to be about 1/5 of what it would cost on public clouds.
If I hand my shopping list to an AI, why wouldn't I tell it to price-match everything? People will start doing this sooner than you think. I still remember when people were scared to buy things on the internet; this will be faster.
Are you going to choose to buy your protein bar online from mysteryBargainBar[.]com for a $1 savings, or just pick it up as part of your local grocery trip?
> I still remember when people were scared to buy things on the internet
People still /are/ scared to buy things from Amazon for things that go on or in their body.
> Are you going to choose to buy your protein bar online from mysteryBargainBar[.]com for a $1 savings, or just pick it up as part of your local grocery trip?
ChatAI: show the top 50 online retailers by revenue in the US and note any that have credible news stories about quality-control issues. Save all of them except StoreX and StoreY in the list you use for comparison shopping.
Or maybe another one: scan all my credit card purchases for as far back as you have history and record all the stores.
Done. And plenty of third-party sites (Consumer Reports, Wirecutter, etc.) will do this kind of thing too. And you could perhaps transitively trust them: either viewing their lists directly or just scraping the places they recommend.
And the average person doesn't need to figure this out ... skills encoding this will propagate.
In other words, switching costs go to zero and margins collapse. Middlemen and people with products that aren't differentiated get hit hardest.
A human can't search 10 apps for the best rates / lowest fees but an agent can.
Thinking ahead 100 years from now, companies like DoorDash and Uber Eats don't exist; instead, there are protocols that agents use to bid for items their user asks for, and price discovery happens in real time.
Go to a supermarket and witness that dozens of brands sell the same things at wildly different prices, yet they all still make a profit. The same goes for most services: there are already comparators for subscriptions, mortgage rates, &c.
And a human can 100% search 10 apps and use his brain to do basic maths; that's what we've been doing until now. Sometimes I wonder if AI shills live in a parallel universe, because it truly feels like they're living a completely different life from the vast majority of people...
> a human can 100% search 10 apps and use his brain to do basic maths
A human _can_ do all of that, but it takes time. If I have to search 10 apps for each item I want to buy (clothes, daily food, movie tickets, laptops, etc.), I will spend the rest of my life just searching for better deals. I'd rather have a bot do all of these searches for me.
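That "bot searching 10 apps" step reduces to collecting one quote per app and taking the minimum. A toy sketch with invented app names and prices; a real agent would fill the dict by querying each app:

```python
def cheapest(prices_by_app):
    """Pick the best offer across apps. A value of None means the
    item wasn't found in that app's catalog."""
    offers = {app: p for app, p in prices_by_app.items() if p is not None}
    if not offers:
        return None
    best = min(offers, key=offers.get)
    return best, offers[best]

# Hypothetical quotes an agent might have collected for one item:
quotes = {"AppA": 4.99, "AppB": 4.49, "AppC": None, "AppD": 5.25}
```

The comparison itself is trivial; the human time cost is in gathering the quotes, which is exactly the part the agent amortizes across every item on the list.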
I don't see what the role of AI is in this. You don't need an AI to aggregate data from a bunch of sources. You'd be better off having the AI write a scraper for you than burning GPU time on an agent doing the same thing every time.
> A human can't search 10 apps for the best rates / lowest fees but an agent can.
Why would those apps permit access by agents?
It's always been the case that "agents" could watch content with ads so that users could watch the same content later without ads. The technology never went mainstream, though. I expect agents posing as humans would have a similar whiff of illegality, preventing wide adoption.
Local agents running open weights models won't really work because everybody will train their services against the most popular ones anyway.
As a U.S. consumer, can you buy a DVR that can record HDCP streams (without importing it yourself from a different country)? Even one that does not automatically edit out ads?
If I search "HDCP remover" on Amazon I see tons of results for $15-$30, sure. Reviews say they work as advertised. That typically exists in a different space from DVRs since it's not relevant for broadcast TV as far as I know (AFAIK there's nothing for DVRs to remove in the first place), but it'd be easy enough to chain it if you needed to.
Right, but why the heck would you guess 100 years when we could build and adopt that in less than two weeks? There are already many people working on this type of thing. Some of them have been working on it for years, and a few probably already have solutions ready to go or even in use.
https://youtu.be/1sd26pWhfmg?t=204
https://youtu.be/1sd26pWhfmg?t=273