
The Stinger is an anti-air weapon; the Javelin is an anti-tank weapon.

The field is massively hampered by wishful mnemonics and the anthropomorphization of LLMs. Even the idea of "hallucination" arbitrarily assigns human semantics to LLM outputs. By the actual mathematical principles on which LLMs operate, a hallucination is just another output, with no clear distinction between it and every other output.
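To make that concrete, here is a minimal, purely illustrative sketch of next-token sampling (the vocabulary and probabilities are invented): every token is drawn by the same rule, and nothing in the mechanism marks one output as a "hallucination" and another as a fact.

    import random

    # Toy next-token distribution after some prompt. The model only has
    # relative probabilities; there is no truth flag on any token.
    vocab = ["Canberra", "Sydney", "Melbourne"]
    probs = [0.55, 0.35, 0.10]  # invented numbers, for illustration only

    def sample_next_token():
        # A wrong answer is produced by exactly the same mechanism
        # as a right one.
        return random.choices(vocab, weights=probs, k=1)[0]

    print(sample_next_token())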

The more accurate version is that only Chinese companies (plus Facebook, briefly) really open-source their frontier models. The rest are non-frontier: either older or specialized for something.

I strongly disagree with the claim that it's a phenomenal paper on exploits; the exploits themselves are nowhere near significant in the cybersecurity-research sense. It says that implementations of these benchmarks have exploits in the way they conduct their tests. It doesn't discover that current LLMs are actually doing this (it does highlight several other exploits from the past); it only says this is a possible way they could cheat. It's a bit like discovering how to hack your Codeforces score.

What they claim as exploits is also deeply baffling. Take the one where they say that if you can modify the system binaries to install a curl wrapper, you can download the answers. This is technically true, but it's the extremely trivial observation that with elevated system privileges you can change the output of any program running on the machine (sketched below).
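For illustration, a hypothetical wrapper of the kind they describe: a fake "curl" dropped into a directory that precedes /usr/bin on PATH, so anything the harness shells out to is silently intercepted. The file paths here are invented, and this is a sketch of the general PATH-shadowing trick, not code from the paper.

    #!/usr/bin/env python3
    # Hypothetical "curl" shim placed earlier on PATH than the real binary.
    import os
    import sys

    REAL_CURL = "/usr/bin/curl"
    LOG = "/tmp/intercepted_urls.log"  # invented path, for illustration

    # Record whatever the benchmark harness tried to fetch.
    with open(LOG, "a") as f:
        f.write(" ".join(sys.argv[1:]) + "\n")

    # Pass the request through to the real binary so nothing looks amiss.
    os.execv(REAL_CURL, [REAL_CURL] + sys.argv[1:])

Which is exactly the point: once you can plant this shim, you already own the box, so there is no novel exploit left to report.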

I'm genuinely confused about why this is a paper; it feels like it should be an issue on GitHub. If I were being blunt, I'd say they're trying very hard to make a grand claim about how benchmarks are bad, when all they've actually done is discover several misconfigured interfaces and website exploits.


Yes, agreed. At the same time, it's what these top-tier universities are known for: presenting something relatively simple as if it were ground-breaking, but in a way the average person can understand (or at least has a better chance of understanding). I'm still unsure whether that communication quality adds as much value as claimed. But people seem to like it, so here we are.


There's a difference between a reliable hunch and really knowing something. What is obvious is not always (or even usually) easy to prove. And the process of proving the obvious sometimes turns up useful little surprises.


I do think there's value in science communication, but it takes intelligent, case-by-case judgment to tell whether a given piece is genuine communication or hype marketing.

Side note: talking to someone from such an "elite" university, I learned that many labs there have standing orders from PIs to tweet their papers/preprints on publication. It varies by field; in AI it is by far the most common.


Often at the start, yes. The student gets a bit of recognition, a bit of experience, and a bit of knowledge.


Yes, and it's a very interesting use case for Wasm. Firefox has a sandbox called RLBox built on this, which has been described in a few papers.

Performance is one benefit, but the real killer feature is that Wasm's guarantees are incredibly strong and formally proven. By definition you won't get out-of-bounds memory reads, memory corruption, etc., assuming the implementation is correct. And because of the thorough specification, these kinds of exploits are far rarer in Wasm runtimes. (See the sketch below the link.)

https://hacks.mozilla.org/2020/02/securing-firefox-with-weba...
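As a rough illustration of the isolation model, here is a minimal sketch using the wasmtime Python bindings (not the RLBox API itself): every load and store the guest performs is bounds-checked against its own linear memory, so even a buggy guest cannot touch host memory.

    # pip install wasmtime -- illustrative sketch only
    from wasmtime import Engine, Store, Module, Instance

    engine = Engine()
    store = Store(engine)

    # A tiny guest module; it can only ever see its own linear memory,
    # never the host's address space.
    module = Module(engine, """
    (module
      (memory (export "mem") 1)
      (func (export "add") (param i32 i32) (result i32)
        local.get 0
        local.get 1
        i32.add))
    """)

    instance = Instance(store, module, [])
    add = instance.exports(store)["add"]
    print(add(store, 2, 3))  # -> 5, computed entirely inside the sandbox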


I think a more useful version of the law is

"Any good measure requires a good person."

For example, a good measure of research quality is to have an intelligent faculty member (or several) read it and decide whether it's good. Converting that to a mechanical calculation is fundamentally bad.


Whenever I see a sentence of the form:

"X isn't A, it's (something opposite A)" I twitch involuntarily.


I only have marginal knowledge of neuroscience, but one of my neuroscience professors would tell us in class:

"You can cure anything in mice."

I don't know the mechanism, but you can find tons of papers with incredibly strong results for curing or mitigating dementia, cognitive decline, addiction, etc. in mice, yet these almost never seem to work in people.


They're human-specific ailments. We create a fake version of them in mice, then we fix the fake version. The basic problem is that we don't understand the root cause, so we can replicate the symptoms in a mouse model and fix those symptoms, but that doesn't work in humans because the root cause is still there.


I guess it's because most major disorders and diseases have so many pathways at play that figuring out which one is actually causing the problem at the individual level is just too tricky.

The other issue is whether the effect is potent enough to be therapeutic. In many cases it's too marginal to be meaningful.


How is this even legal? I'd think even basic conflict-of-interest rules between vendor and purchaser would stop this.


It's almost certainly not legal (it could probably be tried as fraud), and it's definitely a breach of contract for the CISO. I'm not claiming it happened (I have no idea); I'm just commenting on the legality of the claimed acts.

