Hacker News | ievans's comments

Top comment has a great explicit refutation:

> This plan works by letting software supply chain companies find security issues in new releases. Many security companies have automated scanners for popular and less popular libraries, with manual triggers for those libraries which are not in the top N.


Not super surprising that Anthropic is shipping a vulnerability detection feature -- OpenAI announced Aardvark back in October (https://openai.com/index/introducing-aardvark/) and Google announced BigSleep in Nov 2024 (https://cloud.google.com/blog/products/identity-security/clo...).

The impact question is really around scale; a few weeks ago Anthropic claimed 500 "high-severity" vulnerabilities discovered by Opus 4.6 (https://red.anthropic.com/2026/zero-days/). There's been some skepticism about whether they are truly high severity, but it's a much larger number than what BigSleep found (~20) and Aardvark hasn't released public numbers.

As someone who founded a company in the space (Semgrep), I really appreciated that the DARPA AIxCC competition required competitors using LLMs for vulnerability discovery to disclose $cost/vuln along with the confusion matrix of false positives. It's clear that LLMs are super valuable for vulnerability discovery, but without that information it's difficult to know which foundation model is really leading.
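
To make concrete why that disclosure requirement matters, here's a toy version of the comparison it enables. The function and all the numbers below are my own illustration, not AIxCC's actual scoring:

```python
# Toy comparison enabled by AIxCC-style disclosure: cost per confirmed
# vulnerability plus the confusion matrix. All numbers are hypothetical.

def triage_metrics(true_pos, false_pos, false_neg, total_cost_usd):
    """Precision, recall, and dollars per confirmed vulnerability."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    cost_per_vuln = total_cost_usd / true_pos
    return precision, recall, cost_per_vuln

# e.g. a run that confirmed 500 vulns out of 600 reports, missed 100
# known bugs, and burned $25k of inference
p, r, c = triage_metrics(true_pos=500, false_pos=100,
                         false_neg=100, total_cost_usd=25_000)
print(f"precision={p:.2f} recall={r:.2f} cost/vuln=${c:.0f}")
```

With numbers like these published per model, "500 high-severity vulns" becomes comparable across vendors.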

What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better esp. when it comes to false positives. We think the future is more "virtual security engineer" agents using tools with humans acting as the appsec manager. Would be very interested to hear from other people on HN who have been trying this approach!


>There's been some skepticism about whether they are truly high severity

To be honest, this is an even bigger problem with Semgrep and other SAST tools. Developers just want the 0.1% of findings that actually lead to real issues, but flagging patterns will always produce huge false-positive rates.

I do something similar to what you suggested, and it does work well: pattern match + LLM. The downside is that this only applies to SAST; so far nobody has found a way to address the findings that make up 90% of a security team's noise, namely SCA and container images.


My first use of an LLM for security research was feeding Gemini the Semgrep scan results of an open source repo. It was definitely a great way to get the LLM to start looking at something, and it provided usable sink + source flows for manual review.

I assumed I was still dealing with lots of false positives from Gemini, since I was using the free version and couldn't load the full code base into context. Either way, combining those two tools makes the review process a lot more enjoyable.
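
That pipeline is easy to sketch. The field names below follow Semgrep's JSON output schema (`results[].check_id`, `.path`, `.start.line`, `.extra.message`; double-check against your Semgrep version), and the prompt wording is just my own assumption:

```python
import json

def findings_to_prompt(semgrep_json: str) -> str:
    """Turn `semgrep --json` output into an LLM triage prompt."""
    results = json.loads(semgrep_json).get("results", [])
    lines = ["Triage these findings; for each, trace the source -> sink "
             "flow and say whether it looks exploitable:"]
    for r in results:
        lines.append(f"- {r['check_id']} at {r['path']}:{r['start']['line']}"
                     f": {r['extra']['message']}")
    return "\n".join(lines)
```

The resulting string is what you'd paste (or send via API) to the LLM along with the relevant source files.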


> What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better

100% agree - I spun out an internal tool I've been using to close the loop with website audits in agents (more focus on website sec + perf + SEO etc. rather than appsec), and the results so far have been remarkable:

https://squirrelscan.com/

Human-written rules, with an agent step that dynamically updates the config to squash false positives (with verification) and finds issues, while still allowing the LLM to reason.


Definitely not a surprise that they shipped it. This is manageable for a small subset of repos scanned once, but the reality is that code changes frequently, and such rescans are expensive, especially with thinking models. You can open a PR too, but then other workflows are missing: rebasing when there are conflicts, finding the devs with the right expertise to review/test the fix, etc. Bottom line: I see it as an interesting research tool, but not more than that.


"Staged publishing: A new publication model that gives maintainers a review period before packages go live, with MFA-verified approval from package owners. This empowers teams to catch unintended changes before they reach downstream users—a capability the community has been requesting for years."

Overdue but welcome!


This is explicitly not the conclusion Pascal drew with the wager, as described in the next section of the Wikipedia article: "Pascal's intent was not to provide an argument to convince atheists to believe, but (a) to show the fallacy of attempting to use logical reasoning to prove or disprove God..."


Did he say Pascal drew that conclusion and remove it with an edit or something? As it's written now it seems like you're correcting him for something he didn't post.


Do you store your SSDs powered? They can lose information if they're not semi-frequently powered on.


Powering on the SSD does nothing. There is no mechanism for passively recharging a NAND flash memory cell. You need to actually read the data, forcing it to go through the SSD's error correction pipeline so it has a chance to notice a correctable error before it degrades into an uncorrectable error. You cannot rely on the drive to be doing background data scrubbing on its own in any predictable pattern, because that's all in the black box of the SSD firmware—your drive might be doing data scrubbing, but you don't know how long you need to let it sit idle before it starts, or how long it takes to finish scrubbing, or even if it will eventually check all the data.
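
The "actually read the data" step can be as dumb as a full sequential read. A minimal sketch (point it at a raw device node, which needs privileges, or loop it over your files; the data is discarded, since the read itself is what exercises the ECC path):

```python
def read_everything(path, chunk_size=4 * 1024 * 1024):
    """Read every byte of `path`, discarding the data. The point is to
    force each block through the drive's error-correction pipeline."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total  # bytes read, as a sanity check
```

On Linux, `dd if=/dev/sdX of=/dev/null bs=4M` accomplishes the same thing for a whole device.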


Adding to this... SpinRite can re-write the bits so their charge doesn't diminish over time. There's a relevant Security Now episode and GRC article for those curious.


Re-writing data from the host system is quite wasteful of a drive's write endurance. It probably shouldn't be done more often than once a year. Reading the data and letting the drive decide if it needs to be rewritten should be done more often.


How about a background cron job running `diff -br copyX copyY` once per week, for each pair X and Y, if they are hot/cold-accessible?

Although, in my case, the original is evolving, and renaming a folder and a few files makes that diff go awry, needing manual intervention. Or maybe I need content-based naming - `ln -f x123 /all/sha256-of-x123` - then compare those /all directories.
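
A content-addressed comparison like that is only a few lines; here's my own sketch of the `/all` idea, using a set of digests so renames stop mattering:

```python
import hashlib
from pathlib import Path

def file_sha256(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_tree(root):
    """The set of content digests under `root`: renaming files or
    folders leaves it unchanged, so copies compare by content only."""
    return {file_sha256(p) for p in Path(root).rglob("*") if p.is_file()}

# copies match iff hash_tree("copyX") == hash_tree("copyY")
```

This also doubles as the bit-rot check: a digest that changes without an intentional edit means a corrupted copy.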


I've been reading a lot of eMMC datasheets and I see terms like "static data refresh" advertised quite a bit.

You're quite right that we have no visibility into this process, but that feels like something to bring up with the SFF Committee, which maintains the S.M.A.R.T. standard.


Might need to go through the NVMe consortium rather than SFF/SNIA. Consumer drives aren't really following any SFF standards these days, but they are still implementing non-optional NVMe features so they can claim compliance with the latest NVMe spec.


Best is to have a filesystem that can do background bit-rot scrubbing, like ZFS or Btrfs.


For C, you might be interested in https://github.com/weggli-rs/weggli or https://github.com/semgrep/semgrep (I work on the latter). Both are also tree-sitter based.


Looks like the `ets` readme has a direct comparison:

> The purpose of ets is similar to that of moreutils ts(1), but ets differentiates itself from similar offerings by running commands directly within ptys, hence solving thorny issues like pipe buffering and commands disabling color and interactive features when detecting a pipe as output. (ets does provide a reading-from-stdin mode if you insist.) ets also recognizes carriage return as a line separator, so it doesn't choke if your command prints a progress bar. A more detailed comparison of ets and ts can be found below.


I wrote up a Semgrep rule as a comparison to add! (also tree-sitter based, `pip install Semgrep`, https://github.com/semgrep/semgrep, or play with live editor link: https://semgrep.dev/playground/s/nJ4rY)

    pattern: |-
      def $FUNC(..., database, ...):
          $...BODY
    fix: |-
      def $FUNC(..., db, ...):
          $...BODY


So the argument is: because vulnerability lifetimes are exponentially distributed, focusing on secure defaults like memory safety in new code is disproportionately valuable, both theoretically and now empirically, as seen over six years on the Android codebase.

Amazing, I've never seen this argument used to support shift-left security guardrails, but it's great. Especially for those with larger legacy codebases who might otherwise say "why bother, we're never going to benefit from memory safety on our 100M lines of C++."

I think it also implies that any lightweight vulnerability detection has a disproportionate benefit -- even if it only looked at new code & dependencies vs. the backlog.
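
The arithmetic behind that intuition is simple. Assuming exponentially distributed lifetimes with a mean I'm picking purely for illustration (2.5 years; the actual fitted value may differ), the surviving fraction of a code cohort's vulnerabilities drops fast:

```python
import math

def surviving_fraction(age_years, mean_lifetime_years=2.5):
    """Fraction of a cohort's vulnerabilities still alive after
    `age_years`, under an exponential lifetime model."""
    return math.exp(-age_years / mean_lifetime_years)

# Most remaining risk lives in recent code: under this model, after six
# years fewer than 10% of a cohort's original vulnerabilities survive.
for age in (0, 1, 3, 6):
    print(age, round(surviving_fraction(age), 3))
```

Which is exactly why scanning only new code and new dependencies captures most of the value at a fraction of the cost.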


Absolutely agreed, and copying from a comment I wrote last year: I think the fact that tree-sitter is dependency-free is worth highlighting. For context, some of my teammates maintain the OCaml tree-sitter bindings and often contribute to grammars as part of our work on Semgrep (Semgrep uses tree-sitter for searching code and parsing queries that are code snippets themselves into AST matchers).

Often when writing a linter, you need to bring along the runtime of the language you're targeting. E.g., in Python, if you're writing a parser using the builtin `ast` module, you need to match the language version & features. So you can't parse Python 3 code with Pylint running on Python 2.7, for instance. This ends up being more obnoxious than you'd think at first, especially if you're targeting multiple languages.
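
Concretely, the coupling with the builtin `ast` module looks like this; the snippet parses only because the interpreter running it understands that syntax:

```python
import ast

# The stdlib parser knows exactly the grammar of the Python it runs on;
# an older interpreter simply cannot parse newer syntax.
source = "def handler(request, database):\n    return database.query(request)"
tree = ast.parse(source)
func = tree.body[0]
arg_names = [a.arg for a in func.args.args]
print(arg_names)  # ['request', 'database']
```

tree-sitter grammars, by contrast, are standalone C libraries with no dependency on a language runtime at all.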

Before tree-sitter, using a language's built-in AST tooling was often the best approach because it is guaranteed to keep up with the latest syntax. IMO the genius of tree-sitter is that it has made keeping the language parsers updated way easier than with traditional grammars. Highly recommend Max Brunsfeld's Strange Loop talk if you want to learn more about the design choices behind tree-sitter: https://www.youtube.com/watch?v=Jes3bD6P0To

And this has resulted in a bunch of new tools built on tree-sitter; off the top of my head, in addition to difftastic: Neovim, Zed, Semgrep, and GitHub code search!


Don't forget Zed! https://zed.dev


> Don't forget Zed!

Mac only, for now.


What's crazy is that the landing page doesn't even mention Mac at all.

I'm getting very annoyed by things that don't mention they only work on Mac until you go to install them.


Looks great! Does it have LSP support for code completion? Does it support C++?


LSP support is semi-built-in, but apparently lots of improvements are coming in that area to support more language servers. With Python, it currently only has Pyright built-in, which is more of an annoyance if you're working with code where the venv is inside a container, but there are very active tickets on their GitHub about building out the LSP support. I currently use it as my second editor - I have Sublime set up to be pretty much perfect for my usage, but Zed is catching up fast. I'm very fussy about editors; I can't get on with VSCode at all, but I feel warm and fuzzy toward Zed - the UX is great and the performance superb. External LSP support is probably the one feature stopping me using it as my primary editor.


I tried VS Code a ton of times. It is reasonably good, but I am SO used to Emacs that it is almost impossible for me to move away.

VS Code is better at debugging and maybe slightly better at remote connections, yes. But for everything else I am way more productive with Emacs than anything else.


Okay, but how does that work with language versions? Like, if I get a "C++ parser" for tree-sitter, how do I know if it's C++03, C++17, C++20, or what? Last time I checked (which was months ago, to be fair), this wasn't documented anywhere, nor were there any apparent mechanisms to support language versions and variants.


You can probably rely on backward compatibility of the language and use the "latest." The question is, which version is the grammar written against?


And then there's all the variants of SQL...


That's what I was looking at in the very beginning. Here's how it unfolds: the grammar page (https://github.com/tree-sitter/tree-sitter-cpp) references two documents at the very end:

- Hyperlinked C++ BNF Grammar (https://alx71hub.github.io/hcb/)

- EBNF Syntax: C++ (ISO/IEC 14882:1998(E)) https://www.externsoft.ch/download/cpp-iso.html

The second doc has a year in the title, so it's ancient af. The first one has multiple `C++0x` red marks (whatever that means; afair that's how C++11 was named before standardization). It mentions `constexpr` but doesn't know `consteval`, for example, and doesn't even mention any of the C++11 attributes, such as [[noreturn]]. So despite the "Last updated: 10-Aug-2021", it's likely pre-C++11 and thus also ancient af and of no use in the real world.

Who might have thought. /s


So I see nothing really changed :(.


Don't forget old man Emacs is now using tree-sitter


Helix (https://helix-editor.com/) is using tree-sitter and LSP as well

