Hacker News | ievans's comments

Top comment has a great explicit refutation:

> This plan works by letting software supply chain companies find security issues in new releases. Many security companies have automated scanners for popular and less popular libraries, with manual triggers for those libraries which are not in the top N.


Not super surprising that Anthropic is shipping a vulnerability detection feature -- OpenAI announced Aardvark back in October (https://openai.com/index/introducing-aardvark/) and Google announced BigSleep in Nov 2024 (https://cloud.google.com/blog/products/identity-security/clo...).

The impact question is really around scale; a few weeks ago Anthropic claimed 500 "high-severity" vulnerabilities discovered by Opus 4.6 (https://red.anthropic.com/2026/zero-days/). There's been some skepticism about whether they are truly high severity, but it's a much larger number than what BigSleep found (~20) and Aardvark hasn't released public numbers.

As someone who founded a company in the space (Semgrep), I really appreciated that the DARPA AIxCC competition required competitors using LLMs for vulnerability discovery to disclose $cost/vuln along with the confusion matrix of false positives. It's clear that LLMs are super valuable for vulnerability discovery, but without that information it's difficult to know which foundation model is really leading.
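
To make concrete why that disclosure requirement matters, here's a toy version of the comparison it enables. The function and all the numbers below are my own illustration, not AIxCC's actual scoring:

```python
# Toy comparison enabled by AIxCC-style disclosure: cost per confirmed
# vulnerability plus the confusion matrix. All numbers are hypothetical.

def triage_metrics(true_pos, false_pos, false_neg, total_cost_usd):
    """Precision, recall, and dollars per confirmed vulnerability."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    cost_per_vuln = total_cost_usd / true_pos
    return precision, recall, cost_per_vuln

# e.g. a run that confirmed 500 vulns out of 600 reports, missed 100
# known bugs, and burned $25k of inference
p, r, c = triage_metrics(true_pos=500, false_pos=100,
                         false_neg=100, total_cost_usd=25_000)
print(f"precision={p:.2f} recall={r:.2f} cost/vuln=${c:.0f}")
```

With numbers like these published per model, "500 high-severity vulns" becomes comparable across vendors.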

What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better esp. when it comes to false positives. We think the future is more "virtual security engineer" agents using tools with humans acting as the appsec manager. Would be very interested to hear from other people on HN who have been trying this approach!


>There's been some skepticism about whether they are truly high severity

To be honest, this is an even bigger problem with Semgrep and other SAST tools. Developers just want the 0.1% of findings that actually lead to real issues, but flagging patterns will always produce huge false-positive rates.

I do something similar to what you suggested, and it does work well: pattern match + LLM. The downside is that this only applies to SAST; so far nobody has found a way to address the findings that make up 90% of a security team's noise, namely SCA and container images.


My first use of an LLM for security research was feeding Gemini the Semgrep scan results of an open source repo. It was definitely a great way to get the LLM to start looking at something, and it provided usable sink + source flows for manual review.

I assumed I was still dealing with lots of false positives from Gemini, since I was using the free version and couldn't load the full code base into context. Either way, combining those two tools makes the review process a lot more enjoyable.
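
That pipeline is easy to sketch. The field names below follow Semgrep's JSON output schema (`results[].check_id`, `.path`, `.start.line`, `.extra.message`; double-check against your Semgrep version), and the prompt wording is just my own assumption:

```python
import json

def findings_to_prompt(semgrep_json: str) -> str:
    """Turn `semgrep --json` output into an LLM triage prompt."""
    results = json.loads(semgrep_json).get("results", [])
    lines = ["Triage these findings; for each, trace the source -> sink "
             "flow and say whether it looks exploitable:"]
    for r in results:
        lines.append(f"- {r['check_id']} at {r['path']}:{r['start']['line']}"
                     f": {r['extra']['message']}")
    return "\n".join(lines)
```

The resulting string is what you'd paste (or send via API) to the LLM along with the relevant source files.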


> What we've found is that giving LLM security agents access to good tools (Semgrep, CodeQL, etc.) makes them significantly better

100% agree - I spun out an internal tool I've been using to close the loop with website audits in agents (more focus on website sec + perf + SEO etc. rather than appsec), and the results so far have been remarkable:

https://squirrelscan.com/

Human-written rules, with an agent step that dynamically updates the config to squash false positives (with verification) and finds issues, while still allowing the LLM to reason.


Definitely not a surprise that they shipped it. This is manageable for a small subset of repos scanned once, but the reality is that code changes frequently, and such rescans are expensive, especially with thinking models. You can open a PR too, but then other workflows are missing: rebasing when there are conflicts, finding the devs with the right expertise to review/test the fix, etc. Bottom line: I see it as an interesting research tool, but not more than that.


"Staged publishing: A new publication model that gives maintainers a review period before packages go live, with MFA-verified approval from package owners. This empowers teams to catch unintended changes before they reach downstream users—a capability the community has been requesting for years."

Overdue but welcome!


This is explicitly not the conclusion Pascal drew with the wager, as described in the next section of the Wikipedia article: "Pascal's intent was not to provide an argument to convince atheists to believe, but (a) to show the fallacy of attempting to use logical reasoning to prove or disprove God..."


Did he say Pascal drew that conclusion and remove it with an edit or something? As it's written now it seems like you're correcting him for something he didn't post.


Do you store your SSDs powered? They can lose information if they're not semi-frequently powered on.


Powering on the SSD does nothing. There is no mechanism for passively recharging a NAND flash memory cell. You need to actually read the data, forcing it to go through the SSD's error correction pipeline so it has a chance to notice a correctable error before it degrades into an uncorrectable error. You cannot rely on the drive to be doing background data scrubbing on its own in any predictable pattern, because that's all in the black box of the SSD firmware—your drive might be doing data scrubbing, but you don't know how long you need to let it sit idle before it starts, or how long it takes to finish scrubbing, or even if it will eventually check all the data.
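
The "actually read the data" step can be as dumb as a full sequential read. A minimal sketch (point it at a raw device node, which needs privileges, or loop it over your files; the data is discarded, since the read itself is what exercises the ECC path):

```python
def read_everything(path, chunk_size=4 * 1024 * 1024):
    """Read every byte of `path`, discarding the data. The point is to
    force each block through the drive's error-correction pipeline."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total  # bytes read, as a sanity check
```

On Linux, `dd if=/dev/sdX of=/dev/null bs=4M` accomplishes the same thing for a whole device.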


Adding to this... SpinRite can re-write the bits so their charge doesn't diminish over time. There's a relevant Security Now episode and GRC article for those curious.


Re-writing data from the host system is quite wasteful of a drive's write endurance. It probably shouldn't be done more often than once a year. Reading the data and letting the drive decide if it needs to be rewritten should be done more often.


How about a background cron job running `diff -br copyX copyY` once per week, for each pair X and Y, if they are hot/cold-accessible?

Although, in my case, the original is evolving, and renaming a folder and a few files makes that diff go awry, needing manual intervention. Or maybe I need content-based naming - `ln -f x123 /all/sha256-of-x123` - then compare those /all directories.
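
A content-addressed comparison like that is only a few lines; here's my own sketch of the `/all` idea, using a set of digests so renames stop mattering:

```python
import hashlib
from pathlib import Path

def file_sha256(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_tree(root):
    """The set of content digests under `root`: renaming files or
    folders leaves it unchanged, so copies compare by content only."""
    return {file_sha256(p) for p in Path(root).rglob("*") if p.is_file()}

# copies match iff hash_tree("copyX") == hash_tree("copyY")
```

This also doubles as the bit-rot check: a digest that changes without an intentional edit means a corrupted copy.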


I've been reading a lot of eMMC datasheets and I see terms like "static data refresh" advertised quite a bit.

You're quite right that we have no visibility into this process, but that feels like something to bring up with the SFF Committee, which maintains the S.M.A.R.T. standard.


Might need to go through the NVMe consortium rather than SFF/SNIA. Consumer drives aren't really following any SFF standards these days, but they are still implementing non-optional NVMe features so they can claim compliance with the latest NVMe spec.


Best is to have a filesystem that can do background bit-rot scrubbing, like ZFS or Btrfs.


For C, you might be interested in https://github.com/weggli-rs/weggli or https://github.com/semgrep/semgrep (I work on the latter). Both are also tree-sitter based.


Looks like the `ets` readme has a direct comparison:

> The purpose of ets is similar to that of moreutils ts(1), but ets differentiates itself from similar offerings by running commands directly within ptys, hence solving thorny issues like pipe buffering and commands disabling color and interactive features when detecting a pipe as output. (ets does provide a reading-from-stdin mode if you insist.) ets also recognizes carriage return as a line separator, so it doesn't choke if your command prints a progress bar. A more detailed comparison of ets and ts can be found below.


I wrote up a Semgrep rule as a comparison to add! (also tree-sitter based, `pip install Semgrep`, https://github.com/semgrep/semgrep, or play with live editor link: https://semgrep.dev/playground/s/nJ4rY)

    pattern: |-
      def $FUNC(..., database, ...):
          $...BODY
    fix: |-
      def $FUNC(..., db, ...):
          $...BODY


So the argument is: because vulnerability lifetimes are exponentially distributed, focusing on secure defaults like memory safety in new code is disproportionately valuable, both theoretically and now empirically, as seen over six years on the Android codebase.

Amazing, I've never seen this argument used to support shift-left security guardrails, but it's great. Especially for those with larger legacy codebases who might otherwise say "why bother, we're never going to benefit from memory safety on our 100M lines of C++."

I think it also implies that any lightweight vulnerability detection has a disproportionate benefit -- even if it only looked at new code & dependencies vs. the backlog.
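
The arithmetic behind that intuition is simple. Assuming exponentially distributed lifetimes with a mean I'm picking purely for illustration (2.5 years; the actual fitted value may differ), the surviving fraction of a code cohort's vulnerabilities drops fast:

```python
import math

def surviving_fraction(age_years, mean_lifetime_years=2.5):
    """Fraction of a cohort's vulnerabilities still alive after
    `age_years`, under an exponential lifetime model."""
    return math.exp(-age_years / mean_lifetime_years)

# Most remaining risk lives in recent code: under this model, after six
# years fewer than 10% of a cohort's original vulnerabilities survive.
for age in (0, 1, 3, 6):
    print(age, round(surviving_fraction(age), 3))
```

Which is exactly why scanning only new code and new dependencies captures most of the value at a fraction of the cost.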


Absolutely agreed, and copying from a comment I wrote last year: I think the fact that tree-sitter is dependency-free is worth highlighting. For context, some of my teammates maintain the OCaml tree-sitter bindings and often contribute to grammars as part of our work on Semgrep (Semgrep uses tree-sitter for searching code and parsing queries that are code snippets themselves into AST matchers).

Often when writing a linter, you need to bring along the runtime of the language you're targeting. E.g., in Python, if you're writing a parser using the builtin `ast` module, you need to match the language version & features. So you can't parse Python 3 code with Pylint running on Python 2.7, for instance. This ends up being more obnoxious than you'd think at first, especially if you're targeting multiple languages.
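
Concretely, the coupling with the builtin `ast` module looks like this; the snippet parses only because the interpreter running it understands that syntax:

```python
import ast

# The stdlib parser knows exactly the grammar of the Python it runs on;
# an older interpreter simply cannot parse newer syntax.
source = "def handler(request, database):\n    return database.query(request)"
tree = ast.parse(source)
func = tree.body[0]
arg_names = [a.arg for a in func.args.args]
print(arg_names)  # ['request', 'database']
```

tree-sitter grammars, by contrast, are standalone C libraries with no dependency on a language runtime at all.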

Before tree-sitter, using a language's built-in AST tooling was often the best approach because it is guaranteed to keep up with the latest syntax. IMO the genius of tree-sitter is that it has made keeping the language parsers updated way easier than with traditional grammars. Highly recommend Max Brunsfeld's Strange Loop talk if you want to learn more about the design choices behind tree-sitter: https://www.youtube.com/watch?v=Jes3bD6P0To

And this has resulted in a bunch of new tools built on tree-sitter; off the top of my head, in addition to difftastic: Neovim, Zed, Semgrep, and GitHub code search!


Don't forget Zed! https://zed.dev


> Don't forget Zed!

Mac only, for now.


What's crazy is that the landing page doesn't even mention Mac at all.

I'm getting very annoyed by things that don't mention they only work on Mac until you go to install them.


Looks great! Does it have LSP support for code completion? Does it support C++?


LSP support is semi-built-in, but apparently lots of improvements are coming in that area to support more language servers. With Python, it currently only has Pyright built-in, which is more of an annoyance if you're working with code where the venv is inside a container, but there are very active tickets on their GitHub about building out the LSP support. I currently use it as my second editor - I have Sublime set up to be pretty much perfect for my usage, but Zed is catching up fast. I'm very fussy about editors; I can't get on with VSCode at all, but I feel warm and fuzzy toward Zed - the UX is great and the performance superb. External LSP support is probably the one feature stopping me using it as my primary editor.


I tried VS Code a ton of times. It is reasonably good, but I am SO used to Emacs that it is almost impossible for me to move away.

VS Code is better at debugging and maybe slightly better at remote connections, yes. But for everything else I am way more productive with Emacs than anything else.


Okay, but how does that work with language versions? Like, if I get a "C++ parser" for tree-sitter, how do I know if it's C++03, C++17, C++20, or what? Last time I checked (which was months ago, to be fair), this wasn't documented anywhere, nor were there any apparent mechanisms to support language versions and variants.


You can probably rely on backward compatibility of the language and use the "latest." The question is, which version is the grammar written against?


And then there's all the variants of SQL...


That's what I was looking at in the very beginning. Here's how it unfolds: the grammar page (https://github.com/tree-sitter/tree-sitter-cpp) references two documents at the very end:

- Hyperlinked C++ BNF Grammar (https://alx71hub.github.io/hcb/)

- EBNF Syntax: C++ (ISO/IEC 14882:1998(E)) https://www.externsoft.ch/download/cpp-iso.html

The second doc has a year in the title, so it's ancient af. The first one has multiple `C++0x` red marks (whatever that means; afair that's how C++11 was named before standardization). It mentions `constexpr` but doesn't know `consteval`, for example, and doesn't even mention any of the C++11 attributes, such as [[noreturn]]. So despite the "Last updated: 10-Aug-2021", it's likely pre-C++11 and thus also ancient af and of no use in the real world.

Who might have thought. /s


So I see nothing really changed :(.


Don't forget old man Emacs is now using tree-sitter


Helix (https://helix-editor.com/) is using tree-sitter and LSP as well

