weitendorf's comments | Hacker News

Get outta my swamp! Just kidding, it’s cool to see other people working on this stuff.

I think right now this is still a bit too fresh out of Claude Code to be usable by anybody but the people developing it. I got to around the same point with my first attempt at building a tool registry (https://github.com/accretional/collector) and then realized I basically needed to start over, with much more investment in supporting infrastructure, to build the thing I really wanted.

I can go as far into the weeds as anybody would ever care to hear about this, but for the sake of brevity I’ll just say this: reflection and type systems over the network are pretty much the only way to get this stuff to work properly. (You could just go full MCP/Skills, but then all you really have are giant blobs of markdown and unconstrained JSON that make integration/discovery/usability a nightmare, and that require an agent in the loop to drive/integrate the tools when you really just need to give them the actual APIs and documentation.) That ends up getting rather hairy; we recently ended up building a declarative meta-lexer/parser/transpiler ("meta" basically just meaning it’s generalized across languages and self-hosting/bootstrapped) (https://github.com/accretional/gluon) because it turns out building a cross-language distributed type system is rather difficult. But reflection alone gets you about halfway there in terms of benefits.
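To make "reflection over the network" concrete with a standard example (this is plain gRPC server reflection, not a claim about how collector or gluon do it): a client can enumerate every service and method on a live endpoint, with fully typed request/response messages, instead of scraping markdown blobs. The address is obviously a placeholder.

  # Sketch: discover services/methods and their typed schemas from a running
  # gRPC server via the standard reflection API (grpcio-reflection).
  # Assumes a server with reflection enabled at localhost:50051.
  import grpc
  from google.protobuf.descriptor_pool import DescriptorPool
  from grpc_reflection.v1alpha.proto_reflection_descriptor_database import (
      ProtoReflectionDescriptorDatabase,
  )

  channel = grpc.insecure_channel("localhost:50051")
  reflection_db = ProtoReflectionDescriptorDatabase(channel)
  pool = DescriptorPool(reflection_db)

  for service_name in reflection_db.get_services():
      service = pool.FindServiceByName(service_name)
      for method in service.methods:
          # Every method comes back with real types, which is what makes
          # discovery and integration tractable compared to unconstrained JSON.
          print(f"{service_name}.{method.name}: "
                f"{method.input_type.full_name} -> {method.output_type.full_name}")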


The UIs all bake in system prompts and other tunable configs that the API leaves open; so do Claude Code and other harnesses. So anything you notice that's different over the API, when you're controlling the client, is almost certainly that. Note that this is something they kind of have to do, because consumer UI users will do things like ask models their name or the date, or want them to respond politely and compassionately, and get upset/confused when they just get what's in the weights.

The problem with subscriptions for this kind of stuff is that they're just incompatible with their cost structure. The worst part is that subscription usage follows a diurnal pattern that overlaps with business/API users, so it has to be offloaded to compute partners who most likely charge by the resource-second. And it's a competitive market: anybody who wants usage-based pricing can just get it.

So you basically end up with adverse selection in consumer subscription models. It's just kind of an incoherent business model that only works when your value proposition is more than just compute (which has a usage-based, pretty fungible market).


I find the most value to be in eval loops and multi-agent setups where a specialized or cheap model gets tasks that take load off the smarter model.

Most of the value in agentic development IMO is in the feedback loop/ability for the model itself to intelligently pull in context, but if you want to push a lot of context or have steps that are more prescribed, it's kind of a waste of money to have the big model do that. Much better to use the cheap model as a kind of pre-processing/noise-reduction step that filters out junk context.

I would say that right now the benefits are largest for this kind of work with medium-sized multimodal models. For example, I have hooks/automation that use https://github.com/accretional/chromerpc to automatically screenshot UIs and then feed them into qwen-family models. It's more that I don't want to pay Opus to look at them, or have to remember/instruct it to do that, unless a screenshot has gone through QA first.
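For concreteness, a hook like that boils down to roughly the following (a sketch, not the actual chromerpc integration: it assumes the screenshot was already captured and that a Qwen-VL model sits behind an OpenAI-compatible endpoint on localhost:8000; the model name and prompt are placeholders):

  # Sketch: cheap QA pass over a UI screenshot with a local Qwen-VL model,
  # so the big model only gets involved when something actually looks wrong.
  import base64
  import sys
  import requests

  def qa_screenshot(path: str) -> str:
      with open(path, "rb") as f:
          img_b64 = base64.b64encode(f.read()).decode()
      resp = requests.post(
          "http://localhost:8000/v1/chat/completions",
          json={
              "model": "qwen2.5-vl-7b-instruct",  # placeholder model name
              "messages": [{
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "List any layout bugs, overlapping elements, "
                                               "or unreadable text in this UI screenshot. Reply OK if none."},
                      {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_b64}"}},
                  ],
              }],
          },
          timeout=120,
      )
      return resp.json()["choices"][0]["message"]["content"]

  if __name__ == "__main__":
      print(qa_screenshot(sys.argv[1]))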


> I find the most value to be in eval loops and multi-agent setups where a specialized or cheap model gets tasks that take load off the smarter model.

Yes, in theory, this should hold up, at least according to evaluations.

In real, practical use, though, none of the open-weight models are generally strong enough to handle coding and programming in a professional environment, unless you have tightly controlled scope and specialized models for those scopes, which generally I don't think you have; but maybe it's just me jumping around a lot.

Even with feedback loops, harnesses and what not, even the strongest local models I can run with 96GB of VRAM don't seem to come close to what OpenAI offered in the last year or so. I'm sure it'll be ready at one point, but today it isn't.

With that said, if you know specific models you think work well as general, local programming models, please share which ones; happy to be shown wrong. The latest I've tried was Qwen3.6-35B-A3B, which gets a bit further, but instruction following is still a far cry from what OpenAI et al. have offered for years.


It’s to stop you from getting RL traces or using Claude without paying the big bucks for the Enterprise Security version

I really like Anthropic models and the company mission but I personally believe this is anticompetitive, or at least, anti user.

If they are going to turn into a protection racket I’ll just do RL black boxing/pentesting on Chinese models or with Codex, and since I know Anthropic is compute constrained I’ll just put the traces on huggingface so everybody else can do it too.

I just want to pay them for their RL’d tensor thingies, but if their business plan is to hoard the tokens or only sell them to certain people, they are literally part of every other security-conscious person’s threat model.


They are training them on decompilation and reverse engineering/blackbox reimplementations/pentesting because it’s one of the best ways to generate interesting and rare RL traces for agentic coding AND teach them how lots of things work under the hood.

Just throw Claude at millions of binaries and you can get amazing training data. Oh wait 4.7 gives you refusals for that now


This is a price discrimination/upsell strategy. Sure, if you just want software, use our public model. Don’t worry; it’s safe.

But if you want your model to be secure, and you want to deal with dangerous stuff, contact us for pricing. BTW if you don’t pay for us to pentest you, maybe someone else will, idk.

Oh also you’re not allowed to pentest yourself with our public models anymore because it looks like hacking


Hey OP, sorry for the negativity, I think most of these commenters right now are pretty off-base. My company is building a lot of API infrastructure and I thought this was a great write up!

It is alright, I am learning a lot from them as well, healthy criticism is always useful :) I am very glad that you found this a great write up ^_^

Hey, I've been getting into visual processing lately and we just started working on an offline wrapper for Apple's vision/other ML libraries via CLI: https://github.com/accretional/macos-vision. You can see some SVG art I created in a screenshot I just posted for a different comment https://i.imgur.com/OEMPJA8.png (on the right is a cubist plato svg lol)

Since your app is fully offline I'd love to chat about photogenesis/your general work in this area, since there may be a good opportunity for collaboration. I've been working on some image stuff and want to build a local desktop/web application; here are some UI mockups I've been playing with (many AI generated, though some of the features are functional; I realized that with CSS/SVG masks you can do a ton more than you'd expect): https://i.imgur.com/SFOX4wB.png https://i.imgur.com/sPKRRTx.png but we don't have all the UI/vision expertise we'd need to take them to completion, most likely.


Guys, I found out about this technology called Cascading Style Sheets recently and I think it's the missing piece we've been looking for. It lets you declaratively specify layout in a composable, hierarchical system based on something called the Document Object Model in a way that minimizes both clientside and serverside processing, based on these things called "stylesheets".

The best part is, it's super easy to customize them, read others for inspiration or to see how they did something, or even ship multiple per site to deal with different user preferences. Through this "forms" api, and little-known browser features like url-fragments, target/attribute selector, and style combinators, plus "the checkbox hack" you can build extremely responsive UIs out of it by "cascading" UI updates through your site! When do you think they're going to add it to next.js?

I'm tentatively calling this new UI paradigm "no-framework" or "no package manager", not sure yet https://i.imgur.com/OEMPJA8.png


> Cascading Style Sheets recently and I think it's the missing piece we've been looking for. It lets you declaratively specify layout in a composable, hierarchical system based on something called the Document Object Model in a way that minimizes both clientside and serverside processing, based on these things called "stylesheets"

I tried that and it was an absolute nightmare. There was no way to tell where a given style is used from, or even if it's used at all, and if the DOM hierarchy changes then your styles all change randomly (with, again, no way to tell what changed or where or why). Also "minimizes clientside processing" is a myth, I don't know what the implementation is but it ends up being slower and heavier than normal. Who ever thought this was a good idea?


> There was no way to tell where a given style is used from, or even if it's used at all

It's pretty easy. Open the inspector, select an element and you will find all the styles that apply. If you didn't try to be fancy and use weird build tools, you will also get the name of the file and the line number (and maybe navigation to the line itself). In Firefox, there's even a live editor for the selected element and the CSS file.

> if the DOM hierarchy changes then your styles all change randomly

Also styles are semantics like:

- The default appearance of links should be: ...

- All links in the article section should also be: ...

- The links inside a blockquote should also be: ...

- If a link has a class 'popup' it should be: ...

- The link identified as 'login' should be: ...

There's a section on MDN about how to ensure those rules are applied in the order you want[1].

This way, your styles shouldn't need updates that often unless you change the semantics of your DOM.

[1]: https://developer.mozilla.org/en-US/docs/Web/CSS/Guides/Casc...


> It's pretty easy. Open the inspector, select an element and you will find all the styles that apply.

Of course it's not easy: 80% of that list will be garbage like global variables that I'd only need when I actually see them in a style value, not all the time.

The names are often unintuitive, and search is primitive anyway, so that's of little help. And the values are just as bad, with --vars() and !important: needless verbosity in this aborted attempt at a programming language.

Then there is the potentially more useful "Computed" styles tab, but even for the most primitive property, width, it often fails and is not able to click-to-find where the style is coming from.

> Also styles are semantics like:

That's another myth. Your style could just be: ReactComponentModel.ReactComponentSubmodel.hkjgsrtio.VeryImportantToIncludeHash.List.BipBop.Sub

What does that inspire in you when you read it?


Someone is using Tailwind (and the like) and/or CSS-in-JS, which I dump squarely in the "weird build tools" category.


What do you dump the other 99% of the poor designs and papercuts on? At any rate, shifting the blame onto a broken tool doesn't make a hard system easy.


> Open the inspector, select an element and you will find all the styles that apply.

That tells me which styles apply to an element. You also need the converse - find which elements a given style applies to - and there's no way to do that AFAIK. It's very hard to ever delete even completely unused styles, because there is no way to tell (in the general case) whether a given style is used at all.

> This way, your styles shouldn't need updates that often unless you change the semantics of your DOM.

In my experience the DOM doesn't have semantics, or to the extent that it does, they change all the time.


> You also need the converse - find which elements a given style applies to - and there's no way to do that AFAIK.

I've never needed to do this, because I pay attention to my DOM structure and can figure out from the CSS selectors where a style applies. But I've just checked, and the search bar in the Firefox Inspector supports CSS selectors.

> In my experience the DOM doesn't have semantics, or to the extent that it does, they change all the time.

The DOM semantics are those of hyperlinked documents and forms. Take a page and think about what each element means and its relation to the others. They will form a hierarchy with some generic components replicated. Then, due to how CSS is applied, you go from generic to specific elements, using the semantic structure to do the targeting.

As an example, the structure of HN's reply page is

  page
    header
      logo + Title
    body
      comment_box
        upvote_button + comment_metadata
        comment_text
      textbox // reply text
      button // reply

This and the structure of the other pages will give you insight into how to target the relevant elements.


> As an example, the structure of HN's reply page

Is made up of table tags, which the CSS people will tell you is wrong/impossible/has different semantics.


> Is made up of table tags, which the CSS people will tell you is wrong/impossible/has different semantics

A table is a grid. Lots of UI toolkits have a grid container, and CSS even added one specifically as a layout engine.


> A table is a grid. Lot of UI toolkits have a grid container

Sure. As long as the end result is the same grid, it shouldn't matter. But in a CSS world you switch your table for grid-layout divs (or vice versa) and suddenly one corner case thing that is in one grid cell somewhere in your app gets its styling flipped.


You should talk to the people behind the VanillaJS framework; this sounds like it might work well over there.

http://vanilla-js.com/


> Why does AI need that folder structure? Why not a flat list of files and let the AI agent explore with BM25 / grep, etc.

Progressive disclosure: the same reason you don't get assaulted with all the information a website has to offer at once, or handed a SQL console and told to figure it out, but instead see a portion of the information, presented in a way that naturally leads you to the next bit of information you're looking for.

> use cases

This is essentially just where you're moving the hierarchy/compression, but at least for me these are not very disjoint and separable. I think what I actually want are adaptable LoRAs that loosely correspond to these use cases, with a dense discriminator or other system able to adapt and stay in sync with them too. Also, tool calling + SQL/vector embeddings, so that you can actually get good filesystem search without it feeling like work, and let the model filter out the junk.

> let the AI calculate this at run time?

You still do want to let it do agentic RAG but I think more tools are better. We're using sqlite-vec, generating multimodal and single-mode embeddings, and trying to make everything typed into a walkable graph of entity types, because that makes it much easier to efficiently walk/retrieve the "semantic space" in a way that generalizes. A small local model needs at least enough structure to know these are the X ways available to look for something and they are organized in Y ways, oriented towards Z and A things.
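A minimal sketch of the sqlite-vec side of that, with a placeholder schema and an embed() stub rather than our actual layout: nearest-neighbour hits are joined back to typed entity rows, so whatever the model finds it can keep walking from.

  # Sketch: typed entities in an ordinary table, embeddings in a sqlite-vec
  # virtual table keyed by the same rowid. Schema and embed() are placeholders.
  import sqlite3
  import sqlite_vec

  db = sqlite3.connect("index.db")
  db.enable_load_extension(True)
  sqlite_vec.load(db)
  db.enable_load_extension(False)

  db.execute("CREATE TABLE IF NOT EXISTS entities("
             "id INTEGER PRIMARY KEY, kind TEXT, path TEXT, summary TEXT)")
  db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS entity_vecs "
             "USING vec0(embedding float[384])")

  def embed(text: str) -> list[float]:
      raise NotImplementedError("plug in any local embedding model here")

  def search(query: str, k: int = 5):
      q = sqlite_vec.serialize_float32(embed(query))
      knn = db.execute(
          "SELECT rowid, distance FROM entity_vecs "
          "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
          (q, k)).fetchall()
      # Join hits back to typed rows so the model has something to walk from.
      return [db.execute("SELECT kind, path, summary FROM entities WHERE id = ?",
                         (rowid,)).fetchone() + (distance,)
              for rowid, distance in knn]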

Especially on-device, telling them to "just figure it out" is like dropping a toddler or autonomous vehicle into a dark room and telling them to build you a search engine lol. They need some help and also quite literally to be taught what a search engine means for these purposes. Also, if you just let them explore or write things without any kind of grounding in what you need/any kind of positive signals, they're just going to be making a mess on your computer.


Maybe it depends on the use case, but my opinion is: if you do need to apply compression, it should be done via a tool call in real time instead of in a pipeline.

For example, if you’re trying to summarize the status of a project, it’s better to write a script that summarizes the status of all of the Jira tickets than to feed them to an agent (in real time or via a summarization pipeline) and ask it to read all of the tickets to create a summary.
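A sketch of what I mean, assuming a plain Jira Cloud setup (base URL, project key, and credentials are placeholders): the agent invokes this one script as a tool and gets a one-screen summary, instead of chewing through every ticket itself.

  # Sketch: collapse all tickets in a project into a short status summary,
  # exposed to the agent as a single tool call / script invocation.
  import collections
  import os
  import requests

  JIRA = os.environ.get("JIRA_BASE_URL", "https://example.atlassian.net")
  AUTH = (os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"])

  def summarize_project(project_key: str) -> str:
      counts, start = collections.Counter(), 0
      while True:
          page = requests.get(
              f"{JIRA}/rest/api/2/search",
              params={"jql": f"project = {project_key}", "fields": "status",
                      "startAt": start, "maxResults": 100},
              auth=AUTH, timeout=30,
          ).json()
          for issue in page.get("issues", []):
              counts[issue["fields"]["status"]["name"]] += 1
          start += 100
          if start >= page.get("total", 0):
              break
      lines = [f"  {status}: {n}" for status, n in counts.most_common()]
      return f"{project_key}: {sum(counts.values())} issues\n" + "\n".join(lines)

  if __name__ == "__main__":
      print(summarize_project(os.environ.get("JIRA_PROJECT", "PROJ")))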

Another small data point: I think people would prefer to ask questions of an AI model instead of reading the generated summaries.

