bobbiechen's comments

Yeah, I had the same question myself. I think that's what you would want to do to make it airtight (plus some amount of rate limiting or flagging for devices that are part of dedicated device farms).

But even if not, there's still value in raising the barrier to entry. For example, you can buy 1,000 reCAPTCHA solves for $1-2 from various captcha-solver services. And yet even that $0.001-0.002-per-request fee does discourage mass-scale bot attacks.


... You... think... it would be a good thing.

Don't you...


I do. It has downsides of course, but what's the alternative at this point?

Depends on your specific problem. Usually: redesign your system so it doesn't need to care whether the other end is a bot or not.

How though? Can you also avoid DDoS simply by designing your system not to care whether the requester is a bot?

Let's say I'm running https://grep.app/ for example. AI bots start heavily using it, costing me a ton of money. How would you magically design this so it doesn't matter if the end bots are using it?


Rate limit individual clients.

Let's play this out: how do you determine individual clients? By IP? By session ID?
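Whatever identity signal you pick, the mechanics are the same. A minimal sketch of a token-bucket limiter keyed by a composite of IP and session ID (both the class and the rate numbers here are illustrative, not anyone's actual implementation):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket. The key is whatever 'client identity'
    you can get -- here a hypothetical (ip, session_id) composite."""

    def __init__(self, rate=5.0, burst=10.0):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size
        # each key starts with a full bucket, timestamped on first access
        self.tokens = defaultdict(lambda: (burst, time.monotonic()))

    def allow(self, ip, session_id=None):
        key = (ip, session_id)
        tokens, last = self.tokens[key]
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at burst
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.tokens[key] = (tokens - 1.0, now)
            return True
        self.tokens[key] = (tokens, now)
        return False

limiter = TokenBucket(rate=2.0, burst=5.0)
print(all(limiter.allow("1.2.3.4", "abc") for _ in range(5)))  # True: burst allowed
print(limiter.allow("1.2.3.4", "abc"))  # False: 6th immediate request denied
```

Of course, this just moves the question: an attacker with a device farm gets a fresh bucket per IP, which is exactly the gap the thread is pointing at.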

I suspect that the HN crowd is somehow insulated from the river of crap and fraud that is the internet experience for a majority of the population.

This is generally true of every application that handles sensitive data. Unless you explicitly clear that memory, it's likely to hang around forever.

For example, here is a 2019 writeup from KeePassXC with similar notes: https://keepassxc.org/blog/2019-02-21-memory-security/ - even though they explicitly clear sensitive data, there is still a window of opportunity.

During my time working on confidential computing, we had a variety of demos showing similar attacks against lots of different datastores, scripts, etc. That's just how computers work and your options are very limited if this is part of your threat model (imo just confidential computing and, if you can handle the performance hit, fully-homomorphic encryption).
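The KeePassXC-style mitigation can be sketched even in Python, with the caveat the writeup itself makes: you can only wipe buffers you control, and copies may already exist elsewhere in memory. This is an illustrative sketch, not a guarantee; the key point is using a *mutable* buffer, since immutable `str`/`bytes` objects can't be cleared in place:

```python
import ctypes

def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place."""
    addr = (ctypes.c_char * len(buf)).from_buffer(buf)
    ctypes.memset(addr, 0, len(buf))

secret = bytearray(b"hunter2")
# ... use the secret ...
wipe(secret)
print(secret)  # bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
```

Even then there's still the window of opportunity: the interpreter, the allocator, or the OS (swap, core dumps) may have made copies you never get to zero.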


Windows already has a secure kernel credential store; with a bit of effort they could move the Edge password store there, minimizing the splash damage when a single password is retrieved to send over HTTP from regular user space.

> Credential Guard prevents credential theft attacks by protecting NTLM password hashes, Kerberos Ticket Granting Tickets (TGTs), and credentials stored by applications as domain credentials.

> Credential Guard uses Virtualization-based security (VBS) to isolate secrets so that only privileged system software can access them.

https://learn.microsoft.com/en-us/windows/security/identity-...


This only works if Credential Guard has implemented a way to derive a subsequent token/value from that secret. For things like basic auth, the secret eventually has to reach the userland process that needs it in some shape or form, so it can be embedded in the HTTP payload as plaintext.

Surely the prevalence of this saying contributes to the jailbreak's effectiveness.

I called this last year: https://digitalseams.com/blog/the-ai-lifestyle-subsidy-is-go...

I see it as no different from the previous generation of consumer startups burning money - as Derek Thompson wrote,

> ...if you woke up on a Casper mattress, worked out with a Peloton, Ubered to a WeWork, ordered on DoorDash for lunch, took a Lyft home, and ordered dinner through Postmates only to realize your partner had already started on a Blue Apron meal, your household had, in one day, interacted with eight unprofitable companies that collectively lost about $15 billion in one year.


Everyone called it last year and the year before.

The conversation about AI being cheap started when ChatGPT launched in late 2022.


It's an interesting concept but unfortunately I think the comment is actually AI slop so there's no real story behind it. Check the account history.

If I understand correctly, the threat model here seems to cover accidental issues that would impact performance, but not a malicious actor.

For example, Sketchy Provider tells you they're running the latest and greatest, but is actually, knowingly, running some cheaper (and worse) model and pocketing the difference. These tests wouldn't help, since Sketchy Provider could detect when it's being tested and do the right thing only then (like the Volkswagen emissions scandal). Right?


Aggregators like OpenRouter default to the cheapest provider. Those providers are often cheap because they're ridiculously quantized and tuned for throughput, not quality.

This is probably Kimi trying to protect their brand from bargain-basement providers that don't properly represent what the models are capable of.


Openrouter has “exacto” verified models trying to combat this, but it seems like it’s not available for most of the models.


> This is probably Kimi trying to protect their brand from bargain-basement providers that don't properly represent what the models are capable of.

I'm curious what exactly they mean by this...

"because we learned the hard way that open-sourcing a model is only half the battle."


I'd take it at face value. Since they release open weights they would appear to genuinely want other providers to serve this as well as themselves, but the benefit of this depends on it being served accurately.


I agree, but how about some details.


Kimi, GLM, and MiniMax are the "Big Three" of open-source Chinese AI startups. There's also Qwen and DeepSeek, but those are subsidized by other lines of business.

The Chinese AI models are generally 5-6 months behind high end SOTA western models (and as of the time of this comment it's Opus 4.7 and ChatGPT 5.4 Thinking, it's rumored however that the Mythos and Spud codename models are even better).

To gain market share, the Chinese startups use open source as a distribution strategy and have essentially made mid-to-high-end AI a commodity. The best models are still Western, but the Chinese models are good enough for any application that doesn't require the highest performance on the market, or where there's a need for extensive customization or alignment (imagine you're an oil-rich petrostate that doesn't want its national AI strategy tied to liberal-international-order ideology).

It creates a lot of pricing pressure on the low and mid end, and it's also why Anthropic is desperately trying to go full B2B instead.

However, if the third parties hosting the Chinese models at near cost don't perform good quality control, it ruins the strategy, because customers are no longer inclined to use Chinese models (and first-party hosting on Chinese infrastructure is out of the question for geopolitical reasons, so everybody hides behind the polite fiction of using resellers like OpenRouter, Fal.ai, Wavespeed, Fireworks AI, etc.).


I've been burned on OpenRouter, getting routed through terrible quants with equally terrible quality, while paying maybe 15% less.

Nearly a year ago it was impossible to avoid, thanks to OpenRouter's silly routing algorithm and API. You had to set multiple things just right to make it work.

Similar to their other API quirks. You want a valid JSON response? Sure, set response_format to "json", just like our documentation suggests. Oh, it only works some of the time? How silly, why would you expect it to work all of the time? If you want it to work more often, set require_params to true. We may still use other providers that don't offer it, but you want that, right? You don't? Well, then set our "very_require_params" to "very_true". And then switch a few toggles in the frontend. Oh, and also add these 7 lines just so your other config options don't break. Oh wait, they will break, how silly of us. Is there any way to make it work as advertised? Of course not!

Sorry for the semi-offtopic rant. I still use them every day though, but not for open models anymore.


Catching accidental drift is still worth a lot. It's basically the same idea as performance regression tests in CI, nobody writes those because they expect sabotage. It's for the boring stuff, like "oops, we bumped a dep and throughput dropped 15%".

If someone actually goes out of their way to bypass the check, that's a pretty different situation legally compared to just quietly shipping a cheaper quant anyway.
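A CI gate for this kind of accidental drift is only a few lines. A hypothetical sketch (the baseline number, tolerance, and `do_work` workload are all made up for illustration):

```python
import time

# Hypothetical values: a throughput baseline recorded from a known-good
# build, and how much drop we tolerate before failing the build.
BASELINE_OPS_PER_SEC = 50_000
TOLERANCE = 0.15  # fail on a >15% drop, per the "oops" example above

def do_work():
    """Stand-in for whatever the pipeline actually exercises."""
    sum(i * i for i in range(100))

def measure(n_iters=5_000):
    """Return measured operations per second."""
    start = time.perf_counter()
    for _ in range(n_iters):
        do_work()
    return n_iters / (time.perf_counter() - start)

def check_regression(ops_per_sec):
    """True if throughput is within tolerance of the baseline."""
    return ops_per_sec >= BASELINE_OPS_PER_SEC * (1 - TOLERANCE)

print(check_regression(60_000))  # True: at or above baseline
print(check_regression(40_000))  # False: >15% drop, CI should fail
```

The same shape works for a model-quality gate: replace ops/sec with an accuracy score on a fixed eval set and fail on drift past the tolerance.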


Also it's not just about running an obviously worse quant.

Running different GPU kernels / inference engines also matters. It's easy to write an implementation that is faster and thus cheaper but numerically much noisier / less accurate.
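The precision trade-off is easy to demonstrate without any GPU. This sketch emulates a float32 accumulator by rounding after every add (using `struct` to round a Python float to float32); past 2**24, float32 silently drops +1 increments that float64 keeps:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (IEEE f64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = 16_777_216.0  # 2**24: above this, float32 can't represent odd integers
acc = to_f32(big)
for _ in range(100):
    acc = to_f32(acc + 1.0)  # each +1 rounds straight back down

print(acc == big)          # True: all 100 increments vanished
print(big + 100.0 - big)   # 100.0 -- the float64 accumulator keeps them
```

A kernel that accumulates attention or matmul partial sums in lower precision makes the same trade at scale, which is why two "identical" deployments of one checkpoint can behave measurably differently.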


Yeah, the threat model is nonexistent. Most people use a dozen or so well-known providers, who have no incentive to cheat so obviously.


Yes and no.

For a truly malicious actor, you're right. But it shifts it from "well we aren't obviously committing fraud by quantizing this model and not telling people" to "we're deliberately committing fraud by verifying our deployment with one model and then serving customer requests with another".

I suspect there are a lot of semi-malicious actors who are only too happy to do the former.


Seems like a great challenge for all these systems; see frontier labs serving quants when under heavy load.


I love how many interviews Larry Tesler did (he passed away in 2020), he was so influential and it's interesting to see what that looks like from the inside.

Gypsy (that first modeless editor) recently turned 50 years old and I wrote about it here largely from those first-hand accounts: https://digitalseams.com/blog/the-gypsy-document-editor-cele...

And it's not mentioned in this ACM interview, but in this one with the Computer History Museum https://archive.computerhistory.org/resources/access/text/20... he notes that implementing a modeless editor was easier too, since you could use a simple case switch instead of a bunch of explicit modules for each mode.


He was so passionate about no modes that he had a personal number plate for his car that read "NO MODES".



It's interesting how many people I know who jump instantly from hobby to thinking about hustling, Etsy, Patreon, fame, etc. and the thought that they'll never be good enough to go pro is a real barrier. You don't need to monetize your joy.


err ok but i wasnt saying to monetize (not that you were saying it)


I was agreeing with you! Though I can see how I came on a little strong there.


Enterprise userscripts? Very neat, though I wonder if typical enterprise security policies would allow for this.


Unless the browser locks down devtools, can't you always run userscripts to some extent?


One way to solve it is to partner with the enterprise directly and work within their guardrails.

Shameless plug: my company does this, live with Series B companies.


We got our extension approved, after which we had no issues.


What LLM tool are you using to write this comment? It must have been really good to lift the stress of _10 years_ of never commenting?

https://news.ycombinator.com/newsguidelines.html#generated


I’ll use an LLM, often Claude, to tighten up my writing. I have a tendency to use too many words.

I brain dump my candid reaction / thinking, and then I’ll get something to tighten it up. No LLM used for this follow up.

I apologize if my use of Claude to tidy up my thoughts was offensive — here was my unfiltered, original comment:

> There's a new type of product and service that's now possible with LLMs improving each month. The new value prop is shifting from time savings to stress relief.

> Tools need to be built around human psychology like the self-checkout example. It's not faster, but it provides relief. Some tools, while powerful, can add anxiety to one's day, especially if it's built promising efficiency, but the user feels like they're not getting more done, getting things done faster, or both.

