
You won't like it, but the answer is Apple. The reason is the unified memory. The GPU can access all 32gb, 64gb, 128gb, 256gb, etc. of RAM.

An easy way (napkin math) to know if you can run a model based on its parameter size is to treat the parameter count as GB that need to fit in GPU RAM. A 35B model needs at least 35GB of GPU RAM. This is a very simplified way of looking at it and YES, someone is going to say you can offload to CPU, but no one wants to wait 5 seconds for 1 token.
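The "params as GB" rule of thumb above corresponds to roughly 1 byte per parameter (i.e. 8-bit quantization); a minimal sketch, assuming weights-only memory and ignoring context/KV cache:

```python
def weights_gb(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Napkin-math GPU memory needed just for the weights.

    bytes_per_param: ~2 for fp16, ~1 for 8-bit quants, ~0.5 for 4-bit.
    1 billion params * 1 byte/param ~= 1 GB.
    """
    return params_billion * bytes_per_param

# A 35B model:
print(weights_gb(35, 1.0))   # 8-bit: 35.0 GB -- the rule of thumb
print(weights_gb(35, 2.0))   # fp16:  70.0 GB
print(weights_gb(35, 0.5))   # 4-bit: 17.5 GB
```

At fp16 the same model needs double the rule-of-thumb number, which is why quantization matters so much for fitting models on consumer hardware.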


That estimate doesn't account for context, which is very important for tool use and coding.

I used this napkin math for image generation, since the context (prompts) were so small, but I think it's misleading at best for most uses.


> You won't like it, but the answer is Apple.

Or strix halo.

Seems rather oversimplified.

It ignores the different levels of quants: for Qwen3.6 that ranges from 10GB to 38.5GB.

Qwen supports a context length of 262,144 tokens natively, which can be extended to 1,010,000, and of course the context length can always be shortened.

Just use one of the calculators and you'll get a much more useful number.
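The reason context length matters so much is the KV cache, which grows linearly with context. A rough sketch of what those calculators compute, using hypothetical model dimensions (layer count, KV heads, and head size are illustrative, not any specific Qwen config):

```python
def kv_cache_gb(ctx_len: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV-cache memory: 2x for keys and values, fp16 by default."""
    return 2 * n_layers * ctx_len * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Hypothetical 64-layer model, 8 KV heads of dim 128, at the full
# 262,144-token native context:
print(round(kv_cache_gb(262_144, 64, 8, 128), 1))  # ~68.7 GB
```

So a long context can need more memory than the weights themselves, which is why the weights-only napkin math breaks down for coding and tool use.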


What Strix Halo system has unified memory? A quick google says it's just a static VRAM allocation in RAM, not that the CPU and GPU can actively share memory at runtime.

All. Keep in mind strix != strix halo.

You can get tablets, laptops, and desktops. I think Windows is more limited and might require static allocation of video memory, not because it's a separate pool, just because Windows isn't as flexible.

With Linux you can just select the lowest number in the BIOS (usually 256 or 512MB) and then let Linux balance the needs of the CPU/GPU. So you could easily run a model that requires 96GB or more.


> What Strix Halo system has unified memory?

All of them. The static VRAM allocation is tiny (512MB); most of the memory is unified.


Gemma 4 has made a lot of progress in this area. The model is phenomenal. Its size is workable. This is the worst it will ever be.

Now we just need the RAM market to get back to normal. Or at least fine OpenAI for speculating on raw wafers. There's an article on the front page [0] with a passage that gives me hope that consumer access to VRAM may improve:

> On the infrastructure side: OpenAI signed non-binding letters of intent with Samsung and SK Hynix for up to 900,000 DRAM wafers per month, roughly 40% of global output. These were of course non-binding. Micron, reading the demand signal, shut down its 29-year-old Crucial consumer memory brand to redirect all capacity toward AI customers. Then Stargate Texas was cancelled, OpenAI and Oracle couldn’t agree terms, and the demand that had justified Micron’s entire strategic pivot simply vanished. Micron’s stock crashed.

[0] https://adlrocha.substack.com/p/adlrocha-how-the-ai-loser-ma...


Micron's stock is still up 470% YoY.

There's speculation that next Tuesday will be a big day for OpenAI and possibly GPT 6. Anthropic showed their hand today.


Sounds like a good opportunity to pause spending on nerfed 4.6 and wait for the new model to be released and then max out over 2 weeks before it gets nerfed again.



The performance degradation I've seen isn't in quality/completion but in duration: I get good results, just much less quickly than I did before 4.6. Still, it's just anecdata, but a lot of folks seem to feel the same.


Been reading posts like these for 3 years now. There are multiple sites with numbers. I'm willing to buy "I'm paying rent on someone's agent harness and god knows what's in the system prompt rn", but in the face of numbers, you've gotta discount the anecdotal.


You're probably right. It's probably more likely that for some period of time I forgot that I switched to the large context Opus vs Sonnet and it was not needed for the level of complexity of my work.


Yeah, why trust your actual experience over numbers? Nothing surer than synthetic benchmarks


Strawman, and, synthetic benchmark? :)


I don't believe that trackers like this are trustworthy. There's an enormous financial motive to cheat and these companies have a track record of unethical conduct.

If I was VP of Unethical Business Strategy at OpenAI or Anthropic, the first thing I'd do is put in place an automated system which flags accounts, prompts, IPs, and usage patterns associated with these benchmarks and direct their usage to a dedicated compute pool which wouldn't be affected by these changes.


This just looks like random noise to me? Is it also random on short timespans, like running it 10x in a row?


Explained in the methodology at the bottom of this page: https://marginlab.ai/trackers/claude-code/


That does not sound very believable. Last time Anthropic released a flagship model, it was followed by GPT Codex literally that afternoon.


Y'all know they're teaching to the test. I'll wait till someone devises a novel test that isn't contained in the datasets. Sure, they're still powerful.


My understanding is GPT 6 works via synaptic space reasoning... which I find terrifying. I hope if true, OpenAI does some safety testing on that, beyond what they normally do.


From the recent New Yorker piece on Sam:

“My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”


Amusing! Even if they believe that, they should know the company communicated the opposite earlier.


No chance an OpenAI spokesperson doesn't know what existential safety is.


I did not read the response as...

>Please provide the definition of Existential Safety.

I read:

>Are you mentally stable? Our product would never hurt humanity--how could any language model?


The absolute gall of this guy to laugh off a question about x-risks. Meanwhile, also Sam Altman, in 2015: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity. There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could. Also, most of these other big threats are already widely feared." [1]

[1] https://blog.samaltman.com/machine-intelligence-part-1


Why are these people always like this.


Likely an improvement on:

> We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

<https://arxiv.org/abs/2502.05171>


Oh you mean literally the thing in AI2027 that gets everyone killed? Wonderful.


AI 2027 is not a real thing which happened. At best, it is informed speculation.


Funny if you open their website and go to April 2026 you literally see this: 26b revenue (Anthropic beat 30b) + pro human hacking (mythos?).

I don't put much stock in predictions, but their calls have been great so far.


I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.


That sounds really interesting. Do you have some hints on where to read more?


Oh, of course they will /s


The article says the policy change is separate and unrelated to Anthropic’s discussions with the Pentagon.


The article specifically says:

> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.


I'm not fond of this trend of stating a position and attributing it to "a source familiar with the situation".

It combines interpretation of meaning with ambiguity, allowing the reporter to assert anything they want. The ambiguity is there to protect the identity of the source, but in return it has to be a more discreet disclosure of information. If you can't check the person, you can still check what they said.

I would be ok with direct quotes from an anonymous source. That removes the interpretation of meaning at least.

As it is written, it would not be inaccurate to say this if their source was the lesswrong post, or even an earlier thread here on HN.

Phrasing "A source with direct knowledge of the situation" might remove some of the leeway for editorialising, but without sharing what the source actually said, it opens the door to saying anything at all and declaring "That's what I thought they meant" when challenged.

It's unfalsifiable journalism.


I really like how The Verge discusses this.

https://www.theverge.com/press-room/22772113/the-verge-on-ba...

On their podcast, they frequently bring up how tech company PR teams try to move as much conversation with journalists as possible into "on background", uncited, generic sourcing.


Virustotal at upload and periodically during the day


VirusTotal is completely useless for this though? You need enough people to be pwned by that particular piece of malware for it to be flagged as dangerous, by which point the attackers would've already repacked it so it doesn't match the previous signature.


Adding on here...

VirusTotal is flagging the Trello skill as suspicious because it does NOT include an API key? Am I expected to share my keys if I want to upload a skill?

https://clawhub.ai/steipete/trello

"Requiring TRELLO_API_KEY and TRELLO_TOKEN is appropriate for Trello access, but the registry records no required env vars while SKILL.md documents them. This omission is problematic: the skill will need highly privileged credentials but the published metadata does not disclose that requirement. The SKILL.md also references 'jq' and uses curl, but these are not declared in the registry entry."


You've completely missed the point. It's saying that the skill will need you to provide a Trello API key, but he hasn't declared that it will need one.

Additionally, the skill uses curl without declaring that either, which means it _could_ leak your key if you provide one. That's why it's suspicious: VirusTotal has flagged that you should probably review the SKILL.md.


Oh, I see. Seems obvious you would need an API key in this context but I get the idea that it's an undeclared but required var, which could be shady


Sure it does; Bezos's space company and Google are both planning the same.

Here's Sundar talking about doing it by 2027: https://www.businessinsider.com/google-project-suncatcher-su...


It's all BS. There is no viable way to put industrial levels of compute onto a space-based platform that can work within the severe thermal, power, mass/volume, radiation, reliability, and economic constraints. It is just stupid smoke blowing to separate idiot investors from their money. J-school grads don't have a clue what they're parroting.


You can talk to it in Discord or WhatsApp or Telegram etc., because it's checking for you in a loop.

That's the biggest difference I can tell.


It was tough, but it wasn't Battletoads tough.


Was most of Battletoads tough, or just the sewers part? It's been so so long.


Everything after the first level was tough. Those damn speeder bikes.


I like this workflow

