Hacker News | mhitza's comments

"Nvidia’s $100 billion OpenAI deal has seemingly vanished" https://arstechnica.com/information-technology/2026/02/five-...

"Disney cancels $1B deal with OpenAI after video platform Sora is shut down: 'The future is human'" https://finance.yahoo.com/sectors/technology/articles/disney...

And if I recall correctly, the AI datacenter deal isn't doing Oracle stock any favours.


I haven't thought about any secondary play, but if these companies converge on Google's TPUs, they would probably slice eagerly into NVIDIA's current market.

> In September 2025, Google is in talks with several "neoclouds," including Crusoe and CoreWeave, about deploying TPU in their datacenter. In November 2025, Meta is in talks with Google to deploy TPUs in its AI datacenters.

https://en.wikipedia.org/wiki/Tensor_Processing_Unit


I keep getting notifications from my tooling that "Gemini models are overloaded so we switched you to OpenAI." So I feel Google is not ready to sell TPUs just yet.

If you're building agentic harnesses for business processes, local models are a great way to do that while keeping your data, and any personal data, private.

If you're vibe coding, a Codex/Claude subscription makes more sense as a more polished experience.

I don't vibe code, but I use self hosted models with codex for code review and snippet generation.


The article states that this person had an account that would have been limited to $2000 in usage.

And the system automatically upgraded them to higher spending limits when they crossed the $1000 in usage costs.

They could definitely make that an opt-in feature.


Yea, makes no sense for it to be opt out. Otherwise it just means there are no limits.

This is the LLM integration approach I was pitching last year to some companies. Though in my case it was strictly tied to self-hosted inference.

Agents at the edge of the business, where they can work independently and asynchronously, are an approach that I don't feel has been explored enough in business environments.

Sending your entire communication and documents to OpenAI would be a very bold choice.


Not only are businesses already doing that - they're not even cleaning up their source material so LLMs are generating garbage outputs from the old inconsistent trash that haunts Confluence, Google Drive, and all of the other dumping grounds for enterprise ephemera. Oftentimes "AI transformation" is just a slightly better search engine that regurgitates your old strategy (that didn't work the first time) and wraps it up in new sycophantic language that C-levels use to bulldoze the budgets and timelines of actual skilled front line employees.

I do believe that LLMs and AI provide actual value, but the "workspace" is usually the passive-aggressive CYA battleground for employees to appear productive in spite of leadership's blind spots, ossified business practices, and "aligned" decision-making that doesn't actually fix a broken org. Maybe this release will be the one that finally challenges nepo-hires, not-invented-here, and all of the other corpo crap that defines "enterprise" business.


Cleaning up source material is not easy work in companies that have massive piles of it and don't know exactly which parts of it are wrong. Quite often these documents are poorly versioned and do work for something, just not exactly what you're looking for.

With this said, you can use your incorrect AI answers to find and then purge or repair this old and/or poorly written documentation and improve the output.


I agree - and I've noticed that these AI transformations tend to lay bare the many issues, inconsistencies, and other problems with workspace functions and data. Unfortunately the people that are usually in charge of these projects do not have the seniority or sway to actually change the broken processes or aren't on the right team to remove cruft. Usually you have to wait until a salesperson misquotes something from an AI summary before these issues get unblocked because they actually affected revenue.

Free Monads are a very nice (though not performant) way of creating an embedded domain specific language interpreter.
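To make the idea concrete, here is a minimal sketch in Python of a free-monad-style eDSL and interpreter, using a hypothetical key-value store as the instruction set (the names Put/Get and the program below are illustrative, not from any particular library):

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

# --- Free monad structure: a program is either a finished value (Pure)
# --- or one instruction paired with a continuation (Impure).

@dataclass
class Pure:
    value: Any

@dataclass
class Impure:
    instr: Any                          # one DSL instruction
    cont: Callable[[Any], "Program"]    # continuation receiving the instruction's result

Program = Union[Pure, Impure]

def bind(p: Program, k: Callable[[Any], Program]) -> Program:
    """Monadic bind: sequence program p with continuation k."""
    if isinstance(p, Pure):
        return k(p.value)
    return Impure(p.instr, lambda x: bind(p.cont(x), k))

# --- Instruction set of the eDSL: a tiny key-value store.

@dataclass
class Put:
    key: str
    value: str

@dataclass
class Get:
    key: str

def put(key: str, value: str) -> Program:
    return Impure(Put(key, value), lambda _: Pure(None))

def get(key: str) -> Program:
    return Impure(Get(key), lambda v: Pure(v))

# --- One possible interpreter: run the program against an in-memory dict.
# --- Other interpreters (logging, dry-run, remote) could walk the same tree.

def run(p: Program, store: dict) -> Any:
    while isinstance(p, Impure):
        if isinstance(p.instr, Put):
            store[p.instr.key] = p.instr.value
            p = p.cont(None)
        else:  # Get
            p = p.cont(store.get(p.instr.key, ""))
    return p.value

# Build a program declaratively, then interpret it.
program = bind(put("greeting", "hello"),
               lambda _: get("greeting"))

print(run(program, {}))  # prints: hello
```

The key point is that `program` is just a data structure; the semantics live entirely in the interpreter, which is what makes the approach nice for eDSLs and slow in practice (every bind allocates another node).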

Once I was building a declarative components library in PHP, using the ideas I've learned from free monads. I'm sure you can't imagine what an atrocity I've built. It did the job, but I had to mentally check out and throw in a couple of gotos in my main evaluation loop.

All that to say: elegance of expression is tied to the syntax and semantics of the language.


Free Monads are also built on a tower of mathematical structures that come with laws and invariants. I have yet to see such formalization for transducers.

Featureitis. Just keep pumping out features with no thought. Today, probably also AI-coded.

Even in mid-sized projects, if you keep pushing only for new features you'll end up with a similar system. At least that's my experience in the three or so mid-sized projects I've worked on where nothing mattered other than checking off features from a huge backlog.


Ah, been at a company like that once before. After a while a dedicated team was created to go in and fix broader issues and essentially stop the system from collapsing under its own weight.

It's a MoE model and the A3B stands for 3 Billion active parameters, like the recent Gemma 4.

You can try offloading the experts to the CPU with llama.cpp (--cpu-moe), and that should free up quite a bit of extra context space, at a lower token-generation speed.


Macs have unified memory, so 36GB is 36GB for everything: GPU and CPU.

CPU MoE offload still helps with mmap. It shouldn't overly hurt token-gen speed on a Mac, since the CPU has access to most (though not all) of the unified memory bandwidth, which is the bottleneck.

I'll try to use that, but llama-server has mmap on by default and the model still takes up the size of the model in RAM, not sure what's going on.

Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.

For sure, I was running on autopilot with that reply. Though at Q4 I would expect it to fit, as the 24B-A4B Gemma model without CPU offloading got up to 18GB of VRAM usage.

Should I expect the same memory footprint from N active parameters as from simply N total parameters?

No - this model has the weights memory footprint of a 35B model (you do save a little bit on the KV cache, which will be smaller than the total size suggests). The lower number of active parameters gives you faster inference, including lower memory bandwidth utilization, which makes it viable to offload the weights for the experts onto slower memory. On a Mac, with unified memory, this doesn't really help you. (Unless you want to offload to nonvolatile storage, but it would still be painfully slow.)

All that said you could probably squeeze it onto a 36GB Mac. A lot of people run this size model on 24GB GPUs, at 4-5 bits per weight quantization and maybe with reduced context size.
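As a rough back-of-envelope check (assuming the ~35B total weights mentioned above and a 4-5 bit quantization):

```latex
35 \times 10^{9}\ \text{weights} \times \frac{4.5\ \text{bits/weight}}{8\ \text{bits/byte}} \approx 19.7\ \text{GB}
```

plus KV cache and runtime overhead, which is why it's tight on a 24GB GPU but should fit within 36GB of unified memory.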


I don't get it, Macs have unified memory, so how would offloading experts to the CPU help?

I bet the poster just didn't remember that important detail about Macs; it is kind of unusual from a normal-computer point of view.

I wonder though, do Macs have swap? Could unused experts be offloaded to swap?


Of course the swap is there for fallback but I hate using it lol as I don't want to degrade SSD longevity.

Extra problems with the copyright industry for no benefit.

Hope the owner's OpSec was good enough and we won't hear about their unmasking.


They have a 500k[1] reward for finding OPSEC failures, so I think they have the basics down.

[1]https://software.annas-archive.gl/AnnaArchivist/annas-archiv...


No way Anna’s archive has $500k

Why not? Are they going to scam the person who completes the Google Books bounty for 200k?

Extra? I thought they were clearly violating IP law to begin with. Unless I misunderstand this is "water is wet" territory (both the judgment as well as what Anna's Archive did).

Extra, because by pirating music they brought members of the recording industry (and implicitly the RIAA) into the equation https://en.wikipedia.org/wiki/Recording_Industry_Association...

Water isn't wet, but it does "wet" other things. Wetness is the degree to which a liquid contacts and adheres to a solid surface, so it makes no sense to say that water is wet.

I do not see any law being violated by Anna's Archive in the slightest.

Just because you disagree with a law doesn't mean that it doesn't exist. You anti-copyright shills are exhausting... Why can't you try to attract people to your side to eventually effect some real change? Do you take that much pleasure in being an edgelord, your cause be damned?

Just use it to train / tune a LLM. Apparently, everything becomes legal if you only put the stuff into the right kind of software.

That's at least what many people like to argue here on HN.


Anna's Archive wants[1] companies to train on their data.

[1] https://annas-archive.gl/blog/ai-copyright.html


Thanks a lot, that's an interesting read and they make an interesting case.

I would have thought all big AI companies used Anna's Archive, but apparently only some of the US based companies used them.


hmm you are right, I too wish the same brother

Contrast looks good for the text, but the font used has very thin strokes. A thicker font would have been readable on its own. At 250% page zoom it's good enough, if you don't enable the browser's built-in reader mode.
