Hacker News | mhitza's comments

"Nvidia’s $100 billion OpenAI deal has seemingly vanished" https://arstechnica.com/information-technology/2026/02/five-...

"Disney cancels $1B deal with OpenAI after video platform Sora is shut down: 'The future is human'" https://finance.yahoo.com/sectors/technology/articles/disney...

And if I recall correctly, the AI datacenter deal isn't doing Oracle stock any favours.


I haven't thought about any secondary play, but if these companies converge on Google's TPUs, they would probably slice eagerly into NVIDIA's current market.

> In September 2025, Google is in talks with several "neoclouds," including Crusoe and CoreWeave, about deploying TPU in their datacenter. In November 2025, Meta is in talks with Google to deploy TPUs in its AI datacenters.

https://en.wikipedia.org/wiki/Tensor_Processing_Unit


I keep getting notifications from my tooling that "Gemini models are overloaded so we switched you to OpenAI." So I feel Google is not ready to sell TPUs just yet.

If you're building agentic harnesses for business processes, local models are a great way to do that while keeping your data, and any personal data, private.

If you're vibe coding, a Codex/Claude subscription makes more sense as a more polished experience.

I don't vibe code, but I use self hosted models with codex for code review and snippet generation.


The article states that this person had an account that would have been limited to $2000 in usage.

And the system automatically upgraded them to higher spending limits when they crossed the $1000 in usage costs.

They could definitely make that an opt-in feature.


Yea, makes no sense for it to be opt out. Otherwise it just means there are no limits.

This is the LLM integration approach I was pitching last year to some companies. Though in my case it was strictly tied to self-hosted inference.

Agents at the edge of the business, where they can work independently and asynchronously, are an approach that I don't feel has been explored enough in business environments.

Sending your entire communication and documents to OpenAI would be a very bold choice.


Not only are businesses already doing that - they're not even cleaning up their source material so LLMs are generating garbage outputs from the old inconsistent trash that haunts Confluence, Google Drive, and all of the other dumping grounds for enterprise ephemera. Oftentimes "AI transformation" is just a slightly better search engine that regurgitates your old strategy (that didn't work the first time) and wraps it up in new sycophantic language that C-levels use to bulldoze the budgets and timelines of actual skilled front line employees.

I do believe that LLMs and AI provide actual value, but the "workspace" is usually the passive-aggressive CYA battleground for employees to appear productive in spite of leadership's blind spots, ossified business practices, and "aligned" decision-making that doesn't actually fix a broken org. Maybe this release will be the one that finally challenges nepo-hires, not-invented-here, and all of the other corpo crap that defines "enterprise" business.


Cleaning up source material is not easy work in companies that have massive piles of it and don't know exactly which parts of it are wrong. Quite often these documents are poorly versioned and do work for something, just not exactly what you're looking for.

With this said, you can use your incorrect AI answers to find and then purge or repair this old and/or poorly written documentation and improve the output.


I agree - and I've noticed that these AI transformations tend to lay bare the many issues, inconsistencies, and other problems with workspace functions and data. Unfortunately the people that are usually in charge of these projects do not have the seniority or sway to actually change the broken processes or aren't on the right team to remove cruft. Usually you have to wait until a salesperson misquotes something from an AI summary before these issues get unblocked because they actually affected revenue.

Free Monads are a very nice (though not performant) way of creating an embedded domain specific language interpreter.
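To make the idea concrete, here is a minimal sketch in Python of a free-monad-style eDSL and interpreter, using a hypothetical key-value store as the instruction set (the names Put/Get and the program below are illustrative, not from any particular library):

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

# --- Free monad structure: a program is either a finished value (Pure)
# --- or one instruction paired with a continuation (Impure).

@dataclass
class Pure:
    value: Any

@dataclass
class Impure:
    instr: Any                          # one DSL instruction
    cont: Callable[[Any], "Program"]    # continuation receiving the instruction's result

Program = Union[Pure, Impure]

def bind(p: Program, k: Callable[[Any], Program]) -> Program:
    """Monadic bind: sequence program p with continuation k."""
    if isinstance(p, Pure):
        return k(p.value)
    return Impure(p.instr, lambda x: bind(p.cont(x), k))

# --- Instruction set of the eDSL: a tiny key-value store.

@dataclass
class Put:
    key: str
    value: str

@dataclass
class Get:
    key: str

def put(key: str, value: str) -> Program:
    return Impure(Put(key, value), lambda _: Pure(None))

def get(key: str) -> Program:
    return Impure(Get(key), lambda v: Pure(v))

# --- One possible interpreter: run the program against an in-memory dict.
# --- Other interpreters (logging, dry-run, remote) could walk the same tree.

def run(p: Program, store: dict) -> Any:
    while isinstance(p, Impure):
        if isinstance(p.instr, Put):
            store[p.instr.key] = p.instr.value
            p = p.cont(None)
        else:  # Get
            p = p.cont(store.get(p.instr.key, ""))
    return p.value

# Build a program declaratively, then interpret it.
program = bind(put("greeting", "hello"),
               lambda _: get("greeting"))

print(run(program, {}))  # prints: hello
```

The key point is that `program` is just a data structure; the semantics live entirely in the interpreter, which is what makes the approach nice for eDSLs and slow in practice (every bind allocates another node).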

Once I was building a declarative components library in PHP, using the ideas I've learned from free monads. I'm sure you can't imagine what an atrocity I've built. It did the job, but I had to mentally check out and throw in a couple of gotos in my main evaluation loop.

All that to say: elegance of expression is tied to the syntax and semantics of the language.


Free Monads are also built on a tower of mathematical structures that come with laws and invariants. I have yet to see such formalization for transducers.

Featureitis. Just keep pumping out features with no thought. Today, probably also AI-coded.

Even in mid-sized projects, if you keep pushing only for new features you'll end up with a similar system. At least that's my experience in the three or so mid-sized projects I've worked on where nothing mattered other than checking off features from a huge backlog.


Ah, been at a company like that once before. After a while a dedicated team was created to go in and fix broader issues and essentially stop the system from collapsing under its own weight.

It's a MoE model and the A3B stands for 3 Billion active parameters, like the recent Gemma 4.

You can try offloading the experts to the CPU with llama.cpp (--cpu-moe), and that should free up quite a bit of extra context space, at a lower token-generation speed.


Macs have unified memory, so 36GB is 36GB for everything: GPU and CPU.

CPU MoE offload still helps with mmap. It shouldn't overly hurt token-gen speed on a Mac, since the CPU has access to most (though not all) of the unified memory bandwidth, which is the bottleneck.

I'll try to use that, but llama-server has mmap on by default and the model still takes up the size of the model in RAM, not sure what's going on.

Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.

For sure, I was running on autopilot with that reply. Though at Q4 I would expect it to fit, as the 24B-A4B Gemma model without CPU offloading got up to 18GB of VRAM usage.

Should I expect the same memory footprint from N active parameters as from simply N total parameters?

No - this model has the weights memory footprint of a 35B model (you do save a little bit on the KV cache, which will be smaller than the total size suggests). The lower number of active parameters gives you faster inference, including lower memory bandwidth utilization, which makes it viable to offload the weights for the experts onto slower memory. On a Mac, with unified memory, this doesn't really help you. (Unless you want to offload to nonvolatile storage, but it would still be painfully slow.)

All that said you could probably squeeze it onto a 36GB Mac. A lot of people run this size model on 24GB GPUs, at 4-5 bits per weight quantization and maybe with reduced context size.
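As a rough back-of-envelope check (assuming the ~35B total weights mentioned above and a 4-5 bit quantization):

```latex
35 \times 10^{9}\ \text{weights} \times \frac{4.5\ \text{bits/weight}}{8\ \text{bits/byte}} \approx 19.7\ \text{GB}
```

plus KV cache and runtime overhead, which is why it's tight on a 24GB GPU but should fit within 36GB of unified memory.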


I don't get it, Macs have unified memory, so how would offloading experts to the CPU help?

I bet the poster just didn't remember that important detail about Macs; it is kind of unusual from a normal-computer point of view.

I wonder though, do Macs have swap? Could unused experts be offloaded to swap?


Of course the swap is there for fallback but I hate using it lol as I don't want to degrade SSD longevity.

Extra problems with the copyright industry for no benefit.

Hope the owner's OpSec was good enough and we won't hear about their unmasking.


They have a 500k[1] reward for finding OPSEC failures, so I think they have the basics down.

[1]https://software.annas-archive.gl/AnnaArchivist/annas-archiv...


No way Anna’s archive has $500k

Why not? Are they going to scam the person who completes the Google Books bounty for 200k?

Extra? I thought they were clearly violating IP law to begin with. Unless I misunderstand this is "water is wet" territory (both the judgment as well as what Anna's Archive did).

Extra, because by pirating music they brought members of the recording industry (and implicitly the RIAA) into the equation https://en.wikipedia.org/wiki/Recording_Industry_Association...

Water isn't wet, but it does "wet" other things. Wetness is the degree to which a liquid contacts and adheres to a solid surface, so it makes no sense to say that water is wet.

I do not see any law being violated by Anna's Archive in the slightest.

Just because you disagree with a law doesn't mean that it doesn't exist. You anti-copyright shills are exhausting... Why can't you try to attract people to your side to eventually effect some real change? Do you take that much pleasure in being an edgelord, your cause be damned?

Just use it to train / tune a LLM. Apparently, everything becomes legal if you only put the stuff into the right kind of software.

That's at least what many people like to argue here on HN.


Anna's Archive wants[1] companies to train on their data.

[1] https://annas-archive.gl/blog/ai-copyright.html


Thanks a lot, that's an interesting read and they make an interesting case.

I would have thought all big AI companies used Anna's Archive, but apparently only some of the US based companies used them.


hmm you are right, I too wish the same brother

Contrast looks good for the text, but the font used has very thin strokes. A thicker font would have been readable on its own. At 250% page zoom it's good enough, if you don't enable the browser's built-in reader mode.
