
regularfry · 2026-04-02T19:44:21 1775159061

But that's exactly the root of the complaint. Because there's (for the sake of argument) only one syntactic concept, there's no bandwidth for structural concepts to be visible in the syntax. If you're used to a wide variety of symbols carrying the structural meaning (and we're humans, we can cope with that) then `)))))))` has such low information density as to be a problematic road bump. It's not that the syntax is hard to learn, it's that everything else you need to build a program gets flattened and harder to understand as a result.

Even among Lisps this has been a problem: Common Lisp's LOOP macro can be read as an attempt to squeeze more structural meaning into a non-S-expression format.

regularfry · 2026-04-02T17:16:45 1775150205

My money's on whatever models Qwen does release edging ahead. Probably not by much, but I reckon they'll be better coders just because that's where Qwen's edge over Gemma has always been. Plus, after having seen this land, they'll probably tack on a couple of epochs just to be sure.

regularfry · 2026-04-02T17:13:57 1775150037

Thinking vs non-thinking. There'll be a token cost there. But still fairly remarkable!

DoctorOetker · 2026-04-02T17:42:27 1775151747

Is there a reason we can't use thinking completions to train non-thinking? i.e. gradient descent towards what thinking would have answered?

joshred · 2026-04-02T18:08:07 1775153287

From what I've read, that's already part of their training. They are scored based on each step of their reasoning and not just their solution. I don't know if it's still the case, but for the early reasoning models, the "reasoning" output was more of a GUI feature to entertain the user than an actual explanation of the steps being followed.

regularfry · 2026-04-02T16:18:13 1775146693

At this point my bet is that the breakthrough isn't going to be qubits per chip, it's going to be entanglements-per-second in quantum networking. If you could string together simpler processors in a cluster at anything approaching interesting scales then all of a sudden the orders of magnitude become a lot less constrained and it's just a money problem.

freetonik · 2026-04-02T17:29:08 1775150948

Quantum networking is a lesser problem than changing the state and keeping it intact long enough. You can already move quantum state over fiber optics pretty reliably, so transport exists, but what then? You need to put the qubits of the connected chip into the corresponding state (which takes time), and do it many times, and all that time is overhead.

Superconducting QCs are fast, but the state degrades incredibly quickly, so you only have a fraction of a second (maybe a millisecond at best, currently) until the entire state is garbage. Some other modalities like trapped ion are the opposite: state can live long, but each operation is orders of magnitude slower.

regularfry · 2026-04-02T15:16:12 1775142972

Everyone may want the best, but the amount of AI-addressable work outstrips the budget available for buying the best by quite a wide margin.

regularfry · 2026-04-02T10:20:36 1775125236

Looks like an incremental improvement, technically. Seems to benchmark around Kimi K2.5 but it's cheaper and faster.

regularfry · 2026-03-31T22:41:05 1774996865

For quite a long time there will be a greater advantage to local processing for STT than for TTT chat, or even OCR. Being able to do STT on the device that owns the microphone means that the bandwidth off that device can be dramatically reduced, if it's even necessary for the task at hand.
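To put a rough number on that bandwidth reduction, here is a back-of-envelope sketch. Every figure in it is an assumption rather than something from the comment: 16 kHz 16-bit mono PCM for the raw audio, about 150 words per minute of speech, and about 6 bytes per transcribed word.

```python
# Back-of-envelope comparison: streaming raw audio off the device
# versus sending only the transcript. All constants are assumptions.
SAMPLE_RATE_HZ = 16_000   # assumed sample rate for speech capture
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
WORDS_PER_MINUTE = 150    # assumed conversational speaking rate
BYTES_PER_WORD = 6        # assumed ~5 ASCII letters plus a space


def raw_audio_bytes_per_second() -> int:
    """Bandwidth needed to ship uncompressed audio off the device."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def transcript_bytes_per_second() -> float:
    """Bandwidth needed to ship only the recognised text."""
    return WORDS_PER_MINUTE * BYTES_PER_WORD / 60


def reduction_factor() -> float:
    """How many times smaller the transcript stream is."""
    return raw_audio_bytes_per_second() / transcript_bytes_per_second()


if __name__ == "__main__":
    print(f"raw audio:  {raw_audio_bytes_per_second()} B/s")
    print(f"transcript: {transcript_bytes_per_second()} B/s")
    print(f"reduction:  ~{reduction_factor():.0f}x")
```

Under those assumptions the transcript stream is roughly three orders of magnitude smaller than the raw audio. A speech codec would narrow the gap, but text would still be much smaller.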

regularfry · 2026-03-31T08:57:28 1774947448

The OS distro model is actually the right one here. Upstream authors hate it, but having a layer that's responsible for picking versions out of the ecosystem and compiling an internally consistent grouping of known mutually-compatible versions that you can subscribe to means that a lot of the random churn just falls away. Once you've got that layer, you only need to be aware of security problems in the specific versions you care about, you can specifically patch only them, and you've got a distribution channel for the fixes where it's far more feasible to say "just auto-apply anything that comes via this route".

That model effectively becomes your ring 1. Ring 0 is the stdlib and the package manager itself, and - because you would always need to be able to step outside the distribution for either freshness or "that's not been picked up by the distro yet" reasons - the ecosystem package repositories are the wild west ring 2.
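A minimal sketch of how that ring lookup might work, purely as an illustration. The package names, versions, and the `resolve` and `relevant_advisories` helpers are all hypothetical and don't correspond to any real package manager.

```python
# Hypothetical three-ring resolution. All names and versions are made up.
RING0 = {"stdlib": "3.12", "pkgmgr": "1.0"}                 # ships with the language
SNAPSHOT = {"http-client": "2.4.1", "json-parser": "1.9.0"}  # ring 1: curated, mutually compatible
UPSTREAM = {                                                 # ring 2: the ecosystem repository
    "http-client": ["2.4.1", "2.5.0", "3.0.0"],
    "json-parser": ["1.9.0", "2.0.0"],
    "shiny-new-lib": ["0.1.0"],
}


def resolve(name, allow_upstream=False):
    """Return (ring, version), preferring the most trusted ring."""
    if name in RING0:
        return (0, RING0[name])
    if name in SNAPSHOT:
        return (1, SNAPSHOT[name])
    if allow_upstream and name in UPSTREAM:
        # Stepping outside the distribution: take the newest upstream release.
        return (2, UPSTREAM[name][-1])
    raise LookupError(f"{name} is not in the snapshot")


def relevant_advisories(advisories):
    """Keep only advisories that hit a version pinned in the snapshot."""
    return [(n, v) for (n, v) in advisories if SNAPSHOT.get(n) == v]


if __name__ == "__main__":
    print(resolve("http-client"))                          # pinned by the snapshot
    print(resolve("shiny-new-lib", allow_upstream=True))   # explicit opt-in to ring 2
    print(relevant_advisories([("http-client", "3.0.0"), ("json-parser", "1.9.0")]))
```

The point of the sketch is the last function: with a pinned snapshot, an advisory against `http-client` 3.0.0 can be ignored, while one against the pinned `json-parser` 1.9.0 is the only one that needs a patch.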

In the language ecosystems I'm only aware of Quicklisp/Ultralisp and Haskell's Stackage that work like this. Everything else is effectively a rolling distro that hasn't realised that's what it is yet.

regularfry · 2026-03-26T08:55:46 1774515346

Its existence has been used by the devs as a reason not to prioritise fixing user-facing bugs. It really should be in core at this point.

regularfry · 2026-03-26T08:54:31 1774515271

On the one hand it's clearly suboptimal for any change, even one that nothing depends on, to trigger a recompute. But also it feels like there's something a bit broken with spreadsheet dependency resolution in the first place. I've never been able to nail down a test case, but models seem to go over a performance cliff at a certain point. Ordinarily I'd put it down to something being unavoidably quadratic, but I've had cases where I'm certain that the same model is radically slower after being reloaded off disk.
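For contrast, here is a toy sketch of dependency-driven recalculation, where a change only recomputes the cells downstream of it. It isn't how any particular spreadsheet works. The `Sheet` class and its methods are hypothetical, and it skips cycle handling and cleanup of stale dependencies when a formula is redefined.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Sheet:
    """Toy sheet that recalculates only cells downstream of a change."""

    def __init__(self):
        self.values = {}      # cell name -> current value
        self.formulas = {}    # cell name -> (function, input cell names)
        self.dependents = {}  # cell name -> set of cells that read it
        self.recomputed = []  # log of recalculated cells, for inspection

    def set_value(self, name, value):
        self.values[name] = value
        self._recalculate(self._downstream(name))

    def set_formula(self, name, func, inputs):
        self.formulas[name] = (func, tuple(inputs))
        for cell in inputs:
            self.dependents.setdefault(cell, set()).add(name)
        self._recalculate({name} | self._downstream(name))

    def _downstream(self, name):
        """All cells that directly or transitively read `name`."""
        seen, stack = set(), [name]
        while stack:
            for dep in self.dependents.get(stack.pop(), ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def _recalculate(self, dirty):
        # Order the dirty cells so each one runs after its dirty inputs.
        graph = {c: {i for i in self.formulas[c][1] if i in dirty} for c in dirty}
        for cell in TopologicalSorter(graph).static_order():
            func, inputs = self.formulas[cell]
            self.values[cell] = func(*(self.values[i] for i in inputs))
            self.recomputed.append(cell)


sheet = Sheet()
sheet.set_value("A1", 2)
sheet.set_value("A2", 3)
sheet.set_value("B1", 10)
sheet.set_formula("A3", lambda a, b: a + b, ["A1", "A2"])
sheet.set_formula("A4", lambda x: x * 2, ["A3"])
sheet.recomputed.clear()

sheet.set_value("B1", 99)  # nothing reads B1, so nothing is recalculated
sheet.set_value("A1", 5)   # A3 depends on A1, and A4 depends on A3
print(sheet.recomputed, sheet.values["A4"])
```

Each edit costs time proportional to the size of the affected subgraph, so a cliff like the one described would have to come from somewhere else, such as how the graph is rebuilt on load. That last part is speculation.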