But that's exactly the root of the complaint. Because there's (for the sake of argument) only one syntactic concept, there's no bandwidth for structural concepts to be visible in the syntax. If you're used to a wide variety of symbols carrying the structural meaning (and we're humans, we can cope with that) then `)))))))` has such low information density as to be a problematic road bump. It's not that the syntax is hard to learn, it's that everything else you need to build a program gets flattened and harder to understand as a result.
Even among Lisps this has been problematic: you can look at Common Lisp's LOOP macro as an attempt to squeeze more structural meaning into a non-S-expression format.
My money's on whatever models Qwen does release edging ahead. Probably not by much, but I reckon they'll be better coders just because that's where Qwen's edge over Gemma has always been. Plus after having seen this land they'll probably tack on a couple of epochs just to be sure.
From what I've read, that's already part of their training. They are scored based on each step of their reasoning and not just their solution. I don't know if it's still the case, but for the early reasoning models, the "reasoning" output was more of a GUI feature to entertain the user than an actual explanation of the steps being followed.
At this point my bet is that the breakthrough isn't going to be qubits per chip, it's going to be entanglements-per-second in quantum networking. If you could string together simpler processors in a cluster at anything approaching interesting scales then all of a sudden the orders of magnitude become a lot less constrained and it's just a money problem.
Quantum networking is a lesser problem than changing the state and keeping it intact long enough. You can already move quantum state over fiber optics pretty reliably, so transport exists, but what then? You need to put the qubits of the connected chip into the corresponding state (which takes time), and do it many times, and all that time is overhead.
Superconducting QCs are fast, but the state degrades incredibly quickly, so you only have a fraction of a second (maybe a millisecond at best, currently) until the entire state is garbage. Some other modalities like trapped ions are the opposite: the state can live for a long time, but each operation is orders of magnitude slower.
For quite a long time there will be a greater advantage to local processing for STT than for TTT chat, or even OCR. Being able to do STT on the device that owns the microphone means that the bandwidth off that device can be dramatically reduced, if it's even necessary for the task at hand.
The OS distro model is actually the right one here. Upstream authors hate it, but having a layer that's responsible for picking versions out of the ecosystem and compiling an internally consistent grouping of known mutually-compatible versions that you can subscribe to means that a lot of the random churn just falls away. Once you've got that layer, you only need to be aware of security problems in the specific versions you care about, you can specifically patch only them, and you've got a distribution channel for the fixes where it's far more feasible to say "just auto-apply anything that comes via this route".
That model effectively becomes your ring 1. Ring 0 is the stdlib and the package manager itself, and - because you would always need to be able to step outside the distribution for either freshness or "that's not been picked up by the distro yet" reasons - the ecosystem package repositories are the wild west ring 2.
Among language ecosystems, the only ones I'm aware of that work like this are Quicklisp/Ultralisp and Haskell's Stackage. Everything else is effectively a rolling distro that hasn't realised that's what it is yet.
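To make the ring idea concrete, here's a minimal sketch of how a resolver might classify a request. Everything in it is hypothetical: the package names, the versions, and the `ring_of`/`resolve`/`auto_apply` helpers are made up for illustration and aren't how Quicklisp or Stackage actually implement anything.

```python
from typing import Optional, Tuple

# All names and versions below are invented for illustration.
RING0 = {"stdlib", "pkgmgr"}  # ships with the toolchain itself
SNAPSHOT = {                  # ring 1: curated, mutually compatible pins
    "http-client": "2.4.1",
    "json": "1.9.0",
}


def ring_of(package: str) -> int:
    """Classify a package name: 0 = toolchain, 1 = curated snapshot, 2 = wild west."""
    if package in RING0:
        return 0
    if package in SNAPSHOT:
        return 1
    return 2


def resolve(package: str, requested: Optional[str] = None) -> Tuple[Optional[str], int]:
    """Return (version, ring) for a request.

    Asking for a version other than the snapshot's pin means stepping
    outside the distribution, so that request is treated as ring 2.
    """
    ring = ring_of(package)
    if ring == 0:
        return None, 0  # nothing to pick, it comes with the toolchain
    if ring == 1 and requested in (None, SNAPSHOT[package]):
        return SNAPSHOT[package], 1
    return requested, 2


def auto_apply(ring: int) -> bool:
    """Fixes arriving via ring 0 or ring 1 are the ones you'd auto-apply."""
    return ring <= 1
```

The point of the split is that `auto_apply` only has to trust the curated channel; anything resolved as ring 2 is something you've explicitly opted into tracking yourself.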
On the one hand it's clearly suboptimal for any change, even ones that nothing depends on, to trigger a recompute. But also it feels like there's something a bit broken with spreadsheet dependency resolution in the first place. I've never been able to nail down a test case, but models seem to go over a performance cliff at a certain point. Ordinarily I'd put it down to something being unavoidably quadratic, but I've had cases where I'm certain that the same model is radically slower after being reloaded off disk.
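For what it's worth, the "only recompute what depends on the change" behaviour is easy to sketch. This is a toy model under stated assumptions, not how any real spreadsheet engine works: the `Sheet` class and its methods are made up, formulas are plain Python callables, the graph is assumed acyclic, and redefining a formula doesn't clean up its old dependency edges.

```python
from collections import defaultdict, deque


class Sheet:
    """Toy spreadsheet that recomputes only cells downstream of a change."""

    def __init__(self):
        self.values = {}                    # cell -> current value
        self.formulas = {}                  # cell -> (callable, [dependency cells])
        self.dependents = defaultdict(set)  # cell -> cells that read it
        self.recomputed = []                # log of evaluations, for inspection

    def set_value(self, cell, value):
        self.values[cell] = value
        self._recompute_downstream(cell)

    def set_formula(self, cell, fn, deps):
        self.formulas[cell] = (fn, list(deps))
        for dep in deps:
            self.dependents[dep].add(cell)
        self._evaluate(cell)
        self._recompute_downstream(cell)

    def _evaluate(self, cell):
        fn, deps = self.formulas[cell]
        self.values[cell] = fn(*(self.values[d] for d in deps))
        self.recomputed.append(cell)

    def _recompute_downstream(self, changed):
        # 1. Collect every cell transitively reachable from the changed one.
        affected = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for reader in self.dependents[current]:
                if reader not in affected:
                    affected.add(reader)
                    queue.append(reader)
        # 2. Evaluate them in dependency order (Kahn's algorithm), so each
        #    affected cell is evaluated exactly once, after its inputs.
        pending = {
            cell: sum(1 for d in self.formulas[cell][1] if d in affected)
            for cell in affected
        }
        ready = deque(sorted(c for c, n in pending.items() if n == 0))
        while ready:
            current = ready.popleft()
            self._evaluate(current)
            for reader in sorted(self.dependents[current]):
                if reader in affected:
                    pending[reader] -= 1
                    if pending[reader] == 0:
                        ready.append(reader)


sheet = Sheet()
sheet.set_value("A1", 2)
sheet.set_value("B1", 10)
sheet.set_formula("A2", lambda a: a * 3, ["A1"])
sheet.set_formula("A3", lambda a, b: a + b, ["A1", "A2"])
sheet.set_formula("B2", lambda b: b + 1, ["B1"])

sheet.recomputed.clear()
sheet.set_value("A1", 5)  # touches A2 and A3, leaves B2 alone
```

With this shape, a change costs time proportional to the affected subgraph rather than the whole sheet, which is why a recompute triggered by a cell that nothing depends on looks like a missed optimisation rather than something inherent.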