> pretty diligent about applying search blocklists, closing hacking loopholes, and reading model outputs to catch unanticipated hacks. If we wanted to, we could choose to close our eyes and plug our ears and report higher scores for Terminal-bench, SWE-bench, etc. that technically comply with the reference implementation but aren't aligned with real value delivered to users
Of course, but that's the difference between sins of commission and sins of omission. The question is what "pretty diligent" actually translates to in practice. How many people will push to delay a model release or a post-training improvement to wait "for more thorough evaluation"? And for how many popularized AI results can you vouch on this front?
The zeitgeist is to celebrate bias for action, avoiding analysis paralysis and shipping things (esp. with conference-driven research culture, even before we get into thorny questions of market dynamics). So even if we have a few pockets of meticulous excellence, the incentive structure pushes the whole field toward rot.
Is the cost of laying fiber (via a public utility) to each household counted as part of the monthly internet bill, or is that funded separately? (e.g., as part of property taxes)
Yes, and that becomes more intuitive when you "un-curry" the nested lambdas into a single lambda with twice the number of arguments. The point is that the value of a constant does not depend whatsoever on the state of the (rest of the) world, however much of that state piles on.
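A minimal Python sketch of that un-currying (the constant `42` and the "world state" argument names are purely illustrative):

```python
# Curried form: nested lambdas, each application peels off one argument
# representing a piece of "world state".
curried_const = lambda world1: lambda world2: 42

# Un-curried equivalent: one lambda taking both arguments at once.
uncurried_const = lambda world1, world2: 42

# Either way, the result ignores every argument -- a constant doesn't
# depend on how much world state piles on.
assert curried_const("rain")("tuesday") == uncurried_const("rain", "tuesday") == 42
```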
Yes. Cars occupy fairly fixed cost ranges for people, so the end result is pretty much predetermined: electric cars settling at the same price and quality points that ICE cars occupy today.
> what's wrong with legitimately not knowing what e.g. the data structure will end up looking?
But that's not what the above comment said.
> Just let it run, check debugger/stdout/localhost page and adjust: "Oh, right, the entries are missing canonical IDs, but at the same time there are already all the comments in them, forgot they would be there"
So you did have an expectation that the entries should have some canonical IDs, and anticipated/desired a certain specific behavior of the system.
Which is basically the meaning of "what will the output be?" when simplified for programming novices at university.
I wonder whether the blast radius of the law might interfere with OSes running on cloud machines. That might explain why California-based companies in the cloud business might want to ensure that the bits they resell are compliant.
To elaborate on @jeswin's point above (IDK why it got downvoted)... a data structure is basically like a cache for the processing algorithm. The business logic and the algorithm's needs dictate which details can be computed on the fly vs. pre-computed and stored (whether in RAM or on disk). E.g.: if you're going to be searching a lot, then it makes sense to augment the database with some kind of "index" for fast lookup. Or if you're repeatedly going to plot some derived quantity, then maybe it makes sense to derive it once and store it with the struct.
It's not enough for a data structure to represent the "fundamental" degrees of freedom needed to model the situation; the algorithmic needs (vis-a-vis the available resources) most definitely matter a lot.
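As a toy illustration of the data-structure-as-cache idea (the records and field names below are made up for the example, not from the thread):

```python
# Hypothetical records; the "fundamental" degrees of freedom are just id/name.
records = [{"id": 3, "name": "ada"}, {"id": 7, "name": "grace"}]

# Linear scan: O(n) per lookup, no extra storage.
def find_linear(rid):
    return next(r for r in records if r["id"] == rid)

# If lookups dominate the workload, augment with an index:
# O(1) per lookup, at the cost of extra RAM -- the algorithm's
# access pattern, not the data alone, motivates this structure.
index = {r["id"]: r for r in records}

assert find_linear(7) is index[7]
```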
Bad analogy. The things I delegate to a calculator, I'm absolutely sure I understand well (and could debug if need be). These are also very legible skills that are easy to remind myself by re-reading the recipe -- so I'm not too worried about skills "atrophying".
> speculative decoding for bread and butter frontier models. The thing that I’m really very skeptical of is the 2 month turnaround. To get leading edge geometry turned around on arbitrary 2 month schedules is .. ambitious
Can we use older (previous-generation, smaller) models as speculative decoders for the current model? I don't know whether the randomness in training (weight init, data ordering, etc.) would break this kind of use. But to the extent that these models are learning the "true underlying token distribution", it should be possible in principle. If that's the case, speculative decoding is an elegant vector for introducing this kind of tech, and the turnaround time is even less of a problem.
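For what it's worth, here is a toy sketch of the standard speculative-sampling acceptance rule, with hard-coded stand-in distributions (in practice the draft and target would be the older and current models; nothing here comes from an actual system):

```python
import random

VOCAB = ["a", "b", "c"]

def draft_probs(ctx):    # cheap draft model (e.g., an older, smaller model)
    return {"a": 0.5, "b": 0.3, "c": 0.2}

def target_probs(ctx):   # expensive target model whose distribution we must match
    return {"a": 0.6, "b": 0.3, "c": 0.1}

def speculative_step(ctx):
    q, p = draft_probs(ctx), target_probs(ctx)
    # 1. Sample a proposal token from the cheap draft distribution q.
    tok = random.choices(VOCAB, weights=[q[t] for t in VOCAB])[0]
    # 2. Accept it with probability min(1, p/q); this keeps the overall
    #    output distribution exactly equal to the target's p.
    if random.random() < min(1.0, p[tok] / q[tok]):
        return tok
    # 3. On rejection, resample from the residual max(0, p - q), renormalized.
    residual = {t: max(0.0, p[t] - q[t]) for t in VOCAB}
    z = sum(residual.values())
    return random.choices(VOCAB, weights=[residual[t] / z for t in VOCAB])[0]
```

The key property is that acceptance/rejection exactly preserves the target distribution, so a mismatched draft model only costs speed (more rejections), never correctness.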
https://github.com/shell-pool/shpool