It's the next step removed from the tablet-based ordering that has taken over in restaurants. Like those tablets, it won't be everywhere, but it's easy to imagine it becoming ubiquitous, especially in chain stores.
Block autoregressive generation can give you big speedups.
Consider that outputting two tokens at a time will be a (2-epsilon)x speedup over running one token at a time. As your block size increases, you quickly get fast enough that it doesn't matter so much whether you're doing blocks or actual all-at-once generation. What matters, then, is the quality trade-off for moving to block-mode output. And here it sounds like they've minimized that trade-off.
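The (2-epsilon)x intuition can be sketched with a toy latency model. All the numbers below are made up for illustration: assume each forward pass has a fixed overhead (weight/KV-cache traffic) plus a small per-token cost for the extra positions decoded in one pass.

```python
def pass_time(block, overhead=0.040, per_token=0.002):
    # One forward pass: fixed overhead plus a small marginal cost
    # per token emitted in that pass (assumed numbers, in seconds).
    return overhead + per_token * block

def speedup(block):
    # Time to emit `block` tokens one at a time, divided by the
    # time to emit them in a single block-sized pass.
    return (block * pass_time(1)) / pass_time(block)

for b in (1, 2, 4, 8, 32):
    print(f"block={b:>2}: {speedup(b):.2f}x")
```

Under these assumptions a block of 2 gives about 1.9x (the "2 minus epsilon"), and larger blocks keep helping but with diminishing returns as the per-token cost starts to dominate, which is why past some block size the remaining question is quality, not speed.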
Can it go back and use future blocks as context? That's what I'm most interested in here - fixing line 2 because of a change/discovery we made in the process of writing line 122. I think that problem is a big part of the shortsightedness of current coding models.
Exactly. The current (streaming) way means that once it makes a decision, it's stuck with it. For example, variable naming: once it names a variable something, it's stuck using that name from then on. Whereas a human would just go back and change the name.
Maybe "thinking" will fix this aspect, but I see it as a serious shortcoming.
The special thing is that it’s decentralized. I know this discussion will not resolve and I’m not a blockchain zealot. I do think it’s an elegant decentralized storage system for algorithmic art where you make outputs definitive and collectible after initiating a run.
I think that's more than fair - "I like blockchain for decentralized proof of ownership more than other methods for the same." is as fine a preference as any other, of course.
I'm kind of excited about that though. What I've come to realize is that automated testing and linting and good review tools are more important than ever, so we'll probably see some good developments in these areas. This helps both humans and AIs so it's a win win. I hope.
> it's looking like assessment and evaluation are massive bottlenecks.
So I think LLMs have moved the effort that used to be spent on the fun part (coding) into the boring part (assessment and evaluation), which is also now a lot bigger.
You could build (code, if you really want) tools to ease the review. Of course we already have many tools to do this, but with LLMs you can use their stochastic behavior to discover unexpected problems (something a deterministic solution never can). The author also talks about this when talking about the security review (something I rarely did in the past, but also do now and it has really improved the security posture of my systems).
You can also set up far more elaborate verification systems. Don't just do a static analysis of the code, but actually deploy it and let the LLM hammer at it with all kinds of creative paths. Then let it debug why it's broken. It's relentless at debugging - I've found issues in external tools I normally would've let go (maybe created an issue for), that I can now debug and even propose a fix for, without much effort on my side.
So yeah, I agree that the boring part has become the more important part right now (speccing well and letting it build what you want is pretty much solved), but let's then automate that. Because if anything, that's what I love about this job: I get to automate work, so that my users (often myself) can be lazy and focus on stuff that's more valuable/enjoyable/satisfying.
When writing banal code, you can just ask it to write unit tests for certain conditions and it'll do a pretty good job. The cutting-edge tools will automatically run and iterate on the unit tests when they don't pass. You can even ask the agent to set up TDD.
Cars removed the fun part (raising and riding horses) and automatic transmissions removed the fun part (manual shifting), but for most people it's just a way to get from point A to B.
It's far more sane to review a complete PR than to verify every small change. They are like dicey new interns - do you want to look over their shoulder all day, or review their code after they've had time to do some meaningful quantum of work?
> It's far more sane to review a complete PR than to verify every small change.
Especially when the harness loop works if you let it work. First pass might have syntax issues. The loop will catch it, edit the file, and the next thing pops up. Linter issues. Runtime issues. And so on. Approving every small edit and reading it might lead to frustrations that aren't there if you just look at the final product (that's what you care about, anyway).
Bad idea, shutter speed was 1/4 apparently (https://news.ycombinator.com/item?id=47632457), even the small rotational inertia everything in zero gravity gets from a human "dropping" it would probably be enough to be annoying, you'd get a better shot holding it.
I would love to see the mirror's effect on the motion of the camera in a weightless environment. I bet it's enough to measurably affect the picture, especially on a long exposure. The net torque of it opening and then closing should be near (but probably not exactly) zero, but while it's open the camera should rotate a tiny amount.
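A back-of-envelope version of that intuition, with angular momentum conserved while the mirror is up. Every number here is an assumption for illustration (not the specs of any real camera body), including the 1/4 s exposure mentioned upthread:

```python
import math

camera_moment = 0.005       # kg*m^2, moment of inertia of the body (assumed)
mirror_ang_momentum = 2e-4  # kg*m^2/s imparted by the mirror swing (assumed)
exposure = 0.25             # s, the 1/4 s shutter speed mentioned upthread

# While the mirror is swung up, the free-floating body counter-rotates
# at L / I; when the mirror swings back down, the rotation stops.
body_rate = mirror_ang_momentum / camera_moment   # rad/s
rotation_deg = math.degrees(body_rate * exposure)
print(f"~{rotation_deg:.2f} degrees of rotation during the exposure")
```

With these made-up numbers you get roughly half a degree of rotation over the exposure, which would be very visible in the frame; the real effect depends entirely on how much angular momentum the mirror mechanism actually imparts.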
I haven’t looked at the manual, but the camera likely has a mirror lock-up mode for direct capture on the sensor, without the mirror flipping up and down between exposures.