> seems like soooo much efficiency waiting to be unlocked at the chip level
Well, if you exclusively use general-purpose GPUs, of course you leave a lot of efficiency on the table. That's why Google started making TPUs more than a decade ago. I remember the kerfuffle when Google fired Timnit Gebru after her paper used GPU figures to estimate the environmental impact of LLMs while ignoring the efficiency of TPUs; that wide efficiency gap is basically what made Jeff Dean so angry.
These Nvidia GPUs aren't general purpose in the way you think. They can't even run games. Nvidia Blackwell is probably slightly more efficient than TPUs for training. Do you really expect a $4 trillion company, with the majority of its revenue coming from AI for some years now, not to have built its flagship product entirely around AI? The GPU name stuck around, but they are pretty terrible at graphics.
The real efficiency win in these chips is that they are made for inference only. You can throw away the vast majority of a chip if you only need a few ops and a single precision (like INT8 or FP8), and don't need ultra-fast interconnects.
It kind of was. I really hate gaslighting, but GP is not inaccurate. Google claimed the paper did not meet its bar for publication because it ignored recent research on how to reduce the environmental and bias-related risks of LLMs. On the other hand, a large org is unlikely to subsidize high-profile research that makes it look bad. And Gebru was critical of Google's internal culture and diversity efforts…
I also suspect that the frequency of outdoor exercise matters even if the total duration stays the same. Subjectively, I feel much healthier doing thirty minutes of outdoor exercise six times a week than doing one hour three times a week. But then of course, the effect could come from a factor other than vitamin D (say, dopamine release).
That unfortunately doesn't match my experience at all. My Claude often runs rg in the repo to find things that need to be changed. And of course Claude still needs to invoke the build tool to make sure the change compiles, which necessarily involves reading almost every single file, at least on a fresh checkout, right? Or did you envision the build tool being completely remote?
If the goal is just to make it work like google3, then hg, jj, and Sapling can all already achieve this. There's no need for a new contender here. The differentiation must come from something else.
But of course, at Google the file system part (CitC) is a layer beneath the version control system and is shared across different VCS tools.
I do think hosting is an important part of the VCS story. I agree that hg, jj, and Sapling are capable of being front ends to a google3-like backend, but you also need a GitHub-like thing to support it (Google has this internally for jj). Of course some people are working on hosting solutions for these, but it feels wrong to me that hosting platforms and their underlying VCS are not made by the same team. IMO people like google3 so much because it's one integrated system, which is the approach I'm trying with Oak.
Well, even at Google the hosting solutions and the VCS are not made by the same team. I lack the imagination to see how being made by the same team would improve things, but that's on me. Good luck!
Public hg and jj are just front ends to git. There's no virtual file system overlay or anything like that. Meta has open-sourced many of the components of Sapling, but there is no plumbing to put them all together in the same configuration.
That's the maximum you can statically allocate in the BIOS. It's best to leave that at the minimum (500 MB, I think) and let the drivers allocate dynamically. You can use up to about 120 GB on Linux.
I was sad in a different way. I immediately realized that this could be solved with dynamic programming, computing the recurrence F(x,y)=F(x-1,y)+F(x,y-1) for (x,y)≠(0,0), with the base case F(0,0)=1 and F(x,y)=0 if x<0 or y<0. The problem is that I then jumped straight to generating functions as a tool to solve it. I defined G(u,v)=\sum_x \sum_y F(x,y) u^x v^y. After maybe ten minutes of manipulation I arrived at the closed form G(u,v)=1/(1-u-v). At that point I recognized its series expansion, whose coefficients are just given by the binomial theorem.
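For anyone curious about those ten minutes, here is one way the manipulation can go (a sketch; the steps actually taken may have differed). Multiply the recurrence by u^x v^y and sum over all x, y ≥ 0 except (0,0). Because F vanishes for negative arguments, each shifted sum collapses to u·G or v·G, and the rest is the geometric series plus the binomial theorem:

```latex
\begin{aligned}
G(u,v) - 1 &= \sum_{(x,y)\neq(0,0)} \bigl[F(x-1,y) + F(x,y-1)\bigr]\, u^x v^y
            = u\,G(u,v) + v\,G(u,v), \\
G(u,v) &= \frac{1}{1-u-v} = \sum_{n\ge 0} (u+v)^n
        = \sum_{n\ge 0} \sum_{x+y=n} \binom{n}{x} u^x v^y, \\
F(x,y) &= \binom{x+y}{x}.
\end{aligned}
```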
I feel sad because I had forgotten the simple and intuitive construction of choosing "go down" and "go right" directions. When people learn more advanced mathematics, they often apply it by rote without realizing that a solution can be found with more elementary mathematics and more creativity. It reminded me of middle school, before derivatives were taught, when my teacher warned me that using derivatives to solve a problem would receive no credit.
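To check that the dynamic-programming recurrence and the elementary "choose which steps go right" count agree, here is a minimal Python sketch. The function name `count_paths` and the 6×6 grid size are illustrative choices, not from the thread. It fills the table bottom-up and compares every entry with C(x+y, x), the number of ways to pick which x of the x+y moves go right (the rest go down).

```python
from math import comb


def count_paths(x_max: int, y_max: int) -> list[list[int]]:
    """Return the table F[x][y] for 0 <= x <= x_max, 0 <= y <= y_max.

    Uses F(x, y) = F(x-1, y) + F(x, y-1) with F(0, 0) = 1 and
    F(x, y) = 0 whenever x < 0 or y < 0 (handled by the bounds checks).
    """
    F = [[0] * (y_max + 1) for _ in range(x_max + 1)]
    for x in range(x_max + 1):
        for y in range(y_max + 1):
            if x == 0 and y == 0:
                F[x][y] = 1  # base case
                continue
            from_x = F[x - 1][y] if x > 0 else 0  # F(x-1, y), zero off the grid
            from_y = F[x][y - 1] if y > 0 else 0  # F(x, y-1), zero off the grid
            F[x][y] = from_x + from_y
    return F


# Every DP entry matches the binomial count C(x+y, x).
table = count_paths(6, 6)
for x in range(7):
    for y in range(7):
        assert table[x][y] == comb(x + y, x)

print(table[2][2])  # prints 6
```

The table costs O(x·y) time, while the binomial formula answers a single query directly. That is the trade-off between the mechanical approach and the slap-my-forehead one.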
There is nothing wrong with using generating functions. They are a very handy and powerful tool. I wish I were better at them than I am.
It is a common experience in mathematical problem solving that the first solution leads to more insight, which illuminates a shorter, slap-my-forehead solution -- bruised forehead.
Yeah, when I read a model's chain of thought I have a tendency to interrupt it because it's going in the wrong direction. But usually the end result is still fine.