Hacker News | d_burfoot's comments

> So just manual memory management with extra steps

This is actually the perfect situation: you are allowed to do it carefully and manually for 1% of code on the hot path, but you don't have to worry about it for the 99% of the code that's not.


I don't disagree with these principles, but if I wanted to compress all my programming wisdom into 5 rules, I wouldn't spend 3 out of the 5 slots on performance. Performance is just a component of correctness: if you have a good methodology to achieve correctness, you will get performance along the way.

My #1 programming principle would be phrased using a concept from John Boyd: make your OODA loops fast. In software this can often mean simple things like "make compile time fast" or "make sure you can detect errors quickly".


Gah, don't take advice about doing a PhD from the dude who had the best possible academic experience! The vast majority of people who've gone through the PhD grinder have had radically worse outcomes than Karpathy. It's like taking advice about starting a cult from Joseph Smith.

(This is not to say you shouldn't do it. Just get info and advice from a less biased source).


The business plan makes sense to me. They are a company that is focussed specifically on building AI data centers, which is a huge part of the economy at the moment. The big cloud players know about generic data centers, but there are likely big efficiency wins to be gained by specializing on AI. There is also the geopolitical angle: European countries (and others!) will likely trust a UK-based company more than one of the American BigCos. NVidia is a great partner and investor for them: NScale will buy billions worth of NVDA chips, and also send information and learnings about the unique needs of the market to the chipmaker.

That being said, financial engineering tricks like depreciation and tax sheltering are of course hugely important in the global economy. It's likely that NVDA has a lot of cash sitting in Europe that it doesn't want to repatriate because it would have to pay taxes on it.


It makes sense till it doesn’t. History tells us that suppliers funding demand like this tends to end badly.


> there are likely big efficiency wins to be gained by specializing on AI

I've seen this suggested before as well, but I don't think I've heard a lot of actual concrete things. What big efficiency wins are to be had specializing on "AI" datacenters as opposed to what the past mega hyperscalers have done? What techniques seem to be out there that cloud providers and others have slept on? What makes them so different, in terms of operating a datacenter?

I'm genuinely asking.


I know nothing about the inner workings of a large-scale AI datacenter, but I'd imagine the power and cooling requirements are more specialised than those of your average datacenter, which mostly has to handle transmitting large amounts of data over the internet. Not that that isn't computationally expensive; I just imagine LLMs, especially at the current scale of their deployment, are much more demanding.


> I'd imagine the power and cooling requirements are more specialised than your average datacenter

But are they actually doing things differently than the high-compute parts of the hyperscaled datacenters? Are there radical new ways of distributing heat in the datacenter that only make sense at that level of energy usage per square foot? Is AI energy use that much higher per square foot than other high-compute parts of datacenters, or is it just that it's now something like 90% of the floor plan versus maybe only 50-60%?

> handle transmitting large amounts of data over the internet

I certainly can't speak for all datacenters, and I've never been in a hyperscaler datacenter. But of all the datacenters I've spent time in, the space for the outside network connectivity was rather small compared to the rest of the space for storage and compute. Think a few small office suites dedicated to outside networks coming in and connecting to the clients in the datacenter compared to a medium to large sized warehouse full of compute and storage.


There's "high compute", and then there's proper HPC. AI these days is way more on the HPC end of the scale. The GPUs are doing computations using 2-bit and 4-bit numbers and not 64-bit, but everything else is going to be comparable.


Interesting historical anecdote: the Swiss became the world's best watchmakers because, in Protestant Geneva under the leadership of John Calvin, jewelry was banned as ostentation. But you were allowed to wear a watch - it was important to get to church and work on time - so people started wearing expensive watches instead of jewelry.


Interesting. It's my understanding that, in a similar vein, one of the reasons Belgium got so good at making tasty, stronger beers is that wine and spirits were banned in the past.

It's not quite 'necessity' being the mother of invention, but perhaps desire.


Not only that! You know the typical image of a witch? With a pointy hat and a giant round-shaped pot?

Those were illegal beer brewers in Belgium! Women would put their pointy hat on their door as a sign that there might be, if you ask nicely, some beer for you to buy there.


Combining the previous two comments: I once heard that there are chips/french fries all over the ground in Belgium because if you ask a Belgian the time, they look at their wristwatch, immediately inverting the container of fries they are undoubtedly carrying.

Not sure about Belgian potato or watch branding.


It's a bad idea to phrase advice as "Don't Do X", for most values of X that are often undertaken:

- Don't move to Detroit

- Don't go into academia

- Don't use dating apps

- Don't buy Google stock

It's most obvious for the last one: you should buy Google (or any other) stock if you think it's underpriced and sell it if you think it's overpriced. But even for the other advice, a kind of Efficient Market Hypothesis holds. If there were a massive exodus of people from academia, causing universities to increase salaries and reduce administrative burdens, going into academia might be great for the right people. For many people Detroit is a terrible city, but I know a guy who worked for the Tigers, bought a large house for a small amount of money, and did a lovely job renovating it, so Detroit worked well for him.

Life is all about finding underpriced value: options that you will appreciate more than others, for whatever reason.


These kinds of stories may seem silly to some (certainly it would seem silly to my past self), but I think these narratives of personal journeys are going to become more and more important to humanity as AI and automation take over most jobs.


> they mimic and amplify the inherent racism present in their own training data

LLMs turn out to be biased against white men:

https://www.lesswrong.com/posts/me7wFrkEtMbkzXGJt/race-and-g...

> When present, the bias is always against white and male candidates across all tested models and scenarios. This happens even if we remove all text related to diversity.


Important sentences immediately before the ones you quote.

> For our evaluation, we inserted names to signal race / gender while keeping the resume unchanged. Interestingly, the LLMs were not biased in the original evaluation setting, but became biased (up to 12% differences in interview rates) when we added realistic details like company names (Meta, Palantir, General Motors), locations, or culture descriptions from public careers pages.


Hah. Even LLMs know Meta and Palantir are evil af.


This is because of post-training. You have to give models such directives in post-training to correct the biases they bring in from scraping the whole internet (and other datasets, like books) for data.


Looking at the paper, the effect is significant but weak (5-7%), even with the conditionals that magnify the effect. I would be curious to see the effect if this experiment were performed on a slightly different categorical variable (e.g. how two white ethnicities are treated). I do think it's bad if preferences are "baked in" to the default, though; prompting them away seems like a bad solution.


That's not a reliable source.


I mean this seriously: we need more cults.

Cults have been viciously slandered by mainstream information sources, often because lurid cult stories generate clicks and headlines. Of course some cults are abusive, just like some marriages are abusive. But we still think marriage is good in general.

If you think all cults are bad, you're implicitly against all religion, since every mainstream religion was once a cult. Being anti-cult is also profoundly un-American. America was built by cultists. Freedom of religion is literally the first principle stated in the Bill of Rights.

A cult is really just a professionally managed social environment. If you trust professionals like lawyers, doctors, or teachers with their respective duties, there's no reason in principle you shouldn't trust a cult leader to manage your social environment for you. Of course you should vet them, ask about their reputation, etc.


Kolmogorov complexity is only defined up to an additive constant, which corresponds to the length of a translator between the two reference Turing machines.
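
For concreteness, this is the invariance theorem; the following is a standard statement of it (sketched informally, with c_{UV} intuitively the length of a V-interpreter written for U):

```latex
% Invariance theorem: for any two universal Turing machines U and V,
% there exists a constant c_{UV}, independent of the string x, such that
K_U(x) \le K_V(x) + c_{UV} \quad \text{for all } x.
% Applying it in both directions, any two reference machines agree
% up to an additive constant:
% |K_U(x) - K_V(x)| \le \max(c_{UV}, c_{VU})
```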


I guess we need to guesstimate the length of a shortest Turing machine implementation of amd64 then?


This is cool. No need to guesstimate, it could be a world record category.

