
> illusion that these models are learning like children do, when in reality they are not even entirely capable of learning novel concepts

Now imagine: how much would your kid learn if the only input he ever received was a sequence of words?



Are you saying it's not fair to LLMs, because the way they are taught is different?

The difference is that we don't know of better methods for them, but we do know of better methods for people.


I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day. It takes at least 4 years for a human child to be in any way comparable in performance to GPT-4 on any task both of them could be tested on; do people really believe GPT-4 was trained with more data than a 4 year old?


> do people really believe GPT-4 was trained with more data than a 4 year old?

I think it was; the guesstimate I've seen is that GPT-4 was trained on 13e12 tokens, which over 4 years works out to 8.9e9/day, or about 1e5/s.

Then it's a question of how many bits per token; my expectation is that 100k tokens/s is more than the number of token-equivalents we experience, even though it's much less than the bitrate of just our ears, let alone our eyes.
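A quick sanity check on those numbers (the token count is the guesstimate above; the sensory bitrates are my own crude stand-ins, and the comparison is sensitive to which estimates you pick):

    import math

    tokens = 13e12                 # rumored GPT-4 training-set size (guesstimate)
    per_day = tokens / (4 * 365)   # spread over four years
    per_sec = per_day / 86_400
    print(f"{per_day:.1e} tokens/day, {per_sec:.1e} tokens/s")
    # -> 8.9e+09 tokens/day, 1.0e+05 tokens/s

    # Raw information rate of that stream, assuming a ~100k-entry vocabulary
    # (~17 bits/token at most; the true entropy per token is lower).
    print(f"token stream: ~{per_sec * math.log2(100_000) / 1e6:.1f} Mbit/s")

    # Crude sensory stand-ins: CD-quality stereo audio for the ears,
    # and a commonly cited ~10 Mbit/s estimate per optic nerve for the eyes.
    print(f"ears: ~{44_100 * 16 * 2 / 1e6:.1f} Mbit/s, eyes: ~10 Mbit/s each")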


Interesting analysis, makes sense. I wonder how we should account for the “pre-built” knowledge that is transferred to a newborn genetically, and from the environment at conception and during gestation. Of course, things like epigenetics also come into play.

The analogies get a little blurry here, but perhaps we can draw a distinction between information that an infant gets from their higher-level senses (sight, smell, touch, etc.) versus any lower-level biological processes (genetics, epigenetics, developmental processes, and so on).

The main point is that there is a fundamental difference: LLMs have very little prior knowledge [1] while humans contain an immense amount of information even before they begin learning through the senses.

We need to look at the billions of years of biological evolution, millions of years of cultural evolution, and the immense number of environmental factors, all of which shape us before birth and before any “learning” occurs.

[1] The model architecture probably counts as hard-coded prior knowledge contained before the model begins training, but it is a ridiculously small amount of information compared to the complexity of living organisms.
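To put rough numbers on [1] (my own back-of-envelope figures, not the parent's), consider the genome alone versus an architecture spec:

    genome_bits = 3.1e9 * 2   # ~3.1 Gbp human genome, ~2 bits per base pair
    print(f"genome: ~{genome_bits / 8 / 1e6:.0f} MB of raw sequence")  # ~775 MB

    # A transformer architecture is a few hundred lines of code plus some
    # hyperparameters -- call it ~100 kB, generously.
    arch_bytes = 100e3
    print(f"genome is ~{genome_bits / 8 / arch_bytes:.0f}x larger")    # ~7750x

And the genome is itself only a compressed pointer to the developmental processes mentioned above, so this probably understates the gap.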


I think it's fair that both LMMs and people get a certain (even unbounded) amount of "pretraining" before actual tasks.

But after that training, people are much better equipped for single-shot recognition and cognitive tasks involving imagery and situations they have not encountered before, e.g. identifying (from a picture) which animal is being shown, even if it's only the second time they've seen that animal (the first being when they were told it's a zebra).

So, basically, after initial training I believe people are superior at single-shot tasks, and things are going to get much more interesting once LMMs (or whatever comes after them) can do that well.

It might be that GPT-4o can actually do that task well! Someone should demo it; I don't have access. Except, of course, GPT-4o already knows what zebras look like, so it would have to be something other than exactly that.
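For anyone who does want to demo it, here's a minimal sketch of that kind of one-shot probe using off-the-shelf CLIP embeddings and nearest-neighbor matching (the model name, file names, and the zebra/horse classes are illustrative assumptions on my part; and per the caveat above, CLIP's pretraining already covers zebras, so a clean test would need genuinely novel categories):

    # pip install sentence-transformers pillow
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    # CLIP embeds images in a shared space; one labeled image per class
    # is the entire "training set" for this one-shot probe.
    model = SentenceTransformer("clip-ViT-B-32")

    support = {  # the single example shown per class
        "zebra": model.encode(Image.open("zebra_1.jpg")),
        "horse": model.encode(Image.open("horse_1.jpg")),
    }

    # Classify a never-before-seen image by nearest support embedding.
    query = model.encode(Image.open("zebra_2.jpg"))
    best = max(support, key=lambda c: float(util.cos_sim(query, support[c])))
    print(best)  # hopefully "zebra"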


> I think they're saying that it's silly to claim humans learn with less data than LLMs, when humans are ingesting a continuous video, audio, olfactory and tactile data stream for 16+ hours a day, every day.

Yeah, but they're seeing mostly the same thing day after day!

They aren't seeing 10k stills of 10k different dogs, then 10k stills of 10k different cats. They're seeing $FOO thousand images of the family dog and the family cat.

My (now 4.5yo) toddler did reliably tell the difference between cats and dogs the first time he went with us to the local SPCA and saw cats and dogs that were not our cats and dogs.

In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.


> In effect, 2 cats and 2 dogs were all he needed to reliably distinguish between cats and dogs.

I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children's books and toys he handled. In our case, this was a significant source of my daughters' animal-recognition skills.


> I assume he was also exposed to many images, photos and videos (realistic or animated) of cats and dogs in children's books and toys he handled.

No images or photos (no books).

TV, certainly, but I consider it unlikely that animals in the animation style of Peppa Pig help the classifier.

Besides which, we're still talking about fewer than a dozen cats/dogs seen up to that point.

Forget about cats/dogs. Here's another example: he only had to see a burger patty once to determine that it was an altogether new type of food, different from (for example) a sausage.

Anyone who has kids will have dozens of examples where the classifier worked without a false positive off a single novel item.


So a billion years of evolutionary search plus 20 years of finetuning is a better method?



