But in that single interaction, you might have seen the cat from all kinds of different angles, in various poses, doing various things, some of which are particularly not-dog-like.
I vaguely remember hearing that there are even ways to expand training data like that for neural networks ("data augmentation"), e.g. by presenting the same source image slightly rotated, partially obscured, etc.
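A minimal sketch of that kind of augmentation, using only NumPy (the particular choices here, flips, 90-degree rotations, and a zeroed occlusion patch, are illustrative, not any specific library's pipeline):

```python
import numpy as np

def augment(image, rng=None):
    """Return simple augmented variants of one image:
    a horizontal flip, three rotations, and one random occlusion."""
    if rng is None:
        rng = np.random.default_rng(0)
    variants = [np.fliplr(image)]          # mirror image
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))  # 90/180/270-degree rotations
    # partial occlusion: zero out a random square patch
    occluded = image.copy()
    h, w = image.shape[:2]
    ph, pw = h // 4, w // 4
    y = int(rng.integers(0, h - ph))
    x = int(rng.integers(0, w - pw))
    occluded[y:y + ph, x:x + pw] = 0
    variants.append(occluded)
    return variants
```

Real pipelines (e.g. torchvision's transforms) apply such perturbations randomly per epoch, so the network effectively never sees the exact same image twice.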
One interaction that captures a multidimensional, multisensory set of perceptions. In an ML training set, say for visual recognition, this would correspond to at least hundreds of images taken from many angles, in different poses, and under varied lighting.
I don't think it's analogous. I don't think we see a cat and our brain adjusts our synaptic weights (or whatever brains do) frame by frame. The whole premise of natural brains being able to learn from static images or disjointed modalities is a very clunky, reductionist, engineered approach we have taken.
> I don't think we see a cat and our brain adjusts our synaptic weights (or whatever brains do) frame by frame
I think that "whatever brains do" is doing a lot of heavy lifting here. Some of those "whatevers" will be isomorphic to a frame-level analysis that pulls out structural commonalities, or close enough that it's not a clunky reductionist analogy.
When we see what we think is a cat, what we have categorised as a cat, I don't think we are looking at it from each angle and going, cat, cat, cat.
I think something like the 'free-energy principle' is required to trigger a re-assessment. So while visually we may receive 20fps of cat images, most of it is discarded unless there is some novelty that challenges expectation.
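That gating idea can be caricatured in a few lines of code. This is a toy sketch only: it crudely assumes the "expectation" is the last attended frame and that surprise is mean absolute pixel difference, which is nothing like what brains actually compute.

```python
import numpy as np

def novelty_gated_frames(frames, threshold=10.0):
    """Toy novelty gating: keep only frame indices whose mean absolute
    difference from the current expectation exceeds a threshold.
    Assumption (illustrative): expectation = the last kept frame."""
    kept = []
    expectation = None
    for i, frame in enumerate(frames):
        if expectation is None:
            kept.append(i)           # first frame is always novel
            expectation = frame
            continue
        surprise = np.abs(frame - expectation).mean()
        if surprise > threshold:
            kept.append(i)           # prediction violated: re-assess
            expectation = frame
    return kept
```

Feed it five identical frames followed by one very different frame and only the first and last survive; everything in between matches expectation and is dropped.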