But in that single interaction, you might have seen the cat from all kinds of different angles, in various poses, doing various things, some of which are particularly not-dog-like.
I vaguely remember hearing that there are even ways to expand training data like that for neural networks ("data augmentation"), e.g. by presenting the same source image slightly rotated, partially obscured, etc.
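A minimal sketch of that kind of augmentation, using only NumPy (the particular choices here, flips, 90-degree rotations, and a zeroed occlusion patch, are illustrative, not any specific library's pipeline):

```python
import numpy as np

def augment(image, rng=None):
    """Return simple augmented variants of one image:
    a horizontal flip, three rotations, and one random occlusion."""
    if rng is None:
        rng = np.random.default_rng(0)
    variants = [np.fliplr(image)]          # mirror image
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))  # 90/180/270-degree rotations
    # partial occlusion: zero out a random square patch
    occluded = image.copy()
    h, w = image.shape[:2]
    ph, pw = h // 4, w // 4
    y = int(rng.integers(0, h - ph))
    x = int(rng.integers(0, w - pw))
    occluded[y:y + ph, x:x + pw] = 0
    variants.append(occluded)
    return variants
```

Real pipelines (e.g. torchvision's transforms) apply such perturbations randomly per epoch, so the network effectively never sees the exact same image twice.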
One interaction that captures a multidimensional, multisensory set of perceptions. In an ML training set, say for visual recognition, this would correspond to at least hundreds of images taken from many angles, in different poses, and under varied lighting.
I don't think it's analogous. I don't think we see a cat and our brain adjusts our synaptic weights (or whatever brains do) frame by frame. The whole premise of natural brains being able to learn from static images or disjointed modalities is a very clunky, reductionist, engineered approach we have taken.
> I don't think we see a cat and our brain adjusts our synaptic weights (or whatever brains do) frame by frame
I think that "whatever brains do" is doing a lot of heavy lifting here. Some of those "whatevers" will be isomorphic to a frame-level analysis that pulls out structural commonalities, or close enough that it's not a clunky reductionist analogy.
When we see what we think is a cat, what we have categorised as a cat, I don't think we are looking at it from each angle and going, cat, cat, cat.
I think something like the 'free-energy principle' is required to trigger a re-assessment. So while visually we may receive 20fps of cat images, most of it is discarded unless there is some novelty that challenges expectation.
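That gating idea can be caricatured in a few lines of code. This is a toy sketch only: it crudely assumes the "expectation" is the last attended frame and that surprise is mean absolute pixel difference, which is nothing like what brains actually compute.

```python
import numpy as np

def novelty_gated_frames(frames, threshold=10.0):
    """Toy novelty gating: keep only frame indices whose mean absolute
    difference from the current expectation exceeds a threshold.
    Assumption (illustrative): expectation = the last kept frame."""
    kept = []
    expectation = None
    for i, frame in enumerate(frames):
        if expectation is None:
            kept.append(i)           # first frame is always novel
            expectation = frame
            continue
        surprise = np.abs(frame - expectation).mean()
        if surprise > threshold:
            kept.append(i)           # prediction violated: re-assess
            expectation = frame
    return kept
```

Feed it five identical frames followed by one very different frame and only the first and last survive; everything in between matches expectation and is dropped.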