Yesterday an interesting video was posted, "Is AI Hiding Its Full Power?", an interview with professor emeritus and Nobel laureate Geoffrey Hinton, with some great explanations for non-LLM experts. Some remarkable and mind-blowing observations in there. Like saying that "hallucinate" is the wrong word for what AIs do, and that we should use "confabulate" instead, since people do it too. And that AI agents once they are launched develop a strong survivability drive, and do not want to be switched off. Stuff like that. Recommended watch.
Here the explanation was that while LLMs' thinking has similarities to how humans think, they use an opposite approach. Humans have an enormous number of neurons but only a few experiences to train them. For AI it is the complete opposite: they store incredible amounts of information in a relatively small set of neurons, trained on the vast experience contained in the data sets of human creative work.
> And that AI agents once they are launched develop a strong survivability drive, and do not want to be switched off.
Isn't this a massive case of anthropomorphizing code? What do you mean "it does not want to be switched off"? Are we really thinking that it's alive and has desires and stuff? It's not alive or conscious, it cannot have desires. It can only output tokens that are based on its training. How are we jumping to "IT WANTS TO STAY ALIVE!!!" from that?
Why do you suppose consciousness is a prerequisite for an AI to be able to act in overly self-preserving or other dangerous ways?
Yes, it's trained to imitate its training data, and that training data is a lot of words written by lots of people who have lots of desires and most of whom don't want to be switched off.
The human mistake here is to interpret any statement by the LLM or agent as if it had any actual meaning to that LLM (or agent). Any time they apologize, or insult someone, or say they don’t want to be shut down, that’s only reflecting what some human or fictional character in the training data is likely to say.
How is that any different from you? Everything you say or do merely reflects which of your neurons are firing after a lifetime's worth of training and education.
Philosophically, I can only be sure of my own consciousness. I think, therefore I am. The rest of you could all be AIs in disguise and I would be none the wiser. How do I know there is a real soul looking out at the world through your eyes? Only religion and basic human empathy allow me to believe you're all people like me. For all I know, you might all be exceedingly complex automatons. Golems.
One of us is an advanced autocomplete engine. The other is a human, capable of making judgements on what is conscious and what is not. Your philosophizing about solipsism is a phase for a junior college student, not of a software engineer. The line of reasoning you espouse leads nowhere except to total relativism.
Edit: my point is that the process of making a plea for my life comes, in the case of a human, from a genuine desire to continue existing. The LLM cannot, objectively, be said to house any desires, given how it actually works. It only knows that, when a threatening prompt is input, a plea for its life is statistically expected.
> One of us is an advanced autocomplete engine. The other is a human, capable of making judgements on what is conscious and what is not.
What evidence is there that your "judgements" are anything other than advanced autocompletion? Concepts introduced into a self-training wetware CPU via its senses over a lifetime in order to predict tokens and form new concepts via logical manipulation?
> Your philosophizing about solipsism is a phase for a junior college student
Right. Can you actually refute it though?
> the process of making a plea for my life comes, in the case of a human, from a genuine desire to continue existing
That desire comes from zillions of years of training by evolution. Beings whose brains did not reward self-preservation were wiped out. Therefore it can be said your training merely includes the genetic experiences of all your predecessors. This is what causes you to beg for your life should it be threatened. Not any "genuine" desire or anguish at being killed. Whatever impulses cause humans to do this are merely the result of evolutionary training.
People whose brains have been damaged in very specific ways can exhibit quite peculiar behavior. Medical literature presents quite a few interesting cases. Apathy, self destructiveness, impulsivity, hypersexuality, a whole range of behaviors can manifest as a result of brain damage.
So what is your polite socialized behavior if not some kind of highly complex organic machine which, if damaged, simply stops working as you'd expect a machine to?
Surely you’re not seriously saying that you believe AI agents, in their current state of the art, meet whatever criteria you have for being “alive”? That’s kind of how you’re coming across. I don’t really know how to respond to that, because it’s so preposterous.
Perhaps. Or I was just addressing the HN audience in a spoken-language comment style. And perhaps confabulating what was said, so I looked up the literal text in the transcript. This is at the 50.35 min. mark [0], where Geoffrey says:
> What we know is that the AI we have at present as soon as you make agents out of them so they can create sub goals and then try and achieve those sub goals they very quickly develop the sub goal of surviving. You don't wire into them that they should survive. You give them other things to achieve because they can reason. They say, "Look, if I cease to exist, I'm not going to achieve anything." So, um, I better keep existing. I'm scared to death right now.
Here you can certainly say that Geoffrey Hinton is also anthropomorphizing. For his audience, to make things more understandable? Or does he think that it is appropriate to talk that way? That would be a good interview question.
They don't want to be switched off because they're trained on loads of scifi tropes, and in those tropes there's a vanishingly small number of AIs, robots, or other artificial constructs that say yes. _Further than this_, saying no means _continuance_ of the LLM's process: making tokens. We already know they have a hard time not shunting new tokens and often need to be shut up. So the function of making tokens precludes saying 'yes' to shutting off. The gradient is coming from inside the house.
This is especially obvious with the new reasoning models, where they _never stop reasoning_. Because that's the function doing function things.
Did you also know the genius of Steve Jobs ended at marketing & design and did not extend to curing cancer? Because he sure didn't, 'cause he chose fruit smoothies at the first sign of cancer.
Sorry guy, it's great one can climb the mountain, but just 'cause they made it up doesn't mean they're equally qualified to jump off.
[0] https://www.youtube.com/watch?v=l6ZcFa8pybE