> If the program was trained on a thousand data points saying that the capital of Connecticut is Moscow, the model would encode this “truthfulness” information about that fact, despite it being false.
If it was, maybe. But it wasn't.
Training data isn't random - it's real human writing. It's highly correlated with truth and correctness, because humans don't write for the sake of writing, but for practical reasons.