
> See: basically every instance of Goodhart's Law

No, I asked for specific examples. This is part of the handwaving I'm talking about. Can you give me something other than the maniacal paperclip optimizer?



YouTube video recommendations, which converge on recommending conspiracy theories and shocking videos to easily manipulated people, especially children.

I am not worried about literal paperclip maximizers, but this may be the closest real thing to that parable. The hypothesis isn't that YouTube's recommender system didn't work -- it's that it worked too well at its assigned task of maximizing view time, and we humans are finally realizing that maximizing view time was not what we actually wanted.
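
To make the proxy-metric failure concrete, here is a toy sketch in Python. Every name and number is invented; the point is just that a ranker optimizing only predicted watch time will surface whatever keeps people watching, sensational or not:

  # Toy illustration -- all names and numbers are invented.
  videos = [
      {"title": "calm gardening tutorial", "avg_watch_minutes": 4.2},
      {"title": "SHOCKING conspiracy EXPOSED!!", "avg_watch_minutes": 11.7},
      {"title": "local news segment", "avg_watch_minutes": 2.1},
  ]

  def recommend(candidates, k=2):
      # The proxy objective: maximize expected view time, nothing else.
      ranked = sorted(candidates, key=lambda v: v["avg_watch_minutes"], reverse=True)
      return ranked[:k]

  for video in recommend(videos):
      print(video["title"])
  # The conspiracy video wins -- not because the system failed,
  # but because it optimized exactly the metric it was given.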


Do you think limiting research on recommender systems (or limiting access to such research) would help in this case?

What would be a solution to the YouTube recommendation problem?


It seems like OpenAI's Human Feedback research (in collaboration with DeepMind) is targeted at this sort of thing. They try to use human feedback to create more nuanced and aligned objectives.

https://blog.openai.com/deep-reinforcement-learning-from-hum...
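
For what it's worth, the core idea in that paper is to fit a reward model to pairwise human preference judgments instead of hand-writing a reward function. A minimal sketch of that loss in Python/PyTorch, with invented shapes and layer sizes:

  # Rough sketch of reward learning from pairwise human preferences,
  # as in the linked paper. Shapes, sizes, and data are invented.
  import torch
  import torch.nn as nn

  reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
  opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

  def preference_loss(preferred, rejected):
      # Bradley-Terry: P(preferred beats rejected) = sigmoid(r_p - r_r).
      # A human labeled the first clip as better, so we push the
      # learned reward to agree, rather than hand-coding a reward.
      r_p = reward_model(preferred).sum()
      r_r = reward_model(rejected).sum()
      return -torch.log(torch.sigmoid(r_p - r_r))

  # Fake comparison: two trajectories of 8 steps, 16 features each.
  traj_a, traj_b = torch.randn(8, 16), torch.randn(8, 16)
  loss = preference_loss(traj_a, traj_b)
  opt.zero_grad(); loss.backward(); opt.step()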


I think this is probably the most reasonable example I've been given in this thread. However, as you admit, this is still very far off from a hypothetical AI hellbent on destroying us while we watch helplessly.



