Hacker News

The article was pretty over my head, but does your argument hold for the various augmented neural-net systems, such as Neural Turing Machines or Differentiable Neural Computers?


The argument isn't so much about the type of agent (which I think is what Neural Turing Machines etc. are about); it's about the type of environment. In traditional Reinforcement Learning, environments give real-number-valued rewards (or even rational-number-valued rewards, which are even more constrained). Presumably this was a decision made with hardly a second thought, because real numbers are most familiar to people... but such "appeals to familiarity" are totally irrelevant for such an alien field as AGI :) The point of the paper is that a genuine AGI should be able to comprehend environments that involve rewards with a more sophisticated structure than can be accurately represented using real numbers.


There are multi-reward agents and multi-task agents, but all rewards get combined into a final scalar value, and gradient-based methods need that single scalar value to derive gradients from.
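A minimal sketch of that scalarization step (the objective names and weights here are made up for illustration):

```python
# Hypothetical multi-objective rewards for one timestep.
rewards = {"speed": 0.7, "safety": 0.2, "energy": -0.1}

# Made-up trade-off weights chosen by the practitioner.
weights = {"speed": 1.0, "safety": 5.0, "energy": 2.0}

# Weighted sum: the single scalar that gradient-based methods optimize.
# 0.7*1.0 + 0.2*5.0 + (-0.1)*2.0 = 1.5
scalar_reward = sum(weights[k] * rewards[k] for k in rewards)
```

However rich the reward vector is, the gradient only ever sees that one float.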

Since you mentioned higher-dimensional representations for rewards, I want to point out that the sub-fields of Inverse RL and Model-based RL are concerned with reward representation and prediction by neural nets.

Also, it doesn't seem like a good idea to try to disprove an entire field with a purely theoretical (a priori) argument. There should be at least some consideration given to the state of the art in the field.


I make no attempt to disprove an entire field; rather, I question an implicit assumption, namely that real-number rewards are flexible enough to capture all relevant environments an AGI should be able to comprehend. Quite the opposite: I indicate ways reinforcement learning could be modified to get past the roadblock I point out with real numbers.


IIUC, the claim is that the very idea of a (real valued) “objective function” to be “optimized” is broken?


Broken in the sense that it's not flexible enough to apply to all conceivable environments a genuine AGI could navigate, without misleading that AGI. But I should stress that real number objective functions are probably fine for many specific interesting environments, I'm not trying to say that real number objective functions are useless. Just that they aren't flexible enough to cover all environments :)


Fair enough. It would be interesting/instructive to construct (relatively simple) examples where we can see that they’re broken :-)


Easy to come up with examples using exotic money-related constructions. Suppose there's something called a "superdollar". If you have a superdollar, you can use it to create an arbitrary number of dollars for yourself, any time you want, which you can trade for goods and services. If you want, you can also trade the superdollar itself. Now picture an environment with two buttons, one of which always rewards you one dollar, and the other of which always rewards you one superdollar. Shoe-horning this environment into traditional RL, you'd have to assign the superdollar button some finite reward, say a million. But then you would mislead the traditional-RL-agent into thinking a million dollars was as good as one superdollar, which clearly is not true.
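To make the mismatch concrete in code (a sketch; tuples compared lexicographically are just one possible richer representation, not the paper's construction):

```python
# Represent a reward as (superdollars, dollars). Python compares tuples
# lexicographically, so superdollars dominate any number of plain dollars.
dollar_button = (0, 1)        # one dollar
superdollar_button = (1, 0)   # one superdollar

# With the tuple representation, no pile of plain dollars ever beats
# a single superdollar:
assert (0, 10**12) < superdollar_button

# With a scalar encoding, any finite value assigned to the superdollar
# (say, one million) is eventually overtaken by enough dollar presses,
# so a traditional-RL agent is misled:
SUPERDOLLAR_VALUE = 1_000_000.0
assert 10**12 > SUPERDOLLAR_VALUE  # agent wrongly prefers a trillion dollars
```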


But isn't the sophisticated structure what leads to the real number? If you change the rewards to something more complex, surely you still have to pick between actions, and at some point you'll have to evaluate which one is "better", and I can't see why you couldn't use real numbers to represent utility.

I mean, humans are general intelligences, and you can translate pretty much any human reward into money, which is a real number.

The paper is super long though. Maybe someone can give a TL;DR that makes sense.


Question: when you say "I can't see why you couldn't use real numbers to represent utility", does your reasoning for that have anything to do with Dedekind cuts, Cauchy sequences, or complete ordered fields? Because that's what the real numbers _are_. If your reasoning has nothing to do with these sorts of things, then it can't possibly be sound, because in order to argue that X has such-and-such property, you need to know what X actually _is_.

To repeat an example I posted for someone else: Suppose there's something called a "superdollar". If you have a superdollar, you can use it to create an arbitrary number of dollars for yourself, any time you want, which you can trade for goods and services. If you want, you can also trade the superdollar itself. Now picture an environment with two buttons, one of which always rewards you one dollar, and the other of which always rewards you one superdollar. Shoe-horning this environment into traditional RL, you'd have to assign the superdollar button some finite reward, say a million. But then you would mislead the traditional-RL-agent into thinking a million dollars was as good as one superdollar, which clearly is not true.


Good example, although what if you just assigned it a reward of, like, 100 trillion dollars? It might not be exactly correct, but then you're assuming that exactly correct rewards are required for AGI, which seems like a pretty big assumption.

Actually I thought about this some more, and maybe money wasn't the best example, but I think there must be some internal measure of utility that humans use that can be represented by real numbers.

Imagine you are presented with an array of possible actions with associated (possibly estimated) rewards. You can only pick one. Maybe there are some doors but you can only open one: behind the first is $1m, behind the second is a superdollar, behind the third is a button that cures world hunger, behind the fourth is your loving family, whatever.

As a human I can pick one. No matter what the rewards are. Even if one reward is "you essentially become God". That means I can order them, and therefore that they can be represented by real numbers (plus infinity for the god option).

I don't see why the infinity would cause an issue: the "you can now do literally anything" reward is worth more than every other reward, but it's the only one. Also it doesn't actually exist so who cares?

Actually I guess it can exist in games, e.g. God mode in Quake. But that should have an infinite reward and agents should choose it over everything else so I can't see the problem really.
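For what it's worth, IEEE floating-point infinity does behave the way described here, with one wrinkle worth noting (a quick sketch):

```python
import math

# Infinity orders above every finite reward, as the comment says.
all_rewards = [1.0, 1e6, 1e12, math.inf]
best = max(all_rewards)  # math.inf

# The wrinkle: infinity absorbs all finite differences, so "God mode
# plus a dollar" and "God mode" become indistinguishable to the agent.
assert math.inf + 1 == math.inf
```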


>I mean, humans are general intelligences, and you can translate pretty much any human reward into money, which is a real number.

A lot of people have written quite a lot of arguments that this is false.


A lot of people have written a lot of arguments about everything. Has anyone actually demonstrated that it isn't true?



