
This article seems completely off-base to me. The fact that the training method can't learn the 'no 3 repeated moves' rule doesn't change my opinion of the result at all; to me this is just a technical detail, and it does not detract from the essence of the result (that you can get good performance without hard-coding search). But the author harps on this point over and over as if it were a fatal flaw.

And then we get a rant about how 'it's not really learning how to plan' because it doesn't try to win in as few moves as possible. But again, this just doesn't matter! The model was trained to maximize its odds of winning; you don't get to make up a new rule that it's supposed to win quickly. Totally beside the point. But somehow this observation is taken to mean 'well, it's not actually planning/reasoning/understanding, it's just imitating its training data.' Such statements are usually operationally meaningless, and the argument could just as well apply to human learning.
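To make that concrete, here's a toy sketch (all numbers hypothetical): two policies with the same win probability score identically under a pure win/loss objective, no matter how long their games run.

    # Hypothetical numbers: two policies with identical win probability
    # but very different game lengths. With reward 1 for a win and 0
    # otherwise, the objective is just E[win]; game length never enters it.
    policies = {
        "mates_quickly": {"p_win": 0.90, "avg_plies": 20},
        "mates_slowly":  {"p_win": 0.90, "avg_plies": 200},
    }
    for name, stats in policies.items():
        objective = stats["p_win"] * 1 + (1 - stats["p_win"]) * 0
        print(f"{name}: objective = {objective}")
    # Both print 0.9, so training has no reason to prefer the quick win.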

Point taken that the training method can't learn arbitrary rules (this is a good point, and it sounds like it needed to be made, though I didn't see the Twitter-storm myself). But the tone of this article irked me a bit.



You misunderstand. It's not that it merely fails to win quickly; it fails to win at all in those positions.
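To make the failure mode concrete: a deterministic policy that sees only the current position (no move history) plays the same move every time the position recurs, so it can loop a won game into a draw by repetition. A minimal sketch, with abstract states standing in for chess positions:

    # Minimal sketch: "A" and "B" are abstract stand-ins for two chess
    # positions; the policy is a pure function of the position alone.
    def state_only_policy(state):
        # Same position in -> same move out, every time. No history.
        return {"A": "check", "B": "check_back"}[state]

    def transition(state, move):
        return {("A", "check"): "B", ("B", "check_back"): "A"}[(state, move)]

    def play(start, max_plies=12):
        seen = {start: 1}
        state = start
        for _ in range(max_plies):
            state = transition(state, state_only_policy(state))
            seen[state] = seen.get(state, 0) + 1
            if seen[state] >= 3:  # threefold repetition => draw
                return f"draw by repetition (position {state} seen 3 times)"
        return "no repetition within horizon"

    print(play("A"))  # even if "A" is winning, the game ends drawn

Unless the inputs or the objective encode history, nothing in training steers the model out of this loop.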



