
Oh, I actually meant to link to this paper: https://arxiv.org/pdf/2205.10487.pdf

They argue that a dataset can have an unfortunate dup/unique data ratio: the frequently repeated chunk is large enough that the model "decides" memorizing it is worth the accuracy degradation on the rest of the data, but not so large that memorizing it becomes difficult (section 5.1). The degradation they show is substantial - roughly as if an 800M model had shrunk to a 400M one.
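To make the setup concrete, here's a minimal sketch of how you'd build that kind of training stream - a small chunk duplicated many times, mixed into otherwise unique data. This is my own illustration, not code from the paper, and the counts (0.1% of documents repeated 100x) are made-up numbers just to show the ratio arithmetic:

  import random

  def build_training_stream(unique_docs, repeated_docs, repeat_count, seed=0):
      # Mix a small repeated chunk into a mostly-unique training stream.
      #   unique_docs   -- documents seen once each
      #   repeated_docs -- the chunk that gets duplicated
      #   repeat_count  -- how many times each repeated doc appears
      stream = list(unique_docs) + list(repeated_docs) * repeat_count
      random.Random(seed).shuffle(stream)
      return stream

  # Hypothetical numbers: 100 repeated docs out of 100,000 (0.1%), each
  # repeated 100x, so duplicates end up being ~9% of what the model sees.
  unique = [f"unique-{i}" for i in range(99_900)]
  repeated = [f"repeated-{i}" for i in range(100)]
  stream = build_training_stream(unique, repeated, repeat_count=100)
  dup_share = len(repeated) * 100 / len(stream)
  print(f"duplicated share of training stream: {dup_share:.1%}")

The point is that even a tiny duplicated fraction of the source data can occupy a disproportionate share of the actual training stream, which is the regime where the paper observes the degradation.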


