The abstract of the original BERT paper starts with these words: "We introduce a new language representation model called BERT, [...]" The paper itself contains the phrase "language model" 24 times.
It might not be considered a language model today, but it was certainly considered one when it was originally published. Or so it would seem to me. Maybe there is a semantic shift which happened here?
I’m not astute enough to see what was missed here. Could you explain?