> Astute readers will note what’s been missed here. I’m not astute enough to see...

Otterly99 · 2026-03-02T12:44:06

If I'm not mistaken, BERT is a classifier (text goes in, labels come out), so it is not a "language model": it cannot be used for text generation.

krisoft · 2026-03-02T17:10:04

The abstract of the original BERT paper starts with these words: "We introduce a new language representation model called BERT, [...]" The paper itself contains the phrase "language model" 24 times.

It might not be considered a language model today, but it certainly was when it was originally published, or so it seems to me. Maybe the term has undergone a semantic shift?