Those AI accelerators aren't really "full CPUs", since there's no cache coherence. They're tiny 32kB memory slabs + a decoder + ALUs + networking to connect them to the rest of the FPGA.
But it's certainly more advanced than a DSP slice (which was only somewhat more complicated than a multiply-and-add circuit).
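To make the "multiply-and-add" comparison concrete, here's a rough Python sketch of what a multiply-accumulate unit computes each cycle. This is a simplified behavioral model, not the actual DSP slice: real slices add things like pre-adders and configurable ALU modes. The 48-bit accumulator width and the names `wrap`, `mac_step` and `dot` are my own assumptions for illustration.

```python
ACC_BITS = 48  # assumed accumulator width, for illustration only


def wrap(value: int, bits: int = ACC_BITS) -> int:
    """Wrap an integer to a signed two's-complement value of `bits` width,
    the way a fixed-width hardware accumulator would overflow."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def mac_step(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step: acc + a * b, wrapped to the accumulator width."""
    return wrap(acc + a * b)


def dot(xs: list[int], ws: list[int]) -> int:
    """Dot product built from repeated MAC steps, one per element pair."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = mac_step(acc, x, w)
    return acc


print(dot([1, 2, 3], [4, 5, 6]))  # 32
```

That's basically the whole job: one multiply and one add per step. An AI Engine tile, by contrast, fetches and decodes its own instruction stream out of local memory.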
-------
I guess you can think of it as a tiny 32kB SRAM + CPU, though. But it's still missing a bunch of parts that most people would consider "part of a CPU". Even a GPU provides synchronization functions that let its cores communicate and synchronize with each other.