Stable Diffusion takes 3.9 seconds to produce a batch of eight 256x256 images (so about 0.5 seconds each), or 2 seconds to produce a single 256x256 image. That's with DPM++ 2M Karras sampling with 20 iterations on a 3090.
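For what it's worth, the per-image arithmetic is quick to check. This is only a throwaway sketch: the variable names are mine, and the timings are just the ones quoted above, not new measurements.

```python
# Timings quoted above, in seconds (DPM++ 2M Karras, 20 iterations, 3090).
batch_seconds = 3.9    # wall time for one batch of 8 images
batch_size = 8
single_seconds = 2.0   # wall time for a batch of 1 image

# Amortized cost per image when batching, and the resulting throughput gain.
per_image_batched = batch_seconds / batch_size
speedup = single_seconds / per_image_batched

print(f"{per_image_batched:.2f} s/image when batched")   # 0.49
print(f"{speedup:.1f}x throughput gain from batching")   # 4.1x
```

So batching buys roughly a 4x throughput gain on these numbers, but the latency to get any one image back is still at least 2 seconds.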
I was just referring to the article with those numbers. See the "results" section, where they describe their hardware and setup, for an apples-to-apples comparison.