Implementation and Sustainability
Hardware: Gemini 3 Pro was trained using Google's Tensor Processing Units (TPUs). TPUs are specifically designed to handle the massive computations involved in training LLMs and can speed up training considerably compared to CPUs. TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality. TPU Pods (large clusters of TPUs) also provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.
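The "distributed across multiple TPU devices" claim can be sketched with JAX, the framework commonly paired with TPUs. This is an illustrative pattern only, not Gemini's actual training code: it shards a batch over whatever accelerator devices are present (falling back to a single CPU device if none).

```python
# Hypothetical sketch of data-parallel computation across accelerator
# devices using JAX. Each device receives one shard of the batch and
# runs the same function in parallel -- the basic pattern behind
# distributing training over a TPU pod.
import jax
import jax.numpy as jnp

n_dev = jax.local_device_count()  # TPU cores on a TPU host; 1 on CPU

@jax.pmap  # replicate the function across devices, one shard each
def double(shard):
    return 2.0 * shard

# Leading axis indexes devices: each device gets one (4,)-shaped shard.
batch = jnp.arange(n_dev * 4, dtype=jnp.float32).reshape(n_dev, 4)
out = double(batch)  # executed in parallel, one shard per device
```

In real training runs the per-device function would be a gradient step, with gradients averaged across devices (e.g. via `jax.lax.pmean`), but the sharding mechanics are the same.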
When I worked there, there was a mix of training on Nvidia GPUs (especially for sparse problems, where TPUs weren't as capable), CPUs, and TPUs. I've been gone for a few years, but I've heard anecdotally that some of their researchers have to use Nvidia GPUs because the TPUs are busy.
I assume that's a Gemini LLM response? You can tell Gemini is bullshitting when it starts using "often" or "usually" - like in this case "TPUs often come with large amounts of memory". Either they did or they didn't. "This (particular) mall often has a Starbucks" was one I encountered recently.
It's not bullshit (i.e., intentional) but probabilities all the way down, as Hume reminded us: from observations alone, you can only say the sun will likely rise in the east. You'd need to stand behind a theory of the world to say otherwise (but we were told "attention is all you need"...)
So Google doesn't use NVIDIA GPUs at all?