XLA: The TensorFlow compiler framework

ktta · on Jan 9, 2017

This page has the first mention of Google TPUs since the initial announcement from Google.

Anyone know what the status is? When TPUs will be allowed to be used in Google Cloud?

I'm confused as to why they ever announced it at a big event like Google I/O rather than a paper or even a simple blog post if they aren't going to give people access to them. There's some hint of it being offered in conjunction with TF and other ML cloud offerings in the blog post[1], and this 'XLA compiler framework' looks it's related. But I'm still wondering how much time people have to wait.

[1]:https://cloudplatform.googleblog.com/2016/05/Google-supercha...

visarga · on Jan 9, 2017

> When TPUs will be allowed to be used in Google Cloud?

TPUs are only useful for prediction, not training, and for very high volumes of work where energy use is a major cost, so, it's for Google and FaceBook-type of situations. They are not designed as accelerated general purpose processors for deep learning.

Cheap inference can be done on CPUs as well, because production models can be optimized to only a small fraction of the computation (20x reductions are not impossible) by pruning neurons, weight quantization and other techniques.

halflings · on Jan 9, 2017

What leads you to think that TPUs are not useful for training?

ogrisel · on Jan 9, 2017

The aforementioned google translate paper suggests so.

My hunch is that TPUs work with a mix of 8 bit and 16 bit integer arithmetic on quantized networks with quantized data [1].

As far as I know nobody has managed to make SGD work properly with integer weights and that could be a reason why TPUs are not used for training (yet).

[1] https://petewarden.com/2016/05/03/how-to-quantize-neural-net...

I could be completely wrong though.

alfalfasprout · on Jan 9, 2017

Probably memory. Unless their ASICs have some kind of ultra fast on-chip memory. That's why GPUs still reign supreme. FPGAs have started seeing use for inference for those of us without the billion dollar budgets needed to develop ASICs.

dharma1 · on Jan 9, 2017

CPU/mobile inference - I think we will see a lot more stuff like this this year -

https://xnor.ai/

ma2rten · on Jan 9, 2017

Google Translate Paper mentions TPUs as well:

https://arxiv.org/abs/1609.08144

Seanny123 · on Jan 9, 2017

Neato! I'm surprised they went with a JIT compiler over a full-on compiler, but that might just be me not understanding: a) Compilers b) How a JIT compiler would apply to this situation

My lab-mate Jan Gosmann recently did something similar for our spiking neural network software Nengo [1]. Although it isn't Deep Learning, it also builds a computational graph of operations. He ended up optimising the layout of the operations in memory to increase the efficiency of Numpy operations and reduce the amount of time spent in Python. He's in the process of writing a paper about it.

[1] https://github.com/nengo/nengo/pull/1035

BenoitP · on Jan 9, 2017

From my limited understanding:

JITs can take into account the actual data being processed. And most importantly here, its size.

Knowing the size will help with making chunks of work fit into L1, L2, L3 caches. Creating sensible SIMDs operations. Choosing what goes into a warp.

Also, sometimes it is better to rematerialize a computation than having stored it; and the threshold to do it depends on the space and computing costs.

Languages like Halide [1] let you hand-tune this threshold. I guess this is the kind of work XLA does here.

[1] http://halide-lang.org/

emu · on Jan 9, 2017

Precisely. Just-in-time compilation allows the compiler to specialize the generated code for the shapes of the Tensors that appear at run time. This allows us to generate better code.

XLA also has an experimental ahead-of-time mode, which we think will be particularly interesting for some production and mobile deployments. This is all work in progress though, and we're looking forward to getting the community involved.

mafribe · on Jan 9, 2017

The compiler does not appear to be open at this point. Anybody know when this will change? Which team in Google is writing the compiler?

emu · on Jan 9, 2017

I'm on the Tensorflow team.

Unofficial answer: no promises, but it should be open-source soon. It may even be released in the next day or two. Watch this space!

emu · on Jan 10, 2017

The code is up:

https://github.com/tensorflow/tensorflow/commit/1e67c90e2cac...

https://github.com/tensorflow/tensorflow/tree/master/tensorf...

The corresponding documentation hasn't been pushed yet; I'll post a link when it is up.

Note that XLA is work in progress --- we're releasing the code early because we want to get the community involved. The GPU backend is in good shape, and improving by the day. We haven't had as much time to devote to the CPU backend, and it only has limited support for parallelism. Contributions welcome!

emu · on Jan 10, 2017

The documentation is now up too: https://www.tensorflow.org/versions/master/experimental/xla/

mafribe · on Jan 9, 2017

Can't wait ... what kind of JIT is it? Tracing? Meta-Tracing?

pilooch · on Jan 9, 2017

It bears some similarities with with Nvidia's tensorRT that is closed source.