Golang dev.ssa branch merged into tip

nikolay · on March 2, 2016

https://en.wikipedia.org/wiki/Static_single_assignment_form

UniQP · on March 2, 2016

After libfirm [1], the Go compiler now seems to be the second mature compiler with an SSA-based backend. I hope more compilers will follow.

I missed some references to corresponding papers in the source code. For instance, is the register allocator an implementation of "Linear Scan Register Allocation on SSA Form" [2]?

Also some facts mention in comments [3] are a bit scary but I guess this will be resolved over time.

[1] http://pp.ipd.kit.edu/firm/

[2] http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer1...

[3] https://github.com/golang/go/blob/master/src/cmd/compile/int...

electrum · on March 2, 2016

According to Wikipedia, SSA is used extensively by GCC, LLVM and every major JIT: https://en.wikipedia.org/wiki/Static_single_assignment_form#...

UniQP · on March 2, 2016

Yes, in the middle end, but they don't have an SSA-based back end (for example SSA-based register allocation).

DannyBee · on March 2, 2016

Most of LLVM's backend is SSA, actually. At some point it lowers out of SSA, but you have to do that at some point anyway.

It happens that LLVM does instruction selection and scheduling on SSA but not regalloc (it does phi elimination before regalloc, though uses ssa based liveness info in regalloc)

My take: Who cares? If someone wants to make register allocation SSA based, and demonstrates code or speed or maintenance or whatever benefits, great.

There are theoretical benefits, but in practice, LLVM does pretty well with it's current scheme.

Because of this, it's not really near the top of any todo list, nor should it be.

UniQP · on March 2, 2016

> At some point it lowers out of SSA, but you have to do that at some point anyway.

libfirm never goes out of SSA (since it uses a graph-based representation it's simply impossible: the data dependency edges need one target). It converts the programm to CSSA and assigns the same register to all phi operands. When finally emitting the program, phi nodes emit nothing.

DannyBee · on March 2, 2016

Okay, sure, this is pretty much the same thing for a graph based IR. You can argue this isn't a form of lowering, but in practice, it doesn't matter.

Note that your claim "(since it uses a graph-based representation it's simply impossible: the data dependency edges need one target)"

is wrong :)

SSA also requires that things have a single, unique, reaching definition.

The fact that the graph edges have one target does not guarantee this (it is instead a representation that requires this for correctness) Open64 is another compiler with factored use-def chains like this (though the rest is not graph based), and has the same issue -

If you hoist something to various valid points (IE across existing live ranges), it's possible to generate multiple reaching definitions. This is because the dataflow definition of reaching definitions is not "whatever is on the end of this graph edge". So the fact that the graph edges only point to one of those definitions just makes the graph edges wrong.

So it's not impossible, but libfirm avoids it or performs the necessary phi insertion in various places to avoid creating invalid SSA for the representation it has.

(Note: It is possible to have IR's where the only ordering is a data dependence ordering, and so the graph is the source of truth and where it points is where it points. There are also representations where the control flow is implied by the graph nodes. In these representations, what you say would be correct. From what i know of libfirm, it has basic blocks and control flow edges, and it uses that to give some ordering. In this world, it's possible to screw up the SSA properties with pretty simple operations, and make the graph no longer correct)

UniQP · on March 2, 2016

> So it's not impossible, but libfirm avoids it or performs the necessary phi insertion in various places to avoid creating invalid SSA for the representation it has.

It's impossible to have one edge pointing to multiple reaching definitions.

DannyBee · on March 3, 2016

Of course, but I think you missed the point of what i said. Which was: It's possible to get into situations where that that node could have multiple possible reaching definitions, even if it only points to one at a given time. That is still not SSA. In that case, the edge is just wrong, even if it only points to one arbitrarily selected possible reaching definition.

In short: Your argument is "it's SSA because the data structures only allow for a single reaching definition". This is not right. If i don't insert phi nodes, and just have the IR point at random reaching definitions, it's not magically SSA, because there is not actually a unique reaching definition for each point, it just happens you've pointed the edges at arbitrarily selected reaching definitions (IE made the graph wrong :P)

UniQP · on March 2, 2016

> There are theoretical benefits, but in practice, LLVM does pretty well with it's current scheme.

I think having a decoupled spilling phase is nice feature of the SSA-based scheme. However, I don't say that LLVM (or GCC) should switch to the SSA-based scheme. I just appreciate that the Go compiler switched to the SSA-based scheme, because it may bring some new insights to the research community.

DannyBee · on March 2, 2016

SSA is not a pre-req to a decoupled spilling phase :)

In fact, the guy who wrote LLVM's current register allocation did his thesis on decoupled SSA register allocation - http://www.theses.fr/2012ENSL0777

But he still decided not to go that way in LLVM when decoupling regalloc from spilling.

"because it may bring some new insights to the research community."

No offense, but this seems pretty unlikely. It's been 10 years and the number of production optimizing compilers that use SSA based regalloc (for example) is still hovering close to 0.

Sad (because i helped fund a ton of that research), but true.

The same thing is sadly true of libfirm. While a really impressive piece of work, it hasn't spurred much that i can see ...

(FWIW: I had the same hopes)

pklausler · on March 2, 2016

I've written a graph-coloring register assigner that stayed in SSA form in an optimizing compiler for the Cray X1 and BlackWidow architectures. The key idea is to treat each phi function as being a copy, perhaps coalescable, located at the foot of each of the block's predecessors. Works great.

DannyBee · on March 2, 2016

Explicitly inserting copies is lowering :)

goldfire · on March 2, 2016

So... not been closely following development on Go. What does this mean?

vastbinderj · on March 2, 2016

It is the new backend for the Go Compiler

See: https://docs.google.com/document/d/1szwabPJJc4J-igUZU4ZKprOr...

gansai · on March 2, 2016

So, currently Stage1 is complete. 2 more to go

tux1968 · on March 2, 2016

From the referenced doc:

  I’m not entirely convinced that stages 2 & 3 are
  necessary, maybe we stop at stage 1.  It all depends on
  what optimizations we’d like to do that can’t be done
  because the IR is in the wrong form.  Proceeding with
  stages 2 and 3 might gain some efficiency in the compiler
  itself because then we don’t have to generate the old IR
  at all.  I suspect that effect will be small, however.

fintler · on March 2, 2016

Just a guess: it's probably static single assignment for the IR.

https://www.cs.cmu.edu/~fp/courses/15411-f08/lectures/09-ssa...

iofj · on March 2, 2016

It appears it's a new way to do code generation. So any adding of new architectures is dead in the water until this is resolved.

https://groups.google.com/forum/#!topic/golang-dev/zz-gWGXFZ...

enneff · on March 2, 2016

Actually, the opposite. The merging of the branch means that people are now free to continue work on adding new architectures.

mattetti · on March 2, 2016

Did the compiler get even slower on tip? I know that was a concern with this change

grey-area · on March 2, 2016

10% slower, which isn't too bad, let's hope they can now focus on improving compilation speed, I miss go 1.4 having recently upgraded. It would be really nice to see that actually improve for 1.7 instead of regressing.

zamalek · on March 2, 2016

SSA, once you understand it, is easier to work with than almost all other forms of instruction sets. I'd argue that it would only accelerate new architecture in the long-run.

I'm interested in why LLVM was disqualified. Was it simply never considered or is it incompatible with the Go type system, calling convention, etc.?

vruiz · on March 2, 2016

> I'm interested in why LLVM was disqualified.

They simply used what they knew best:

> If step one had been "learn the GCC or LLVM toolchains well enough to add segmented stacks", I'm not sure we'd have gotten to step two.

> Honestly, if we'd built on GCC or LLVM, we'd be moving so slowly I'd probably have left the project years ago.

https://news.ycombinator.com/item?id=8817990

enneff · on March 2, 2016

Even today we can still re-build the entire tool chain and standard library in one minute on a modest machine.

If we were using LLVM...

tinco · on March 2, 2016

It's rather easy to add a calling convention to LLVM. If I would have to guess it would be that they thought LLVM was too slow for them. They said from the start compilation speed was a big point for them.

Also LLVM requires you to either write your IR in SSA or add another expensive optimisation pass to make it SSA (mem2reg). Perhaps they thought writing an SSA generator would be too much of a headache.

witty_username · on March 2, 2016

The LLVM docs say that clang uses mem2reg for mutable local variables, so it can't be very slow.

From the end of

http://llvm.org/docs/tutorial/LangImpl7.html#memory-in-llvm

> Proven and well tested: clang uses this technique for local mutable variables. As such, the most common clients of LLVM are using this to handle a bulk of their variables. You can be sure that bugs are found fast and fixed early.

alexanderstocko · on March 2, 2016

In addition to what others said, support for precise garbage collection in LLVM was not ideal at the time. The experimental gc statepoint extension spearheaded by the Azul guys is trying to change that. They've been working on it publicly since late 2014.

gillianseed · on March 2, 2016

Improved optimizations which will also be easier to implement, with the result being faster and also typically smaller code.

rurban · on March 2, 2016

And typically longer compile times. Most variables are split into ("phi") variants, for each assignment, and many more costly optimization steps are now possible.

gillianseed · on March 2, 2016

True, an increase in optimizations will likely mean longer compile times, on the other hand, with better optimized code, the compiler itself (as it's written in Go) will also perform better, which may negate some of the increase in compile time.

jlouis · on March 2, 2016

One of the alluring things of SSA form is that many optimizations are much faster to execute on the form. The costly part is to raise the SSA form in the first place which in the standard implementation requires one to build a costly dominator tree.

You don't need to add every optimization known to man to a compiler, so you can sometimes keep a few of the important ones and then skip every other optimization. A priori, I'd guess SSA would speed up the compiler, which means you end up having a better budget for the more expensive optimizations.

UniQP · on March 2, 2016

As stated in [1] they use a variant of "Simple and Efficient Construction of Static Single Assignment Form" [2], which does not require a dominator tree (or a liveness analysis).

[1] https://github.com/golang/go/blob/master/src/cmd/compile/int...

[2] http://pp.info.uni-karlsruhe.de/uploads/publikationen/braun1...

qznc · on March 2, 2016

I think Wirth with his Pascal compiler had this as a rule. If you added an optimization (which takes additional time), it must speed up the compiler enough that compilation times are not longer.

pklausler · on March 2, 2016

It depends on the compiler. The great thing about SSA conversion is that you only have to do it once, whereas a classical def/use-chaining analysis may have to be redone many times in an old-style optimizer.

SSA is a conversion of the program into a representation of its data flow. I've used it in multiple compilers and found it to be a big win for easing other analyses (e.g., induction variable recognition become trivial) and reducing bugs due to the update problem.

mangeletti · on March 2, 2016

Does this change also pave the way for future changes that would make Go programs to run faster (e.g., easier to make optimizations during compile time, something else)?