I've written a graph-coloring register assigner that stays in SSA form, in an optimizing compiler for the Cray X1 and BlackWidow architectures. The key idea is to treat each phi function as a set of copies, one at the foot of each predecessor of the phi's block, each of them possibly coalescable. It works great.
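
To make the phi-as-copies idea concrete, here is a minimal Python sketch. It is not the Cray compiler's code; the data layout and the names `phi_copies`, `coalescable`, and `interference` are hypothetical, chosen for illustration. Each phi `dest = phi(pred1: src1, pred2: src2, ...)` is viewed as one copy `dest <- src` at the foot of each predecessor, and a copy is treated as a coalescing candidate when its two names do not interfere. The copies at the foot of one predecessor are conceptually parallel, so a real implementation would have to order them carefully if they were ever materialized.

```python
from collections import defaultdict

def phi_copies(phis):
    """Map each predecessor block to the copies implied by phis in its successors.

    phis: {block: [(dest, {pred_block: src, ...}), ...]}  (hypothetical layout)
    Returns {pred_block: [(dest, src), ...]}; the copies listed for one
    predecessor are conceptually parallel, not sequential.
    """
    copies = defaultdict(list)
    for block, block_phis in phis.items():
        for dest, operands in block_phis:
            for pred, src in operands.items():
                # One copy per incoming edge, placed at the foot of the predecessor.
                copies[pred].append((dest, src))
    return dict(copies)

def coalescable(copies, interference):
    """Return the (pred, dest, src) copies whose two names do not interfere.

    interference: set of frozenset({a, b}) pairs (assumed representation
    of the interference graph's edges).
    """
    return [
        (pred, dest, src)
        for pred, moves in copies.items()
        for dest, src in moves
        if frozenset((dest, src)) not in interference
    ]

# Usage: one phi at a join block with two predecessors.
phis = {"join": [("x3", {"then": "x1", "else": "x2"})]}
copies = phi_copies(phis)
# Suppose x3 interferes with x2 but not with x1.
candidates = coalescable(copies, {frozenset(("x3", "x2"))})
```

In this sketch a copy that gets coalessed away costs nothing, and only the copies that remain would need an actual move instruction on that edge.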