"If data is on the stack, any references starting there disappear as soon as tha...

"If data is on the stack, any references starting there disappear as soon as that stack frame is popped, so there's no lingering work for the GC to do"

You're only considering the sweep phase. Sweep is the easy part of tracing GC: you can always chuck sweep into a background thread. The mark phase is the problem. Marking always has to trace from the roots, including stack roots; that's how tracing GC works. And a tracing GC never visits dead objects in the first place, so references that vanished with a popped stack frame were never going to cost the marker anything.

You can reduce allocation pressure by using the stack, which will make the GC run less often. My point is that when it does run, you're no better off than Java, and in fact quite a bit worse off, since Java's GC can run in parallel with the mutator and Go's can't.

"Your comments strongly imply that you have no practical experience with Go's GC. I suggest you stop claiming that it has certain performance characteristics or behaviours when you have not experienced it yourself."

I have experience with GC generally. There's nothing particularly special about Go's GC: it's a standard stop-the-world mark-and-sweep collector for a language that supports a limited form of stack allocation but generally uses heap allocation. The performance characteristics that this form of GC must have are well-known.

If you want data, look at the binary-trees benchmark: http://shootout.alioth.debian.org/u32/benchmark.php?test=all...

It's mostly a test of GC. Java's GC runs more often (thus the memory use is lower) and yet it's still 4x faster. This is because Java has a generational, concurrent-incremental collector.