And because the JVM does global stop-the-world garbage collection, soft real-time is implausible: a GC pause can hit any of your actors unpredictably. Erlang has per-process heaps.
Basically the Erlang VM was created for this use case while the JVM was not, and it's not something you can just add with a library.
edit: Also the lightweightness of Erlang processes compared to Java threads[1] and hot code upgrades.
> And because the JVM does global stop-the-world garbage collection, which makes soft real-time implausible because of the unpredictability of GC affecting your actors. Erlang has per-process heaps.
Not quite. Every environment needs some shared memory semantics. In Erlang that's done with ETS tables, which don't undergo GC at all. Java's GCs are so good now that, if shared data structures are used sparingly, you would still get better performance than on BEAM. Plus, there are commercial pauseless GCs, as well as hard real-time JVMs.
> Basically the Erlang VM was created for this use case while the JVM was not, and its not something you can just add with a library. edit: Also the lightweightness of Erlang processes compared to Java threads[1] and hot code upgrades.
The JVM is so versatile, though, that all this can actually be added as a library, including true lightweight threads and hot code swapping (see Quasar[1]).
Nevertheless, BEAM was indeed designed with process isolation in mind, and on the JVM actors might interfere with one another's execution more than on BEAM. But even on BEAM you get the occasional full VM crash. If total process isolation is not your main concern, you might find that Java offers more than Erlang, all things considered.
Java/Scala do allow you to do bad things. So add the following to Hershel's question:
"Assume developers are non-malicious and will only pass immutable objects across actor/future boundaries."
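To make that assumption concrete, here is a minimal sketch of what an immutable message could look like in plain Java (16 or later, for records). The `OrderPlaced` type and its fields are made up for illustration; nothing here is Akka's or Quasar's API. The defensive copy in the constructor is what stops the sender from changing the message after it has been handed off.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical message type; the name and fields are illustrative only.
record OrderPlaced(String orderId, List<String> items) {
    OrderPlaced {
        // List.copyOf returns an unmodifiable copy, so later changes to the
        // caller's list cannot leak into the message.
        items = List.copyOf(items);
    }
}

final class ImmutableMessageDemo {
    // Builds a message, then mutates the source list the way a careless
    // sender might after handing the message off.
    static OrderPlaced buildThenMutateSource() {
        List<String> cart = new ArrayList<>(List.of("book", "pen"));
        OrderPlaced msg = new OrderPlaced("o-1", cart);
        cart.add("laptop"); // does not affect msg
        return msg;
    }

    public static void main(String[] args) {
        OrderPlaced msg = buildThenMutateSource();
        System.out.println(msg.items()); // prints [book, pen]
        try {
            msg.items().add("x"); // the receiver cannot mutate it either
        } catch (UnsupportedOperationException e) {
            System.out.println("message is read-only");
        }
    }
}
```

This only works by convention: the compiler will not stop someone from putting a mutable object into a field, which is why the assumption about non-malicious developers is needed.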
Also, I'm not that familiar with Erlang's memory model, so I might be wrong on this. But as far as I'm aware the memory for a message in Erlang is shared between threads - it's only local variables that use private memory. This means Erlang will also need some sort of concurrent garbage collector - does Erlang's version not stop the world, or at least the messaging subsystem?
> But as far as I'm aware the memory for a message in Erlang is shared between threads
Yes and no. Some large binaries (a specific Erlang data type that can represent, say, a packet or a block of data from disk) are shared and reference counted when passed between processes instead of being copied. They are immutable, just like most data types in Erlang. These binaries have their own GC algorithm, and it may sometimes take longer for them to be reclaimed. But it seems all of that could presumably be done via atomic updates to counters and references.
In general most messages are copied on send, so implementation-wise GC is very simple. On another level, because data in Erlang is immutable, the fact that messages get copied is also an implementation detail! One could conceive of another VM implementation that only passes references to immutable data on message send (except when sending to another machine, of course). But that would make GC a bit trickier, just as in the case of those binaries.
The large binary GC is actually pretty simple too: shared binaries are refcounted, and the references live in the process heap. When the references are GCed from the process, the shared binary can be freed. The reason it sometimes takes a long time to free is that some types of processes will get references to a large number of binaries but not trigger a process garbage collection, leaving lots of binaries allocated in the shared space. Garbage collection for a process is only automatically triggered when the process heap would grow, so there are some common cases which result in bad behavior: processes that don't generate much garbage on their heap, but do touch a lot of binaries (often this is request routing); processes that grow their heap to some large size doing one kind of work, but then switch to another type that doesn't use much heap space, leaving a long time between GCs; processes that touch a lot of binaries but then don't do any processing for a long time (maybe a periodic cleanup task).
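A toy model of that interaction, sketched in Java. This is not how BEAM is implemented: `SharedBinary`, `ToyProcess` and the method names are all hypothetical, and the model is single-threaded so the counters are plain ints. It only shows the shape of the problem: a router-style process drops each binary right away, but the refcount is not decremented until that process runs a collection.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; every name here is hypothetical.
final class SharedBinary {
    private int refCount = 0;
    private boolean freed = false;
    final int sizeBytes;

    SharedBinary(int sizeBytes) { this.sizeBytes = sizeBytes; }

    void retain() { refCount++; }

    // The binary is freed only when the last heap reference is collected.
    void release() { if (--refCount == 0) freed = true; }

    boolean isFreed() { return freed; }
}

final class ToyProcess {
    // References still sitting on this process's heap, even though the
    // process logic no longer uses them.
    private final List<SharedBinary> unreclaimedRefs = new ArrayList<>();

    // Models a router that inspects a binary and forwards it: the reference
    // is garbage immediately but stays on the heap until the next collection.
    void touch(SharedBinary bin) {
        bin.retain();
        unreclaimedRefs.add(bin);
    }

    // Models a per-process collection: dead references are dropped and the
    // refcounts of the shared binaries are decremented.
    void collect() {
        for (SharedBinary bin : unreclaimedRefs) bin.release();
        unreclaimedRefs.clear();
    }
}

final class BinaryLeakDemo {
    // Sums the sizes of the binaries that have not been freed yet.
    static long allocatedBytes(List<SharedBinary> bins) {
        long total = 0;
        for (SharedBinary bin : bins) if (!bin.isFreed()) total += bin.sizeBytes;
        return total;
    }

    public static void main(String[] args) {
        ToyProcess router = new ToyProcess();
        List<SharedBinary> bins = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            SharedBinary bin = new SharedBinary(1_000_000);
            bins.add(bin);
            router.touch(bin);
        }
        System.out.println(allocatedBytes(bins)); // prints 1000000000
        router.collect();
        System.out.println(allocatedBytes(bins)); // prints 0
    }
}
```

On the real VM, the step that corresponds to `collect()` here would be forcing a collection of the offending process, for example with `erlang:garbage_collect/1`.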
Another common issue is taking references to a small part of a large shared binary.
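The JVM has a close analogue, which makes for an easy illustration. In the sketch below (class and method names are made up), a 16-byte slice of a 1 MB packet still points at the whole backing array, so the full megabyte stays reachable for as long as the slice does; copying the 16 bytes out breaks that link. In Erlang the usual counterpart to the copy is `binary:copy/1`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SliceRetentionDemo {
    // Returns a 16-byte view that still shares the large backing array.
    static ByteBuffer headerView(byte[] packet) {
        return ByteBuffer.wrap(packet, 0, 16).slice();
    }

    // Returns an independent 16-byte copy; the large array can be collected
    // once nothing else refers to it.
    static byte[] headerCopy(byte[] packet) {
        return Arrays.copyOfRange(packet, 0, 16);
    }

    public static void main(String[] args) {
        byte[] packet = new byte[1 << 20]; // stand-in for a large shared binary

        ByteBuffer view = headerView(packet);
        System.out.println(view.remaining());    // prints 16
        System.out.println(view.array().length); // prints 1048576

        byte[] copy = headerCopy(packet);
        System.out.println(copy.length);         // prints 16
    }
}
```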
That's exactly right. Erlang has per-actor heaps, and its garbage collector only stops the actors that are being garbage collected. This is highly concurrent and is a great property to have when you're trying to keep your response times low.
Hopefully with the upcoming Spores[1] feature in Scala, Akka will be able to enforce message immutability in some form. I was at Scala Days last week and the developer behind spores gave a great talk on the sort of immutability guarantees this feature will allow. Worth watching once it's posted online.
Even just marveling at the complexity and how they got it working.
Otherwise, besides those tricks, how would you do it when you have multiple threads accessing objects on a shared heap?
Erlang's VM is another wonderful piece of engineering. Each little process lives in its own memory heap, so pauseless garbage collection becomes trivial. It has many other really cool and unique features (hot code reloading, inter-node distribution, the ability to load C code, etc.).
Well I wouldn't say "roll-over" to the new process is exactly simple but it is a good trick though. Forking has its interesting dark cases that have to be handled. Inherited file descriptors, what happens to threads, signals and so on.
[1]: http://i.imgur.com/hKMJ3HD.png