I'm surprised the decent solution to this isn't more widely known. People have mentioned Occam and Stackless Python; both are interesting. But their ancestor is Hoare's CSP, and its other descendants include Squeak (not the Smalltalk relation), Newsqueak, Plan 9's Alef, Inferno's Limbo, and now libthread.
Channels with co-operating threads are easy to reason about. See Russ Cox's overview page http://swtch.com/~rsc/thread/ for more.
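To make the channel idea concrete, here is a rough sketch in Python. It is not libthread or any of the languages above: it assumes a bounded `queue.Queue` is a good enough stand-in for a channel (a real CSP channel is an unbuffered rendezvous, and Python's threads are pre-emptive rather than co-operative). The names `producer`, `squarer`, `run_pipeline` and the `DONE` sentinel are made up for illustration.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, not part of any library


def producer(out, n):
    """Send 0..n-1 down the channel, then signal the end of the stream."""
    for i in range(n):
        out.put(i)  # blocks while the channel is full
    out.put(DONE)


def squarer(inp, out):
    """Receive values, square them, and pass them on until DONE arrives."""
    while True:
        item = inp.get()  # blocks until a value is available
        if item is DONE:
            out.put(DONE)
            return
        out.put(item * item)


def run_pipeline(n):
    # maxsize=1 only approximates an unbuffered channel (assumption).
    a = queue.Queue(maxsize=1)
    b = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)

    for t in threads:
        t.join()
    return results
```

The threads share nothing except the two channels, so there are no locks to get wrong; each stage can be read on its own as "receive, compute, send", which is where the ease of reasoning comes from.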
Stackless Python does absolutely nothing to help with scaling applications to multiple cores. It allows you to write asynchronous applications that make better use of a single processor for operations that depend heavily on I/O (or on otherwise waiting for some resource).
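As a rough illustration of that single-processor point, here is a sketch using the standard library's `asyncio` rather than Stackless itself (an assumption made so it runs on stock Python). `asyncio.sleep` stands in for an I/O wait, and `fake_io`, `main` and `run` are hypothetical names.

```python
import asyncio
import threading
import time


async def fake_io(name, delay, log):
    """Pretend to wait on I/O, recording which OS thread ran each step."""
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # yields to the scheduler while "waiting"
    log.append((name, "done", threading.get_ident()))


async def main():
    log = []
    start = time.monotonic()
    # Both waits overlap, but only one task executes at any instant.
    await asyncio.gather(fake_io("a", 0.2, log), fake_io("b", 0.2, log))
    return log, time.monotonic() - start


def run():
    return asyncio.run(main())
```

Both tasks run on the same OS thread, and the two waits overlap instead of adding up. Nothing here uses a second core: a CPU-bound task in place of the sleep would simply block the other one.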
I wasn't pushing Stackless, just saying that others have mentioned it and that its ancestry has something in common with what I'm trying to sell: channels as a synchronisation method. See the references I gave for more details.