Yes, you are correct. On an average Linux box, getting a single cache-line transfer between cores to complete in under 150ns 99% of the time is about the best you can get, especially if you are running a stock kernel that eats the remaining 1% and creates huge outliers ;). There is some talk, though, about crazily expensive switches with integrated FPGAs...
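To make the "99% of the time" point concrete, here is a minimal sketch of how a 99th-percentile figure can look fine while the remaining 1% hides a huge outlier. The sample values are synthetic and made up purely for illustration (they are not measurements), and the `percentile` helper is a hypothetical nearest-rank implementation, not any particular tool's method.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Ceiling of pct * n / 100 using integer arithmetic only.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Synthetic latencies in nanoseconds (assumed, not measured):
# 99 fast transfers plus one outlier, e.g. from a scheduler interruption.
samples_ns = [120] * 99 + [50_000]

if __name__ == "__main__":
    print("p99 :", percentile(samples_ns, 99), "ns")
    print("max :", max(samples_ns), "ns")
    print("mean:", sum(samples_ns) / len(samples_ns), "ns")
```

With this made-up data the p99 stays at the fast value while the maximum and the mean are dominated by the single outlier, which is why tail latency is usually reported separately from the percentile target.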
And the crazy-expensive switch is guaranteed to be the cheap part. Now add all the IP required to make that FPGA smart enough to place orders, and you're talking huge bills in dev hours and third-party licensing.
It's about 1ft/ns in a vacuum, but my understanding is that it's closer to 0.5ft/ns in a wire, so what you say is doubly true. But I think HFTs are running on servers close to the NASDAQ datacenter, and that people pay buckets of money for such privileges.
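As a rough back-of-the-envelope sketch of what those speeds imply, the arithmetic is just distance divided by speed. The 1ft/ns and 0.5ft/ns figures are the approximate ones quoted above (the wire figure is the commenter's estimate, not a measured value), and the cable lengths are hypothetical.

```python
# Approximate signal speeds in feet per nanosecond.
FT_PER_NS_VACUUM = 1.0  # rounded; light in a vacuum is roughly 0.98 ft/ns
FT_PER_NS_WIRE = 0.5    # the estimate from the comment above, an assumption


def one_way_delay_ns(distance_ft, ft_per_ns):
    """Propagation delay in nanoseconds over distance_ft feet."""
    return distance_ft / ft_per_ns


def round_trip_delay_ns(distance_ft, ft_per_ns):
    """Delay for a signal to travel out and back over the same distance."""
    return 2 * one_way_delay_ns(distance_ft, ft_per_ns)


if __name__ == "__main__":
    for distance_ft in (10, 100, 1000):  # hypothetical cable runs
        vac = one_way_delay_ns(distance_ft, FT_PER_NS_VACUUM)
        wire = one_way_delay_ns(distance_ft, FT_PER_NS_WIRE)
        print(f"{distance_ft:>5} ft: {vac:.0f} ns in vacuum, {wire:.0f} ns in wire")
```

Under these assumptions, 150ft of wire already costs about 300ns one way, twice the 150ns cache-line budget mentioned earlier in the thread, which is the reason physical proximity matters so much.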
Yes and No. Nearly every electronic exchange now offers co-location services (including NASDAQ).
The "serious dough" part is a little harder to quantify. I've not looked into NASDAQ specifically, but server co-location is usually on the order of a couple of thousand dollars a month. This is a drop in the bucket compared to the real costs of a professional trading outfit (namely employees and margin/risk costs).
Anecdotally, it's also almost exactly what I paid for a tier 1 co-located server at my first job in a startup during the first dotcom boom.