Fun fact: Clojure is one of the few languages in which STM is easy!
You're right in the general case - STM incurs significant bookkeeping overhead (at least a word per mutable location, usually more), which is one of the reasons it never took off. (The actual reason is the CPU overhead, which slows down memory accesses so much you've lost all the performance you might gain from parallelism.)
Clojure sidesteps this by having an incredibly small number of mutable locations. With a persistent data structures, every mutation operation already builds its own private version of the structure, and then atomically changes a pointer to point to the new version. So you've got MVCC built in, and you only need transactional metadata on one location (the ref).
You're right in the general case - STM incurs significant bookkeeping overhead (at least a word per mutable location, usually more), which is one of the reasons it never took off. (The actual reason is the CPU overhead, which slows down memory accesses so much you've lost all the performance you might gain from parallelism.)
Clojure sidesteps this by having an incredibly small number of mutable locations. With a persistent data structures, every mutation operation already builds its own private version of the structure, and then atomically changes a pointer to point to the new version. So you've got MVCC built in, and you only need transactional metadata on one location (the ref).