I'd be curious to know more about the subtle issues (which I don't doubt there might be!).
IMHO those results don't contradict what I said. Of course, if the workload fits entirely in the 64GB of HBM, there's no point in using it as a cache — just use it directly.
But if you need to address more RAM than that (any big DB, filesystem, etc.), and you don't want to manage the tiers manually, then the caching mode could shine.
Memory-side caching accelerates all requests that go through the memory controller. The extra 64MB of L3 in my 5800X3D wouldn't accelerate DMA between my memory and my GPU, but this HBM cache should accelerate PCIe devices like GPUs and storage.
It's also a benefit in 2-socket configurations when the memory a CPU needs is attached to the other socket. The data gets cached in that other socket's HBM for the whole system, without polluting that socket's L1–L3 caches with data it hasn't requested itself.