
I think that basic branch-sims are still useful as a kind of lowest common denominator - if the simple-minded branch sim can predict it OK, it'll almost certainly go OK on a wide variety of real CPUs. Do you disagree?

Benchmarking on a particular CPU does give you more precise results, but at the risk of them being less generally applicable.



Sure, I think it's fair to say that if cachegrind finds the branching to be predictable, any modern processor will do fine on it. But if cachegrind predicts poor performance, I wouldn't suggest changing your code unless you've discovered that the actual performance is poor on a real processor. Overall, I think cachegrind's branch prediction is nowadays less accurate than the rule of thumb that "if there is a pattern the CPU will find it". Since using performance counters is so simple and accurate (at least on x86/x64), I'm not sure that there is a benefit to using the slower, less accurate, older method of emulation.

Your response encouraged me to check whether the branch predictor in cachegrind had been updated since I last looked at it. It doesn't look like it. It's still pretty simple, about 20 lines of actual code: https://github.com/fredericgermain/valgrind/blob/master/cach...
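
To give a feel for how small a predictor of this general kind can be, here is a rough sketch of a table of 2-bit saturating counters indexed by branch address. To be clear, this is not cachegrind's code: the class name, the table size, the indexing, and the example branch address are all assumptions made up for illustration.

```python
# Hypothetical sketch of a simple branch predictor: a table of 2-bit
# saturating counters indexed by the low bits of the branch address.
# NOT cachegrind's implementation; sizes and names are assumptions.

class TwoBitPredictor:
    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # Counters range over 0..3; start at 1 ("weakly not taken").
        self.table = [1] * (1 << table_bits)

    def predict_and_update(self, branch_addr, taken):
        """Return True if the prediction matched the real outcome."""
        idx = branch_addr & self.mask
        counter = self.table[idx]
        predicted_taken = counter >= 2
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.table[idx] = min(counter + 1, 3)
        else:
            self.table[idx] = max(counter - 1, 0)
        return predicted_taken == taken


def mispredict_rate(outcomes, branch_addr=0x400123):
    """Fraction of mispredicted outcomes for a single branch."""
    predictor = TwoBitPredictor()
    misses = sum(
        not predictor.predict_and_update(branch_addr, taken)
        for taken in outcomes
    )
    return misses / len(outcomes)


always_taken = [True] * 1000
alternating = [i % 2 == 0 for i in range(1000)]

print(mispredict_rate(always_taken))  # 0.001
print(mispredict_rate(alternating))   # 1.0
```

With no branch history, this sketch mispredicts the strictly alternating branch every single time, even though the pattern is trivial. That is the kind of gap between a simple simulated predictor and the "if there is a pattern the CPU will find it" rule of thumb.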

I greatly appreciated the honesty in one of the comments: "TODO: use predictor written by someone who understands this stuff."

It is a valid question whether the consistency from processor to processor is any better than cachegrind or a rule of thumb. My experience was that for Nehalem/Sandy Bridge/Haswell things were more alike than different (and only got better), but I don't know about other lines.



