XXH3, world's fastest hash algorithm, has reached stable status

segfaultbuserr · on July 28, 2020

Note to readers: This is a non-crypto hash algorithm. It's not meant to be cryptographically secure, but collision is rare enough for databases and similar applications, and is really fast. A real crypto hash like SipHash or BLAKE is already considered lightning fast (1-3 GB/s) in crypto applications, but XXH is faster than a RAM sequential read (30+ GB/s).

MR4D · on July 29, 2020

So, does this mean I can replace CRC32 with XXH ?

ncmncm · on Aug 1, 2020

Depends what you use CRC for. CRC is processed a byte at a time, but really fast things nowadays do 16 or 32 bytes at a time.

5986043handy · on July 29, 2020

Yep, Blake3 can do around 6-7GB/s https://github.com/BLAKE3-team/BLAKE3

feniv · on July 28, 2020

If you’re interested in diving deeper, the SMhasher benchmark suite tests a much more exhaustive list of hash functions (including wyhash) - https://github.com/rurban/smhasher

https://www.strchr.com/hash_functions Is also a great write up of how some of the classic hashing algorithms perform of various data sets.

https://nullprogram.com/blog/2018/07/31/ is an interesting approach on generating hashing functions automatically.

rurban · on July 29, 2020

Note that latest wyhash 1.0 is still not included into my benchmarks, it's in the wyhash branch. My benchmarking machine is still busy with fuzzing unfortunately.

As for the speed: I wouldn't claim XXH3 the world's fastest hash function at all. It's very fast for bulk digests, of large buffers. Amongst other similar hash functions. But as we know hashing mostly involves short strings, and there code size matters most. Wordwise FNV1, crc-ni and wyhash would be the fastest IMHO.

tpetry · on July 29, 2020

So if you are confused like me how someone came up with an algorithm faster than anything else currently available: This is from the mastermind behind the lz4 and zstd compression.

Akababa · on July 29, 2020

Noob question: how do they even measure performance faster than RAM sequential read?

_b · on July 29, 2020

By hashing data already in the processor's cache.

MuffinFlavored · on July 28, 2020

faster than RAM sequential read...

i wonder if anything above sse2 would offer even more performance improvements?

staticfloat · on July 28, 2020

If you look at the second table below the first, an AVX2 implementation of XXH3 gets just under a 2x speed boost.

smabie · on July 30, 2020

What about AVX512?

ncmncm · on Aug 1, 2020

Probably not: Diminishing returns. Anyway existing impls serialize two 256-bit ops.

Const-me · on July 28, 2020

I wonder how it compares to AES-NI. Technically that's not a hash, but it can be abused to be, and due to the hardware support the bandwidth is pretty good.

telendram · on July 29, 2020

AES-NI can be used as a hash, but don't assume that it makes a good job at it. Latest version of MeowHash is fine, but all other variants and prior versions have shitty collision ratios, equivalent to a ~50 bits hash : https://github.com/Cyan4973/xxHash/wiki/Collision-ratio-comp...

eggsome · on July 29, 2020

Is there a 2 axis graph that has both collision ratios and speed?

jnwatson · on July 28, 2020

In the second table, you can see the performance of MeowHash which uses AES-NI. It is comparable in performance to XXH3 with AVX2.

The_rationalist · on July 28, 2020

MeowHash (in the benchmarck table) is actually the second fastest hash for large data and use hardware accalerated AES. Still xxHash3 win everywhere, for any kind of data (it seems) and is portable.