I wrote something up in a previous post. I'm no crypto-expert. But I did study i...

l33t2328 · on April 23, 2022

> even Hash-functions should be largely built out of invertible operations

Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

> I could explain why but that gets more complicated.

If you’d care to go into more detail, I’d love to hear it. It was my understanding that the AES S-box was the result of some nonlinear transformation of the input bytes.

dragontamer · on April 23, 2022

> Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

SHA 256 is built out of 64 rounds of an internal mixing function.

That is, SHA256 is basically a glorified mix(mix(mix(mix...(mix(internal_state))))))))))))))) function, with 64x mixes applied to the internal data per input.

Now consider how "mix" should be designed.

1. If "mix" is invertible, it means that every possible input "pigeonholes" to exactly one possible output. Mathematically, we'd call invertible functions "one-to-one" and "onto".

2. If "mix" is not invertible, then the above condition does not hold.

-------------

So lets look at the simplest non-invertible function for 3-bit numbers as an example. And see what happens when we use it as a "mixing" function.

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 <----
    111   | 111 <----

Notice that 110 and 111 both "map" to 111. So what happens when we run mix(mix(mix(mix...(data)))) ??

Eventually, after about 8 rounds, all your data turns into "111" eventually. You're "losing" information. It means that everyone can predict that your hash probably will be 111 eventually.

Now lets consider the invertible function:

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 
    111   | 000 <----

000 was "unchosen" in previous function, and now we have one-to-one and onto . invertibility. As such mix(mix(mix(mix(...(data)))) will 100% depend on the input data.

Of course, this mixing function is simply "(data+1) modulo 8", which is insufficient for mixing purposes. (The "confusion" is pretty bad and very predictable, but you can see how "add +1" is actually kind of good at diffusion when "all the carries" line up... such as 011 -> 100 is changing a lot of bits from one mix). But you can imagine that even in this simplest example, the invertible function leads to a less-predictable internal state than the non-invertible function.

---------

Functions should also be "nonlinear". The above x+1 mixing function is bad because its linear (5x mixing functions can be simplified into x+5). It turns out that combinations of XOR, bitshifts, and ADDs could be non-linear, but you need to run a bunch of tests over the inputs/outputs to be sure that your particular mixing function is sufficient.

xmprt · on April 23, 2022

What differentiates a non-crypto hash from a crypto hash. Is there a fundamental difference between the two which prevents one from being used for cryptographic purposes?

dragontamer · on April 23, 2022

A non-crypto hash is run enough to confuse "simple" statistical tests.

A crypto-hash is run enough times to confuse "difficult" statistical tests from a dedicated adversary.

For example, the JOAAT hash is a simple add / shift / XOR hash run 1x round per byte. A singular round is all you need to confuse the "simple" statistical tests such as birthday attacks, dice rolls, etc. etc.

SHA-256 is also a simple add / shift / XOR hash, albeit using 256-bits at a time instead of 8-bits at a time. It is a 64-round hash function. That is: it is round(round(round(round(data)))) 64-times deep.

--------

If you wanted to turn JOAAT non-crypto hash into a crypto-hash, a good first step would be to run JOAAT(JOAAT(JOAAT...() maybe 64x or 128x over the data.

Obviously, you'd need to run cryptoanalysis over the end result, and also test for non-linearity and other such properties. But "large number of rounds" is probably the most obvious difference between non-crypto and crypto hashes.

---------

In the 90s, functions like AES were designed with "maximum confusion / diffusion per round". In just 10 rounds, the bits are mixed up enough that it defeats linear and differential cryptography.

I don't know which cipher "changed the meta", but maybe it was the XTEA cipher? Since XTEA was so simple, they knew it would need many-many rounds to defeat linear and differential crypto. So they just did 64-rounds and called it the end of it.

----------

So you can see, a "crypto" hash is effectively a regular old hash done an absurd number of times to defeat the most incredible statistical-tests known to man... done under the assumption that the "opponent" is performing the most difficult statistical tests.

l33t2328 · on April 23, 2022

Yes! We’d like our cryptographic hash functions to be collision resistant and preimage resistant.

That is, we’d like it to be hard to generate 2 different messages m1 and m2 where the hash of m1 is equal to the hash of m2, and we’d also like for it to be hard to compute any function of the message m(except the hash of m) if you’re given only the hash of m.

Non cryptographic hash functions don’t require these properties, and in fact some hashing algorithms used for data mining are designed to, for example, map near inputs to near outputs.