How SHA-256 Works Step-by-Step (boot.dev)
197 points by kyleShropshire on April 22, 2022 | hide | past | favorite | 55 comments


Every time one of these is posted, I’m expecting the steps to explain why they are being done. Like what makes this combination of operations have the particular properties we need?


You're not wrong. You're 100% right. This is like posting a story about how a game engine works, and having it be just blocks of assembly language with no explanation other than "here's some code multiplying a matrix, and then...".


Geez, now I want to write an article about how a game engine works, explaining the rationale for every piece of it; I don't think I've ever seen somebody do that before. It's all kind of spread around in disparate bits and pieces, often quite hard to find (or even to know that you ought to be looking for it).

Problem is that such a work would probably have to be more of a textbook, in terms of length and impenetrability. And it's not immediately obvious who the audience would be, as the vast majority of game developers these days are using pre-existing engines, rather than building new ones. But maybe there's still enough conceptual underpinnings that are worth discussing which would still be relevant for Unity/Unreal/Godot folks.


I think there is a middle ground. You can give a bird's eye view of the way things are done and a hand-wavy intuition for why they are done that way. Analogies go a long way towards building the latter. Then for those who want to dive deep, you can leave pointers to articles that cover such things in detail.

It doesn't always have to be a choice between writing a book and nothing at all because the topic is too complex. I personally have benefited from bird's eye view blog posts and mini-articles more than I can remember. If you have the knowledge then go for it, someone out there will thank you.


I think with most posts and also textbooks the problem is scope. I think the Handmade Hero project shows quite well what an amount of work it is to explain it. And there is the difference between a generic game engine like Unity and a specific engine for a specific project. And I mainly mean platform targeting and the likes. But I would read a blog post series :)


I'm sure there's way too much variability to write a whole game engine and describe every detail as "this is the best way to do it because X" rather than "this is the way we did it because personal opinion/time/money/we had no clue how it works."

When it comes to the 3d renderer alone, there's at least enough common ground to explain the big ideas. Like "we multiply matrices because GPUs can do that very well in parallel and we describe our geometry as points because we can then describe the whole image using simple coordinate transformations which boil down to matrix multiplications."


Yeah, I agree that you'd never get anywhere with a "this is the best way to do it because X" approach; it's just too divisive and often there's no single "best" answer that can be equally applied to every project.

I feel like a comprehensive article would need to be more of a catalog of "here are the basics of the various major approaches and how they differ and why you might choose one over another", rather than (for example) advocating specifically for forward shading vs. deferred vs. forward+ vs. clustered as being the universally "best" option. (especially when that whole discussion has become kind of moot for the majority of modern games developers, who mostly just take their engine's default rendering pipeline and just use it)


I wrote something up in a previous post. I'm no crypto-expert. But I did study it a bit.

https://news.ycombinator.com/item?id=30248439

Obviously, any actual crypto-experts can feel free to correct me if I got the history and/or understanding incorrect here.

-----

Addendum to my previous post.

1. Confusion -- Bytes should turn into other bytes in random-looking ways. For example, the byte 0x25 may turn into 0x88. Aka: S-boxes in 90s-era ciphers.

2. Diffusion -- Bit-changes should "spread" to as many bits as possible.

3. Invertible -- Invertible operations minimize the loss of information. Encryption/decryption must be invertible by definition, but even hash functions should be largely built out of invertible operations. Try to make confusion and/or diffusion steps invertible.

4. ADD / XOR / Shift / Rotate -- These operations are the most popular way to build invertible confusion/diffusion functions today.

5. SBox + Galois Fields -- For AES (a 90s-era algorithm), the S-box was the source of confusion, and Galois field arithmetic was the source of diffusion. I could explain why, but that gets more complicated.

6. Testing -- Test your functions against linear cryptanalysis (how is the input linearly related to the output?) and differential cryptanalysis (how do differences in the input bits propagate to the output bits, bit by bit?)

------

Obviously, hash functions (like SHA256) must be non-invertible by the end of it all. But you want to carefully think about where the source of non-invertibility comes from, and to minimize the loss of entropy/information at any particular step.

With these principles, it's not very hard to make your own hash function. I'd suggest studying Bob Jenkins's "JOAAT" hash, the Jenkins one-at-a-time hash. It's a non-crypto hash, but it is probably one of the simplest hashes that uses the above principles: https://en.wikipedia.org/wiki/Jenkins_hash_function
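For reference, the one-at-a-time hash is small enough to sketch in a few lines of Python (a direct transcription of the pseudocode on that Wikipedia page, masked to 32 bits):

```python
def joaat(data: bytes) -> int:
    """Bob Jenkins's one-at-a-time hash: one add/shift/XOR round per byte."""
    MASK = 0xFFFFFFFF  # keep everything in 32 bits
    h = 0
    for b in data:
        h = (h + b) & MASK          # confusion: inject the input byte
        h = (h + (h << 10)) & MASK  # diffusion: spread bits upward
        h ^= h >> 6                 # diffusion: spread bits downward
    # final avalanche so the last bytes affect every output bit
    h = (h + (h << 3)) & MASK
    h ^= h >> 11
    h = (h + (h << 15)) & MASK
    return h

print(hex(joaat(b"a")))  # 0xca2e9442
```

Every step above is invertible on its own (adds mod 2^32 and xorshifts are bijections), which is exactly the point being made here.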


> even Hash-functions should be largely built out of invertible operations

Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

> I could explain why but that gets more complicated.

If you’d care to go into more detail, I’d love to hear it. It was my understanding that the AES S-box was the result of some nonlinear transformation of the input bytes.


> Why is that? I don’t understand why you should minimize the loss of entropy at any particular step. Is it to help resist collisions?

SHA-256 is built out of 64 rounds of an internal mixing function.

That is, SHA-256 is basically a glorified mix(mix(mix(...mix(internal_state)...))) function, with 64 mixes applied to the internal state per input block.

Now consider how "mix" should be designed.

1. If "mix" is invertible, it means that every possible input "pigeonholes" to exactly one possible output. Mathematically, we'd call invertible functions "one-to-one" and "onto".

2. If "mix" is not invertible, then the above condition does not hold.

-------------

So let's look at the simplest non-invertible function for 3-bit numbers as an example, and see what happens when we use it as a "mixing" function.

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 <----
    111   | 111 <----
Notice that 110 and 111 both "map" to 111. So what happens when we run mix(mix(mix(...(data))))?

After about 8 rounds, all your data turns into "111". You're "losing" information: it means that everyone can predict that your hash will eventually be 111.

Now let's consider the invertible function:

    Input | Output
    --------------
    000   | 001
    001   | 010
    010   | 011
    011   | 100
    100   | 101
    101   | 110
    110   | 111 
    111   | 000 <----
000 was "unchosen" in the previous function, and now the map is one-to-one and onto, i.e. invertible. As such, mix(mix(mix(...(data)))) will fully depend on the input data.

Of course, this mixing function is simply "(data+1) modulo 8", which is insufficient for mixing purposes. (The "confusion" is pretty bad and very predictable, but you can see how "add +1" is actually kind of good at diffusion when all the carries line up, such as 011 -> 100 changing several bits in one mix.) But you can imagine that even in this simplest example, the invertible function leads to a less predictable internal state than the non-invertible function.
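The collapse is easy to see numerically. Here is a tiny sketch (hypothetical demo code, not from the comment) iterating the two 3-bit tables above:

```python
def iterate(table, x, rounds):
    # apply the 3-bit "mix" lookup table repeatedly
    for _ in range(rounds):
        x = table[x]
    return x

non_invertible = [1, 2, 3, 4, 5, 6, 7, 7]  # 110 and 111 both map to 111
invertible     = [1, 2, 3, 4, 5, 6, 7, 0]  # (x + 1) mod 8, a permutation

# After 8 rounds, every possible start state has collapsed to 111:
print({iterate(non_invertible, x, 8) for x in range(8)})  # {7}

# The invertible map loses nothing: all 8 states survive
print({iterate(invertible, x, 8) for x in range(8)})      # {0, 1, ..., 7}
```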

---------

Functions should also be "nonlinear". The above x+1 mixing function is bad because it's linear (applying it 5 times simplifies to x+5). It turns out that combinations of XOR, bitshifts, and ADDs can be non-linear, but you need to run a bunch of tests over the inputs/outputs to be sure that your particular mixing function is sufficient.


What differentiates a non-crypto hash from a crypto hash? Is there a fundamental difference between the two which prevents one from being used for cryptographic purposes?


A non-crypto hash is run enough to confuse "simple" statistical tests.

A crypto-hash is run enough times to confuse "difficult" statistical tests from a dedicated adversary.

For example, the JOAAT hash is a simple add / shift / XOR hash run 1x round per byte. A singular round is all you need to confuse the "simple" statistical tests such as birthday attacks, dice rolls, etc. etc.

SHA-256 is also a simple add / shift / XOR hash, albeit using 256-bits at a time instead of 8-bits at a time. It is a 64-round hash function. That is: it is round(round(round(round(data)))) 64-times deep.

--------

If you wanted to turn the JOAAT non-crypto hash into a crypto hash, a good first step would be to run JOAAT(JOAAT(JOAAT(...))) maybe 64x or 128x over the data.

Obviously, you'd need to run cryptanalysis over the end result, and also test for non-linearity and other such properties. But "large number of rounds" is probably the most obvious difference between non-crypto and crypto hashes.

---------

In the 90s, functions like AES were designed with "maximum confusion / diffusion per round". In just 10 rounds, the bits are mixed up enough to defeat linear and differential cryptanalysis.

I don't know which cipher "changed the meta", but maybe it was the XTEA cipher? Since XTEA was so simple, they knew it would need many, many rounds to defeat linear and differential cryptanalysis. So they just did 64 rounds and called it a day.

----------

So you can see, a "crypto" hash is effectively a regular old hash done an absurd number of times to defeat the most incredible statistical-tests known to man... done under the assumption that the "opponent" is performing the most difficult statistical tests.


Yes! We’d like our cryptographic hash functions to be collision resistant and preimage resistant.

That is, we’d like it to be hard to generate 2 different messages m1 and m2 where the hash of m1 is equal to the hash of m2, and we’d also like it to be hard to compute any function of the message m (except the hash of m) if you’re given only the hash of m.

Non cryptographic hash functions don’t require these properties, and in fact some hashing algorithms used for data mining are designed to, for example, map near inputs to near outputs.


There's some good discussion in the comments of another "one of these" a short while ago: https://news.ycombinator.com/item?id=30244534


huh, I didn't post this here, but I'm the author and kinda fun to just see it show up. Thanks for the feedback, it's a good point. I'll be making those updates soon.


Question: why didn't you post it to hn?


Linear and differential cryptanalysis are the two classical ways of breaking cryptographic hash functions. I think they're a cool way to learn why the constructions in use today look the way they do:

https://alldifferences.net/difference-between-linear-and-dif...

A simple way to look at them is this: if you change some specific bits in the input, maybe not every output bit changes with exactly 50% probability, or the changes are not independent. For only a few rounds of S-boxes you can probably find something like this by hand; for many rounds, a SAT solver or a special tool is needed.
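As a rough illustration of that 50% criterion, here's a crude avalanche check against SHA-256 itself (a toy measurement, not real cryptanalysis): flip one input bit and count how many of the 256 output bits change. For a good hash the average should hover near 128.

```python
import hashlib
import random

random.seed(1)
flips = []
for _ in range(200):
    msg = bytearray(random.randbytes(16))
    h1 = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    msg[0] ^= 1  # flip a single input bit
    h2 = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    flips.append(bin(h1 ^ h2).count("1"))  # Hamming distance of the digests

print(sum(flips) / len(flips))  # averages close to 128 of 256 bits
```

Real linear/differential cryptanalysis looks for tiny, structured deviations from this ideal rather than the gross average, but the measurement above is the intuition.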


Can you help me understand how one might use a SAT solver to find ways in which cryptographic primitives (or their components) deviate from ideal pseudorandomness?

I know what all of these things are, my intuition just isn't jumping to a way to formulate statistical correlations as a SAT problem.


You're better off reading the Wikipedia page which covers things like that (for instance, the Merkle–Damgård construction) in some detail and without all the oversimplifications and almost unavoidable wrongness of slight write-ups like this.


On the contrary, I just want the dumbed down version. I’m not ever going to implement it (nobody should RYO crypto unless you really know what you’re doing), nor am I going to have to rely on the knowledge in my daily life - I’m not a maths educator, and I’m certainly not a cryptographer.

It’s still good to understand the basics.

I learnt a lot about PKI by forcing myself to learn the absolute basics of modulo, but I’ll never implement it or do a full calculation.


This isn't really a 'dumbed down version' and the person I'm replying to specifically asked about the sort of things not covered in this version.


Exactly. Also, SHA-256 is in the same family as MD5. It would be nice to go over how the family works in general, what weaknesses were discovered, and what changes were incorporated into SHA-256 to address those issues.


I’ve always been curious about whether it’s possible to have a collision when your input is 256 bits or less (in the case of sha256)… I even emailed Bruce Schneier. I got a polite response that he didn’t have time to look at it, which indicated to me he didn’t know “off the top of his head”, which I interpreted as “it’s not impossible”… but I still don’t know.


There are 2^257 - 1 inputs of 256 bits or less and only 2^256 possible hashes, so there must be a collision.
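The counting is easy to verify: messages of length 0 through 256 bits number 2^0 + 2^1 + ... + 2^256 = 2^257 - 1, against only 2^256 possible digests.

```python
inputs = sum(2**n for n in range(257))  # one count per message length 0..256 bits
outputs = 2**256

print(inputs == 2**257 - 1)  # geometric series: True
print(inputs > outputs)      # pigeonhole: a collision must exist
```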


The input is always padded out to 256 bits, though, isn't it?


The input is padded to form a sequence of 512 bit chunks, but that doesn't change anything, padding is one-to-one so there's still the same number of inputs.


> padding is one-to-one so there's still the same number of inputs

That part is debatable, if we imagine different padding schemes.

If the padding was just 0s, I would easily accept an argument that 111000 and 1110 are the same input giving the same hash.

You could also say you consider the extra '1' bit in SHA-256 as part of the payload, not truly 'padding' because it's mandatory, and make a similar argument, and it wouldn't be blatantly wrong.


> That part is debatable, if we imagine different padding schemes.

It isn't debatable, because SHA-256 (and all these schemes) defines its padding precisely because padding is necessary to defeat the problem you're about to talk about, and to do that the padding is in fact one-to-one:

> If the padding was just 0s, I would easily accept an argument that 111000 and 1110 are the same input giving the same hash.

As you might have guessed, this is not how it works.

SHA-256 appends that "extra 1 bit" you talk about, then zeroes until it is 64-bits short of a multiple of 512 bits, and then a 64-bit count.

So, as the earlier poster explained, there are 2^257 - 1 distinct hash inputs in abotsis' imagined set and thus there must be collisions by the pigeonhole principle. It really is that simple.
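That padding rule fits in a few lines of Python (a byte-aligned sketch; the real function is defined for arbitrary bit lengths):

```python
def sha256_pad(msg: bytes) -> bytes:
    bitlen = len(msg) * 8
    padded = msg + b"\x80"  # the mandatory "1" bit, here byte-aligned
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zeros to 8 bytes short of a block
    padded += bitlen.to_bytes(8, "big")  # 64-bit big-endian message length
    return padded

p = sha256_pad(b"abc")
print(len(p))  # 64: one full 512-bit block
```

Because the original length is recoverable from the last 8 bytes, no two distinct messages pad to the same block sequence: the padding is one-to-one.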


> it's necessary to have padding defeat the problem you're about to talk about

There is no way it's "necessary" to avoid that particular problem, yet at the same time unnecessary to avoid length extension attacks.

> As you might have guessed, this is not how it works.

That's a weirdly hostile way to respond to a clearly counterfactual thought experiment.

> and thus there must be collisions by the pigeon hole principle, it really is that simple

Except they didn't just say there must be collisions in SHA-256, by my reading they were making an argument that the method of appending padding and stuff doesn't matter.


> There is no way it's "necessary" to avoid that particular problem, yet at the same time unnecessary to avoid length extension attacks.

Length extension attacks work because of a common misuse of these algorithms. If you use them properly you aren't vulnerable to a length extension attack whereas you would be in trouble with the terrible "padding" scheme you propose as a counterfactual. Algorithms like SHA-512/256 and Keccak are prominent because they're misuse resistant.

Consider the guillotine used to cut paper to size. Used correctly it's not very dangerous, but we don't let kids have access to a guillotine because it lacks misuse resistance. In contrast devices like a Rotatrim are safe for kids because they are resistant to misuse. It turns out most programmers are kids and we should not have given them C++ std::sort() or SHA-256 or ECB mode encryption because they will cut themselves.

> by my reading they were making an argument that the method of appending padding and stuff doesn't matter.

If your "method of appending padding" turns 2^257 - 1 inputs into 2^256 outputs, that was not, in fact, "padding" in the sense anybody knowledgeable in this field would use. You're supposed to be able to remove the padding in cryptography; if some of the "padded" structures are indistinguishable from each other, you can't do that.


If you had no size based padding, you could fake a document having a few more zero bytes at the end. That's almost never dangerous, and it also requires very specific (mis)use to cause problems, just like a length extension. Unless you can name a killer use case, I stand by length extension problems being a significantly bigger deal and more "necessary" to stop.

> You're supposed to be able to remove the padding in cryptography

It's a hash. You can never access the post-padding version of the input in the first place.


> Unless you can name a killer use case, I stand by length extension problems being a significantly bigger deal and more "necessary" to stop.

"it works as a cryptographic hash" is certainly a "killer use case" for a cryptographic hash.

Your "Dylan16807 hash" without working padding does not work as a cryptographic hash, it's useless.


It would work just fine. And if you call those messages equivalent then there is no cryptographic failing. You'll need better than that argument.

Nobody complains that RSA gives you the same result for 45 and 0045 after all.

Better, think of it like a hash algorithm that takes bits vs one that takes bytes. They both work fine and are secure. If your byte happens to be a really wide byte, that's not a fatal flaw.


> And if you call those messages equivalent then there is no cryptographic failing. You'll need better than that argument.

Defining failure as success is very, very stupid. Nobody needs to "do better" than calling your position out as wrong.

> Nobody complains that RSA gives you the same result for 45 and 0045 after all.

RSA doesn't give you "the same result for 45 and 0045" unless what you mean by "0045" is "Actually just 45 except for some reason I write that with extra zeroes as part of my bad faith argument".

Hash algorithms that don't work aren't "just fine" they're useless.

The reason algorithms like SHA-256 are defined for bits isn't arbitrary - bits are literally the unit of information, this is the obvious and natural way to define the function, so choosing to define a function over "really wide bytes" doesn't make any sense.


> Nobody needs to "do better" than calling your position out as wrong.

It's not going to be convincing unless you can talk about a specific attack. And in particular, that attack should be more relevant to real-world use than a weakness like length extension.

Especially because you could look at this weakness as a sibling attack to length extension, one that's mildly easier to pull off but you can only append zeros. That seems safer overall to me. And it's not reasonable to completely excuse one flaw as needing "misuse" but not the other.

> The reason algorithms like SHA-256 are defined for bits isn't arbitrary - bits are literally the unit of information, this is the obvious and natural way to define the function, so choosing to define a function over "really wide bytes" doesn't make any sense.

A hash algorithm that takes an input in bytes is not a failure. If you think taking single bits is actually necessary, rather than just 'obvious and natural', then I really think you're not analyzing the security properly.


One way to think of hash functions is that the hash of any value is basically a random number. So we can consider what would happen if the results were actually random: what would be the probability of a collision?

Consider a function from [0, 2^256 - 1] to [0, 2^256 - 1]. That is, it maps 256-bit numbers to 256-bit numbers. It could in theory be represented as an enormous lookup table.

Now, how many such functions are there? Well, there are 2^256 ways to map 0, and then 2^256 ways to map 1, etc. We have 2^256 inputs to deal with, each of which can give one of 2^256 results. That comes to (2^256) ^ (2^256).

Now, how many collision-free functions are there? In this case there are 2^256 ways to map 0, but then we have to pick a different number for 1, so there are only 2^256 - 1 possibilities. Then 2^256 - 2, etc. It becomes (2^256)! where the exclamation mark is factorial.

So the probability of no collisions is (2^256)! / [(2^256) ^ (2^256)]. It may not be obvious, but that's a very small number. A little intuition: let's say we fill out the values of our gigantic table by hand. Assume that by some miracle we filled out the first half without creating any collisions. That means we've used up half the possible outputs, so every cell we fill out from now on has a 50/50 chance of creating a collision. And the further we get, the more numbers are used up.

Now sha256 is not a randomly chosen function. But without any evidence either way, we might suspect it does have collisions.
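A quick simulation (a hypothetical, deliberately small-scale demo) shows how certain collisions are for a random function: draw n outputs uniformly from n possibilities and check for repeats.

```python
import random

random.seed(0)
n = 2**16
outputs = [random.randrange(n) for _ in range(n)]  # one random output per input
distinct = len(set(outputs))

print(distinct < n)   # True: collisions occurred
print(distinct / n)   # roughly 0.632, i.e. about 1 - 1/e of the range gets hit
```

The expected fraction of the range that is ever hit is 1 - (1 - 1/n)^n, which approaches 1 - 1/e for large n, so a random function on a set this size always collides in practice.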


To give some further context, I stumbled across this thought while reading how “community ids” are calculated. Community ids are commonly used to simplify joining/lookups for network security tools (Suricata, Zeek). They essentially concatenate the “quad tuple” (src ip/port, dest ip/port) and a “seed”, then run SHA against it. I didn’t entirely understand why the authors chose SHA (other than being security people who might have just reached for a cryptographically secure hash function). SHA is slow vs something like xxHash, and given the number of sessions these things process, it seemed like overkill. Further, it’s unclear to me what’s gained by using SHA vs xxHash or simply concatenating the bits. Then I started wondering about the downsides: whether it’s possible to get a false correlation because two sessions yielded the same SHA digest.


What sort of collision? Or asking in a different way, what would you consider a collision?


2 messages producing the same digest. That’s actually a really interesting question.


I think he means a set-theoretical collision, not a feasible collision attack. I.e. "is SHA256 a bijection on 256-bit integers?"


AOL. It's like the articles that would explain how Unix software works by explaining in detail how to ./configure; make; make install and leave out everything specific to the software in question.

I can explain one small part, though. A lot of these things need some fixed constants that have to be not all-zero, not all-one, not easily predictable and not someone's backdoor. So people will say things like "we need five ten-digit numbers here, so we'll use the second to eleventh digits of the square roots of the first five primes".

That is, the use of the square root has nothing to do with any deep property of the square root of two; there's no deep math. The square root of two is just some number that you won't suspect of being a backdoor.

https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number
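This is in fact very close to what SHA-256 does: its eight initial hash values are the first 32 bits of the fractional parts of the square roots of the first eight primes (the 64 round constants use cube roots of the first 64 primes the same way). With integer math you can check this exactly:

```python
from math import isqrt

def frac_sqrt_32(p: int) -> int:
    # floor(sqrt(p) * 2**32) computed exactly; the low 32 bits are the
    # first 32 bits of the fractional part of sqrt(p)
    return isqrt(p << 64) & 0xFFFFFFFF

print(hex(frac_sqrt_32(2)))  # 0x6a09e667, SHA-256's initial value h0
print(hex(frac_sqrt_32(3)))  # 0xbb67ae85, SHA-256's initial value h1
```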


Came here to ask the same. I enjoyed the blog, yet as a non-cryptographer, I'd like to see the reasons behind the actions. E.g., okay, we right-rotate that input, but why are we doing this? Why are we taking cube roots of 64 primes? Etc.


> Why are we taking cube roots of 64 primes?

https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number


That was helpful. It'd be great if tips like these (even Wikipedia links like this are very helpful) were also present in posts like the blog post.


The Wikipedia article spells it out pretty clearly, and the pseudocode is easily translated into any language; here it is in JavaScript if anyone is interested:

sha-256: https://github.com/jeffallen6767/sha-256-js/blob/master/src/...

Once you understand that most of these algos are simply dividing the input into computer-friendly sized blocks and then stacking and manipulating those bits in 3D space like a Rubik's Cube, the whole thing becomes a bit easier to understand.

Here's a few more for comparison:

sha-1: https://github.com/jeffallen6767/sha-1-js/blob/master/src/sh...

md5: https://github.com/jeffallen6767/md5-js/blob/master/src/md5....

keccak: https://github.com/jeffallen6767/keccak-p-js/blob/master/src...

Side note: I also ended up implementing the Keccak algo in C for OpenCL usage, because I wanted to see if I could use it in parallel on my graphics card from Node.js (spoiler: it's indeed possible):

https://github.com/jeffallen6767/chain/blob/master/src/minin...

Please forgive my terrible programming style, this was done years ago...


> Append a single 1:

Just begs the question, why?


AFAIK the 1 bit does nothing. Padding with 1+(number of 0s)+(length of message) is traditional though and used by MD4/MD5/SHA1/SHA2.

You can read in a few places that this is the padding proposed by Merkle, but I looked it up, and in my reading of "One Way Hash Functions and DES" he does not use the 1:

> As an aside, we note that x is padded with 0's until its size is an integral multiple of the size of the conveniently sized chunks. Note that padding with 0s might introduce some ambiguity about the exact value of x that is being authenticated, as discussed by Jueneman[4]. The message "010110" padded to 8 bits would be "01011000" and it is now unclear how many trailing 0's were present in the original message. Several methods are available to resolve this ambiguity. We select a method that will also be useful in resolving a further problem: we append the length of x, in bits, at the end of x. To make this additional "length" field easy to find, we shall right justify it in the final block. If the length field won't fit, we add additional blocks to the end of x.


If your input ends with a 0 and you don't append the 1, you have no idea if that 0 is part of the padding or input
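A toy bit-string demo of that ambiguity (hypothetical helper names; the length field discussed elsewhere in the thread is deliberately left out):

```python
def pad_zeros_only(bits: str, block: int = 8) -> str:
    # naive padding: zeros to the next block boundary -- ambiguous
    return bits + "0" * (-len(bits) % block)

def pad_with_one(bits: str, block: int = 8) -> str:
    # append the mandatory "1" first, then zeros -- unambiguous
    bits += "1"
    return bits + "0" * (-len(bits) % block)

print(pad_zeros_only("010110") == pad_zeros_only("0101100"))  # True: a collision
print(pad_with_one("010110") == pad_with_one("0101100"))      # False: distinct
```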


Yes you do, since you still have the length field.


Good point, I never thought about it that way. The reverse is also true: The length is probably unnecessary, since the "1" is there. I suppose the length makes it harder to produce collisions with inputs of different lengths, but in practice I don't think anyone actually tries to do that.

If I had to take a wild guess, I'd guess that some early design in the family used only the "1", and that the length was added later?


Padding without the length isn't suffix-free, i.e. there are two different messages x and y with pad(x) a suffix of pad(y). You want suffix-freeness basically because if there's ever a collision part-way through the loop on two inputs, and there's a common suffix, the rest of the loop will be the same, so there's no chance to "escape" the collision.

Being some kind of artifact seems plausible.


> Padding without the length isn't suffix-free

Can you give a more concrete example? Specifically with padding with 1 then all 0s without length. Appending the 1 then all 0s is supposed to prevent collisions.

The length suffix was used as an early attempt to avoid length extension attacks on MACs of the form H(secret|M). However, we've since seen that it's not sufficient, as it's easy to determine the length of the secret by trial and error. This eventually led to the creation of HMAC: H((secret^opad) | H((secret^ipad) | M)).

In theory, the length suffix is no longer needed (or the "1" suffix but we save more space by removing the length). Maybe a cryptographer with more history knowledge can explain this but personally I think it's now one of those "don't fix what's not broken" things. It doesn't hurt security and it's already been thoroughly analyzed (and hardware optimized) so we just leave it.
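For completeness, the HMAC construction is available directly in Python's standard library (stock `hmac` module usage; key and message here are just placeholders):

```python
import hashlib
import hmac

key, msg = b"secret-key", b"message"

# HMAC-SHA256: H((key ^ opad) || H((key ^ ipad) || msg)),
# immune to the length extension that breaks H(key || msg)
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# verification should use a constant-time comparison
ok = hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).hexdigest())
print(ok)  # True
```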


Of it not being suffix-free? Just stick a full block on the front of the message.

Length padding isn't just for MACs; it's used in the first place to prove Merkle-Damgard works at all, i.e. is collision-preserving. If you have a hash collision with two messages of the same length, you can run through the Merkle-Damgard loop in parallel and find a collision in the compression function. For different lengths that doesn't work, but if you insert the length into the last block, you know you'll always have a collision in the final block in that case.

There's a proof here that being suffix-free is necessary and sufficient for a padding rule to make Merkle-Damgard collision-preserving: https://eprint.iacr.org/2009/325.pdf


> stick a full block on the front of the message

Yes, this will give you the same intermediate state in the hash function but if m1!=m2, then the final output H(pad'(x|m1)) != H(pad'(x|m2)) where pad' is just appending 1 then all 0s.

Maybe it's due to the years since my crypto class or maybe I'm just dense but I'm not following your logic at all. I'm asking for an actual example of m1 and m2 that will result in the same hash if they are not padded with their lengths.

I made the assertion that for all m1,m2 if m1!=m2 then H(pad'(m1))!=H(pad'(m2)). Disproving it is simply giving a counter example m1,m2.

EDIT: I took some time to skim through the paper you've linked and it supports my assertion:

> We also have shown that the simplest padding such as padding 10^d only can be sufficient for collision preserving property if we restrict collision resistant assumption of the underlying compression function for the first (t − 1) bits.

Since we know (or at least believe) the SHA2 compression function is collision resistant, then length padding is redundant.


Oh, we're talking about different things. I'm only talking about proving collision-preservation, not something as spectacular as generating collisions.



