finger exercises: SHA-256

In which I learn about generators, typed arrays, and bit operations by implementing SHA-256.

I’ve so far managed to get by with just putting everything in the top level because just executing a main function with some other top level functions has been sufficient. No need to complicate things beyond what is necessary. For this one I think I might need some state so gonna start with a class just in case

I’m gonna forego constructors and instances for as long as possible since a class with static methods and properties is as simple as it gets after just having top level methods. Will be dealing with bits and bytes so might as well fill-in some constants

That’s the minimum without looking at the spec. Looking at the spec now I need a few more things

The spec also defines some helper functions so won’t hurt to have those as well

Pretty basic stuff: rotations, shifts, xor, addition modulo word size. That takes care of the easy stuff. Now I need to start byte munging and converting everything to the right form. This stuff is best handled slowly and throughly. I will need something to take the sequence of bytes, pad them, and give back words which the spec says should be 32 bits. I don’t have such fine grained control over byte size declarations so will have to make do with int

The spec requires padding that is a mashup of 1 bit, 0 bits, and the length of the original byte sequence as the last 2 words. I’m dealing with multiples of 8 bits and the size of the block is defined in the spec as 512 bits. Expressing all of those constraints in bytes (because of the 8 bit constraint on input data) this is the formula for the number of bytes of 0s

I also need the number of bits in the original message because that is gonna end up being the last two words in the sequence

I want to minimize using buffers, accumulators, and conditionals. Since I know the size of the message I’m working with a simple calculation in terms of the size tells me how many blocks I can safely traverse without any issues before I need to worry about adding the padding bytes

That gives me all the blocks I can traverse without having to worry about the padding stuff but after that is exhausted I need to deal with the left overs and start padding

That gives me the left overs along with the 1 bit and the 0 padding. This is a bit too convoluted for my taste. I can’t tell what is going on by just glancing at it but I’m not sure if there is a better way to do this. Moving on to the last 2 words

That completes the generator for getting words from the original message expressed as bytes. I think it is possible to do all this in a streaming fashion without knowing the size of the message in bytes ahead of time. I’ll do that later to see if it is simpler or more complicated. Now I can assume I have a list of words that is a multiple of 512 bits so the rest of the implementation can use that invariant.

Now that I have the words lets group them into 16 words at a time (8 * 4 * 16 = 512 bits). I’m doing this because the stretching step takes 16 words and stretches them into 64 words so pre-processing ahead of time seems to make sense

I have enough now to create the message schedule

It’s pretty much a direct translation of what is in the spec. I believe this is the “stretching” portion of the hashing algorithm.

Finally putting all the pieces together gives me the hash digest

Finally a utility to convert the output into a hex string

Testing it on the empty string and comparing with some reference implementations things seem to check out. Now that I look at it I’m not sure if generators and typed arrays make the algorithm any simpler. I’ll revisit the issue later.