> [!info]- Article PGP Signature
>
> To verify this signature, use the public key on the [[Hello, World!|Home Page]] and follow the instructions linked there.
>
> ```
> -----BEGIN PGP SIGNATURE-----
> -----END PGP SIGNATURE-----
> ```

## Today's Topics: Understanding SHA-3 ParallelHash

##### August 26th, 2025, ca. 07:00 -0400

---

### Contributions

- Continued work on my [SHA-3 C3 branch](https://github.com/NotsoanoNimus/c3c/commit/ec74bfcefaeef1ea25eec6b98bbff42bc859ad3f).

### Hashing in Parallel: A Brief History

Before diving into this post's subject, I want to lay out a prologue discussing the history of hashing in parallel. I'm probably not going to get this entirely right, and it shouldn't be counted as any kind of authoritative source on the subject. I work with these tools often, but that doesn't mean I've encountered every usage of them "in the field", as it were.

With that out of the way, it's time to ask: how _can_ one parallelize a classical hashing algorithm?

##### Parallel SHA-256

You might first be tempted to say that it's not possible, particularly if you're familiar with how most cryptographic hashing algorithms operate: the output of the previous state is "chained" into the input of the next. That intuition is *correct*, as long as all you can use is the base hashing algorithm and nothing more.

However, if you're familiar with how the SHA-2 functions work internally, then you're also likely aware of the concept of a [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree), also known as a *Hash Tree*.

![[merkle_tree.png]]

Using this sort of construction, the input data to be hashed is first divided into chunks (numbered _L1_, _L2_, ... in the diagram), and the hash function `hash`, in this case SHA-256, is applied to each. This can be done in parallel. Each hashed block is then concatenated with its adjacent block hash (i.e., its neighbor node) to form the input for another hash. This process repeats until all nodes are summarized and a binary tree is formed, whose root node is used as the ultimate result. (A minimal code sketch of this construction follows the callout below.)

Here is the algorithm expressed in a more formulaic notation:

$ r = r(d_i) = H(H(...H(H(H(d_i)||h_1)||h_2)...||h_{k-1})||h_k) $

... where `r` is the `root hash`, the final node in the tree that represents its output value; `d_i` represents the `i`th input block; `h_x` represents a computed neighbor node hash; and `k` represents the depth of -- or how many layers comprise -- the tree. This formula follows the "Merkle Path" from data block `d_i` to the root `r`.

##### Applying to All Hashing Algorithms

This construct applies to all hashing algorithms, cryptographic or not, and can be used for various purposes. By way of Merkle Trees, the ingested data can reliably be summarized with a single hash, without relying on a single core to do all the work. This is because the initial `Lx` blocks from the input data, as well as whole sub-branches of the tree, can be computed independently before being combined.

There is *so much* more depth to explore with Merkle Trees, and it's a fascinating subject. I recommend watching a few lectures on the topic when you have the chance!

> [!faq] Using a Protocol
> This is just **one of many** ways that computer scientists have [very cleverly] achieved parallel hashing through some kind of added *protocol* -- that is to say, a typically non-standard but mutual agreement by all parties **on top of** the cryptographic hashing algorithm.
>
> One could argue that the hashing algorithms themselves are a protocol. **True**, but those are widely standardized and consist of a basic input-output relationship, without added parameters in most cases.
>
> For example, with Merkle Trees, any party wishing to generate the same deterministic hash using parallel processing ***must*** use the exact same Merkle Tree parameters -- e.g., the depth of the tree, what happens to any leftover or unevenly-sized input blocks, how and where parallel processing is accomplished, etc. If they don't, they will of course get a different result.
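To make the tree reduction concrete, here's a minimal, single-threaded sketch -- in plain C rather than C3, just to keep it dependency-light -- that computes a Merkle root over fixed-size chunks, assuming OpenSSL's one-shot `SHA256()` helper is available (link with `-lcrypto`). The chunk size, the cap on leaf count, and the "pair an odd node with itself" rule are all arbitrary illustration choices: exactly the kind of protocol parameters the callout above says everyone must agree on.

```c
#include <openssl/sha.h>   /* SHA256(), SHA256_DIGEST_LENGTH */
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 4    /* toy chunk size; real protocols use much larger blocks */
#define MAX_LEAVES 64   /* sketch-only cap on the number of leaf nodes */

/* Compute a Merkle root over `data`: hash fixed-size chunks (level 0),
 * then reduce pairs of nodes until a single root remains. */
static void merkle_root(const unsigned char *data, size_t len,
                        unsigned char out[SHA256_DIGEST_LENGTH])
{
    unsigned char nodes[MAX_LEAVES][SHA256_DIGEST_LENGTH];
    size_t count = 0;

    /* Level 0: hash each chunk independently. This loop is the
     * parallelizable part -- every iteration is self-contained. */
    for (size_t off = 0; off < len && count < MAX_LEAVES; off += CHUNK_SIZE, ++count) {
        size_t n = (len - off < CHUNK_SIZE) ? (len - off) : CHUNK_SIZE;
        SHA256(data + off, n, nodes[count]);
    }
    if (count == 0) {   /* empty input: a single leaf over the empty string */
        SHA256(data, 0, nodes[0]);
        count = 1;
    }

    /* Reduce pairwise until only the root node remains. */
    while (count > 1) {
        size_t next = 0;
        for (size_t i = 0; i < count; i += 2, ++next) {
            unsigned char pair[2 * SHA256_DIGEST_LENGTH];
            /* Concatenate the node with its neighbor; an odd trailing node
             * is paired with itself (just one of several conventions). */
            memcpy(pair, nodes[i], SHA256_DIGEST_LENGTH);
            memcpy(pair + SHA256_DIGEST_LENGTH,
                   nodes[i + 1 < count ? i + 1 : i],
                   SHA256_DIGEST_LENGTH);
            SHA256(pair, sizeof pair, nodes[next]);
        }
        count = next;
    }

    memcpy(out, nodes[0], SHA256_DIGEST_LENGTH);
}

int main(void)
{
    const unsigned char msg[] = "an input long enough to span several chunks";
    unsigned char root[SHA256_DIGEST_LENGTH];

    merkle_root(msg, sizeof msg - 1, root);

    for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
        printf("%02x", root[i]);
    putchar('\n');
    return 0;
}
```

The level-0 loop is what a real implementation would farm out to worker threads, since each leaf hash is completely independent; only the pairwise reduction imposes any ordering.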
---

### SHA-3 & ParallelHash

Enter the [NIST SHA-3 competition](https://csrc.nist.gov/projects/hash-functions/sha-3-project), held from November of 2007 through October of 2012. The [KECCAK](https://keccak.team/) "sponge" function was selected as the ultimate winner of the competition, based on many different criteria.

NIST later published a document specifying the SHA-3 Derived Functions ([NIST SP 800-185](https://nvlpubs.nist.gov/nistpubs/specialpublications/nist.sp.800-185.pdf)), and that is where we find _ParallelHash_ specified.

##### Defining It

As I've done previously for [[2025-08-19 - Dreams of SHA-3|TupleHash]] in another article, I'm going to let NIST succinctly define _ParallelHash_ for me:

> The purpose of *ParallelHash* is to support the efficient hashing of very long strings, by taking advantage of the parallelism available in modern processors. *ParallelHash* supports the 128- and 256-bit security strengths, and also provides variable-length output.
>
> Changing any input parameter to *ParallelHash*, even the requested output length, will result in unrelated output. Like the other functions defined in this document, *ParallelHash* also supports user-selected customization strings.

After a brief description of parameter names and meanings, they go on to add:

> *ParallelHash* divides the input bit string `X` into a sequence of contiguous, non-overlapping blocks, each of length `B` bytes, and then computes the hash value for each block separately. Finally, these hash values are combined and passed to cSHAKE along with the function name (`N`) of `"ParallelHash"` = `00001010 10000110 01001110 10000110 00110110 00110110 10100110 00110110 00010010 10000110 11001110 00010110`, the optional customization string `S`, and some encoded integer values.

Alright, so from this, we glean a few simple facts:

1. The resulting hash is going to depend on the selected Block Length `B`.
2. This is layered on top of cSHAKE just like most of the other derived functions, meaning the underlying cSHAKE call carries a Function Name `N` (fixed to `"ParallelHash"`) and a user-selected domain separation customization string `S`.
3. We can actually create a streamable hash context from this, by counting how many input blocks have been consumed. This is similar to quite a few of the other SHA-3 streaming constructs as well.

Without further ado, it's time to get to work. And get to work I did. :)

---

##### The Implementation

This needs plenty of cleanup, but take a look at [this permalink](https://github.com/NotsoanoNimus/c3c/blob/ec74bfcefaeef1ea25eec6b98bbff42bc859ad3f/lib/std/hash/sha3/parallel_hash.c3#L1), which freezes my _ParallelHash_ progress in time. There are certainly some fun gems in there (like `REMOVE ME REMOVE ME REMOVE ME...` comments and other blurbs...), but also plenty of meat to chew on.
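For reference, the construction that code is chasing is short enough to restate here. Paraphrasing the SP 800-185 definition of `ParallelHash128(X, B, L, S)` -- see the spec for the exact `left_encode`/`right_encode` byte-string encodings -- with `X_i` denoting the `i`th `B`-byte block of the input `X` and `L` the requested output length in bits:

$$
\begin{aligned}
n &= \lceil (\mathrm{len}(X)/8) / B \rceil \\
z &= \mathrm{left\_encode}(B) \,\|\, \mathrm{cSHAKE128}(X_0, 256, \texttt{""}, \texttt{""}) \,\|\, \dots \,\|\, \mathrm{cSHAKE128}(X_{n-1}, 256, \texttt{""}, \texttt{""}) \\
  &\qquad \|\, \mathrm{right\_encode}(n) \,\|\, \mathrm{right\_encode}(L) \\
\mathrm{ParallelHash128}(X, B, L, S) &= \mathrm{cSHAKE128}(z, L, \texttt{"ParallelHash"}, S)
\end{aligned}
$$

*ParallelHash256* is the same construction with cSHAKE256 and 512-bit intermediate digests. Each of the per-block cSHAKE leaf computations is independent, which is exactly where the parallelism -- and the streamable, block-counting context from fact 3 above -- comes from.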
---

### Closing Thoughts & Miscellany

Lest I forget: here's today's irrelevant report on my work for *MFTAH*, as I promised [[2025-08-22 - Directions, Directions|last Friday]]: I am reacquainting myself with the topic -- what my goals were, what needs to be polished, and where I can go from here.

Starting tomorrow, I will begin work on the [`getopt` argument parser](https://github.com/NotsoanoNimus/getopt.c3l) that I initially wrote about a month ago. It's annoying to see lingering, unfinished work lying about like this! After that is over, I need to take a **long** aside into Standard Library territory in order to get [_AES_](https://github.com/konimarti/aes.c3l) and _ChaCha20_ encryption supported by the language (good thing these modules are already written). Fun!

Anyway, _ParallelHash_ is certainly up there with the other interesting cryptographic hash functions. SHA-3 really changed the game all around. In environments where native threading is available and inputs are known to be large, ParallelHash could be _instrumental_ in reducing the time taken to produce valid, well-known hash results without bolting a "common protocol" on top of everything. The fact that this algorithm isn't supported in the standard libraries of many other languages is a bit confusing, since the actual implementation isn't much more difficult than that of its siblings. However, since languages incorporate threading differently, it's probably a controversial and opinionated topic for most.

> [!tip] My Usual Shout-out
> You know I have to mention...
>
> The great thing about [the C3 language](https://c3-lang.org/) is that its designer gives **a lot** of freedom to its Standard Library contributors when it comes to areas of expertise.
>
> There are a few contributors with great strengths in data structures and algorithms, some in networking, some with OS-specific skills, etc. It's great that in this space I can really exercise and grow into a subject-matter expert in this particular field.

Okay, that's enough for now. Onward to `getopt`!