17

SHA-1, SHA-2, and the standardized version of SHA-3 are all sequential. This is impractical for hashing very large files distributed across machines. Any sequential hash can be straightforwardly converted into an efficiently parallelized hash using Merkle trees, but then I lose the standardization, and this is undesirable if the hash is to be used for long term authenticated storage.

Question: Is there an officially standardized tree-based (parallelizable) hash?

The most important property of the tree-based hash for my purposes is that for any partition of the input string, each portion can be reduced down to $O(\log n)$ intermediate space in parallel such that the intermediate values can be combined into the final hash, with at most a constant factor overhead over a sequential hash.

otus
  • 31,744
  • 5
  • 65
  • 159
Geoffrey Irving
  • 394
  • 1
  • 10
  • 1
    [blake2b](https://blake2.net) is probably along the lines of what you're looking for. – Stephen Touset Nov 09 '15 at 05:47
  • 3
    I think NIST is working on something as part of the SHA-3 process, but they do not have a draft yet – Richie Frame Nov 09 '15 at 06:02
  • 2
    http://csrc.nist.gov/groups/ST/hash/sha-3/Aug2014/documents/kelsey_sha3_2014_panel.pdf – otus Nov 09 '15 at 09:05
  • 2
    Tiger Tree Hash is probably the closest thing to a standard tree hash. – CodesInChaos Nov 09 '15 at 12:39
  • The Skein paper also describes the use of a hash tree, but I've forgotten if it specifies specific/default node sizes etc. – Maarten Bodewes Nov 09 '15 at 14:32
  • Yeah, I think most of the SHA-3 proposals discuss trees, but I'm specifically looking for a standard version. – Geoffrey Irving Nov 09 '15 at 14:35
  • 1
    @GeoffreyIrving I think tree hashing was included in some (/many?) of candidate designs for SHA-3 *because* there is no standard for tree hashing. It would not be required to include the tree hashing into the algorithm if there was. – Maarten Bodewes Nov 09 '15 at 22:51
  • Related: ["Optimal Parameter Selection for Efficient Memory Integrity Verification Using Merkle Hash Trees"](http://www.cs.cornell.edu/people/egs/papers/hashtree.pdf). – David Cary Feb 12 '16 at 17:09

3 Answers3

7

With SHA-3 Derived Functions (SP 800-185, pdf) there is now a standardized parallel hash based on SHA-3, called ParallelHash, appropriately.

However, it is not a tree hash, but more of a hash-list-based mode. The string to be hashed is divided into equal-sized blocks, which are hashed, concatenated and then hashed again.

While it is not a tree hash it should cover the use case of hashing very large files across many machines. The arbitrarily small blocking size allows you to get to your $O(\log n)$ requirement if you base it on an appropriate function of the input size.

otus
  • 31,744
  • 5
  • 65
  • 159
6

The BLAKE3 hash function was just announced today. Internally, it's a Merkle tree.

Jack O'Connor
  • 627
  • 6
  • 12
3

I'm working on a tree hash based on BLAKE2 that's absolutely not in any way official, but it's exactly the sort of design you described. Maybe this would be useful once it's been properly reviewed and stabilized: https://github.com/oconnor663/bao

One of the features of Bao that might be nice for authenticated storage, is that it can verify any part of a file against the root hash, without needing to retrieve the entire file first. So for example, if the authenticated file is a big video, and you want to stream it and seek around it on a mobile phone, you can do that.

Edit: With the release of BLAKE3, the latest version of Bao is no longer its own hash function. Instead, it's based on BLAKE3, implementing the verified streaming part of the spec.

Jack O'Connor
  • 627
  • 6
  • 12