Random(Notes): distributed storage

Reverse hash tree (RHT) is a tree data structure, where nodes contain a hash (ie not value only). I call this tree a "reverse" because the way node's hash value is computed is reversed to the Merkle tree way. In Merkle tree, parent nodes contain a hash of the child nodes. This allows a quick way to find the difference between 2 parent nodes in 2 trees - they would just have different hashes. The next step would be to find the child node that is actually differs.

In RHT, not a parent node hash is build based on child nodes hash, but in opposite: child node hash is build based on its own hash/value and parent node hash. For leaf nodes, value could be something useful, for intermediate nodes it could be just some version or category etc.

When child node it accessed, its hash is validated based on current value/hash and parent node hash. If the value is valid, we keep validating parent node hash with its parent node hash etc. This actually could be done on the way from root node to the child node. If found any node with inconsistent hash, this means the node and its children require a hash/data recalculation.

Notes based on the paper about Amazon Dynamo. There is however a difference between Dynamo and AWS DynamoDB (at least based on name), however this is not discussed in the document.

Dynamo is designed to be an eventual consistent storage. At the same moment it allows clients to specify number of hosts in the read (R) and write (W) quorums. The number of hosts in the cluster N is also configurable. The good rule of thumb is to keep N >= 3, and R + W > N. In this case, only the correct data would be returned.

Many systems use synchronous data replication to keep consistency strong (e.g. MySQL). This causes issues with availability, b/c if the replica failed, it becomes unavailable to its clients. Only single master host could accept writes in such systems. In opossite to this, Dynamo uses asynchronous data replication algorithms. This helps to build partition tolerated systems, as Dynamo would accept reads and writes even during network partition, and resolves the update conflicts using different algorithms.

Dynamo is a P2P system that uses gossip based protocol to share the membership and failure detection information with peers. The membership information contains the list of key ranges that the node is responsible for. Also nodes use gossip protocol to discover itself to other hosts. Basically, when a new dynamo host starts, it knows already a list of seed hosts, which it connects to tell them about own availability. Those seed hosts help to deliver the information via gossiping to other peers.

Dynamo uses consistent hashing for partitioning. This allows adding new nodes to the hash space without a need to re-hash existing values. Instead, only some part of values is moved to the new node, after what it becomes available for reads and writes.

Random(Notes)

Reverse Hash Tree

Amazon Dynamo paper notes