So yes, I haven’t blogged in a long time and have been lazy about my schedule. It’s partly due to my job and partly due to experiments I have been running with various techniques and algorithms. Some of my recent explorations include the Go language, the Raft consensus algorithm (my favorite so far), and twemproxy. If you know all three of them, you know why I am mentioning them together: yes, I am building my own elastic key-value store. I plan to use Raft for node state consensus (etcd 2.0 currently seems to fit my requirements) and then use twemproxy on top to shard out my data. The only missing piece left would be re-balancing the data once a node joins the cluster. The project may or may not see the light of day, but I may use it for my hobby projects anyway.
While working out how a data node joins the cluster and some process starts re-balancing it, one of the problems I wanted to tackle was reducing the number of key-value moves (migrating key-value pairs to their correct nodes). A move involves migrating a key-value pair to the correct shard and propagating the change onward in the consistent hash ring (see Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web (1997) by David Karger et al., a technique used by systems like Amazon Dynamo, Riak, etc.).
The most common way to distribute a set of keys over a cluster is the mod operator. On a typical day of distributed-systems engineering, if you are asked to find the correct shard for a given key, a typical approach is to digest the key with some hashing algorithm (MD5, CRC32, or Murmur, just to name a few) to get a number. Let’s call this number KeyDigest; to find the correct landing spot in your cluster you can simply compute KeyDigest mod NumberOfShards (KeyDigest % NumberOfShards in C). This basic technique can work wonders; used correctly, it lets you store huge amounts of data.
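A minimal sketch of this approach in Go, using CRC32 as the digest (the helper name shardFor is my own, just for illustration):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// shardFor digests the key and maps the digest onto one of
// numShards shards using the mod operator.
func shardFor(key string, numShards uint32) uint32 {
	keyDigest := crc32.ChecksumIEEE([]byte(key))
	return keyDigest % numShards
}

func main() {
	for _, key := range []string{"user:1001", "user:1002", "user:1003"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, 10))
	}
}
```

Any digest function works here as long as it spreads keys uniformly; the mod simply folds the digest into the shard range.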
The technique described above works pretty well until you have to make your cluster elastic, so that you can add or remove nodes. Yes, there is a typical way to re-balance such a system, but until recently I hadn’t thought about its efficiency. While searching for techniques that can reduce the number of key-value moves between my shards, I came across a pretty good one called jump consistent hashing, described in the paper here. Given N shards, it chooses a shard for KeyDigest so that (1) about the same number of keys map to each shard and (2) the mapping from key to shard is perturbed as little as possible when N changes. Thus the assignment of a key to a shard is less likely to change, and you have to move less data around when the shard configuration changes. You should read the paper; it contains all the details you may need on how it works and how efficient it is.
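The paper gives the whole algorithm in a handful of lines of C++; here is a direct Go translation to make the discussion concrete:

```go
package main

import "fmt"

// JumpConsistentHash maps a 64-bit key digest to a bucket in
// [0, numBuckets). It is a direct translation of the C++ code
// published in the jump consistent hashing paper: the key seeds
// a cheap linear congruential generator, and b tracks the last
// "jump" destination that is still below numBuckets.
func JumpConsistentHash(key uint64, numBuckets int32) int32 {
	var b int64 = -1
	var j int64 = 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(1<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

func main() {
	// Growing the cluster from 10 to 11 buckets: each key either
	// keeps its bucket or jumps to the new bucket 10, never to
	// another old bucket. That is where the minimal-movement
	// property comes from.
	for key := uint64(0); key < 5; key++ {
		fmt.Printf("key %d: %d -> %d\n", key,
			JumpConsistentHash(key, 10), JumpConsistentHash(key, 11))
	}
}
```

Note that it takes no memory proportional to the number of shards: no ring, no lookup table, just the loop above.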
Excited by the claim, I wrote a piece of Go code, available here, to simulate a shard configuration change with 64K consecutive keys and count how many keys would have to move using the mod technique (let’s call it ModConsistentHash) vs. the technique described in the paper (JumpConsistentHash). I tried various numbers, and using the gist I shared you can try them as well. Here is an example of the difference in the number of moves you may have to do:
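The full simulation is in the gist, but the core of the comparison is small enough to sketch inline (this is my own compact version, not the gist itself; a key "moves" whenever its shard assignment differs before and after the cluster grows):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// jumpHash is the algorithm from the jump consistent hashing paper.
func jumpHash(key uint64, numBuckets int32) int32 {
	var b int64 = -1
	var j int64 = 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(1<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}

// countMoves grows the cluster from oldShards to newShards and counts,
// for numKeys consecutive keys, how many keys change shard under the
// mod technique vs. jump consistent hashing.
func countMoves(numKeys int, oldShards, newShards uint32) (modMoves, jumpMoves int) {
	for i := 0; i < numKeys; i++ {
		digest := uint64(crc32.ChecksumIEEE([]byte(fmt.Sprintf("key:%d", i))))
		if digest%uint64(oldShards) != digest%uint64(newShards) {
			modMoves++
		}
		if jumpHash(digest, int32(oldShards)) != jumpHash(digest, int32(newShards)) {
			jumpMoves++
		}
	}
	return
}

func main() {
	mod, jump := countMoves(1<<16, 10, 11) // 64K consecutive keys, 10 -> 11 shards
	fmt.Printf("mod moves: %d, jump moves: %d (of %d keys)\n", mod, jump, 1<<16)
}
```

With mod, almost every key whose digest lands differently under the two divisors has to move; with jump hash, only roughly 1/newShards of the keys move, which is the minimum possible.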
I start off with 10 shards and keep increasing the number of nodes added to the cluster as we go right (up to 52). The difference is obvious. Some of you may wonder: the line for JumpConsistentHash is constantly increasing, so at some point the two must meet. You are right, but if I start off with a larger cluster, say 32 shards, and go all the way to 64, look what happens:
The gap between the two clearly increases, and this pattern can be verified by fixing the number of nodes joining the cluster (changing only the initial cluster size), as shown in the graph below:
The graphs above clearly show that the number of moves required decreases as the cluster size grows while the number of nodes added to the cluster stays fixed (2 in my case). Jump consistent hashing also has a huge memory and speed advantage over the consistent hashing technique by Karger et al.; the paper goes into further detail on just how efficient it really is.
I am just mesmerized by the results and looking forward to tools like twemproxy incorporating this technique. I plan to modify twemproxy myself and verify the results. In the meantime, you can have some fun with jump consistent hashing too.