Sharding Redis #2. Rebalancing Your Cluster.

The fact is that when it comes to sharding – Redis is not the best tool you could have. Although redis cluster was partly implemented on an unstable branch a long time ago, apparently other prioritized stuff keeps antirez from finishing it.

So if you are sitting and wondering what is going to happen when you can’t fit all this data into a single redis server there are some, almost not documented, workarounds. I wrote briefly about it already here, but let’s take another look.

Ruby redis client has a Distributed module. It is simple.

1
2
3
4
require "redis"
require "redis/distributed"

redis =  Redis::Distributed.new ["redis://host_1.com:6379", "redis://host_2.com:6379"]

Now you can use redis in a usual way.

1
2
3
4
5
redis.set("key1", "value1")
redis.set("key2", "value2")

redis.get("key1") # => value1
redis.get("key2") # => value2

What’s different here from using a regular client is that Redis::Distributed creates a hash ring out of redis nodes that were passed in the beginning. It uses crc32 to calculate hashes for each node based of redis url of the node and its number. When performing an operation on a key it calculates a hash for it and maps it to an appropriate redis node. It works in a way so that keys are going to be mapped almost evenly across nodes. So for a cluster of two nodes, if you create 100 lists, aproximately 50 will be mapped to the first node and another 50 to the second node.

To know what node is mapped to a given key simply use a node_for method.

1
redis.node_for("key1")

Usually there’s no need to do that, because the whole process is transparent for developer. Until you need to add another node.

Okay, let’s add another node

Let’s say your two-node redis cluster is out of capacity and you need to add a 3rd node.

1
redis.add_node("redis://host_3.com:6379")

Simple enough, but what happened to the keys that were stored before adding the 3rd node? They remain at their old places. But because hashing depends on the number of nodes in the cluster, hash for some of the keys will be changed and they will be mapped to different nodes. So that when you try to get a key1 this can go to a different node and you won’t get a value that was stored before. Not fun if you care for the old values.

As an attempt to solve this problem I wrote redis-migrator. Redis-migrator takes a list of nodes for your old cluster and list of nodes for your new cluster and determines for which keys routes were changed. Then it moves those keys to the new nodes.

So it solves the previous problem like this:

1
2
3
4
5
6
7
8
9
10
require 'redis_migrator'

# a list of redis-urls for an old cluster
old_redis_hosts = ["redis://host1.com:6379", "redis://host2.com:6379"]

# a list of redis-urls for a new cluster
new_redis_hosts = ["redis://host1.com:6379", "redis://host2.com:6379", "redis://host3.com:6379"]

migrator = Redis::Migrator.new(old_redis_hosts, new_redis_hosts)
migrator.run

To make a migration process faster instead of doing sequentual writes I used redis pipeline which allows to send bunch of operations without delay and then gather responses afterwards. Checkout migrator_benchmark.rb to perform benchmark testing.

Although this tool still lacks some good error handling I believe it can give a good start.

Comments