Skip to content

Message Routing and Microsharding

Jim Carroll edited this page Mar 29, 2013 · 1 revision

#Message Routing and Microsharding

As described in the Simple Example, Dempsy associates an each unique @MessageKey value with an individual message processor instance. The main job of Dempsy then is to find the correct message processor to deliver each message to, among a cluster of machines. By default, Dempsy does this through the concept of microsharding.

Conceptually microsharding is elegant and provides several advantages over other solutions.

  • It applies a constant amount of computational resources to the message routing problem, independent of the number of potential or actual message processors (the message key space size) or compute nodes. This is as opposed to tracking the location of each individual message processor.
  • It provides a uniform distribution of message processors across compute nodes. This is as opposed to algorithms like the "distributed hash table."
  • It allows for "stickiness" of message processor node locations when the cluster topology changes. This is as opposed to stateless algorithms like a simple mod operation on the key's hash code over the number of nodes. In this case existing message processors would be reshuffled among the nodes every time a new node entered or left the cluster.

Microsharding is a technique where the key-space for all potential message processors is broken up among a fixed number of shards. Shards are assigned uniquely to the currently available compute nodes. The number of shards M is >> the number of nodes N. A shard is uniquely assigned to an individual node while a node will have many shards. The relationship between which nodes own which shards is managed by ZooKeeper.

Microsharding
Microsharding

In order to locate the correct node that a message processor is on, Dempsy will:

  1. Get the @MessageKey from the message instance.
  2. Get the hashCode of the @MessageKey
  3. Find the shard associated to the message processor to which the message is addressed using shard = message key's hash code % M
  4. Route the message to the node that owns the computed shard.

Using microsharding it's easy to manage changes to the cluster of nodes. Suppose a new node enters the cluster. An imbalance is introduced where the shards are no longer evenly distributed among the nodes. In this case, using ZooKeeper as the intermediary, the new node will negotiate for shards until the number of shards is evenly distributed again.

As a concrete example, let's say there's 5 nodes and 100 shards. That means each node has 20 shards. If a 6th node enters the cluster then the most any node should have would be 17 and the least any should have would be 16. The new node has zero and must acquire at least 16. The other nodes all have 20 and need to give up at least 4. In Dempsy's default RoutingStrategy this is done in a completely decentralized manner using ZooKeeper.

Notice that when the cluster changes, the fewest number of message processors actually migrate and most of them never move.

If one of the nodes then leaves the cluster we're back down to 5. When this happens the remaining nodes negotiate through ZooKeeper for to acquire the additional shards. None of the message processors from the nodes that stayed in the cluster needed to move.

Clone this wiki locally