Leader Election in Distributed System

One of the important problems of distributed computing is electing a leader process. Leader process is needed for multiple reasons: it could be used to make a final decision, it could be used to run any task that has to be run only by single process at a time (like finding a next task to execute and spreading execution between other hosts, acquiring a lock etc.), it may contain authoritative data and provide them to other nodes etc.

There are multiple leader election algorithms. Almost always, for most of the tasks it's possible to find the solution where there is no need of single leader:
  • for task scheduling the list of tasks, tasks could be partitioned and with locking spread over multiple hosts
  • for making final decision, it's possible to come up with some consensus (like paxos algorithm)
  • distributed locking could help other systems to avoid having a leader (again it might be based on some consensus algorithm like paxos or raft)

Leader plays the same role in distributed system as mutexes in concurrent programming: it allows to ensure that "critical section" is executed by only one node.

Not always leader plays a role a master, thus not always systems that have a leader are master-slave based. Pretty often, leader is a regular node, that does all the job as other peer nodes, but also has some authoritative between other nodes.

Most distributed systems could be easily separated into 3 types:
  • the one that uses leader/master node to synchronize or make decision
  • the one that does not need any synchronization or avoids single leader, and mostly based on DHT (many P2P systems)
  • mixed systems that use both techniques, and apply them for different jobs.
The systems that use mix solution are very interesting in the way they try to be scalable and failure tolerant. The distributed systems could be split on set of clusters/replica-sets, each has own leader.

The goal of leader election algorithm is to make sure that exactly one process thinks that it's a leader and that all other processes knows that it is a leader.

Time complexity: the number of the rounds needed until the leader is elected and others know who is new leader.

Communication complexity: the number of non-null messages that are sent during the execution.

For most of the algorithms, the best way to implement it is to have a special state machine for leader election. This machine should be in the initial state before the election starts. State machine helps to make sure that all processes are doing the same actions for the same input values. Also, many algorithm assume that each node that participates in election has unique identifier.

It's hard if possible to elect a leader in anonymous ring network, because all hosts receives the same messages and as result their state machine is in the same states. Thus, there is a need to assign identifiers to the ring network participants.

There are multiple algorithms for leader election:
  • Bully algorithm
  • LeLann, Change and Roberts algorithm
  • Hirschberg-Sinclair (HS) algorithm
  • FloodMax algorithm
  • Paxos based election
  • many others

In next few notes I'm going to describe a bit each algorithm.