Consensus in Distributed Databases




1. Introduction

Consensus is the backbone of fault-tolerant distributed systems, ensuring that nodes agree on a single outcome even in the face of crashes, delays, or network partitions. Without consensus, systems risk split-brain scenarios, inconsistent reads, and conflicting writes. Algorithms like Paxos, Raft, and Zab power consensus in real-world infrastructure such as ZooKeeper, etcd, Consul, Spanner, and CockroachDB. By doing so, they provide the foundation not only for strong consistency models like linearizability but also for the coordination primitives that keep large clusters safe and reliable.



Properties of a Consensus Algorithm

From a theoretical perspective, a consensus algorithm must satisfy three properties (illustrated by the sketch after this list):

  • Agreement – All correct processes decide on the same value.

  • Validity – The decided value was proposed by one of the processes.

  • Termination – Every correct process eventually reaches a decision.
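
As a rough illustration (using hypothetical types, not any real library), the Go sketch below checks Agreement and Validity over a snapshot of decided values from a toy cluster. Termination is a liveness property, so it cannot be verified from a single snapshot like this.

```go
package main

import "fmt"

// checkAgreementAndValidity is an illustrative helper: given the value each
// correct process decided and the set of values that were proposed, it checks
// the Agreement and Validity properties over one snapshot of the cluster.
func checkAgreementAndValidity(decided map[string]string, proposed map[string]bool) error {
	var first string
	seen := false
	for node, v := range decided {
		if !proposed[v] {
			return fmt.Errorf("validity violated: %s decided %q, which nobody proposed", node, v)
		}
		if seen && v != first {
			return fmt.Errorf("agreement violated: %q vs %q", first, v)
		}
		first, seen = v, true
	}
	return nil
}

func main() {
	decided := map[string]string{"n1": "leader=nodeA", "n2": "leader=nodeA", "n3": "leader=nodeA"}
	proposed := map[string]bool{"leader=nodeA": true, "leader=nodeB": true}
	fmt.Println(checkAgreementAndValidity(decided, proposed)) // <nil>: both properties hold
}
```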



Applications of Consensus

Some of the most important applications of consensus include:

  • Leader election – ensures exactly one primary node at a time, preventing conflicts and enabling strong consistency models like linearizability.

  • Replicated logs / state machine replication – guarantees a single, agreed-upon order of operations across replicas, forming the backbone of reliable database replication.

  • Locks and leases – allows only one process to safely acquire a critical resource, even under failures or network partitions.

  • Uniqueness constraints – prevents multiple clients from successfully creating conflicting records (e.g., the same username or primary key).

  • Atomic transaction commit – ensures that all participants in a distributed transaction either commit or abort together, avoiding partial results.

  • Fencing tokens and monotonic sequencing – provides strictly increasing tokens (from log indices or revisions) to protect against “zombie” clients acting on stale leases (see the sketch after this list).

  • Shard and workload assignment – coordinates which nodes own which shards or tasks, and automatically reassigns them when nodes join, leave, or fail.

  • Globally ordered ID generation – provides unique, monotonic IDs that can serve as sequence numbers, event ordering markers, or tickets.
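
To make the fencing-token idea concrete, here is a hypothetical Go sketch of a store that rejects writes carrying a token older than the highest one it has seen. The type and method names are invented for illustration; in a real system the token would come from the consensus layer (for example a log index or lease revision).

```go
package main

import (
	"fmt"
	"sync"
)

// fencedStore is a hypothetical resource that accepts a write only when it
// carries a fencing token at least as large as the highest token seen so far.
type fencedStore struct {
	mu           sync.Mutex
	highestToken uint64
	value        string
}

func (s *fencedStore) Write(token uint64, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if token < s.highestToken {
		// A "zombie" client acting on an expired lease presents an old token.
		return fmt.Errorf("rejected: token %d is older than %d", token, s.highestToken)
	}
	s.highestToken = token
	s.value = value
	return nil
}

func main() {
	s := &fencedStore{}
	fmt.Println(s.Write(33, "from current leaseholder")) // <nil>: accepted
	fmt.Println(s.Write(32, "from zombie client"))       // rejected: stale token
}
```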




2. Shared Logs: Consensus in Practice

A shared log is an abstraction where many nodes or clients can propose entries, and the system ensures every replica sees the same append-only sequence in the same order. In practice, a single leader (at a time) appends entries; consensus decides the order and guarantees all replicas deliver the same history.

A shared log is also formally known as a total order broadcast, atomic broadcast, or total order multicast protocol. These terms describe the same idea in different words: requesting a value to be added to the log is called broadcasting, and reading a log entry is called delivering.
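
In Go terms, the broadcast/deliver interface might look something like the following sketch; the names are illustrative rather than taken from any real library.

```go
package sharedlog

// Entry is one immutable record in the shared log.
type Entry struct {
	Index   uint64 // position assigned by consensus; strictly increasing
	Payload []byte
}

// SharedLog is an illustrative interface for total order broadcast:
// Broadcast proposes a payload for the log, and Deliver yields committed
// entries to the caller in exactly the same order on every replica.
type SharedLog interface {
	Broadcast(payload []byte) error
	Deliver() <-chan Entry
}
```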



Properties of a Shared Log

  • Eventual append – if a node proposes an entry and does not crash, it will eventually see that entry appear in the log.

  • Reliable delivery – once any correct node sees an entry, every correct node will eventually see it.

  • Append-only – entries are immutable; new entries can only be added after existing ones.

  • Agreement (total order) – any two nodes that have read the same entry must have read exactly the same sequence of entries before it, in the same order.

  • Validity – every log entry came from a valid proposal (no fabricated entries).

Most practical consensus protocols (Raft, Multi-Paxos, Zab) expose exactly this log interface.
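
To see why this log interface is so useful, consider a toy key-value replica that applies delivered entries strictly in log order: any two replicas fed the same log end up in exactly the same state. This is a simplified sketch, not production code.

```go
package main

import (
	"fmt"
	"strings"
)

// replica is a toy state machine: a key-value map built purely by applying
// log entries in order. Two replicas that apply the same log prefix are
// guaranteed to hold the same state.
type replica struct {
	applied uint64
	kv      map[string]string
}

func newReplica() *replica { return &replica{kv: make(map[string]string)} }

// apply consumes one entry of the form "key=value" at the next log index.
func (r *replica) apply(index uint64, entry string) {
	if index != r.applied+1 {
		panic("entries must be applied in log order, without gaps")
	}
	parts := strings.SplitN(entry, "=", 2)
	r.kv[parts[0]] = parts[1]
	r.applied = index
}

func main() {
	log := []string{"x=1", "y=2", "x=3"} // the agreed-upon order
	a, b := newReplica(), newReplica()
	for i, e := range log {
		a.apply(uint64(i+1), e)
		b.apply(uint64(i+1), e)
	}
	fmt.Println(a.kv, b.kv) // identical: map[x:3 y:2] map[x:3 y:2]
}
```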




3. Consensus Algorithms (Implementations)

While consensus can be described abstractly, real systems rely on specific algorithms to make it work. The three most influential and widely used are Paxos, Raft, and Zab. Each solves the consensus problem, but with different trade-offs in understandability, performance, and adoption.



Paxos

  • Developed by Leslie Lamport; the first practical consensus algorithm.
  • Works in two phases of voting: first, a proposer gets a majority of acceptors to promise a proposal number (prepare/promise); second, it asks them to accept the actual value (accept/accepted), as sketched after this list.
  • Guarantees safety even in the face of failures, but the logic is subtle and the algorithm is notoriously hard to implement correctly.
  • Multi-Paxos extends the protocol to agree on a sequence of values, effectively forming a shared log.
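
The two phases are easiest to see from the acceptor's side. The sketch below is a heavily simplified single-decree Paxos acceptor, written only to illustrate the prepare/promise and accept exchange; it omits proposers, learners, persistence, and failure handling.

```go
package main

import "fmt"

// acceptor is a minimal single-decree Paxos acceptor, for illustration only.
// Phase 1: a proposer sends Prepare(n); the acceptor promises not to honor
// proposals numbered below n and reports anything it has already accepted.
// Phase 2: the proposer sends Accept(n, v); the acceptor accepts unless it
// has promised a higher proposal number in the meantime.
type acceptor struct {
	promised  uint64 // highest proposal number promised
	acceptedN uint64
	acceptedV string
}

func (a *acceptor) Prepare(n uint64) (ok bool, prevN uint64, prevV string) {
	if n <= a.promised {
		return false, 0, ""
	}
	a.promised = n
	return true, a.acceptedN, a.acceptedV
}

func (a *acceptor) Accept(n uint64, v string) bool {
	if n < a.promised {
		return false
	}
	a.promised, a.acceptedN, a.acceptedV = n, n, v
	return true
}

func main() {
	acc := &acceptor{}
	ok, _, _ := acc.Prepare(1)
	fmt.Println(ok, acc.Accept(1, "value-A")) // true true
	fmt.Println(acc.Accept(0, "value-B"))     // false: superseded by a higher promise
}
```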



Raft

  • Designed with understandability in mind, making it simpler than Paxos.
  • Elects a leader through a voting process with term numbers.
  • The leader appends client requests to its log, replicates them to followers, and considers entries committed once a majority acknowledges them (see the commit-rule sketch after this list).
  • Ensures that a newly elected leader has an up-to-date log before continuing, simplifying failover.
  • Widely used in systems like etcd and Consul.
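
As a rough sketch of the commit rule (not the actual etcd or Consul implementation), the leader can compute the highest log index replicated on a majority from its record of follower progress:

```go
package main

import (
	"fmt"
	"sort"
)

// advanceCommitIndex sketches the Raft leader's commit rule: an entry is
// committed once it is stored on a majority of servers (leader included).
// matchIndex[i] is the highest log index known to be replicated on server i.
// (The real rule additionally requires the entry to be from the leader's
// current term before it may be committed.)
func advanceCommitIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	// The value at position len/2 (0-based, descending order) is replicated
	// on a majority of the cluster.
	return sorted[len(sorted)/2]
}

func main() {
	// 5-server cluster: the leader is at index 7, followers at 7, 6, 4, 3.
	fmt.Println(advanceCommitIndex([]uint64{7, 7, 6, 4, 3})) // 6
}
```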



Zab

  • Stands for ZooKeeper Atomic Broadcast.
  • Specifically designed to support ZooKeeper, a coordination service.
  • Uses a leader-based approach where the leader proposes updates, and followers acknowledge before the update is committed (sketched after this list).
  • Emphasizes total order broadcast, ensuring every update is applied in the same sequence across all replicas.
  • Provides strong consistency guarantees for coordination tasks like locks and configuration management.
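
Schematically, the leader's broadcast step can be sketched as follows; the types are illustrative only, and real Zab additionally handles epochs, leader recovery, and follower synchronization.

```go
package main

import "fmt"

// proposal models one Zab-style broadcast from the leader: the leader assigns
// an identifier (in ZooKeeper, a zxid combining epoch and counter), sends the
// proposal to followers, and commits once a quorum has acknowledged it.
type proposal struct {
	zxid uint64
	acks map[string]bool
}

func (p *proposal) ack(follower string) { p.acks[follower] = true }

func (p *proposal) committed(clusterSize int) bool {
	// The leader's own copy counts toward the quorum.
	return len(p.acks)+1 > clusterSize/2
}

func main() {
	p := &proposal{zxid: 0x100000001, acks: make(map[string]bool)}
	p.ack("follower-1")
	fmt.Println(p.committed(5)) // false: only 2 of 5 servers have it
	p.ack("follower-2")
	fmt.Println(p.committed(5)) // true: quorum of 3 reached, safe to commit
}
```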

We’ll only touch on Paxos, Raft, and Zab here. Each will be covered in more detail in separate articles; this section is just a high-level overview of how consensus works in practice.



