6.824 Lecture 6 - Fault Tolerance with Raft - Part 1


  • Systems we’ve seen so far have been single-master, and so have a SPOF
  • People wanted to build replicated systems before the first consensus papers came out (~1990); they attempted to do it by:
    • Building networks that would not fail (expensive)
    • Waiting for human/operator intervention when something went wrong
  • $2f + 1$ servers can withstand $f$ failures
  • Any two majorities overlap in at least one server
  • 1990: Paxos & Viewstamped Replication (VSR) papers were published; took ~15 years for these ideas to be used in production
  • Raft is closer to VSR than Paxos
  • Raft integration with (colocated) application code:
    • Application receives a request, and forwards it to the Raft leader
    • Raft replicas commit this request to their log
    • Each replica notifies its own application instance
    • Each instance of the application then applies the request
    • The application instance that received the request responds to the client
  • Followers acknowledge before the write log entries to disk.
    • If disk writes are slow enough that they start lagging, memory usage grows
    • If this happens for long enough, the server will run out of memory and crash
    • Raft doesn’t include a safeguard against this
  • “Committed” state is not persisted
    • Say all nodes crash and they come back up, a leader is first elected

    • That leader then sends out AppendEntries heartbeats

    • And then picks a commitIndex following this rule:

      If there exists an $N$ such that $N > \text{commitIndex}$, a majority of $\text{matchIndex[i]} ≥ N$, and $\text{log[N].term} == \text{currentTerm}$: set $\text{commitIndex} = N$ (§5.3, §5.4).

    • And then propagates it using subsequent AppendEntries calls

  • It’s possible to build a consensus system without the notion of a leader
    • Paxos doesn’t have leaders
  • Raft doesn’t include a mechanism for a leader to step down voluntarily
    • One way to fix this is to have leaders wait for heartbeat responses
    • And depose themselves if they haven’t received a quorum of responses a few times in a row
  • Raft eventually converges on the same value per slot on every server
Edit