6.824 Lecture 6 - Fault Tolerance with Raft - Part 1
- Video: http://nil.csail.mit.edu/6.824/2020/video/6.html
- FAQ: http://nil.csail.mit.edu/6.824/2020/papers/raft-faq.txt
- Stopped at 52:28
- Systems we’ve seen so far have been single-master, and so have a SPOF
- People wanted to build replicated systems before the first consensus papers came out (~1990); they attempted to do it by:
- Building networks that would not fail (expensive)
- Waiting for human/operator intervention when something went wrong
- $2f + 1$ servers can withstand $f$ failures
- Any two majorities overlap in at least one server
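The arithmetic behind the two bullets above can be checked with a short Go sketch. The function names (`majority`, `tolerated`, `minOverlap`) are hypothetical helpers introduced here for illustration, not part of Raft or the lecture.

```go
package main

import "fmt"

// majority returns the smallest number of servers that forms a majority of n.
func majority(n int) int { return n/2 + 1 }

// tolerated returns the largest f such that n >= 2f + 1.
func tolerated(n int) int { return (n - 1) / 2 }

// minOverlap returns the fewest servers that any two majorities of n must
// share: two sets of size m drawn from n servers overlap in at least 2m - n.
func minOverlap(n int) int { return 2*majority(n) - n }

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("n=%d majority=%d tolerates=%d overlap>=%d\n",
			n, majority(n), tolerated(n), minOverlap(n))
	}
}
```

For odd cluster sizes the guaranteed overlap is exactly one server, which is the server that carries information from one majority decision (a vote, a committed entry) into the next.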
- ~1990: the Paxos & Viewstamped Replication (VSR) papers appeared; it took ~15 years for these ideas to be used in production
- Raft is closer to VSR than Paxos
- Raft integration with (colocated) application code:
- Application receives a request, and forwards it to the Raft leader
- Raft replicas commit this request to their log
- Each replica notifies its own application instance
- Each instance of the application then applies the request
- The application instance that received the request responds to the client
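The five steps above can be sketched as a toy, in-process simulation. This is not Raft: there are no RPCs, elections, or failures, and `Command`, `Replica`, `Cluster`, and `Submit` are hypothetical names chosen for illustration. It only shows the order in which the log and the application state are touched.

```go
package main

import "fmt"

// Command is a hypothetical key/value write issued by a client.
type Command struct{ Key, Value string }

// Replica pairs a Raft-style log with colocated application state.
type Replica struct {
	log   []Command
	state map[string]string
}

// apply is the application's half: run a committed command on local state.
func (r *Replica) apply(c Command) { r.state[c.Key] = c.Value }

// Cluster is a toy stand-in for a Raft group with no failures.
type Cluster struct{ replicas []*Replica }

// NewCluster builds n replicas with empty logs and empty state.
func NewCluster(n int) *Cluster {
	c := &Cluster{}
	for i := 0; i < n; i++ {
		c.replicas = append(c.replicas, &Replica{state: map[string]string{}})
	}
	return c
}

// Submit walks through the steps in order and returns the client reply.
func (c *Cluster) Submit(cmd Command) string {
	// Steps 1-2: the request reaches the leader and every replica logs it.
	for _, r := range c.replicas {
		r.log = append(r.log, cmd)
	}
	// Steps 3-4: each replica hands the entry to its own application
	// instance, which applies it to local state.
	for _, r := range c.replicas {
		r.apply(cmd)
	}
	// Step 5: the instance that received the request replies to the client.
	return "OK " + cmd.Key
}

func main() {
	c := NewCluster(3)
	fmt.Println(c.Submit(Command{Key: "x", Value: "1"}))
	fmt.Println(c.replicas[2].state["x"])
}
```

The point of the separation is that the application never writes its state directly; it only reacts to entries that the replication layer reports as committed.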
- Followers acknowledge before they write log entries to disk
- If disk writes are slow enough that they start lagging, memory usage grows
- If this happens for long enough, the server will run out of memory and crash
- Raft doesn’t include a safeguard against this
- “Committed” state is not persisted
- Say all nodes crash and they come back up, a leader is first elected
- That leader then sends out `AppendEntries` heartbeats
- And then picks a `commitIndex` following this rule: If there exists an $N$ such that $N > \text{commitIndex}$, a majority of $\text{matchIndex}[i] \geq N$, and $\text{log}[N].\text{term} == \text{currentTerm}$: set $\text{commitIndex} = N$ (§5.3, §5.4).
- And then propagates it using subsequent `AppendEntries` calls
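The `commitIndex` rule above can be written as a small Go function. This is a minimal sketch under stated assumptions: `advanceCommitIndex` is a hypothetical name, `logTerms[i]` holds the term of log entry `i` with index 0 as an unused placeholder (so indices are 1-based), and `matchIndex` has one entry per server including the leader itself. The rule only asks for some valid $N$; picking the largest one is a choice made here.

```go
package main

import "fmt"

// advanceCommitIndex returns the largest N > commitIndex such that a
// majority of matchIndex entries are >= N and logTerms[N] == currentTerm.
// If no such N exists, it returns commitIndex unchanged.
func advanceCommitIndex(commitIndex, currentTerm int, logTerms, matchIndex []int) int {
	for n := len(logTerms) - 1; n > commitIndex; n-- {
		if logTerms[n] != currentTerm {
			continue // entries from older terms are never committed by counting
		}
		count := 0
		for _, m := range matchIndex {
			if m >= n {
				count++
			}
		}
		if count > len(matchIndex)/2 {
			return n
		}
	}
	return commitIndex
}

func main() {
	logTerms := []int{0, 1, 1, 2, 2} // entries 1..4; index 0 is a placeholder
	// Three servers: entry 3 is on two of them, entry 4 on only one.
	fmt.Println(advanceCommitIndex(0, 2, logTerms, []int{4, 3, 1}))
	// A term-3 leader has no term-3 entry yet, so nothing can be committed.
	fmt.Println(advanceCommitIndex(0, 3, logTerms, []int{4, 4, 4}))
}
```

The second call shows why a freshly elected leader cannot advance `commitIndex` until an entry from its own term has been replicated to a majority, even when older entries are already present on every server.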
- It’s possible to build a consensus system without the notion of a leader
- Paxos, in its basic form, doesn’t have a leader
- Raft doesn’t include a mechanism for a leader to step down voluntarily
- One way to fix this is to have leaders wait for heartbeat responses
- And depose themselves if they haven’t received a quorum of responses a few times in a row
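The fix described in the last two bullets can be sketched as a counter of consecutive heartbeat rounds that failed to reach a quorum. This is an illustration rather than part of Raft as specified: `quorumTracker`, `observe`, and the `maxMisses` threshold are hypothetical names and parameters.

```go
package main

import "fmt"

// quorumTracker counts consecutive heartbeat rounds that did not reach
// a majority of the cluster.
type quorumTracker struct {
	clusterSize int // total servers, including the leader
	maxMisses   int // hypothetical threshold of failed rounds in a row
	misses      int
}

// observe records one heartbeat round, where acks is the number of
// followers that responded. It returns true when the leader should
// depose itself.
func (q *quorumTracker) observe(acks int) bool {
	if acks+1 > q.clusterSize/2 { // +1 because the leader counts itself
		q.misses = 0
		return false
	}
	q.misses++
	return q.misses >= q.maxMisses
}

func main() {
	q := &quorumTracker{clusterSize: 5, maxMisses: 3}
	for _, acks := range []int{2, 1, 1, 4, 0, 0, 0} {
		fmt.Printf("acks=%d stepDown=%v\n", acks, q.observe(acks))
	}
}
```

Resetting the counter on any successful round keeps a leader in place through brief network hiccups, while a leader stuck on the minority side of a partition gives up after `maxMisses` rounds.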
- Raft eventually converges on the same value per slot on every server