6.824 Lecture 8 - Zookeeper
- Video: https://www.youtube.com/watch?v=pbmyrNjzdDk
- FAQ: http://nil.lcs.mit.edu/6.824/2020/papers/zookeeper-faq.txt
- Prereq: PaperZookeeper
Linearizability
- Continued from 6.824 Lecture 7 - Fault Tolerance with Raft - Part 2
- If all you have are writes, it’s hard to say if the system really did anything
- This history is linearizable because a total order exists that matches the conditions:
- This history is not linearizable because C1’s experience implies that 2 must’ve been written before 1, and C2’s experience implies the opposite:
- A linearizable system cannot serve stale reads:
- Here the upward arrow marks the client re-sending a request because the original request timed out:
- Assume the server actually processed the original request but the response was lost
- The retried request receives a cached response (so 3, not 4)
- You can either view retries as low-level machinery (in which case this is legal) or view them as actual repeated requests (in which case this violates linearizability)
Zookeeper
- Why ZK?
- ZK is an API for a general-purpose coordination service
- Does ZK’s performance scale linearly with the number of servers in the ZK cluster?
- The entire point of adding servers is to improve fault tolerance, so the answer would ordinarily be no
- But ZK optimizes for reads, so, the answer is actually yes for read-heavy workloads
- Adding servers may make some things slower (increased network/leader contention)
- ZK + ZAB is roughly similar to “generic application” + Raft:
- No sharding / partitioning
- Writes in the per-client ordering are ordered the same way in the global/total order of linearizable writes
- Reads observe the state of the “filesystem” at various points on the “log”.
- Subsequent reads within a session may go forward within the log but never backward.
- The
zxid
field is used to maintain this guarantee even when the client switches replicas
FAQ
Q: How does linearizability differ from serializability?
A: The usual definition of serializability is much like linearizability, but without the requirement that operations respect real-time ordering. Have a look at this explanation: http://www.bailis.org/blog/linearizability-versus-serializability/