6.824 Lecture 8 - Zookeeper

Video: https://www.youtube.com/watch?v=pbmyrNjzdDk
FAQ: http://nil.lcs.mit.edu/6.824/2020/papers/zookeeper-faq.txt
Prereq:
Paper
Zookeeper

Linearizability

Continued from 6.824 Lecture 7 - Fault Tolerance with Raft - Part 2
If all you have are writes, it’s hard to say if the system really did anything
This history is linearizable because a total order exists that matches the conditions:
- If two events are non-overlapping, the earlier one must come first in the order
- Every read sees the most recent write in the order
This history is not linearizable because C1’s experience implies that 2 must’ve been written before 1, and C2’s experience implies the opposite:
A linearizable system cannot serve stale reads:
Here the upward arrow marks the client re-sending a request because the original request timed out:
- Assume the server actually processed the original request but the response was lost
- The retried request receives a cached response (so 3, not 4)
- You can either view retries as low-level machinery (in which case this is legal) or view them as actual repeated requests (in which case this violates linearizability)

Zookeeper

Why ZK?
- ZK is an API for a general-purpose coordination service
- Does ZK’s performance scale linearly with the number of servers in the ZK cluster?
  - The entire point of adding servers is to improve fault tolerance, so the answer would ordinarily be no
  - But ZK optimizes for reads, so, the answer is actually yes for read-heavy workloads
  - Adding servers may make some things slower (increased network/leader contention)
ZK + ZAB is roughly similar to “generic application” + Raft:
No sharding / partitioning
Writes in the per-client ordering are ordered the same way in the global/total order of linearizable writes
Reads observe the state of the “filesystem” at various points on the “log”.
- Subsequent reads within a session may go forward within the log but never backward.
- The zxid field is used to maintain this guarantee even when the client switches replicas

FAQ

Q: How does linearizability differ from serializability?

A: The usual definition of serializability is much like linearizability, but without the requirement that operations respect real-time ordering. Have a look at this explanation: http://www.bailis.org/blog/linearizability-versus-serializability/