GFS was explicitly designed for use within a single data center (lots of synchronous network IO happens before an operation completes)
Used internally at Google, not directly available to customers
The ideas in this paper aren’t novel, but the fact that the system was used in the real world at a relatively large scale makes it important nonetheless.
GFS doesn’t provide strong consistency; this idea was “heretical” at the time in academic circles. Google was able to get away with this because:
It controlled all clients, so it wasn’t unreasonable to have the clients do more work (checksumming, being careful about record boundaries)
GFS’ use cases (search, etc.) could tolerate stale results
The client application performs IO in terms of files and byte offsets within the entire (potentially very large) file. The GFS client library and the master translate that to a chunked view.
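A minimal sketch (Go, hypothetical names) of the translation the client library performs, assuming the paper’s 64 MB chunk size:

```go
package main

import "fmt"

// chunkSize is the 64 MB chunk size from the paper.
const chunkSize = 64 << 20

// chunkCoord shows the translation the GFS client library performs: a
// byte offset within the (potentially huge) file becomes a chunk index
// (which the master maps to a chunk handle) plus an offset within that
// chunk.
func chunkCoord(offset int64) (chunkIndex, chunkOffset int64) {
	return offset / chunkSize, offset % chunkSize
}

func main() {
	// A read at byte 200,000,000 falls in the third chunk (index 2).
	idx, off := chunkCoord(200_000_000)
	fmt.Printf("chunk index %d, offset %d within chunk\n", idx, off)
}
```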
When the master reboots, it must wait (for each chunk) until it learns of at least one chunkserver holding a version of that chunk >= the master’s stored version for that chunk.
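A hedged sketch of that check (hypothetical types, not the paper’s actual data structures):

```go
package main

import "fmt"

// chunkInfo is a hypothetical master-side record for one chunk.
type chunkInfo struct {
	storedVersion int64            // version the master persisted before rebooting
	reported      map[string]int64 // chunkserver address -> version it reported
}

// usable reports whether the rebooted master may serve this chunk yet:
// it must know of at least one chunkserver whose reported version is
// >= its own stored version; until then it keeps waiting for reports.
func (c *chunkInfo) usable() bool {
	for _, v := range c.reported {
		if v >= c.storedVersion {
			return true
		}
	}
	return false
}

func main() {
	c := &chunkInfo{storedVersion: 7, reported: map[string]int64{"cs1": 6}}
	fmt.Println(c.usable()) // false: only a stale replica is known so far
	c.reported["cs2"] = 7
	fmt.Println(c.usable()) // true: cs2 is at least as fresh as the master’s record
}
```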
When the master can’t reach a primary replica (for a chunk), it needs to wait until the lease expires before it designates a new primary
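A sketch of why waiting is required (hypothetical names; the 60-second lease term is from the paper): the unreachable primary can’t be told to stop, so the only safe option is to let its lease lapse, otherwise two primaries could accept writes at once:

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical record of the current primary for one chunk.
type lease struct {
	primary   string
	expiresAt time.Time
}

// designateNewPrimary waits out the old lease before naming a new
// primary; promoting early could leave two live primaries (split brain).
func designateNewPrimary(l *lease, candidate string) {
	if wait := time.Until(l.expiresAt); wait > 0 {
		time.Sleep(wait) // the old primary may still be serving writes
	}
	l.primary = candidate
	l.expiresAt = time.Now().Add(60 * time.Second) // paper’s initial lease term
}

func main() {
	l := &lease{primary: "cs1", expiresAt: time.Now().Add(time.Second)}
	designateNewPrimary(l, "cs2")
	fmt.Println(l.primary) // cs2, but only after cs1’s lease lapsed
}
```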
View of 3 replicas after a couple of appends (the diagram is drawn with the file upside down):
D is a record whose append failed and which the client chose not to retry
B is a record whose append failed and which the client then retried successfully
Reading an entire replica in order with deduplication is not guaranteed to provide a consistent view across replicas (because of D’s scenario: D exists on some replicas but not on others, and deduplication can’t restore a missing record; see the sketch below)
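A hedged sketch of that reader-side deduplication (record IDs here are an application-level convention, not a GFS feature), showing why it isn’t enough: it hides B’s duplicate, but a replica that never got D simply yields a different sequence:

```go
package main

import "fmt"

// record is a hypothetical application-level record carrying a unique
// ID, the kind of self-identifying data GFS clients had to write so
// readers could cope with duplicated appends.
type record struct{ id string }

// dedupRead scans one replica in order, skipping IDs seen before. This
// removes B’s retry duplicate, but it can’t invent D on a replica
// where D’s append failed, so two replicas can still disagree.
func dedupRead(replica []record) []string {
	seen := map[string]bool{}
	var out []string
	for _, r := range replica {
		if !seen[r.id] {
			seen[r.id] = true
			out = append(out, r.id)
		}
	}
	return out
}

func main() {
	r1 := []record{{"A"}, {"B"}, {"D"}, {"B"}, {"C"}} // D’s append landed here
	r2 := []record{{"A"}, {"B"}, {"B"}, {"C"}}        // ...but not here
	fmt.Println(dedupRead(r1)) // [A B D C]
	fmt.Println(dedupRead(r2)) // [A B C]
}
```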
What concerns might be relevant if we were to “upgrade” GFS to a strongly consistent system?
Primaries perform deduplication
Secondary drift shouldn’t be allowed; secondaries should either stay in sync or be removed from contention if they can’t comply
Multiple phases so partial writes aren’t necessarily persisted (2PC; see the sketch below)
A newly designated primary may need to catch up before accepting requests (if replication is async, presumably)
Master failover wasn’t automatic (it required manual intervention), and this proved to be a problem in practice
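As a concrete illustration of the multi-phase point above, a hedged two-phase-commit-style append sketch (entirely hypothetical; this is not GFS’s actual protocol): the primary deduplicates by request ID, stages the record on every secondary, drops any secondary that can’t comply, and only then commits, so a partial write is never exposed:

```go
package main

import (
	"errors"
	"fmt"
)

// secondary is a hypothetical replica interface for a 2PC-style append.
type secondary interface {
	prepare(reqID string, data []byte) error // phase 1: stage the record
	commit(reqID string) error               // phase 2: make it visible
}

type primary struct {
	secondaries []secondary
	applied     map[string]bool // request IDs already committed (dedup)
}

func (p *primary) appendRecord(reqID string, data []byte) error {
	if p.applied[reqID] {
		return nil // client retry of an already-committed request
	}
	// Phase 1: stage the record everywhere; a secondary that fails to
	// prepare is removed from the replica set rather than left to drift.
	var live []secondary
	for _, s := range p.secondaries {
		if err := s.prepare(reqID, data); err == nil {
			live = append(live, s)
		}
	}
	if len(live) == 0 {
		return errors.New("no replica could prepare; nothing persisted")
	}
	p.secondaries = live
	// Phase 2: only now does the record become visible anywhere.
	for _, s := range p.secondaries {
		s.commit(reqID)
	}
	p.applied[reqID] = true
	return nil
}

// memSecondary is a toy in-memory replica for the demo.
type memSecondary struct {
	staged map[string][]byte
	log    [][]byte
}

func (m *memSecondary) prepare(reqID string, data []byte) error {
	m.staged[reqID] = data
	return nil
}

func (m *memSecondary) commit(reqID string) error {
	m.log = append(m.log, m.staged[reqID])
	delete(m.staged, reqID)
	return nil
}

func main() {
	s := &memSecondary{staged: map[string][]byte{}}
	p := &primary{secondaries: []secondary{s}, applied: map[string]bool{}}
	p.appendRecord("req-B", []byte("B"))
	p.appendRecord("req-B", []byte("B")) // retried append is deduplicated
	fmt.Println(len(s.log))              // 1: B appears exactly once
}
```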