Dgraph - Graph Database

https://dgraph.io/paper

Two kinds of nodes:
- Zeroes are administrative nodes that handle things like:
  - Handing out monotonic timestamps
  - Form consensus on metadata about the rest of the cluster
  - Handle rebalances/reshards
- Alphas store data
Nodes are divided into groups; one group for zeroes, and 1..N groups for alphas.
- Each group forms a Raft cluster and can have 1, 3, or 5 replicas.
Primitive data structure is a triple, representing either:
- (subject, predicate, object) → (0xab, <follower_of>, 0xff)
- (subject, predicate, value) → (0xab, <name>, "John")
All subjects are given a globally unique uid
Data is sharded by relationship/predicate rather than by entity.
- For example, if your data store contains information about which user follows which other user, users are entities, and “follows” is the relationship
- Dgraph places all data for the “follows” relationship (across all entities) in a single group.
- A group can contain more than one relationship.
- Critically, what happens when a relationship can’t fit on a single node anymore?
Dgraph uses Badger (in-house LSM KV store) to persist data; metadata is stored in Raft.
- The paper glosses over this, but how does Dgraph ensure that Badger and Raft are in sync?
Uses GraphQL for queries, over GRPC/Protobuf or vanilla HTTP.
All records with the same subject and predicate are grouped into the same Badger key:
- The Badger value is a sorted list of uids/values, and can be split if a single KV-pair becomes too large.
Dgraph can execute rebalances by marking shards read-only; are writes rejected during this process?
Supports lock-free transactions via MVCC
- The entire Lock-Free High Availability Transaction Processing was impenetrable to me
- Lots of jargon and not much elucidation
Reads are/can be linerizable.
On the whole this was a quick, crisp paper, but it felt like a lot of details were omitted or glossed over, particularly in the second half.