Rust at speed — building a fast concurrent database
This was a very good use of ~50 minutes.
-
Went over a couple of reasons Rust is possibly better than Go; two that resonate with me:
- The type system, especially in the prevention of the
result, err := (...)
andif err != nil
idioms. - Documentation is consistent, interlinked (even for third-party crates), and supported by the language as a fairly first-class entity.
- The type system, especially in the prevention of the
-
Noria is a (research) database that plays with the idea of moving an application cache into the database and caching on write.
- This cache requires fast read-write access.
- Mutexes are too slow.
RwLock
s are better, but for a fast critical section like this one (read a value from a hash map), the actual process of obtaining the lock is something of a bottleneck.
-
Use
unsafe
to build a lock-free data structure:- Maintain two caches (trade-off space for perf); at a given point in time, each cache is exclusively used by either readers or writers.
- This is done by maintaining two separate unsafe pointers to these caches.
- Once we have a critical mass of writes that can be flushed, switch the pointers so they point at the opposite cache.
- The readers see all new writes, and the writers re-apply their changes to the new cache
- Does this imply that writers have to buffer writes internally until they can write to both caches?
- What happens in the case of multiple simultaneous writers? This explanation only covered a single writer.
- The paper should answer these questions.
- This pointer switch requires
unsafe
because Rust’s ownership system will not allow it. - This data structure performs great in micro-benchmarks and a simulated load test based on lobste.rs data:
-
This cache data structure is available as a crate: https://crates.io/crates/evmap
-
Some other general advice/tips for Rust:
- If you feel like you’re fighting the language, pause and look for a better way.
- A graph is extremely difficult to represent in the regular C/C++ structure without using
unsafe
- Using
Rc
is one way to solve this (I came to effectively the same conclusion yesterday) - Take this a step further by putting all graph nodes in a
Vec
and using vector indices in the actual graph data structure.- This is possibly unsuitable for a graph that requires heavy mutation.
- A graph is extremely difficult to represent in the regular C/C++ structure without using
- A lint flag exists that fails compilation if any public module/method/API is undocumented. (I couldn’t find this)
clippy
is a static analyzer that gives you advice about your code.- Rust makes it easy to split out your code into multiple crates; lean on this! (this feature is called “workspaces”)
- If you feel like you’re fighting the language, pause and look for a better way.