Rust at speed — building a fast concurrent database

This was a very good use of ~50 minutes.

  • Went over a couple of reasons Rust is possibly better than Go; two that resonate with me:

    • The type system, especially in the prevention of the result, err := (...) and if err != nil idioms.
    • Documentation is consistent, interlinked (even for third-party crates), and supported by the language as a fairly first-class entity.
  • Noria is a (research) database that plays with the idea of moving an application cache into the database and caching on write.

    • This cache requires fast read-write access.
    • Mutexes are too slow.
    • RwLocks are better, but for a fast critical section like this one (read a value from a hash map), the actual process of obtaining the lock is something of a bottleneck.
  • Use unsafe to build a lock-free data structure: 2020-06-06.00.38.52.png

    • Maintain two caches (trade-off space for perf); at a given point in time, each cache is exclusively used by either readers or writers.
    • This is done by maintaining two separate unsafe pointers to these caches.
    • Once we have a critical mass of writes that can be flushed, switch the pointers so they point at the opposite cache.
    • The readers see all new writes, and the writers re-apply their changes to the new cache
      • Does this imply that writers have to buffer writes internally until they can write to both caches?
      • What happens in the case of multiple simultaneous writers? This explanation only covered a single writer.
      • The paper should answer these questions.
    • This pointer switch requires unsafe because Rust’s ownership system will not allow it.
    • This data structure performs great in micro-benchmarks and a simulated load test based on lobste.rs data:
  • This cache data structure is available as a crate: https://crates.io/crates/evmap

  • Some other general advice/tips for Rust:

    • If you feel like you’re fighting the language, pause and look for a better way.
      • A graph is extremely difficult to represent in the regular C/C++ structure without using unsafe
      • Using Rc is one way to solve this (I came to effectively the same conclusion yesterday)
      • Take this a step further by putting all graph nodes in a Vec and using vector indices in the actual graph data structure.
        • This is possibly unsuitable for a graph that requires heavy mutation.
    • A lint flag exists that fails compilation if any public module/method/API is undocumented. (I couldn’t find this)
    • clippy is a static analyzer that gives you advice about your code.
    • Rust makes it easy to split out your code into multiple crates; lean on this! (this feature is called “workspaces”)
Edit