Kafka’s New Architecture

  • Kafka’s evolution
    • Single machine
    • Replication
    • Compaction
    • Exactly once (?)
  • Design goals
    • Elasticity
    • Pluggability
  • Optimizing Kafka is generally a function of the underlying hardware, not workload.
  • Something about Kafka being fast on SSDs. I was given to understand that Kafka’s sweet spot is on spinning disks because it favours high-throughput batch reads. Is this not true anymore?
  • Two new proposals in review
    • Tiered storage: store recent data in Kafka, and move cold data to another storage system. PoCs for HDFS and S3. This makes broker additions a lot faster because a lot less data needs to be copied to the new node.
    • Remove zookeeper: Use Kafka’s replicated log (via Raft, I think?) to store metadata instead of a separate system. This should allow scaling (theoretically) to tens of millions of partitions.
Edit