What Every Programmer Should Know About SSDs
https://databasearchitects.blogspot.com/2021/06/what-every-programmer-should-know-about.html?m=1
SSDs are more complicated and their performance behavior can appear quite mysterious if one simply thinks of them as fast disks.
-
SSDs are 100x faster than disks for reads (100µs vs. 10ms); writes appear even faster (10µs)
-
The SSD storage medium is flash; an SSD contains dozens of flash chips (FCs), which can be read concurrently
-
Writes are split across FCs, and a hardware prefetcher keeps multiple FCs busy during a sequential read
- How is the data split across FCs? Striped or simply split in order?
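To make the two alternatives in that question concrete, here is a minimal sketch of both placements. It does not claim which one a real SSD uses. The chip count, the per-chip capacity, and all function names are hypothetical.

```python
def chip_striped(page: int, n_chips: int) -> int:
    """Round-robin striping: consecutive pages land on consecutive chips."""
    return page % n_chips


def chip_contiguous(page: int, pages_per_chip: int) -> int:
    """In-order split: each chip holds one contiguous range of pages."""
    return page // pages_per_chip


def chips_touched(pages, placement) -> int:
    """Number of distinct chips that a set of logical pages maps to."""
    return len({placement(p) for p in pages})


N_CHIPS = 4            # hypothetical chip count
PAGES_PER_CHIP = 1024  # hypothetical capacity per chip, in pages

scan = range(8)  # a sequential read of 8 logical pages
striped = chips_touched(scan, lambda p: chip_striped(p, N_CHIPS))
contiguous = chips_touched(scan, lambda p: chip_contiguous(p, PAGES_PER_CHIP))
print(striped, contiguous)
```

Under striping, the short sequential scan spreads over every chip, so a prefetcher has parallel work to issue. Under an in-order split, the same scan stays on a single chip.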
-
Multiple FCs must be kept busy “manually” during random reads by issuing as many random reads as there are FCs
- The post implies that libaio and io_uring handle this automatically, but:
  - Is this support customized based on the number of FCs on a specific SSD?
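As a rough illustration of keeping many random reads in flight at once, here is a sketch that uses a thread pool with os.pread as a portable stand-in for libaio or io_uring. It assumes a POSIX system. PAGE_SIZE, queue_depth, and the helper names are hypothetical, and queue_depth is a guess because the host cannot see the chip count. The demo file is small and will be served from the OS page cache, so it shows the access pattern only, not device behavior.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed read granularity


def read_page(fd: int, page_no: int) -> bytes:
    """Read one page at its offset; os.pread does not move the file position."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def random_reads(path: str, page_numbers, queue_depth: int):
    """Read the given pages with up to queue_depth requests in flight."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            # pool.map returns results in the order of page_numbers.
            return list(pool.map(lambda p: read_page(fd, p), page_numbers))
    finally:
        os.close(fd)


# Demo on a small temporary file: page i is filled with the byte value i.
N_PAGES = 64
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(N_PAGES):
        f.write(bytes([i]) * PAGE_SIZE)
    demo_path = f.name

wanted = random.sample(range(N_PAGES), 16)
pages = random_reads(demo_path, wanted, queue_depth=16)
os.unlink(demo_path)
```

The point of the design is that the number of outstanding requests is chosen by the application, which is why the question about per-SSD tuning matters.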
-
SSDs have a volatile write cache that makes writes appear fast, but an actual persistent write is as slow as 1ms
- Server-grade SSDs provide battery persistence for the write cache so a flush isn’t strictly required
- Is Linux smart about this or do writes go to RAM first before the on-SSD cache?
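A small sketch of the difference between a write that merely returns and one that is asked to be durable. With ordinary buffered I/O on Linux, os.write copies data into the kernel page cache (RAM) and returns; os.fsync then asks the kernel to push the data to the device. The helper name timed_write is hypothetical, and the timings depend entirely on the machine, so none are shown.

```python
import os
import tempfile
import time


def timed_write(fd: int, data: bytes, durable: bool) -> float:
    """Write data and, if durable, wait for fsync; return elapsed seconds."""
    start = time.perf_counter()
    os.write(fd, data)
    if durable:
        os.fsync(fd)  # request that the data reach stable storage
    return time.perf_counter() - start


fd, path = tempfile.mkstemp()
try:
    page = b"x" * 4096
    buffered = timed_write(fd, page, durable=False)
    durable = timed_write(fd, page, durable=True)
    size = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.unlink(path)

print(f"buffered: {buffered * 1e6:.0f} µs, with fsync: {durable * 1e6:.0f} µs")
```

On a drive with a volatile write cache, the fsync path is where the roughly 1ms persistent-write cost would be expected to show up. On a temporary filesystem backed by RAM, the two numbers may be nearly identical.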
-
Writes can be parallelized across FCs to keep overall throughput high, but writes don’t parallelize as well as reads:
because a write occupies a flash chip 10 times longer than a read, writes cause significant tail latencies for reads to the same flash chip.
-
Pages cannot be overwritten; new pages are appended to blocks that were erased. Blocks contain hundreds of pages, so it isn’t tenable to erase an entire block whenever a page needs to be rewritten.
- SSDs handle overwrites/rewrites by writing a new version of the page to a new location/block, and storing a mapping from logical address -> physical address
- A garbage collector erases blocks when no erased blocks exist; orphaned pages in the blocks being erased disappear, and all other pages are rewritten into the beginning of the block, (ideally) leaving space for more pages to be appended.
- As an example, here’s the before state:
- And after Block 0 is erased to make room for new pages:
-
Ergo, write amplification: P0 is rewritten even though there isn’t a logical reason to do so
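The append, remap, and garbage-collect cycle can be sketched with a toy model. It is a deliberately simplified, one-block simulation and not how any real flash translation layer is implemented: real devices collect across many blocks and need spare space to copy into. TinySSD, PAGES_PER_BLOCK, and the page names are hypothetical.

```python
PAGES_PER_BLOCK = 4  # tiny, hypothetical block size for illustration


class TinySSD:
    """One-block toy model of append-only pages with garbage collection."""

    def __init__(self):
        self.slots = []           # logical page name while valid, None once stale
        self.mapping = {}         # logical page -> slot index (the address map)
        self.logical_writes = 0   # writes requested by the host
        self.physical_writes = 0  # pages actually programmed into flash

    def _append(self, name):
        self.slots.append(name)
        self.mapping[name] = len(self.slots) - 1
        self.physical_writes += 1

    def _garbage_collect(self):
        survivors = [s for s in self.slots if s is not None]
        if len(survivors) == PAGES_PER_BLOCK:
            raise RuntimeError("block is full of valid pages")
        self.slots = []           # erase the whole block
        for name in survivors:    # rewrite still-valid pages at the start
            self._append(name)

    def write(self, name):
        self.logical_writes += 1
        if name in self.mapping:  # overwrite: the old copy becomes stale
            self.slots[self.mapping[name]] = None
        if len(self.slots) == PAGES_PER_BLOCK:
            self._garbage_collect()
        self._append(name)


ssd = TinySSD()
for name in ["P0", "P1", "P1", "P1", "P1"]:
    ssd.write(name)

wa = ssd.physical_writes / ssd.logical_writes
print(ssd.slots, f"write amplification = {wa:.2f}")  # 6 physical / 5 logical = 1.20
```

The host only ever rewrote P1, yet the fifth write forced an erase, and P0 had to be programmed a second time to survive it. That extra physical write is the amplification.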