In defence of swap - common misconceptions

https://chrisdown.name/2018/01/02/in-defence-of-swap.html

Swap provides a mechanism for reclaiming anonymous pages, and in at that sense brings them to the same level as file pages

Under no/low memory contention

With swap: We can choose to swap out rarely-used anonymous memory that may only be used during a small part of the process lifecycle, allowing us to use this memory to improve cache hit rate, or do other optimisations.

Without swap: We cannot swap out rarely-used anonymous memory, as it’s locked in memory. While this may not immediately present as a problem, on some workloads this may represent a non-trivial drop in performance due to stale, anonymous pages taking space away from more important use.

Under moderate/high memory contention

With swap: All memory types have an equal possibility of being reclaimed. This means we have more chance of being able to reclaim pages successfully – that is, we can reclaim pages that are not quickly faulted back in again (thrashing).

Without swap: Anonymous pages are locked into memory as they have nowhere to go. The chance of successful long-term page reclamation is lower, as we have only some types of memory eligible to be reclaimed at all. The risk of page thrashing is higher. The casual reader might think that this would still be better as it might avoid having to do disk I/O, but this isn’t true – we simply transfer the disk I/O of swapping to dropping hot page caches and dropping code segments we need soon.

Under temporary spikes in memory usage

With swap: We’re more resilient to temporary spikes, but in cases of severe memory starvation, the period from memory thrashing beginning to the OOM killer may be prolonged. We have more visibility into the instigators of memory pressure and can act on them more reasonably, and can perform a controlled intervention.

Without swap: The OOM killer is triggered more quickly as anonymous pages are locked into memory and cannot be reclaimed. We’re more likely to thrash on memory, but the time between thrashing and OOMing is reduced. Depending on your application, this may be better or worse. For example, a queue-based application may desire this quick transfer from thrashing to killing. That said, this is still too late to be really useful – the OOM killer is only invoked at moments of severe starvation, and relying on this method for such behaviour would be better replaced with more opportunistic killing of processes as memory contention is reached in the first place.