The 5-hour CDN

https://fly.io/blog/the-5-hour-content-delivery-network/

I’m going to talk about what you might come up with if you spend the next five hours building a CDN.

We have choices. We could use Varnish (scripting! edge side includes! PHK blog posts!). We could use Apache Traffic Server (being the only new team this year to use ATS!). Or we could use NGINX (we’re already running it!). The only certainty is that you’ll come to hate whichever one you pick. Try them all and pick the one you hate the least.

(We kid! Netlify is built on ATS. Cloudflare uses NGINX. Fastly uses Varnish.)

Three ways to pick a nearby server:

  1. Anycast/BGP
  2. DNS
  3. Ping multiple servers
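
The third option can be sketched on the client side. This is a minimal illustration, not something taken from the post: `tcp_connect_ms`, `pick_nearest`, and the `*.example.test` hostnames are all hypothetical names, and timing a TCP handshake is only an assumed stand-in for whatever "ping" a real client would use.

```python
import socket
import time


def tcp_connect_ms(host, port=443, timeout=1.0):
    """Time one TCP handshake to host:port in milliseconds (inf on failure)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return float("inf")


def pick_nearest(servers, probe=tcp_connect_ms, samples=3):
    """Return the server whose best-of-N probe time is lowest."""
    best = {s: min(probe(s) for _ in range(samples)) for s in servers}
    return min(best, key=best.get)


# Demo with made-up latencies so the example runs without network access.
fake_rtt = {
    "ams.example.test": 18.0,
    "iad.example.test": 95.0,
    "syd.example.test": 280.0,
}
nearest = pick_nearest(fake_rtt, probe=lambda host: fake_rtt[host])
print(nearest)
```

Taking the best of several samples is a design choice to dampen one-off jitter; if every probe fails, this sketch simply returns the first server, which a real client would want to handle explicitly.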

From fly.io’s docs:

BGP Anycast

We broadcast and accept traffic from ranges of IP addresses (both IPv4 and IPv6) in all our datacenters. When we receive a connection on one of those IPs, we match it back to an active customer application, and then proxy the TCP connection to the closest available microVM.

Failover is tricky: you need to handle servers going down, but also servers or whole DCs being slow. External monitoring may be required, potentially from the users' point of view as well.

Multiple servers in a region/DC don’t have to maintain individual, isolated caches. NGINX supports hash-based caching, so each server in a DC is responsible for a shard of the overall cache. How does this adapt when one server in this “cluster” goes down?
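
One possible answer to the question above is to pick a hashing scheme where losing a server only reassigns that server's keys. The sketch below uses rendezvous (highest-random-weight) hashing to show the property. This is a swapped-in technique for illustration, not necessarily what NGINX does internally, and the server names and URL keys are made up.

```python
import hashlib


def score(server, key):
    """Deterministic pseudo-random weight for a (server, key) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(servers, key):
    """The server with the highest score for this key owns its cache shard."""
    return max(servers, key=lambda s: score(s, key))


servers = ["cache-a", "cache-b", "cache-c", "cache-d"]  # hypothetical cluster
keys = [f"/assets/{i}.js" for i in range(1000)]

before = {k: owner(servers, k) for k in keys}

# Simulate cache-c going down and recompute ownership.
survivors = [s for s in servers if s != "cache-c"]
after = {k: owner(survivors, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Only keys that lived on the failed server move, and they spread across the survivors; every other shard stays warm. A naive `hash(key) % len(servers)` would instead reshuffle most keys when the server count changes.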


Another proposed solution is to shard globally in layers, optimizing for the happy path while making cache misses somewhat slower.


Request Coalescing groups multiple similar requests, keeps them all waiting, and sends a single request to the origin.
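
A minimal sketch of the idea, assuming a threaded cache server and coalescing on an exact key match: the first caller for a key becomes the "leader" and fetches from the origin, while later callers wait on the same result. `Coalescer`, `slow_origin`, and the `coalesced` counter are hypothetical names for this illustration, not any proxy's API.

```python
import threading
import time
from concurrent.futures import Future


class Coalescer:
    """Collapse concurrent requests for the same key into one origin fetch."""

    def __init__(self, fetch):
        self._fetch = fetch            # hypothetical origin-fetch callable
        self._lock = threading.Lock()
        self._inflight = {}            # key -> Future shared by all waiters
        self.coalesced = 0             # callers that joined an in-flight fetch

    def get(self, key):
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
            else:
                self.coalesced += 1
        if leader:
            try:
                fut.set_result(self._fetch(key))
            except Exception as exc:
                fut.set_exception(exc)   # waiters see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
        return fut.result()


origin_calls = []
release = threading.Event()


def slow_origin(key):
    """Stand-in origin that blocks until the demo releases it."""
    origin_calls.append(key)
    release.wait(timeout=5)
    return f"body of {key}"


coalescer = Coalescer(slow_origin)
results = []
threads = [
    threading.Thread(target=lambda: results.append(coalescer.get("/index.html")))
    for _ in range(8)
]
for t in threads:
    t.start()

# Wait until the other seven requests have joined the in-flight fetch.
deadline = time.monotonic() + 5
while coalescer.coalesced < 7 and time.monotonic() < deadline:
    time.sleep(0.01)
release.set()
for t in threads:
    t.join()

print(len(origin_calls), "origin request for", len(results), "client requests")
```

The in-flight entry is removed once the fetch completes, so this sketch only coalesces; storing the response for later requests is the cache's job.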
