In part 3 of our System Design series, we’re tackling caching and load balancing — the unsung heroes of performance. Without them, systems crumble under scale.
We’ll cover:
- Caching – App/DB/CDN; write-through/write-back, TTLs
- Cache Invalidation – TTLs, versioning, stampede protection
- Load Balancing – L4/L7, round-robin, least-connections, hashing
1. Caching
TL;DR: Caching is your first lever for scale. Use it everywhere, but know the trade-offs.
- App cache: In-memory (Redis, Memcached). Ultra-fast but volatile.
- DB cache: Query or object cache to offload hot queries.
- CDN cache: Push static assets near users.
Strategies:
- Write-through: Write to cache + DB simultaneously (safe, consistent, slower writes)
- Write-back: Write to cache first, sync to DB later (fast, risky if cache crashes)
- TTL (Time To Live): Expire stale data automatically
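The two write strategies above can be sketched in a few lines. This is a minimal illustration using plain dicts to stand in for a cache (e.g., Redis) and a database; the `dirty` set and `flush` helper are hypothetical names, not a real client API.

```python
# Minimal sketch of write-through vs. write-back.
cache: dict = {}
db: dict = {}
dirty: set = set()  # keys written to cache but not yet persisted to the DB

def write_through(key, value):
    """Write to DB and cache together: consistent, but writes pay DB latency."""
    db[key] = value
    cache[key] = value

def write_back(key, value):
    """Write to cache only; DB syncs later. Fast, but anything still in
    `dirty` is lost if the cache crashes before flush() runs."""
    cache[key] = value
    dirty.add(key)

def flush():
    """Periodically sync dirty keys from cache to DB."""
    for key in list(dirty):
        db[key] = cache[key]
        dirty.discard(key)

write_through("a", 1)
write_back("b", 2)  # "b" is not yet in the DB
flush()             # now it is
```

The trade-off is visible in the window between `write_back` and `flush`: reads from cache are fast and fresh, but the DB lags behind.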
👉 Example: A news homepage caches top stories for 30s — thousands of requests saved.
👉 Interview tie-in: “How would you scale a read-heavy service?” — caching is the first answer.
2. Cache Invalidation
TL;DR: The hardest part of caching isn’t caching — it’s invalidation.
- TTL: Safe default, but may serve stale data.
- Versioning: Change cache key when data updates (e.g., `user:v2:123`)
- Stampede protection: Use locking or request coalescing so multiple clients don’t hammer the DB when cache expires.
👉 Example: If 1M users refresh when a cache expires, that’s a cache stampede. Use jittered TTLs or async refresh.
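Both stampede defenses mentioned above can be combined in one read path: jitter the TTL so keys set together don't expire together, and hold a lock around the recompute so only one caller hits the DB. A sketch, assuming a single-process cache; in production this would be a distributed lock or a library feature:

```python
import random
import threading

lock = threading.Lock()
cache: dict = {}  # key -> (value, expires_at)

def jittered_ttl(base: float, jitter: float = 0.2) -> float:
    """Spread expirations +/-20% so hot keys don't all expire at once."""
    return base * (1 + random.uniform(-jitter, jitter))

def get_or_load(key, now: float, loader):
    entry = cache.get(key)
    if entry and entry[1] > now:
        return entry[0]
    # Coalesce: only one caller recomputes; the rest wait and reuse it.
    with lock:
        entry = cache.get(key)  # re-check after acquiring the lock
        if entry and entry[1] > now:
            return entry[0]
        value = loader()
        cache[key] = (value, now + jittered_ttl(30))
        return value
```

With this shape, 1M concurrent refreshes on an expired key produce one DB query, not 1M.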
👉 Interview tie-in: They’ll ask “What’s the hardest part about caching?” — answer: invalidation and consistency.
3. Load Balancing
TL;DR: Load balancers spread requests across servers and hide failures.
- L4 (Transport): Balances based on IP/port. Simple, fast.
- L7 (Application): Smarter — routes based on headers, cookies, paths.
Algorithms:
- Round Robin: Even distribution
- Least Connections: Send to the server with fewest active requests
- Hashing: Sticky sessions (e.g., same user → same server)
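The first two algorithms above fit in a few lines each. A minimal sketch over an in-memory server pool (server names and the `dispatch`/`finish` helpers are illustrative):

```python
from itertools import count

servers = {"s1": 0, "s2": 0, "s3": 0}  # server -> active connection count
_rr = count()

def round_robin() -> str:
    """Cycle through servers in order, ignoring load."""
    names = sorted(servers)
    return names[next(_rr) % len(names)]

def least_connections() -> str:
    """Pick the server with the fewest in-flight requests."""
    return min(servers, key=servers.get)

def dispatch() -> str:
    server = least_connections()
    servers[server] += 1  # request starts
    return server

def finish(server: str):
    servers[server] -= 1  # request completes
```

Round robin is fine when requests are uniform; least connections adapts when some requests (or servers) are slower than others.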
👉 Example: An e-commerce app uses an L7 LB to route /images → CDN and /checkout → the payment cluster.
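The core of that L7 routing decision is a prefix match on the request path. A sketch with illustrative pool names (any real LB would also consider headers, host, and health checks):

```python
# Sketch of L7 path-based routing: first matching prefix wins.
ROUTES = [
    ("/images", "cdn-pool"),
    ("/checkout", "payment-cluster"),
]
DEFAULT_POOL = "web-pool"

def route(path: str) -> str:
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

print(route("/images/logo.png"))   # cdn-pool
print(route("/checkout/confirm"))  # payment-cluster
```

An L4 balancer can't do this: it sees only IPs and ports, never the path.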
👉 Interview tie-in: “How do you handle uneven traffic across servers?” — least-connections or weighted load balancing.
✅ Takeaways
- Cache where it hurts most: hot queries, static assets, read-heavy endpoints
- Invalidation is the real challenge; plan strategies upfront
- Load balancing is critical for fairness, resilience, and routing logic
💡 Practice Question:
“Design the caching strategy for a Twitter timeline. How would you avoid cache stampede during trending events?”