
Scale from single server to distributed system

Separate Web Server from Database Server

Separating the web tier from the database tier allows each to be scaled independently.

┌─────────────┐        ┌───────────────┐
│ Web Server  │◄──────►│   Database    │
│   (Nginx)   │        │ (PostgreSQL)  │
└─────────────┘        └───────────────┘
       │ HTTP
┌─────────────┐
│   Clients   │
└─────────────┘

Non-relational databases might be the right choice if:

  • your application requires super-low latency
  • your data are unstructured, and you do not have any relational data
  • you only need to serialize and deserialize data (JSON, YAML, XML etc.)
  • you need to store a massive amount of data

Scale the Web Servers

Vertical scaling (scale up): adding more power (CPU, RAM etc.) to your servers

  • there are hard limits
  • there is no failover and redundancy (single point of failure)
  • the overall cost of vertical scaling is high

Horizontal scaling (scale out): adding more servers into your pool of resources

                ┌───────────────┐
                │ Load Balancer │
                └───────┬───────┘
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
  ┌───────────┐   ┌───────────┐   ┌───────────┐
  │  Web App  │   │  Web App  │   │  Web App  │
  │ Server 1  │   │ Server 2  │   │ Server 3  │
  └───────────┘   └───────────┘   └───────────┘
  • Stateful:

    • A stateful server remembers client data (state) from one request to the next
    • Every request from the same client must be routed to the same server; most load balancers support this with sticky sessions
    • Adding or removing servers is much more difficult, and server failures are harder to handle, because session state lives on individual servers
  • Stateless:

    • A stateless server keeps no state information
    • A stateless system is simpler, more robust and scalable.

Load balancer: distributes incoming traffic among web servers that are defined in a load-balanced set.

                         ┌─────────────────────────────────────┐
                         │            Load Balancer            │
                         │  ┌─────────────────────────────┐    │
Incoming   ──────────────┼─►│ Health Checks + Algorithms  │────┼───► Server Pool
Requests                 │  │  (RR, Least Conn, IP Hash)  │    │
                         │  └─────────────────────────────┘    │
                         └─────────────────────────────────────┘

Layer 4 (Transport)      │ Layer 7 (Application)
─────────────────────────┼───────────────────────────────
Routes by: IP + Port     │ Routes by: HTTP headers,
Faster, less flexible    │ cookies, URL paths
                         │ More intelligent routing
  • Layer 4 (Transport Layer): Routes based on IP address and TCP/UDP port; faster but less flexible
  • Layer 7 (Application Layer): Routes based on HTTP headers, cookies, URL paths; more intelligent routing
  • Load balancing algorithms:
    • Round Robin
    • Weighted Round Robin
    • Least Connections
    • Weighted Least Connections
    • Resource-based/Adaptive
    • IP Hash (for session affinity)
  • Health checks: Regularly probe backend servers to detect failures and route traffic only to healthy instances
  • Session affinity trade-offs: Sticky sessions simplify stateful apps but reduce load distribution efficiency and complicate failover
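Two of the algorithms listed above, combined with health checks, can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer is implemented; the `Server` class, `RoundRobinBalancer`, and `pick_least_connections` are hypothetical names introduced for this example.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical backend record; a real balancer would update
    # `healthy` from periodic health-check probes.
    name: str
    healthy: bool = True
    active_connections: int = 0


class RoundRobinBalancer:
    """Cycles through servers in order, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def pick(self):
        # Look at each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server.healthy:
                return server
        raise RuntimeError("no healthy servers available")


def pick_least_connections(servers):
    """Returns the healthy server with the fewest active connections."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy servers available")
    return min(healthy, key=lambda s: s.active_connections)
```

Round robin ignores how busy each server is, which is why least connections tends to suit workloads with uneven request durations.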

API Gateway

A centralized entry point for all client requests that provides:

┌─────────┐     ┌─────────────────────────────────────────┐
│ Mobile  │────►│                                         │     ┌──────────────┐
└─────────┘     │               API Gateway               │────►│ User Service │
                │ ┌────────────────────────────────────┐  │     └──────────────┘
┌─────────┐     │ │ • Authentication   • Rate Limit    │  │
│   Web   │────►│ │ • Routing          • Caching       │  │────►┌──────────────┐
└─────────┘     │ │ • Load Balancing   • Logging       │  │     │Order Service │
                │ └────────────────────────────────────┘  │     └──────────────┘
┌─────────┐     │                                         │
│ Partner │────►│                                         │────►┌────────────────┐
└─────────┘     └─────────────────────────────────────────┘     │Product Service │
                                                                └────────────────┘
  • Request routing: Routes requests to appropriate backend services
  • Authentication & Authorization: Centralized security enforcement (JWT validation, API keys, OAuth)
  • Rate limiting: Protects services from abuse and ensures fair usage
  • Request/Response transformation: Protocol translation, payload modification
  • Caching: Response caching to reduce backend load
  • Load balancing: Distributes traffic across service instances
  • Logging & Monitoring: Centralized request logging and metrics collection

Popular options: Kong, AWS API Gateway, Nginx, Envoy, Spring Cloud Gateway

Rate Limiting & Throttling

Protects your services from being overwhelmed by too many requests.

Token Bucket Algorithm:
═══════════════════════════════════════════════════════════
Tokens added          ┌──────────────────────┐
at fixed rate ───────►│     Token Bucket     │
                      │   [●][●][●][●][ ]    │ ◄── Max capacity
                      └──────────┬───────────┘
                                 │
Request arrives ─────────────────┼─────────────────────
                                 │
                      ┌───── Has Token? ─────┐
                      │                      │
                     YES                     NO
                      │                      │
                      ▼                      ▼
                ┌──────────┐           ┌───────────┐
                │ Process  │           │  Reject   │
                │ Request  │           │   (429)   │
                └──────────┘           └───────────┘

Common algorithms:

  • Token Bucket: Tokens are added at a fixed rate; requests consume tokens
  • Leaky Bucket: Requests are processed at a constant rate; excess requests queue or drop
  • Fixed Window: Limits requests within fixed time intervals
  • Sliding Window Log: Tracks timestamps of requests; more accurate but memory-intensive
  • Sliding Window Counter: Hybrid approach balancing accuracy and efficiency

Implementation levels:

  • Per-user/API key rate limiting
  • Per-IP rate limiting
  • Global rate limiting
  • Distributed rate limiting (using Redis or similar)

Response handling: Return HTTP 429 (Too Many Requests) with Retry-After header
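The token bucket described above can be sketched as follows. This is a single-process illustration under stated assumptions: the `TokenBucket` class, its parameters, and the `retry_after_seconds` helper are hypothetical names, and a distributed limiter would keep the counters in a shared store such as Redis instead of in memory.

```python
import math
import time


class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1):
        """Returns True and consumes tokens if the request may proceed."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def retry_after_seconds(bucket, cost=1):
    """Whole seconds until `cost` tokens exist, as of the last allow() call.

    A caller could send this value in the Retry-After header of a 429 response.
    """
    missing = max(0.0, cost - bucket.tokens)
    return math.ceil(missing / bucket.refill_rate)
```

The capacity controls how large a burst is tolerated, while the refill rate sets the sustained request rate.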

Circuit Breaker Pattern

Prevents cascading failures when a downstream service is unhealthy.

┌─────────────────────────────────────────────┐
│           Circuit Breaker States            │
└─────────────────────────────────────────────┘

┌───────────┐   Failures exceed threshold   ┌───────────┐
│  CLOSED   │ ─────────────────────────────►│   OPEN    │
│  Normal   │                               │   Fail    │
│ Operation │                               │   Fast    │
└─────┬─────┘                               └─┬───────┬─┘
      ▲                                       │       ▲
      │ Success                       Timeout │       │ Failures
      │                               expires ▼       │
      │                               ┌───────────────┴─┐
      └───────────────────────────────┤    HALF-OPEN    │
                                      │  Test Requests  │
                                      └─────────────────┘

States:

  • Closed: Normal operation; requests pass through
  • Open: Service is failing; requests fail immediately without calling the service
  • Half-Open: After a timeout, allows limited requests to test if service has recovered

Configuration parameters:

  • Failure threshold: Number of failures before opening the circuit
  • Success threshold: Number of successes in half-open state before closing
  • Timeout: Duration to wait before transitioning from open to half-open

Benefits:

  • Fail fast: Reduces latency when downstream is unhealthy
  • Prevents resource exhaustion from waiting on failing services
  • Gives downstream services time to recover

Popular implementations: Resilience4j, Hystrix (deprecated), Polly (.NET)
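The three states and the configuration parameters above map onto a small state machine. The sketch below is illustrative only and does not mirror the API of Resilience4j, Hystrix, or Polly; `CircuitBreaker`, `CircuitOpenError`, and the parameter names are hypothetical. It counts consecutive failures and, for brevity, does not cap the number of concurrent trial requests in the half-open state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, success_threshold=2,
                 open_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.open_timeout = open_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_timeout:
                raise CircuitOpenError("failing fast; downstream marked unhealthy")
            # Timeout expired: let trial requests through.
            self.state = "half_open"
            self.successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # Any failure while half-open, or too many while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
            self.failures = 0

    def _on_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"
                self.failures = 0
        else:
            self.failures = 0  # a success resets the consecutive-failure count
```

A caller would wrap each downstream call, for example `breaker.call(fetch_profile, user_id)`, and treat `CircuitOpenError` as a signal to serve a fallback.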

Scale the Database

Database Replication: one writer and many readers (replicas)

                     ┌──────────────────┐
      Write ────────►│   Primary (RW)   │
                     │     (Writer)     │
                     └────────┬─────────┘
          ┌───────────────────┼───────────────────┐
          │ Replication       │                   │ Replication
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │  Replica 1  │     │  Replica 2  │     │  Replica 3  │
   │   (Read)    │     │   (Read)    │     │   (Read)    │
   └─────────────┘     └─────────────┘     └─────────────┘
          ▲                   ▲                   ▲
          │                   │                   │
          └───────────────────┴───────────────────┘
                         Read Queries
  • The writer (primary) node handles all write operations; reader nodes (replicas) receive copies of the data from the writer and serve only read operations
  • If a reader node goes offline, read operations are redirected to other healthy reader nodes; a new read node will spin up and replace the old one.
  • If the writer node goes offline, a reader node will be promoted to be the new writer; all write operations will be redirected to the new writer node; a new reader node will replace the old one for data replication immediately.
  • Replication lag: Read replicas may be slightly behind the primary; design for eventual consistency, and for reads that must see the latest write, read from the primary or use synchronous replication
  • Advantages:
    • Better performance: all read operations are distributed across reader nodes.
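    • Reliability: data is replicated across multiple locations, so it survives the loss of a single server
    • High availability: the system stays operational when a node goes offline, because the data can still be served from another server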

Connection Pooling: Maintains a pool of reusable database connections

┌─────────────────┐      ┌────────────────────────┐      ┌──────────┐
│   Application   │      │    Connection Pool     │      │ Database │
│                 │      │ ┌──┐ ┌──┐ ┌──┐ ┌──┐    │      │          │
│  Request 1 ─────┼─────►│ │C1│ │C2│ │C3│ │C4│    │◄────►│          │
│  Request 2 ─────┼─────►│ └──┘ └──┘ └──┘ └──┘    │      │          │
│  Request 3 ─────┼─────►│    (Reusable Conns)    │      │          │
│                 │      └────────────────────────┘      └──────────┘
└─────────────────┘

Without pool: New connection per request (slow)
With pool:    Reuse existing connections (fast)
  • Reduces overhead of establishing new connections
  • Prevents connection exhaustion under high load
  • Popular tools: HikariCP (Java), PgBouncer (PostgreSQL), ProxySQL (MySQL)
  • Key configurations: minimum/maximum pool size, connection timeout, idle timeout
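To make the pooling idea concrete, here is a minimal sketch of a fixed-size pool. It is not how HikariCP or PgBouncer work internally; `ConnectionPool`, `connect`, `max_size`, and `acquire_timeout` are hypothetical names, and `connect` stands in for whatever function opens a real database connection.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    def __init__(self, connect, max_size=4, acquire_timeout=5.0):
        self._idle = queue.Queue(maxsize=max_size)
        self._acquire_timeout = acquire_timeout
        # Open every connection up front; real pools usually grow lazily
        # between a minimum and a maximum size and close idle connections.
        for _ in range(max_size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        """Borrows a connection and always returns it to the pool."""
        try:
            conn = self._idle.get(timeout=self._acquire_timeout)
        except queue.Empty:
            raise TimeoutError("no free connection in the pool") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

Bounding the pool size is what protects the database from connection exhaustion: when every connection is busy, callers wait up to the acquire timeout instead of opening more.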

Query Optimization:

  • Proper indexing strategy (B-tree, hash, composite indexes)
  • Table partitioning for large tables
  • Query analysis with EXPLAIN/EXPLAIN ANALYZE
  • Avoid N+1 query problems

Horizontal Scaling (sharding): Sharding separates large databases into smaller, more easily managed parts called shards. Each shard shares the same schema, though the actual data on each shard is unique to the shard.

                      ┌────────────────┐
                      │  Shard Router  │
                      └───────┬────────┘
          ┌───────────────────┼───────────────────┐
          │                   │                   │
          ▼                   ▼                   ▼
   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │   Shard 1   │     │   Shard 2   │     │   Shard 3   │
   │  Users A-H  │     │  Users I-P  │     │  Users Q-Z  │
   │             │     │             │     │             │
   │ Same schema │     │ Same schema │     │ Same schema │
   │ Unique data │     │ Unique data │     │ Unique data │
   └─────────────┘     └─────────────┘     └─────────────┘

Sharding Key: user_id (first letter)

When choosing a sharding key, one of the most important criteria is to choose a key that can evenly distribute data.

Challenges:

  • Resharding data: when a single shard can no longer hold more data due to rapid growth, we need to reshard and move data around. Consistent hashing is a commonly used technique to limit how much data has to move.
  • Celebrity problem: also called the hotspot key problem. Excessive access to a specific shard can overload its server; we may need to allocate a dedicated shard for each celebrity, and each shard might even require further partitioning.
  • Join and denormalization: it is hard to perform join operations across database shards. Denormalization is a common workaround so that queries can be performed against a single table.
  • Cross-shard transactions: Distributed transactions are complex; consider saga pattern or eventual consistency
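Consistent hashing, mentioned under resharding, can be sketched as a hash ring with virtual nodes. This is one common formulation and not a specific product's implementation; `ConsistentHashRing`, `virtual_nodes`, and the shard names are assumptions made for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, shards, virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._ring = []  # sorted list of (hash, shard) points on the ring
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add_shard(self, shard):
        # Each shard owns many points so keys spread more evenly.
        for i in range(self.virtual_nodes):
            bisect.insort(self._ring, (self._hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        self._ring = [point for point in self._ring if point[1] != shard]

    def shard_for(self, key):
        if not self._ring:
            raise LookupError("ring has no shards")
        # First point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a shard is added, only the keys that fall just before its new points move to it; every other key stays on its current shard, which is what keeps resharding cheap compared with `hash(key) % shard_count`.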

Improve Load/Response Time

Cache:

  • Stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly
  • Benefits: better system performance; ability to reduce database workloads; ability to scale the cache tier independently

Caching strategies:

Cache-Aside (Lazy Loading)               Write-Through
══════════════════════════               ══════════════
┌─────┐  1.Read  ┌───────┐               ┌─────┐ 1.Write ┌───────┐
│ App │─────────►│ Cache │               │ App │────────►│ Cache │
└──┬──┘          └───┬───┘               └─────┘         └───┬───┘
   │                 │                                       │
   │             2.Miss?                                     │ 2.Write
   │                 │                                       │
   ▼                 ▼                                       ▼
┌──────┐  ◄──── 3.Load from DB                          ┌──────────┐
│  DB  │        4.Update cache                          │    DB    │
└──────┘                                                └──────────┘

Write-Behind (Write-Back)                Read-Through
═════════════════════════                ════════════
┌─────┐ 1.Write ┌───────┐                ┌─────┐  Read  ┌───────┐
│ App │────────►│ Cache │                │ App │───────►│ Cache │
└─────┘         └───┬───┘                └─────┘        └───┬───┘
                    │                                       │
                 2.Async                               Cache loads
               Queue Write                           from DB on miss
                    │                                       │
                    ▼                                       ▼
               ┌──────────┐                            ┌──────────┐
               │    DB    │                            │    DB    │
               └──────────┘                            └──────────┘
  • Cache-aside (Lazy Loading): Application checks cache first; on miss, loads from DB and populates cache
  • Write-through: Writes go to cache and database simultaneously; ensures consistency
  • Write-behind (Write-back): Writes to cache first, then asynchronously to database; higher performance but risk of data loss
  • Read-through: Cache sits between application and database; automatically loads on miss
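As an illustration of the cache-aside strategy, the sketch below checks an in-memory dictionary first and falls back to a loader on a miss. `CacheAside`, `load_from_db`, and `ttl_seconds` are hypothetical names; in practice the dictionary would typically be replaced by a cache server such as Redis or Memcached.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db  # called only on a miss
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # cache hit
        # Miss or expired: load from the database and populate the cache.
        value = self.load_from_db(key)
        self._entries[key] = (value, self.clock() + self.ttl_seconds)
        return value

    def invalidate(self, key):
        """Call after writing to the database so the next read reloads."""
        self._entries.pop(key, None)
```

The application owns both the read path and the invalidation, which is the defining trait of cache-aside compared with read-through, where the cache itself loads from the database.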

Considerations:

  • Decide when to use cache: for data that is read frequently but modified infrequently
  • Consistency: keep the data in database and cache in sync
  • Mitigating failures: multiple cache servers across different data centers are recommended to avoid a single point of failure
  • Expiration policy: add a TTL to cached data (too short a TTL causes the system to reload data from the database too frequently; too long a TTL serves stale data)
  • Eviction policy: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First In First Out)
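Of the eviction policies above, LRU is simple to sketch with the standard library. The `LRUCache` class below is a hypothetical example for illustration and is not thread-safe.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
```

Reading a key counts as a use, so frequently read entries survive while untouched ones are evicted first once the cache is full.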

Cache stampede (Thundering Herd): When a popular cache key expires, many requests simultaneously hit the database.

  • Solutions:
    • Locking: Only one request fetches from DB while others wait
    • Early expiration: Proactively refresh before TTL expires
    • Probabilistic early expiration: Randomly refresh before expiration
    • Never expire + background refresh
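The locking solution can be sketched with one lock per key, so that concurrent misses on the same key trigger a single database load. This is a single-process illustration with hypothetical names (`SingleFlightCache`, `load_from_db`); it omits TTLs, and across multiple servers the lock would need to live in a shared store.

```python
import threading


class SingleFlightCache:
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the _key_locks dictionary

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._values:
            return self._values[key]  # fast path: already cached
        with self._lock_for(key):
            # Re-check: another thread may have filled the entry while we waited.
            if key not in self._values:
                self._values[key] = self.load_from_db(key)
            return self._values[key]
```

Requests for other keys are not blocked, because each key has its own lock.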

Distributed Cache:

  • Redis Cluster, Memcached for horizontal scaling
  • Consider: data partitioning, replication, failover strategies

CDN (Content Delivery Network):

                        Origin Server
                        (Your Server)
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
   ┌───────────┐        ┌───────────┐        ┌───────────┐
   │ CDN Edge  │        │ CDN Edge  │        │ CDN Edge  │
   │  (Asia)   │        │ (Europe)  │        │(Americas) │
   └─────┬─────┘        └─────┬─────┘        └─────┬─────┘
         │                    │                    │
  ┌──────┴──────┐      ┌──────┴──────┐      ┌──────┴──────┐
  ▼             ▼      ▼             ▼      ▼             ▼
[User]        [User] [User]        [User] [User]        [User]
Tokyo      Singapore London        Paris   NYC        São Paulo

Static content (images, CSS, JS) served from nearest edge location
  • A network of geographically dispersed servers used to deliver static content like images, videos, CSS, JavaScript files etc.
  • Considerations:
    • Cost: CDN providers charge for data transfers in and out, so only cache frequently used assets.
    • Setting an appropriate cache expiry: if too long, content might no longer be fresh; if too short, it can cause repeat reloading of content from origin servers to the CDN.
    • Fallback: if CDN is temporarily down, clients should be able to detect the problem and request resources from the origin.
    • Invalidating files: expire assets by either invalidating the CDN object by provider API or using object versions.
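Object versioning is often done by embedding a content hash in the asset name, so a changed file gets a new URL and the old cached copy simply stops being requested. The helper below is a hypothetical sketch of that idea; the 8-character digest length and the naming scheme are assumptions, not a CDN requirement.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_asset_path(path, content):
    """Returns the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes only when the bytes change, versioned assets can be served with a long cache expiry without risking stale content.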

Data Centers

To improve availability and provide a better user experience across wider geographical areas, supporting multiple data centers is crucial.

                                ┌─────────────┐
                                │   GeoDNS    │
                                │   Router    │
                                └──────┬──────┘
          ┌────────────────────────────┼────────────────────────────┐
          │                            │                            │
          ▼                            ▼                            ▼
┌────────────────────┐       ┌────────────────────┐       ┌────────────────────┐
│   Data Center US   │       │   Data Center EU   │       │  Data Center APAC  │
│  ┌────┐  ┌─────┐   │       │  ┌────┐  ┌─────┐   │       │  ┌────┐  ┌─────┐   │
│  │ LB │  │ DB  │   │       │  │ LB │  │ DB  │   │       │  │ LB │  │ DB  │   │
│  └────┘  └─────┘   │       │  └────┘  └─────┘   │       │  └────┘  └─────┘   │
│  ┌────┐  ┌─────┐   │       │  ┌────┐  ┌─────┐   │       │  ┌────┐  ┌─────┐   │
│  │Apps│  │Cache│   │       │  │Apps│  │Cache│   │       │  │Apps│  │Cache│   │
│  └────┘  └─────┘   │       │  └────┘  └─────┘   │       │  └────┘  └─────┘   │
└─────────┬──────────┘       └─────────┬──────────┘       └─────────┬──────────┘
          │                            │                            │
          └────────────────────────────┴────────────────────────────┘
                               Data Replication

GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user.

In the event of any significant data center outage, we direct all traffic to a healthy data center.

Multi data center considerations:

  • Traffic redirection: GeoDNS can be used to direct traffic to the nearest data center based on user’s location.
  • Data synchronization: In failover cases, traffic might be routed to a data center where data is unavailable. A common strategy is to replicate data across multiple data centers.
  • Test and deployment: Use automated deployment tools to keep services consistent through all the data centers.

Consistency trade-offs:

  • Strong consistency: All reads return the latest write; higher latency
  • Eventual consistency: Updates propagate asynchronously; lower latency but stale reads possible
  • Conflict resolution: Last-writer-wins, vector clocks, CRDTs

Service Discovery

In dynamic environments, services need to discover each other automatically.

Client-Side Discovery                    Server-Side Discovery
═════════════════════                    ═════════════════════
    ┌────────┐                               ┌────────┐
    │ Client │                               │ Client │
    └───┬────┘                               └───┬────┘
        │                                        │
        │ 1.Query                                │ 1.Request
        ▼                                        ▼
 ┌──────────────┐                         ┌──────────────┐
 │   Service    │                         │Load Balancer │
 │   Registry   │                         │   / Router   │
 └──────────────┘                         └──────┬───────┘
        │                                        │
        │ 2.Return                               │ 2.Query
        │   instances                            ▼
        ▼                                 ┌──────────────┐
    ┌────────┐                            │   Service    │
    │ Client │──► 3.Direct call           │   Registry   │
    └────────┘    to instance             └──────┬───────┘
                                                 │ 3.Return
                                                 ▼
                                          ┌──────────────┐
                                          │Load Balancer │──► 4.Route
                                          └──────────────┘    to instance

Client-side discovery: Client queries a service registry and chooses an instance

  • Examples: Netflix Eureka, Consul

Server-side discovery: Client requests go through a load balancer that queries the registry

  • Examples: Kubernetes Service, AWS ELB

Service registry: Database of available service instances and their locations

  • Must be highly available
  • Health checks to remove unhealthy instances
  • Examples: Consul, etcd, Zookeeper, Kubernetes DNS
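A registry with heartbeat-based health tracking, plus client-side instance selection, can be sketched as below. This is an in-memory illustration that does not reflect the APIs of Consul, etcd, or Eureka; `ServiceRegistry`, `heartbeat_ttl`, and `choose_instance` are hypothetical names.

```python
import random
import time


class ServiceRegistry:
    def __init__(self, heartbeat_ttl=15.0, clock=time.monotonic):
        self.heartbeat_ttl = heartbeat_ttl
        self.clock = clock
        self._instances = {}  # service name -> {address: last heartbeat time}

    def heartbeat(self, service, address):
        """Registers an instance or refreshes its liveness timestamp."""
        self._instances.setdefault(service, {})[address] = self.clock()

    def deregister(self, service, address):
        self._instances.get(service, {}).pop(address, None)

    def healthy_instances(self, service):
        # Instances that missed their heartbeat window are treated as unhealthy.
        cutoff = self.clock() - self.heartbeat_ttl
        seen_by_address = self._instances.get(service, {})
        return sorted(addr for addr, seen in seen_by_address.items() if seen >= cutoff)


def choose_instance(registry, service, rng=random):
    """Client-side discovery: query the registry, then pick an instance."""
    instances = registry.healthy_instances(service)
    if not instances:
        raise LookupError(f"no healthy instances of {service!r}")
    return rng.choice(instances)
```

In server-side discovery the same lookup would run inside the load balancer or router, so clients would never talk to the registry directly.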

Container Orchestration

Automates deployment, scaling, and management of containerized applications.

┌────────────────────────────────────────────────────────────────────┐
│                         Kubernetes Cluster                         │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                        Control Plane                         │  │
│  │  ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌───────────────┐  │  │
│  │  │API Server│ │ Scheduler │ │Controller │ │     etcd      │  │  │
│  │  └──────────┘ └───────────┘ └───────────┘ └───────────────┘  │  │
│  └──────────────────────────────────────────────────────────────┘  │
│                                                                    │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐   │
│  │   Worker Node   │   │   Worker Node   │   │   Worker Node   │   │
│  │  ┌───┐  ┌───┐   │   │  ┌───┐  ┌───┐   │   │  ┌───┐  ┌───┐   │   │
│  │  │Pod│  │Pod│   │   │  │Pod│  │Pod│   │   │  │Pod│  │Pod│   │   │
│  │  └───┘  └───┘   │   │  └───┘  └───┘   │   │  └───┘  └───┘   │   │
│  │  ┌───┐  ┌───┐   │   │  ┌───┐          │   │  ┌───┐  ┌───┐   │   │
│  │  │Pod│  │Pod│   │   │  │Pod│          │   │  │Pod│  │Pod│   │   │
│  │  └───┘  └───┘   │   │  └───┘          │   │  └───┘  └───┘   │   │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘   │
└────────────────────────────────────────────────────────────────────┘
                    │                             │
              Auto-scaling                  Self-healing
             (HPA adds pods)            (Restart failed pods)

Key capabilities:

  • Auto-scaling: Horizontal Pod Autoscaler (HPA) based on CPU/memory/custom metrics
  • Self-healing: Automatic restart of failed containers
  • Service discovery & load balancing: Built-in DNS and load balancing
  • Rolling updates & rollbacks: Zero-downtime deployments
  • Resource management: CPU/memory limits and requests

Scaling strategies:

  • Horizontal scaling: Add more pod replicas
  • Vertical scaling: Increase resources per pod
  • Cluster autoscaling: Add/remove nodes based on demand
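The core of horizontal autoscaling is a proportional rule: scale the replica count by the ratio of the observed metric to its target. The sketch below follows the formula described in the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric), in simplified form; the function name, the default bounds, and the 10% tolerance default are assumptions for illustration, and the real controller applies further stabilization rules.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling rule, clamped to the configured replica bounds."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for 6 pods.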

Popular platforms: Kubernetes, Docker Swarm, Amazon ECS, Google Cloud Run

Message Queue

A message queue is a durable component that supports asynchronous communication.

┌──────────────┐                                       ┌──────────────┐
│   Producer   │                                       │   Consumer   │
│ (Service A)  │                                       │ (Service B)  │
└──────┬───────┘                                       └──────▲───────┘
       │                                                      │
       │ 1.Publish                                  4.Process │
       ▼                                                      │
┌─────────────────────────────────────────────────────────────────────┐
│                            Message Queue                            │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  [Msg1] [Msg2] [Msg3] [Msg4] [Msg5] ───────────────────────►  │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Features:                                                          │
│  • Durability (messages persisted)                                  │
│  • Ordering (FIFO or partitioned)                                   │
│  • Acknowledgment (at-least-once/exactly-once)                      │
│                                                                     │
│  Dead Letter Queue (DLQ):                                           │
│  ┌─────────────────────────┐                                        │
│  │  [Failed1] [Failed2]    │ ◄── Messages that couldn't be          │
│  └─────────────────────────┘     processed after retries            │
└─────────────────────────────────────────────────────────────────────┘

Decoupling makes the message queue a preferred architecture for building a scalable and reliable application.

Delivery semantics:

  • At-most-once: Messages may be lost but never redelivered
  • At-least-once: Messages are never lost but may be redelivered; consumers must be idempotent
  • Exactly-once: Each message is delivered exactly once; most difficult to achieve

Dead Letter Queue (DLQ): Stores messages that could not be processed

  • Allows inspection and debugging of failed messages
  • Prevents poison messages from blocking the queue
  • Configure retry policies before moving to DLQ

Backpressure handling:

  • Rate limiting on producers
  • Queue size limits with rejection policies
  • Consumer scaling based on queue depth

Best practices:

  • Idempotent consumers for at-least-once delivery
  • Message ordering considerations (partitioning by key)
  • Poison message handling with retry limits

Popular options: Apache Kafka, RabbitMQ, Amazon SQS, Redis Streams
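Idempotent consumption, retry limits, and a dead letter queue fit together as sketched below. The example uses an in-memory `deque` in place of a real broker; the `consume` function, the message dictionary shape (`id`, `body`, `attempts`), and `max_attempts` are assumptions for illustration and do not correspond to the Kafka, RabbitMQ, or SQS client APIs.

```python
from collections import deque


def consume(queue, handler, dead_letters, processed_ids, max_attempts=3):
    """Drains the queue with at-least-once semantics and a retry limit."""
    while queue:
        message = queue.popleft()
        if message["id"] in processed_ids:
            continue  # duplicate delivery: already handled, safe to skip
        try:
            handler(message["body"])
        except Exception:
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= max_attempts:
                dead_letters.append(message)  # give up: park for inspection
            else:
                queue.append(message)  # retry later, behind other messages
            continue
        processed_ids.add(message["id"])  # record success for deduplication
```

Tracking processed IDs is what makes redelivery harmless, and the attempt counter keeps a poison message from being retried forever. A production consumer would persist the processed IDs rather than hold them in memory.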

Observability: Logging, Metrics, Tracing

┌─────────────────────────────────────────────────────────────────────┐
│                         Observability Stack                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│       ┌────────────┐      ┌────────────┐      ┌────────────┐        │
│       │  Logging   │      │  Metrics   │      │  Tracing   │        │
│       │            │      │            │      │            │        │
│       │ • ELK Stack│      │ •Prometheus│      │ • Jaeger   │        │
│       │ • Splunk   │      │ • Grafana  │      │ • Zipkin   │        │
│       │ • Datadog  │      │ • Datadog  │      │ •OpenTelem │        │
│       └─────┬──────┘      └─────┬──────┘      └─────┬──────┘        │
│             │                   │                   │               │
│             └───────────────────┼───────────────────┘               │
│                                 │                                   │
│                          ┌──────▼──────┐                            │
│                          │  Alerting   │                            │
│                          │ PagerDuty,  │                            │
│                          │  OpsGenie   │                            │
│                          └─────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘

Distributed Trace Example:
═══════════════════════════
Request ──► API Gateway ──► User Service ──► Database
   │             │               │              │
   │  [Span 1]   │   [Span 2]    │   [Span 3]   │
   │◄────────────┴───────────────┴──────────────┤
   │            Total Trace Duration            │

Logging:

  • Monitoring error logs helps identify errors and problems in the system
  • Centralized logging: Aggregate logs from all services (ELK Stack, Splunk, Datadog)
  • Structured logging: JSON format for easier parsing and querying
  • Log levels: DEBUG, INFO, WARN, ERROR; configure appropriately per environment
  • Correlation IDs: Track requests across services
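Structured logging with a correlation ID can be sketched using only the standard `logging` module. `JsonFormatter`, `build_logger`, `new_correlation_id`, and the field names in the JSON output are hypothetical choices for this example; teams often use a dedicated library instead.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Renders each log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via logger.info(..., extra={"correlation_id": ...}).
            "correlation_id": getattr(record, "correlation_id", None),
        })


def new_correlation_id():
    # Generated at the edge (for example, the API gateway) and passed downstream.
    return uuid.uuid4().hex


def build_logger(name, stream=None):
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A service would call `logger.info("order created", extra={"correlation_id": cid})` with the ID it received from the caller, so that a centralized log store can group every line belonging to one request.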

Metrics:

  • Collecting different types of metrics helps gain business insights and understand the health status of the system
  • Types: counters, gauges, histograms, summaries
  • Key metrics: latency (p50, p95, p99), error rate, throughput, saturation
  • RED method: Rate, Errors, Duration
  • USE method: Utilization, Saturation, Errors
  • Tools: Prometheus, Grafana, Datadog, CloudWatch
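The latency percentiles listed above can be computed from raw samples with the nearest-rank method, sketched below. This is one of several percentile definitions, and monitoring tools such as Prometheus usually estimate percentiles from histogram buckets rather than from raw samples; the function names are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Returns the p50, p95, and p99 latencies for a list of measurements."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Tail percentiles such as p99 matter because an average can look healthy while a small share of requests is very slow.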

Distributed Tracing:

  • Track requests as they flow through multiple services
  • Identify bottlenecks and latency issues
  • Tools: Jaeger, Zipkin, OpenTelemetry, AWS X-Ray

Alerting:

  • Set up alerts for critical thresholds
  • Avoid alert fatigue with proper thresholds and grouping
  • Runbooks for common issues

Automation: CI (Continuous Integration) and CD (Continuous Deployment).

  • Automated testing at every stage
  • Infrastructure as Code (Terraform, Pulumi)
  • GitOps for declarative deployments

Summary Cheat Sheet

Application Scaling Journey
═══════════════════════════════════════════════════════════════════

Single Server          Separate Concerns          Scale Web Tier
┌───────────┐          ┌─────┐  ┌────┐               ┌─────┐
│ Web + DB  │   ──►    │ Web │  │ DB │      ──►      │ LB  │
└───────────┘          └─────┘  └────┘               └──┬──┘
                                                    ┌───┼───┐
                                                   [W] [W] [W]

Add Cache              Add CDN                    Multiple DCs
┌───────┐              ┌───────┐                  ┌────┐  ┌────┐
│ Redis │              │  CDN  │                  │ DC1│  │ DC2│
└───────┘              └───────┘                  └────┘  └────┘
                                                      (GeoDNS)

Scale Database         Message Queue              Microservices
  ┌────┐               ┌─────────┐                ┌───┐  ┌───┐
  │ RW │               │  Kafka  │                │Svc│  │Svc│
  └─┬──┘               └─────────┘                └───┘  └───┘
 ┌──┴──┐
[R]   [R]
Stage                    Key Actions
───────────────────────  ───────────────────────────────────────────────────────
Separate concerns        Split web server from database
Scale web tier           Add load balancer, horizontal scaling, stateless design
Add API Gateway          Centralized routing, auth, rate limiting
Protect services         Rate limiting, circuit breakers
Scale database           Read replicas, connection pooling, sharding
Improve latency          Caching (Redis), CDN for static assets
Go multi-region          Data centers, GeoDNS, data replication
Dynamic infrastructure   Service discovery, container orchestration
Decouple services        Message queues with proper delivery semantics
Observe everything       Logging, metrics, tracing, alerting