Scale from single server to distributed system
Separate Web Server from Database Server
This allows the web server and the database to be scaled independently.
┌─────────────┐ ┌───────────────┐
│ Web Server │◄──────►│ Database │
│ (Nginx) │ │ (PostgreSQL) │
└─────────────┘ └───────────────┘
▲
│ HTTP
│
┌─────────────┐
│ Clients │
└─────────────┘
Non-relational databases might be the right choice if:
- your application requires super-low latency
- your data are unstructured, and you do not have any relational data
- you only need to serialize and deserialize data (JSON, YAML, XML etc.)
- you need to store a massive amount of data
Scale the Web Servers
Vertical scaling (scale up): adding more power (CPU, RAM etc.) to your servers
- there are hard limits
- there is no failover and redundancy (single point of failure)
- the overall cost of vertical scaling is high
Horizontal scaling (scale out): adding more servers into your pool of resources
┌───────────────┐
│ Load Balancer │
└───────┬───────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ Web App │ │ Web App │ │ Web App │
│ Server 1 │ │ Server 2 │ │ Server 3 │
└───────────┘ └───────────┘ └───────────┘
Stateful:
- A stateful server remembers client data (state) from one request to the next
- Every request from the same client must be routed to the same server; this can be done with sticky sessions in most load balancers
- But adding or removing servers is much more difficult; it is also challenging to handle server failures.
Stateless:
- A stateless server keeps no state information
- A stateless system is simpler, more robust and scalable.
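A minimal sketch of the stateless approach, assuming a Redis instance acts as the shared session store (the redis-py client, host name, and key format are illustrative assumptions):
```python
import json
import redis  # assumption: redis-py client installed

# Any web server behind the load balancer can read/write session state here,
# so requests no longer need to stick to one server.
session_store = redis.Redis(host="session-store.internal", port=6379)

def save_session(session_id: str, data: dict, ttl_seconds: int = 1800) -> None:
    # Keep state outside the web server, with a TTL so stale sessions expire.
    session_store.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id: str) -> dict:
    raw = session_store.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}
```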
Load balancer: distributes incoming traffic among web servers that are defined in a load-balanced set.
┌─────────────────────────────────────┐
│ Load Balancer │
│ ┌─────────────────────────────┐ │
Incoming ──────────────┼─►│ Health Checks + Algorithms │───┼───► Server Pool
Requests │ │ (RR, Least Conn, IP Hash) │ │
│ └─────────────────────────────┘ │
└─────────────────────────────────────┘
Layer 4 (Transport) │ Layer 7 (Application)
─────────────────────────┼───────────────────────────────
Routes by: IP + Port │ Routes by: HTTP headers,
Faster, less flexible │ cookies, URL paths
│ More intelligent routing
- Layer 4 (Transport Layer): Routes based on IP address and TCP/UDP port; faster but less flexible
- Layer 7 (Application Layer): Routes based on HTTP headers, cookies, URL paths; more intelligent routing
- Load balancing algorithms:
- Round Robin
- Weighted Round Robin
- Least Connections
- Weighted Least Connections
- Resource-based/Adaptive
- IP Hash (for session affinity)
- Health checks: Regularly probe backend servers to detect failures and route traffic only to healthy instances
- Session affinity trade-offs: Sticky sessions simplify stateful apps but reduce load distribution efficiency and complicate failover
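A rough sketch of two of the algorithms above (round robin and least connections) combined with a health-check filter; the server names and the way health status is updated are illustrative assumptions, not a real load balancer:
```python
import itertools

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers                    # e.g. ["web1:8080", "web2:8080", "web3:8080"]
        self.healthy = set(servers)               # maintained by periodic health checks
        self.active_connections = {s: 0 for s in servers}
        self._cycle = itertools.cycle(servers)

    def mark_unhealthy(self, server):
        self.healthy.discard(server)              # failed health check: stop routing here

    def round_robin(self):
        # Walk the fixed rotation, skipping servers that failed health checks.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

    def least_connections(self):
        # Route to the healthy server currently handling the fewest connections.
        return min(self.healthy, key=lambda s: self.active_connections[s])

lb = LoadBalancer(["web1:8080", "web2:8080", "web3:8080"])
print(lb.round_robin())
```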
API Gateway
A centralized entry point for all client requests that provides:
┌─────────┐ ┌─────────────────────────────────────────┐
│ Mobile │────►│ │ ┌──────────────┐
└─────────┘ │ API Gateway │────►│ User Service │
│ ┌────────────────────────────────────┐ │ └──────────────┘
┌─────────┐ │ │ • Authentication • Rate Limit │ │
│ Web │────►│ │ • Routing • Caching │ │────►┌──────────────┐
└─────────┘ │ │ • Load Balancing • Logging │ │ │Order Service │
│ └────────────────────────────────────┘ │ └──────────────┘
┌─────────┐ │ │
│ Partner │────►│ │────►┌────────────────┐
└─────────┘ └─────────────────────────────────────────┘ │Product Service │
└────────────────┘
- Request routing: Routes requests to appropriate backend services
- Authentication & Authorization: Centralized security enforcement (JWT validation, API keys, OAuth)
- Rate limiting: Protects services from abuse and ensures fair usage
- Request/Response transformation: Protocol translation, payload modification
- Caching: Response caching to reduce backend load
- Load balancing: Distributes traffic across service instances
- Logging & Monitoring: Centralized request logging and metrics collection
Popular options: Kong, AWS API Gateway, Nginx, Envoy, Spring Cloud Gateway
Rate Limiting & Throttling
Protects your services from being overwhelmed by too many requests.
Token Bucket Algorithm:
═══════════════════════════════════════════════════════════
Tokens added ┌──────────────────────┐
at fixed rate ──────►│ Token Bucket │
│ [●][●][●][●][ ] │ ◄── Max capacity
└──────────┬───────────┘
│
Request arrives ─────────────────┼─────────────────────
▼
┌───── Has Token? ─────┐
│ │
YES NO
│ │
▼ ▼
┌──────────┐ ┌───────────┐
│ Process │ │ Reject │
│ Request │ │ (429) │
└──────────┘ └───────────┘
Common algorithms (see the token bucket sketch below):
- Token Bucket: Tokens are added at a fixed rate; requests consume tokens
- Leaky Bucket: Requests are processed at a constant rate; excess requests queue or drop
- Fixed Window: Limits requests within fixed time intervals
- Sliding Window Log: Tracks timestamps of requests; more accurate but memory-intensive
- Sliding Window Counter: Hybrid approach balancing accuracy and efficiency
Implementation levels:
- Per-user/API key rate limiting
- Per-IP rate limiting
- Global rate limiting
- Distributed rate limiting (using Redis or similar)
Response handling: Return HTTP 429 (Too Many Requests) with Retry-After header
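A minimal in-process version of the token bucket above; the capacity and refill rate are example values, and a distributed setup would typically keep the counters in something like Redis instead of local memory:
```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity            # maximum tokens the bucket can hold
        self.refill_rate = refill_rate      # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                     # process the request
        return False                        # reject: respond with 429 + Retry-After

# Example: allow bursts of 10 requests, refilled at 5 requests/second per API key.
limiter = TokenBucket(capacity=10, refill_rate=5)
```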
Circuit Breaker Pattern
Prevents cascading failures when a downstream service is unhealthy.
┌─────────────────────────────────────────────┐
│ Circuit Breaker States │
└─────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ │
▼ │
┌─────────┐ Failures exceed ┌─────────┐ │
│ CLOSED │ ──────threshold──────────► │ OPEN │ │
│ │ │ │ │
│ Normal │ │ Fail │ │
│Operation│ ◄────Success────────┐ │ Fast │ │
└─────────┘ │ └────┬────┘ │
▲ │ │ │
│ │ Timeout │
│ │ expires │
│ ┌────┴────┐ │ │
│ │HALF-OPEN│◄─────┘ │
│ │ │ │
│ │ Test │──────Failures────────────┘
└─────────────────────│Requests │
Success └─────────┘
States:
- Closed: Normal operation; requests pass through
- Open: Service is failing; requests fail immediately without calling the service
- Half-Open: After a timeout, allows limited requests to test if service has recovered
Configuration parameters:
- Failure threshold: Number of failures before opening the circuit
- Success threshold: Number of successes in half-open state before closing
- Timeout: Duration to wait before transitioning from open to half-open
Benefits:
- Fail fast: Reduces latency when downstream is unhealthy
- Prevents resource exhaustion from waiting on failing services
- Gives downstream services time to recover
Popular implementations: Resilience4j, Hystrix (deprecated), Polly (.NET)
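A simplified sketch of the state machine above; the thresholds and timeout are example parameters, and production libraries such as Resilience4j add sliding failure windows, half-open request limits, and metrics on top of this idea:
```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, success_threshold=2, open_timeout=30.0):
        self.failure_threshold = failure_threshold   # failures before opening
        self.success_threshold = success_threshold   # successes in half-open before closing
        self.open_timeout = open_timeout             # seconds to stay open before testing
        self.state, self.failures, self.successes = "CLOSED", 0, 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at < self.open_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state, self.successes = "HALF_OPEN", 0   # timeout expired: test the service
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.failures = "OPEN", 0
            self.opened_at = time.monotonic()

    def _record_success(self):
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"                     # downstream has recovered
        self.failures = 0
```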
Scale the Database
Database Replication: one writer and many readers (replicas)
┌──────────────────┐
Write ───────►│ Primary (RW) │
│ (Writer) │
└────────┬─────────┘
│
┌───────────────────┼───────────────────┐
│ Replication │ │ Replication
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Replica 1 │ │ Replica 2 │ │ Replica 3 │
│ (Read) │ │ (Read) │ │ (Read) │
└─────────────┘ └─────────────┘ └─────────────┘
▲ ▲ ▲
│ │ │
└───────────────────┴───────────────────┘
Read Queries
- The writer node supports only write operations; the reader nodes support only read operations
- If a reader node goes offline, read operations are redirected to other healthy reader nodes, and a new reader node spins up to replace the old one.
- If the writer node goes offline, a reader node is promoted to be the new writer; all write operations are redirected to the new writer node, and a new reader node immediately replaces the old one for data replication.
- Replication lag: Read replicas may be slightly behind the primary; design for eventual consistency or use synchronous replication for critical reads
- Advantages:
- Better performance: all read operations are distributed across reader nodes.
- Reliability: data is replicated across multiple locations
- High availability: data is replicated across multiple servers
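A tiny routing helper that captures the read/write split described above; the host names are illustrative, and because of replication lag a read routed to a replica may not yet see the latest write:
```python
import random

PRIMARY = "db-primary.internal"                 # accepts writes
REPLICAS = ["db-replica-1.internal",            # read-only copies
            "db-replica-2.internal",
            "db-replica-3.internal"]

def pick_host(is_write: bool, healthy_replicas=None) -> str:
    """Send writes to the primary and spread reads across healthy replicas."""
    if is_write:
        return PRIMARY
    candidates = healthy_replicas if healthy_replicas is not None else REPLICAS
    # Fall back to the primary if every replica is offline.
    return random.choice(candidates) if candidates else PRIMARY

print(pick_host(is_write=True))    # db-primary.internal
print(pick_host(is_write=False))   # one of the replicas
```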
Connection Pooling: Maintains a pool of reusable database connections
┌─────────────────┐ ┌────────────────────────┐ ┌──────────┐
│ Application │ │ Connection Pool │ │ Database │
│ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ │
│ Request 1 ─────┼─────►│ │C1│ │C2│ │C3│ │C4│ │◄────►│ │
│ Request 2 ─────┼─────►│ └──┘ └──┘ └──┘ └──┘ │ │ │
│ Request 3 ─────┼─────►│ (Reusable Conns) │ │ │
│ │ └────────────────────────┘ └──────────┘
└─────────────────┘
Without pool: New connection per request (slow)
With pool: Reuse existing connections (fast)
- Reduces overhead of establishing new connections
- Prevents connection exhaustion under high load
- Popular tools: HikariCP (Java), PgBouncer (PostgreSQL), ProxySQL (MySQL)
- Key configurations: minimum/maximum pool size, connection timeout, idle timeout
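As an example, SQLAlchemy's built-in pool exposes those knobs directly (the DSN and the numbers below are illustrative; HikariCP and PgBouncer have equivalent settings):
```python
from sqlalchemy import create_engine, text

# Size the pool against the database's max_connections and the number of
# application instances that share it.
engine = create_engine(
    "postgresql+psycopg2://app:secret@db-primary.internal/appdb",  # illustrative DSN
    pool_size=10,        # connections kept open per application instance
    max_overflow=20,     # extra connections allowed under burst load
    pool_timeout=30,     # seconds to wait for a free connection before failing
    pool_recycle=1800,   # recycle connections older than 30 minutes
    pool_pre_ping=True,  # validate a connection before handing it out
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))   # connection returns to the pool on exit
```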
Query Optimization:
- Proper indexing strategy (B-tree, hash, composite indexes)
- Table partitioning for large tables
- Query analysis with EXPLAIN/EXPLAIN ANALYZE
- Avoid N+1 query problems
Horizontal Scaling (sharding): Sharding separates large databases into smaller, more easily managed parts called shards. Each shard shares the same schema, though the actual data on each shard is unique to the shard.
┌────────────────┐
│ Shard Router │
└───────┬────────┘
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ Users A-H │ │ Users I-P │ │ Users Q-Z │
│ │ │ │ │ │
│ Same schema │ │ Same schema │ │ Same schema │
│ Unique data │ │ Unique data │ │ Unique data │
└─────────────┘ └─────────────┘ └─────────────┘
Sharding Key: user_id (first letter)
When choosing a sharding key, the most important criterion is to choose a key that distributes data evenly. Challenges:
- Resharding data: when a single shard can no longer hold more data due to rapid growth, we need to reshard and move data around. Consistent hashing is a commonly used technique (see the sketch after this list).
- Celebrity problem: also called the hotspot key problem; excessive access to a specific shard could overload that server. We may need to allocate a dedicated shard for each celebrity, and each such shard might even require further partitioning.
- Join and denormalization: it is hard to perform join operations across database shards. Denormalization is a common workaround so that queries can be served from a single table.
- Cross-shard transactions: Distributed transactions are complex; consider saga pattern or eventual consistency
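A bare-bones consistent-hash ring, as mentioned under resharding; the virtual-node count and the use of MD5 are arbitrary choices for the sketch:
```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Place each shard on the ring many times (virtual nodes) so that keys
        # redistribute evenly when a shard is added or removed.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("user:42"))   # the same key always maps to the same shard
```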
Improve Load/Response Time
Cache:
- Stores the results of expensive operations or frequently accessed data in memory so that subsequent requests are served more quickly
- Benefits: better system performance; ability to reduce database workloads; ability to scale the cache tier independently
Caching strategies:
Cache-Aside (Lazy Loading) Write-Through
══════════════════════════ ══════════════
┌─────┐ 1.Read ┌───────┐ ┌─────┐ 1.Write ┌───────┐
│ App │─────────►│ Cache │ │ App │────────►│ Cache │
└──┬──┘ └───┬───┘ └─────┘ └───┬───┘
│ │ │
│ 2.Miss? │ 2.Write
│ │ │
▼ ▼ ▼
┌──────┐ ◄──── 3.Load from DB ┌──────────┐
│ DB │ 4.Update cache │ DB │
└──────┘ └──────────┘
Write-Behind (Write-Back) Read-Through
═════════════════════════ ════════════
┌─────┐ 1.Write ┌───────┐ ┌─────┐ Read ┌───────┐
│ App │────────►│ Cache │ │ App │───────►│ Cache │
└─────┘ └───┬───┘ └─────┘ └───┬───┘
│ │
2.Async Cache loads
Queue Write from DB on miss
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ DB │ │ DB │
└──────────┘ └──────────┘
- Cache-aside (Lazy Loading): Application checks cache first; on miss, loads from DB and populates cache (see the sketch after this list)
- Write-through: Writes go to cache and database simultaneously; ensures consistency
- Write-behind (Write-back): Writes to cache first, then asynchronously to database; higher performance but risk of data loss
- Read-through: Cache sits between application and database; automatically loads on miss
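A minimal cache-aside read path, assuming a Redis cache in front of the database; the key format, the 5-minute TTL, and the query_db helper are illustrative:
```python
import json
import redis  # assumption: redis-py client installed

cache = redis.Redis(host="cache.internal", port=6379)

def query_db(user_id: int) -> dict:
    # Placeholder for the real (expensive) database lookup.
    return {"id": user_id, "name": "example"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)               # 1. cache hit
    user = query_db(user_id)                    # 2. miss: load from the database
    cache.setex(key, 300, json.dumps(user))     # 3. populate the cache with a TTL
    return user
```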
Considerations:
- Decide when to use cache: for data that is read frequently but modified infrequently
- Consistency: keep the data in the database and the cache in sync
- Mitigating failures: using multiple cache servers across different data centers is recommended to avoid a single point of failure
- Expiration policy: add a TTL to cached data (too short a TTL causes the system to reload data from the database too frequently; too long a TTL serves stale data)
- Eviction policy: LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First In First Out)
Cache stampede (Thundering Herd): When a popular cache key expires, many requests simultaneously hit the database.
- Solutions:
- Locking: Only one request fetches from DB while others wait
- Early expiration: Proactively refresh before TTL expires
- Probabilistic early expiration: Randomly refresh before expiration
- Never expire + background refresh
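A sketch of the locking mitigation, again assuming Redis: only the caller that wins a short-lived lock rebuilds the value, while the others wait briefly and re-read the cache. Key names and timings are illustrative.
```python
import json
import time
import redis  # assumption: redis-py client installed

cache = redis.Redis(host="cache.internal", port=6379)

def get_with_lock(key: str, loader, ttl=300, lock_ttl=10):
    value = cache.get(key)
    if value is not None:
        return json.loads(value)
    # SET NX succeeds for exactly one caller: that caller rebuilds the entry.
    if cache.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
        try:
            fresh = loader()                        # the expensive DB call runs once
            cache.setex(key, ttl, json.dumps(fresh))
            return fresh
        finally:
            cache.delete(f"lock:{key}")
    # Lost the race: wait a moment, then read what the winner cached.
    time.sleep(0.05)
    value = cache.get(key)
    return json.loads(value) if value is not None else loader()
```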
Distributed Cache:
- Redis Cluster, Memcached for horizontal scaling
- Consider: data partitioning, replication, failover strategies
CDN (Content Delivery Network):
Origin Server
(Your Server)
│
┌────────────────────┼────────────────────┐
│ │ │
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌───────────┐
│ CDN Edge │ │ CDN Edge │ │ CDN Edge │
│ (Asia) │ │ (Europe) │ │(Americas) │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
┌──────┴──────┐ ┌──────┴──────┐ ┌──────┴──────┐
▼ ▼ ▼ ▼ ▼ ▼
[User] [User] [User] [User] [User] [User]
Tokyo Singapore London Paris NYC São Paulo
Static content (images, CSS, JS) served from nearest edge location
- A network of geographically dispersed servers used to deliver static content like images, videos, CSS, JavaScript files etc.
- Considerations:
- Cost: CDNs charge for data transfers in and out, so only cache frequently used assets.
- Setting an appropriate cache expiry: if it is too long, the content might no longer be fresh; if it is too short, content is repeatedly reloaded from origin servers to the CDN.
- Fallback: if the CDN is temporarily down, clients should be able to detect the problem and request resources from the origin.
- Invalidating files: expire assets either by invalidating the CDN object through the provider's API or by using object versioning.
Data Centers
To improve availability and provide a better user experience across wider geographical areas, supporting multiple data centers is crucial.
┌─────────────┐
│ GeoDNS │
│ Router │
└──────┬──────┘
│
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐
│ Data Center US │ │ Data Center EU │ │ Data Center APAC │
│ ┌────┐ ┌────┐ │ │ ┌────┐ ┌────┐ │ │ ┌────┐ ┌────┐ │
│ │ LB │ │ DB │ │ │ │ LB │ │ DB │ │ │ │ LB │ │ DB │ │
│ └────┘ └────┘ │ │ └────┘ └────┘ │ │ └────┘ └────┘ │
│ ┌────┐ ┌────┐ │ │ ┌────┐ ┌────┐ │ │ ┌────┐ ┌────┐ │
│ │Apps│ │Cache│ │ │ │Apps│ │Cache│ │ │ │Apps│ │Cache│ │
│ └────┘ └────┘ │ │ └────┘ └────┘ │ │ └────┘ └────┘ │
└─────────┬──────────┘ └────────┬─────────┘ └─────────┬──────────┘
│ │ │
└─────────────────────┼─────────────────────┘
Data Replication
GeoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user.
In the event of any significant data center outage, we direct all traffic to a healthy data center.
Multi data center considerations:
- Traffic redirection: GeoDNS can be used to direct traffic to the nearest data center based on a user's location.
- Data synchronization: In failover cases, traffic might be routed to a data center where data is unavailable. A common strategy is to replicate data across multiple data centers.
- Test and deployment: Use automated deployment tools to keep services consistent across all data centers.
Consistency trade-offs:
- Strong consistency: All reads return the latest write; higher latency
- Eventual consistency: Updates propagate asynchronously; lower latency but stale reads possible
- Conflict resolution: Last-writer-wins, vector clocks, CRDTs
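As a toy illustration of last-writer-wins, the simplest of those strategies, assuming every replica stamps writes with a timestamp and a replica id; real systems must also contend with clock skew, which is why vector clocks and CRDTs exist:
```python
def lww_merge(record_a: dict, record_b: dict) -> dict:
    # The later write wins; ties break deterministically on replica id.
    key_a = (record_a["updated_at"], record_a["replica_id"])
    key_b = (record_b["updated_at"], record_b["replica_id"])
    return record_a if key_a >= key_b else record_b

us = {"value": "alice@new.example", "updated_at": 1700000200, "replica_id": "us-east"}
eu = {"value": "alice@old.example", "updated_at": 1700000100, "replica_id": "eu-west"}
print(lww_merge(us, eu)["value"])   # alice@new.example
```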
Service Discovery
In dynamic environments, services need to discover each other automatically.
Client-Side Discovery Server-Side Discovery
════════════════════ ═════════════════════
┌────────┐ ┌────────┐
│ Client │ │ Client │
└───┬────┘ └───┬────┘
│ │
│ 1.Query │ 1.Request
▼ ▼
┌──────────────┐ ┌──────────────┐
│ Service │ │Load Balancer │
│ Registry │ │ / Router │
└──────────────┘ └──────┬───────┘
│ │
│ 2.Return │ 2.Query
│ instances ▼
▼ ┌──────────────┐
┌────────┐ │ Service │
│ Client │──► 3.Direct call │ Registry │
└────────┘ to instance └──────────────┘
│
│ 3.Return
▼
┌──────────────┐
│Load Balancer │──► 4.Route
└──────────────┘ to instance
Client-side discovery: Client queries a service registry and chooses an instance
- Examples: Netflix Eureka, Consul
Server-side discovery: Client requests go through a load balancer that queries the registry
- Examples: Kubernetes Service, AWS ELB
Service registry: Database of available service instances and their locations
- Must be highly available
- Health checks to remove unhealthy instances
- Examples: Consul, etcd, Zookeeper, Kubernetes DNS
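A rough sketch of client-side discovery; the registry endpoint and the JSON response shape are assumptions (in practice you would use the Consul or Eureka client libraries rather than raw HTTP):
```python
import json
import random
import urllib.request

REGISTRY_URL = "http://registry.internal:8500"     # illustrative registry endpoint

def discover(service_name: str) -> str:
    # 1. Ask the registry for the healthy instances of a service.
    with urllib.request.urlopen(f"{REGISTRY_URL}/v1/health/{service_name}") as resp:
        instances = json.loads(resp.read())        # assumed shape: [{"host": ..., "port": ...}]
    # 2. Client-side load balancing: pick one instance.
    chosen = random.choice(instances)
    return f"http://{chosen['host']}:{chosen['port']}"

# 3. Call the chosen instance directly.
base_url = discover("order-service")
```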
Container Orchestration
Automates deployment, scaling, and management of containerized applications.
┌────────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Control Plane │ │
│ │ ┌──────────┐ ┌───────────┐ ┌───────────┐ ┌───────────────┐ │ │
│ │ │API Server│ │ Scheduler │ │Controller │ │ etcd │ │ │
│ │ └──────────┘ └───────────┘ └───────────┘ └───────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Worker Node │ │ Worker Node │ │ Worker Node │ │
│ │ ┌───┐ ┌───┐ │ │ ┌───┐ ┌───┐ │ │ ┌───┐ ┌───┐ │ │
│ │ │Pod│ │Pod│ │ │ │Pod│ │Pod│ │ │ │Pod│ │Pod│ │ │
│ │ └───┘ └───┘ │ │ └───┘ └───┘ │ │ └───┘ └───┘ │ │
│ │ ┌───┐ ┌───┐ │ │ ┌───┐ │ │ ┌───┐ ┌───┐ │ │
│ │ │Pod│ │Pod│ │ │ │Pod│ │ │ │Pod│ │Pod│ │ │
│ │ └───┘ └───┘ │ │ └───┘ │ │ └───┘ └───┘ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
│ │
Auto-scaling Self-healing
(HPA adds pods) (Restart failed pods)
Key capabilities:
- Auto-scaling: Horizontal Pod Autoscaler (HPA) based on CPU/memory/custom metrics
- Self-healing: Automatic restart of failed containers
- Service discovery & load balancing: Built-in DNS and load balancing
- Rolling updates & rollbacks: Zero-downtime deployments
- Resource management: CPU/memory limits and requests
Scaling strategies:
- Horizontal scaling: Add more pod replicas
- Vertical scaling: Increase resources per pod
- Cluster autoscaling: Add/remove nodes based on demand
Popular platforms: Kubernetes, Docker Swarm, Amazon ECS, Google Cloud Run
Message Queue
A message queue is a durable component that supports asynchronous communication.
┌──────────────┐ ┌──────────────┐
│ Producer │ │ Consumer │
│ (Service A) │ │ (Service B) │
└──────┬───────┘ └──────▲───────┘
│ │
│ 1.Publish 4.Process
│ │
▼ │
┌─────────────────────────────────────────────────────────────────────┐
│ Message Queue │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ [Msg1] [Msg2] [Msg3] [Msg4] [Msg5] ───────────────────────► │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ Features: │
│ • Durability (messages persisted) │
│ • Ordering (FIFO or partitioned) │
│ • Acknowledgment (at-least-once/exactly-once) │
│ │
│ Dead Letter Queue (DLQ): │
│ ┌─────────────────────────┐ │
│ │ [Failed1] [Failed2] │ ◄── Messages that couldn't be │
│ └─────────────────────────┘ processed after retries │
└─────────────────────────────────────────────────────────────────────┘
Decoupling makes the message queue a preferred architecture for building a scalable and reliable application.
Delivery semantics:
- At-most-once: Messages may be lost but never redelivered
- At-least-once: Messages are never lost but may be redelivered; consumers must be idempotent
- Exactly-once: Each message is delivered exactly once; most difficult to achieve
Dead Letter Queue (DLQ): Stores messages that could not be processed
- Allows inspection and debugging of failed messages
- Prevents poison messages from blocking the queue
- Configure retry policies before moving to DLQ
Backpressure handling:
- Rate limiting on producers
- Queue size limits with rejection policies
- Consumer scaling based on queue depth
Best practices:
- Idempotent consumers for at-least-once delivery
- Message ordering considerations (partitioning by key)
- Poison message handling with retry limits
Popular options: Apache Kafka, RabbitMQ, Amazon SQS, Redis Streams
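A sketch of an idempotent consumer with retry limits and a dead-letter queue; the queue client interface (ack/nack/publish), the message shape, and the in-memory dedup set are assumptions made for illustration (production systems track processed ids in a durable store):
```python
MAX_RETRIES = 3
processed_ids = set()   # assumption: in production this lives in a durable store

def process(payload):
    ...                  # placeholder for the real business logic

def handle(message: dict, queue, dlq):
    # At-least-once delivery means the same message can arrive more than once,
    # so the consumer must be idempotent.
    if message["id"] in processed_ids:
        queue.ack(message)                    # already handled: just acknowledge
        return
    try:
        process(message["payload"])
        processed_ids.add(message["id"])
        queue.ack(message)
    except Exception:
        if message.get("retries", 0) >= MAX_RETRIES:
            dlq.publish(message)              # poison message: park it for inspection
            queue.ack(message)
        else:
            message["retries"] = message.get("retries", 0) + 1
            queue.nack(message)               # ask the broker to redeliver later
```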
Observability: Logging, Metrics, Tracing
┌─────────────────────────────────────────────────────────────────────┐
│ Observability Stack │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Logging │ │ Metrics │ │ Tracing │ │
│ │ │ │ │ │ │ │
│ │ • ELK Stack│ │ •Prometheus│ │ • Jaeger │ │
│ │ • Splunk │ │ • Grafana │ │ • Zipkin │ │
│ │ • Datadog │ │ • Datadog │ │ •OpenTelem │ │
│ └─────┬──────┘ └─────┬──────┘ └─────┬──────┘ │
│ │ │ │ │
│ └───────────────────┼───────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Alerting │ │
│ │ PagerDuty, │ │
│ │ OpsGenie │ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Distributed Trace Example:
═══════════════════════════
Request ──► API Gateway ──► User Service ──► Database
│ │ │ │
│ [Span 1] │ [Span 2] │ [Span 3] │
│◄────────────┴───────────────┴──────────────┤
│ Total Trace Duration │
Logging:
- Monitoring error logs is important because it helps to identify errors and problems in the system
- Centralized logging: Aggregate logs from all services (ELK Stack, Splunk, Datadog)
- Structured logging: JSON format for easier parsing and querying
- Log levels: DEBUG, INFO, WARN, ERROR; configure appropriately per environment
- Correlation IDs: Track requests across services
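A minimal structured-logging sketch with a correlation ID, using only the standard library; the field names and the X-Request-ID convention are illustrative:
```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("app")

def log_event(message: str, correlation_id: str, **fields):
    # One JSON object per line so the log aggregator can parse and query it.
    logger.info(json.dumps({
        "message": message,
        "correlation_id": correlation_id,   # propagated between services, e.g. via X-Request-ID
        **fields,
    }))

correlation_id = str(uuid.uuid4())          # generated at the edge, passed downstream
log_event("order created", correlation_id, service="order-service", order_id=42)
```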
Metrics:
- Collecting different types of metrics helps gain business insights and understand the health status of the system
- Types: counters, gauges, histograms, summaries
- Key metrics: latency (p50, p95, p99), error rate, throughput, saturation
- RED method: Rate, Errors, Duration
- USE method: Utilization, Saturation, Errors
- Tools: Prometheus, Grafana, Datadog, CloudWatch
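For instance, with the Prometheus Python client (the metric names, labels, and port are illustrative):
```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["path", "status"])
LATENCY = Histogram("http_request_duration_seconds", "Request latency", ["path"])

def handle_request(path: str):
    start = time.monotonic()
    status = "200"
    try:
        time.sleep(random.uniform(0.01, 0.05))   # stand-in for real work
    except Exception:
        status = "500"
        raise
    finally:
        REQUESTS.labels(path=path, status=status).inc()
        LATENCY.labels(path=path).observe(time.monotonic() - start)

start_http_server(9100)   # exposes /metrics for Prometheus to scrape
handle_request("/orders")
```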
Distributed Tracing:
- Track requests as they flow through multiple services
- Identify bottlenecks and latency issues
- Tools: Jaeger, Zipkin, OpenTelemetry, AWS X-Ray
Alerting:
- Set up alerts for critical thresholds
- Avoid alert fatigue with proper thresholds and grouping
- Runbooks for common issues
Automation: CI (Continuous Integration) and CD (Continuous Deployment).
- Automated testing at every stage
- Infrastructure as Code (Terraform, Pulumi)
- GitOps for declarative deployments
Summary Cheat Sheet
Application Scaling Journey
═══════════════════════════════════════════════════════════════════
Single Server Separate Concerns Scale Web Tier
┌───────────┐ ┌─────┐ ┌────┐ ┌─────┐
│ Web + DB │ ──► │ Web │ │ DB │ ──► │ LB │
└───────────┘ └─────┘ └────┘ └──┬──┘
┌───┼───┐
[W] [W] [W]
Add Cache Add CDN Multiple DCs
┌───────┐ ┌───────┐ ┌────┐ ┌────┐
│ Redis │ │ CDN │ │ DC1│ │ DC2│
└───────┘ └───────┘ └────┘ └────┘
(GeoDNS)
Scale Database Message Queue Microservices
┌────┐ ┌─────────┐ ┌───┐ ┌───┐
│ RW │ │ Kafka │ │Svc│ │Svc│
└─┬──┘ └─────────┘ └───┘ └───┘
┌──┴──┐
[R] [R]
| Stage | Key Actions |
|---|---|
| Separate concerns | Split web server from database |
| Scale web tier | Add load balancer, horizontal scaling, stateless design |
| Add API Gateway | Centralized routing, auth, rate limiting |
| Protect services | Rate limiting, circuit breakers |
| Scale database | Read replicas, connection pooling, sharding |
| Improve latency | Caching (Redis), CDN for static assets |
| Go multi-region | Data centers, GeoDNS, data replication |
| Dynamic infrastructure | Service discovery, container orchestration |
| Decouple services | Message queues with proper delivery semantics |
| Observe everything | Logging, metrics, tracing, alerting |