System Design Interview Questions

System design interviews assess how you scope an ambiguous problem and reason about trade-offs at scale. There's no single right answer — interviewers want a clear, structured approach.


What System Design interviews cover

Requirements & scope

Clarifying functional and non-functional requirements before designing anything.

Estimation

Back-of-the-envelope numbers for traffic, storage, and bandwidth to justify decisions.
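As an illustration of that arithmetic, here is a short sketch that turns a few inputs into average and peak QPS, storage, and read bandwidth. Every input is a hypothetical placeholder, not a figure from a real system: 100 million writes per day, a 10:1 read-to-write ratio, 500 bytes per record, 5 years of retention, and a 2x peak factor.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass
class Estimate:
    write_qps: float
    read_qps: float
    peak_read_qps: float
    storage_tb: float
    read_bandwidth_mb_s: float


def estimate(writes_per_day: int, read_write_ratio: int, bytes_per_record: int,
             retention_years: int, peak_factor: float = 2.0) -> Estimate:
    """Back-of-the-envelope sizing; every argument is an assumption to state aloud."""
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    total_records = writes_per_day * 365 * retention_years
    storage_tb = total_records * bytes_per_record / 1e12
    # Read bandwidth, assuming each read returns exactly one record.
    read_bandwidth_mb_s = read_qps * bytes_per_record / 1e6
    return Estimate(write_qps, read_qps, read_qps * peak_factor,
                    storage_tb, read_bandwidth_mb_s)


if __name__ == "__main__":
    e = estimate(writes_per_day=100_000_000, read_write_ratio=10,
                 bytes_per_record=500, retention_years=5)
    print(f"writes/s ~{e.write_qps:,.0f}, reads/s ~{e.read_qps:,.0f} "
          f"(peak ~{e.peak_read_qps:,.0f})")
    print(f"storage ~{e.storage_tb:.1f} TB, read bandwidth ~{e.read_bandwidth_mb_s:.1f} MB/s")
```

With these placeholder inputs the sketch gives roughly 1,160 writes/s, 11,600 reads/s (about 23,000 at peak), and about 91 TB of storage over five years. Numbers like these are what justify choices such as adding a cache or sharding.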

High-level design

APIs, data model, core components, and the data flow between them.

Scaling & trade-offs

Caching, sharding, replication, queues, consistency vs availability, and bottlenecks.

Sample System Design interview questions

Design a URL shortener like bit.ly.
What a strong answer covers
- API design: POST to create, GET to redirect; 301 (permanent, cached by browsers) vs 302 (temporary, so every click reaches the server and can be counted)
- Base62 encoding for short key generation; collision handling via retry or distributed ID generator
- Read-heavy workload; use Redis cache for hot URLs and PostgreSQL for persistent storage
- Key scalability bottleneck: short URL collision and database write throughput; shard by key's hash
Sample answer
Design a URL shortener with these requirements: shorten a long URL to a unique short key, redirect to the original URL, and support analytics. We'll use a REST API: POST /shorten returns the short URL, and GET /{key} returns a 302 redirect. A 301 is cached by browsers, so repeat clicks would bypass the server and the analytics; use 301 only when analytics don't matter and you want to cut load. Short key generation: Base62 encoding (a-z, A-Z, 0-9) with 7 characters gives 62^7 ≈ 3.5 trillion combinations. To avoid collisions, use a distributed unique ID generator like Snowflake or a counter with a range assigned to each server, and Base62-encode the ID. The system architecture: load balancer → web servers (stateless) → cache (Redis) → SQL database (PostgreSQL). Creation flow: generate an ID, encode it, insert the mapping into the DB, and optionally write it to the cache. Redirect flow: check the cache; on a hit, return the cached long URL; on a miss, query the DB, populate the cache, and respond. To scale, shard the DB by a hash of the short key and use consistent hashing to distribute the cache. Common pitfall: not handling custom aliases correctly; keep them in a separate namespace so they cannot collide with generated keys. For analytics, log clicks asynchronously to a separate service (e.g., Kafka + Hadoop).
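A minimal single-process sketch of the creation and redirect flows above. It assumes a per-server counter range in place of a distributed ID generator, and two Python dicts standing in for PostgreSQL and Redis. The names `Shortener`, `shorten`, and `resolve` are hypothetical.

```python
import string
from typing import Dict, Optional

# Symbol order 0-9, a-z, A-Z is a convention; any fixed order of the 62 symbols works.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 string."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


class Shortener:
    """Hypothetical server that owns the ID range [range_start, range_end)."""

    def __init__(self, range_start: int, range_end: int) -> None:
        self.next_id = range_start
        self.range_end = range_end
        self.db: Dict[str, str] = {}     # stand-in for PostgreSQL
        self.cache: Dict[str, str] = {}  # stand-in for Redis

    def shorten(self, long_url: str) -> str:
        if self.next_id >= self.range_end:
            raise RuntimeError("ID range exhausted; request a new range")
        key = base62_encode(self.next_id)  # unique because server ranges are disjoint
        self.next_id += 1
        self.db[key] = long_url     # persist the mapping first
        self.cache[key] = long_url  # optionally warm the cache
        return key

    def resolve(self, key: str) -> Optional[str]:
        # Cache-aside read: cache hit, else DB lookup followed by a cache fill.
        url = self.cache.get(key)
        if url is None:
            url = self.db.get(key)
            if url is not None:
                self.cache[key] = url
        return url
```

IDs below 62^6 encode to fewer than 7 characters. Left-padding with "0" gives fixed-length keys without changing the decoded value. Sequential IDs also make keys guessable; this sketch trades unpredictability for guaranteed uniqueness.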
Design a news feed (e.g. Twitter/Instagram timeline).
What a strong answer covers
- Pull vs push model for content delivery; fanout-on-write vs fanout-on-read
- Feed generation: use a precomputed feed cache (Redis sorted sets) for active users
- Social graph management: adjacency list in graph DB or NoSQL
- Ranking: chronological or machine learning based; handle top-K from many subscriptions
Sample answer
Design a news feed like Twitter/Instagram. Requirements: each user sees a timeline of posts from followed users, ordered by time (or relevance). Two core models: push (fanout-on-write) and pull (fanout-on-read). Because reads far outnumber writes, use a hybrid: precompute feeds for active users and store them in Redis sorted sets with score = timestamp; for inactive or infrequent users, generate the feed on read. Main components: a User service, a Post service (NoSQL like Cassandra for high write volume), and a Graph service (e.g., Neo4j or Redis). Write path: insert the post into the DB, then fan out to followers' feed caches asynchronously through a message queue like Kafka. For scalability, fan out only to active followers. Handle accounts with huge follower counts (celebrities) with the pull model: their posts are fetched and merged at read time instead of being pushed. For ranking, start with reverse-chronological order, or store a machine-learned relevance score in the sorted set. Common pitfall: fanout latency for users with millions of followers. The pull path for celebrities avoids it, and any large fanout that remains should run on a separate queue so it does not delay ordinary posts. Feed caching layer: a Redis cluster with replication for high availability.
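The hybrid model can be sketched in memory. Posts from ordinary authors are pushed into each follower's precomputed feed at write time, while posts from high-follower accounts are pulled and merged at read time. The `FeedService` name, the tiny follower threshold, and the use of plain lists in place of Redis sorted sets are assumptions for illustration.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff, tiny for the demo; real cutoffs are far larger

Post = Tuple[int, str, str]  # (timestamp, author, text)


class FeedService:
    """In-memory stand-in: lists play the role of Redis sorted sets."""

    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)
        self.feed_cache: Dict[str, List[Post]] = defaultdict(list)
        self._clock = itertools.count(1)  # increasing stand-in for timestamps

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author: str, text: str) -> Post:
        p = (next(self._clock), author, text)
        self.posts_by_author[author].append(p)
        if not self.is_celebrity(author):
            # Fanout-on-write; production code would enqueue this work (e.g. on Kafka).
            for follower in self.followers[author]:
                self.feed_cache[follower].append(p)
        return p

    def timeline(self, user: str, limit: int = 10) -> List[Post]:
        # Fanout-on-read for celebrities, merged with the precomputed feed.
        pulled = {p for a in self.following[user] if self.is_celebrity(a)
                  for p in self.posts_by_author[a][-limit:]}
        # The set union drops duplicates if an author crossed the threshold over time.
        return heapq.nlargest(limit, set(self.feed_cache[user]) | pulled)
```

The read path pays only for merging the few celebrity accounts a user follows. The write path avoids millions of cache updates per celebrity post.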
Design a rate limiter for a public API.
What a strong answer covers
- Token bucket, leaky bucket, fixed window, sliding window log, sliding window counter algorithms
- Distributed rate limiter: use Redis with Lua scripting for atomic increments
- Configuration per endpoint, per user, or global tiers (e.g., 10 req/sec)
- Handle burst traffic with token bucket; edge cases like clock skew and race conditions
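Because the bullets above recommend token bucket for bursts, here is a single-process sketch with lazy refill. Tokens are recomputed from elapsed time on each call, so no background refill job is needed. The `TokenBucket` class and its parameters are illustrative. A distributed version could keep the same two fields (token count and last refill time) in Redis and update them atomically in a Lua script.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained rate
        self.clock = clock                    # injectable for testing
        self.tokens = capacity                # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Lazy refill: credit tokens for the time since the last call, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

For a "10 req/sec" tier, `TokenBucket(capacity=10, refill_per_sec=10)` allows a burst of 10 and then a steady 10 requests per second.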
Sample answer
Design a rate limiter for a public API. Requirements: limit requests per user/IP within a window, different limits per endpoint, and low latency overhead. We'll use a sliding window log, which is exact; a sliding window counter approximates it with less memory. Use Redis as a central store with one sorted set per user and endpoint: each request's timestamp is stored, and the count is the number of entries in the current window. Data structure: key = 'rate_limit:{user_id}:{endpoint}', value = sorted set of timestamps. For atomicity, run the check-and-record step in one Lua script. The script removes timestamps older than the window and counts the remaining entries. If count < limit, it adds the current timestamp and allows the request; otherwise it denies it. The added member must be unique, e.g. timestamp plus request ID, so simultaneous requests are not collapsed. To avoid memory bloat, reset the key's TTL to the window length on each request so idle keys expire. For high throughput, use Redis pipelining; for scaling, shard by user_id. Common pitfall: clock skew across the application servers that stamp requests; take timestamps from a single clock, such as the Redis server's. Token bucket handles bursts better, and its refill can be computed lazily from elapsed time on each request instead of by a background job. Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) inform clients, and denied requests get HTTP 429 Too Many Requests. For global limits, use a separate key without the user prefix.
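To make the script's steps concrete, here is an in-process Python sketch of the same sliding window log. It drops timestamps older than the window, counts what remains, and records the request only if it is under the limit. A per-key deque stands in for the Redis sorted set; the class name and the injectable clock are assumptions. In Redis, the three steps correspond to ZREMRANGEBYSCORE, ZCARD, and ZADD inside one script.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLog:
    """In-process model of the Lua script; one deque per key replaces a Redis sorted set."""

    def __init__(self, limit: int, window_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_sec
        self.clock = clock
        # Unlike Redis with a TTL, this sketch never frees idle keys.
        self.logs: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, endpoint: str) -> bool:
        key = f"rate_limit:{user_id}:{endpoint}"  # key scheme from the text
        now = self.clock()
        log = self.logs[key]
        # 1. Remove timestamps older than the window (ZREMRANGEBYSCORE in Redis).
        while log and log[0] <= now - self.window:
            log.popleft()
        # 2. Count the requests left in the window (ZCARD).
        if len(log) < self.limit:
            # 3. Record this request and allow it (ZADD). A deque tolerates equal
            #    timestamps; a sorted set needs unique members, e.g. "ts:request_id".
            log.append(now)
            return True
        return False
```

The log costs memory proportional to the limit per key. That cost is the usual reason to switch to a sliding window counter or token bucket at high limits.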
Design a chat system that supports group messaging and read receipts.
What a strong answer covers
- Client-server architecture: WebSocket for real-time, HTTP polling for fallback
- Group messaging: store messages in a database (Cassandra for write scaling); use a message queue for delivery
- Read receipts: track last read message per user per conversation; update with timestamp
- Consistency model: online members must see a conversation's messages in the same order; eventual consistency is acceptable for offline delivery and read receipts
Sample answer
Design a chat system with group messaging and read receipts. Requirements: one-to-one and group chats, real-time delivery, offline message storage, and read receipts. We'll use WebSocket for real-time communication, with HTTP long polling as a fallback. Architecture: load balancer → chat servers (hold WebSocket connections), a distributed message broker (Kafka/RabbitMQ), a database (Cassandra for message storage), and a presence service. For group messaging: when a user sends a message, the chat server publishes it to a per-conversation topic (e.g., 'chat:{conversation_id}'), and each group member receives it through a user-specific queue. The message is stored in Cassandra, partitioned by conversation_id and clustered by message sequence number. For read receipts: each user has a 'last_read_message_id' per conversation, stored in a fast key-value store (Redis). When a user reads a message, advance their receipt; optionally, notify the sender once all participants have read it. For scaling: add chat servers horizontally. Because each server holds long-lived connections, it is not stateless, so keep the user-to-server mapping in the presence service (or assign users with consistent hashing) to route messages to the right server. Common pitfall: message ordering in group chat; assign a sequence number per conversation rather than relying on server timestamps, which can skew. For offline users, store messages in a per-user mailbox, send a push notification, and deliver the stored messages when the user reconnects.
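A small in-memory sketch of two parts of this design: per-conversation sequence numbers for ordering, and read receipts stored as each member's last-read sequence number. The `Conversation` class and its methods are hypothetical. In the design above, messages would live in Cassandra and receipts in Redis. Using sequence numbers as message IDs turns "has everyone read message k?" into a simple comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    seq: int
    sender: str
    text: str


@dataclass
class Conversation:
    members: Set[str]
    messages: List[Message] = field(default_factory=list)
    last_read: Dict[str, int] = field(default_factory=dict)  # user -> last read seq
    next_seq: int = 1

    def send(self, sender: str, text: str) -> Message:
        # Assumes a single owner per conversation assigns sequence numbers, so every
        # member sees the same order regardless of which chat server delivered it.
        msg = Message(self.next_seq, sender, text)
        self.next_seq += 1
        self.messages.append(msg)
        self.last_read[sender] = msg.seq  # sending implies the sender is caught up
        return msg

    def mark_read(self, user: str, seq: int) -> None:
        # Receipts only move forward; stale or reordered updates are ignored.
        if seq > self.last_read.get(user, 0):
            self.last_read[user] = seq

    def read_by_all(self, seq: int) -> bool:
        return all(self.last_read.get(m, 0) >= seq for m in self.members)

    def unread_for(self, user: str) -> List[Message]:
        # What an offline user's client would fetch on reconnect.
        last = self.last_read.get(user, 0)
        return [m for m in self.messages if m.seq > last]
```
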
How would you scale a service from 1k to 10M users?
What a strong answer covers
- Start with monolithic architecture, scale vertically (more CPU/RAM) up to ~10k users
- Introduce caching layer (Redis/CDN), scale horizontally by adding stateless app servers behind a load balancer
- Database scaling: read replicas, sharding, or use NoSQL like Cassandra for write scaling
- Automate infrastructure: CI/CD, auto-scaling groups, monitoring (latency, error rates, database connections)
Sample answer
Scaling a service from 1k to 10M users requires incremental architectural changes; the user counts below are rough milestones. At 1k users, a single server with a monolithic app and a SQL database suffices. At ~10k users, introduce a load balancer and multiple stateless app servers, plus a caching layer (Redis) for hot data. At ~100k users, the database often becomes the bottleneck: add read replicas for read-heavy workloads, optimize indexes, and serve static assets from a CDN. At ~1M users, shard the database horizontally (by user_id or region) or migrate to NoSQL (Cassandra for write-heavy workloads, DynamoDB for managed key-value access with predictable latency). Introduce asynchronous processing with message queues (Kafka) for background jobs (e.g., email, analytics). At 10M users, split the monolith into microservices (user service, feed service, etc.), use distributed caching (Redis cluster), and adopt an event-driven architecture. Autoscaling groups and container orchestration (Kubernetes) handle traffic spikes. Monitoring and alerting are critical (distributed tracing, metrics). Common pitfalls: premature optimization, ignoring data consistency models, and not choosing a sharding strategy early enough. Always verify scalability with load testing.
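One concrete change at the read-replica stage is read/write splitting. Here is a minimal routing sketch using hypothetical connection names. Writes go to the primary and reads rotate across replicas. A caller can pin a read to the primary when it must see its own recent write, because replicas typically lag behind.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Map each query to a connection name (stand-ins for real connection pools)."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = replicas
        self._rr = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, needs_fresh_read: bool = False) -> str:
        # Crude classification by statement prefix, for illustration only;
        # e.g. SELECT ... FOR UPDATE would need to go to the primary.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or needs_fresh_read or self._rr is None:
            return self.primary   # writes, and reads that must see their own writes
        return next(self._rr)     # round-robin across replicas
```
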
Design a distributed key-value store with high availability.
What a strong answer covers
- Consistency models: strong (CP) vs eventual (AP) in CAP theorem; choose AP for high availability
- Partitioning: consistent hashing with virtual nodes for even distribution; use a ring
- Replication: quorum reads and writes with W + R > N, so every read quorum overlaps the latest write quorum; for availability, relax to sloppy quorums with hinted handoff and accept eventual consistency
- Data storage: log-structured merge-tree (LSM tree) like LevelDB for write-heavy workloads; support CRUD operations
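The W + R > N condition rests on a counting argument. A write set of W replicas and a read set of R replicas, drawn from the same N, must share at least W + R - N members, so a read sees at least one copy of the latest acknowledged write. A brute-force check over all subsets for small N makes this concrete. It holds for strict quorums over a fixed replica set; sloppy quorums relax it.

```python
from itertools import combinations


def always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-replica write set intersects every R-replica read set."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


def min_overlap(n: int, w: int, r: int) -> int:
    # Pigeonhole bound: |write set ∩ read set| >= W + R - N.
    return max(0, w + r - n)
```

For example, N=3, W=2, R=2 always overlaps (by at least one replica), while N=3, W=1, R=1 does not.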
Sample answer
Design a distributed key-value store with high availability. We'll target an AP system (available and partition-tolerant) using a Dynamo-style architecture. Requirements: put(key, value) and get(key), always writable, eventually consistent. Use consistent hashing to partition data across nodes (e.g., 128 virtual nodes per physical node). For replication, each key is stored on N nodes (e.g., 3): the next N distinct nodes clockwise on the ring. Write: the coordinating node sends the write to all N replicas and reports success once W of them acknowledge (e.g., W=2). If a replica is unreachable, the write goes to the next healthy node with a hint (hinted handoff), and that node forwards it once the replica recovers. Read: the coordinator queries the replicas and waits for R responses (e.g., R=2). Vector clocks discard versions that are ancestors of others; any remaining versions are concurrent conflicts. To resolve conflicts, use last-write-wins (timestamp) or return the conflicting versions to the client. For durability, append every write to a commit log and flush in-memory tables to immutable SSTables. The storage engine can be RocksDB (LSM tree) for high write throughput. For scalability, add nodes and rebalance using virtual nodes (data migrates automatically). Common pitfall: vector clocks can grow large; cap their size by truncating the oldest entries past a threshold. For operations, use a gossip protocol for membership and failure detection. Ensure eventual consistency with read repair (update outdated replicas on read). For high availability, avoid single points of failure: every node plays the same role.
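A sketch of the partitioning and replica placement described above. Each physical node gets several virtual nodes on a hash ring, and a key belongs to the first virtual node clockwise from its hash. The N-replica preference list continues clockwise, skipping virtual nodes of physical nodes already chosen. SHA-256 is used only as a convenient, evenly distributed hash; the `HashRing` class and its defaults are assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 128) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []  # sorted (position, physical node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#vn{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """First n distinct physical nodes clockwise from the key's position."""
        if not self.ring:
            return []
        start = bisect.bisect(self.ring, (_hash(key), ""))
        owners: List[str] = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners
```

Adding a node moves only the keys that now land on its virtual nodes, roughly 1/(number of nodes) of them. This is the rebalancing property the answer relies on.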

How to prepare

Always start by clarifying requirements and constraints — don't jump to a solution.
Use a repeatable framework: requirements → estimation → API → data model → scale → trade-offs.
State trade-offs explicitly (e.g. consistency vs availability); there's rarely one right answer.
Drive the conversation and manage time — cover breadth first, then go deep where asked.

Frequently asked questions

How do I prepare for a system design interview?

Learn a repeatable framework, study a handful of canonical designs (feed, chat, URL shortener, rate limiter), and practice talking through trade-offs out loud.

Do I need to know specific technologies?

Know the building blocks (load balancers, caches, queues, SQL vs NoSQL, sharding) and their trade-offs. Naming exact products matters less than reasoning.

What's the most common mistake?

Jumping straight to a design without clarifying requirements or doing estimation, and not stating trade-offs.

Is system design only for senior roles?

It's most common at mid and senior levels, but junior candidates may get a lightweight version focused on a single component.
