Senior Backend Engineer Interview Questions

What a Senior Backend Engineer interview focuses on, the questions you'll face, and how to practice them with instant AI feedback.

What's expected at the Senior level

Expect distributed-systems design, reliability ownership and cross-team architecture.

Sample Backend Engineer interview questions

Technical: When would you choose a SQL database over a NoSQL store, and why?
What a strong answer covers
- ACID vs BASE trade-offs
- Structured schema vs flexible schema
- Join support vs denormalization
- Vertical scaling vs horizontal scaling
- Consistency and reliability requirements
Sample answer
SQL databases are preferable when you need strong consistency, complex joins, and structured data with relational integrity. For example, a banking system relies on ACID transactions to keep balances accurate. NoSQL stores are better suited to high write throughput, flexible schemas, and horizontal scaling, as in real-time analytics or user-session storage. The choice also depends on query patterns: SQL excels at ad-hoc queries with joins, while NoSQL favors key-value or document lookups. A common pitfall is assuming NoSQL is always faster; for many workloads with complex relationships, SQL can be more performant because the query planner optimizes joins. Also consider operational maturity: established SQL databases tend to have mature tooling for backups and replication.
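The banking example above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module purely for illustration; the accounts table, the transfer helper, and the amounts are hypothetical and not part of the original answer. The point it shows is atomicity: when the second statement fails, the first is rolled back too.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it is rejected."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # This statement violates the CHECK constraint when src lacks funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return all balances ordered by account id."""
    return [row[0] for row in conn.execute("SELECT balance FROM accounts ORDER BY id")]


# Hypothetical schema: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)         # succeeds
rejected = transfer(conn, 1, 2, 500)  # debit fails, so the credit is rolled back
print(ok, rejected, balances(conn))
```

The credit is deliberately issued before the debit so that the failing statement comes second; without a transaction, the destination account would keep money that was never withdrawn.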
Technical: How do database indexes work, and what are their trade-offs?
What a strong answer covers
- B-tree vs hash index structure
- Clustered vs non-clustered indexes
- Index overhead on writes
- Selectivity and covering indexes
- Index scan vs full table scan
Sample answer
A database index is a separate data structure, typically a B-tree, that supports O(log n) lookup, insertion, and deletion. The index stores a copy of the indexed column(s) and a pointer to the actual row. B-trees handle range queries efficiently, while hash indexes are better for exact equality lookups but do not support ranges. A clustered index determines the physical order of rows on disk, so there can be only one per table; non-clustered indexes are separate structures. The trade-offs are extra write overhead (every insert or delete, and every update that touches an indexed column, must also update the affected indexes) and additional storage space. Indexes on high-selectivity columns (many distinct values) are more effective because each lookup narrows the result to few rows. A common pitfall is over-indexing, which can significantly degrade write performance, especially on high-throughput systems. A covering index improves read performance by including every column a query needs, so the query never touches the main table.
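One way to see the index scan versus full table scan difference is to ask the database for its query plan. The sketch below uses SQLite through Python's sqlite3 module as an assumed stand-in for whatever database is under discussion; the users table and the idx_users_email index are hypothetical. The exact plan wording differs between SQLite versions, so the code only prints it.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for `sql` as one string (the detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)


# Hypothetical table with enough rows to make the example meaningful.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT id FROM users WHERE email = ?"
args = ("user42@example.com",)

before = query_plan(conn, lookup, args)  # no index yet: a scan of the table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, args)   # the plan should now name the index

print("before:", before)
print("after: ", after)
```

Because the query selects only id, which SQLite stores alongside each index entry, the index alone can answer it; this is the covering-index case described above.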
Technical: Explain how you would make a payment endpoint idempotent.
What a strong answer covers
- Idempotency key in request header
- Deduplication with database unique constraint
- Idempotency window with TTL
- Locking or optimistic concurrency
- Handling retries with same key
Sample answer
To make a payment endpoint idempotent, we require the client to send an idempotency key (e.g., a UUID) in the request header or body. The server stores a mapping from the key to the result of the first successful processing (e.g., the payment status). On each request, we check whether the key already exists in a dedicated idempotency store (e.g., Redis or a database table with a unique constraint). If it exists, we return the stored response without processing again. If not, we atomically insert the key and process the payment. An idempotency window (e.g., 24 hours), enforced with a TTL, ensures old keys are cleaned up. A common pitfall is not handling partial failures: if the server crashes after processing but before storing the result, a retry might double-charge. To mitigate this, record the idempotency key in the same database transaction as the payment, or use the transactional outbox pattern. Also ensure idempotency keys are generated client-side with sufficient randomness.
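The "key in the same database transaction" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: it uses sqlite3, the idempotency_keys and charges tables and the charge function are hypothetical, and the "payment" is just a local row. A real integration that calls an external payment provider needs extra care, because that call cannot join the database transaction.

```python
import json
import sqlite3

# Hypothetical schema: the primary key on `key` is the deduplication constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE charges ("
    "id INTEGER PRIMARY KEY, account_id INTEGER, amount_cents INTEGER)"
)


def charge(conn, idempotency_key, account_id, amount_cents):
    """Record a charge once per key; replay the stored response on retries."""
    response = json.dumps({"status": "charged", "amount_cents": amount_cents})
    try:
        # Key and charge are written in one transaction: both commit or neither.
        with conn:
            conn.execute(
                "INSERT INTO idempotency_keys (key, response) VALUES (?, ?)",
                (idempotency_key, response),
            )
            conn.execute(
                "INSERT INTO charges (account_id, amount_cents) VALUES (?, ?)",
                (account_id, amount_cents),
            )
    except sqlite3.IntegrityError:
        # Duplicate key: nothing was written, so return the original result.
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE key = ?",
            (idempotency_key,),
        ).fetchone()
        return json.loads(row[0])
    return json.loads(response)


def charge_count(conn):
    """Return how many charge rows exist."""
    return conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]


first = charge(conn, "key-123", account_id=7, amount_cents=1999)
retry = charge(conn, "key-123", account_id=7, amount_cents=1999)
print(first == retry, charge_count(conn))
```

Relying on the unique constraint rather than a separate "check, then insert" step avoids the race where two concurrent retries both see the key as absent.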

Coding: Implement an LRU cache with O(1) get and put.

What a strong answer covers

- Hash map for O(1) access
- Doubly linked list for LRU order
- Eviction policy: remove least recently used
- Thread safety considerations

Sample answer

An LRU cache evicts the least recently used item when it is full, and both get and put must run in O(1) time. The implementation uses a hash map for direct node access and a doubly linked list to maintain recency order; the hash map maps each key to its node in the list. On get, we move the accessed node to the head of the list. On put, if the key exists, we remove its node and add a new node at the head; if the cache is then over capacity, we remove the node just before the tail (the least recently used) and delete its key from the map. A common pitfall is mishandling an empty or single-element list; dummy head and tail nodes remove those special cases. Thread safety can be added by guarding get and put with a lock. In Python, collections.OrderedDict gives a much shorter implementation, but a manual one demonstrates better understanding. The trade-off is O(capacity) space for the two data structures.

Reference solution (Python)

class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.prev = None
        self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.cap = capacity
        self.cache = {}  # key -> node
        # dummy head and tail
        self.head = Node(0, 0)
        self.tail = Node(0, 0)
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        prev, nxt = node.prev, node.next
        prev.next = nxt
        nxt.prev = prev

    def _add(self, node):
        # add to front (right after head)
        nxt = self.head.next
        self.head.next = node
        node.prev = self.head
        node.next = nxt
        nxt.prev = node

    def get(self, key: int) -> int:
        if key in self.cache:
            node = self.cache[key]
            self._remove(node)
            self._add(node)  # move to front
            return node.val
        return -1

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self._add(node)
        self.cache[key] = node
        if len(self.cache) > self.cap:
            # evict least recently used (tail.prev)
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

# Time Complexity: O(1) for both get and put
# Space Complexity: O(capacity)
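
The sample answer also mentions locks and collections.OrderedDict. As a complement to the manual reference solution, here is a minimal sketch that combines the two. The class name ThreadSafeLRUCache is hypothetical, and a single coarse lock is an assumption made for simplicity rather than a tuned design.

```python
import threading
from collections import OrderedDict


class ThreadSafeLRUCache:
    """LRU cache on OrderedDict; one lock serializes every operation."""

    def __init__(self, capacity: int):
        self.cap = capacity
        self._data = OrderedDict()  # oldest entry first, newest last
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key not in self._data:
                return -1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key, value) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            if len(self._data) > self.cap:
                self._data.popitem(last=False)  # evict least recently used


cache = ThreadSafeLRUCache(2)
cache.put(1, "a")
cache.put(2, "b")
cache.get(1)       # key 1 becomes the most recently used
cache.put(3, "c")  # evicts key 2
print(cache.get(2), cache.get(1), cache.get(3))
```

Like the reference solution, this returns -1 on a miss. The lock makes each call atomic, at the cost of serializing readers as well as writers.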

Coding: Given a stream of events, design a rate limiter for an API.
What a strong answer covers
- Token bucket vs sliding window algorithms
- Per-user/client rate limiting
- Distributed counters with Redis
- Handling bursts with smoothing
- Scalability: sharding and per-node limits
Sample answer
For an API rate limiter handling a stream of events, I would use the token bucket algorithm because it allows bursts while enforcing a long-term average rate. Each user gets a bucket of tokens that refills at a fixed rate; each request consumes a token, and requests are rejected when the bucket is empty. The bucket needs only two values per user: the last refill timestamp and the token count. In a distributed system, I'd keep that state in Redis and update it atomically (e.g., with a Lua script) so concurrent requests cannot race. For high throughput, I'd shard users across multiple Redis instances. A common pitfall is clock skew between application servers; using a single time source, such as the Redis server's clock, mitigates this. Alternatively, a sliding window log, stored as a sorted set of timestamps per user, counts precisely and avoids the boundary bursts of fixed windows, but uses more memory. The choice depends on whether burst tolerance or a strict limit matters more.
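The two algorithms named in the answer can be sketched in-process as below. This is a single-node illustration only: the class names, parameters, and the injectable clock argument are assumptions made so the behavior is easy to test, and a distributed version would keep the same state in a shared store such as Redis.

```python
import time
from collections import deque


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.last_refill = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Lazily add the tokens earned since the last call, capped at capacity.
        earned = (now - self.last_refill) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Allows at most `limit` requests in any trailing `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()  # accepted request times, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# A fake clock makes the demonstration deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]  # two allowed, third rejected
fake_now[0] = 1.0                           # one second later: one token earned
after_refill = bucket.allow()
print(burst, after_refill)
```

The token bucket stores two numbers per client, while the log stores one timestamp per accepted request, which is the memory trade-off the answer describes.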
System Design: Design a URL shortener that handles billions of redirects.
What a strong answer covers
- Base62 encoding for short IDs
- Key-value store for mapping (e.g., Redis, DynamoDB)
- Hash generation with collision resolution
- Caching popular URLs with CDN
- Scaling: consistent hashing and read replicas
Sample answer
A URL shortener serving billions of redirects must prioritize high availability and low latency. Use a distributed key-value store (e.g., Redis or DynamoDB) to map short codes to long URLs. To generate short codes, I'd use a distributed atomic counter (e.g., Redis INCR) and encode each number in Base62 (0-9, a-z, A-Z). Because every code is derived from a unique, auto-incrementing ID, collisions cannot occur. Read traffic dominates, so cache popular short URLs in a CDN or in-memory cache with an appropriate TTL. Writes are far less frequent and can be handled by a load-balanced set of application servers. To scale storage, shard the database by a hash of the short code using consistent hashing. A common pitfall is generating short codes that are too long; aim for 6-7 characters (62^6 ≈ 56.8 billion combinations). Finally, return a 301 (permanent) redirect so browsers can cache it, or a 302 (temporary) if every click must reach the server for analytics.
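The counter-to-Base62 step can be sketched as follows. The alphabet order (digits, then lowercase, then uppercase) follows the "0-9a-zA-Z" order given in the answer; the function names encode_base62 and decode_base62 are hypothetical, and the integer IDs stand in for values from the distributed counter.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a Base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    """Decode a Base62 short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + INDEX[ch]
    return n


largest_six_char_id = BASE**6 - 1
print(encode_base62(62), encode_base62(largest_six_char_id), BASE**6)
```

Sequential IDs produce guessable codes; whether that matters depends on the product, and it is a point worth raising in the interview.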
Behavioral: Describe an outage you helped resolve and what you changed afterward.
What a strong answer covers
- Specific incident with root cause
- Monitoring and alerting improvements
- Blameless postmortem culture
- Action items like runbooks and automation
- Testing changes in staging environment
Sample answer
Our payment service became unavailable because the database connection pool was exhausted. A new feature had introduced a slow query that held connections longer than expected, causing new requests to queue and time out. I led the postmortem: we identified the query from the slow query logs, added an appropriate index, and enforced a query timeout at the application level (e.g., 500 ms). We also added monitoring of connection pool utilization, with alerts when usage exceeds 80%. Additionally, we wrote a runbook for connection pool issues and integrated it into on-call training. The key process change was requiring every new query to be reviewed with an explain plan and load-tested in staging. The incident reinforced the importance of proactive monitoring and a blameless culture for faster learning.
Behavioral: Tell me about a time you had to make a hard data-consistency trade-off.
What a strong answer covers
- CAP theorem: availability vs consistency
- Eventual consistency with idempotency
- Compensating transactions
- Conflict resolution strategies (e.g., LWW)
- Example from distributed system design
Sample answer
In a distributed payment system, we chose eventual consistency over strong consistency for read-heavy operations, accepting that users might briefly see outdated balances. The trade-off was necessary to maintain high availability across multiple regions and meet our SLAs. We implemented idempotency keys on write operations to prevent duplicate charges and used last-write-wins (LWW) conflict resolution for balance updates. For sensitive operations (e.g., transfers), we used two-phase commit with a coordinator, but for most reads we accepted data that could be stale by up to a few hundred milliseconds. This was justified because users could refresh and the system would converge quickly. Refunds used the compensating transaction pattern. A common pitfall is having no clear timeout or fallback when a strong-consistency check cannot complete; we added a 'consistency level' parameter per request so callers could choose. This trade-off required thorough documentation and client education to manage expectations.

What interviewers assess

Data modeling

Relational vs. NoSQL, indexing, normalization and transactions.

API design

REST/GraphQL/gRPC, idempotency, pagination and versioning.

Concurrency

Locks, race conditions, queues and eventual consistency.

System design

Caching, sharding, replication and failure handling at scale.

Algorithms

Complexity analysis and practical data-structure choices.

How to prepare

- State your assumptions and constraints before designing; interviewers reward scoping.
- Always analyze time and space complexity for coding answers.
- For system design, drive the conversation: requirements, API, data model, scale, failure modes.

Frequently asked questions

What system design questions come up in backend interviews?

Common prompts include designing a URL shortener, a rate limiter, a news feed, or a chat system, with discussion of caching, sharding and consistency.

How much algorithms knowledge do backend interviews need?

Most companies still include one or two coding rounds focused on data structures and complexity, even though day-to-day work is more about systems.

How do I practice backend interviews effectively?

Combine coding drills with spoken system-design practice, and run mock interviews so you can defend trade-offs out loud.

Practice Backend Engineer questions with instant AI feedback

Offersly runs a mock interview tailored to your resume and target role, then scores every answer on relevance, depth, clarity and correctness.
