Microservices Interview Questions

Microservices interviews assess your ability to design and build scalable, decentralized systems. They often target senior engineers and architects with deep knowledge of distributed computing, APIs, and data consistency. Expect a mix of architectural discussions, trade-off analyses, and practical coding scenarios.

What Microservices interviews cover

Service Decomposition & Design

How to split a monolith into services, bounded contexts, and domain-driven design principles.

Inter-Service Communication

Synchronous vs asynchronous patterns, API gateways, service mesh, and communication protocols.

Data Management & Consistency

Handling distributed transactions, eventual consistency, sagas, and CQRS/event sourcing.

Deployment & Observability

Containerization, orchestration (Kubernetes), monitoring, logging, and tracing in distributed systems.

Sample Microservices interview questions

How would you decompose a large monolith into microservices? Walk through your decision process.
What a strong answer covers
- Start with domain analysis and bounded contexts using Domain-Driven Design.
- Identify subdomains based on business capabilities and core domains.
- Use event storming to understand workflows and aggregates.
- Define service boundaries by data ownership and change frequency.
- Implement incremental migration via strangler fig pattern and feature toggles.
Sample answer
Decomposition begins with domain analysis: interview domain experts and model the business as subdomains (core, supporting, generic). Use event storming to discover aggregates and bounded contexts. Prioritize for extraction the modules that change frequently and have few dependencies on the rest of the monolith, since tightly coupled modules are the hardest to separate. Define each service's data ownership and API (REST or event-driven). Migrate incrementally using the strangler fig pattern: put a routing layer in front of the monolith, build the new service alongside it, route that capability's traffic to the new service, and gradually retire the corresponding monolith code. Use feature toggles to switch between old and new implementations. Pitfalls: losing transactional consistency, added network latency, and data synchronization issues between services. Mitigate with sagas and by designing for eventual consistency.
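
The strangler fig migration with feature toggles can be sketched roughly as follows. This is a minimal in-process illustration, not a production router: `StranglerRouter`, the `get_invoice` capability, and both handler functions are hypothetical names, and a real system would typically do this routing in a reverse proxy or API gateway.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], dict]


# Hypothetical handlers: one inside the legacy monolith, one in the extracted service.
def monolith_get_invoice(invoice_id: str) -> dict:
    return {"id": invoice_id, "source": "monolith"}


def billing_service_get_invoice(invoice_id: str) -> dict:
    return {"id": invoice_id, "source": "billing-service"}


class StranglerRouter:
    """Facade in front of the monolith that routes traffic per capability."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Handler] = {}
        self._extracted: Dict[str, Handler] = {}
        self._toggles: Dict[str, bool] = {}

    def register(self, capability: str, legacy: Handler,
                 extracted: Optional[Handler] = None) -> None:
        self._legacy[capability] = legacy
        if extracted is not None:
            self._extracted[capability] = extracted
        # New implementations start switched off.
        self._toggles.setdefault(capability, False)

    def set_toggle(self, capability: str, enabled: bool) -> None:
        self._toggles[capability] = enabled

    def handle(self, capability: str, arg: str) -> dict:
        # Use the extracted service only if it exists and its toggle is on.
        use_new = self._toggles.get(capability, False) and capability in self._extracted
        handler = self._extracted[capability] if use_new else self._legacy[capability]
        return handler(arg)


router = StranglerRouter()
router.register("get_invoice", monolith_get_invoice, billing_service_get_invoice)

before = router.handle("get_invoice", "inv-1")
router.set_toggle("get_invoice", True)   # cut traffic over to the new service
after = router.handle("get_invoice", "inv-1")
print(before["source"], "->", after["source"])  # monolith -> billing-service
```

Because the toggle is checked on every call, switching it back off is an immediate rollback path if the extracted service misbehaves.
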
Explain the difference between orchestration and choreography in saga patterns. When would you use each?
What a strong answer covers
- Orchestration uses a central coordinator to manage saga steps and compensations.
- Choreography relies on event-driven interactions where each service reacts to events.
- Orchestration suits complex workflows with many services and strict ordering requirements.
- Choreography is better for simpler workflows, evolving systems, and lower coupling.
Sample answer
In a saga, orchestration uses a coordinator service that tells each participant what to do and handles compensations centrally. This gives clear flow control and error handling, but the coordinator can become a single point of failure and every participant is coupled to it. Choreography uses events: each service publishes an event after its local transaction, and downstream services subscribe and react. This minimizes direct dependencies but scatters the workflow logic across services and makes the saga's progress harder to monitor. Choose orchestration when workflows are complex, require strict ordering, or need explicit visibility. Choose choreography when workflows are simple, loose coupling matters most, and the team can live with harder event-flow debugging. Both styles are eventually consistent.
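
To make the orchestration side concrete, here is a minimal sketch of a coordinator that runs steps in order and compensates in reverse on failure. `SagaStep`, `SagaOrchestrator`, and the three example steps are hypothetical; a real orchestrator would also persist saga state and call remote services rather than local functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]
    compensation: Callable[[dict], None]


class SagaOrchestrator:
    """Central coordinator: runs steps in order, compensates in reverse on failure."""

    def __init__(self, steps: List[SagaStep]) -> None:
        self.steps = steps

    def run(self, ctx: dict) -> bool:
        completed: List[SagaStep] = []
        for step in self.steps:
            try:
                step.action(ctx)
                completed.append(step)
            except Exception as exc:
                ctx["error"] = f"{step.name}: {exc}"
                # Undo only the steps that actually completed, newest first.
                for done in reversed(completed):
                    done.compensation(ctx)
                return False
        return True


def record(event: str) -> Callable[[dict], None]:
    """Build a stand-in action that just appends to a log in the saga context."""
    def _do(ctx: dict) -> None:
        ctx.setdefault("log", []).append(event)
    return _do


def fail_payment(ctx: dict) -> None:
    raise RuntimeError("card declined")


steps = [
    SagaStep("create_order", record("order created"), record("order cancelled")),
    SagaStep("reserve_stock", record("stock reserved"), record("stock released")),
    SagaStep("charge_payment", fail_payment, record("payment refunded")),
]
ctx: dict = {}
ok = SagaOrchestrator(steps).run(ctx)
print(ok, ctx["log"])
# False ['order created', 'stock reserved', 'stock released', 'order cancelled']
```

The failed payment step is never compensated because it never completed, which is the ordering guarantee a central coordinator makes easy to reason about.
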

Write a simple circuit breaker implementation in your preferred language (pseudocode accepted).

What a strong answer covers

- Circuit breaker has three states: CLOSED, OPEN, HALF_OPEN.
- Tracks the failure count and transitions to OPEN when the threshold is reached.
- After a timeout, goes to HALF_OPEN to test whether the service has recovered.
- Uses a lock to handle concurrency.

Sample answer

A circuit breaker wraps a remote service call and monitors failures. When failures reach a threshold, it opens the circuit and fails fast on subsequent calls without invoking the service, preventing cascading failures. After a recovery timeout, it transitions to half-open to allow a trial call. If the trial succeeds, the circuit closes; otherwise it reopens. The Python reference solution uses a threading lock to protect the shared state. The wrapper adds O(1) overhead per call; the wrapped function determines the actual cost. Pitfalls: not resetting the failure count after a success, tripping on brief transient failures, and thresholds and timeouts that need careful tuning.

Reference solution (Python)

import time
import threading

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.state = 'CLOSED'
        self.last_failure_time = None
        self.lock = threading.Lock()

    def call(self, func, *args, **kwargs):
        with self.lock:
            if self.state == 'OPEN':
                # Monotonic clock: unaffected by wall-clock adjustments.
                if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                    self.state = 'HALF_OPEN'
                else:
                    raise RuntimeError("Circuit breaker is OPEN")
        # The lock is released while func runs so slow calls do not block other callers.
        try:
            result = func(*args, **kwargs)
        except Exception:
            with self.lock:
                self.failure_count += 1
                self.last_failure_time = time.monotonic()
                # A failed trial call in HALF_OPEN reopens the circuit immediately.
                if self.state == 'HALF_OPEN' or self.failure_count >= self.failure_threshold:
                    self.state = 'OPEN'
            raise  # re-raise with the original traceback
        else:
            with self.lock:
                # Any success resets the count, so only consecutive failures trip the breaker.
                self.failure_count = 0
                if self.state == 'HALF_OPEN':
                    self.state = 'CLOSED'
            return result

# Usage example
def unreliable_service():
    if time.time() % 10 < 3:  # Simulate failure
        raise Exception("Service error")
    return "Success"

cb = CircuitBreaker(failure_threshold=3, recovery_timeout=5)
for i in range(20):
    try:
        print(cb.call(unreliable_service))
    except Exception as e:
        print(e)
    time.sleep(1)

# Time complexity: O(1) for call (excluding function). Space: O(1).

How do you handle distributed transactions with eventual consistency? Provide a concrete example.
What a strong answer covers
- Use saga pattern with compensating transactions for rollback.
- Ensure idempotency of operations to handle retries.
- Publish events after local transactions and rely on asynchronous processing.
- Example: order placement with inventory and payment services.
Sample answer
Distributed transactions in microservices usually rely on eventual consistency. ACID guarantees across services require a distributed commit protocol such as two-phase commit (2PC), which holds locks across participants and reduces availability. A common alternative is the saga pattern, where each service executes a local transaction and emits an event. If a step fails, compensating transactions undo the previous steps. For example, in an order saga, Order Service creates an order in PENDING state and publishes an 'OrderCreated' event. Inventory Service reserves stock and publishes 'InventoryReserved'. Payment Service then attempts the charge. If payment fails, Inventory Service reacts to the failure event and releases the reservation (the compensating action), and the order is cancelled. Idempotency keys make retries and duplicate messages safe. Sagas can be orchestrated (central coordinator) or choreographed (events). Pitfalls: every step needs an explicitly designed compensating action, and temporary inconsistencies (e.g., an order visible while payment is pending) must be acceptable to users.
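
The order saga above can be sketched as a choreography over an in-memory event bus. This is only an illustration under several assumptions: `EventBus` dispatches synchronously in-process (a real system would use a message broker), the 'PaymentCompleted' and 'PaymentFailed' event names and the payment `limit` rule are invented for the example, and stock exhaustion is not handled.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class EventBus:
    """Synchronous in-memory stand-in for a message broker."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        for handler in list(self._subs[event_type]):
            handler(payload)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.bus = bus
        self.orders: Dict[str, str] = {}
        bus.subscribe("PaymentCompleted", self.on_payment_completed)
        bus.subscribe("PaymentFailed", self.on_payment_failed)

    def place_order(self, order_id: str, sku: str, amount: int) -> None:
        self.orders[order_id] = "PENDING"  # local transaction first, then the event
        self.bus.publish("OrderCreated", {"order_id": order_id, "sku": sku, "amount": amount})

    def on_payment_completed(self, event: dict) -> None:
        self.orders[event["order_id"]] = "CONFIRMED"

    def on_payment_failed(self, event: dict) -> None:
        self.orders[event["order_id"]] = "CANCELLED"


class InventoryService:
    def __init__(self, bus: EventBus, stock: Dict[str, int]) -> None:
        self.bus = bus
        self.stock = stock
        self.seen: Set[str] = set()         # order IDs already handled (idempotency)
        self.reserved: Dict[str, str] = {}  # order_id -> reserved sku
        bus.subscribe("OrderCreated", self.on_order_created)
        bus.subscribe("PaymentFailed", self.on_payment_failed)

    def on_order_created(self, event: dict) -> None:
        if event["order_id"] in self.seen:  # duplicate delivery: do nothing
            return
        self.seen.add(event["order_id"])
        self.stock[event["sku"]] -= 1
        self.reserved[event["order_id"]] = event["sku"]
        self.bus.publish("InventoryReserved", event)

    def on_payment_failed(self, event: dict) -> None:
        # Compensating action; pop() makes a repeated failure event harmless.
        sku = self.reserved.pop(event["order_id"], None)
        if sku is not None:
            self.stock[sku] += 1


class PaymentService:
    def __init__(self, bus: EventBus, limit: int) -> None:
        self.bus = bus
        self.limit = limit  # hypothetical rule: charges above the limit are declined
        self.processed: Set[str] = set()
        bus.subscribe("InventoryReserved", self.on_inventory_reserved)

    def on_inventory_reserved(self, event: dict) -> None:
        if event["order_id"] in self.processed:
            return
        self.processed.add(event["order_id"])
        outcome = "PaymentCompleted" if event["amount"] <= self.limit else "PaymentFailed"
        self.bus.publish(outcome, event)


bus = EventBus()
orders = OrderService(bus)
inventory = InventoryService(bus, {"widget": 5})
payments = PaymentService(bus, limit=100)

orders.place_order("o1", "widget", 50)    # payment succeeds
orders.place_order("o2", "widget", 500)   # payment fails, reservation is released
bus.publish("OrderCreated", {"order_id": "o1", "sku": "widget", "amount": 50})  # duplicate
print(orders.orders, inventory.stock)
# {'o1': 'CONFIRMED', 'o2': 'CANCELLED'} {'widget': 4}
```

No service calls another directly; the workflow exists only as the chain of subscriptions, which is what makes choreographed sagas loosely coupled and also harder to trace.
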
Design an API gateway for a system with 10+ microservices. What features would it include?
What a strong answer covers
- Single entry point for all microservices, implementing routing and load balancing.
- Authentication and authorization (OAuth2, JWT).
- Rate limiting and throttling per service or user.
- Request/response transformation (protocol translation, aggregation).
- Caching, monitoring, logging, and circuit breaker integration.
Sample answer
An API gateway sits between clients and microservices, providing a unified interface. Key features include:
1) Routing – maps incoming requests to the appropriate service instances.
2) Authentication – validates tokens (JWT) and enforces access control.
3) Rate limiting – prevents abuse by limiting requests per client.
4) Load balancing – distributes requests across service instances.
5) Caching – caches responses to reduce load (e.g., for read-heavy endpoints).
6) Request/response transformation – aggregates data from multiple services or converts protocols (e.g., REST to gRPC).
7) Monitoring – collects metrics (latency, errors) and logs.
8) Circuit breaker – protects services from cascading failures.
9) API versioning – routes to different service versions.
Pitfalls: the gateway can become a bottleneck or single point of failure; keep business logic out of the gateway.
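
Per-client rate limiting, one of the gateway features listed above, is often built on a token bucket. The following is a minimal single-process sketch, assuming hypothetical `TokenBucket` and `RateLimiter` classes; it is not thread-safe, and a gateway running several instances would normally keep the counters in a shared store.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client ID."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = self._buckets[client_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return bucket.allow()


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
limiter = RateLimiter(rate=1.0, capacity=2, clock=lambda: fake_now[0])

results = [limiter.allow("client-a") for _ in range(3)]
fake_now[0] = 1.0                      # one second later, one token has refilled
results.append(limiter.allow("client-a"))
print(results)  # [True, True, False, True]
```

A gateway would typically translate a `False` result into an HTTP 429 response for that client.
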
Describe a scenario where you would choose gRPC over REST for inter-service communication.
What a strong answer covers
- gRPC uses HTTP/2, binary framing, and protocol buffers for better performance.
- Supports bidirectional streaming, ideal for real-time data.
- Strongly typed contracts enforce service interface consistency.
- Best for internal services within same data center or low-latency requirements.
Sample answer
gRPC is preferred over REST when throughput and low latency are critical, especially for internal service-to-service communication. Its binary format (Protocol Buffers) is typically smaller and faster to serialize than JSON. HTTP/2 multiplexes many requests over one connection, which avoids HTTP-level head-of-line blocking, and supports streaming (server-streaming, client-streaming, bidirectional). gRPC also generates client and server stubs from .proto files, giving both sides a typed contract. Example scenario: a real-time analytics pipeline where services continuously stream data for aggregation. REST is simpler for external APIs, is human-readable, and suits diverse clients (browsers, mobile). Pitfalls: gRPC requires end-to-end HTTP/2 support, which not all load balancers and proxies provide; binary payloads are harder to debug; browsers cannot call gRPC directly and need gRPC-Web.
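
The payload-size argument can be illustrated with the standard library alone. The sketch below compares a JSON encoding with a fixed-layout binary encoding built with `struct`; this is a stand-in, not Protocol Buffers, which uses a different wire format (field tags and variable-length integers). The `reading` record and its field layout are made up for the example.

```python
import json
import struct

# Hypothetical sensor reading that a service might stream to an aggregator.
reading = {"sensor_id": 42, "timestamp": 1700000000, "value": 21.5}

# Text encoding: field names travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Binary encoding: the schema (uint32, uint64, float64; little-endian, no padding)
# is agreed in advance, so only the values are sent.
LAYOUT = "<IQd"
as_binary = struct.pack(LAYOUT, reading["sensor_id"], reading["timestamp"], reading["value"])

sensor_id, timestamp, value = struct.unpack(LAYOUT, as_binary)
decoded = {"sensor_id": sensor_id, "timestamp": timestamp, "value": value}

print(len(as_json), "bytes as JSON vs", len(as_binary), "bytes as binary")
```

The trade-off is the one named in the pitfalls: the binary form is unreadable without the schema, so debugging needs tooling that knows the contract.
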
How would you implement distributed tracing across microservices? What tools would you use?
What a strong answer covers
- Use OpenTelemetry for instrumentation, generating trace IDs and span contexts.
- Propagate context via HTTP headers (e.g., traceparent).
- Collect traces and store in a backend like Jaeger or Zipkin.
- Use sampling to reduce overhead while retaining diagnostic data.
Sample answer
Distributed tracing is implemented by instrumenting each service to generate spans, each representing a unit of work. OpenTelemetry provides SDKs and instrumentation libraries that capture incoming and outgoing requests and attach a trace ID and parent span ID. This context is propagated across service calls via HTTP headers (e.g., the W3C 'traceparent' header). A collector (e.g., the OpenTelemetry Collector) forwards spans to a backend (Jaeger, Zipkin) for storage and visualization. Sampling strategies (rate-limiting, probabilistic) control trace volume. Tools: OpenTelemetry for instrumentation, Jaeger or Zipkin for storage/UI, and Prometheus/Grafana for metrics correlation. Pitfalls: high overhead if the sampling rate is too high; broken or missing traces if context propagation is not implemented in every service; clock skew between services distorting span timings.
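
Context propagation is the part candidates most often get wrong, so here is a hand-rolled sketch of what happens to the 'traceparent' header between hops. It follows the W3C Trace Context layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags); the function names are hypothetical, and in practice the OpenTelemetry SDK's propagators do this for you.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# version "00" - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_span_id, flags), or None if the header is invalid."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:  # all-zero IDs are invalid
        return None
    return trace_id, parent_id, flags


def outgoing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Headers for a downstream call: same trace ID, this service's new span ID.

    Assumes header names were already lower-cased by the web framework.
    """
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is None:
        return {"traceparent": new_traceparent()}  # no usable context: start a trace
    trace_id, _parent_id, flags = parsed
    return {"traceparent": f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"}


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
downstream = outgoing_headers(incoming)
print(downstream["traceparent"])
```

Every hop keeps the trace ID and replaces the span ID with its own, which is how the backend reconstructs the parent-child tree; a service that drops the header silently splits the trace in two.
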

Code a health check endpoint that reports the status of downstream dependencies (e.g., database, cache, other services).

What a strong answer covers

- Endpoint returns overall status and per-dependency status (ok/error).
- Checks database connectivity with a simple query or ping.
- Checks cache (e.g., Redis) with ping command.
- Checks other services via HTTP health endpoint.
- Timeout and error handling to avoid blocking the health endpoint.

Sample answer

A health check endpoint reports the service's own status and its dependencies. This implementation uses Flask and checks a PostgreSQL database (connects with timeout), Redis cache (ping), and an external service (HTTP GET with timeout). Each check returns a status dictionary. The overall status is 'ok' only if all dependencies are healthy; otherwise it returns HTTP 503 and indicates 'degraded'. The endpoint sets timeouts to prevent hanging. Pitfalls: health checks themselves can become a point of failure if they are too heavy or have long timeouts; avoid checking every single connection and instead use lightweight probes. Also, consider caching health results to reduce load.

Reference solution (Python)

from flask import Flask, jsonify
import redis
import psycopg2
import requests
import os

app = Flask(__name__)

# Dependency configuration (from env, with local-development defaults).
# DB_NAME, DB_USER and DB_PASSWORD are illustrative variable names;
# never hardcode real credentials.
DB_HOST = os.environ.get('DB_HOST', 'localhost')
DB_NAME = os.environ.get('DB_NAME', 'mydb')
DB_USER = os.environ.get('DB_USER', 'user')
DB_PASSWORD = os.environ.get('DB_PASSWORD', 'pass')
REDIS_HOST = os.environ.get('REDIS_HOST', 'localhost')
SERVICE_URL = os.environ.get('SERVICE_URL', 'http://other-service/health')

def check_database():
    conn = None
    try:
        conn = psycopg2.connect(host=DB_HOST, dbname=DB_NAME, user=DB_USER,
                                password=DB_PASSWORD, connect_timeout=3)
        with conn.cursor() as cur:
            cur.execute('SELECT 1')  # trivial query proves the server answers
            cur.fetchone()
        return {"status": "ok"}
    except Exception as e:
        return {"status": "error", "message": str(e)}
    finally:
        if conn is not None:
            conn.close()  # close even when the query fails

def check_cache():
    try:
        # socket_timeout bounds the PING itself, not just the connection attempt.
        r = redis.Redis(host=REDIS_HOST, socket_connect_timeout=3, socket_timeout=3)
        r.ping()
        return {"status": "ok"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

def check_other_service():
    try:
        resp = requests.get(SERVICE_URL, timeout=3)
        if resp.status_code == 200:
            return {"status": "ok"}
        else:
            return {"status": "error", "message": "non-200 response"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

@app.route('/health')
def health():
    # Checks run sequentially, so the worst case is the sum of the timeouts.
    deps = {
        "database": check_database(),
        "cache": check_cache(),
        "other_service": check_other_service()
    }
    overall = all(dep["status"] == "ok" for dep in deps.values())
    status_code = 200 if overall else 503
    return jsonify({"status": "ok" if overall else "degraded", "dependencies": deps}), status_code

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

# Time complexity: O(1) per dependency check (network time dominant). Space: O(1).
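
The suggestion to cache health results can be sketched as a small wrapper around any check function. `CachedCheck` is a hypothetical helper, not part of Flask; the TTL value and the stand-in `probe` function are assumptions for the example.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class CachedCheck:
    """Wraps a dependency check and reuses its result for `ttl_seconds`."""

    def __init__(self, check: Callable[[], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Tuple[float, dict]] = None
        self._lock = threading.Lock()

    def __call__(self) -> dict:
        # Holding the lock during the check means concurrent probes wait for
        # one result instead of all hitting the dependency at once.
        with self._lock:
            now = self._clock()
            if self._cached is not None and now - self._cached[0] < self._ttl:
                return self._cached[1]
            result = self._check()
            self._cached = (now, result)
            return result


# Demonstration with a fake clock and a probe that counts its invocations.
calls = []

def probe() -> dict:
    calls.append(1)
    return {"status": "ok"}

fake_now = [100.0]
cached_probe = CachedCheck(probe, ttl_seconds=5, clock=lambda: fake_now[0])

cached_probe()
cached_probe()            # within the TTL: served from cache
fake_now[0] = 106.0
cached_probe()            # TTL expired: the probe runs again
print(len(calls))         # 2
```

In the endpoint, each dependency entry would call the wrapped version (for example `CachedCheck(check_database, ttl_seconds=5)`) so that frequent probes from a load balancer or orchestrator do not multiply load on the dependencies. The cost is that a failure can go unreported for up to one TTL.
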

How to prepare

- Master distributed systems fundamentals: CAP theorem, consistency models, and fault tolerance.
- Practice service decomposition using domain-driven design (entities, aggregates, bounded contexts).
- Understand eventual consistency and how to implement saga patterns (e.g., with compensating transactions).
- Be ready to discuss trade-offs: orchestration vs choreography, synchronous vs asynchronous, stateful vs stateless.
- Prepare for system design problems: practice whiteboarding a microservices architecture with a focus on communication, data, and observability.

Frequently asked questions

What are the most important concepts for a microservices interview?

Key concepts include service decomposition, inter-service communication (REST, gRPC, messaging), data consistency (sagas, eventual consistency), and observability (logging, metrics, traces).

Should I know about Kubernetes for microservices interviews?

Yes, Kubernetes is often used to orchestrate containers. You should understand pods, services, deployments, and how to manage microservices in a cluster.

How do microservices interviews differ from general system design interviews?

Microservices interviews focus specifically on distributed architecture challenges like service boundaries, data distribution, and network resilience, while general system design may cover monoliths as well.

What coding questions can I expect in a microservices interview?

You might be asked to implement a circuit breaker, a retry mechanism, or a simple service health endpoint. Also, you may need to design an API or write code for inter-service communication.
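
As an illustration of the retry mechanism mentioned above, here is a minimal sketch using exponential backoff with full jitter. The `retry` helper, its defaults, and the `flaky` function are hypothetical; which exceptions count as retryable depends on the client library in use.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T],
          attempts: int = 4,
          base_delay: float = 0.1,
          max_delay: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on the given exceptions with capped, jittered backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: a random delay up to the capped exponential backoff,
            # so many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Demonstration: fails twice, then succeeds. Delays are recorded instead of slept.
state = {"calls": 0}

def flaky() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

delays = []
result = retry(flaky, sleep=delays.append)
print(result, state["calls"], len(delays))  # ok 3 2
```

Retries are only safe for idempotent operations, and they pair naturally with a circuit breaker so that a dependency that is down is not hammered with repeated attempts.
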

How can I demonstrate experience with microservices if I haven't worked with them in production?

Build a small project using microservices patterns (e.g., with Docker, Node.js/Spring Boot, and a message queue). Discuss trade-offs and lessons learned. Theoretical knowledge combined with a personal project can be convincing.

