AWS Interview Questions
AWS interviews for senior engineering roles test your depth across core services, architecture patterns, and hands-on problem-solving. You'll face a mix of conceptual discussions (e.g., trade-offs between services) and practical exercises (e.g., designing a scalable system or debugging a deployment). This page covers the most common AWS interview questions, organized by subtopic, along with actionable prep tips and an FAQ.
What AWS interviews cover
Compute & Serverless
Covers EC2, Lambda, ECS/EKS, and scalability decisions. Expect questions on auto-scaling, cold starts, and choosing the right compute service.
Storage & Databases
Includes S3, EBS, RDS, DynamoDB, and Redshift. Focus on consistency models, indexing, and cost-performance trade-offs.
Networking & Security
VPC design, subnets, security groups, NACLs, IAM policies, and encryption. Questions often involve designing a secure multi-tier architecture.
Architecture & DevOps
Patterns like microservices, event-driven, CI/CD pipelines, Infrastructure as Code (CloudFormation, CDK), and high availability/disaster recovery.
Sample AWS interview questions
- Explain the difference between a Security Group and a Network ACL. When would you use each?
What a strong answer covers
- Security Groups are stateful; NACLs are stateless.
- Security Groups operate at the instance level; NACLs at the subnet level.
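- Security Groups contain only allow rules and evaluate all of them together; NACLs support allow and deny rules, evaluated in rule-number order, stopping at the first match.
- Default behavior: a newly created Security Group denies all inbound traffic; the default NACL allows all traffic, while a custom NACL denies all traffic until you add rules.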
View a sample answer
A Security Group acts as a virtual firewall for a resource such as an EC2 instance, controlling inbound and outbound traffic at the instance level. It is stateful: if you allow inbound traffic, the response is automatically allowed regardless of outbound rules. In contrast, a Network ACL (NACL) is a stateless firewall that operates at the subnet level, so it needs explicit rules for both inbound and outbound traffic. Security Groups contain only allow rules and evaluate all of them together, while NACLs support both allow and deny rules, evaluate them in ascending rule-number order, and stop at the first match. You would use Security Groups when you need fine-grained, stateful control per resource, such as for an EC2 instance or an RDS database. NACLs are used for broader subnet-level controls, like blocking specific IP ranges at the subnet boundary, and often serve as a secondary layer of defense. A common pitfall is forgetting that NACLs are stateless: if the outbound rules do not allow return traffic on the ephemeral port range, connections fail even though the inbound rule matches.
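The difference in rule evaluation can be sketched in plain Python. This is a simplified model rather than AWS's implementation: the `Rule` type and both functions are hypothetical names, rules match on a single port only, and statefulness is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    number: int   # NACL rule number; unused for security groups
    port: int     # single destination port, to keep the model small
    allow: bool   # security groups only ever hold allow rules


def nacl_allows(rules: List[Rule], port: int) -> bool:
    """Evaluate rules in ascending rule-number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.allow
    return False  # nothing matched, so the implicit final rule denies


def security_group_allows(rules: List[Rule], port: int) -> bool:
    """Allow if any rule matches; order is irrelevant and there are no denies."""
    return any(rule.allow and rule.port == port for rule in rules)


nacl_rules = [Rule(100, 22, False), Rule(200, 22, True), Rule(300, 443, True)]
sg_rules = [Rule(0, 22, True), Rule(0, 443, True)]

# Rule 100 denies port 22 before rule 200 is ever reached.
print("NACL allows 22:", nacl_allows(nacl_rules, 22))
print("SG allows 22:", security_group_allows(sg_rules, 22))
```

In the NACL case the lower-numbered deny rule shadows the later allow rule, which is why rule numbering matters there and not in a security group.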
- Design a highly available and fault-tolerant web application on AWS. Include compute, storage, and networking components.
What a strong answer covers
- Use Auto Scaling Group across multiple AZs with Elastic Load Balancer.
- Multi-AZ RDS for database or DynamoDB for NoSQL.
- Store static assets in S3 with CloudFront CDN.
- Use Route53 for DNS routing with health checks.
View a sample answer
To design a highly available and fault-tolerant web application on AWS, start with an Application Load Balancer (ALB) distributing traffic across an Auto Scaling Group of EC2 instances deployed in at least two Availability Zones. The instances should launch from an AMI that contains the application code, or receive code from a CI/CD pipeline. For storage, use Amazon S3 for static assets (images, CSS), optionally fronted by CloudFront for content delivery. For the database, use Amazon RDS with a Multi-AZ deployment for automatic failover, or Amazon DynamoDB for a fully managed NoSQL option that replicates data across AZs within a Region. For session state, consider ElastiCache or DynamoDB so that instances stay stateless. Within a Region, the load balancer's health checks and the Auto Scaling Group handle the loss of an instance or an AZ; Route 53 health checks with failover routing come into play when traffic must be redirected to a standby endpoint, such as a deployment in a second Region. Pitfalls include not distributing instances across AZs, missing or misconfigured load balancer health checks, and ignoring replication lag on read replicas (a Multi-AZ standby is replicated synchronously, while read replicas are asynchronous). Finally, make sure data backups and a disaster recovery plan are in place.
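A rough way to reason about the benefit of spreading instances across AZs is the parallel-availability formula. The sketch below assumes AZ failures are independent and that any single AZ can carry the full load; the 99.5% per-AZ figure is an illustrative assumption, not a published AWS number, and `combined_availability` is a hypothetical helper name.

```python
def combined_availability(per_az_availability: float, az_count: int) -> float:
    """Probability that at least one of az_count independent AZs is up."""
    if not 0.0 <= per_az_availability <= 1.0:
        raise ValueError("per_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    # The system is down only when every AZ is down at the same time.
    return 1.0 - (1.0 - per_az_availability) ** az_count


ASSUMED_PER_AZ = 0.995  # illustrative assumption only

for az_count in (1, 2, 3):
    availability = combined_availability(ASSUMED_PER_AZ, az_count)
    print(f"{az_count} AZ(s): {availability:.7f}")
```

Real failures are not fully independent, so treat the result as an upper bound that motivates the design rather than a figure to quote.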
- How would you migrate a monolithic on-premises application to AWS with minimal downtime? Discuss strategy and tools.
What a strong answer covers
- Use AWS Application Migration Service (MGN) for lift-and-shift rehosting.
- Rely on MGN's continuous block-level replication to a staging area in AWS.
- Test cutover in a sandbox environment.
- Use Route53 weighted routing to shift traffic gradually.
View a sample answer
A common strategy for migrating a monolithic on-premises application to AWS with minimal downtime is a lift-and-shift (rehost) approach with AWS Application Migration Service (MGN). MGN continuously replicates the entire server to AWS with minimal impact on the source. First, install the replication agent on the on-premises server. MGN then copies the server's disks to a staging area in AWS and keeps them in sync. Once replication has caught up, launch a test instance to validate functionality, networking, and performance. For the cutover, you can use Route 53 weighted routing to gradually shift traffic from the on-premises environment to AWS while monitoring health checks; this works best when both environments share one data store, since writes landing on two separate databases will diverge. A VPN or Direct Connect link lets the two environments communicate during the transition. After a successful cutover, decommission the on-premises server. Pitfalls include missing dependencies, data consistency issues during replication (especially for stateful applications), and underestimating network latency. AWS DMS can help with the database migration if needed.
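The gradual traffic shift can be sketched as a Route 53 change batch with one weighted record per environment. This is a minimal sketch: `weighted_change_batch` is a hypothetical helper, and the hostnames are placeholders. It only builds the dictionary; sending it would mean passing it as `ChangeBatch` to the boto3 Route 53 client's `change_resource_record_sets` call along with your hosted zone ID.

```python
from typing import Dict


def weighted_change_batch(record_name: str, targets: Dict[str, int], ttl: int = 60) -> dict:
    """Build a ChangeBatch with one weighted CNAME record per target.

    targets maps a target DNS name to its relative weight (0-255).
    """
    changes = []
    for dns_name, weight in targets.items():
        if not 0 <= weight <= 255:
            raise ValueError(f"weight for {dns_name} must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": dns_name,  # must be unique per weighted record
                "Weight": weight,
                "TTL": ttl,                 # short TTL so weight changes take effect quickly
                "ResourceRecords": [{"Value": dns_name}],
            },
        })
    return {"Comment": "Gradual cutover to AWS", "Changes": changes}


# Placeholder hostnames: send roughly 10% of lookups to the AWS environment.
batch = weighted_change_batch(
    "app.example.com",
    {"onprem.example.com": 90, "aws.example.com": 10},
)
for change in batch["Changes"]:
    record = change["ResourceRecordSet"]
    print(record["SetIdentifier"], record["Weight"])
```

Raising the AWS weight in steps (for example 10, 50, 100) while watching error rates gives a rollback path at each stage: setting the weight back to 0 stops new lookups from resolving to AWS.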
- Write a Lambda function in Python that processes S3 event notifications (e.g., resizing images). Show the handler and IAM policy.
What a strong answer covers
- Lambda triggered by S3 ObjectCreated events.
- Use PIL/Pillow for image processing.
- Handler receives event with bucket and key.
- The execution role's IAM policy allows reading the source bucket, writing the destination bucket, and writing to CloudWatch Logs.
View a sample answer
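The Lambda function processes S3 events by resizing images upon upload. It uses Pillow (the maintained fork of the Python Imaging Library) to shrink each image to fit within a target width and height while preserving its aspect ratio, then saves the result to another S3 bucket (e.g., 'resized-images'). The execution role's IAM policy must grant permission to read from the source bucket and write to the destination bucket. The function should handle errors such as invalid image formats. Common pitfalls include packaging Pillow incorrectly (it contains native code, so it must be built for the Lambda runtime and shipped in the deployment package, a layer, or a container image), forgetting to raise the timeout and memory for larger images, and writing output to the same bucket that triggers the function, which causes a recursive invocation loop.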
Reference solution
```python
import io
import os
import urllib.parse

import boto3
from PIL import Image

s3 = boto3.client('s3')

# Bounding box for the resized image; thumbnail() preserves aspect ratio.
TARGET_SIZE = (300, 300)


def lambda_handler(event, context):
    # Each record describes one object that triggered the notification
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Only process image files
        if not key.lower().endswith(('.png', '.jpg', '.jpeg', '.gif')):
            continue

        # Download image from S3
        response = s3.get_object(Bucket=bucket, Key=key)
        image_content = response['Body'].read()

        # Open image with Pillow and shrink it in place to fit TARGET_SIZE
        image = Image.open(io.BytesIO(image_content))
        image_format = image.format
        image.thumbnail(TARGET_SIZE)

        # Save resized image to buffer
        buffer = io.BytesIO()
        image.save(buffer, format=image_format)
        buffer.seek(0)

        # Upload to the destination bucket; it must differ from the source
        # bucket, otherwise the upload re-triggers this function
        dest_bucket = os.environ.get('DEST_BUCKET', 'resized-images')
        dest_key = f"resized/{key}"
        s3.put_object(Bucket=dest_bucket, Key=dest_key, Body=buffer)
        print(f"Resized image {key} to {dest_key}")


# IAM policy (attach to the Lambda execution role); bucket names are placeholders:
# {
#   "Version": "2012-10-17",
#   "Statement": [
#     {
#       "Effect": "Allow",
#       "Action": "s3:GetObject",
#       "Resource": "arn:aws:s3:::source-bucket/*"
#     },
#     {
#       "Effect": "Allow",
#       "Action": "s3:PutObject",
#       "Resource": "arn:aws:s3:::dest-bucket/*"
#     },
#     {
#       "Effect": "Allow",
#       "Action": [
#         "logs:CreateLogGroup",
#         "logs:CreateLogStream",
#         "logs:PutLogEvents"
#       ],
#       "Resource": "arn:aws:logs:*:*:*"
#     }
#   ]
# }
```
- Compare RDS, DynamoDB, and ElastiCache. For a real-time leaderboard, which would you choose and why?
What a strong answer covers
- RDS: Relational, ACID, SQL; DynamoDB: NoSQL, key-value/document, auto-scaling; ElastiCache: in-memory cache, sub-millisecond latency.
- Real-time leaderboard needs low latency reads and atomic increments.
- DynamoDB supports atomic counters, and adding DAX can bring cached reads down to microsecond latency.
- ElastiCache (Redis) sorted sets are ideal for leaderboards with O(log N) updates and range queries.
View a sample answer
Amazon RDS is a managed relational database service that supports SQL and ACID transactions, ideal for structured data with complex queries. DynamoDB is a fully managed NoSQL key-value and document database that scales horizontally and provides single-digit millisecond latency at any scale. ElastiCache is an in-memory caching service (Redis or Memcached) that offers sub-millisecond latency. For a real-time leaderboard, ElastiCache with Redis sorted sets is the best choice because it provides atomic increment operations, O(log N) complexity for updating scores and looking up a rank, and built-in range queries to get the top players. DynamoDB with DAX can also work but requires more custom logic to maintain rankings, and uncached reads are slower than in-memory access. RDS is a weaker fit: ranks can be computed with SQL window functions, but recomputing them under a high rate of score updates is expensive. Common pitfalls are underestimating the cost of ElastiCache and letting an eviction policy silently drop leaderboard entries when memory fills up.
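The sorted-set operations a leaderboard relies on can be illustrated without a Redis server. The class below is an in-memory stand-in, not a Redis client: `Leaderboard` and its method names are hypothetical, each method mirrors one Redis command (`ZINCRBY`, `ZREVRANGE`, `ZREVRANK`), and a Python list gives O(N) inserts rather than the O(log N) that Redis achieves.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class Leaderboard:
    """In-memory stand-in that mimics leaderboard operations on a sorted set."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}
        # Kept sorted ascending by (score, player) so lookups can bisect.
        self._ordered: List[Tuple[float, str]] = []

    def increment(self, player: str, delta: float) -> float:
        """Like ZINCRBY: add delta to the player's score and return the new score."""
        old = self._scores.get(player)
        if old is not None:
            # Remove the stale entry before inserting the updated one.
            index = bisect.bisect_left(self._ordered, (old, player))
            del self._ordered[index]
        new = (old or 0.0) + delta
        self._scores[player] = new
        bisect.insort(self._ordered, (new, player))
        return new

    def top(self, n: int) -> List[Tuple[str, float]]:
        """Like ZREVRANGE 0 n-1 WITHSCORES: highest scores first."""
        if n <= 0:
            return []
        return [(player, score) for score, player in reversed(self._ordered[-n:])]

    def rank(self, player: str) -> Optional[int]:
        """Like ZREVRANK: 0-based rank from the top, or None if the player is absent."""
        score = self._scores.get(player)
        if score is None:
            return None
        index = bisect.bisect_left(self._ordered, (score, player))
        return len(self._ordered) - 1 - index


board = Leaderboard()
board.increment("alice", 50)
board.increment("bob", 70)
board.increment("alice", 30)  # alice now leads with 80
print(board.top(2))
print(board.rank("bob"))
```

With a real Redis deployment the same three calls map onto a single sorted-set key, so the application keeps no ranking state of its own.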
- Describe a scenario where you would use a VPC endpoint instead of a NAT Gateway. How do they differ in cost and performance?
What a strong answer covers
- VPC Endpoints allow private connectivity to AWS services without internet access.
- NAT Gateway enables outbound internet for private subnets.
- Gateway endpoints (S3, DynamoDB) are cheaper than a NAT Gateway for high-bandwidth traffic to those services.
- NAT Gateway has hourly charges and per-GB data processing fees; gateway endpoints have neither, while interface endpoints have an hourly charge per AZ plus per-GB data processing fees.
View a sample answer
A VPC Endpoint (specifically a Gateway Endpoint for S3 and DynamoDB) allows private connectivity from your VPC to AWS services without traversing the internet or a NAT. A NAT Gateway provides outbound internet access for instances in private subnets. You would use a VPC Endpoint when you need to access S3 or DynamoDB from a private subnet without going through the internet, which is both more secure and cheaper than routing through a NAT. Cost-wise, a NAT Gateway incurs an hourly charge plus a per-GB data processing charge, while a Gateway Endpoint has neither an hourly charge nor a data processing fee. Performance-wise, endpoint traffic stays on the AWS network and avoids the extra hop through the NAT Gateway, which also takes the NAT Gateway's bandwidth limits out of the path. A common pitfall is assuming every service has a Gateway Endpoint: only S3 and DynamoDB do. Services such as SQS and SNS require Interface Endpoints, which are billed hourly per AZ plus per GB processed.
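The cost difference is easiest to see with a small calculation. The rates below are illustrative assumptions chosen to show the shape of the comparison, not quoted prices; check current pricing for your Region before using real numbers. `monthly_cost` is a hypothetical helper, and the model treats a gateway endpoint as having no endpoint charge.

```python
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate: float, per_gb_rate: float, gb_per_month: float,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Fixed hourly charge plus a per-GB data processing charge."""
    return hourly_rate * hours + per_gb_rate * gb_per_month


# Assumed rates in USD (illustrative only).
NAT_GATEWAY = {"hourly_rate": 0.045, "per_gb_rate": 0.045}
INTERFACE_ENDPOINT = {"hourly_rate": 0.01, "per_gb_rate": 0.01}  # per AZ
GATEWAY_ENDPOINT = {"hourly_rate": 0.0, "per_gb_rate": 0.0}

traffic_gb = 10_000  # hypothetical monthly traffic to S3

for name, rates in (
    ("NAT Gateway", NAT_GATEWAY),
    ("Interface endpoint (1 AZ)", INTERFACE_ENDPOINT),
    ("Gateway endpoint", GATEWAY_ENDPOINT),
):
    print(f"{name}: ${monthly_cost(gb_per_month=traffic_gb, **rates):,.2f}")
```

Because the per-GB term dominates at high volume, routing S3 or DynamoDB traffic through a gateway endpoint tends to matter most for data-heavy workloads.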
- How does S3 achieve strong consistency for PUT and DELETE operations? Explain the underlying model.
What a strong answer covers
- S3 has provided strong read-after-write consistency for PUT and DELETE operations since December 2020.
- AWS has not published the full internals; its public description credits changes to the metadata subsystem, including a cache coherence protocol that keeps cached metadata from serving stale entries.
- New objects are immediately consistent; overwrites and deletes are also strongly consistent.
- List operations were previously eventually consistent after writes; they are now strongly consistent too.
View a sample answer
Amazon S3 now provides strong read-after-write consistency for all PUT and DELETE operations in every Region, meaning any read issued after a successful write or delete returns the latest state. AWS has not published the complete internal design. Its public description attributes the change to the metadata subsystem: object metadata is served through a cache, and a cache coherence protocol now verifies that a cached entry is not stale before it is returned. When a PUT request creates a new object, it is immediately visible to subsequent reads. For overwrite PUTs and DELETE requests, S3 ensures that subsequent GETs see the latest state rather than stale data. Previously, S3 offered only eventual consistency for overwrites and deletes; since the December 2020 update, these operations are strongly consistent at no extra cost and without a performance penalty. List operations are strongly consistent as well, so a listing issued after a write reflects that write. Common pitfalls are assuming that strong consistency also covers cross-Region replication, which remains asynchronous, and expecting it to resolve concurrent writes to the same key, where the last writer wins.
- Write a CloudFormation template snippet that creates an EC2 instance with a custom security group and tags.
What a strong answer covers
- CloudFormation template snippet using AWS::EC2::Instance.
- Define a custom security group with ingress rules.
- Add tags to both the instance and security group.
- Use YAML or JSON format.
View a sample answer
The CloudFormation snippet creates an EC2 instance with a custom security group. The security group allows SSH (port 22) and HTTP (port 80) from anywhere, which is acceptable for a demo; in production, restrict SSH to a known CIDR range. Both the instance and the security group are tagged with a Name tag. The instance type is a parameter, and the AMI ID is resolved from a public SSM parameter. A common pitfall is not specifying the VpcId property on the security group or a SubnetId on the instance; this snippet assumes the default VPC.
Reference solution
```yaml
Parameters:
  InstanceType:
    Type: String
    Default: t2.micro
    AllowedValues:
      - t2.micro
      - t3.micro
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2
Resources:
  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Enable SSH and HTTP access
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: MyAppSecurityGroup
  MyEC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref LatestAmiId
      InstanceType: !Ref InstanceType
      SecurityGroupIds:
        # GetAtt returns the group ID even when no VpcId is set on the group
        - !GetAtt MySecurityGroup.GroupId
      Tags:
        - Key: Name
          Value: MyAppInstance
```
How to prepare
- Practice hands-on: Deploy a small app using EC2, S3, and RDS. Break it and debug using CloudWatch Logs and VPC Flow Logs.
- Master the six AWS Well-Architected Framework pillars (operational excellence, security, reliability, performance efficiency, cost optimization, sustainability) and be ready to discuss trade-offs.
- Understand common design patterns: multi-AZ, read replicas, auto-scaling, and blue/green deployments. Use the AWS Free Tier to experiment.
- Be fluent in at least one Infrastructure as Code tool (CloudFormation, CDK, or Terraform) and one scripting language (Python, Node.js).
- Review AWS re:Invent sessions and the official documentation for the services you claim expertise in. Interviewers often ask about 'latest features'.
Frequently asked questions
How many years of AWS experience is expected for a senior role?
Typically 3-5 years of hands-on AWS experience. But depth and problem-solving ability matter more than just tenure.
Do I need to know all AWS services?
No, focus on core compute, storage, networking, and security. Understanding relational and NoSQL databases is also important.
Will there be whiteboard or diagram questions?
Yes, expect to draw architecture on a whiteboard or use a tool like draw.io. Practice explaining trade-offs as you design.
What is the best way to prepare for AWS system design interviews?
Learn common patterns like multi-AZ, auto-scaling groups, and decoupling with SQS/SNS. Work through the AWS Well-Architected Labs.
Are AWS certifications helpful for interviews?
Yes, especially the Solutions Architect Professional or DevOps Engineer Professional. They validate breadth but not necessarily depth, so combine them with real projects.