Free AWS Data Engineer Associate Practice Exams

All questions come from my AWS Engineer Udemy course and certificationexams.pro
Free AWS Data Engineer Associate Exam Topics Tests
Over the past few months, I have been helping cloud engineers, data specialists, and AWS professionals prepare for the AWS Certified Data Engineer Associate certification. It’s an impressive credential that demonstrates mastery of designing, building, and optimizing data pipelines and analytical systems in the AWS ecosystem.
This certification confirms your expertise in building end-to-end data solutions, optimizing query performance, automating workflows, and maintaining data security and compliance. These skills are highly valued by organizations that rely on AWS for analytics and ML-enhanced business intelligence.
The Data Engineer exam measures your ability to design data pipelines, manage metadata, apply best practices, and deliver high-quality, reliable data to downstream systems and teams.
AWS Exam Simulators
Through my AWS Udemy courses and the free AWS Data Engineer Associate Practice Questions available at certificationexams.pro, I have identified common areas where candidates benefit from deeper understanding. That insight helped shape a comprehensive set of AWS Data Engineer Questions and Answers that closely match the tone, logic, and challenge of the official AWS exam.
You can also explore AWS Data Engineer Sample Questions and use the AWS Data Engineer Associate Exam Simulator to measure your readiness. Each question includes a clear explanation that reinforces key concepts such as ETL orchestration, schema design, data governance, and AWS security best practices.
If you are looking for Real AWS Data Engineer Exam Questions, these instructor-created scenarios capture the structure and complexity of the real exam. These are not AWS Data Engineer Exam Dumps or copied content. Each scenario tests your ability to design scalable data architectures and apply AWS-recommended practices.
The AWS Data Engineer Certification represents your ability to manage AWS data environments effectively and support data-driven decisions through smart architecture and automation. Study consistently, practice diligently, and approach the exam with confidence.
With thorough preparation and the right mindset, you will join the community of skilled AWS Data Engineers trusted by organizations worldwide.
AWS Certified Data Engineer Associate Practice Exam Questions
Question 1
HelioCart, a retail analytics startup, runs about 18 Python-based AWS Lambda functions that all rely on the same in-house helper modules managed by the data engineering team. The team wants to centralize these shared dependencies so that a single update can be rolled out to every function with minimal maintenance effort and without repackaging each function separately. What is the most efficient approach to achieve this?
-
❏ A. Store the shared Python modules in Amazon S3 and have each Lambda download them during initialization
-
❏ B. Create a Lambda layer containing the common Python code and attach the layer to each function
-
❏ C. Containerize the functions and then add Lambda layers to the function configuration
-
❏ D. Use Lambda Extensions to inject the shared code into every function
Question 2
Which AWS service provides a connectionless API to run asynchronous SQL on Amazon Redshift and retrieve results programmatically?
-
❏ A. Amazon Redshift Serverless
-
❏ B. Redshift Data API
-
❏ C. Amazon RDS Data API
-
❏ D. Amazon Redshift Query Editor v2
Question 3
A retail analytics team at Riverstone Gear plans to prepare a customer orders dataset for machine learning using AWS Glue DataBrew. The raw files in Amazon S3 contain missing fields, concatenated name-and-address values, and columns that are no longer needed. The team wants to profile and fix the data with point-and-click steps and have the same workflow run automatically every 6 hours without writing code. Which approach should they take?
-
❏ A. Load the data into Amazon Redshift, clean it with SQL statements, and trigger recurring runs using Amazon EventBridge
-
❏ B. Create AWS Lambda functions in Python to parse and cleanse the files, and schedule invocations with Amazon EventBridge
-
❏ C. Build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence
-
❏ D. Provision an Amazon EMR cluster to run Apache Spark data cleaning jobs and orchestrate the schedule with AWS Step Functions
Question 4
Which actions enable private, least-ops connectivity from an AWS Lambda function to an Aurora PostgreSQL cluster in the same VPC? (Choose 2)
-
❏ A. Create an interface VPC endpoint for Amazon RDS
-
❏ B. Run the Lambda in the VPC using private subnets
-
❏ C. Enable private DNS on the Aurora endpoint
-
❏ D. Reference the Lambda security group in the Aurora security group on the DB port
-
❏ E. Use Amazon RDS Proxy and connect to the proxy endpoint
Question 5
An online learning startup runs a React single-page app that calls REST endpoints through Amazon API Gateway. The data engineering team needs a Python routine that can be invoked on demand via API Gateway and must synchronously return its result to the caller with predictable startup latency and minimal operations work. Which solution should they choose?
-
❏ A. Containerize the Python script and run it on Amazon ECS with EC2 launch type behind an Application Load Balancer
-
❏ B. Create an AWS Lambda function in Python and try to keep it warm using an Amazon EventBridge Scheduler rule that invokes it every 2 minutes
-
❏ C. Use an AWS Lambda Python function behind Amazon API Gateway and enable provisioned concurrency
-
❏ D. Run the Python routine as a service on Amazon ECS with AWS Fargate
Question 6
For an OLTP database on Amazon EBS with sustained random I/O and spiky demand, which actions best balance cost with predictable IOPS and throughput? (Choose 2)
-
❏ A. Amazon EFS
-
❏ B. Use io2 for critical data files
-
❏ C. Switch gp2 to gp3 and set IOPS/throughput independently
-
❏ D. Amazon FSx for Lustre
-
❏ E. Throughput Optimized HDD (st1)
Question 7
A digital publishing platform must block traffic from select countries to comply with regional rights, but an eight-person QA team operating from one of those countries still needs access for testing. The application runs on Amazon EC2 behind an Application Load Balancer, and AWS WAF is already associated with the load balancer. Which combination of controls should be implemented to satisfy these requirements? (Choose 2)
-
❏ A. Use an AWS WAF IP set that lists the QA team public IPs to allow
-
❏ B. Create deny entries for those countries in the VPC network ACLs
-
❏ C. Add an AWS WAF geo match rule that blocks the specified countries
-
❏ D. Configure Application Load Balancer listener rules to block countries
-
❏ E. Amazon CloudFront
Question 8
Which AWS services together provide serverless ingestion, low-latency stateful stream processing, and delivery to durable storage for analytics with minimal operations?
-
❏ A. Amazon MSK, Amazon Managed Service for Apache Flink, Amazon S3
-
❏ B. Amazon Kinesis Data Streams, AWS Lambda, Amazon Redshift
-
❏ C. Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, Amazon Kinesis Data Firehose
-
❏ D. Amazon Kinesis Data Streams, Amazon S3, Amazon Athena
Question 9
A digital advertising analytics startup ingests web and app clickstream events into the raw area of an Amazon S3 data lake every 15 minutes. The team wants to run ad hoc SQL sanity checks directly on these raw files with minimal administration and low cost. Which AWS service should they choose?
-
❏ A. Run an EMR Spark cluster on a schedule and execute SparkSQL against the new raw files every hour
-
❏ B. Use Amazon Athena to query the S3 raw zone with SQL
-
❏ C. Load each 15-minute increment into Amazon Redshift Serverless and validate with SQL
-
❏ D. Use AWS Glue DataBrew to profile and validate data in S3
Question 10
How should a Neptune cluster be scaled to support low-latency, read-heavy graph queries spiking to about 25,000 requests per second while writes remain moderate?
-
❏ A. Scale up the writer instance size
-
❏ B. Configure Amazon Neptune Global Database
-
❏ C. Add Neptune read replicas and use the reader endpoint
-
❏ D. Partition the graph across multiple Neptune clusters by key

Question 11
A regional healthcare analytics firm, Orion BioAnalytics, needs to observe API activity across multiple AWS accounts to detect suspicious access attempts. They must also preserve an auditable history of how AWS resource configurations change over time to meet regulatory requirements. The team has already enabled an organization trail in AWS CloudTrail and is using the 90-day event history. Which additional AWS service should they deploy to continuously record and evaluate configuration changes to their resources?
-
❏ A. Amazon GuardDuty
-
❏ B. AWS Security Hub
-
❏ C. AWS Config
-
❏ D. Amazon CloudWatch
Question 12
Which AWS Glue features should be used to transform JSON in S3 to Parquet and load into Amazon Redshift, scheduled automatically every 12 hours at 02:00 UTC? (Choose 2)
-
❏ A. AWS Glue Data Catalog
-
❏ B. AWS Glue Workflows
-
❏ C. AWS Glue DataBrew
-
❏ D. AWS Glue Jobs
-
❏ E. AWS Glue Data Quality
Question 13
A data platform team at a digital media startup plans to move their message workflows from Amazon SQS Standard queues to FIFO queues to guarantee ordered, exactly-once processing while using batch send and receive APIs. What actions should be included in their migration runbook? (Choose 3)
-
❏ A. Ensure the throughput for the target FIFO queue does not exceed 300 messages per second
-
❏ B. Ensure the FIFO queue name ends with the .fifo suffix
-
❏ C. Delete the existing Standard queue and create a new FIFO queue
-
❏ D. Convert the current Standard queue directly into a FIFO queue
-
❏ E. Confirm the target FIFO queue throughput with batching stays at or below 3,000 messages per second
-
❏ F. Replace the SQS Standard queue with Amazon MQ to achieve strict ordering
Question 14
Which AWS database service is purpose-built for graph queries and multi-hop traversals such as friends-of-friends?
-
❏ A. Amazon Redshift
-
❏ B. Neptune
-
❏ C. Amazon Aurora
-
❏ D. Amazon Keyspaces
Question 15
An insurtech firm processes about 20 million customer records per day that include personally identifiable data such as full names, national ID numbers, and payment card details. They need the ETL pipeline to automatically discover and protect sensitive fields before loading curated data into their analytics warehouse. Which AWS Glue transformation should they use?
-
❏ A. Filter transform
-
❏ B. Detect PII
-
❏ C. FindMatches transform
-
❏ D. Convert file format to Parquet
Question 16
Which AWS service enables the fastest migration of 25 PB from a low-bandwidth site to AWS?
-
❏ A. AWS Direct Connect
-
❏ B. Amazon S3 Transfer Acceleration
-
❏ C. AWS Snowball
-
❏ D. AWS Snowmobile
Question 17
A regional logistics firm operates Amazon EC2 workloads backed by Amazon EBS volumes. The team needs a cost-efficient backup approach that maintains durability and automates daily and weekly retention for about 45 days while keeping storage usage minimal. Which actions should they take? (Choose 2)
-
❏ A. Configure EBS snapshot lifecycle policies for automated creation and time based retention
-
❏ B. Use S3 Lifecycle rules to transition EBS snapshots to S3 Glacier Deep Archive
-
❏ C. Enable EBS Multi-Attach on volumes to improve durability
-
❏ D. Rely on incremental EBS snapshots so only changed blocks are saved
-
❏ E. Copy snapshots to a separate AWS account to enhance resilience
Question 18
Which S3 storage class automatically optimizes cost by moving objects across access tiers as access patterns change, without lifecycle policies?
-
❏ A. Amazon S3 Glacier Instant Retrieval
-
❏ B. S3 Standard-Infrequent Access with lifecycle rules
-
❏ C. S3 One Zone-Infrequent Access
-
❏ D. S3 Intelligent-Tiering class
Question 19
A healthcare analytics startup runs its microservices on Amazon ECS using the EC2 launch type, and it stores all container images in Amazon ECR. The security team wants to ensure that only authorized ECS tasks can pull images from ECR. What is the most appropriate way to secure the interaction between ECS and ECR?
-
❏ A. Configure VPC interface endpoints for Amazon ECR so ECS can reach ECR without using the internet
-
❏ B. Enable AWS Shield Advanced to block unauthorized access to the ECR registry
-
❏ C. Attach an IAM policy that allows required ECR pull actions to the ECS task execution role and set that role in the task definition
-
❏ D. Turn on Amazon ECR image scanning on push to restrict who can download images
Question 20
How should a company ingest about 5 GB/s from Amazon Kinesis Data Streams into Amazon Redshift with seconds-level latency and minimal operations for near-real-time BI dashboards?
-
❏ A. Amazon Kinesis Data Analytics (Flink) writing to Amazon Redshift via JDBC
-
❏ B. Amazon Kinesis Data Firehose to Amazon S3 then Amazon Redshift COPY
-
❏ C. Amazon Redshift streaming ingestion with external schema on Kinesis, materialized view, auto refresh
-
❏ D. AWS Glue streaming job to Amazon Redshift over JDBC
AWS Data Engineer Associate Practice Exam Answers

Question 1
HelioCart, a retail analytics startup, runs about 18 Python-based AWS Lambda functions that all rely on the same in-house helper modules managed by the data engineering team. The team wants to centralize these shared dependencies so that a single update can be rolled out to every function with minimal maintenance effort and without repackaging each function separately. What is the most efficient approach to achieve this?
-
✓ B. Create a Lambda layer containing the common Python code and attach the layer to each function
The most efficient approach is Create a Lambda layer containing the common Python code and attach the layer to each function.
Layers let you package shared libraries once, publish a new version when changes are needed, and have multiple functions reference that version to minimize redeployment work. A function can use up to five layers, and layers are available only for .zip-based Lambda functions.
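As a rough sketch, this is how a shared-module update could be published and rolled out with boto3; the layer name, bucket, key, runtime, and function names are all hypothetical:

import boto3

lam = boto3.client("lambda")

# Publish a new version of the shared helper layer from a zip already uploaded to S3
layer = lam.publish_layer_version(
    LayerName="shared-helpers",
    Content={"S3Bucket": "heliocart-artifacts", "S3Key": "layers/shared-helpers.zip"},
    CompatibleRuntimes=["python3.12"],
)

# Point each function at the new layer version without repackaging its own code
for function_name in ["orders-etl", "inventory-sync"]:
    lam.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer["LayerVersionArn"]],
    )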
Store the shared Python modules in Amazon S3 and have each Lambda download them during initialization is operationally heavier and introduces cold-start latency, potential network failures, code drift, and additional IAM management, so it is not the preferred method for core dependencies.
Containerize the functions and then add Lambda layers to the function configuration is incorrect because layers cannot be attached to container image functions. Dependencies must be baked into the image in that model.
Use Lambda Extensions to inject the shared code into every function is not suitable because extensions are meant for telemetry, security, and governance tooling around the runtime, not for packaging or distributing business logic or libraries.
For shared dependencies across many Lambda functions, think Lambda layers for .zip-based functions. If the question mentions container images, remember to bake dependencies into the image and that layers do not apply.
Question 2
Which AWS service provides a connectionless API to run asynchronous SQL on Amazon Redshift and retrieve results programmatically?
-
✓ B. Redshift Data API
The correct choice is Redshift Data API.
It provides an HTTPS, connectionless interface to submit SQL to Amazon Redshift asynchronously, manage statements, and retrieve results using the AWS SDK and IAM, eliminating JDBC or ODBC drivers and embedded database credentials.
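A minimal sketch of the asynchronous pattern with boto3, assuming a Redshift Serverless workgroup named analytics and a sales table that exists in the dev database:

import time
import boto3

rsd = boto3.client("redshift-data")

# Submit the SQL without opening a JDBC or ODBC connection
stmt = rsd.execute_statement(
    WorkgroupName="analytics",
    Database="dev",
    Sql="SELECT order_id, order_total FROM sales LIMIT 10;",
)

# Poll the statement status, then fetch the result set programmatically
while rsd.describe_statement(Id=stmt["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for record in rsd.get_statement_result(Id=stmt["Id"])["Records"]:
    print(record)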
The option Amazon Redshift Serverless is not sufficient because it addresses cluster management and scaling, not the connectionless programmatic execution of SQL.
The option Amazon RDS Data API applies to Aurora Serverless and certain RDS engines rather than Redshift.
The option Amazon Redshift Query Editor v2 is a web console for interactive use, not a service API for microservices.
Watch for keywords like no JDBC or ODBC, no stored credentials, IAM, asynchronous statements, and programmatic retrieval. These point directly to the Redshift Data API. Do not confuse it with the RDS Data API or Query Editor v2, which are commonly listed as distractors.
Question 3
A retail analytics team at Riverstone Gear plans to prepare a customer orders dataset for machine learning using AWS Glue DataBrew. The raw files in Amazon S3 contain missing fields, concatenated name-and-address values, and columns that are no longer needed. The team wants to profile and fix the data with point-and-click steps and have the same workflow run automatically every 6 hours without writing code. Which approach should they take?
-
✓ C. Build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence
Build a DataBrew project to explore the data, author a recipe, and create a scheduled DataBrew job to apply it on a 6-hour cadence is correct because AWS Glue DataBrew is designed for visual, no-code data profiling and transformation via recipes, and it supports recurring schedules to automate those steps.
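Assuming the recipe job has already been created from the DataBrew project, a schedule like this sketch would run it every 6 hours; the job and schedule names are hypothetical:

import boto3

databrew = boto3.client("databrew")

# Run the existing recipe job at minute 0 of every 6th hour (UTC)
databrew.create_schedule(
    Name="orders-clean-every-6h",
    JobNames=["orders-clean-recipe-job"],
    CronExpression="cron(0 0/6 * * ? *)",
)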
Load the data into Amazon Redshift, clean it with SQL statements, and trigger recurring runs using Amazon EventBridge is not ideal because it requires writing SQL and lacks the visual, point-and-click recipe workflow requested.
Create AWS Lambda functions in Python to parse and cleanse the files, and schedule invocations with Amazon EventBridge is incorrect since it introduces custom code and operational overhead, which conflicts with the no-code requirement.
Provision an Amazon EMR cluster to run Apache Spark data cleaning jobs and orchestrate the schedule with AWS Step Functions is unnecessary complexity and code-heavy for a task that DataBrew can handle visually and without managing clusters.
When you see requirements for visual, no-code data preparation with repeatable steps and easy scheduling, think AWS Glue DataBrew and its recipes plus scheduled jobs.
Question 4
Which actions enable private, least-ops connectivity from an AWS Lambda function to an Aurora PostgreSQL cluster in the same VPC? (Choose 2)
-
✓ B. Run the Lambda in the VPC using private subnets
-
✓ D. Reference the Lambda security group in the Aurora security group on the DB port
Run the Lambda in the VPC using private subnets and Reference the Lambda security group in the Aurora security group on the DB port together provide private, least-operational-overhead connectivity. Attaching the function to the VPC ensures all traffic uses VPC ENIs in private subnets, and the security-group reference grants precise inbound access on the database port without exposing the cluster.
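A sketch of the two settings with boto3; the function name, subnet and security group IDs, and port 5432 for PostgreSQL are assumptions:

import boto3

lam = boto3.client("lambda")
ec2 = boto3.client("ec2")

# Attach the function to private subnets with its own security group
lam.update_function_configuration(
    FunctionName="orders-writer",
    VpcConfig={
        "SubnetIds": ["subnet-0aaa1111", "subnet-0bbb2222"],
        "SecurityGroupIds": ["sg-0lambda111"],
    },
)

# Allow the Lambda security group to reach Aurora PostgreSQL on the DB port
ec2.authorize_security_group_ingress(
    GroupId="sg-0aurora222",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": "sg-0lambda111"}],
    }],
)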
The option Create an interface VPC endpoint for Amazon RDS is incorrect because interface endpoints cover the RDS control-plane API, not the data-plane DB connections.
Enable private DNS on the Aurora endpoint does not establish routing or permissions and therefore doesn’t by itself enable connectivity.
Use Amazon RDS Proxy and connect to the proxy endpoint is unnecessary for basic private access and adds administrative overhead. It still requires the same VPC and security group configuration.
When you see a requirement for VPC-only access from Lambda to a database, think attach Lambda to the VPC and security group referencing. Avoid answers that introduce the public internet, NAT, or unnecessary services like PrivateLink for DB data-plane or RDS Proxy unless there is a clear need (e.g., connection pooling).
Question 5
An online learning startup runs a React single-page app that calls REST endpoints through Amazon API Gateway. The data engineering team needs a Python routine that can be invoked on demand via API Gateway and must synchronously return its result to the caller with predictable startup latency and minimal operations work. Which solution should they choose?
-
✓ C. Use an AWS Lambda Python function behind Amazon API Gateway and enable provisioned concurrency
The best choice is Use an AWS Lambda Python function behind Amazon API Gateway and enable provisioned concurrency.
Provisioned concurrency pre-initializes execution environments so requests from API Gateway return quickly and consistently, and the approach remains serverless with minimal operations.
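For illustration, provisioned concurrency can be enabled on a published alias with a single call; the function name, alias, and concurrency value here are hypothetical:

import boto3

lam = boto3.client("lambda")

# Keep five execution environments pre-initialized for the "live" alias
lam.put_provisioned_concurrency_config(
    FunctionName="score-request",
    Qualifier="live",
    ProvisionedConcurrentExecutions=5,
)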
Containerize the Python script and run it on Amazon ECS with EC2 launch type behind an Application Load Balancer requires managing EC2 capacity, AMIs, scaling, and load balancers, which is unnecessary overhead for a small, on-demand API-backed routine.
Create an AWS Lambda function in Python and try to keep it warm using an Amazon EventBridge Scheduler rule that invokes it every 2 minutes is unreliable for eliminating cold starts, especially during bursts or scaling, and wastes invocations. It does not guarantee consistently low latency.
Run the Python routine as a service on Amazon ECS with AWS Fargate reduces instance management but still demands container image builds, service configuration, and potential idle cost or startup lag, making it heavier than Lambda for this use case.
When the question stresses minimal ops and predictable low latency for synchronous API calls, think Lambda + provisioned concurrency. Avoid ad hoc warming tricks or container orchestration unless there are sustained workloads or container-specific needs.
Question 6
For an OLTP database on Amazon EBS with sustained random I/O and spiky demand, which actions best balance cost with predictable IOPS and throughput? (Choose 2)
-
✓ B. Use io2 for critical data files
-
✓ C. Switch gp2 to gp3 and set IOPS/throughput independently
The best balance of cost and predictable performance comes from combining Use io2 for critical data files and Switch gp2 to gp3 and set IOPS/throughput independently.
io2 provides sustained, low-latency performance with high and predictable IOPS for the most demanding OLTP components (for example, data files or redo logs). Migrating remaining gp2 volumes to gp3 allows you to provision the exact IOPS and throughput needed without increasing capacity, typically reducing cost versus gp2 while maintaining consistent performance as demand grows.
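As a sketch, an in-place gp2-to-gp3 migration is a single ModifyVolume call; the volume ID and performance values are placeholders:

import boto3

ec2 = boto3.client("ec2")

# Change the volume type and provision IOPS and throughput independently of size
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    VolumeType="gp3",
    Iops=6000,
    Throughput=500,   # MiB/s
)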
Amazon EFS is a network file system and not block storage. It generally adds latency and is not recommended for OLTP database data files.
Amazon FSx for Lustre is designed for high-performance file workloads such as HPC and analytics rather than transactional block I/O, so it is inappropriate for OLTP databases.
Throughput Optimized HDD (st1) targets large, sequential throughput and performs poorly for small, random I/O patterns typical of OLTP.
For spiky or growing OLTP demand, remember that gp3 decouples performance from capacity and is often the most cost-effective baseline SSD choice, while io2 is the go-to for consistently high, predictable IOPS and low latency. Avoid network file systems and HDD-backed EBS for random I/O database workloads.
Question 7
A digital publishing platform must block traffic from select countries to comply with regional rights, but an eight-person QA team operating from one of those countries still needs access for testing. The application runs on Amazon EC2 behind an Application Load Balancer, and AWS WAF is already associated with the load balancer. Which combination of controls should be implemented to satisfy these requirements? (Choose 2)
-
✓ A. Use an AWS WAF IP set that lists the QA team public IPs to allow
-
✓ C. Add an AWS WAF geo match rule that blocks the specified countries
The right approach is to enforce country blocking in AWS WAF and create an explicit allow list for the exception. Using Add an AWS WAF geo match rule that blocks the specified countries applies the necessary geographic restrictions, and pairing it with Use an AWS WAF IP set that lists the QA team public IPs to allow permits the trusted QA users to access the site despite the country block.
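A sketch of the two WAF pieces with boto3; the CIDR, country code, and metric names are placeholders, and the rule statements would still need to be added to the web ACL with the allow rule at the lower priority number so it is evaluated first:

import boto3

wafv2 = boto3.client("wafv2")

# IP set holding the QA team's public addresses (CIDR is a placeholder)
ip_set = wafv2.create_ip_set(
    Name="qa-team-allow",
    Scope="REGIONAL",                  # REGIONAL scope for a web ACL associated with an ALB
    IPAddressVersion="IPV4",
    Addresses=["203.0.113.0/29"],
)

# Rule statements for the web ACL; the allow rule is evaluated before the geo block
rules = [
    {
        "Name": "allow-qa-ips",
        "Priority": 0,
        "Statement": {"IPSetReferenceStatement": {"ARN": ip_set["Summary"]["ARN"]}},
        "Action": {"Allow": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "allowQa",
        },
    },
    {
        "Name": "block-restricted-countries",
        "Priority": 1,
        "Statement": {"GeoMatchStatement": {"CountryCodes": ["XX"]}},  # placeholder ISO codes
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "blockGeo",
        },
    },
]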
Create deny entries for those countries in the VPC network ACLs is not viable because NACLs cannot evaluate geolocation and therefore cannot block traffic by country.
Configure Application Load Balancer listener rules to block countries is incorrect since ALB listener rules do not offer geo-based filtering. This capability belongs to AWS WAF.
Amazon CloudFront alone is insufficient because, although it can enforce geo restrictions, it does not provide the necessary IP-based exception in this architecture without additional WAF rules and architectural changes.
Country-based filtering and granular IP allow lists are core AWS WAF functions on ALB-integrated apps. For exceptions to geo blocks, use an IP set allow list and pay attention to WAF rule order so the allow list is evaluated before the geo block.
Question 8
Which AWS services together provide serverless ingestion, low-latency stateful stream processing, and delivery to durable storage for analytics with minimal operations?
-
✓ C. Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, Amazon Kinesis Data Firehose
The correct choice is Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, Amazon Kinesis Data Firehose.
Kinesis Data Streams provides scalable, managed ingestion for high-throughput events. Amazon Managed Service for Apache Flink offers low-latency, stateful stream processing with built-in checkpointing and exactly-once semantics without managing clusters. Kinesis Data Firehose then delivers the processed records to durable storage such as Amazon S3 or OpenSearch with automatic scaling and minimal operations.
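For context, producers write to the ingestion stream with simple PutRecord calls, as in this sketch; the stream name and payload are hypothetical:

import json
import boto3

kinesis = boto3.client("kinesis")

# One clickstream event; the partition key controls shard distribution and ordering
kinesis.put_record(
    StreamName="clickstream-events",
    Data=json.dumps({"user_id": "u-42", "action": "add_to_cart"}).encode("utf-8"),
    PartitionKey="u-42",
)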
Amazon MSK, Amazon Managed Service for Apache Flink, Amazon S3 is more operationally heavy due to MSK clusters and typically requires custom-managed sinks to S3, which is less serverless than using Firehose.
Amazon Kinesis Data Streams, AWS Lambda, Amazon Redshift relies on custom Lambda consumers and introduces Redshift, which is not designed for stateful, millisecond-level stream processing.
Amazon Kinesis Data Streams, Amazon S3, Amazon Athena supports batch/interactive querying after data lands in S3 but lacks real-time stateful processing and near-real-time delivery.
When you see requirements for serverless, low-latency stateful processing, and durable delivery for analytics, look for the trio of Kinesis Data Streams for ingestion, Amazon Managed Service for Apache Flink for processing, and Kinesis Data Firehose for delivery. Avoid options that require cluster management or only provide batch analytics after landing in storage.
Question 9
A digital advertising analytics startup ingests web and app clickstream events into the raw area of an Amazon S3 data lake every 15 minutes. The team wants to run ad hoc SQL sanity checks directly on these raw files with minimal administration and low cost. Which AWS service should they choose?
-
✓ B. Use Amazon Athena to query the S3 raw zone with SQL
The best fit is Use Amazon Athena to query the S3 raw zone with SQL.
Athena lets you run standard SQL directly on data in Amazon S3 without standing up infrastructure, and you pay only for the data scanned, which aligns with low-ops and cost-effective requirements.
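A minimal sketch of an ad hoc check via boto3; the database, table, partition column, and results location are assumptions:

import boto3

athena = boto3.client("athena")

# Run a quick sanity check directly against the raw files in S3
query = athena.start_query_execution(
    QueryString="SELECT count(*) AS events FROM raw_clickstream WHERE dt = '2024-06-01'",
    QueryExecutionContext={"Database": "raw_zone"},
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/sanity-checks/"},
)
print(query["QueryExecutionId"])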
Run an EMR Spark cluster on a schedule and execute SparkSQL against the new raw files every hour is overkill for quick sanity checks because it involves provisioning and managing clusters and jobs, increasing operational effort and cost.
Load each 15-minute increment into Amazon Redshift Serverless and validate with SQL still requires building and maintaining ingestion pipelines and can be more expensive than querying S3 directly for ad hoc checks.
Use AWS Glue DataBrew to profile and validate data in S3 is not appropriate because DataBrew focuses on visual transformations and profiling rather than running arbitrary SQL queries.
When you see ad hoc SQL on S3 with minimal ops and cost-effective, think Amazon Athena. Reduce cost further with partitioning, compression, and columnar formats to minimize scanned data.
Question 10
How should a Neptune cluster be scaled to support low-latency, read-heavy graph queries spiking to about 25,000 requests per second while writes remain moderate?
-
✓ C. Add Neptune read replicas and use the reader endpoint
The best approach is to add read capacity without disrupting graph traversals. Add Neptune read replicas and use the reader endpoint so read traffic is load-balanced across replicas while a single writer handles the moderate write volume. This matches a read-heavy workload with spiky demand and maintains consistency of graph traversals.
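As a sketch, adding a replica to an existing cluster looks like this; the identifiers and instance class are hypothetical, and applications would send read traffic to the cluster's reader endpoint:

import boto3

neptune = boto3.client("neptune")

# Add a read replica to the existing cluster; repeat for additional replicas
neptune.create_db_instance(
    DBInstanceIdentifier="graph-replica-1",
    DBInstanceClass="db.r6g.2xlarge",
    Engine="neptune",
    DBClusterIdentifier="graph-cluster",
)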
The option Scale up the writer instance size only increases vertical capacity and does not provide horizontal read scaling or load distribution for spikes.
The option Configure Amazon Neptune Global Database focuses on cross-Region replication and disaster recovery, introducing extra latency and complexity without improving same-Region read throughput.
The option Partition the graph across multiple Neptune clusters by key breaks graph traversals and is unsupported for cross-cluster queries, making it impractical for low-latency graph queries.
For read-heavy Neptune workloads, think read replicas + reader endpoint. Remember Neptune has a single writer per cluster and cross-Region features are not for same-Region scaling. Sharding graphs across clusters is typically a red flag due to traversal complexity.
Question 11
A regional healthcare analytics firm, Orion BioAnalytics, needs to observe API activity across multiple AWS accounts to detect suspicious access attempts. They must also preserve an auditable history of how AWS resource configurations change over time to meet regulatory requirements. The team has already enabled an organization trail in AWS CloudTrail and is using the 90-day event history. Which additional AWS service should they deploy to continuously record and evaluate configuration changes to their resources?
-
✓ C. AWS Config
The correct choice is AWS Config.
It continuously records configuration state, relationships, and changes for supported AWS resources, and it enables rule-based evaluation and timeline views that satisfy audit and compliance requirements over time.
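A sketch of enabling the recorder with boto3; the role ARN and bucket name are placeholders:

import boto3

config = boto3.client("config")

# Record configuration changes for all supported resource types, including global ones
config.put_configuration_recorder(
    ConfigurationRecorder={
        "name": "default",
        "roleARN": "arn:aws:iam::123456789012:role/aws-config-recorder-role",
        "recordingGroup": {"allSupported": True, "includeGlobalResourceTypes": True},
    }
)

# Deliver configuration history and snapshots to S3, then start recording
config.put_delivery_channel(
    DeliveryChannel={"name": "default", "s3BucketName": "orion-config-history"}
)
config.start_configuration_recorder(ConfigurationRecorderName="default")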
Amazon GuardDuty is a managed threat detection service that analyzes telemetry for malicious or unauthorized behavior, but it does not capture or store configuration histories.
AWS Security Hub centralizes and correlates security findings from multiple sources, yet it is not a configuration recorder and cannot provide resource configuration timelines.
Amazon CloudWatch provides metrics, logs, dashboards, and alarms. While events can react to changes, it does not maintain comprehensive historical configuration state for resources.
Remember: CloudTrail answers who did what via API calls, while AWS Config answers what the resource looked like and when it changed. Do not confuse this with GuardDuty (threat detection) or Security Hub (findings aggregation).
Question 12
Which AWS Glue features should be used to transform JSON in S3 to Parquet and load into Amazon Redshift, scheduled automatically every 12 hours at 02:00 UTC? (Choose 2)
-
✓ B. AWS Glue Workflows
-
✓ D. AWS Glue Jobs
The correct combination is AWS Glue Jobs and AWS Glue Workflows.
Glue Jobs execute the ETL logic to read JSON from Amazon S3, convert to Parquet, and write to Amazon Redshift using JDBC or COPY. Glue Workflows provide orchestration and time-based scheduling so the pipeline runs automatically at the specified interval.
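As a sketch, the 12-hour cadence starting at 02:00 UTC maps to a scheduled trigger; the job and workflow names are hypothetical:

import boto3

glue = boto3.client("glue")

# Fire the workflow's ETL job at 02:00 and 14:00 UTC every day
glue.create_trigger(
    Name="json-to-parquet-every-12h",
    WorkflowName="orders-pipeline",
    Type="SCHEDULED",
    Schedule="cron(0 2/12 * * ? *)",
    Actions=[{"JobName": "json-to-parquet-to-redshift"}],
    StartOnCreation=True,
)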
The option AWS Glue Data Catalog is only for metadata and schemas. It does not transform, load, or schedule pipelines.
AWS Glue DataBrew is a visual, interactive data prep tool and is not the right choice for production ETL into Redshift on a schedule.
AWS Glue Data Quality focuses on rules and validations, not core ETL execution or orchestration.
Look for cues like convert format, load into Redshift, and automated scheduling to map to Glue Jobs for ETL and Workflows for orchestration. If a question emphasizes scheduling a single job only, Glue Triggers may be sufficient. When you see visual preparation or interactive profiling, think DataBrew rather than production ETL. Distinguish the Data Catalog (metadata) from services that actually run the ETL.
Question 13
A data platform team at a digital media startup plans to move their message workflows from Amazon SQS Standard queues to FIFO queues to guarantee ordered, exactly-once processing while using batch send and receive APIs. What actions should be included in their migration runbook? (Choose 3)
-
✓ B. Ensure the FIFO queue name ends with the .fifo suffix
-
✓ C. Delete the existing Standard queue and create a new FIFO queue
-
✓ E. Confirm the target FIFO queue throughput with batching stays at or below 3,000 messages per second
Delete the existing Standard queue and create a new FIFO queue is required because SQS does not allow converting a Standard queue into FIFO. You must stand up a new FIFO queue and migrate producers and consumers.
Ensure the FIFO queue name ends with the .fifo suffix is mandatory, as SQS identifies FIFO queues by the .fifo suffix on the queue name.
Confirm the target FIFO queue throughput with batching stays at or below 3,000 messages per second aligns with the documented FIFO throughput quota when using batch operations. Without batching, the limit is 300 messages per second.
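A sketch of the new queue and a batch send; the queue and message group names are hypothetical, and content-based deduplication is assumed to be acceptable:

import boto3

sqs = boto3.client("sqs")

# FIFO queues must be created new with the .fifo suffix; Standard queues cannot be converted
queue = sqs.create_queue(
    QueueName="orders.fifo",
    Attributes={"FifoQueue": "true", "ContentBasedDeduplication": "true"},
)

# Batched sends count toward the 3,000 messages-per-second FIFO quota
sqs.send_message_batch(
    QueueUrl=queue["QueueUrl"],
    Entries=[
        {"Id": "1", "MessageBody": "order-1001", "MessageGroupId": "store-7"},
        {"Id": "2", "MessageBody": "order-1002", "MessageGroupId": "store-7"},
    ],
)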
Ensure the throughput for the target FIFO queue does not exceed 300 messages per second is not appropriate in this context because the team plans to use batching, which raises the limit to 3,000 messages per second.
Convert the current Standard queue directly into a FIFO queue is impossible since SQS does not support in-place conversion between Standard and FIFO types.
Replace the SQS Standard queue with Amazon MQ to achieve strict ordering is a distractor because Amazon MQ is a separate managed broker service and is not part of an SQS-to-SQS migration path.
When you see SQS FIFO on the exam, remember three cues: the .fifo name suffix, no in-place conversion from Standard to FIFO, and the throughput numbers of 300 without batching versus 3,000 with batching.
Question 14
Which AWS database service is purpose-built for graph queries and multi-hop traversals such as friends-of-friends?
-
✓ B. Neptune
The correct choice is Neptune.
It is AWS’s managed graph database designed specifically for storing and traversing highly connected data with low-latency multi-hop queries. It supports Gremlin and SPARQL, which are purpose-built graph query languages ideal for friends-of-friends patterns and similar traversals.
The option Amazon Redshift is incorrect because it is a columnar data warehouse for analytical SQL workloads, not for native graph traversals.
The option Amazon Aurora is incorrect because relational schemas require multiple joins for multi-hop relationships, which is less efficient than a graph engine.
The option Amazon Keyspaces is incorrect because it is a wide-column store compatible with Cassandra, lacking native graph traversal semantics.
Keywords such as graph, traversal, multi-hop, friends-of-friends, Gremlin, or SPARQL are strong signals for Neptune. Map use cases to purpose-built databases quickly: graph ⇒ Neptune. Analytics warehouse ⇒ Redshift. Relational OLTP ⇒ Aurora. Wide-column ⇒ Keyspaces.
Question 15
An insurtech firm processes about 20 million customer records per day that include personally identifiable data such as full names, national ID numbers, and payment card details. They need the ETL pipeline to automatically discover and protect sensitive fields before loading curated data into their analytics warehouse. Which AWS Glue transformation should they use?
-
✓ B. Detect PII
The correct choice is Detect PII.
AWS Glue provides a Detect PII transform that scans columns for known sensitive data types and enables actions such as masking, hashing, or redaction so that sensitive values are protected before the data is written to the warehouse.
Filter transform is only for row-level filtering based on predicates and cannot locate or protect personally identifiable information.
FindMatches transform is designed for entity resolution and deduplication, not for detecting or handling PII.
Convert file format to Parquet changes how data is stored and optimized for analytics but offers no capability to classify or obfuscate sensitive fields.
When the requirement is to automatically discover and mask sensitive fields in AWS Glue workflows, look for Detect PII. Do not confuse it with transforms for deduplication (FindMatches), row filtering (Filter), or file format conversion.

Question 16
Which AWS service enables the fastest migration of 25 PB from a low-bandwidth site to AWS?
-
✓ D. AWS Snowmobile
AWS Snowmobile is designed for exabyte-scale offline data transfer and is the fastest option for moving very large datasets, such as 25 PB, from locations with limited network bandwidth. It avoids WAN constraints by physically transporting data in a secure, truck-based appliance directly into AWS.
The option AWS Snowball is intended for terabytes to a few petabytes and would require many devices and multiple jobs, extending overall timelines for 25 PB.
The option AWS Direct Connect relies on network throughput. Even at 10 Gbps, 25 PB would take roughly 231 days, and provisioning high-capacity links can add time.
The option Amazon S3 Transfer Acceleration improves internet transfers via edge locations but remains limited by the site’s outbound bandwidth and is not optimal for tens of petabytes when time is critical.
For data volumes of 10 PB or more or when the site has constrained bandwidth, think offline ingestion with Snowmobile. For up to a few petabytes with limited or no reliable network, consider Snowball. For ongoing network transfers where bandwidth is sufficient, services like DataSync, Direct Connect, or S3 Transfer Acceleration can fit. Quickly estimate feasibility by dividing total data by effective throughput (e.g., 25 PB at 10 Gbps is about 231 days), then decide whether offline is required.
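The 231-day figure quoted above comes from simple arithmetic, reproduced here as a quick sanity check:

# 25 PB over a fully utilized 10 Gbps link
data_bits = 25e15 * 8        # 25 petabytes expressed in bits
link_bps = 10e9              # 10 gigabits per second
seconds = data_bits / link_bps
print(seconds / 86400)       # about 231.5 days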
Question 17
A regional logistics firm operates Amazon EC2 workloads backed by Amazon EBS volumes. The team needs a cost-efficient backup approach that maintains durability and automates daily and weekly retention for about 45 days while keeping storage usage minimal. Which actions should they take? (Choose 2)
-
✓ A. Configure EBS snapshot lifecycle policies for automated creation and time based retention
-
✓ D. Rely on incremental EBS snapshots so only changed blocks are saved
The best combination for cost efficiency and durability is Configure EBS snapshot lifecycle policies for automated creation and time based retention and Rely on incremental EBS snapshots so only changed blocks are saved.
Lifecycle policies automate creation, retention, and deletion to control snapshot sprawl, while EBS snapshots are inherently incremental so you only pay for changed blocks.
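A sketch of a Data Lifecycle Manager policy covering the daily schedule; the role ARN, tag, and timing values are placeholders, and a second schedule entry could cover the weekly cadence:

import boto3

dlm = boto3.client("dlm")

# Snapshot tagged volumes every 24 hours and keep each snapshot for 45 days
dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS snapshots with 45-day retention",
    State="ENABLED",
    PolicyDetails={
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "daily"}],
        "Schedules": [{
            "Name": "daily-0300-utc",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Interval": 45, "IntervalUnit": "DAYS"},
        }],
    },
)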
Use S3 Lifecycle rules to transition EBS snapshots to S3 Glacier Deep Archive is invalid because EBS snapshots are not customer-visible S3 objects and cannot be transitioned with S3 Lifecycle rules. An EBS snapshot archive tier does exist, but it is managed through the EBS APIs, not S3.
Enable EBS Multi-Attach on volumes to improve durability does not provide backups or additional durability. Multi-Attach enables concurrent access for Provisioned IOPS (io1 and io2) volumes and does not reduce storage costs.
Copy snapshots to a separate AWS account to enhance resilience can improve isolation or governance but duplicates data and increases storage cost, which conflicts with the cost-minimization goal.
Remember that EBS snapshots are incremental by default and that Data Lifecycle Manager automates retention and deletion. Be wary of distractors that involve S3 Lifecycle for snapshots or features like Multi-Attach that do not address backup cost or durability.
Question 18
Which S3 storage class automatically optimizes cost by moving objects across access tiers as access patterns change, without lifecycle policies?
-
✓ D. S3 Intelligent-Tiering class
S3 Intelligent-Tiering class is correct because it automatically monitors object access and moves data between frequent, infrequent, and archive access tiers to optimize cost with no lifecycle policies to manage. It is designed for unpredictable access, charging a small monitoring and automation fee while reducing storage cost as objects cool.
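For illustration, new objects can be written straight into the class so no lifecycle rule is ever needed; the bucket, key, and payload are placeholders:

import boto3

s3 = boto3.client("s3")

# Objects written with this storage class are tiered automatically as access patterns change
s3.put_object(
    Bucket="analytics-data-lake",
    Key="raw/2024/06/01/events.json",
    Body=b'{"event": "page_view"}',
    StorageClass="INTELLIGENT_TIERING",
)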
Amazon S3 Glacier Instant Retrieval is incorrect because it is an archive storage class that does not perform automatic tiering and can incur retrieval costs. It requires explicit class selection or lifecycle actions.
S3 Standard-Infrequent Access with lifecycle rules is incorrect because lifecycle rules are manual and time based, not adaptive to per object access fluctuations.
S3 One Zone-Infrequent Access is incorrect because it sacrifices availability zone redundancy and does not provide automatic tier transitions.
When you see keywords like unpredictable or variable access and no lifecycle management, think Intelligent-Tiering. Remember Intelligent-Tiering charges monitoring per object, does not monitor objects smaller than 128 KB, and can transition into archive access tiers that carry minimum storage durations.
Question 19
A healthcare analytics startup runs its microservices on Amazon ECS using the EC2 launch type, and it stores all container images in Amazon ECR. The security team wants to ensure that only authorized ECS tasks can pull images from ECR. What is the most appropriate way to secure the interaction between ECS and ECR?
-
✓ C. Attach an IAM policy that allows required ECR pull actions to the ECS task execution role and set that role in the task definition
Attach an IAM policy that allows required ECR pull actions to the ECS task execution role and set that role in the task definition is correct because ECS uses the task execution role to obtain an authorization token and call ecr:GetAuthorizationToken, ecr:BatchGetImage, and related actions to pull images from ECR. Granting these permissions to the execution role ensures only tasks with that role can retrieve images.
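A sketch of an inline policy scoped to image pulls, attached to the execution role; the role and policy names are hypothetical, and the AWS-managed AmazonECSTaskExecutionRolePolicy covers the same pull actions plus CloudWatch Logs:

import json
import boto3

iam = boto3.client("iam")

# Grant only the actions ECS needs to authenticate to ECR and pull image layers
iam.put_role_policy(
    RoleName="ecsTaskExecutionRole",
    PolicyName="ecr-pull-only",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
            ],
            "Resource": "*",
        }],
    }),
)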
Configure VPC interface endpoints for Amazon ECR so ECS can reach ECR without using the internet is not sufficient because PrivateLink provides private connectivity, but it does not enforce which principals are allowed to pull images from ECR.
Enable AWS Shield Advanced to block unauthorized access to the ECR registry is incorrect because Shield addresses DDoS protection and does not implement IAM-based authorization for ECR pulls.
Turn on Amazon ECR image scanning on push to restrict who can download images is incorrect because image scanning is a vulnerability assessment feature and does not control access permissions to pull images.
When the question is about who can pull images from ECR, think IAM and the ECS task execution role. Private connectivity and scanning features improve security posture but do not enforce authorization.
Question 20
How should a company ingest about 5 GB/s from Amazon Kinesis Data Streams into Amazon Redshift with seconds-level latency and minimal operations for near-real-time BI dashboards?
-
✓ C. Amazon Redshift streaming ingestion with external schema on Kinesis, materialized view, auto refresh
The correct choice is Amazon Redshift streaming ingestion with external schema on Kinesis, materialized view, auto refresh.
Redshift’s native streaming ingestion maps Kinesis Data Streams into an external schema and populates a materialized view that can be auto-refreshed, delivering seconds-level latency with minimal operational overhead. It scales far better than JDBC-based sinks and avoids staging complexity.
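A sketch of the DDL, submitted here through the Redshift Data API; the schema, stream, role ARN, and workgroup names are placeholders, and the SQL paraphrases the documented streaming ingestion pattern:

import boto3

rsd = boto3.client("redshift-data")

ddl_statements = [
    # Map the Kinesis stream into an external schema
    """CREATE EXTERNAL SCHEMA kds
       FROM KINESIS
       IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';""",
    # Materialized view over the stream; auto refresh keeps it within seconds of the source
    """CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
       SELECT approximate_arrival_timestamp,
              JSON_PARSE(kinesis_data) AS payload
       FROM kds."clickstream-events";""",
]

for sql in ddl_statements:
    rsd.execute_statement(WorkgroupName="analytics", Database="dev", Sql=sql)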
The option Amazon Kinesis Data Analytics (Flink) writing to Amazon Redshift via JDBC is operationally heavier, and JDBC sinks typically cannot sustain multi-GB/s throughput at seconds-level latency; they also introduce backpressure and commit contention.
The option Amazon Kinesis Data Firehose to Amazon S3 then Amazon Redshift COPY adds buffering and COPY cycles that increase latency and operational work compared to direct streaming ingestion.
The option AWS Glue streaming job to Amazon Redshift over JDBC has similar JDBC limitations and added management effort without meeting the ultra-low latency and scale requirements.
Cameron McKenzie is an AWS Certified AI Practitioner, Machine Learning Engineer, Copilot Expert, Solutions Architect and author of many popular books in the software development and Cloud Computing space. His growing YouTube channel training devs in Java, Spring, AI and ML has well over 30,000 subscribers.