AWS Machine Learning Practice Exam Questions
Question 1
From a SageMaker Studio notebook, what is the most direct, low-latency way to read the latest items from a DynamoDB table with about 120 million records?
-
❏ A. Use AWS Glue to ETL into Parquet in Amazon S3 and read from S3
-
❏ B. Use Amazon DynamoDB Accelerator (DAX) from the notebook
-
❏ C. Use Amazon Athena with the DynamoDB federated connector
-
❏ D. Call DynamoDB directly from the notebook using Boto3
-
❏ E. Export the table to S3 with on-demand export and read the files
-
❏ F. Replicate to Amazon RDS via a scheduled Lambda and query RDS
Question 2
Which metric should be optimized when tuning a binary classifier’s threshold to maximize correct positives and negatives on a validation set?
-
❏ A. Area under the ROC curve (AUC-ROC)
-
❏ B. Classification accuracy
-
❏ C. Precision
-
❏ D. F1 score
Question 3
Which Cost Explorer capabilities help analyze SageMaker spend and forecast future costs? (Choose 2)
-
❏ A. AWS Cost Anomaly Detection
-
❏ B. Trend charts and cost forecasting
-
❏ C. Reconfigure SageMaker resources from Cost Explorer
-
❏ D. Filter costs by service, usage type, and cost allocation tags
-
❏ E. Automate instance scheduling and rightsizing
-
❏ F. AWS Budgets
Question 4
Which serverless deployment best delivers low-latency, real-time ML inference for a small model under bursty, unpredictable traffic with minimal operations?
-
❏ A. Amazon SageMaker real-time endpoint
-
❏ B. Amazon EC2 behind Amazon API Gateway
-
❏ C. AWS Lambda + Amazon API Gateway
-
❏ D. Amazon SageMaker Asynchronous Inference
Question 5
Which SageMaker-native service enables automated retraining and promotion to production as new data arrives?
-
❏ A. AWS Step Functions
-
❏ B. Amazon SageMaker Pipelines with automated retrain and deploy
-
❏ C. Amazon SageMaker Model Monitor
-
❏ D. Amazon EventBridge and AWS Lambda
Question 6
In Amazon Bedrock, how can you make a text assistant return consistent, repeatable responses across conversations? (Choose 2)
-
❏ A. Add response caching to reuse prior answers
-
❏ B. Fine-tune the model on domain policies and product data
-
❏ C. Enable CloudWatch Logs and analyze traces
-
❏ D. Lower temperature and top-k in Bedrock to reduce randomness
-
❏ E. Use Amazon Bedrock Knowledge Bases for retrieval grounding
Question 7
Which technique best reduces noise in high-dimensional user-event features for a recommender model?
-
❏ A. SageMaker Clarify
-
❏ B. t-SNE embedding before training
-
❏ C. Use PCA to keep top principal components
-
❏ D. Data augmentation
-
❏ E. k-means user segmentation
-
❏ F. Feature hashing
Question 8
How should you use Amazon CloudWatch to monitor and troubleshoot a SageMaker real-time endpoint’s latency and errors?
-
❏ A. Create a CloudWatch Synthetics canary to call the endpoint and alarm on canary failures
-
❏ B. Enable SageMaker inference logging to CloudWatch Logs and alarm on ModelLatency and 5XX
-
❏ C. Use CloudTrail and VPC Flow Logs; alarm on API patterns
-
❏ D. Use CloudWatch ServiceLens with X-Ray traces on the endpoint to analyze latency
Question 9
Which SageMaker setup delivers synchronous low-latency (median < 90 ms) predictions and automated model quality drift alerts?
-
❏ A. SageMaker Asynchronous Inference with SNS alerts
-
❏ B. SageMaker Multi-Model Endpoint with CloudWatch alarms
-
❏ C. SageMaker Serverless Inference with EventBridge rules
-
❏ D. SageMaker real-time endpoint with Model Monitor alerts
Question 10
For product-specific catalog image tagging and automated detection of unsafe content in user images using Amazon Rekognition, which capability pairing should be used?
-
❏ A. Amazon Rekognition Custom Labels + RecognizeCelebrities
-
❏ B. Amazon Rekognition Custom Labels + DetectModerationLabels
-
❏ C. Amazon Rekognition DetectLabels + IndexFaces
-
❏ D. Amazon Rekognition DetectLabels + DetectText
Question 11
Which AWS services best map to low-latency online recommendations (<200 ms), batch churn scoring 3 times per day, and event-driven price updates within seconds?
-
❏ A. Amazon SageMaker real-time endpoints for all models
-
❏ B. SageMaker endpoint for recommendations, EC2 for churn batch, Lambda for pricing events
-
❏ C. AWS Lambda for every workload
-
❏ D. SageMaker Asynchronous Inference for recommendations, SageMaker Batch Transform for churn, AWS Lambda for pricing
-
❏ E. SageMaker Serverless Inference for recommendations, SageMaker Processing for churn, Amazon EventBridge for pricing
Question 12
Which AWS Glue feature unifies schemas and metadata across diverse sources to improve training data quality?
-
❏ A. AWS Glue Crawlers
-
❏ B. AWS Glue Data Catalog
-
❏ C. AWS Lake Formation
-
❏ D. AWS Glue job autoscaling
Question 13
How can you rightsize a SageMaker real-time endpoint to minimize cost while meeting p95 latency and throughput targets (for example, p95 < 40 ms at ~1,500 RPS)?
-
❏ A. AWS Compute Optimizer pick smallest instance by CPU/memory
-
❏ B. SageMaker endpoint A/B traffic shifting between instance types
-
❏ C. SageMaker Inference Recommender load-test candidates and pick best cost–latency fit
-
❏ D. Manual CloudWatch review and iterative redeploy
Question 14
Which AWS services enable CI/CD from code commit to build, SageMaker training on latest S3 data, and deployment to a managed endpoint?
-
❏ A. AWS CodePipeline, AWS CodeBuild, Amazon ECR
-
❏ B. AWS Step Functions, AWS CodeBuild, AWS CodePipeline
-
❏ C. AWS CodePipeline + AWS CodeBuild + Amazon SageMaker
-
❏ D. Amazon EventBridge, AWS CodeDeploy, AWS Lambda
-
❏ E. AWS Step Functions, Amazon SageMaker, Amazon ECR
Question 15
In SageMaker, which approach provides a quick, interpretable baseline for binary churn on 30-day tabular features with minimal setup?
-
❏ A. SageMaker Autopilot
-
❏ B. Train a SageMaker Linear Learner classifier on 30-day tabular features
-
❏ C. SageMaker K-means clustering on 30-day features
-
❏ D. SageMaker XGBoost with extensive HPO and 10-fold CV
Question 16
A SageMaker real-time endpoint has p95 latency near 600 ms, but the target is under 120 ms. Which configuration change most directly lowers per-request latency?
-
❏ A. SageMaker Asynchronous Inference
-
❏ B. Enable gzip on InvokeEndpoint payloads
-
❏ C. Add an Elastic Inference accelerator to the endpoint
-
❏ D. Increase instances and enable auto scaling
Question 17
In Amazon Lookout for Equipment, which data preparation best improves accuracy when training across different machine types?
-
❏ A. Add a machine_type feature and train one global model
-
❏ B. Create separate datasets per equipment type for training
-
❏ C. Apply identical normalization to all sensors and machines
-
❏ D. Merge all machines into one dataset
-
❏ E. Randomly mix records from different machines to reduce bias
Question 18
Which best describes how a deep neural network learns during training?
-
❏ A. No data; patterns from preset rules
-
❏ B. Weights set manually with fixed heuristics
-
❏ C. Labeled data with backprop and gradient optimizers to minimize loss
-
❏ D. Amazon SageMaker Autopilot
Question 19
Which SageMaker compute choices best meet production p95 latency under 30 ms at high throughput while keeping test costs low? (Choose 2)
-
❏ A. Use small CPU instances for the test endpoint to minimize cost
-
❏ B. Use SageMaker Serverless Inference for production
-
❏ C. Provision Inferentia-based instances for the production endpoint to hit low-latency, high-QPS
-
❏ D. Use the same instance type for production and test
-
❏ E. Run both endpoints on Spot capacity
-
❏ F. Choose the largest GPU instance for production
Question 20
How should you run custom TensorFlow training on SageMaker in script mode with 25 TB of data in S3?
-
❏ A. Build a custom Docker image with TensorFlow in ECR and run in script mode
-
❏ B. Package a model .tar.gz and point script mode at it to train
-
❏ C. Provide a Python entry point for SageMaker’s TensorFlow container, upload source to S3, pick a supported TF version, and launch in script mode
-
❏ D. Use SageMaker JumpStart to train from S3 instead of writing a script
AWS Machine Learning Practice Exam Answers and Explanations
Question 1
From a SageMaker Studio notebook, what is the most direct, low-latency way to read the latest items from a DynamoDB table with about 120 million records?
-
✓ D. Call DynamoDB directly from the notebook using Boto3
The best choice is Call DynamoDB directly from the notebook using Boto3. Using the AWS SDK from SageMaker Studio gives immediate, low-latency access to live DynamoDB data without moving or transforming it. You can use efficient key-based operations like GetItem or Query on your table’s partition and sort keys to retrieve the freshest records with minimal overhead and setup.
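As a rough sketch of that direct-SDK approach (the table name and key names below are hypothetical), a key-based Query returns the newest items for a partition without scanning the full table:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table and key names, for illustration only.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("user_events")

# Key-based Query: fetch the most recent items for one partition key,
# newest first, without touching the other ~120 million records.
response = table.query(
    KeyConditionExpression=Key("user_id").eq("u-12345"),
    ScanIndexForward=False,  # sort key descending, so latest items come first
    Limit=50,
)
latest_items = response["Items"]
```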
Use AWS Glue to ETL into Parquet in Amazon S3 and read from S3 is batch-oriented and incurs latency from jobs and file reads, which undermines interactive exploration.
Use Amazon DynamoDB Accelerator (DAX) from the notebook can reduce read latency for hot keys but introduces additional infrastructure, client changes, and potential cache staleness, which conflicts with the need for the freshest data and minimal setup.
Use Amazon Athena with the DynamoDB federated connector routes through a Lambda-based connector and adds serialization overhead, increasing latency versus direct calls.
Export the table to S3 with on-demand export and read the files is asynchronous and designed for analytics at rest, not real-time reads.
Replicate to Amazon RDS via a scheduled Lambda and query RDS adds synchronization lag and complexity and is poorly matched to DynamoDB’s key-value access pattern.
When you see phrases like interactive, freshest records, minimal delay, and minimal setup, prefer solutions that avoid extra data movement, avoid batch exports, and minimize service hops. Direct SDK access to the operational store is usually the most latency-efficient path; S3-based ETL, federated queries, and replication typically imply higher latency and operational overhead.
Question 2
Which metric should be optimized when tuning a binary classifier’s threshold to maximize correct positives and negatives on a validation set?
-
✓ B. Classification accuracy
The correct choice is Classification accuracy. When you are selecting a single operating threshold and the goal is to maximize correct identifications of both positives and negatives, accuracy is the objective that directly counts true positives and true negatives equally on the validation set.
Area under the ROC curve (AUC-ROC) is threshold-independent and measures ranking quality across all thresholds, so optimizing AUC does not pick the best single threshold.
Precision focuses only on the quality of predicted positives and ignores true negatives, so it will not maximize correct negatives.
F1 score balances precision and recall but still excludes true negatives, making it misaligned with maximizing correct positives and negatives together.
If a prompt emphasizes a single threshold and maximizing both correct positives and negatives, think accuracy. If it stresses threshold-independent ranking, think AUC-ROC. If it highlights imbalanced classes or equal performance per class, consider balanced accuracy or class-weighted metrics. For threshold tuning, evaluate on a validation set and compare confusion-matrix-derived metrics across thresholds.
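A minimal sketch of that threshold sweep, using scikit-learn on a small stand-in validation set (the arrays below are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# y_true: validation labels, y_prob: predicted positive-class probabilities
# (both hypothetical arrays standing in for your validation data).
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_prob = np.array([0.2, 0.9, 0.65, 0.4, 0.55, 0.1, 0.45, 0.8])

# Evaluate accuracy at each candidate threshold and keep the best one.
thresholds = np.linspace(0.05, 0.95, 19)
accuracies = [accuracy_score(y_true, (y_prob >= t).astype(int)) for t in thresholds]

best_t = thresholds[int(np.argmax(accuracies))]
print(f"best threshold={best_t:.2f}, accuracy={max(accuracies):.3f}")
```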
Question 3
Which Cost Explorer capabilities help analyze SageMaker spend and forecast future costs? (Choose 2)
-
✓ B. Trend charts and cost forecasting
-
✓ D. Filter costs by service, usage type, and cost allocation tags
Trend charts and cost forecasting is correct because Cost Explorer provides time-series spend visualizations and built-in forecasting, which are essential for projecting future ML costs. Filter costs by service, usage type, and cost allocation tags is also correct because Cost Explorer lets you drill into costs by dimensions and tags, making it possible to attribute spend to SageMaker training jobs and inference endpoints accurately.
AWS Cost Anomaly Detection is incorrect since it is a separate service focused on anomaly detection, not a Cost Explorer capability.
Reconfigure SageMaker resources from Cost Explorer is incorrect because Cost Explorer cannot change or manage resources; it is a reporting and analysis tool.
Automate instance scheduling and rightsizing is incorrect since Cost Explorer does not perform operational automation; that is handled by other services and tools.
AWS Budgets is incorrect because budgets and alerts are configured in AWS Budgets, not within Cost Explorer, even though they are related cost management features.
When you see keywords like forecast, historical trends, or dimension/tag filtering, think Cost Explorer. When you see alerts or enforcement, think AWS Budgets. For automation or resource changes, Cost Explorer is not the answer.
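For illustration, a hedged sketch of pulling the same views programmatically through the Cost Explorer API (boto3 "ce" client); the dates are placeholders, and a forecast period must start on or after the current date:

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer API

# Historical SageMaker spend, filtered by the SERVICE dimension.
history = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-01-01", "End": "2024-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
)

# Built-in forecast of future cost for an upcoming period.
forecast = ce.get_cost_forecast(
    TimePeriod={"Start": "2024-04-01", "End": "2024-07-01"},
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)
print(forecast["Total"]["Amount"])
```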
Question 4
Which serverless deployment best delivers low-latency, real-time ML inference for a small model under bursty, unpredictable traffic with minimal operations?
-
✓ C. AWS Lambda + Amazon API Gateway
AWS Lambda + Amazon API Gateway is the best fit because it is fully serverless, scales automatically with bursty traffic, and charges per request, minimizing both operational effort and cost. For a compact model needing immediate predictions, Lambda’s event-driven scaling and managed runtime make it suitable for low-latency synchronous inference, especially when packaged efficiently (e.g., zipped package or container image) and tuned for cold starts if needed.
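A minimal sketch of such a handler behind an API Gateway proxy integration, assuming a small model artifact packaged with the function (model.pkl here is a hypothetical name):

```python
import json
import pickle

# Load the small model once per container, outside the handler,
# so warm invocations skip the deserialization cost.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

def lambda_handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string.
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features])[0]
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": float(prediction)}),
    }
```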
The option Amazon SageMaker real-time endpoint is less appropriate because it keeps provisioned instances running, resulting in steady baseline cost and more management overhead, which conflicts with a cost-sensitive, sporadic workload.
Amazon EC2 behind Amazon API Gateway is not serverless and requires server management and always-on capacity, increasing cost and operations.
Amazon SageMaker Asynchronous Inference targets long-running or large-payload inference with request queuing, which is not suitable for low-latency, synchronous predictions required at request time.
Watch for keywords like serverless, bursty/unpredictable traffic, low latency, and minimal operations. These typically point to Lambda with API Gateway for small models. Be cautious of options that keep capacity warm (EC2, provisioned endpoints) when the scenario emphasizes pay-per-use or scaling to zero. Also distinguish between real-time (synchronous, low-latency) and asynchronous (queued, not immediate) inference patterns.
Question 5
Which SageMaker-native service enables automated retraining and promotion to production as new data arrives?
-
✓ B. Amazon SageMaker Pipelines with automated retrain and deploy
Amazon SageMaker Pipelines with automated retrain and deploy is correct because it provides an ML-native workflow to continuously retrain on new data, evaluate candidates, register models, and automate conditional promotion to deployment targets. It also records lineage and supports step caching and experiment tracking, which are important for repeatable CI/CD in ML.
The option AWS Step Functions is not ideal because it is a general-purpose orchestrator and lacks built-in ML-specific capabilities such as the SageMaker Model Registry, pipeline caching, and native experiment lineage; it can work, but it is not the best SageMaker-native choice for CI/CD of models.
The option Amazon SageMaker Model Monitor focuses on monitoring data and model quality for drift or violations, not on retraining or deployment automation.
The option Amazon EventBridge and AWS Lambda can trigger training on new data, but it does not provide full pipeline orchestration, evaluation gates, lineage, or automated promotion workflows.
When you see cues like continuous training, automated promotion, ML-native pipeline, and model registry, favor SageMaker Pipelines. General-purpose orchestrators like Step Functions or event-driven glue like EventBridge and Lambda often require significant custom logic and lack ML-focused features.
Question 6
In Amazon Bedrock, how can you make a text assistant return consistent, repeatable responses across conversations? (Choose 2)
-
✓ B. Fine-tune the model on domain policies and product data
-
✓ D. Lower temperature and top-k in Bedrock to reduce randomness
The most reliable ways to make generations consistent are to reduce sampling randomness and to tightly align the model to fixed, domain-specific norms. Lower temperature and top-k in Bedrock to reduce randomness directly constrains token sampling, which reduces variability between runs. Fine-tune the model on domain policies and product data helps the model internalize consistent terminology and policy interpretations, further stabilizing responses across semantically similar prompts.
The option Add response caching to reuse prior answers only returns an identical prior response for exact prompt matches and does nothing for paraphrases or slightly altered inputs, nor does it control sampling variability.
Enable CloudWatch Logs and analyze traces increases observability but does not change generation behavior, so it cannot enforce consistency.
Use Amazon Bedrock Knowledge Bases for retrieval grounding improves factual grounding but does not remove randomness in token selection, so responses can still vary in phrasing and detail.
When the goal is consistent phrasing, think sampling controls first (temperature, top-k, possibly top-p and a seed if the model supports it). Then consider fine-tuning or strict prompt templates to standardize style and policy logic. RAG helps correctness, not determinism.
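A sketch of tightening sampling with the Bedrock Converse API; the model ID and parameter values are illustrative, and top_k is a model-specific field passed through additionalModelRequestFields:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Low temperature shrinks sampling randomness; top_k further restricts the
# candidate tokens considered at each step (model ID is just an example).
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": "Summarize our return policy."}]}],
    inferenceConfig={"temperature": 0.1, "maxTokens": 256},
    additionalModelRequestFields={"top_k": 20},
)
print(response["output"]["message"]["content"][0]["text"])
```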
Question 7
Which technique best reduces noise in high-dimensional user-event features for a recommender model?
-
✓ C. Use PCA to keep top principal components
The correct choice is Use PCA to keep top principal components. PCA projects high-dimensional features into a lower-dimensional subspace that preserves most variance, concentrating signal while discarding low-variance directions where noise typically resides. This improves generalization and is scalable with implementations like the SageMaker PCA algorithm for large, sparse interaction data.
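A minimal scikit-learn sketch of the idea, on a synthetic stand-in for the high-dimensional user-event matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical user-event matrix: rows are users, columns are event features.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 500))

# Keep enough components to explain ~95% of the variance; the low-variance
# (mostly noisy) directions are discarded.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```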
SageMaker Clarify is focused on bias detection and explainability, not feature denoising or dimensionality reduction.
t-SNE embedding before training is designed for visualization; it is non-parametric, unstable across runs, and not suitable as a production feature transformer.
Data augmentation often increases noise for implicit-feedback logs and does not inherently denoise features.
k-means user segmentation clusters users but leaves noisy features intact and can propagate weak signals.
Feature hashing compresses dimensionality via collisions, which can mix signal and noise; it does not selectively suppress noise.
When you see high-dimensional noisy interaction features, think of dimensionality reduction that preserves signal, such as PCA. Avoid options aimed at visualization (t-SNE), bias/explainability (Clarify), generic clustering (k-means), or techniques that may amplify or ignore noise (augmentation, hashing).
Question 8
How should you use Amazon CloudWatch to monitor and troubleshoot a SageMaker real-time endpoint’s latency and errors?
-
✓ B. Enable SageMaker inference logging to CloudWatch Logs and alarm on ModelLatency and 5XX
The best approach is to combine SageMaker’s native metrics with detailed logs. Enable SageMaker inference logging to CloudWatch Logs and alarm on ModelLatency and 5XX so you receive proactive alerts on latency and error spikes while also having invocation and container logs to diagnose root cause. CloudWatch publishes SageMaker-specific endpoint metrics such as ModelLatency, OverheadLatency, Invocations, InvocationsPerInstance, Invocation4XXErrors, and Invocation5XXErrors, which you can use with percentile statistics and alarms. Pairing these alarms with CloudWatch Logs from the endpoint allows quick correlation of spikes to model/container behavior, configuration changes, or dependency timeouts.
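As a sketch, a p95 ModelLatency alarm could be created like this (the endpoint and variant names are placeholders; ModelLatency is reported in microseconds):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="sm-endpoint-model-latency-p95",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    ExtendedStatistic="p95",        # percentile statistic on the latency metric
    Period=60,
    EvaluationPeriods=3,
    Threshold=120_000,              # 120 ms expressed in microseconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```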
The option “Create a CloudWatch Synthetics canary to call the endpoint and alarm on canary failures” can detect endpoint reachability but does not expose native SageMaker metrics or detailed per-invocation insights, making it insufficient for diagnosing latency regressions.
The option “Use CloudTrail and VPC Flow Logs; alarm on API patterns” emphasizes audit and network flow telemetry, which does not capture per-invocation latency or SageMaker error rates.
The option “Use CloudWatch ServiceLens with X-Ray traces on the endpoint to analyze latency” is not applicable because SageMaker InvokeEndpoint is not natively traced by X-Ray, and it misses the dedicated SageMaker metrics required for effective troubleshooting.
For the exam, look for cues like endpoint latency, 5XX errors, and per-invocation diagnostics. These keywords point to using CloudWatch Alarms on SageMaker metrics (ModelLatency, 5XX) and CloudWatch Logs from the endpoint for detailed troubleshooting. Remember that CloudTrail, VPC Flow Logs, and synthetic canaries are complementary but not substitutes for SageMaker’s native metrics and logs.
Question 9
Which SageMaker setup delivers synchronous low-latency (median < 90 ms) predictions and automated model quality drift alerts?
-
✓ D. SageMaker real-time endpoint with Model Monitor alerts
SageMaker real-time endpoint with Model Monitor alerts is correct because real-time endpoints are designed for synchronous, low-latency inference, meeting tight SLAs like median under 90 ms. SageMaker Model Monitor continuously evaluates data and quality metrics against baselines and integrates with CloudWatch to trigger alerts, satisfying the drift-notification requirement.
The option SageMaker Asynchronous Inference with SNS alerts is wrong because asynchronous mode prioritizes throughput and cost for long-running jobs and cannot reliably meet strict sub-90 ms latency, even though SNS can deliver notifications.
The option SageMaker Multi-Model Endpoint with CloudWatch alarms is incorrect since MME focuses on hosting many models behind one endpoint and CloudWatch alarms alone do not assess model quality or drift.
The option SageMaker Serverless Inference with EventBridge rules is not ideal for tight latency SLAs due to potential cold starts and variability, and EventBridge does not perform drift detection.
When the requirement emphasizes synchronous per-request latency, think real-time endpoints. For automated drift detection and notification, map to Model Monitor plus CloudWatch alarms. Beware of choices that use asynchronous, batch, or serverless patterns for strict sub-100 ms latency needs.
Question 10
For product-specific catalog image tagging and automated detection of unsafe content in user images using Amazon Rekognition, which capability pairing should be used?
-
✓ B. Amazon Rekognition Custom Labels + DetectModerationLabels
The correct choice is Amazon Rekognition Custom Labels + DetectModerationLabels. Custom Labels lets you train domain-specific models so your catalog images can be tagged with product-specific categories that generic models do not capture. DetectModerationLabels provides built-in classifiers to identify explicit, unsafe, or suggestive content in user-uploaded images, fulfilling the trust-and-safety requirement without custom training.
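A rough sketch of the pairing with boto3 (the bucket, object key, and project version ARN are placeholders):

```python
import boto3

rekognition = boto3.client("rekognition")
image = {"S3Object": {"Bucket": "my-images", "Name": "uploads/photo-123.jpg"}}  # placeholder

# Product-specific tagging with a trained Custom Labels model
# (the project version ARN is a placeholder).
catalog_tags = rekognition.detect_custom_labels(
    ProjectVersionArn="arn:aws:rekognition:us-east-1:123456789012:project/catalog/version/v1/1700000000000",
    Image=image,
    MinConfidence=80,
)

# Built-in unsafe-content detection for user uploads.
moderation = rekognition.detect_moderation_labels(Image=image, MinConfidence=70)
unsafe_labels = [label["Name"] for label in moderation["ModerationLabels"]]
```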
The option Amazon Rekognition Custom Labels + RecognizeCelebrities is incorrect because RecognizeCelebrities only identifies known public figures and does not detect unsafe content.
The choice Amazon Rekognition DetectLabels + IndexFaces is wrong since IndexFaces is for face indexing/search, not moderation, and DetectLabels is a general-purpose tagger that may not capture your tailored product taxonomy.
The option Amazon Rekognition DetectLabels + DetectText is incorrect because DetectText focuses on OCR and does not provide explicit/unsafe-content detection, and DetectLabels alone is not specialized for product-specific tagging.
When you see needs for a domain-specific or brand-specific taxonomy, look for Custom Labels. When you see requirements like unsafe, explicit, NSFW, moderation, the best fit is DetectModerationLabels. Features like IndexFaces and RecognizeCelebrities are face/identity oriented, not content safety.
Question 11
Which AWS services best map to low-latency online recommendations (<200 ms), batch churn scoring 3 times per day, and event-driven price updates within seconds?
-
✓ B. SageMaker endpoint for recommendations, EC2 for churn batch, Lambda for pricing events
SageMaker endpoint for recommendations, EC2 for churn batch, Lambda for pricing events is the best fit because it aligns each workload with the appropriate execution model. Low-latency online recommendations need consistent, millisecond-scale responses and autoscaling, which is what SageMaker real-time endpoints provide. The batch churn scoring that runs a few times per day can be handled cost-effectively on EC2 with simple scheduling, without paying for always-on inference capacity. Event-driven price updates that must react within seconds are well matched to AWS Lambda’s on-demand, event-based execution model.
The option Amazon SageMaker real-time endpoints for all models forces batch and event workloads into a real-time hosting pattern, increasing cost and operational burden without benefits.
The choice AWS Lambda for every workload suffers from cold starts and duration/memory limits, making it unsuitable for steady, ultra-low-latency inference and long-running batch scoring.
The configuration SageMaker Asynchronous Inference for recommendations, SageMaker Batch Transform for churn, AWS Lambda for pricing misapplies Asynchronous Inference, which is designed for queued, longer-running invocations, not interactive sub-200 ms requests.
The setup SageMaker Serverless Inference for recommendations, SageMaker Processing for churn, Amazon EventBridge for pricing is problematic because Serverless Inference can introduce cold-start latency for interactive calls, Processing is not primarily for inference serving, and EventBridge alone cannot execute pricing logic.
Map use cases to patterns quickly. Use real-time endpoints for steady, low-latency online inference. Choose batch compute (EC2, SageMaker Batch Transform, or Processing) for scheduled or large offline scoring. Prefer event-driven actions with AWS Lambda when responses are needed within seconds. Reserve Asynchronous Inference for high-latency, variable-duration requests. Be cautious with Serverless Inference when strict p99 latency targets are required due to potential cold starts.
Question 12
Which AWS Glue feature unifies schemas and metadata across diverse sources to improve training data quality?
-
✓ B. AWS Glue Data Catalog
The correct choice is AWS Glue Data Catalog because it is the centralized metadata repository that stores table definitions, schemas, and partitions across heterogeneous sources. Centralizing and governing metadata ensures consistent, discoverable datasets for feature engineering and training, directly improving model quality.
‘AWS Glue Crawlers’ are not the answer because crawlers only discover schema and populate the Data Catalog; they are not the unified metadata store itself.
‘AWS Lake Formation’ is incorrect because it focuses on governance and permissions for data lakes and uses the Data Catalog underneath; it is not the Glue feature that centralizes metadata.
‘AWS Glue job autoscaling’ is unrelated to schema or metadata management and instead concerns compute elasticity and cost/performance of ETL jobs.
When you see phrases like unify schemas, centralized metadata, or cross-source discoverability, map them to the Data Catalog. Differentiate Glue Crawlers (discovery/population) from the Data Catalog (the actual metadata store). Also distinguish governance services like Lake Formation from metadata catalogs.
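A small sketch of reading that centralized metadata with the Glue API (the database name is a placeholder):

```python
import boto3

glue = boto3.client("glue")

# Browse the schemas registered centrally in the Data Catalog.
tables = glue.get_tables(DatabaseName="training_data")
for table in tables["TableList"]:
    columns = [(c["Name"], c["Type"]) for c in table["StorageDescriptor"]["Columns"]]
    print(table["Name"], columns)
```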
Question 13
How can you rightsize a SageMaker real-time endpoint to minimize cost while meeting p95 latency and throughput targets (for example, p95 < 40 ms at ~1,500 RPS)?
-
✓ C. SageMaker Inference Recommender load-test candidates and pick best cost–latency fit
SageMaker Inference Recommender load-test candidates and pick best cost–latency fit is correct because it automates benchmarking across multiple instance families, sizes, and concurrency settings using your model and payloads. It reports p50/p90/p95 latency, throughput, and estimated cost so you can select the lowest-cost configuration that still satisfies targets such as p95 < 40 ms at ~1,500 RPS.
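A hedged sketch of starting such a job with the SageMaker API; the job name, role ARN, and model package ARN are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# A "Default" job benchmarks a set of candidate instance types against your
# model and payload, reporting latency, throughput, and cost per candidate.
sm.create_inference_recommendations_job(
    JobName="recsys-rightsizing",
    JobType="Default",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    InputConfig={
        "ModelPackageVersionArn": "arn:aws:sagemaker:us-east-1:123456789012:model-package/recsys/1",
    },
)

# Later: inspect per-candidate results and pick the cheapest configuration
# that still meets p95 < 40 ms at ~1,500 RPS.
job = sm.describe_inference_recommendations_job(JobName="recsys-rightsizing")
```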
The option AWS Compute Optimizer pick smallest instance by CPU/memory is incorrect because Compute Optimizer does not evaluate model-specific inference performance on SageMaker endpoints; CPU and memory utilization alone can lead to violating latency SLOs.
The option SageMaker endpoint A/B traffic shifting between instance types is incorrect because while production variants enable live experiments and canaries, they do not systematically explore many candidates or automate load testing and cost comparison. It is best for controlled rollouts, not comprehensive rightsizing.
The option Manual CloudWatch review and iterative redeploy is incorrect because ad-hoc trial-and-error lacks consistent load generation, is slow, and often misses better configurations compared to automated, reproducible benchmarking.
When you see rightsizing for SageMaker inference with explicit cost and SLO requirements, look for automated benchmarking across instance families and sizes. The key signal is Inference Recommender. Avoid answers that rely on generic utilization tools, purely manual tuning, or production-only canary tests for initial sizing.
Question 14
Which AWS services enable CI/CD from code commit to build, SageMaker training on latest S3 data, and deployment to a managed endpoint?
-
✓ C. AWS CodePipeline + AWS CodeBuild + Amazon SageMaker
AWS CodePipeline + AWS CodeBuild + Amazon SageMaker is correct because CodePipeline reacts to source commits to orchestrate stages, CodeBuild rebuilds and tests the preprocessing and training artifacts, and SageMaker runs training on data in Amazon S3 and deploys the resulting model to a managed endpoint. This trio is the standard, managed CI/CD pattern for automating ML workflows on AWS, and is the foundation of SageMaker Projects MLOps templates.
The option AWS CodePipeline, AWS CodeBuild, Amazon ECR is incorrect because ECR is only a container registry and does not provide SageMaker training or endpoint deployment.
The option AWS Step Functions, AWS CodeBuild, AWS CodePipeline is incorrect since it omits SageMaker; without SageMaker, there is no native training or endpoint deployment even though orchestration exists.
The option Amazon EventBridge, AWS CodeDeploy, AWS Lambda is incorrect because CodeDeploy does not deploy SageMaker endpoints and this stack lacks a proper build stage for ML artifacts.
The option AWS Step Functions, Amazon SageMaker, Amazon ECR is incorrect because it lacks a CI/CD pipeline and build stage to automate from commits; ECR only stores images.
When a question emphasizes commit-triggered automation plus ML training and managed endpoint deployment, look for the combination of CodePipeline for orchestration, CodeBuild for builds/tests, and SageMaker for training and hosting. Options that swap in Step Functions, CodeDeploy, ECR, or only event triggers typically miss one of these required CI/CD or ML-specific pieces.
Question 15
In SageMaker, which approach provides a quick, interpretable baseline for binary churn on 30-day tabular features with minimal setup?
-
✓ B. Train a SageMaker Linear Learner classifier on 30-day tabular features
Train a SageMaker Linear Learner classifier on 30-day tabular features is correct because Linear Learner provides a fast, interpretable baseline for binary classification on structured data with minimal configuration in SageMaker. It supports logistic loss, produces straightforward coefficients for feature impact, and is ideal before trying more complex models like boosted trees or deep nets.
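A minimal sketch using the built-in Linear Learner container (the role ARN, S3 paths, and feature_dim value are placeholders; CSV training input expects the churn label in the first column):

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Built-in Linear Learner image; train/validation paths point at CSV files
# of 30-day features with the churn label in column 0 (paths are placeholders).
linear = Estimator(
    image_uri=sagemaker.image_uris.retrieve("linear-learner", region, version="1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    sagemaker_session=session,
)
linear.set_hyperparameters(predictor_type="binary_classifier", feature_dim=30)
linear.fit({
    "train": TrainingInput("s3://my-bucket/churn/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/churn/validation/", content_type="text/csv"),
})
```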
SageMaker Autopilot is not ideal for a baseline because it automates feature engineering, model selection, and tuning to maximize performance, which adds complexity beyond a quick, transparent baseline.
SageMaker K-means clustering on 30-day features is unsupervised and does not directly produce a churn classifier, so it is unsuitable as a supervised baseline.
SageMaker XGBoost with extensive HPO and 10-fold CV focuses on exhaustive tuning to maximize metrics like AUC, which contradicts the goal of a minimal, transparent baseline.
For tabular binary classification baselines in SageMaker, think simple and interpretable. Prioritize linear/logistic models for speed and clarity before exploring XGBoost or neural networks. Look for cues like “minimal setup,” “transparent,” and “baseline” to choose linear methods over automated or heavily tuned approaches.
Question 16
A SageMaker real-time endpoint has p95 latency near 600 ms, but the target is under 120 ms. Which configuration change most directly lowers per-request latency?
-
✓ C. Add an Elastic Inference accelerator to the endpoint
Add an Elastic Inference accelerator to the endpoint is correct because Elastic Inference attaches fractional GPU acceleration to a CPU instance, directly speeding up the model’s compute stage and reducing per-request latency for supported frameworks and models. This targets the primary source of latency in real-time inference: model execution time.
The option SageMaker Asynchronous Inference is incorrect because async endpoints queue and process requests out of band, optimizing throughput for long-running jobs but increasing end-to-end latency compared to real-time endpoints.
The option Enable gzip on InvokeEndpoint payloads is incorrect because payload compression affects network transfer size, which is usually a small part of overall latency versus model compute.
The option Increase instances and enable auto scaling is incorrect because scaling out addresses throughput and concurrency; single-request latency remains similar unless the instance is CPU/GPU bound due to overload.
Focus on what directly changes per-request compute time. Terms like throughput, concurrency, and auto scaling point to capacity, not single-request latency. When you see a need to cut inference latency on a CPU-backed endpoint, look for GPU acceleration options such as Elastic Inference or moving to GPU/Inferentia instances. Note that Elastic Inference has limited framework support and has seen reduced emphasis in newer architectures; in current practice, shifting to GPU or Inferentia instances may be preferred, but historically EI is associated with lowering latency without moving to full GPU instances.
Question 17
In Amazon Lookout for Equipment, which data preparation best improves accuracy when training across different machine types?
-
✓ B. Create separate datasets per equipment type for training
Create separate datasets per equipment type for training is correct because Lookout for Equipment learns multivariate temporal patterns specific to each asset class. Segmenting data by equipment type preserves unique behavior, operating ranges, and failure modes, which leads to higher-quality models and more precise anomaly detection.
The option Add a machine_type feature and train one global model is inferior because a single global model often underfits type-specific signals even with a type indicator.
Apply identical normalization to all sensors and machines can remove meaningful scale differences across equipment types, harming learnable signal.
Merge all machines into one dataset blends distinct patterns and reduces separability of anomalies.
Randomly mix records from different machines to reduce bias increases noise and obscures per-type temporal structure rather than improving generalization.
For multivariate time-series services like Lookout for Equipment, think per-asset-type datasets and models, aligned timestamps and sensor names, and consistent units. Be cautious of global models that try to cover heterogeneous equipment; they usually lose type-specific fidelity. Watch for keywords like separate datasets by equipment type versus single unified dataset.
Question 18
Which best describes how a deep neural network learns during training?
-
✓ C. Labeled data with backprop and gradient optimizers to minimize loss
Labeled data with backprop and gradient optimizers to minimize loss is correct because deep networks learn by computing a loss on training data, backpropagating gradients, and updating weights with optimizers such as SGD or Adam to iteratively reduce that loss across epochs.
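A toy illustration of that loop in NumPy: synthetic labeled data, a forward pass, backpropagated gradients, and plain gradient-descent weight updates (all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # synthetic labels

W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros((1, 1))
lr = 0.5

for epoch in range(201):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))                  # sigmoid output
    p = np.clip(p, 1e-9, 1 - 1e-9)                            # numerical safety
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy

    # backward pass (chain rule), then gradient-descent updates to the weights
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dh = (dz2 @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0, keepdims=True)
    W1, b1 = W1 - lr * dW1, b1 - lr * db1
    W2, b2 = W2 - lr * dW2, b2 - lr * db2

    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss={loss:.4f}")
```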
The option No data; patterns from preset rules is incorrect since deep learning depends on data-driven optimization rather than rule-based discovery.
The option Weights set manually with fixed heuristics is wrong because parameters are learned automatically from data, not manually configured.
The option Amazon SageMaker Autopilot is not a learning mechanism; it is an AWS service that automates model selection and tuning but does not describe how a neural network learns internally.
Look for keywords like backpropagation, gradients, loss minimization, and optimizers. Beware of distractors that mention preset rules, manual parameter setting, or product names that are not mechanisms of learning.
Question 19
Which SageMaker compute choices best meet production p95 latency under 30 ms at high throughput while keeping test costs low? (Choose 2)
-
✓ A. Use small CPU instances for the test endpoint to minimize cost
-
✓ C. Provision Inferentia-based instances for the production endpoint to hit low-latency, high-QPS
Provision Inferentia-based instances for the production endpoint to hit low-latency, high-QPS is correct because AWS Inferentia on SageMaker is purpose-built for high-throughput, low-latency inference with strong price-performance, making it well-suited for strict p95 latency targets under heavy load. Use small CPU instances for the test endpoint to minimize cost is also correct because functional testing, validation, and basic experiments typically do not require accelerator hardware; using smaller CPUs keeps spend low while still enabling iteration.
Choose the largest GPU instance for production is incorrect because it prioritizes brute-force capacity over price-performance. Inferentia generally delivers better cost efficiency for inference at scale, and the biggest GPU may be unnecessary for the workload.
Use the same instance type for production and test is incorrect since production requires tight p95 latency at scale while test emphasizes cost control; different requirements call for different instance choices.
Run both endpoints on Spot capacity is incorrect because Spot capacity is interruptible and not suitable for real-time endpoints that need consistent low-latency availability. Spot is intended for training or batch jobs that can handle interruptions.
Use SageMaker Serverless Inference for production is incorrect for strict p95 latency SLOs at high volume due to potential cold starts and concurrency limits that can cause latency variability.
For real-time, low-latency inference at scale, consider Inferentia for price-performance. Use CPU in non-critical test environments to reduce cost. Be cautious with Serverless Inference for stringent latency SLOs and avoid Spot for latency-sensitive, always-on endpoints. Always separate production and test instance choices based on differing SLOs and budgets.
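A rough sketch of keeping the two endpoints separate (ARNs, image URIs, and names are placeholders, and the Inferentia deployment assumes the model artifact has been compiled for AWS Neuron and the container supports it):

```python
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"                      # placeholder
image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"  # placeholder
model = Model(image_uri=image_uri, model_data="s3://my-bucket/model.tar.gz", role=role)

# Production: Inferentia instances for low-latency, high-QPS serving.
model.deploy(
    initial_instance_count=2,
    instance_type="ml.inf2.xlarge",
    endpoint_name="recsys-prod",
)

# Test: a small CPU instance keeps functional-testing costs low.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="recsys-test",
)
```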
Question 20
How should you run custom TensorFlow training on SageMaker in script mode with 25 TB of data in S3?
-
✓ C. Provide a Python entry point for SageMaker’s TensorFlow container, upload source to S3, pick a supported TF version, and launch in script mode
The correct approach is Provide a Python entry point for SageMaker’s TensorFlow container, upload source to S3, pick a supported TF version, and launch in script mode. In script mode, you supply a Python training script (entry point) and optional source directory; SageMaker runs it inside the managed TensorFlow container for the specified framework version, scaling out as needed and streaming data from Amazon S3.
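A minimal script-mode sketch with the SageMaker Python SDK; the entry point, S3 paths, role ARN, instance choices, and version strings are illustrative:

```python
from sagemaker.tensorflow import TensorFlow
from sagemaker.inputs import TrainingInput

# Script-mode estimator around the prebuilt TensorFlow container.
estimator = TensorFlow(
    entry_point="train.py",        # your custom TensorFlow training script
    source_dir="src",              # optional additional code, uploaded to S3
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    framework_version="2.13",      # a supported TF version for the managed container
    py_version="py310",
    instance_count=2,
    instance_type="ml.g5.2xlarge",
)

# FastFile mode streams objects from S3 on demand, which avoids copying the
# full 25 TB dataset onto the training volumes before the job starts.
estimator.fit({"train": TrainingInput("s3://my-bucket/training-data/",
                                      input_mode="FastFile")})
```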
The option Build a custom Docker image with TensorFlow in ECR and run in script mode is unnecessary and mismatched; when you bring your own container you don’t use framework script mode, and it adds overhead without benefit if the prebuilt TensorFlow container suffices.
The option Package a model .tar.gz and point script mode at it to train confuses model artifacts with training code; tar.gz model artifacts are produced after training and are used for hosting, not as the training entry point.
The option Use SageMaker JumpStart to train from S3 instead of writing a script is for curated, prebuilt models and templates, not executing a custom TensorFlow training loop in script mode.
When you see script mode, think prebuilt framework containers plus an entry point, source_dir, and framework_version. If the prompt mentions custom code and TensorFlow, prefer the SageMaker TensorFlow container with script mode over AutoML or JumpStart. If it suggests a custom container, make sure the question truly requires BYOC; otherwise, prebuilt containers with script mode are the intended solution.