Google Cloud Machine Learning Engineer Certification Practice Exam
Question 1
At CloudVista Analytics you are building a quick sentiment classification prototype in a managed Vertex AI Workbench notebook, and you want to try NLTK tokenization right away so that you can run a few experiments in under 20 minutes. How should you make the library available to the currently running Jupyter kernel?
-
❏ A. Run pip install nltk from the notebook terminal
-
❏ B. Execute !pip install nltk --user in a notebook cell
-
❏ C. Create a new Vertex AI Workbench instance from a custom image that already includes NLTK
-
❏ D. Launch a Dataflow batch job that tokenizes text with NLTK and writes outputs to Cloud Storage
Question 2
In a streaming Dataflow pipeline that reads from Pub/Sub and writes to BigQuery, what is the simplest way to run TensorFlow inference while achieving a p95 latency under 150 ms?
-
❏ A. Vertex AI endpoint
-
❏ B. Embed SavedModel in a DoFn
-
❏ C. Vertex AI Batch Prediction
Question 3
You exported a TensorFlow text classifier as a SavedModel for online predictions. The serving signature named serving_default declares one input called text with data type string and a shape of any batch size by two, and it produces two scores per example. You started TensorFlow Serving and will send an HTTP request with a JSON body and a content type of application/json to http://example.com:8502/v1/models/review_model:predict. Which request body should you send so that it matches the signature and returns a prediction?
-
❏ A. data = json.dumps({"signature_name": "serving_defaut", "instances": [["ab", "bc", "cd"]]})
-
❏ B. data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b", "c", "d", "e", "f"]]})
-
❏ C. data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b"], ["c", "d"], ["e", "f"]]})
-
❏ D. data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b", "c"], ["d", "e", "f"]]})
Question 4
In Kubeflow Pipelines, what is the simplest way to execute a BigQuery SQL query and pass the results directly to the next pipeline component?
-
❏ A. Orchestrate the query with Cloud Workflows and call it from the pipeline
-
❏ B. Use the KFP BigQuery query component from the registry and pass its output to the next step
-
❏ C. Invoke a Cloud Function that uses the BigQuery API and return the result to the pipeline
Question 5
At Crestline Analytics you are training a deep neural network on a tabular dataset for customer churn. Several columns have very different value ranges where some counters reach 300000 while some ratios lie between 0 and 1. During training the gradients behave poorly and the optimizer does not converge to a good solution. What should you change in data preparation to help the model converge?
-
❏ A. Build feature crosses to combine the strongest predictors
-
❏ B. Use Vertex AI Vizier to tune the optimizer and the learning rate
-
❏ C. Scale all input features to comparable ranges using normalization or standardization
-
❏ D. Drop columns that contain missing values to simplify the dataset
Question 6
Over the next 30 days, your model will process about five million online predictions per day. How should you configure Vertex AI Model Monitoring to detect input drift while minimizing cost?
-
❏ A. Monitor only features and increase monitoring frequency
-
❏ B. Monitor features and attributions and sample predictions near 0
-
❏ C. Monitor features only and sample predictions near 1
Question 7
A startup named RiverSky is designing a Vertex AI Pipelines workflow to predict sentiment from long-form customer support chat transcripts. The team needs full control over the training code and hyperparameter choices, and after training they plan to publish the model to an endpoint for real-time predictions. Which pipeline components should be selected?
-
❏ A. TabularDatasetCreateOp, CustomTrainingJobOp, and EndpointCreateOp
-
❏ B. TextDatasetCreateOp, AutoMLTextTrainingOp, and EndpointCreateOp
-
❏ C. TextDatasetCreateOp paired with CustomTrainingJobOp and ModelDeployOp
-
❏ D. TextDatasetCreateOp, CustomTrainingJobOp, and ModelBatchPredictOp
Question 8
Which method should you use in BigQuery to discover unlabeled customer segments and automatically determine the number of clusters?
-
❏ A. AutoML Tables with labeled data
-
❏ B. Dataproc Spark MLlib k-means with elbow
-
❏ C. BigQuery ML k-means with auto k
-
❏ D. BigQuery ML PCA
Question 9
You work for a consumer electronics startup that is training a TensorFlow and Keras model to detect soldering defects on printed circuit boards, and you intend to apply random translation, cropping, and contrast changes to every mini-batch of 48 images during training. You want the data input path to be highly efficient at runtime and to make good use of compute while avoiding unnecessary storage and I/O overhead. What should you do?
-
❏ A. Implement the augmentations in Keras data generators that feed the model during fit
-
❏ B. Use Apache Beam on Dataflow to precompute many augmented variants and persist them as TFRecords in Cloud Storage
-
❏ C. Embed the random augmentation ops in the tf.data pipeline using dataset.map with TensorFlow ops and prefetch so transforms run on the fly per batch
-
❏ D. Run a Dataflow job for each training run that performs random augmentations and stages the results as TFRecords for the trainer
Question 10
In Vertex AI you are tuning two hyperparameters with Bayesian optimization, an integer embedding dimension from 32 to 96 and a learning rate from 1e-06 to 2e-02. You want to maximize validation accuracy and training time is not a constraint. How should you set the scaling type for each hyperparameter and what should you choose for maxParallelTrials?
-
❏ A. Set UNIT_LOG_SCALE for both hyperparameters and use many parallel trials
-
❏ B. Set UNIT_LINEAR_SCALE for embedding, UNIT_LOG_SCALE for learning rate, and keep maxParallelTrials low
-
❏ C. Set UNIT_LINEAR_SCALE for both hyperparameters and keep maxParallelTrials low

Question 11
You are an ML engineer at a wind farm operator working on a predictive maintenance effort. Your task is to build a binary classifier that predicts whether a turbine will fail within the next four days so that technicians can act early. Scheduled servicing is inexpensive yet unexpected breakdowns are very costly. You trained several binary classifiers that output 1 when a failure is predicted. While evaluating on a separate validation set you want to emphasize catching as many true failures as possible and you must also ensure that more than half of the maintenance work orders triggered by the model are truly associated with imminent failures. Which model should you select?
-
❏ A. The model with the highest precision where recall is at least 0.5
-
❏ B. The model with the lowest log loss and recall at least 0.5
-
❏ C. The model with the highest recall while precision exceeds 0.5
-
❏ D. The model with the highest AUC ROC and precision above 0.5
Question 12
How can you enable high fidelity per prediction feature attributions for an existing custom classifier that is deployed for online serving on Vertex AI while making minimal code changes?
-
❏ A. Train a new AutoML Tabular model and use built in explanations
-
❏ B. Use Vertex AI Batch Predictions with explanations and reuse results for online serving
-
❏ C. Register the model in Vertex AI and enable sampled Shapley attributions with baselines
-
❏ D. Modify the custom prediction container to compute and return sampled Shapley attributions
Question 13
The loyalty analytics group at HelioMart wants to send scheduled outreach every three weeks to customers predicted to exceed a dynamic spend threshold. This is the team’s first machine learning initiative and you have been tasked with operationalizing it. You created a fresh Google Cloud project and used Vertex AI Workbench to build an initial XGBoost model from purchase transactions stored in Cloud Storage. You want an end to end process that automatically produces predictions for the team in a secure way while keeping costs down and minimizing ongoing code maintenance. What should you build to meet these goals?
-
❏ A. Create a scheduled workflow in Cloud Composer that loads data from Cloud Storage into BigQuery, trains and runs batch predictions with BigQuery ML, and stores results in a BigQuery table named crm_ops.predicted_spend
-
❏ B. Schedule a Vertex AI Workbench notebook that pulls data from Cloud Storage, performs both training and batch inference on the notebook instance, and writes a CSV with emails and expected spend back to Cloud Storage
-
❏ C. Create a scheduled pipeline on Vertex AI Pipelines that reads transactional data from Cloud Storage, uses Vertex AI for training and batch prediction, and writes a file to a Cloud Storage bucket that lists customer emails with projected spend
-
❏ D. Build a Cloud Composer workflow that reads Cloud Storage data, invokes Vertex AI for training and prediction, and emails the loyalty team’s Google Group a password protected attachment containing customer emails and predicted spend
Question 14
You need to set up a workflow that runs Dataflow preprocessing and performs batch predictions for a TensorFlow model every 30 days, then writes the results to BigQuery. Which approach should you choose?
-
❏ A. Import the model into BigQuery and use BigQuery ML with SQL transformations
-
❏ B. Deploy to a Vertex AI endpoint and send requests from Dataflow
-
❏ C. Use Vertex AI Pipelines with DataflowPythonJobOp then ModelBatchPredictOp and load results to BigQuery
Question 15
A nationwide retail bank has archived thousands of customer support call recordings on a local file server. The clips are in WAV format and average about 7 minutes each. You need to generate transcripts and analyze customer sentiment at scale, and you plan to use the Speech-to-Text API. You want the most efficient workflow with minimal operational overhead. What should you do?
-
❏ A. Iterate over the local audio files in Python, build RecognitionAudio objects from the file bytes, call the long running recognize method to produce transcripts, and then call the Natural Language API analyzeSentiment method
-
❏ B. Upload the audio files to Cloud Storage, call the long running recognize method to generate transcripts, and then call the predict method of an AutoML sentiment model to score the transcripts
-
❏ C. Upload the audio files to Cloud Storage, use the long running recognize method to create transcripts, and trigger a Cloud Function that calls the Natural Language API analyzeSentiment method
-
❏ D. Iterate over the local audio files in Python, construct RecognitionAudio objects from the file bytes, call the synchronous recognize method to obtain transcripts, and then send the text to an AutoML sentiment model
Question 16
Which Google Cloud service allows you to train and evaluate multiple models directly on BigQuery tables within 24 hours, enabling algorithm comparisons without moving data?
-
❏ A. Vertex AI AutoML Tabular
-
❏ B. BigQuery ML
-
❏ C. Vertex AI Training
Question 17
AuroraCast runs a podcast streaming platform with a custom recommendation model that ranks the next episode from a listener’s history. The model is served on a Vertex AI endpoint and was recently retrained with newer data that performed well in offline evaluation. You want to validate the new model with live users while keeping operational effort minimal. What should you do?
-
❏ A. Create a separate Vertex AI endpoint for the updated model and implement a lightweight router that randomly sends 15% of production requests to it, then grow that share only if business metrics like average minutes played per session improve
-
❏ B. Deploy the new model to the existing Vertex AI endpoint and configure traffic splitting so that 10% of live requests go to the new deployment while you monitor session length and completion rate, then gradually raise the split if results improve
-
❏ C. Log production prediction payloads in BigQuery and run a Vertex AI Experiments study with batch predictions from both models, then promote the new model only after offline metrics look better than the baseline
-
❏ D. Enable model monitoring on the current endpoint to watch for prediction drift and immediately replace the deployed model with the retrained version, then roll back if alerts fire
Question 18
Given a TensorFlow model that predicts whether a user will spend more than $20 in the next 30 days and data stored in BigQuery, what is the most cost effective and low overhead way to deploy the model and run predictions?
-
❏ A. Dataflow streaming with Pub/Sub and results in Cloud SQL
-
❏ B. BigQuery ML import SavedModel then scheduled batch predictions to Cloud SQL
-
❏ C. TensorFlow Serving on GKE with per request BigQuery lookups
Question 19
You are a data scientist at example.com which operates a regional ecommerce marketplace with around 900 short lifecycle SKUs, and the company has four years of historical sales stored in BigQuery; leadership wants monthly sales forecasts for every SKU and prefers the fastest path with minimal setup and maintenance; which solution should you choose?
-
❏ A. Use Vertex AI Forecast to build a managed forecasting model
-
❏ B. Train ARIMA_PLUS forecasting models in BigQuery ML for each SKU
-
❏ C. Build a custom model with TensorFlow on Vertex AI Training
-
❏ D. Use BigQuery ML with XGBoost regression to predict monthly sales
Question 20
How should you handle a categorical feature such as RegionTier with approximately 25 percent missing values to preserve predictive signal and ensure consistency between training and serving?
-
❏ A. Fill missing values with the most frequent category
-
❏ B. Hash bucket encode without a missing flag
-
❏ C. Add “missing” category plus a binary missing indicator
Certified Google Cloud ML Engineer Certification Practice Exam Answers

Question 1
At CloudVista Analytics you are building a quick sentiment classification prototype in a managed Vertex AI Workbench notebook, and you want to try NLTK tokenization right away so that you can run a few experiments in under 20 minutes. How should you make the library available to the currently running Jupyter kernel?
-
✓ B. Execute !pip install nltk --user in a notebook cell
The correct option is Execute !pip install nltk --user in a notebook cell. This makes the package available to the same Python interpreter that the current Jupyter kernel uses so you can import it immediately for your quick experiments.
Installing from inside the notebook ensures the command runs in the environment backing the active kernel. The --user flag installs to the user site which does not require administrator privileges in managed environments and the package is placed on the user path so the kernel can find it. This is the fastest way to try NLTK without rebuilding or restarting infrastructure.
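As a concrete illustration, the following cell sketch installs the package and uses it immediately in the same kernel. The punkt download and sample sentence are only for demonstration, and if the import is not picked up right away a kernel restart refreshes the user site path.

!pip install nltk --user

import nltk
nltk.download("punkt")  # tokenizer data fetched at runtime
print(nltk.word_tokenize("CloudVista prototypes ship fast."))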
Run pip install nltk from the notebook terminal is not ideal because the terminal can use a different environment than the running kernel and it might require elevated permissions or a kernel restart. This can delay your ability to import the library right away.
Create a new Vertex AI Workbench instance from a custom image that already includes NLTK is unnecessary for a quick prototype. Building or selecting a custom image and provisioning a new instance takes more time and does not help with immediate availability in the current kernel.
Launch a Dataflow batch job that tokenizes text with NLTK and writes outputs to Cloud Storage does not install a library into your notebook session. Dataflow is for scalable data processing rather than for making a Python package importable in the running Jupyter kernel.
When a question asks how to use a library right away in a notebook, think about installing it from within the notebook so the current kernel sees it, and prefer a user level install when you lack admin rights.
Question 2
In a streaming Dataflow pipeline that reads from Pub/Sub and writes to BigQuery, what is the simplest way to run TensorFlow inference while achieving a p95 latency under 150 ms?
-
✓ B. Embed SavedModel in a DoFn
The correct option is Embed SavedModel in a DoFn.
Keeping the model in memory inside the pipeline workers avoids network calls and service hops, which is essential for a strict p95 target under 150 ms. You can load the SavedModel once per worker in the DoFn setup and perform inference per element with very low overhead. This keeps the design simple and self contained while providing predictable latency.
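A minimal Apache Beam sketch of this pattern follows. The Cloud Storage model path, the element's text field, and the way the output score is unpacked are assumptions for illustration rather than the question's exact pipeline.

import apache_beam as beam
import tensorflow as tf

class PredictDoFn(beam.DoFn):
    def __init__(self, model_path):
        self._model_path = model_path
        self._model = None

    def setup(self):
        # Load the SavedModel once per worker, not once per element
        self._model = tf.saved_model.load(self._model_path)

    def process(self, element):
        # In-process inference, so no network hop to an external prediction service
        outputs = self._model.signatures["serving_default"](
            text=tf.constant([element["text"]]))
        score = float(list(outputs.values())[0][0][0])
        yield {**element, "score": score}

# Used between the Pub/Sub read and the BigQuery write, for example:
# | beam.ParDo(PredictDoFn("gs://example-bucket/models/sentiment/1"))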
Vertex AI endpoint introduces an external RPC on every inference which adds network latency, authentication overhead, and potential tail latency from retries or throttling. These factors make it harder to consistently meet a 150 ms p95 in a streaming pipeline and they add architectural complexity compared to in process inference.
Vertex AI Batch Prediction is designed for offline batch scoring and not for real time streaming workloads. It does not meet low per element latency requirements and therefore is unsuitable for a 150 ms p95 target in a streaming pipeline.
When the question emphasizes streaming and strict p95 latency targets, favor in process inference that avoids network hops. Look for clues like simple design and per element scoring to choose embedding the model within the pipeline over external prediction services.
Question 3
You exported a TensorFlow text classifier as a SavedModel for online predictions. The serving signature named serving_default declares one input called text with data type string and a shape of any batch size by two, and it produces two scores per example. You started TensorFlow Serving and will send an HTTP request with a JSON body and a content type of application/json to http://example.com:8502/v1/models/review_model:predict. Which request body should you send so that it matches the signature and returns a prediction?
-
✓ C. data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b"], ["c", "d"], ["e", "f"]]})
The correct request body is data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b"], ["c", "d"], ["e", "f"]]}). It matches the serving signature because each instance is a list of two strings which satisfies an input tensor shaped to any batch size by two, and it uses the correct signature name.
The serving signature declares one string input named text with shape [batch, 2]. A valid REST predict request therefore sends a list of instances where each instance supplies exactly two string values. The chosen body sends three examples and each example contains two strings which aligns with the model input. Using the serving_default signature name is also required to invoke the default prediction function.
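For reference, this is how the matching body could be posted from Python with the requests library, assuming the host, port, and model name given in the question and using placeholder two-string instances.

import json
import requests

data = json.dumps({
    "signature_name": "serving_default",
    "instances": [["a", "b"], ["c", "d"], ["e", "f"]],  # each instance is exactly two strings
})
response = requests.post(
    "http://example.com:8502/v1/models/review_model:predict",
    data=data,
    headers={"content-type": "application/json"},
)
print(response.json())  # {"predictions": [[score_1, score_2], ...]} with two scores per example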
data = json.dumps({"signature_name": "serving_defaut", "instances": [["ab", "bc", "cd"]]}) is incorrect because the signature name is misspelled and the single instance supplies three strings which does not match the expected two values per example.
data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b", "c", "d", "e", "f"]]}) is incorrect because it provides one instance with six strings rather than instances of two strings each.
data = json.dumps({"signature_name": "serving_default", "instances": [["a", "b", "c"], ["d", "e", "f"]]}) is incorrect because it sends two instances with three strings each which does not satisfy the required width of two.
Translate the signature into shapes before choosing a request. If the input is string with shape batch by two then each instance must have exactly two strings and the batch dimension can be any length. Also verify the signature_name matches what the model exports.
Question 4
In Kubeflow Pipelines, what is the simplest way to execute a BigQuery SQL query and pass the results directly to the next pipeline component?
-
✓ B. Use the KFP BigQuery query component from the registry and pass its output to the next step
The correct option is Use the KFP BigQuery query component from the registry and pass its output to the next step.
This is the simplest approach because the BigQuery component is prebuilt in the KFP and Vertex AI Pipelines ecosystem and it runs the SQL while exposing the results as typed outputs that you can wire directly to the next task. You avoid writing custom integration code and you keep the data flow inside the pipeline which makes parameter passing straightforward.
With the BigQuery component you reference the task output in the pipeline to feed the next component. This lets the downstream step consume the query result without managing client libraries, authentication, or temporary storage.
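A minimal sketch of that wiring is shown below, assuming the google_cloud_pipeline_components v1 BigqueryQueryJobOp component and a placeholder query. The downstream consumer is indicated only as a comment because its definition depends on your own component.

from kfp import dsl
from google_cloud_pipeline_components.v1.bigquery import BigqueryQueryJobOp

@dsl.pipeline(name="bq-query-to-next-step")
def bq_pipeline(project: str, location: str = "US"):
    # Runs the SQL and surfaces the result table as a typed output artifact
    query_task = BigqueryQueryJobOp(
        project=project,
        location=location,
        query="SELECT customer_id, total_spend FROM `example_dataset.orders`",
    )
    # A downstream component declares a table input and receives the artifact directly,
    # for example: next_step(source_table=query_task.outputs["destination_table"])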
Orchestrate the query with Cloud Workflows and call it from the pipeline is not the simplest path because it introduces an external orchestrator outside KFP which adds configuration and handoff complexity and brings no benefit for a single query that needs to feed the next step.
Invoke a Cloud Function that uses the BigQuery API and return the result to the pipeline requires building and deploying code and handling result serialization and HTTP integration which is more complex than using the native component designed for this scenario.
When a question asks for the simplest way inside KFP, favor a built in component from the registry because it minimizes custom code and makes it easy to pass outputs directly to the next step.
Question 5
At Crestline Analytics you are training a deep neural network on a tabular dataset for customer churn. Several columns have very different value ranges where some counters reach 300000 while some ratios lie between 0 and 1. During training the gradients behave poorly and the optimizer does not converge to a good solution. What should you change in data preparation to help the model converge?
-
✓ C. Scale all input features to comparable ranges using normalization or standardization
The correct option is Scale all input features to comparable ranges using normalization or standardization. This aligns the magnitude of features so that gradient based optimizers make balanced updates and the network converges more reliably.
When raw features span orders of magnitude the loss surface becomes ill conditioned and gradients can explode or vanish. Normalizing to zero mean and unit variance or scaling to a bounded range such as 0 to 1 improves numerical stability and lets a single learning rate work across all inputs. This preprocessing is especially important for deep models trained on tabular data where counters and ratios coexist.
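A minimal Keras sketch of this preprocessing follows. The feature values and the small model head are placeholders; the key point is that the Normalization layer learns per-feature statistics and applies the same scaling inside the model at training and serving.

import numpy as np
import tensorflow as tf

# Example tabular batch: a large-range counter next to a 0-1 ratio
raw_features = np.array([[250000.0, 0.12],
                         [  1200.0, 0.87],
                         [300000.0, 0.45]], dtype="float32")
labels = np.array([0.0, 1.0, 0.0], dtype="float32")

# Learn per-feature mean and variance, then standardize inputs inside the model
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(raw_features)

model = tf.keras.Sequential([
    normalizer,                                   # inputs arrive on comparable scales
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(raw_features, labels, epochs=2, verbose=0)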
The option Build feature crosses to combine the strongest predictors focuses on adding interaction terms rather than fixing the scale mismatch. Crosses can increase model capacity yet they do not address the unstable gradients caused by disparate feature ranges and they can even make optimization harder.
The option Use Vertex AI Vizier to tune the optimizer and the learning rate is hyperparameter tuning rather than data preparation. Tuning may help a little but if inputs are on very different scales the optimizer still struggles and the better first step is to rescale the features.
The option Drop columns that contain missing values to simplify the dataset removes potentially important signal and does not solve the convergence issue. A more appropriate approach is to impute or encode missing values while keeping features and ensuring they are scaled.
When you see features with very different ranges and unstable training think of normalization or standardization first before changing models or tuning hyperparameters.
Question 6
Over the next 30 days, your model will process about five million online predictions per day. How should you configure Vertex AI Model Monitoring to detect input drift while minimizing cost?
-
✓ B. Monitor features and attributions and sample predictions near 0
The correct choice is Monitor features and attributions and sample predictions near 0.
This configuration balances detection quality with cost for very high traffic. Monitoring both feature distributions and feature attributions lets you identify shifts in the input data and also shifts in what the model relies on, which strengthens input drift detection over a month. Using a very low traffic sampling rate logs only a small fraction of requests, which keeps storage and explainability computation costs low while still accumulating enough samples across 30 days to surface meaningful drift.
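A hedged sketch with the Vertex AI Python SDK is shown below. The endpoint ID, alert email, drift thresholds, and the one percent sampling rate are illustrative assumptions rather than recommended values.

from google.cloud import aiplatform
from google.cloud.aiplatform import model_monitoring

aiplatform.init(project="example-project", location="us-central1")

job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="drift-monitoring",
    endpoint="projects/example-project/locations/us-central1/endpoints/1234567890",
    # Sample a small fraction of the five million daily requests to keep logging costs low
    logging_sampling_strategy=model_monitoring.RandomSampleConfig(sample_rate=0.01),
    schedule_config=model_monitoring.ScheduleConfig(monitor_interval=24),  # hours
    alert_config=model_monitoring.EmailAlertConfig(user_emails=["ml-team@example.com"]),
    objective_configs=model_monitoring.ObjectiveConfig(
        drift_detection_config=model_monitoring.DriftDetectionConfig(
            drift_thresholds={"feature_a": 0.03}),
        explanation_config=model_monitoring.ExplanationConfig(),  # attribution drift
    ),
)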
Monitor only features and increase monitoring frequency is not a good fit because raising the frequency increases cost and monitoring only features misses attribution drift that can reveal changes in feature importance, which weakens detection and diagnosis.
Monitor features only and sample predictions near 1 is not cost efficient because sampling near one logs almost every request, which is very expensive at five million predictions per day, and it also omits attribution monitoring which reduces visibility into the nature of the drift.
When cost matters, tune the traffic sampling rate first and choose the lowest value that still yields enough samples over the monitoring window. Combine feature distribution checks with feature attribution monitoring when the question asks about input drift and diagnostic insight.
Question 7
A startup named RiverSky is designing a Vertex AI Pipelines workflow to predict sentiment from long-form customer support chat transcripts. The team needs full control over the training code and hyperparameter choices, and after training they plan to publish the model to an endpoint for real-time predictions. Which pipeline components should be selected?
-
✓ C. TextDatasetCreateOp paired with CustomTrainingJobOp and ModelDeployOp
The correct choice is TextDatasetCreateOp paired with CustomTrainingJobOp and ModelDeployOp. This combination matches a text use case, gives full control over training code and hyperparameters, and supports publishing the trained model for real-time predictions.
TextDatasetCreateOp is designed for natural language datasets, which fits long-form chat transcripts. CustomTrainingJobOp lets you bring your own training code, libraries, and explicit hyperparameter settings, so it satisfies the requirement for full control. ModelDeployOp is the step that makes the trained model available for online predictions by deploying it to an endpoint.
TabularDatasetCreateOp, CustomTrainingJobOp, and EndpointCreateOp is not suitable because it prepares tabular data rather than text and EndpointCreateOp only creates an endpoint resource and does not deploy a model to it, so you would still be missing the deployment step needed for real-time inference.
TextDatasetCreateOp, AutoMLTextTrainingOp, and EndpointCreateOp does not meet the requirement for full control since AutoML abstracts away model code and hyperparameter choices, and creating an endpoint alone does not deploy the model for serving.
TextDatasetCreateOp, CustomTrainingJobOp, and ModelBatchPredictOp targets offline predictions to batch outputs instead of serving low-latency online predictions, so it does not fulfill the plan to publish the model to an endpoint for real time use.
Map each pipeline step to the requirement. Choose the dataset component by data modality, the training component by the level of training control needed, and the prediction component by online versus batch serving.
Question 8
Which method should you use in BigQuery to discover unlabeled customer segments and automatically determine the number of clusters?
-
✓ C. BigQuery ML k-means with auto k
The correct option is BigQuery ML k-means with auto k.
This approach runs inside BigQuery so there is no need to move data, and it is designed for unsupervised clustering to discover customer segments without labels. It can automatically select an appropriate number of clusters based on internal evaluation, which satisfies the requirement to choose k for you while working directly in BigQuery.
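A minimal sketch of this approach through the BigQuery Python client follows. The dataset, table, and feature columns are placeholders, and NUM_CLUSTERS is deliberately left out so BigQuery ML chooses the number of clusters as described above.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# NUM_CLUSTERS is intentionally omitted so BigQuery ML selects k automatically
client.query("""
CREATE OR REPLACE MODEL `example_dataset.customer_segments`
OPTIONS (model_type = 'KMEANS', standardize_features = TRUE) AS
SELECT recency, frequency, monetary
FROM `example_dataset.customer_features`
""").result()

# Assign each customer row to one of the discovered segments
rows = client.query("""
SELECT centroid_id, recency, frequency, monetary
FROM ML.PREDICT(MODEL `example_dataset.customer_segments`,
                TABLE `example_dataset.customer_features`)
""").result()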
AutoML Tables with labeled data targets supervised learning and requires labeled training examples, so it does not discover unlabeled segments. The standalone AutoML Tables service has also been folded into Vertex AI, which means it is less likely to appear on newer exams in this form.
Dataproc Spark MLlib k-means with elbow would require moving data to a Dataproc cluster and the elbow method relies on manual inspection to pick the number of clusters, so it neither runs directly in BigQuery nor chooses k automatically.
BigQuery ML PCA performs dimensionality reduction and feature extraction rather than clustering, so it cannot form customer segments or determine the number of clusters.
Match keywords to capabilities. When you see unlabeled data with a need to find segments automatically and you must stay in BigQuery, choose BigQuery ML k-means with auto k. Mentions of labeled data usually point to supervised options instead.
Question 9
You work for a consumer electronics startup that is training a TensorFlow and Keras model to detect soldering defects on printed circuit boards, and you intend to apply random translation, cropping, and contrast changes to every mini-batch of 48 images during training. You want the data input path to be highly efficient at runtime and to make good use of compute while avoiding unnecessary storage and I/O overhead. What should you do?
-
✓ C. Embed the random augmentation ops in the tf.data pipeline using dataset.map with TensorFlow ops and prefetch so transforms run on the fly per batch
The correct option is Embed the random augmentation ops in the tf.data pipeline using dataset.map with TensorFlow ops and prefetch so transforms run on the fly per batch.
This approach performs augmentations at training time so each mini batch can receive new random translations, crops, and contrast changes without expanding the dataset on disk. Using a tf.data pipeline with map and prefetch lets you overlap CPU preprocessing with accelerator execution and reduces input bottlenecks. You can use parallel calls with autotuning and prefetch to keep the device fed while avoiding unnecessary I/O and storage growth.
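A minimal tf.data sketch of this pattern follows. The TFRecord path, image sizes, and augmentation parameters are placeholders; the point is that map plus prefetch applies fresh random transforms per batch while keeping the accelerator busy.

import tensorflow as tf

BATCH_SIZE = 48

def parse_example(serialized):
    # Placeholder decode step for an image and label TFRecord schema
    features = tf.io.parse_single_example(serialized, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return image, features["label"]

def augment(image, label):
    # Random crop acts as a translation, followed by contrast jitter, on every step
    image = tf.image.resize_with_crop_or_pad(image, 268, 268)
    image = tf.image.random_crop(image, size=[256, 256, 3])
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, label

dataset = (tf.data.TFRecordDataset("gs://example-bucket/pcb/train.tfrecord")
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(BATCH_SIZE)
           .prefetch(tf.data.AUTOTUNE))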
Implement the augmentations in Keras data generators that feed the model during fit is less efficient for high throughput training because Python based generators often become a bottleneck. They do not integrate as tightly with the TensorFlow execution pipeline and typically cannot overlap preprocessing with device execution as effectively as a tf.data pipeline.
Use Apache Beam on Dataflow to precompute many augmented variants and persist them as TFRecords in Cloud Storage creates a much larger dataset and increases storage costs and read bandwidth. It also fixes the set of augmentations before training which reduces variability per epoch and adds latency to the workflow.
Run a Dataflow job for each training run that performs random augmentations and stages the results as TFRecords for the trainer repeats heavy preprocessing for every run which adds time and cost. It replaces efficient online augmentation with additional I/O and staging overhead that the question explicitly asks to avoid.
When a question emphasizes runtime efficiency and avoiding extra I/O or storage, prefer tf.data with map, parallelism, and prefetch so augmentations run on the fly during training.
Question 10
In Vertex AI you are tuning two hyperparameters with Bayesian optimization, an integer embedding dimension from 32 to 96 and a learning rate from 1e-06 to 2e-02. You want to maximize validation accuracy and training time is not a constraint. How should you set the scaling type for each hyperparameter and what should you choose for maxParallelTrials?
-
✓ B. Set UNIT_LINEAR_SCALE for embedding, UNIT_LOG_SCALE for learning rate, and keep maxParallelTrials low
The correct option is Set UNIT_LINEAR_SCALE for embedding, UNIT_LOG_SCALE for learning rate, and keep maxParallelTrials low.
The embedding dimension is an integer that ranges from 32 to 96 and it varies over a narrow linear range. A linear scale is appropriate because each step change has similar meaning across the range and there is no need to bias the search toward orders of magnitude. The learning rate spans many orders of magnitude from 1e-06 to 2e-02 and a log scale is best because it lets Bayesian optimization explore small values and large values in a balanced way in log space.
Bayesian optimization benefits from learning from completed trials and when training time is not a constraint you should keep maxParallelTrials low. Fewer concurrent trials allow the optimizer to incorporate results from earlier trials which generally leads to better final accuracy.
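A hedged sketch of the corresponding study configuration with the Vertex AI SDK follows. The container image, machine type, metric name, and trial counts are placeholders, and the metric name must match what the training code reports.

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="example-project", location="us-central1")

custom_job = aiplatform.CustomJob(
    display_name="tuning-worker",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/example-project/trainer:latest"},
    }],
)

tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="bayes-tuning",
    custom_job=custom_job,
    metric_spec={"validation_accuracy": "maximize"},
    parameter_spec={
        # Narrow integer range, so linear scaling
        "embedding_dim": hpt.IntegerParameterSpec(min=32, max=96, scale="linear"),
        # Spans several orders of magnitude, so log scaling
        "learning_rate": hpt.DoubleParameterSpec(min=1e-6, max=2e-2, scale="log"),
    },
    max_trial_count=48,
    parallel_trial_count=2,  # kept low so Bayesian optimization can learn from completed trials
)
tuning_job.run()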
Set UNIT_LOG_SCALE for both hyperparameters and use many parallel trials is incorrect because the embedding dimension should not be log scaled over such a small integer range and running many trials in parallel reduces the ability of Bayesian optimization to learn from prior results.
Set UNIT_LINEAR_SCALE for both hyperparameters and keep maxParallelTrials low is incorrect because the learning rate should be on a log scale due to its wide range across several orders of magnitude.
When a parameter spans orders of magnitude choose log scale. For discrete or narrow numeric ranges choose linear. For Bayesian optimization keep parallel trials low unless speed is the priority.
Question 11
You are an ML engineer at a wind farm operator working on a predictive maintenance effort. Your task is to build a binary classifier that predicts whether a turbine will fail within the next four days so that technicians can act early. Scheduled servicing is inexpensive yet unexpected breakdowns are very costly. You trained several binary classifiers that output 1 when a failure is predicted. While evaluating on a separate validation set you want to emphasize catching as many true failures as possible and you must also ensure that more than half of the maintenance work orders triggered by the model are truly associated with imminent failures. Which model should you select?
-
✓ C. The model with the highest recall while precision exceeds 0.5
The correct choice is The model with the highest recall while precision exceeds 0.5.
This scenario values catching as many true failures as possible because missed failures are costly. That means you should maximize recall. At the same time you must ensure that more than half of the triggered work orders are true failures which means precision must be strictly greater than 0.5. Selecting the model that delivers the highest recall while maintaining precision above 0.5 directly aligns with these goals and reflects the appropriate operating point on the precision recall tradeoff curve.
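The selection rule can be expressed in a few lines of Python. The metric values below are made up solely to illustrate filtering on the precision constraint and then maximizing recall.

# Hypothetical validation metrics for the candidate models
candidates = {
    "model_a": {"precision": 0.81, "recall": 0.62},
    "model_b": {"precision": 0.55, "recall": 0.89},
    "model_c": {"precision": 0.48, "recall": 0.95},  # fails the precision constraint
}

eligible = {name: m for name, m in candidates.items() if m["precision"] > 0.5}
best = max(eligible, key=lambda name: eligible[name]["recall"])
print(best)  # model_b: highest recall among models with precision above 0.5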
The model with the highest precision where recall is at least 0.5 is incorrect because it optimizes the wrong objective. Maximizing precision tends to lower recall, which conflicts with the need to catch as many true failures as possible. It also imposes a recall floor rather than enforcing the required precision greater than 0.5.
The model with the lowest log loss and recall at least 0.5 is incorrect because log loss measures probabilistic calibration and overall error, not the desired balance between recall and precision at a specific threshold. It also places a constraint on recall rather than ensuring precision is greater than 0.5.
The model with the highest AUC ROC and precision above 0.5 is incorrect because AUC ROC summarizes ranking quality across all thresholds and does not guarantee high recall at the chosen operating threshold. Although it meets the precision requirement, it does not ensure recall is maximized.
Translate business goals into metric priorities and constraints. If the question says to catch as many positives as possible then emphasize recall. If it also requires that more than half of alerts are correct then enforce a precision greater than 0.5. Optimize the prioritized metric while holding the constraint.
Question 12
How can you enable high fidelity per prediction feature attributions for an existing custom classifier that is deployed for online serving on Vertex AI while making minimal code changes?
-
✓ C. Register the model in Vertex AI and enable sampled Shapley attributions with baselines
The correct option is Register the model in Vertex AI and enable sampled Shapley attributions with baselines. This lets you keep the existing custom model and turn on Vertex AI explanations for online predictions with only configuration changes. Sampled Shapley offers high fidelity per feature attributions when you provide suitable baselines and you can enable it during model registration and deployment so your serving code stays the same.
For a custom model already served online, you add an explanation specification that selects the Sampled Shapley method and defines meaningful baselines for each feature. You then deploy the model with explanations enabled so Vertex AI computes and returns per prediction attributions in the response. This satisfies the need for high fidelity feature attributions while minimizing code changes because the platform performs the attribution calculations.
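A hedged sketch of that registration with the Vertex AI SDK follows. The container image, input and output names, baseline values, and path count are placeholders that must be adjusted to match your model.

from google.cloud import aiplatform
from google.cloud.aiplatform.explain import ExplanationMetadata, ExplanationParameters

aiplatform.init(project="example-project", location="us-central1")

explanation_metadata = ExplanationMetadata(
    inputs={"features": ExplanationMetadata.InputMetadata(
        # Baseline feature values the attributions are measured against
        input_baselines=[[0.0, 0.0, 0.0, 0.0]])},
    outputs={"scores": ExplanationMetadata.OutputMetadata()},
)
explanation_parameters = ExplanationParameters(
    {"sampled_shapley_attribution": {"path_count": 25}})

model = aiplatform.Model.upload(
    display_name="churn-classifier-explained",
    serving_container_image_uri="gcr.io/example-project/churn-serving:latest",
    explanation_metadata=explanation_metadata,
    explanation_parameters=explanation_parameters,
)
endpoint = model.deploy(machine_type="n1-standard-4")
# endpoint.explain(instances=[...]) now returns per-feature attributions with each prediction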
The option Train a new AutoML Tabular model and use built in explanations is incorrect because it requires retraining and moving away from your existing custom classifier. This is not a minimal code change and it alters your model stack.
The option Use Vertex AI Batch Predictions with explanations and reuse results for online serving is incorrect because batch predictions are offline. Their attributions cannot be reused for real time requests where inputs differ and low latency explanations are needed.
The option Modify the custom prediction container to compute and return sampled Shapley attributions is incorrect because it forces significant code changes and operational complexity. Vertex AI already provides built in explainability for custom models so computing Shapley inside the container is unnecessary and can increase latency.
When you see a requirement for minimal code changes with online predictions and feature attributions, prefer enabling Vertex AI explanations on the existing model. Choose the method that matches the data type such as Sampled Shapley for tabular and supply proper baselines.
Question 13
The loyalty analytics group at HelioMart wants to send scheduled outreach every three weeks to customers predicted to exceed a dynamic spend threshold. This is the team’s first machine learning initiative and you have been tasked with operationalizing it. You created a fresh Google Cloud project and used Vertex AI Workbench to build an initial XGBoost model from purchase transactions stored in Cloud Storage. You want an end to end process that automatically produces predictions for the team in a secure way while keeping costs down and minimizing ongoing code maintenance. What should you build to meet these goals?
-
✓ C. Create a scheduled pipeline on Vertex AI Pipelines that reads transactional data from Cloud Storage, uses Vertex AI for training and batch prediction, and writes a file to a Cloud Storage bucket that lists customer emails with projected spend
The correct option is Create a scheduled pipeline on Vertex AI Pipelines that reads transactional data from Cloud Storage, uses Vertex AI for training and batch prediction, and writes a file to a Cloud Storage bucket that lists customer emails with projected spend.
This approach automates the entire workflow with a managed pipeline that can be scheduled to run every three weeks and it cleanly separates steps for data ingestion, training, and batch prediction. It keeps infrastructure costs down because there is no always on orchestration environment and it leverages managed training and batch prediction so you pay primarily for the jobs you run. It also reduces maintenance because pipeline components are versioned and repeatable which simplifies retraining and reprocessing while minimizing custom glue code.
Security is stronger because data stays in Cloud Storage and results are written to a restricted bucket where IAM controls access instead of being distributed over email. The team can apply the dynamic spend threshold as a post processing step in the pipeline so only qualifying customers are included in the output file.
Create a scheduled workflow in Cloud Composer that loads data from Cloud Storage into BigQuery, trains and runs batch predictions with BigQuery ML, and stores results in a BigQuery table named crm_ops.predicted_spend is not ideal because it introduces an always running Composer environment that adds operational overhead and cost. It also shifts the solution to BigQuery ML which requires loading and managing data in BigQuery and diverges from the existing workflow that already uses managed training and batch prediction on Vertex which increases migration effort and maintenance.
Schedule a Vertex AI Workbench notebook that pulls data from Cloud Storage, performs both training and batch inference on the notebook instance, and writes a CSV with emails and expected spend back to Cloud Storage is not a production ready pattern. Notebooks are great for development but they are fragile as schedulers and require you to manage a VM, runtime environments, credentials, and reliability which increases cost and operational risk compared to a managed pipeline.
Build a Cloud Composer workflow that reads Cloud Storage data, invokes Vertex AI for training and prediction, and emails the loyalty team’s Google Group a password protected attachment containing customer emails and predicted spend adds orchestration overhead and proposes distributing sensitive outputs by email which is less secure and harder to audit. Storing results in a controlled bucket or table and sharing via IAM is the recommended pattern for security and compliance.
When a question asks for low cost and minimal maintenance, favor managed, serverless orchestration with built in scheduling and use batch prediction rather than running long lived environments or notebooks. Prefer storing results in controlled services and sharing via IAM instead of emailing files.
Question 14
You need to set up a workflow that runs Dataflow preprocessing and performs batch predictions for a TensorFlow model every 30 days, then writes the results to BigQuery. Which approach should you choose?
-
✓ C. Use Vertex AI Pipelines with DataflowPythonJobOp then ModelBatchPredictOp and load results to BigQuery
The correct option is Use Vertex AI Pipelines with DataflowPythonJobOp then ModelBatchPredictOp and load results to BigQuery.
This pipeline approach is built to orchestrate multi step ML workflows. You can trigger a Dataflow job for scalable preprocessing, then run a Vertex AI batch prediction job on the TensorFlow model, and finally load the prediction outputs into BigQuery. It also supports recurring schedules, so running the workflow every 30 days is straightforward and reliable.
Import the model into BigQuery and use BigQuery ML with SQL transformations is not appropriate because the requirement explicitly calls for Dataflow preprocessing and orchestration of a batch prediction workflow. BigQuery ML does not manage a Dataflow job and it centers on SQL based modeling or imported model prediction inside BigQuery rather than coordinating an external preprocessing pipeline and a Vertex AI batch prediction step.
Deploy to a Vertex AI endpoint and send requests from Dataflow targets online prediction and is optimized for low latency inference. The need here is scheduled batch predictions every 30 days, which is better served by batch prediction jobs that scale efficiently on large datasets and do not require an always on endpoint.
Confirm whether the use case is online or batch and then choose the tool that matches. For recurring workflows think orchestration with pipelines and native batch prediction rather than calling endpoints from a data pipeline.
Question 15
A nationwide retail bank has archived thousands of customer support call recordings on a local file server. The clips are in WAV format and average about 7 minutes each. You need to generate transcripts and analyze customer sentiment at scale, and you plan to use the Speech-to-Text API. You want the most efficient workflow with minimal operational overhead. What should you do?
-
✓ C. Upload the audio files to Cloud Storage, use the long running recognize method to create transcripts, and trigger a Cloud Function that calls the Natural Language API analyzeSentiment method
The correct option is Upload the audio files to Cloud Storage, use the long running recognize method to create transcripts, and trigger a Cloud Function that calls the Natural Language API analyzeSentiment method.
This approach fits seven minute audio because asynchronous Speech to Text is designed for longer clips and it scales well when the audio is stored in Cloud Storage. Using a Cloud Function to run sentiment analysis provides an event driven pipeline with minimal operational work since you avoid managing servers and can process transcripts as soon as they are produced. The Natural Language API offers a managed sentiment model which removes the need to build and maintain a custom model.
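A condensed sketch of the two API calls follows. The bucket path, audio encoding, and sample rate are assumptions, and in the described workflow the sentiment call would run inside the Cloud Function triggered after transcription.

from google.cloud import speech, language_v1

speech_client = speech.SpeechClient()
operation = speech_client.long_running_recognize(
    config=speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code="en-US",
    ),
    audio=speech.RecognitionAudio(uri="gs://example-bucket/calls/call-0001.wav"),
)
transcript = " ".join(result.alternatives[0].transcript
                      for result in operation.result(timeout=900).results)

language_client = language_v1.LanguageServiceClient()
sentiment = language_client.analyze_sentiment(
    document=language_v1.Document(
        content=transcript, type_=language_v1.Document.Type.PLAIN_TEXT)
).document_sentiment
print(sentiment.score, sentiment.magnitude)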
Iterate over the local audio files in Python, build RecognitionAudio objects from the file bytes, call the long running recognize method to produce transcripts, and then call the Natural Language API analyzeSentiment method is less efficient because streaming file bytes from a local server does not scale easily and increases operational overhead compared with placing the audio in Cloud Storage and using an event driven workflow.
Upload the audio files to Cloud Storage, call the long running recognize method to generate transcripts, and then call the predict method of an AutoML sentiment model to score the transcripts adds unnecessary complexity because AutoML sentiment requires dataset preparation and model training and management. The managed Natural Language API sentiment is sufficient for generic customer sentiment and better matches the goal of minimal overhead.
Iterate over the local audio files in Python, construct RecognitionAudio objects from the file bytes, call the synchronous recognize method to obtain transcripts, and then send the text to an AutoML sentiment model is not viable for seven minute clips because synchronous recognition is intended for short audio and it also combines the scaling drawbacks of local processing with the extra burden of AutoML.
Map question cues to services. If audio is longer than a minute use asynchronous Speech to Text with Cloud Storage and event driven processing. For generic sentiment prefer the Natural Language API to reduce operational overhead.
Question 16
Which Google Cloud service allows you to train and evaluate multiple models directly on BigQuery tables within 24 hours, enabling algorithm comparisons without moving data?
-
✓ B. BigQuery ML
The correct option is BigQuery ML.
BigQuery ML lets you train and evaluate models with simple SQL directly on your BigQuery tables. You can iterate on multiple algorithms and compute metrics using built in functions without exporting data which makes rapid comparison within a day practical.
Vertex AI AutoML Tabular can read from BigQuery but it imports data into a managed dataset and hides algorithm selection, so it is not intended for side by side SQL driven comparisons directly on your tables.
Vertex AI Training runs custom training jobs in containers and typically relies on data in Cloud Storage, so it does not train or evaluate directly on BigQuery tables and would involve extra data movement and code.
When a question highlights training with SQL directly on BigQuery tables and avoiding data movement, think of BigQuery ML. If the scenario sounds like custom code in containers or fully automated AutoML, it is likely a Vertex AI service instead.
Question 17
AuroraCast runs a podcast streaming platform with a custom recommendation model that ranks the next episode from a listener’s history. The model is served on a Vertex AI endpoint and was recently retrained with newer data that performed well in offline evaluation. You want to validate the new model with live users while keeping operational effort minimal. What should you do?
-
✓ B. Deploy the new model to the existing Vertex AI endpoint and configure traffic splitting so that 10% of live requests go to the new deployment while you monitor session length and completion rate, then gradually raise the split if results improve
The correct option is Deploy the new model to the existing Vertex AI endpoint and configure traffic splitting so that 10% of live requests go to the new deployment while you monitor session length and completion rate, then gradually raise the split if results improve.
This choice uses Vertex AI endpoints that support multiple model deployments behind a single endpoint with configurable traffic splits. You can start with a small percentage to the new deployment and observe online business metrics from real users. This keeps operational effort low because you avoid additional infrastructure and can adjust or roll back the split quickly with a simple configuration change. Using traffic splitting on the existing endpoint is the intended and managed way to run a canary or A/B test for online predictions in Vertex AI.
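A hedged SDK sketch of this rollout follows. The endpoint and model resource names are placeholders, and the later traffic adjustment is indicated only as a comment.

from google.cloud import aiplatform

aiplatform.init(project="example-project", location="us-central1")

endpoint = aiplatform.Endpoint(
    "projects/example-project/locations/us-central1/endpoints/1234567890")
new_model = aiplatform.Model(
    "projects/example-project/locations/us-central1/models/9876543210")

# Add the retrained model next to the current one and send 10% of live traffic to it
endpoint.deploy(
    model=new_model,
    machine_type="n1-standard-4",
    traffic_percentage=10,  # the remaining 90% stays on the existing deployment
)

# Once session length and completion rate look good, shift more traffic, for example:
# endpoint.update(traffic_split={"<existing-deployed-model-id>": 50, "<new-deployed-model-id>": 50})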
Create a separate Vertex AI endpoint for the updated model and implement a lightweight router that randomly sends 15% of production requests to it, then grow that share only if business metrics like average minutes played per session improve is unnecessary operationally because Vertex AI already provides managed traffic splitting on a single endpoint. Spinning up a second endpoint and building a custom router adds complexity in routing, observability, and autoscaling coordination without providing benefits over the built-in capability.
Log production prediction payloads in BigQuery and run a Vertex AI Experiments study with batch predictions from both models, then promote the new model only after offline metrics look better than the baseline relies on offline evaluation rather than live user validation. Experiments help track training runs and evaluations but they do not route live traffic. This approach does not meet the requirement to validate with live users and it increases effort by exporting logs and running additional batch jobs.
Enable model monitoring on the current endpoint to watch for prediction drift and immediately replace the deployed model with the retrained version, then roll back if alerts fire is risky and does not satisfy gradual validation. Model monitoring is designed to detect data skew and drift over time, not to run controlled online experiments or measure business outcomes like session length. Immediate replacement increases blast radius and rollback is reactive rather than controlled.
When you see a requirement to test a new model with live users and minimal effort, look for Vertex AI endpoint traffic splitting. Prefer a gradual rollout on the same endpoint over custom routers or offline comparisons.
Question 18
Given a TensorFlow model that predicts whether a user will spend more than $20 in the next 30 days and data stored in BigQuery, what is the most cost effective and low overhead way to deploy the model and run predictions?
-
✓ B. BigQuery ML import SavedModel then scheduled batch predictions to Cloud SQL
The correct option is BigQuery ML import SavedModel then scheduled batch predictions to Cloud SQL.
This approach lets you import a TensorFlow SavedModel into BigQuery ML so you can run predictions directly where the data already lives. You avoid standing up serving infrastructure and you pay only for queries, which makes it cost effective and low overhead. You can schedule queries to run batch predictions on a regular cadence that matches the 30 day horizon, and then persist the results for use in Cloud SQL with a simple scheduled pipeline.
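A minimal sketch of the two statements follows, run here through the BigQuery Python client. The bucket path, dataset, and table names are placeholders, and the second statement is what a scheduled query would execute on the 30 day cadence before results are synced to Cloud SQL.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# One-time import of the TensorFlow SavedModel into BigQuery ML
client.query("""
CREATE OR REPLACE MODEL `example_dataset.spend_classifier`
OPTIONS (model_type = 'TENSORFLOW',
         model_path = 'gs://example-bucket/models/spend/*')
""").result()

# Body of the scheduled query that writes batch predictions each cycle
client.query("""
CREATE OR REPLACE TABLE `example_dataset.spend_predictions` AS
SELECT *
FROM ML.PREDICT(MODEL `example_dataset.spend_classifier`,
                TABLE `example_dataset.user_features`)
""").result()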
Dataflow streaming with Pub/Sub and results in Cloud SQL is not appropriate because the use case is naturally batch oriented and does not require real time inference. Streaming jobs keep workers running which raises ongoing costs and operational burden without adding value for a monthly prediction window.
TensorFlow Serving on GKE with per request BigQuery lookups adds significant operational complexity and cost to manage clusters and serving pods. Per request queries to BigQuery increase latency and can be expensive and inefficient when the requirement is periodic batch scoring rather than online inference.
When the model input data is already in BigQuery and the use case is batch, favor serverless options like BigQuery ML with scheduled queries. Watch for keywords like cost effective and low overhead to avoid managed clusters or always on streaming jobs.
Question 19
You are a data scientist at example.com which operates a regional ecommerce marketplace with around 900 short lifecycle SKUs, and the company has four years of historical sales stored in BigQuery; leadership wants monthly sales forecasts for every SKU and prefers the fastest path with minimal setup and maintenance; which solution should you choose?
-
✓ B. Train ARIMA_PLUS forecasting models in BigQuery ML for each SKU
The correct option is Train ARIMA_PLUS forecasting models in BigQuery ML for each SKU.
This approach uses the data where it already lives in BigQuery and lets you build time series forecasts with simple SQL. It supports monthly granularity and automatically handles trends and seasonality which is common in retail demand. It is the fastest path because you avoid data movement and infrastructure management and you can schedule training and prediction with SQL. It also minimizes maintenance because model selection and tuning are automated and retraining can be done with a scheduled query.
You can scale to hundreds of SKUs by training separate models in a single statement using the TIME_SERIES_ID_COL option which is well suited to your roughly 900 short lifecycle products. This keeps operations lightweight while still producing per SKU forecasts that leadership requested.
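A minimal sketch of that single statement follows, with placeholder table and column names.

from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# One statement trains a separate ARIMA_PLUS model per SKU
client.query("""
CREATE OR REPLACE MODEL `example_dataset.sku_forecasts`
OPTIONS (model_type = 'ARIMA_PLUS',
         time_series_timestamp_col = 'month',
         time_series_data_col = 'monthly_sales',
         time_series_id_col = 'sku_id') AS
SELECT month, monthly_sales, sku_id
FROM `example_dataset.sales_history`
""").result()

# Produce forecasts for every SKU for the next 12 months
rows = client.query("""
SELECT sku_id, forecast_timestamp, forecast_value
FROM ML.FORECAST(MODEL `example_dataset.sku_forecasts`,
                 STRUCT(12 AS horizon))
""").result()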
Use Vertex AI Forecast to build a managed forecasting model is more complex to set up for this use case because it requires creating datasets and training jobs and potentially deploying endpoints. It can be powerful for complex multivariate setups yet it is not the quickest or lowest maintenance option when your data and requirements are already well served by BigQuery ML.
Build a custom model with TensorFlow on Vertex AI Training demands substantial code, feature engineering, tuning, and lifecycle management. This adds overhead and maintenance that conflicts with the requirement for the fastest path with minimal setup.
Use BigQuery ML with XGBoost regression to predict monthly sales is not time series aware by default and would require manual feature engineering to capture seasonality and autocorrelation. This increases effort and complexity, whereas ARIMA based forecasting in BigQuery ML is designed for this task and is simpler to operate.
When forecasts are needed quickly and the data already sits in BigQuery, prefer BigQuery ML ARIMA_PLUS with TIME_SERIES_ID_COL for many related time series. Move to Vertex AI or custom modeling only when you need complex multivariate features or bespoke architectures.
Question 20
How should you handle a categorical feature such as RegionTier with approximately 25 percent missing values to preserve predictive signal and ensure consistency between training and serving?
-
✓ C. Add “missing” category plus a binary missing indicator
The correct option is Add “missing” category plus a binary missing indicator.
This approach preserves any predictive signal carried by the absence of a value because the model can learn from both the explicit missing bucket and the separate indicator that the original value was absent. It also keeps training and serving consistent because your preprocessing will reliably map nulls or blanks to a known category while setting the indicator feature in exactly the same way for both phases. This reduces the risk of training serving skew and prevents missing values from being conflated with a real category.
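A small pandas sketch of this encoding follows, reusing the RegionTier column name from the question. Applying the same function at training and serving keeps the transformation consistent.

import pandas as pd

def add_region_tier_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Flag rows where the value was absent, then make "missing" its own category
    out["region_tier_missing"] = out["RegionTier"].isna().astype(int)
    out["RegionTier"] = out["RegionTier"].fillna("missing")
    return out

train = add_region_tier_features(
    pd.DataFrame({"RegionTier": ["gold", None, "silver", None]}))
print(train)
# Reuse the same function on serving requests so the encoding never drifts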
Fill missing values with the most frequent category is not appropriate because it hides the fact that the value was missing and can bias the distribution toward the dominant class. If missingness is informative then this strategy erases that signal and can yield poorer predictions, and it can also cause inconsistencies if the fill value is computed on training data and not handled identically at serving.
Hash bucket encode without a missing flag is not ideal because hashing will map missing values into a bucket that is indistinguishable from real categories and collisions can blend distinct categories together. Without an explicit missing indicator the model cannot tell whether a value was truly absent or simply hashed to that bucket, which loses signal and can create ambiguity at serving time.
When you see many missing values in a categorical feature, look for answers that preserve missingness as signal and that keep preprocessing identical at training and serving to avoid skew.