DP-100 Certified Data Scientist Practice Exams
All Azure Questions are from my DP-100 Udemy Course and certificationexams.pro
DP-100 Azure Data Scientist Certification Exam Topics
If you want to earn the DP-100 Microsoft Certified Azure Data Scientist Associate certification, you need to do more than just study. You need to practice by completing DP-100 practice exams, reviewing Azure data science sample questions, and spending time with a reliable DP-100 certification exam simulator.
In this quick DP-100 practice test, we will help you get started by providing a carefully written set of DP-100 exam questions and answers. These questions mirror the tone and difficulty of the actual DP-100 exam, giving you a clear sense of how prepared you are.
DP-100 Practice Questions
Study thoroughly, practice consistently, and gain hands-on experience with Azure Machine Learning, model training, experiment tracking, pipelines, and deployment workflows. With the right preparation, you will be ready to pass the DP-100 certification exam with confidence.
DP-100 Certification Exam Simulator Questions
Greylock Security Services is a private security contractor founded by Evan Rhodes, and the company runs its infrastructure on Microsoft Azure. You are consulting on a project that needs prepared datasets, and an intern asks why training data is required. Which statement best describes the role of training data in a machine learning project?
❏ A. A reserved portion of data used to evaluate the model after it is trained
❏ B. The entire dataset applied at once to train the model
❏ C. A subset of records used to train the model so it can learn patterns from examples
❏ D. A separate subset used to tune hyperparameters and guide model selection
Aerix Components is a mid sized electronics manufacturer based in Harbor Point San Diego and led by Lena Ortiz. Lena is training a regression model in a notebook to predict how many customers will visit Aerix Boutique stores per day. She wants to log a single metric with MLflow at the end of each training epoch to monitor model performance. Which metric should Lena record in MLflow?
❏ A. Mean absolute error
❏ B. Accuracy
❏ C. Root mean squared error
❏ D. Recall
A data engineering team at Meridian Insights plans to perform interactive data cleaning and exploration using a distributed Spark setup in an Azure Machine Learning workspace and they need to know which compute offerings can power their notebooks? (Choose 2)
❏ A. Notebook VM
❏ B. Google Cloud Dataproc
❏ C. Serverless Spark Compute
❏ D. Synapse Analytics Spark Pool
❏ E. AML Compute Cluster
A regional agritech company called GreenHarvest is using its cloud machine learning workspace to develop a conventional model that will predict which plots are most suitable for different seed varieties. Which machine learning framework is the best fit for this traditional classification task?
❏ A. PyTorch
❏ B. ONNX
❏ C. scikit-learn
❏ D. TensorFlow
Scenario: Bistro Solace, a high end Brooklyn restaurant, was founded by Anna Pierce and Marcus Lee who are improving their operations and have adopted Microsoft Azure for analytics. The team is building a training pipeline for a regression model using a dataset of many numeric features that exist on different ranges and Anna needs the numeric features scaled relative to each feature’s minimum and maximum values. Which module should Anna add to the pipeline to perform this min max scaling transformation?
❏ A. Cloud Dataflow
❏ B. Select Columns in Dataset
❏ C. Clean Missing Data
❏ D. Normalize Data
Aurora Forecasting is a predictive analytics startup led by Maya Patel and they are preparing an hourly time series dataset that spans about 18 months in Azure Machine Learning Studio and they need to split the records into training and testing sets using the Split Data module while preserving temporal order; which splitting mode should they select?
❏ A. Regular Expression Split
❏ B. Relative Expression Split mode
❏ C. Split Rows with Randomized option enabled
❏ D. Recommender Split
A data science team at NorthStar Analytics registered a tabular dataset named ‘model_train_set’ and assigned it to a variable for use by an estimator when running a training script. They want the script to have access to the dataset during the job run. Which estimator property should be configured to provide the training script with the dataset?
❏ A. data_reference = model_train_set
❏ B. script_params = {"--data": model_train_set}
❏ C. inputs = [model_train_set.as_named_input("model_train_set")]
❏ D. source_directory = model_train_set
Scenario The Velvet Room is an upscale nightclub in New Metro that also serves as a front for a small business owner. You are hired as a contractor to advise the IT team on machine learning workflows in Microsoft Azure. The team has a dataset with over 180 features. The lead developer plans to train a Two-Class Support Vector Machine binary classifier. The requirement is to compute feature importance with the Permutation Feature Importance module in Azure Machine Learning Designer. The developer lists these actions. a. Upload a dataset to the pipeline. b. Add a Split Data module to create training and test subsets. c. Add a Two-Class Support Vector Machine module to define the SVM estimator. d. Add a Train Model module to produce a trained model. e. Add a Permutation Feature Importance module and attach the trained model and test data. f. Set the performance metric to Classification Accuracy and run the experiment. What is the correct order of these steps?
❏ A. c then a then d then b then e then f
❏ B. b then c then a then d then e then f
❏ C. a then b then c then d then e then f
❏ D. a then c then b then d then e then f
Fairlearn is a Python toolkit used to assess models and surface disparities in predictions and performance for specified sensitive attributes, and it can upload dashboard metrics to a StratusML workspace for team review. Which Fairlearn parity constraint matches the description “Use this constraint with any of the reduction-based mitigation algorithms to restrict the loss for each sensitive feature group in a regression model”?
❏ A. Equalized odds
❏ B. False-positive rate parity
❏ C. Bounded group loss
❏ D. Demographic parity
❏ E. True positive rate parity
❏ F. Error rate parity
A mobility startup called HarborRent fit a linear regression using seven days of past scooter rental counts and temperature data and now asks whether the coefficient of determination R squared must always produce values that are zero or greater?
❏ A. False
❏ B. True
A data scientist at a retail analytics firm is validating a binary classifier developed in Azure Machine Learning studio, which performance measures should they examine to understand how the classifier behaves on positive and negative cases? (Choose 2)
❏ A. Area under the ROC curve
❏ B. True positives
❏ C. Mean absolute error
❏ D. False positives
Velvet Echo is an upscale music lounge in Harbor City that also serves as a covert base for a local syndicate. The venue stores a well structured CSV file in cloud storage, and you are advising its IT staff on how to load that data efficiently into a Pandas DataFrame for analysis. Which Azure Machine Learning data object should be created to simplify conversion of the CSV into a Pandas DataFrame?
❏ A. A file dataset
❏ B. A workspace datastore
❏ C. None of the listed choices
❏ D. A tabular dataset
Sentinel Security operates a protective logistics firm and Maya leads a group of data scientists who want to standardize a reproducible workspace using Azure CLI v2. Which Azure CLI v2 command should Maya run to create a new custom environment for her team?
❏ A. az ml environment update
❏ B. az ml environment create
❏ C. ml_client.environments.create_or_update
❏ D. az ml environment show
RestoreWorks is an urban restoration startup that Metro Emergency Authority hired to assist recovery after a string of major infrastructure incidents in Harbor City. The firm is led by CEO Ana Rios and she has engaged you as a machine learning consultant. The team will tune hyperparameters that are all discrete and they require every possible combination of values to be evaluated. Which sampling method should you recommend for their tuning process?
❏ A. Sobol sampling
❏ B. Grid sampling
❏ C. Bayesian optimization sampling
❏ D. Random sampling
While preparing a forecasting model for a mid sized ecommerce analytics group at Nova Insights you must identify anomalous records in the dataset. Which visualizations are most helpful for highlighting those anomalies? (Choose 2)
❏ A. ROC curve
❏ B. Box plot
❏ C. Confusion matrix
❏ D. Scatter plot
❏ E. Venn diagram
Dr. Mira Cole at BrightPath Analytics is tuning a deep neural network and decides to raise the learning rate hyperparameter to accelerate convergence during training. What effect does increasing the learning rate have on the training process?
❏ A. Cloud TPU
❏ B. Backpropagation applies larger weight updates
❏ C. Training uses more samples per mini batch
❏ D. The network gains additional hidden layers automatically
A predictive analytics group at Meridian Insights built a regression model and now needs to choose an evaluation metric. Which metric is best described by taking the square root of the mean of the squared differences between predicted and actual values and yielding a result in the same units as the target where a larger gap from the mean absolute error signals greater dispersion among individual errors?
❏ A. Coefficient of Determination R2
❏ B. Relative Absolute Error RAE
❏ C. Root Mean Squared Error RMSE
❏ D. Relative Squared Error RSE
Nimbus AI Studio supports many open source machine learning frameworks and libraries. Which framework serves as an open interchange standard for representing trained machine learning models?
❏ A. scikit-learn
❏ B. PyTorch
❏ C. TensorFlow SavedModel
❏ D. ONNX
Mountvale Motors in Oregon buys and sells pre owned cars and small trucks, and the owner plans to use past sales records to train a model that can estimate a resale price from features such as manufacturer, model, engine displacement, and odometer reading. What type of machine learning model should they create with automated machine learning to predict a numeric price?
❏ A. Classification
❏ B. Time series forecasting
❏ C. Clustering
❏ D. Regression
Harbor Ledger is a regional news publisher led by Benjamin Carter that has grown from a small operation into a recognized media outlet. Carter has hired you as a data engineering consultant to improve analytics workflows and infrastructure. One active project uses Azure Machine Learning Studio to perform feature engineering on a dataset. The team needs to normalize values so they are placed into a feature column with values grouped into bins. Carter tells the team to use Entropy Minimum Description Length MDL binning mode to achieve this. Does Carter’s instruction satisfy the stated project requirement?
❏ A. Yes
❏ B. No
Scenario: Tailored Threads, a boutique apparel company based in Manchester, expanded by acquiring a label in Barcelona. The team is integrating systems with Microsoft Azure and has asked you to advise on Azure Machine Learning. The engineering group created an Azure Machine Learning compute target named ComputeAlpha using the STANDARD_D2 virtual machine size, yet ComputeAlpha shows zero active nodes. A developer has a ws variable that references the Azure Machine Learning workspace and runs the following Python code.

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

the_cluster_name = "ComputeAlpha"
try:
    the_cluster = ComputeTarget(workspace=ws, name="ComputeAlpha")
    print("Step1")
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_v2", max_nodes=6)
    the_cluster = ComputeTarget.create(ws, "ComputeAlpha", config)
    print("Step2")
```

The code raises no exception, and the developer wants to know whether experiments configured to use the_cluster will run on ComputeAlpha.
❏ A. Only interactive runs from the developer machine will use ComputeAlpha and not batch experiments
❏ B. No experiments will run because the cluster currently has zero active nodes
❏ C. Yes experiments configured to use the_cluster will automatically run on ComputeAlpha
❏ D. Experiments will run only after an administrator scales the cluster to one or more nodes
A data science team at Evergreen Analytics is using a visual modeling tool to perform filter based feature selection for a multi class classification problem and the dataset includes categorical predictors that are strongly associated with the target label. Which statistical scoring technique is most appropriate to identify the best categorical features?
❏ A. Spearman rank correlation
❏ B. Pearson correlation
❏ C. Mutual information
❏ D. Chi squared test
Nova Instruments is a Seattle based engineering firm led by Maria Chen and it is expanding quickly which is creating new data science requirements. The team needs to construct an Azure Machine Learning pipeline using Designer that trains a model from a comma separated values file hosted on a website and the pipeline must ingest the CSV into Designer with minimal administrative setup while no dataset has yet been registered for that file. Which module should be added to the Designer pipeline to accomplish this?
❏ A. Enter Data Manually
❏ B. Import Data
❏ C. Register a Dataset
❏ D. Convert to CSV
Tavern Oak is an upscale Boston bistro founded by Lena Park and Marco Diaz and the restaurant is adopting Microsoft Azure Machine Learning to streamline its operations. You are consulting on several projects and Marco is preparing a new Azure Machine Learning experiment that will train on a small dataset while Lena wants to avoid paying for a cloud virtual machine. Which compute option should Marco select to minimize cost while still handling the low volume training workload?
❏ A. Compute cluster
❏ B. Inference cluster
❏ C. Local compute
❏ D. Compute instance
Fill in the blank in the following sentence in the context of machine learning at a fictional cloud provider called Contoso Cloud. [__] is a type of machine learning where you train a model to predict which category or class an item belongs to. For instance, a neighborhood clinic could use patient measurements such as height, weight, blood pressure, and fasting glucose to decide whether a patient is diabetic. Which word or words correctly fill the blank?
❏ A. Regression
❏ B. Statistics
❏ C. BigQuery
❏ D. Probability
❏ E. Classification
A fintech startup called NovaInsights needs to serve model predictions to its mobile clients with very low latency for live decision making. Which Azure service should the team use to host the trained model so it can deliver real time predictions with minimal delay?
❏ A. Azure Functions
❏ B. Azure App Service
❏ C. Azure Machine Learning Studio
❏ D. Azure Kubernetes Service (AKS)
Which compute configuration is most suitable for training an image recognition model for a small robotics firm called NovaBots?
❏ A. TPU cluster with high memory and 20 nodes
❏ B. Single compute instance with low memory and 3 CPU cores
❏ C. Managed GPU cluster with medium memory and 20 nodes
❏ D. Single compute instance with high memory and 3 CPU cores
Scenario: Meridian Insights is assisting a retail client with their Microsoft Azure data platform and they plan to apply K-means clustering for customer segmentation. The analytics team must define valid stopping rules for the K-means routine. Which of the following conditions can be used to stop the K-means algorithm? (Choose 3)
❏ A. Vertex AI
❏ B. A fixed number of iterations is reached
❏ C. The average distance among members of clusters increases beyond a threshold
❏ D. Centroid positions remain unchanged between updates
❏ E. The residual sum of squares drops below a preset threshold
A data science team at Meridian Retail Analytics uses Orion Machine Learning to refine a demand forecasting model and they need to tune model hyperparameters. Which approach will be most effective for automated hyperparameter optimization?
❏ A. Grid Search
❏ B. Random Search
❏ C. Orion AutoML service
❏ D. Bayesian Optimization
Meridian Insights is the analytics group at Horizon Tech and it is led by Lena Park and Omar Reyes. They adopted Azure Machine Learning and hired you as a consultant to oversee key projects. Lena is training a classification model and she wants to measure how much each feature influenced a single prediction. Which of the following should she examine?
❏ A. Permutation feature importance
❏ B. Recall and accuracy metrics
❏ C. Local feature attributions
❏ D. Dataset level feature importance
❏ E. Precision and accuracy metrics
A data scientist at Meridian Insights is preparing a dataset for a predictive model and finds many records with missing fields. When using the Clean Missing Data component in Contoso Machine Learning Studio which choice will remove entire records that contain null values?
❏ A. Replace missing numeric entries with the column mean
❏ B. Impute missing values using hot deck imputation
❏ C. Delete rows that contain any missing value
❏ D. Drop the entire feature column
❏ E. Substitute a custom placeholder for missing entries
❏ F. Replace missing categorical values with the most frequent category
Elena Cruz recently joined Crescent Analytics to lead a pilot machine learning program that will classify organizational units and the team is using Azure Machine Learning Designer to build the model and Elena must choose the most appropriate evaluation metric for a classification task. Which metric should she select to measure the model’s ability to distinguish between different classes?
❏ A. Log Loss
❏ B. R Squared
❏ C. Area Under ROC Curve (AUC)
❏ D. Mean Absolute Error MAE
Scenario: Meridian Data, a privately held analytics firm led by CEO Elena Cross, has a valuation above thirty two million dollars and was founded after the Meridian Foundation. Elena has asked for help because her engineering team is adopting Azure Databricks within Microsoft Azure Machine Learning for model development. During a workshop the team is debating how to show the files stored in DBFS inside a Databricks notebook. Several solutions have been proposed and only one will correctly list the files in DBFS. Which approach will list files in DBFS from a Databricks notebook?
❏ A. ls %fs /mnt/data-files
❏ B. %fs dir /mnt/data-files
❏ C. %fs ls /mnt/data-files
❏ D. ls /mnt/data-files
A data science team at Summit Analytics is creating an experiment in Azure Machine Learning Studio and needs to partition a dataset into training and holdout subsets. Which module should they use to perform that split?
❏ A. Group Data into Bins module
❏ B. Split Data module
❏ C. Clip Values module
❏ D. Group Categorical Values module
Meridian Recruiting LLC is a search firm based in Chicago and it is led by CEO Aisha Grant. The firm stores candidate records as TSV files in an Azure Blob container that is registered as a datastore in an Azure Machine Learning workspace. A data engineer merged all the TSV files and registered the combined dataset under the name candidateSet_5 using the Azure Machine Learning Python SDK. The engineer asks whether candidateSet_5 can be converted into a Pandas DataFrame by calling candidateSet_5.to_pandas_dataframe(). What should you tell them?
❏ A. No, candidateSet_5 cannot be converted into a Pandas DataFrame using to_pandas_dataframe() even if it was defined as tabular
❏ B. Yes, candidateSet_5 can be converted into a Pandas DataFrame using to_pandas_dataframe() if the dataset was correctly registered as a tabular dataset
Within Microsoft Azure machine learning projects, statistics and mathematics form the foundation, and it is important to understand the technical vocabulary used by statisticians, mathematicians, and data scientists. The difference between a predicted label and the observed label can be treated as a measure of error, yet observed values come from sampled observations that can show random variation. To make the comparison between a predicted value written as "y-hat" and an observed value y explicit, we call the difference between them the [__]. We can then aggregate the [__] across all validation predictions to compute the model loss as a measure of predictive performance. Which word or words correctly complete the sentence?
❏ A. Coefficient of determination
❏ B. Random sampling variance
❏ C. Root Mean Squared Error
❏ D. Residuals
Scenario: Nordal Materials a manufacturing group based in Stockholm runs production sites across Europe and Asia. An engineer named Priya will train a machine learning model on an Azure virtual machine that already has the needed libraries and tooling installed. The dataset is small to moderate in size and dynamic scaling is not necessary. Which compute targets should Priya choose to best fit these conditions? (Choose 2)
❏ A. Compute clusters
❏ B. Serverless compute
❏ C. Attached compute
❏ D. Inference clusters on Azure Kubernetes Service
❏ E. Local compute
A boutique firm named Meridian Analytics must build a model to forecast equity prices using data stored in a PostgreSQL server and a workload that needs GPU acceleration. You need to provision a virtual machine image that arrives with common machine learning frameworks and GPU drivers already installed. Which virtual machine image best fits this requirement?
❏ A. Deep Learning Virtual Machine Windows edition
❏ B. Deep Learning Virtual Machine Linux edition
❏ C. Data Science Virtual Machine Windows edition
❏ D. Deep Learning VM image on Google Compute Engine
Riverside Analytics was founded by Elena Morales and is currently worth more than forty million dollars. Elena created the Riverside Foundation and she serves as both president and board chair. She has asked you to advise her IT group as they deploy Microsoft Azure Machine Learning. In a planning meeting the team intends to spin up a new cluster inside an Azure Databricks workspace and they want to know what happens behind the scenes when a cluster is created. Which of the following accurately describes the actions that occur when Azure Databricks provisions a new cluster?
❏ A. When a workspace is created you are given a reserved group of virtual machines and clusters reuse machines from that reserved group
❏ B. Azure Databricks deploys a managed appliance into your subscription and then launches driver and worker virtual machines in the appliance using the VM sizes you choose
❏ C. The platform provisions a single dedicated virtual machine to execute all notebooks and jobs for the workspace
❏ D. A serverless style compute pool is automatically created and Azure Databricks draws scaled resources from that pool for interactive workloads
Maria Rivera works at Nova Labs and has trained a model using Azure Machine Learning. She needs a deployment that runs every 24 hours to process an input file and save prediction outputs to a designated Azure Blob Storage container. Which deployment approach should she choose?
❏ A. Azure Functions
❏ B. Online endpoint
❏ C. Batch endpoint
❏ D. Azure Kubernetes Service
While preparing a customer ledger for BrightData Labs you must drop repeated rows and keep only the final occurrence of each duplicate entry. Which pandas DataFrame method should you call to obtain a DataFrame that preserves only the last instance of every duplicate?
❏ A. duplicated(keep='last')
❏ B. dropdupes(retain='last')
❏ C. drop_duplicates(keep='last')
❏ D. drop_duplicates()
Fairlearn is a Python library used to examine disparities in model predictions across protected attributes and it integrates with Acme Machine Learning Studio so engineers can run experiments and upload dashboard metrics to an Acme workspace. The selection of a parity constraint depends on the mitigation method and the fairness goal. Which Fairlearn parity constraint is described by the following sentence? “This constraint can be applied with any mitigation algorithm to reduce differences among protected groups and in a binary classification setting it ensures each group has a similar proportion of false positive predictions”?
❏ A. Equalized odds
❏ B. False positive rate parity
❏ C. Demographic parity
❏ D. Error rate parity
❏ E. True positive rate parity
❏ F. Bounded group loss
Aquila Analytics was founded by Sarah Patel and now runs a large network of industrial sensors and the data science team is tuning hyperparameters with Azure Machine Learning. Which distribution type should Sarah choose to support discrete hyperparameters?
❏ A. Uniform
❏ B. Categorical
❏ C. QNormal
❏ D. LogNormal
At DataForge Analytics we want to build a model that estimates a numeric target using past input features and their observed numeric outcomes. What type of model meets this requirement?
❏ A. Multinomial classification model
❏ B. Vertex AI
❏ C. Ordinal classification model
❏ D. Log linear regression model
❏ E. Binomial classification model
❏ F. Regression model
A data science team at Harbor Analytics is comparing visualization methods to evaluate a new binary classifier and they want a chart that highlights model precision during assessment. Which visualization should they use?
❏ A. A Receiver Operating Characteristic curve visualization
❏ B. A precision recall curve visualization
❏ C. A box plot visualization of prediction scores
❏ D. A binary classification confusion matrix visualization
A high end supper club named The Velvet Parlor operates as a front for a syndicate and it hired you to advise on machine learning processes. The proprietor Lucien Vale and his staff are training a model with Microsoft Azure Machine Learning Studio and their dataset contains rows with null entries so they plan to use the Clean Missing Data module to detect and address missing values. Which module parameter should they choose to properly handle rows that include missing values?
❏ A. Substitute missing values with the most frequent value
❏ B. Drop the entire feature column that contains missing values
❏ C. Delete rows containing missing values
❏ D. Provide a custom replacement value for missing entries
❏ E. Replace missing values with the mean of the column
❏ F. Substitute missing values with the median of the column
Maya Lopez is a data science consultant at Nexa Insights and she must advise the operations team on a low or no code platform that allows business analysts to train machine learning models without writing code. Which solution should Maya recommend to Nexa Insights?
❏ A. Vertex AI AutoML
❏ B. Jupyter Notebooks in Azure Machine Learning Studio
❏ C. Azure CLI v2
❏ D. Azure Automated Machine Learning
Dr Elena Rivers at Beacon Data Lab is retraining a production machine learning model in Vertex AI to keep its predictions accurate and relevant. Which primary action should she prioritize to ensure the updated model will not reduce performance when it replaces the current model?
❏ A. Automate data preprocessing with a reproducible pipeline
❏ B. Compare outputs of the new model with outputs of the existing model
❏ C. Enable Vertex AI continuous evaluation and model monitoring
❏ D. Replace the existing model whenever a predefined replacement rule is met
BrightMart data scientists must deploy a trained model to serve live predictions for their online storefront. The deployment must provide very low latency and maintain high request throughput. Which deployment setup should they select for optimal online serving?
❏ A. Cloud Functions
❏ B. Compute Engine with GPUs
❏ C. Cloud Run
❏ D. Google Kubernetes Engine with autoscaling
You are advising Sentinel Cyber Labs and helping Maria who leads the IT group with an anomaly detection initiative on Azure Machine Learning. The team is training a support vector machine from scikit-learn and Maria needs to evaluate the trained model’s accuracy on a held out test set. Which scikit-learn method should the team use to obtain the classifier’s accuracy on the test data?
❏ A. mlflow.log_metric
❏ B. predict
❏ C. fit
❏ D. score
At Northbridge Healthcare on Solace you are advising Lena on a customer retention initiative and she needs to run counterfactual what if analysis to identify the smallest changes to customer features that would switch a churn prediction to a retained outcome. Which solution will best allow Lena to perform counterfactual analysis and refine retention actions?
❏ A. MLflow
❏ B. Azure Machine Learning compute
❏ C. Responsible AI Dashboard
❏ D. ML Pipelines
Azure Automated Machine Learning streamlines many parts of model building but still allows engineers to apply some manual constraints to the pipeline. Which controls can a practitioner use to influence the AutoML workflow? (Choose 2)
❏ A. Override the automatic choice of the primary evaluation metric
❏ B. Exclude specific model families from the candidate pool
❏ C. Disable scaling and normalization of numeric inputs
❏ D. Block selected feature transformations and encodings
NovaChem is a UK specialty materials firm with its headquarters in Manchester and operations in many countries and the company employs thousands of staff across several sites. Priya Kumar from the IT team needs to train a classification machine learning model using Azure Machine Learning and she wants to obtain the highest performing model without writing any code. Which development approach should Priya choose to meet this requirement?
❏ A. VS Code extensions for Azure ML
❏ B. Azure Machine Learning Designer pipelines
❏ C. Azure AutoML
❏ D. Azure CLI for Machine Learning
A data engineering group has a reusable YAML file that defines an Azure machine learning compute cluster. You need to publish step by step guidance on the intranet so team members can reliably provision the cluster with Azure CLI using the YAML file. Which Azure CLI command should the documentation show?
❏ A. az ml resource create -f cluster-config.yml
❏ B. az ml compute create -f cluster-config.yml
❏ C. az ml compute create --name analyticsCluster --sku Standard_D3s_v2 --min-instances 2 --max-instances 6 --type AmlCompute --resource-group analyticsRg --workspace-name analyticsWorkspace
❏ D. az ml compute add -f cluster-config.yml
You are working in a notebook within Azure Machine Learning studio and you must add Python libraries so they apply only to the notebook’s current kernel and do not alter other kernels or environments. Which notebook magic command should you use to install packages into the active kernel?
❏ A. %conda
❏ B. %load
❏ C. %pip
❏ D. !pip
DP-100 Exam Simulator Answers
Greylock Security Services is a private security contractor founded by Evan Rhodes, and the company runs its infrastructure on Microsoft Azure. You are consulting on a project that needs prepared datasets, and an intern asks why training data is required. Which statement best describes the role of training data in a machine learning project?
✓ C. A subset of records used to train the model so it can learn patterns from examples
The correct answer is A subset of records used to train the model so it can learn patterns from examples.
Training data supplies the examples the learning algorithm uses to adjust model parameters and form patterns that map inputs to outputs. In supervised learning the training set is usually labeled and must be representative of the problem domain so the model can generalize from those examples.
During training the model is fitted on the training set often through multiple passes or batches while other datasets are reserved for tuning and final evaluation. The training set is distinct from sets used to tune hyperparameters or to provide an unbiased final performance estimate.
A reserved portion of data used to evaluate the model after it is trained is incorrect because that phrase describes the test set, which is held back until after training to give an unbiased measure of how the model performs on new data.
The entire dataset applied at once to train the model is incorrect because you typically split data and train iteratively in batches or epochs. Using the entire dataset without separation prevents proper validation and increases the risk of overfitting and misleading results.
A separate subset used to tune hyperparameters and guide model selection is incorrect because that describes the validation set, which is used during development to compare models and choose hyperparameters while the training set is used to fit the model parameters.
When answering dataset role questions look for keywords like train, validation, and test and map them to learning, tuning, and final evaluation respectively.
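To make the three roles concrete, here is a minimal sketch using scikit-learn's train_test_split on a small synthetic DataFrame. The column names and split ratios are illustrative rather than part of the scenario.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset standing in for the prepared records.
df = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [x * 0.5 for x in range(100)],
    "label": [x % 2 for x in range(100)],
})

X = df.drop(columns=["label"])
y = df["label"]

# Reserve 20 percent of the records as the test set for the final unbiased evaluation.
X_train_val, X_test, y_train_val, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Split again so a validation set exists for hyperparameter tuning and model selection.
X_train, X_val, y_train, y_val = train_test_split(
    X_train_val, y_train_val, test_size=0.25, random_state=42
)

# The model is fitted only on X_train and y_train and learns its patterns from those examples.
```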
Aerix Components is a mid sized electronics manufacturer based in Harbor Point San Diego and led by Lena Ortiz. Lena is training a regression model in a notebook to predict how many customers will visit Aerix Boutique stores per day. She wants to log a single metric with MLflow at the end of each training epoch to monitor model performance. Which metric should Lena record in MLflow?
✓ C. Root mean squared error
Root mean squared error is the correct choice for Lena to record in MLflow.
Lena is solving a regression problem because she predicts a numeric count of customers per day and Root mean squared error measures prediction error in the same units as the target. It reports the square root of the average squared error so it penalizes larger mistakes more strongly and that makes it useful for spotting regressions in model quality across epochs.
MLflow can log a single scalar metric per epoch and recording Root mean squared error as the metric lets Lena monitor whether the model is reducing the kinds of large errors that matter for store traffic forecasts.
Mean absolute error is also a regression metric and it reports average absolute deviation in the same units, but it treats all errors equally and does not penalize large errors as strongly as the chosen metric, so it is not the correct option here.
Accuracy measures the proportion of correct class labels and it is not applicable to continuous numeric predictions, so it is not appropriate for this regression task.
Recall measures how many true positives are found by a classifier and it does not apply to predicting numeric counts, so it is not the correct metric to log for this problem.
When a question describes predicting a numeric value focus on regression metrics. Log a scalar such as RMSE each epoch with mlflow.log_metric to track model performance trends.
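A minimal sketch of what that per epoch logging can look like with MLflow. The epoch loop and error values are placeholders rather than a real training routine.

```python
import math
import mlflow

# Hypothetical per-epoch mean squared errors standing in for a real training loop.
epoch_mse_values = [250.0, 180.0, 140.0, 120.0]

with mlflow.start_run():
    for epoch, mse in enumerate(epoch_mse_values):
        rmse = math.sqrt(mse)
        # One scalar metric per epoch; the step argument preserves the epoch ordering.
        mlflow.log_metric("rmse", rmse, step=epoch)
```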
A data engineering team at Meridian Insights plans to perform interactive data cleaning and exploration using a distributed Spark setup in an Azure Machine Learning workspace and they need to know which compute offerings can power their notebooks? (Choose 2)
✓ C. Serverless Spark Compute
✓ D. Synapse Analytics Spark Pool
The correct options are Serverless Spark Compute and Synapse Analytics Spark Pool.
Serverless Spark Compute is a managed, on demand Spark runtime integrated into Azure Machine Learning that lets you run distributed interactive Spark sessions from notebooks without provisioning and managing cluster infrastructure.
Synapse Analytics Spark Pool provides a dedicated Spark runtime in Azure Synapse that can be used for interactive distributed data exploration and cleaning and it can be integrated or accessed from notebooks for large scale Spark workloads.
Notebook VM is incorrect because it refers to a single node notebook environment and not to a managed distributed Spark service. Note that the term Notebook VM is a legacy name and Azure Machine Learning now uses Compute Instances for single node notebooks which still do not provide a Spark cluster.
Google Cloud Dataproc is incorrect because it is a Google Cloud service and not an Azure compute option available inside an Azure Machine Learning workspace.
AML Compute Cluster is incorrect because Azure Machine Learning compute clusters are general purpose compute targets for training and jobs and they do not provide a managed Spark runtime for interactive distributed Spark notebooks.
When you see the phrase distributed Spark look for Azure services that explicitly offer Spark runtimes and rule out single node notebook instances and offerings from other clouds.
A regional agritech company called GreenHarvest is using its cloud machine learning workspace to develop a conventional model that will predict which plots are most suitable for different seed varieties. Which machine learning framework is the best fit for this traditional classification task?
✓ C. scikit-learn
The correct option is scikit-learn.
scikit-learn is designed for traditional supervised learning on tabular data and it provides a broad set of well tested classical classification algorithms such as logistic regression, decision trees, random forests, and gradient boosting that are easy to train and evaluate. It has a simple API and is ideal for quick prototyping and productionizing conventional classification models in a cloud machine learning workspace.
PyTorch is a deep learning framework that is excellent for building custom neural networks and research oriented models but it is generally more complex and resource intensive than needed for a standard tabular classification task.
TensorFlow is also focused on deep learning and scalable model training and serving. It can solve classification problems but it is usually a heavier choice compared with libraries that specialize in classical machine learning for tabular data.
ONNX is an open model format and runtime for interoperability and deployment rather than a primary training framework. It is useful for exporting or running models but it is not the best answer when choosing a framework to build a conventional classifier from scratch.
When the question mentions conventional or tabular classification favor scikit-learn for its simple API and classical algorithms that make iteration fast and implementation straightforward.
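As an illustration of how little code a classical tabular classifier needs, here is a short scikit-learn sketch on synthetic data standing in for the plot records.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for plot features and seed suitability labels.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```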
Scenario: Bistro Solace, a high end Brooklyn restaurant, was founded by Anna Pierce and Marcus Lee who are improving their operations and have adopted Microsoft Azure for analytics. The team is building a training pipeline for a regression model using a dataset of many numeric features that exist on different ranges and Anna needs the numeric features scaled relative to each feature’s minimum and maximum values. Which module should Anna add to the pipeline to perform this min max scaling transformation?
✓ D. Normalize Data
The correct option is Normalize Data.
The Normalize Data module applies normalization methods such as min max scaling which rescales numeric features to a common range typically 0 to 1. This transformation computes each feature’s minimum and maximum and then scales values relative to those bounds which matches the requirement in the question.
The module offers MinMax normalization in Azure Machine Learning designer and it is the appropriate choice when models require features on the same scale or when you want values expressed relative to each feature’s min and max.
The Cloud Dataflow option is a Google Cloud data processing service and not an Azure Machine Learning designer module. It is not the correct choice for an Azure pipeline task that performs min max scaling.
The Select Columns in Dataset module only selects which features to include or exclude and does not perform any scaling. It is used to shape the dataset but not to change numeric ranges.
The Clean Missing Data module handles null or missing values by imputation or removal and does not perform min max normalization. It is useful for preparing data but it does not match the scaling requirement.
When a question asks for min max scaling choose the module that explicitly mentions normalization and rule out modules that only select columns or handle missing values.
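The designer module is the no code route, but the underlying transformation is the same as a code based min max scaler. A small sketch with scikit-learn's MinMaxScaler, using made up values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two numeric features on very different ranges, standing in for the pipeline inputs.
X = np.array([
    [5.0, 1200.0],
    [7.5, 300.0],
    [2.0, 900.0],
])

scaler = MinMaxScaler()            # rescales each column to the range 0 to 1
X_scaled = scaler.fit_transform(X)

# Each value becomes (x - column_min) / (column_max - column_min).
print(X_scaled)
```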
Aurora Forecasting is a predictive analytics startup led by Maya Patel and they are preparing an hourly time series dataset that spans about 18 months in Azure Machine Learning Studio and they need to split the records into training and testing sets using the Split Data module while preserving temporal order; which splitting mode should they select?
✓ B. Relative Expression Split mode
The correct answer is: Relative Expression Split mode.
The Relative Expression Split mode lets you define a condition based on row indices or timestamp values so you can create a contiguous training period and a contiguous testing period which preserves temporal ordering for time series forecasting. Using an expression you can keep the earliest portion of the dataset for training and the most recent portion for testing which prevents lookahead bias and better reflects real forecasting scenarios.
Regular Expression Split is incorrect because regular expressions are used to match and split text based on patterns and they do not provide a natural way to select contiguous rows by time or index to preserve chronological order.
Split Rows with Randomized option enabled is incorrect because enabling randomization shuffles rows and destroys temporal order which can leak future information into the training set and invalidate time series evaluation.
Recommender Split is incorrect because it is tailored for recommender system data partitioning and user item scenarios and it does not provide the straightforward contiguous time based cut needed for general time series forecasting.
When a question asks you to preserve temporal order choose a split that selects contiguous rows by index or timestamp. Prefer using relative expression or explicit time cutoffs and avoid any randomized split for time series.
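Outside the designer, the same contiguous time based cut can be expressed in pandas by filtering on a timestamp threshold. The cutoff date below is illustrative, not part of the scenario.

```python
import pandas as pd

# Hypothetical hourly series standing in for the 18 months of data.
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=24 * 30, freq="h"),
    "demand": range(24 * 30),
})

cutoff = pd.Timestamp("2023-01-25")        # illustrative cutoff date

train = df[df["timestamp"] < cutoff]       # earliest contiguous portion for training
test = df[df["timestamp"] >= cutoff]       # most recent portion for testing, no shuffling
```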
A data science team at NorthStar Analytics registered a tabular dataset named ‘model_train_set’ and assigned it to a variable for use by an estimator when running a training script. They want the script to have access to the dataset during the job run. Which estimator property should be configured to provide the training script with the dataset?
✓ C. inputs = [model_train_set.as_named_input("model_train_set")]
The correct answer is inputs = [model_train_set.as_named_input("model_train_set")].
The estimator inputs = [model_train_set.as_named_input("model_train_set")] attaches a dataset consumption configuration to the training job so the dataset is mounted or downloaded and made available to the training script. Using as_named_input on a registered Dataset returns a DatasetConsumptionConfig that the Estimator recognizes in its inputs list and exposes inside the run as a path or mount point for the script to consume.
data_reference = model_train_set is incorrect because Estimator does not use a property named data_reference to pass registered Datasets. DataReference belongs to older APIs and is not the standard way to provide a Dataset to an Estimator.
script_params = {"--data": model_train_set} is incorrect because script_params only passes literal command line arguments and will not convert a Dataset object into a dataset consumption configuration. If you want the script to receive a path you must pass the Dataset via inputs and then supply the input's path or name as a script parameter.
source_directory = model_train_set is incorrect because source_directory should point to the folder containing the training script and its dependencies and it is not used to supply data to the run.
When a question asks which estimator property provides a dataset to the training script remember that you must create a DatasetConsumptionConfig with as_named_input and pass it via the estimator inputs property. Then use a script argument if you need the path inside the script.
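The pattern this question assumes belongs to the older azureml-core SDK v1, where the Estimator class lives in azureml.train.estimator. A minimal sketch under that assumption, with placeholder names for the compute target, source folder, and script:

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.estimator import Estimator  # SDK v1 class, now superseded by ScriptRunConfig

ws = Workspace.from_config()
model_train_set = Dataset.get_by_name(ws, name="model_train_set")

est = Estimator(
    source_directory="./training",         # folder that contains train.py
    entry_script="train.py",
    compute_target="cpu-cluster",           # hypothetical compute target name
    inputs=[model_train_set.as_named_input("model_train_set")],
)

run = Experiment(ws, "train-experiment").submit(est)

# Inside train.py the dataset is then reachable as
# Run.get_context().input_datasets["model_train_set"].
```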
Scenario The Velvet Room is an upscale nightclub in New Metro that also serves as a front for a small business owner. You are hired as a contractor to advise the IT team on machine learning workflows in Microsoft Azure. The team has a dataset with over 180 features. The lead developer plans to train a Two-Class Support Vector Machine binary classifier. The requirement is to compute feature importance with the Permutation Feature Importance module in Azure Machine Learning Designer. The developer lists these actions. a. Upload a dataset to the pipeline. b. Add a Split Data module to create training and test subsets. c. Add a Two-Class Support Vector Machine module to define the SVM estimator. d. Add a Train Model module to produce a trained model. e. Add a Permutation Feature Importance module and attach the trained model and test data. f. Set the performance metric to Classification Accuracy and run the experiment. What is the correct order of these steps?
✓ C. a then b then c then d then e then f
a then b then c then d then e then f is correct. This order uploads the dataset first, then creates training and test splits, then defines the Two Class Support Vector Machine estimator, then trains the model, then computes permutation feature importance with the trained model and test data, and finally sets the metric and runs the experiment.
You must upload the dataset before you can split it, so the Upload a dataset step comes first. Splitting the data next ensures you have a proper training split to feed into the Train Model module. Defining the Two Class Support Vector Machine before training gives the Train Model module an estimator to fit. Train Model then produces a trained model that the Permutation Feature Importance module can use when it is supplied with the test split. The final step is to choose the performance metric such as Classification Accuracy and run the experiment to evaluate and report the feature importance results.
c then a then d then b then e then f is wrong because it attempts to add the Two Class Support Vector Machine before the dataset is uploaded and it tries to train before the data has been split into training and test subsets. The Train Model would not have the proper training split as input in that sequence.
b then c then a then d then e then f is wrong because it places the Split Data and estimator steps before the dataset upload. Splitting or configuring modules before the data exists means the modules will not have the dataset inputs they require.
a then c then b then d then e then f is wrong because it defines the estimator before creating the training and test splits. That ordering risks not providing the Train Model module with the correct training split when wiring the pipeline and evaluating feature importance.
When you see workflow ordering questions imagine the data flow and the inputs each module needs. Upload the data first and make sure the training split exists before you run Train Model or compute feature importance.
Fairlearn is a Python toolkit used to assess models and surface disparities in predictions and performance for specified sensitive attributes, and it can upload dashboard metrics to a StratusML workspace for team review. Which Fairlearn parity constraint matches the description “Use this constraint with any of the reduction-based mitigation algorithms to restrict the loss for each sensitive feature group in a regression model”?
✓ C. Bounded group loss
The correct option is Bounded group loss.
Bounded group loss is a parity constraint intended for use with reduction based mitigation algorithms to limit the loss experienced by each sensitive feature group in regression problems. It is implemented in Fairlearn’s reductions module so you can apply reduction based methods to enforce a maximum group loss while optimizing overall predictive performance.
Equalized odds is not correct because it refers to a classification requirement that true positive rates and false positive rates be equal across groups rather than bounding per group loss for regression.
False-positive rate parity is not correct because it targets equality of false positive rates across groups in classification problems and it does not constrain regression loss for each group.
Demographic parity is not correct because it enforces equal positive outcome rates across groups in classification and it does not implement a reduction based per group loss bound for regression.
True positive rate parity is not correct because it focuses on equal true positive rates across groups in classification rather than bounding group losses in regression.
Error rate parity is not correct because it aims for equal overall error rates across groups in classification scenarios and it does not describe the reduction based bounded loss constraint used for regression.
When a question mentions regression and reduction-based mitigation think of constraints that bound loss per group and look for bounded group loss rather than rate based parity definitions.
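A minimal sketch of how that looks with Fairlearn's reductions API on synthetic regression data. The loss bounds, the upper_bound value, and the sensitive feature are all illustrative assumptions rather than recommended settings.

```python
import numpy as np
from fairlearn.reductions import BoundedGroupLoss, ExponentiatedGradient, SquareLoss
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
sensitive = rng.integers(0, 2, size=200)   # hypothetical sensitive feature groups

# Bound the squared loss allowed for each sensitive group during mitigation.
constraint = BoundedGroupLoss(SquareLoss(-10, 10), upper_bound=0.05)
mitigator = ExponentiatedGradient(LinearRegression(), constraints=constraint)
mitigator.fit(X, y, sensitive_features=sensitive)

y_pred = mitigator.predict(X)
```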
A mobility startup called HarborRent fit a linear regression using seven days of past scooter rental counts and temperature data and now asks whether the coefficient of determination R squared must always produce values that are zero or greater?
✓ A. False
The correct answer is False.
R squared is defined as one minus the sum of squared errors divided by the total sum of squares and it compares model error to the error of predicting the mean. If the model’s predictions are worse than always predicting the mean then the sum of squared errors can be larger than the total sum of squares. That yields one minus SSE over SST that is negative so R squared can be negative.
Negative values often arise when you compute R squared on new data and the model generalizes poorly or when the regression omits an intercept or when you use a constrained or non least squares fitting method. By contrast, when you fit ordinary least squares with an intercept on the same data used for training, the intercept only model is a feasible baseline and OLS minimizes SSE so R squared will lie between zero and one in that special situation.
True is incorrect because it asserts that R squared must always be zero or greater. That statement is false for the general definitions and common evaluation scenarios, so the correct choice is False.
Remember that R squared equals one minus SSE divided by SST so if SSE exceeds SST the value will be negative. On exam questions check whether the score is computed on training or test data and whether an intercept is included.
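A quick numeric illustration of the point, using scikit-learn's r2_score with predictions that are far worse than the mean baseline:

```python
from sklearn.metrics import r2_score

y_true = [10.0, 12.0, 11.0, 13.0]

# Predicting the mean of y_true (11.5 everywhere) would give an R squared of exactly 0.
y_bad = [20.0, 2.0, 25.0, 1.0]      # much worse than the mean baseline

print(r2_score(y_true, y_bad))       # prints a large negative value
```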
A data scientist at a retail analytics firm is validating a binary classifier developed in Azure Machine Learning studio, which performance measures should they examine to understand how the classifier behaves on positive and negative cases? (Choose 2)
✓ B. True positives
✓ D. False positives
True positives and False positives are correct.
True positives count the instances that are actually positive and that the classifier also predicted as positive, so this measure directly shows how the model behaves on positive cases and relates to sensitivity and recall.
False positives count the instances that are actually negative but that the classifier predicted as positive, so this measure directly shows how often the model mistakenly labels negative cases as positive and relates to specificity and precision trade offs.
Area under the ROC curve is not the best answer for this question because it is an aggregate, threshold independent measure of ranking performance rather than a direct count of how the model classifies positive and negative cases.
Mean absolute error is incorrect because it is a regression error metric that measures average absolute difference between predicted and actual numeric values and it does not describe classification behavior on positive and negative cases.
When a question asks about behavior on positive and negative cases focus on confusion matrix counts such as true positives and false positives rather than on aggregate or regression metrics.
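Those counts fall straight out of a confusion matrix. A small sketch with scikit-learn and made up labels:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("true positives:", tp, "false positives:", fp)
```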
Velvet Echo is an upscale music lounge in Harbor City that also serves as a covert base for a local syndicate. The venue stores a well structured CSV file in cloud storage, and you are advising its IT staff on how to load that data efficiently into a Pandas DataFrame for analysis. Which Azure Machine Learning data object should be created to simplify conversion of the CSV into a Pandas DataFrame?
✓ D. A tabular dataset
The correct option is A tabular dataset.
A tabular dataset represents structured, row and column data such as a well formed CSV and it provides parsing and schema inference so you can work with the data as a table. It also exposes a to_pandas_dataframe method that lets you load the dataset directly into a Pandas DataFrame for analysis, which is the simplest and most efficient path in Azure Machine Learning for this use case.
Using a tabular dataset also lets you register the structured dataset with the workspace so analysts and experiments can consistently access the same parsed table without reimplementing parsing logic.
A file dataset is intended for collections of files or unstructured data and it does not provide the same table semantics or the direct to_pandas_dataframe convenience that a tabular dataset provides. You would need to read and parse the CSV yourself if you used a file dataset.
A workspace datastore is a storage endpoint abstraction that provides a connection to blob or file storage and it is not itself a dataset. You must create a dataset that references a datastore before you get the table behaviors needed to load into Pandas.
None of the listed choices is incorrect because Azure Machine Learning includes a Tabular Dataset option that directly supports loading CSV into a Pandas DataFrame, so a valid choice is available.
When the question asks about loading CSV into Pandas in Azure ML look for the dataset type that models rows and columns and remember that TabularDataset includes the to_pandas_dataframe method to perform the conversion.
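A minimal sketch of that path with the azureml-core SDK v1 Dataset API. The datastore and the relative CSV path are placeholders.

```python
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Create a tabular dataset from the CSV and pull it straight into pandas.
tabular_ds = Dataset.Tabular.from_delimited_files(path=(datastore, "venue-data/visits.csv"))
df = tabular_ds.to_pandas_dataframe()
print(df.head())
```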
Sentinel Security operates a protective logistics firm and Maya leads a group of data scientists who want to standardize a reproducible workspace using Azure CLI v2. Which Azure CLI v2 command should Maya run to create a new custom environment for her team?
✓ B. az ml environment create
The correct option is az ml environment create.
az ml environment create is the Azure CLI v2 command that registers a new custom environment in your Azure Machine Learning workspace. Environments capture dependency specifications and runtime configuration such as conda files and container images so teams can reproduce experiments and deployments consistently.
az ml environment update is used to modify an existing environment definition and it does not create a new environment. Use update only when you already have a registered environment that you want to change.
ml_client.environments.create_or_update is a Python SDK method from the azure AI ML library and not an Azure CLI v2 command. It can create or update environments when you are writing Python code but it is not the CLI command requested in the question.
az ml environment show returns the details of an existing environment and it does not create or register a new environment. Use show when you want to inspect a registered environment.
When you need to register a reproducible environment run the CLI command az ml environment create with a YAML or conda specification file. If you are scripting in Python then use the SDK method ml_client.environments.create_or_update instead.
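The explanation mentions the SDK route as well. A minimal sketch of the Python SDK v2 equivalent, with placeholder subscription, workspace, base image, and conda file values:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

env = Environment(
    name="team-training-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",  # hypothetical base image
    conda_file="environment.yml",                                 # local conda specification
)

ml_client.environments.create_or_update(env)
```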
RestoreWorks is an urban restoration startup that Metro Emergency Authority hired to assist recovery after a string of major infrastructure incidents in Harbor City. The firm is led by CEO Ana Rios and she has engaged you as a machine learning consultant. The team will tune hyperparameters that are all discrete and they require every possible combination of values to be evaluated. Which sampling method should you recommend for their tuning process?
-
✓ B. Grid sampling
Grid sampling is the correct choice because the team must evaluate every possible combination of discrete hyperparameter values.
Grid sampling enumerates the Cartesian product of each discrete hyperparameter domain and therefore will try every combination when you run the full grid. That property makes it the right method for an exhaustive search of a discrete and finite search space.
Grid sampling does become expensive as the number of hyperparameters and the number of values per hyperparameter grow because the total number of trials grows multiplicatively. You should confirm that the total number of combinations is feasible and plan for parallel execution or pruning if needed.
Sobol sampling is a low discrepancy quasi random sequence method that aims for even coverage of continuous spaces so it does not guarantee that every discrete combination will be evaluated and is therefore not suitable here.
Bayesian optimization sampling builds a model of the objective and selects promising points adaptively so it focuses on finding good configurations with fewer evaluations and will not exhaustively evaluate all combinations.
Random sampling picks configurations at random and can cover spaces efficiently for some problems but it does not systematically enumerate every combination and will miss many discrete combinations unless an impractically large number of trials is run.
When a question requires you to evaluate every combination pick Grid sampling and compute the total number of combinations first so you can judge the cost.
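A quick sketch of that cost check using only the standard library; the hyperparameter names and values are illustrative.

```python
from itertools import product

# Enumerate the full grid of discrete hyperparameters and count the combinations
# before committing to an exhaustive sweep.
search_space = {
    "learning_rate": [0.01, 0.1, 1.0],
    "batch_size": [16, 32, 64],
    "num_layers": [2, 3],
}
grid = list(product(*search_space.values()))
print(f"Total combinations to evaluate: {len(grid)}")  # 3 * 3 * 2 = 18
for combo in grid[:3]:
    print(dict(zip(search_space.keys(), combo)))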
While preparing a forecasting model for a mid sized ecommerce analytics group at Nova Insights you must identify anomalous records in the dataset. Which visualizations are most helpful for highlighting those anomalies? (Choose 2)
-
✓ B. Box plot
-
✓ D. Scatter plot
The correct options are Box plot and Scatter plot.
A Box plot summarizes the distribution of a numeric variable by showing quartiles and whiskers and it marks individual points outside the expected range as outliers which makes anomalous records easy to spot on a single field.
A Scatter plot displays the relationship between two variables and it reveals points that fall away from clusters or expected trends so it is useful for finding anomalies that involve interactions between fields.
ROC curve evaluates classifier performance across thresholds by plotting true positive rate against false positive rate and it is not designed to highlight individual anomalous records in a dataset.
Confusion matrix summarizes counts of predicted versus actual classes and it helps assess model errors but it does not show record level outliers or continuous distributions.
Venn diagram illustrates overlaps between sets and membership relations and it is not suitable for identifying numeric outliers or multivariate deviations.
When you need to find anomalies look for visualizations that show distributions or relationships such as box plots for single variables and scatter plots for pairs of variables.
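A minimal pandas and matplotlib sketch of both charts on made-up data; the column names are placeholders.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data with one obviously anomalous order value.
df = pd.DataFrame({
    "order_value": [20, 22, 19, 21, 250, 23, 18],
    "items": [2, 2, 1, 2, 3, 2, 1],
})

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
df["order_value"].plot.box(ax=axes[0], title="Single-field outliers")          # box plot flags the 250 value
df.plot.scatter(x="items", y="order_value", ax=axes[1], title="Pairwise view")  # scatter shows it leaving the cluster
plt.tight_layout()
plt.show()
```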
Dr. Mira Cole at BrightPath Analytics is tuning a deep neural network and decides to raise the learning rate hyperparameter to accelerate convergence during training. What effect does increasing the learning rate have on the training process?
-
✓ B. Backpropagation applies larger weight updates
The correct answer is Backpropagation applies larger weight updates.
Raising the learning rate increases the multiplier applied to the gradient during each update so backpropagation produces larger changes to the model weights. Larger updates can speed up convergence when the rate is well chosen and they can also cause overshooting or unstable training if the rate is too large.
Cloud TPU is incorrect because the learning rate does not change the hardware used for training. A Cloud TPU is an accelerator and it is unrelated to how large each gradient step is.
Training uses more samples per mini batch is incorrect because the learning rate scales the gradient and it does not alter the mini batch size or the number of samples processed per step.
The network gains additional hidden layers automatically is incorrect because changing the learning rate does not modify the model architecture. Hidden layers are added only when you redesign or reconfigure the network.
When a question mentions learning rate remember that it directly scales gradient updates and affects update magnitude and stability. Do not confuse learning rate with batch size or hardware options when choosing the correct answer.
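A tiny sketch of a single gradient descent update that shows how the learning rate scales the step size; the numbers are illustrative only.

```python
# One weight, one gradient: the learning rate multiplies the gradient to give the update.
weight = 0.5
gradient = 2.0

for learning_rate in (0.01, 0.1, 1.0):
    update = learning_rate * gradient
    print(f"lr={learning_rate}: weight moves by {update} -> {weight - update}")
```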
A predictive analytics group at Meridian Insights built a regression model and now needs to choose an evaluation metric. Which metric is best described by taking the square root of the mean of the squared differences between predicted and actual values and yielding a result in the same units as the target where a larger gap from the mean absolute error signals greater dispersion among individual errors?
-
✓ C. Root Mean Squared Error RMSE
The correct option is Root Mean Squared Error RMSE.
Root Mean Squared Error RMSE is computed by taking the square root of the mean of the squared differences between predicted and actual values which yields a result in the same units as the target variable. The squaring step increases the influence of larger errors so a larger gap from the mean absolute error signals greater dispersion among individual errors rather than a simple average magnitude.
Coefficient of Determination R2 measures the proportion of variance explained by the model and is unitless so it does not match the description of taking a square root to return units of the target.
Relative Absolute Error RAE is a ratio that compares absolute error to a baseline absolute error so it is relative and unitless and it does not involve squaring and taking a square root.
Relative Squared Error RSE compares squared error to a baseline squared error and thus is a relative measure that does not take the square root to return values in the target units.
When an option mentions the metric returns values in the same units as the target look for a square root in the formula and remember that RMSE penalizes large errors more than MAE.
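A short numpy sketch that computes both metrics on illustrative values so the MAE versus RMSE gap is visible.

```python
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.0, 18.0, 14.0])

errors = y_pred - y_true
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")  # RMSE exceeding MAE signals dispersed individual errors
```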
Nimbus AI Studio supports many open source machine learning frameworks and libraries. Which framework serves as an open interchange standard for representing trained machine learning models?
-
✓ D. ONNX
The correct option is ONNX.
ONNX is an open interchange format that represents trained machine learning models in a framework agnostic way. It defines a common model representation and operator sets so models can be exported from one framework and imported into another for inference and optimization.
The ONNX ecosystem includes runtimes and tools such as ONNX Runtime that perform cross platform inference and optimizations for models saved in the ONNX format. That broad runtime and tool support is what makes ONNX an interchange standard for trained models.
scikit-learn is a Python library for building and training classical machine learning models and not an interchange specification for saving models across frameworks.
PyTorch is a deep learning framework and not a framework agnostic model interchange standard. PyTorch can export models to ONNX but the framework itself is not the open interchange format.
TensorFlow SavedModel is TensorFlow’s native serialized format for saving models and it is specific to TensorFlow. It is not an open interchange standard that is framework agnostic in the same way that ONNX is.
When a question asks about an interchange or portability standard look for formats that are described as framework agnostic and supported by multiple runtimes. ONNX is a common example to watch for.
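As an illustration of the interchange idea, a minimal sketch that exports a stand-in PyTorch model to ONNX; the model and file name are placeholders.

```python
import torch
import torch.nn as nn

# Stand-in for a trained model; in practice you would export your real network.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

dummy_input = torch.randn(1, 4)                      # example input that fixes the graph's input shape
torch.onnx.export(model, dummy_input, "model.onnx")  # writes the framework agnostic interchange file
```

The resulting model.onnx file can then be loaded by other runtimes such as ONNX Runtime for inference.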
Mountvale Motors in Oregon buys and sells pre owned cars and small trucks and the owner plans to use past sales records to train a model that can estimate a resale price from features like manufacturer model engine displacement and odometer reading. What type of machine learning model should they create with automated machine learning to predict a numeric price?
-
✓ D. Regression
The correct option is Regression.
Regression models are designed to predict continuous numeric values such as a car resale price. Automated machine learning for tabular data can use past sales records and features like manufacturer model engine displacement and odometer reading to learn a function that outputs a numeric price estimate.
Classification predicts discrete categories rather than continuous numbers so it is not suitable when the goal is to estimate a numeric price.
Time series forecasting focuses on predicting future values that depend on time and temporal patterns. The question asks to map features to a numeric price for individual vehicles so general regression on tabular data is the correct choice.
Clustering is an unsupervised method that groups similar records and does not directly predict a numeric target value so it will not produce price estimates.
When the target variable is a continuous number think regression. If the target is one of several classes think classification and if the task is about predicting future values over time think time series forecasting.
Harbor Ledger is a regional news publisher led by Benjamin Carter that has grown from a small operation into a recognized media outlet. Carter has hired you as a data engineering consultant to improve analytics workflows and infrastructure. One active project uses Azure Machine Learning Studio to perform feature engineering on a dataset. The team needs to normalize values so they are placed into a feature column with values grouped into bins. Carter tells the team to use Entropy Minimum Description Length MDL binning mode to achieve this. Does Carter’s instruction satisfy the stated project requirement?
-
✓ B. No
The correct option is No. Carter’s instruction to use Entropy Minimum Description Length MDL binning does not satisfy the stated requirement.
Entropy Minimum Description Length binning in Azure Machine Learning Studio is a supervised discretization technique that uses the target label and an entropy based criterion to choose cut points that reduce description length. Because it relies on class labels it is intended to create bins that are predictive of the target rather than simply to normalize or group values into a single unsupervised feature column.
For a requirement that simply asks to normalize values and place them into a feature column with values grouped into bins, the team should use an unsupervised binning mode such as equal width or quantile binning in the Group Data into Bins module. Those approaches do not require target labels and will produce bins based on the feature distribution rather than on label information.
The option Yes is incorrect because choosing Entropy MDL assumes access to and use of the target variable and it produces bins optimized for classification performance rather than for plain normalization or label independent binning.
Read the question for whether the task is supervised or unsupervised and check if the technique requires a target label before selecting a binning method.
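For comparison, a minimal pandas sketch of label independent binning using equal width and quantile bins on illustrative values.

```python
import pandas as pd

values = pd.Series([3, 7, 12, 18, 25, 31, 44, 58])

equal_width = pd.cut(values, bins=4)   # equal width bins over the value range
equal_freq = pd.qcut(values, q=4)      # quantile bins with similar counts per bin
print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```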
Scenario Tailored Threads a boutique apparel company based in Manchester expanded by acquiring a label in Barcelona and the team is integrating systems with Microsoft Azure and they asked you to advise on Azure Machine Learning. The engineering group created an Azure Machine Learning compute target named ComputeAlpha using the STANDARD_D2 virtual machine size yet ComputeAlpha shows zero active nodes. A developer has a ws variable that references the Azure Machine Learning workspace and runs the following Python code.

```python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

the_cluster_name = "ComputeAlpha"
try:
    the_cluster = ComputeTarget(workspace=ws, name="ComputeAlpha")
    print("Step1")
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_v2", max_nodes=6)
    the_cluster = ComputeTarget.create(ws, "ComputeAlpha", config)
    print("Step2")
```

The operations raise no exception and the developer wants to know whether experiments configured to use the_cluster will run on ComputeAlpha?
-
✓ C. Yes experiments configured to use the_cluster will automatically run on ComputeAlpha
The correct option is Yes, experiments configured to use the_cluster will automatically run on ComputeAlpha.
The cluster resource exists in the workspace even though it shows zero active nodes. An AmlCompute cluster can be created with a low or zero starting node count and Azure Machine Learning will allocate or scale up nodes when a job is submitted so experiments that target the compute will be scheduled and run once nodes are provisioned up to the cluster’s configured limits.
The code shown calls ComputeTarget(workspace=ws, name=”ComputeAlpha”) so it simply binds to the existing compute rather than recreating it. The provisioning configuration in the except block would only apply if the lookup raised an exception. Jobs may wait briefly while VMs are allocated but they do not require a human to first start nodes.
Only interactive runs from the developer machine will use ComputeAlpha and not batch experiments is incorrect because compute targets are available to both interactive sessions and submitted experiments. Azure Machine Learning does not restrict a compute target only to interactive use.
No experiments will run because the cluster currently has zero active nodes is incorrect because a zero node count does not prevent job execution. The service will request and provision nodes for the cluster when an experiment is submitted, assuming the cluster has a positive max_nodes and the subscription has available quota.
Experiments will run only after an administrator scales the cluster to one or more nodes is incorrect because administrative manual scaling is not required for normal job submission. The platform will automatically scale the cluster up to the configured maximum when work is queued, although an administrator might be needed only to change VM sizes or resolve quota issues.
When you see an AmlCompute cluster with zero nodes remember that Azure Machine Learning can autoscale on job submission. Check the min_nodes and max_nodes settings and confirm subscription quotas before assuming manual intervention is required.
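A minimal sketch of provisioning such a cluster with an explicit zero minimum so it scales to zero when idle; the cluster name is a placeholder.

```python
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# min_nodes=0 lets the cluster sit at zero nodes between jobs; the service scales
# up to max_nodes when work is queued.
config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_v2",
    min_nodes=0,
    max_nodes=6,
)
cluster = ComputeTarget.create(ws, "demo-cluster", config)  # placeholder cluster name
cluster.wait_for_completion(show_output=True)
```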
A data science team at Evergreen Analytics is using a visual modeling tool to perform filter based feature selection for a multi class classification problem and the dataset includes categorical predictors that are strongly associated with the target label. Which statistical scoring technique is most appropriate to identify the best categorical features?
-
✓ D. Chi squared test
The correct option is Chi squared test.
The Chi squared test measures association between categorical predictors and a categorical target by comparing observed and expected counts in a contingency table. It directly tests independence and produces a statistic or p value that can be used to rank categorical features for filter based selection in a multi class classification problem.
The Chi squared test is commonly implemented in visual modeling tools and feature selection libraries because it is simple to compute and interpretable for nominal variables and multiclass labels. It is appropriate when cell counts are sufficiently large and when features and labels are both categorical.
Spearman rank correlation evaluates monotonic relationships and requires ordinal or continuous numeric inputs. It is not designed for nominal categorical predictors and so it is not suitable here.
Pearson correlation measures linear association between two continuous variables and assumes numeric data. It does not apply to nominal categorical predictors and so it is not the right choice.
Mutual information does capture general dependence and can handle categorical variables, but it often requires careful estimation and is less commonly presented as the standard statistical contingency test in many visual filter selection workflows. The exam choice favors the chi squared test because it directly tests independence using contingency tables.
When features and the target are categorical look for tests that work on contingency tables and test for independence such as the chi squared test.
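A minimal scikit-learn sketch of the same idea outside the visual tool, with illustrative data; chi squared scoring needs non negative numeric inputs so the categorical predictors are one hot encoded first.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Illustrative categorical predictors and a multi-class label.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size": ["S", "M", "L", "S", "L", "M"],
    "label": ["A", "B", "A", "C", "B", "C"],
})
X = pd.get_dummies(df[["color", "size"]])  # one-hot encode so inputs are non-negative counts
y = df["label"]

selector = SelectKBest(score_func=chi2, k=3).fit(X, y)
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores)  # higher chi-squared score means stronger association with the label
```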
Nova Instruments is a Seattle based engineering firm led by Maria Chen and it is expanding quickly which is creating new data science requirements. The team needs to construct an Azure Machine Learning pipeline using Designer that trains a model from a comma separated values file hosted on a website and the pipeline must ingest the CSV into Designer with minimal administrative setup while no dataset has yet been registered for that file. Which module should be added to the Designer pipeline to accomplish this?
-
✓ B. Import Data
Import Data is the correct module to add to the Designer pipeline.
Import Data allows Designer to read a comma separated values file directly from an HTTP or HTTPS web location and bring it into the pipeline without first registering a dataset. This approach satisfies the requirement for minimal administrative setup because the module performs the ingestion at runtime and does not require precreating a named dataset in the workspace.
Enter Data Manually is intended for small ad hoc tables that users type into the canvas and it is not suitable for importing a full CSV from a website.
Register a Dataset is the action of creating a named dataset in the workspace and it requires the data to be available and registered ahead of time so it does not meet the requirement to ingest a web‑hosted CSV with minimal setup.
Convert to CSV is used to change data format within a pipeline and it does not fetch or import a file from an external web location so it will not accomplish the required ingestion step.
When a question emphasizes minimal administrative setup and a file hosted on a website choose a module that can ingest from a URL directly. The Import Data module is designed for this scenario and avoids the extra step of registering a dataset first.
Tavern Oak is an upscale Boston bistro founded by Lena Park and Marco Diaz and the restaurant is adopting Microsoft Azure Machine Learning to streamline its operations. You are consulting on several projects and Marco is preparing a new Azure Machine Learning experiment that will train on a small dataset while Lena wants to avoid paying for a cloud virtual machine. Which compute option should Marco select to minimize cost while still handling the low volume training workload?
-
✓ C. Local compute
The correct option is Local compute.
Choosing Local compute allows Marco to run the experiment on his own machine so there are no additional cloud virtual machine charges and it can handle low volume training on a small dataset without provisioning remote compute resources.
Local compute is appropriate when you do development and light training locally and when you want to avoid paying for managed VM or cluster time. It works well for single node, non distributed training and for quick iterations on small datasets.
Compute cluster is not ideal because clusters are meant for scalable and distributed training and they incur cloud costs due to provisioned nodes, so they are overkill for a small local workload.
Inference cluster is not correct because inference clusters are used for serving deployed models and real time or batch predictions rather than for training experiments.
Compute instance is not the best choice here because a compute instance is a cloud hosted development VM and it will generate charges while it is running, which conflicts with Lena’s desire to avoid paying for a cloud virtual machine.
When a question emphasizes avoiding cloud VM costs and the workload is small, prefer a local development or local compute option for training if that is allowed by the scenario.
Fill in the blank in the following sentence in the context of machine learning at a fictional cloud provider called Contoso Cloud. [__] is a type of machine learning where you train a model to predict which category or class an item belongs to. For instance a neighborhood clinic could use patient measurements such as height weight blood pressure and fasting glucose to decide whether a patient is diabetic. Which word or words correctly fill the blank?
-
✓ E. Classification
The correct answer is Classification.
Classification refers to supervised machine learning tasks where the model predicts a discrete category or class for each input. For example a neighborhood clinic can use patient measurements such as height weight blood pressure and fasting glucose to predict whether a patient is diabetic which is a binary class label. Classification models often return class probabilities and then a final class label is chosen by applying a threshold or selecting the highest probability.
Regression is incorrect because regression predicts continuous numeric values such as a blood glucose reading rather than assigning an input to a category.
Statistics is incorrect because it is a broad discipline that underpins data analysis and machine learning rather than a task of predicting categories.
BigQuery is incorrect because it is a cloud data warehousing product and not a type of machine learning task.
Probability is incorrect because it is a concept used within models and not the name of a prediction task that assigns class labels.
Look for words like category or class to identify classification questions. If the problem asks for a numeric value then it is likely a regression problem.
A fintech startup called NovaInsights needs to serve model predictions to its mobile clients with very low latency for live decision making. Which Azure service should the team use to host the trained model so it can deliver real time predictions with minimal delay?
-
✓ D. Azure Kubernetes Service (AKS)
The correct option is Azure Kubernetes Service (AKS).
Azure Kubernetes Service (AKS) is well suited for hosting trained models that must deliver real time predictions with minimal delay because it runs containerized inference servers on dedicated nodes and gives you control over CPU, memory, and GPU resources. It supports horizontal scaling and load balancing so you can maintain low latency under varying load and it integrates easily with common model servers such as TensorFlow Serving and TorchServe.
Azure Functions is a serverless option that can be useful for lightweight or sporadic workloads but it often has cold starts and resource limits that make it a poor fit for heavy models and strict low latency requirements.
Azure App Service is designed for hosting web apps and APIs and it does not provide the same level of container orchestration, pod level autoscaling, or GPU access that you need for production grade, low latency model serving.
Azure Machine Learning Studio is focused on building, training, and experimenting with models and while Azure Machine Learning can orchestrate deployments it typically deploys production real time endpoints onto Azure Kubernetes Service (AKS). The Studio interface itself is not the standalone runtime that guarantees the minimal delay required for live decision making.
When a question asks for minimal latency for real time inference favor container orchestration on dedicated nodes with GPU support and autoscaling rather than generic web hosting or serverless options that can introduce cold starts.
Which compute configuration is most suitable for training an image recognition model for a small robotics firm called NovaBots?
-
✓ C. Managed GPU cluster with medium memory and 20 nodes
The correct answer is Managed GPU cluster with medium memory and 20 nodes.
This option is best because GPU acceleration is well suited to training convolutional neural networks used in image recognition and a managed cluster provides scaling and operational simplicity. Using a medium memory configuration across multiple nodes lets the team train on reasonable batch sizes and distribute work to reduce wall clock time without taking on the extra complexity of custom distributed infrastructure.
TPU cluster with high memory and 20 nodes is not the best fit because TPUs are typically optimized for very large scale workloads and specific model types and they can add cost and operational constraints that are unnecessary for a small robotics firm. TPUs may also require additional engineering to adapt some training pipelines.
Single compute instance with low memory and 3 CPU cores is insufficient because CPUs with low memory will be too slow and may run out of memory for typical image datasets and modern deep learning models. Training on that instance would be impractically long.
Single compute instance with high memory and 3 CPU cores is also not suitable because high memory does not replace the parallel compute capabilities of GPUs. CPU only training will be much slower and will limit iteration speed during model development.
When choosing compute for model training match the compute type to the workload and think about scale as well. For image recognition prioritize GPU or TPU and prefer managed clusters when you want easier scaling and operations.
Scenario: Meridian Insights is assisting a retail client with their Microsoft Azure data platform and they plan to apply K-means clustering for customer segmentation. The analytics team must define valid stopping rules for the K-means routine. Which of the following conditions can be used to stop the K-means algorithm? (Choose 3)
-
✓ B. A fixed number of iterations is reached
-
✓ D. Centroid positions remain unchanged between updates
-
✓ E. The residual sum of squares drops below a preset threshold
A fixed number of iterations is reached, Centroid positions remain unchanged between updates and The residual sum of squares drops below a preset threshold are correct stopping rules for the K-means algorithm.
A fixed number of iterations is reached is a practical termination condition that bounds runtime and guarantees the routine will stop even if perfect convergence is slow or not achieved. It is commonly used as a fallback to limit compute resources.
Centroid positions remain unchanged between updates is the classical convergence criterion for K-means because the algorithm alternates assignments and centroid updates and it has converged when centroids no longer move. Stable centroids imply assignments will not change further and the objective has been reached.
The residual sum of squares drops below a preset threshold monitors the K-means objective which is the sum of squared distances within clusters and stopping when this error is sufficiently small ensures a desired level of cluster compactness. This checks for adequate improvement in the objective rather than arbitrary conditions.
Vertex AI is incorrect because it is the name of a cloud ML service and not a termination condition for the algorithm. It does not describe a stopping rule for K-means.
The average distance among members of clusters increases beyond a threshold is incorrect because K-means aims to decrease within cluster distances and an increase would indicate a worsening fit or instability rather than a valid convergence signal. Stopping rules typically watch for decreases or stabilization rather than increases.
When choosing stopping rules look for options that signal convergence or a change in the algorithm objective function and avoid answers that name products or describe conditions that imply a worsening fit.
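A short scikit-learn sketch showing how two of these stopping rules surface as parameters; the data is randomly generated for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))

# max_iter caps the number of iterations and tol stops the loop when centroids
# barely move, which corresponds to the objective no longer improving.
kmeans = KMeans(n_clusters=4, max_iter=300, tol=1e-4, n_init=10, random_state=0).fit(X)
print(f"Stopped after {kmeans.n_iter_} iterations, inertia={kmeans.inertia_:.2f}")
```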
A data science team at Meridian Retail Analytics uses Orion Machine Learning to refine a demand forecasting model and they need to tune model hyperparameters. Which approach will be most effective for automated hyperparameter optimization?
-
✓ C. Orion AutoML service
The correct answer is Orion AutoML service.
Orion AutoML service is the most effective choice because it is a managed, automated hyperparameter optimization solution that tightly integrates tuning with model training and evaluation. The service can orchestrate parallel trials, apply efficient search strategies under the hood, and pick the best configuration based on validation metrics, so it reduces manual effort and scales well for production workflows.
Grid Search is incorrect because it exhaustively evaluates combinations and quickly becomes infeasible for large or high dimensional search spaces, making it an inefficient choice for automated tuning.
Random Search is incorrect because while it can be simple and sometimes effective, it is not a managed, end to end service and it lacks the adaptive efficiency and orchestration that AutoML systems provide.
Bayesian Optimization is incorrect in this context because it describes an optimization technique rather than a standalone managed service, and teams typically use it as one algorithm inside an AutoML system rather than as the full automated solution itself.
When a question asks for automated and scalable hyperparameter tuning favor a named, managed AutoML service and emphasize its managed and end to end capabilities in your reasoning.
Meridian Insights is the analytics group at Horizon Tech and it is led by Lena Park and Omar Reyes. They adopted Azure Machine Learning and hired you as a consultant to oversee key projects. Lena is training a classification model and she wants to measure how much each feature influenced a single prediction. Which of the following should she examine?
-
✓ C. Local feature attributions
The correct answer is Local feature attributions.
Local feature attributions provide per-feature contribution scores for a single prediction and they show how much each input influenced that specific output. They are implemented by techniques such as SHAP and LIME which assign positive or negative contributions for individual instances and they are the right choice when Lena wants to explain why one example was classified a certain way.
Permutation feature importance measures how much the model performance drops when a feature is shuffled across the dataset and it therefore reflects global importance across many examples rather than the influence on a single prediction.
Dataset level feature importance explicitly describes global importance for the dataset and it does not tell you how features affected one particular prediction.
Recall and accuracy metrics are aggregate performance metrics that describe how well the classifier performs overall and they do not provide per-feature explanations for an individual prediction.
Precision and accuracy metrics are also overall model metrics and they do not indicate the contribution of each feature to a single output.
Carefully note whether the question asks about a single prediction or overall model behavior. If it asks about one prediction then look for methods that provide local explanations such as SHAP or LIME rather than aggregate metrics or dataset level importance.
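A minimal sketch with the shap library, which is one way to produce local attributions; the model and dataset here are illustrative only.

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Illustrative model and data standing in for Lena's classifier.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
local_attributions = explainer.shap_values(X.iloc[[0]])  # per-feature contributions for one prediction
print(local_attributions)
```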
A data scientist at Meridian Insights is preparing a dataset for a predictive model and finds many records with missing fields. When using the Clean Missing Data component in Contoso Machine Learning Studio which choice will remove entire records that contain null values?
-
✓ C. Delete rows that contain any missing value
The correct choice is Delete rows that contain any missing value.
Choosing Delete rows that contain any missing value tells the Clean Missing Data component to remove entire records that contain nulls so any row with a missing entry is dropped from the dataset and only complete rows remain for modeling.
Replace missing numeric entries with the column mean is an imputation option that fills numeric gaps with the column average and it does not remove rows.
Impute missing values using hot deck imputation also fills missing entries by borrowing values from similar records and it preserves rows instead of deleting them.
Drop the entire feature column removes a whole column that has missing values rather than removing the records that contain nulls, so it is not the option that deletes rows.
Substitute a custom placeholder for missing entries replaces missing values with a specified token and it keeps the records intact rather than removing them.
Replace missing categorical values with the most frequent category imputes categorical gaps by using the mode and it does not remove any records.
Read the action word in the option carefully and watch for terms like delete or drop versus replace or impute to determine whether rows are removed or values are filled.
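The equivalent row removal in pandas looks like this minimal sketch, with placeholder column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, np.nan, 29],
    "income": [72000, 51000, np.nan],
})
complete_rows = df.dropna(how="any")  # drop every record that has a missing field
print(complete_rows)                  # only the fully populated row remains
```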
Elena Cruz recently joined Crescent Analytics to lead a pilot machine learning program that will classify organizational units and the team is using Azure Machine Learning Designer to build the model and Elena must choose the most appropriate evaluation metric for a classification task. Which metric should she select to measure the model’s ability to distinguish between different classes?
-
✓ C. Area Under ROC Curve (AUC)
The correct option is Area Under ROC Curve (AUC).
Area Under ROC Curve (AUC) measures how well a classifier ranks positive instances above negative ones by summarizing the ROC curve that plots true positive rate against false positive rate across all thresholds. It returns a value between 0 and 1 where higher values indicate better separability and it is threshold independent so it directly captures the model’s ability to distinguish between classes.
Area Under ROC Curve (AUC) is often preferred when classes are imbalanced or when you care about ranking and discrimination rather than a single decision threshold. Azure Machine Learning Designer reports AUC as a standard classification metric for these reasons.
Log Loss measures the negative log likelihood of predicted probabilities and penalizes confident incorrect predictions. It evaluates probability calibration and average prediction uncertainty rather than the model’s ranking or separability, so it does not directly answer how well the model distinguishes classes.
R Squared is a regression metric that indicates the proportion of variance explained by a model. It applies to continuous targets and is not appropriate for evaluating classification separability.
Mean Absolute Error MAE is also a regression metric that measures average absolute differences between predicted and actual numeric values. It does not measure class discrimination or ranking and so it is not suitable for this classification question.
When a question asks about a metric that measures a model’s ability to separate classes focus on metrics that are threshold independent and favor AUC for ranking and discrimination. Verify whether the task is classification or regression before choosing metrics like MAE or R Squared.
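A short scikit-learn sketch that computes AUC from predicted scores, which is what makes the metric threshold independent; the values are illustrative.

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]  # predicted probabilities, not hard labels
print(f"AUC = {roc_auc_score(y_true, y_scores):.3f}")
```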
Scenario: Meridian Data, a privately held analytics firm led by CEO Elena Cross, has a valuation above thirty two million dollars and was founded after the Meridian Foundation. Elena has asked for help because her engineering team is adopting Azure Databricks within Microsoft Azure Machine Learning for model development. During a workshop the team is debating how to show the files stored in DBFS inside a Databricks notebook. Several solutions have been proposed and only one will correctly list the files in DBFS. Which approach will list files in DBFS from a Databricks notebook?
-
✓ C. %fs ls /mnt/data-files
The correct option is %fs ls /mnt/data-files.
This command uses the Databricks notebook file system magic and the ls subcommand to list the contents of the DBFS mount point. The %fs magic is the supported notebook directive for interacting with DBFS and ls is the correct subcommand to display files and directories.
ls %fs /mnt/data-files is incorrect because the magic and the subcommand are reversed. The percent sign must prefix the magic name, so the correct form places %fs first.
%fs dir /mnt/data-files is incorrect because dir is not a valid %fs subcommand for listing DBFS contents. The supported listing operation is ls rather than dir.
ls /mnt/data-files is incorrect because that is a plain shell or language level command and it does not use the Databricks %fs magic to access DBFS directly from a notebook cell.
Remember that Databricks notebook magic commands start with a percent sign and require the exact syntax. For listing DBFS content use %fs ls followed by the path.
A data science team at Summit Analytics is creating an experiment in Azure Machine Learning Studio and needs to partition a dataset into training and holdout subsets. Which module should they use to perform that split?
-
✓ B. Split Data module
The correct option is Split Data module.
The Split Data module is designed to partition a dataset into two subsets and it lets you specify the fraction or the number of rows to allocate to each subset. You can choose random sampling and set a random seed for reproducibility and you can also use stratified sampling when you need to preserve class distributions between the training and holdout sets.
The Group Data into Bins module is intended to convert continuous numeric variables into discrete bins for feature engineering. It does not split the dataset into separate training and holdout sets.
The Clip Values module is used to cap or floor numeric values to handle outliers or enforce limits. It modifies values rather than partitioning the data so it will not create training and holdout subsets.
The Group Categorical Values module is used to combine or regroup categories in a categorical feature to reduce cardinality. It is a transformation tool and it does not perform dataset splitting.
When you need to create training and holdout sets look for modules that explicitly mention split or partition and confirm the available sampling options and the ability to set a random seed for reproducible results.
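Outside the Designer canvas the same split can be sketched with scikit-learn; the fraction, seed, and stratification mirror the module's options and the data is illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

train_df, holdout_df = train_test_split(
    df,
    test_size=0.3,         # fraction of rows for the holdout set
    random_state=42,       # seed for reproducibility
    stratify=df["label"],  # preserve class distribution in both subsets
)
print(len(train_df), len(holdout_df))
```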
Meridian Recruiting LLC is a search firm based in Chicago and it is led by CEO Aisha Grant. The firm stores candidate records as TSV files in an Azure Blob container that is registered as a datastore in an Azure Machine Learning workspace. A data engineer merged all TSV files and registered the combined dataset under the name candidateSet_5 using the Azure Machine Learning Python SDK. The engineer asks whether candidateSet_5 can be converted into a Pandas DataFrame by calling python candidateSet_5.to_pandas_dataframe(). What should you tell them?
-
✓ B. Yes, candidateSet_5 can be converted into a Pandas DataFrame using to_pandas_dataframe() if the dataset was correctly registered as a tabular dataset
The correct answer is: Yes, candidateSet_5 can be converted into a Pandas DataFrame using to_pandas_dataframe() if the dataset was correctly registered as a tabular dataset.
This is correct because the Azure Machine Learning Python SDK provides a tabular dataset type that implements a to_pandas_dataframe() method. If candidateSet_5 was registered as a TabularDataset when the engineer merged the TSV files then calling to_pandas_dataframe() will return a pandas DataFrame containing the combined rows and columns.
You must confirm that the dataset was created or registered using the tabular dataset creation APIs so that the service recognized the files as a table. If the registration preserved the schema and column parsing for the TSV files then the SDK will parse the text into columns and load it into pandas on demand.
No, candidateSet_5 cannot be converted into a Pandas DataFrame using to_pandas_dataframe() even if it was defined as tabular is incorrect because the method is available for datasets registered as tabular. The only time to_pandas_dataframe() would not work is when the dataset was registered as a file style dataset or when the data was not parsed into a tabular schema, in which case you would need to read the raw TSV files with pandas.read_csv and a tab separator.
When you see questions about converting datasets to pandas remember to check whether the dataset was registered as a TabularDataset. Only tabular datasets expose to_pandas_dataframe() in the Azure Machine Learning Python SDK.
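A minimal sketch of how such a tabular dataset could be created from TSV files and loaded into pandas; the datastore name and path pattern are placeholders.

```python
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.datastores["candidate_blob_store"]  # placeholder datastore name

# Parse the TSV files into a tabular schema, register the result, then load it.
candidate_set = Dataset.Tabular.from_delimited_files(
    path=(datastore, "candidates/*.tsv"),  # placeholder path pattern
    separator="\t",
)
candidate_set = candidate_set.register(ws, name="candidateSet_5", create_new_version=True)
df = candidate_set.to_pandas_dataframe()
```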
Within Microsoft Azure machine learning projects statistics and mathematics form the foundation and it is important to understand the technical vocabulary used by statisticians mathematicians and data scientists. The difference between a predicted label and the observed label can be treated as a measure of error yet observed values come from sampled observations that can show random variation. To make explicit the comparison between a predicted value written as “y-hat” and an observed value y we call the difference between them the . We can then aggregate the across all validation predictions to compute the model loss as a measure of predictive performance. Which word or words correctly complete the sentence?
-
✓ D. Residuals
The correct option is Residuals.
Residuals are the per observation differences between the observed value y and the predicted value written as y hat. They represent the error for each prediction and you can aggregate the residuals across a validation set by squaring them and then taking a mean or root mean square to compute loss and performance metrics.
Root Mean Squared Error is not correct because it is an aggregate metric derived from the residuals rather than the single observation difference that fills the blank.
Coefficient of determination is not correct because it refers to R squared which summarizes explained variance across the dataset and does not denote the per prediction difference between y and y hat.
Random sampling variance is not correct because it describes variability due to sampling and not the individual prediction error measured by a residual.
When asked about the difference between a prediction and the observed value look for terms that mean per-observation error and avoid options that name aggregated metrics.
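Stated compactly, the residual for observation i and two common aggregations built from it:

```latex
e_i = y_i - \hat{y}_i
\qquad
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} e_i^{2}
\qquad
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}
```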
Scenario: Nordal Materials a manufacturing group based in Stockholm runs production sites across Europe and Asia. An engineer named Priya will train a machine learning model on an Azure virtual machine that already has the needed libraries and tooling installed. The dataset is small to moderate in size and dynamic scaling is not necessary. Which compute targets should Priya choose to best fit these conditions? (Choose 2)
-
✓ C. Attached compute
-
✓ E. Local compute
The correct options are Attached compute and Local compute.
Attached compute is appropriate because it lets you use an existing Azure virtual machine or other external VM as a compute target in your workspace. It avoids recreating environments when the VM already has the needed libraries and it is suitable when the dataset is small to moderate and you do not need autoscaling.
Local compute is also appropriate because it runs training on the machine where the job is launched and it is ideal for interactive development, debugging, and short or small scale training runs on a preconfigured VM.
Compute clusters are managed, autoscaling clusters intended for distributed or large scale training. They add overhead and complexity that is unnecessary when you have a single preconfigured VM and no need for dynamic scaling.
Serverless compute provides managed execution and automatic scaling and it may not allow using a specific preinstalled VM environment. It is not the best fit when you want to run training directly on an existing VM with custom tooling.
Inference clusters on Azure Kubernetes Service are designed for serving deployed models and inference workloads rather than for training. They are not the correct compute target for Priya’s training job.
When a VM already has the required tools and you do not need autoscaling choose attached compute or local compute for training and reserve cluster or serverless options for scalable or managed scenarios.
A boutique firm named Meridian Analytics must build a model to forecast equity prices using data stored in a PostgreSQL server and a workload that needs GPU acceleration. You need to provision a virtual machine image that arrives with common machine learning frameworks and GPU drivers already installed. Which virtual machine image best fits this requirement?
-
✓ B. Deep Learning Virtual Machine Linux edition
Deep Learning Virtual Machine Linux edition is the correct option.
Deep Learning Virtual Machine Linux edition arrives with common machine learning frameworks like TensorFlow and PyTorch and with NVIDIA GPU drivers and CUDA libraries preinstalled so it is ready for GPU accelerated model training out of the box. The Linux edition is the standard choice for GPU workloads because driver support and compatibility for many open source ML tools are best on Linux.
The image also provides optimized binaries and package management that let the team start training quickly without spending time on low level driver and library installation. That makes it the best fit when you need a virtual machine image that already includes GPU drivers and popular ML frameworks.
Deep Learning Virtual Machine Windows edition is not the best answer because the Linux edition is the preferred and more widely supported environment for GPU accelerated machine learning and many ML tools and drivers have stronger support on Linux.
Data Science Virtual Machine Windows edition is incorrect because the general purpose Data Science Virtual Machine on Windows is not the image purpose built for GPU accelerated deep learning, and the requirement here calls for an image that ships ready to train with GPU drivers and the common deep learning frameworks.
Deep Learning VM image on Google Compute Engine is incorrect because it names a Google Cloud offering rather than the Azure virtual machine image this scenario calls for, and the specific choice that meets the requirement is the Deep Learning Virtual Machine Linux edition.
When a question asks for a prebuilt image with GPU drivers and ML frameworks choose the Linux Deep Learning VM by default because Linux Deep Learning VMs include CUDA and common frameworks and they are the standard for GPU accelerated training.
Riverside Analytics was founded by Elena Morales and is currently worth more than forty million dollars. Elena created the Riverside Foundation and she serves as both president and board chair. She has asked you to advise her IT group as they deploy Microsoft Azure Machine Learning. In a planning meeting the team intends to spin up a new cluster inside an Azure Databricks workspace and they want to know what happens behind the scenes when a cluster is created. Which of the following accurately describes the actions that occur when Azure Databricks provisions a new cluster?
-
✓ B. Azure Databricks deploys a managed appliance into your subscription and then launches driver and worker virtual machines in the appliance using the VM sizes you choose
Azure Databricks deploys a managed appliance into your subscription and then launches driver and worker virtual machines in the appliance using the VM sizes you choose is correct.
When you create a cluster Azure Databricks provisions managed resources in your subscription and then launches a driver node and one or more worker nodes using the VM sizes you selected. The driver coordinates notebook execution and job orchestration while workers run the parallel tasks. Clusters are therefore created as VMs inside the managed appliance rather than being taken from a preallocated pool or reduced to a single machine.
When a workspace is created you are given a reserved group of virtual machines and clusters reuse machines from that reserved group is incorrect because Databricks does not reserve a fixed set of VMs at workspace creation. Clusters are provisioned on demand in the managed resource group and are not simply reused from a preallocated pool for the workspace.
The platform provisions a single dedicated virtual machine to execute all notebooks and jobs for the workspace is incorrect because workloads run on driver and worker nodes and require isolation and scaling. Even single node clusters still have a driver and do not represent a single VM that handles all workspace activity.
A serverless style compute pool is automatically created and Azure Databricks draws scaled resources from that pool for interactive workloads is incorrect because serverless compute is a distinct offering and it is not implicitly created for every workspace. Interactive clusters are normally provisioned as VMs and serverless endpoints are a separate managed feature.
On the exam remember that Azure Databricks provisions managed resources into your subscription and launches driver and worker VMs per cluster rather than using a reserved VM pool or a single VM for the whole workspace.
Maria Rivera works at Nova Labs and has trained a model using Azure Machine Learning. She needs a deployment that runs every 24 hours to process an input file and save prediction outputs to a designated Azure Blob Storage container. Which deployment approach should she choose?
-
✓ C. Batch endpoint
Batch endpoint is the correct choice.
Batch endpoint is built for offline and large scale inference where you process files or datasets rather than single low latency requests. A Batch endpoint runs the model as a job on managed compute and can read input files and write prediction outputs directly to an Azure Blob Storage container. You can schedule that job to run every 24 hours by using Azure Machine Learning pipelines or an external scheduler and the Batch endpoint will handle the orchestration and scaling for the periodic processing.
Azure Functions are serverless and suited to small event driven tasks and quick API style work. They have execution time limits and require more manual handling of large model binaries and scaling for heavy inference, so they are not ideal for scheduled bulk file processing.
Online endpoint is intended for low latency real time predictions where clients send individual requests and expect immediate responses. It is not optimized for processing large input files or producing batch output files to blob storage on a schedule.
Azure Kubernetes Service can host custom inference services and provide control over scaling and deployment, but it requires you to manage the cluster and orchestration. That makes it more operational overhead compared with using a managed Batch endpoint that is designed for scheduled batch jobs.
Pay attention to words like every 24 hours and input file. Those indicate a batch processing requirement which points to a batch endpoint rather than an online endpoint or serverless function.
While preparing a customer ledger for BrightData Labs you must drop repeated rows and keep only the final occurrence of each duplicate entry. Which pandas DataFrame method should you call to obtain a DataFrame that preserves only the last instance of every duplicate?
-
✓ C. drop_duplicates(keep=’last’)
The correct option is drop_duplicates(keep=’last’).
The DataFrame method drop_duplicates(keep=’last’) removes duplicate rows and preserves only the final occurrence of each set of duplicates when you set the keep parameter to ‘last’. You can return a new DataFrame or apply the operation in place and this behavior directly matches the requirement to keep only the last instance of every duplicate.
duplicated(keep=’last’) is incorrect because duplicated returns a boolean mask that marks which rows are duplicates and it does not itself remove rows. You would need to apply that mask with additional indexing to obtain a DataFrame without earlier duplicates.
dropdupes(retain=’last’) is incorrect because there is no such pandas method. The correct pandas method name is drop_duplicates and the parameter names differ from the ones shown.
drop_duplicates() is incorrect because calling drop_duplicates without arguments uses the default keep behavior which is ‘first’. That default will preserve the first occurrence and drop later ones which is the opposite of keeping only the final occurrence.
When you need to keep the final occurrence of duplicates remember to set keep=’last’ on drop_duplicates. Also recall that duplicated returns a boolean mask and does not drop rows by itself.
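A tiny pandas sketch on an illustrative ledger.

```python
import pandas as pd

ledger = pd.DataFrame({
    "customer": ["A", "A", "B", "B", "B"],
    "amount": [100, 100, 50, 50, 50],
})
deduped = ledger.drop_duplicates(keep="last")
print(deduped.index.tolist())  # earlier duplicates are dropped, the last occurrences remain
```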
Fairlearn is a Python library used to examine disparities in model predictions across protected attributes and it integrates with Acme Machine Learning Studio so engineers can run experiments and upload dashboard metrics to an Acme workspace. The selection of a parity constraint depends on the mitigation method and the fairness goal. Which Fairlearn parity constraint is described by the following sentence? “This constraint can be applied with any mitigation algorithm to reduce differences among protected groups and in a binary classification setting it ensures each group has a similar proportion of false positive predictions”?
-
✓ B. False positive rate parity
The correct answer is False positive rate parity.
False positive rate parity specifically targets the rate at which negative instances are incorrectly labeled as positive and it seeks to ensure that this false positive rate is similar across protected groups. This constraint can be applied with many mitigation algorithms to reduce differences among groups and in a binary classification setting it ensures each group has a similar proportion of false positive predictions.
Equalized odds is not correct because it requires both false positive rates and true positive rates to be similar across groups rather than only the false positive rate.
Demographic parity is not correct because it enforces equal overall positive prediction rates across groups regardless of the true label and it does not specifically ensure similar false positive rates.
Error rate parity is not correct because it refers to matching the overall misclassification rate across groups and it mixes false positives and false negatives instead of focusing solely on false positives.
True positive rate parity is not correct because it targets equal sensitivity or the rate of correctly predicted positives across groups rather than the false positive frequency.
Bounded group loss is not correct because it constrains or bounds group losses or risks and it does not specifically describe equalizing false positive rates across groups.
When a question mentions matching the rate of incorrect positive predictions look for the phrase False positive rate or FPR and contrast it with overall positive rates or overall error rates.
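A minimal sketch with Fairlearn's MetricFrame that compares false positive rates across groups; the labels, predictions, and group values are illustrative.

```python
from fairlearn.metrics import MetricFrame, false_positive_rate

y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

frame = MetricFrame(
    metrics=false_positive_rate,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)
print(frame.by_group)  # per-group false positive rates to compare for parity
```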
Aquila Analytics was founded by Sarah Patel and now runs a large network of industrial sensors and the data science team is tuning hyperparameters with Azure Machine Learning. Which distribution type should Sarah choose to support discrete hyperparameters?
-
✓ C. QNormal
The correct option is QNormal.
A QNormal distribution draws from a normal distribution and then quantizes the sampled values to a fixed step size so the outputs are discrete. This makes it suitable when you need integer or stepped numeric hyperparameters while keeping the underlying normal sampling behavior.
Uniform is not correct because a plain uniform distribution describes continuous sampling across a range and does not by itself provide quantized or stepped discrete values.
Categorical is not correct in this case because it represents a set of unordered distinct choices rather than a quantized numeric range. Categorical is appropriate for named options or labels and not for discretized numeric hyperparameters that follow a statistical distribution.
LogNormal is not correct because it is a continuous skewed distribution that produces positive real values and does not provide inherent quantization for discrete parameter values.
When a question mentions discrete or integer hyperparameters look for quantized distributions such as q* variants, and reserve categorical for unordered label choices.
At DataForge Analytics we want to build a model that estimates a numeric target using past input features and their observed numeric outcomes. What type of model meets this requirement?
-
✓ F. Regression model
The correct option is Regression model.
A Regression model is the appropriate choice because the task describes predicting a continuous numeric target from past input features and observed numeric outcomes. Regression models are the supervised learning models designed to estimate numeric values rather than discrete categories.
A Regression model can use algorithms such as linear regression, decision tree regression, or more advanced methods to learn the relationship between inputs and a continuous response so you can make numeric predictions on new data.
Multinomial classification model predicts which discrete category out of three or more possible classes applies and it is not used for predicting continuous numeric values.
Vertex AI is a Google Cloud platform for training and deploying models and it is not a type of predictive model itself, so it does not answer the question about which model type to use.
Ordinal classification model predicts ordered discrete categories rather than a continuous numeric target, which means it does not meet the requirement to estimate numeric outcomes.
Log linear regression model refers to a specific transformed or specialized regression formulation in some contexts, but the question asks for the general model type and the expected answer is the broader Regression model, not a specialized variant.
Binomial classification model predicts one of two discrete classes and it is therefore not appropriate for estimating continuous numeric targets.
When a question asks you to predict a numeric value think regression and rule out classification options or product names.
A data science team at Harbor Analytics is comparing visualization methods to evaluate a new binary classifier and they want a chart that highlights model precision during assessment. Which visualization should they use?
-
✓ D. A binary classification confusion matrix visualization
The correct option is A binary classification confusion matrix visualization.
A binary classification confusion matrix visualization presents the counts of true positives, false positives, true negatives, and false negatives, so you can directly compute precision as true positives divided by the sum of true positives and false positives. Viewing the confusion matrix makes the contributions to precision explicit for a chosen decision threshold, and it therefore highlights precision during assessment.
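To make the calculation concrete, here is a short sketch that derives precision from binary confusion counts with scikit-learn; the labels are illustrative.

```python
# Deriving precision from a binary confusion matrix with scikit-learn
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)   # positive predictive value at this threshold
print(tn, fp, fn, tp, round(precision, 3))
```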
A Receiver Operating Characteristic curve visualization plots true positive rate against false positive rate and it focuses on discrimination and ranking rather than the positive predictive value so it does not directly show precision.
A precision recall curve visualization does show precision versus recall across thresholds, and it is useful for examining precision behavior across decision thresholds. However, it emphasizes tradeoffs across thresholds rather than highlighting the precision value at a specific threshold, which is what a confusion matrix makes explicit.
A box plot visualization of prediction scores displays the distribution of predicted probabilities or scores and it can help assess separation between classes but it does not provide the confusion counts needed to calculate precision directly so it is not the best choice to highlight precision.
When a question asks which chart highlights precision look for visualizations that expose true positive and false positive counts such as a confusion matrix for a chosen threshold.
A high end supper club named The Velvet Parlor operates as a front for a syndicate and it hired you to advise on machine learning processes. The proprietor Lucien Vale and his staff are training a model with Microsoft Azure Machine Learning Studio and their dataset contains rows with null entries so they plan to use the Clean Missing Data module to detect and address missing values. Which module parameter should they choose to properly handle rows that include missing values?
-
✓ C. Delete rows containing missing values
Delete rows containing missing values is correct because the Clean Missing Data module can be configured to remove any record that has one or more missing entries and that parameter explicitly drops those rows.
This choice discards entire observations with nulls and is appropriate when missingness is rare or when removing incomplete rows will not introduce bias or remove too much data. The Clean Missing Data module also offers imputation and column removal options, but the delete rows setting directly addresses rows that include missing values by removing them from the dataset.
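Outside the Designer, the same row dropping behavior can be sketched in pandas for comparison; the column names and values below are made up.

```python
# Equivalent row-dropping behavior in pandas; column names are hypothetical
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, None, 52],
    "income": [72000, 58000, np.nan],
    "churned": [0, 1, 0],
})

# Drop any record that has one or more missing entries
cleaned = df.dropna(axis=0, how="any")
print(cleaned)
```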
Substitute missing values with the most frequent value is an imputation option that fills nulls with the mode of the feature and does not remove rows, so it does not meet the requirement to delete rows that contain missing entries.
Drop the entire feature column that contains missing values removes whole columns rather than rows, and that can discard useful information across all records, so it is not the correct parameter when the goal is to handle rows with missing values.
Provide a custom replacement value for missing entries is another imputation strategy that replaces nulls with a user supplied value and therefore preserves rows instead of deleting them, so it is not the parameter that removes rows with missing data.
Replace missing values with the mean of the column computes and substitutes the column mean for nulls and keeps the rows, so it does not fulfill the requirement to delete rows containing missing values.
Substitute missing values with the median of the column is similar because it imputes nulls with the median and retains the records, so it is not the correct choice when the desired action is to drop rows with missing entries.
Read the question carefully to see whether it asks to remove records or to impute missing values. If the exam asks about handling rows with nulls look for wording like delete rows or remove records rather than imputation terms such as mean, median, or most frequent value.
Maya Lopez is a data science consultant at Nexa Insights and she must advise the operations team on a low or no code platform that allows business analysts to train machine learning models without writing code. Which solution should Maya recommend to Nexa Insights?
-
✓ D. Azure Automated Machine Learning
Azure Automated Machine Learning is the correct choice.
Azure Automated Machine Learning provides a guided, low and no code experience that lets business analysts train, evaluate, and deploy supervised learning models without writing code. It automates model selection, feature handling, and hyperparameter tuning and it integrates with Azure Machine Learning Studio so teams can operationalize models with minimal developer involvement.
Vertex AI AutoML is a Google Cloud service that also offers AutoML capabilities but it is not the Azure service referenced in this question and it would not be the recommended Azure solution.
Jupyter Notebooks in Azure Machine Learning Studio provide a code first environment where analysts write Python or R and they are not a low or no code option. Notebooks require programming knowledge so they do not meet the requirement for business analysts who must avoid coding.
Azure CLI v2 is a command line tool that requires scripting and command knowledge and it is not a graphical, no code platform for training models.
When a question asks for a low or no code solution look for services that advertise guided or automated machine learning rather than notebooks or CLI tools.
Dr Elena Rivers at Beacon Data Lab is retraining a production machine learning model in Vertex AI to keep its predictions accurate and relevant. Which primary action should she prioritize to ensure the updated model will not reduce performance when it replaces the current model?
-
✓ B. Compare outputs of the new model with outputs of the existing model
The correct option is Compare outputs of the new model with outputs of the existing model.
Compare outputs of the new model with outputs of the existing model is the primary action because it directly verifies that the retrained model does not introduce prediction regressions. Running the same inputs through both models and comparing predictions and evaluation metrics on identical datasets lets you detect degraded accuracy, drift, or changed behavior before any replacement occurs.
You can perform these comparisons as offline holdout evaluations, as shadow tests where the candidate model runs alongside production traffic without serving responses, or as canary deployments that route a small percentage of traffic to the new model and compare live metrics. These steps reduce risk and provide evidence that the updated model maintains or improves performance.
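A minimal offline comparison could look like the sketch below, which scores a placeholder production model and a candidate model on the same holdout set before any replacement decision.

```python
# Offline comparison sketch: score the current and candidate models on the
# same holdout data before deciding whether to replace the production model
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

production = LogisticRegression(max_iter=1000).fit(X_train, y_train)
candidate = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

for name, model in [("production", production), ("candidate", candidate)]:
    pred = model.predict(X_hold)
    proba = model.predict_proba(X_hold)[:, 1]
    print(name, accuracy_score(y_hold, pred), roc_auc_score(y_hold, proba))
```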
Automate data preprocessing with a reproducible pipeline is important for consistency and reproducibility but it does not by itself confirm that model predictions will remain as good or better than the existing model. Preprocessing alone will not reveal prediction regressions.
Enable Vertex AI continuous evaluation and model monitoring is useful for detecting issues after a model is deployed to production. It is not the primary pre-deployment action to ensure a replacement will not reduce performance because monitoring detects problems after deployment rather than preventing regressions before replacement.
Replace the existing model whenever a predefined replacement rule is met is risky if you do not first validate the candidate by comparing outputs or running safe rollout tests. Automated replacement without direct comparison or staged testing can unintentionally reduce production performance.
Before replacing a production model run a shadow test or offline comparison using the same inputs and metrics as production to catch regressions early.
BrightMart data scientists must deploy a trained model to serve live predictions for their online storefront. The deployment must provide very low latency and maintain high request throughput. Which deployment setup should they select for optimal online serving?
-
✓ D. Google Kubernetes Engine with autoscaling
Google Kubernetes Engine with autoscaling is the correct choice for very low latency and high throughput online model serving.
Google Kubernetes Engine with autoscaling lets you run long running containerized model servers with precise resource requests and limits and it supports Horizontal Pod Autoscaling and the Cluster Autoscaler so you can scale pods and nodes automatically to meet demand while keeping latency low.
Google Kubernetes Engine with autoscaling also provides fine control over load balancing, health checks, and rolling updates so you can tune deployment topology and reduce tail latency for production traffic.
Cloud Functions is not ideal because serverless functions can experience cold starts and have execution environment limits that make them less suited for very low latency sustained high throughput model serving.
Compute Engine with GPUs can deliver low latency but it requires more manual management of instances and autoscaling and GPUs add cost and operational complexity that are often unnecessary for CPU based inference or when you need rapid horizontal scaling.
Cloud Run provides simple container deployment and autoscaling but it gives you less low level control over scaling behavior and networking than GKE and it can be subject to cold starts and concurrency or scaling characteristics that make it less optimal for the absolute lowest latency at extreme sustained throughput.
When a question requires very low latency and high throughput prefer platforms that support fine grained autoscaling and long running containers such as GKE with autoscaling.
You are advising Sentinel Cyber Labs and helping Maria who leads the IT group with an anomaly detection initiative on Azure Machine Learning. The team is training a support vector machine from scikit-learn and Maria needs to evaluate the trained model’s accuracy on a held out test set. Which scikit-learn method should the team use to obtain the classifier’s accuracy on the test data?
-
✓ D. score
The correct option is score.
The score method on scikit-learn classifiers returns the mean accuracy on the provided test features and labels. You call it as clf.score(X_test, y_test) and it internally generates predictions and compares them to the true labels to compute accuracy, so it is the direct way to obtain the classifier accuracy on a held out test set.
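For instance, a minimal sketch with a scikit-learn support vector machine on synthetic data.

```python
# Evaluating an SVM classifier's accuracy on a held out test set
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clf = SVC(kernel="rbf").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)   # mean accuracy on the test data
print(accuracy)
```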
mlflow.log_metric is incorrect because it is an MLflow function for recording metrics to a tracking server and it does not compute a model’s accuracy by itself.
predict is incorrect because predict only returns the model predictions for given inputs and does not compute the accuracy metric. You would need to compare the predictions to the true labels with an accuracy function if you used predict.
fit is incorrect because fit trains the model on data and does not evaluate performance on a test set.
When the question asks for a scikit-learn classifier accuracy think of using estimator.score(X_test, y_test) as it directly returns mean accuracy after training.
At Northbridge Healthcare on Solace you are advising Lena on a customer retention initiative and she needs to run counterfactual what if analysis to identify the smallest changes to customer features that would switch a churn prediction to a retained outcome. Which solution will best allow Lena to perform counterfactual analysis and refine retention actions?
-
✓ C. Responsible AI Dashboard
Responsible AI Dashboard is the correct option for this scenario because it provides the interactive counterfactual what if analysis needed to find minimal changes that flip a churn prediction to a retained outcome.
The Responsible AI Dashboard includes counterfactual explainers that generate small, realistic perturbations to input features and show which changes would change a model prediction. It integrates with Azure Machine Learning explainers and provides interactive visualizations and metrics so Lena can explore alternative scenarios and refine retention actions based on feasible and measurable feature edits.
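A rough sketch of how counterfactuals might be generated programmatically with the open source responsibleai package that backs the dashboard; the model, dataset, column names, and parameters are illustrative, and the exact API surface can vary by version.

```python
# Sketch only: assumes the responsibleai package (pip install responsibleai);
# the dataset, column names, and parameter values are illustrative
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from responsibleai import RAIInsights

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
df = pd.DataFrame(X, columns=["tenure", "spend", "visits", "tickets"])
df["churned"] = y
train_df, test_df = df.iloc[:250], df.iloc[250:]

model = RandomForestClassifier(random_state=0).fit(
    train_df.drop(columns="churned"), train_df["churned"]
)

rai_insights = RAIInsights(
    model, train_df, test_df,
    target_column="churned",
    task_type="classification",
)

# Request counterfactuals that flip each prediction to the opposite class
rai_insights.counterfactual.add(total_CFs=10, desired_class="opposite")
rai_insights.compute()
```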
The MLflow option is incorrect because MLflow focuses on experiment tracking, model versioning, and deployment management and it does not provide built in interactive counterfactual explainers for what if analysis.
The Azure Machine Learning compute option is incorrect because compute targets supply the processing power for training and scoring and they do not provide an explanation or exploration interface for counterfactual analysis.
The ML Pipelines option is incorrect because pipelines orchestrate data and model workflows and they are not a visualization or explanation tool that generates counterfactual what if scenarios.
When a question asks for counterfactual or what if analysis look for tools that provide explainability dashboards and interactive explainers rather than compute resources or orchestration services.
Azure Automated Machine Learning streamlines many parts of model building but still allows engineers to apply some manual constraints to the pipeline. Which controls can a practitioner use to influence the AutoML workflow? (Choose 2)
-
✓ B. Exclude specific model families from the candidate pool
-
✓ D. Block selected feature transformations and encodings
Exclude specific model families from the candidate pool and Block selected feature transformations and encodings are correct.
With Exclude specific model families from the candidate pool you can narrow the set of algorithms AutoML considers so you can enforce constraints such as interpretability requirements or latency budgets and guide the search toward acceptable model types.
Using Block selected feature transformations and encodings lets you prevent AutoML from applying particular preprocessing steps so you retain control over feature engineering and ensure only allowed transformations and encodings are used.
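As a rough illustration of how these constraints can be expressed with the Azure ML Python SDK v2, consider the sketch below; the compute name, data asset, target column, and blocked lists are assumptions rather than values from the question.

```python
# Rough sketch with the Azure ML Python SDK v2 (azure-ai-ml); inputs,
# compute name, and blocked lists are illustrative assumptions
from azure.ai.ml import automl, Input

classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="automl-constraints-demo",
    training_data=Input(type="mltable", path="azureml:training-data:1"),
    target_column_name="label",
    primary_metric="accuracy",
)

# Exclude specific model families from the candidate pool
classification_job.set_training(
    blocked_training_algorithms=["LogisticRegression", "ExtremeRandomTrees"],
)

# Block selected feature transformations and encodings
classification_job.set_featurization(
    blocked_transformers=["WordEmbedding"],
)
```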
Override the automatic choice of the primary evaluation metric is incorrect because the primary metric is explicitly configured when you set up the AutoML experiment rather than overridden mid run. Metric selection is handled at configuration time, not by overriding an internal automatic choice.
Disable scaling and normalization of numeric inputs is incorrect because AutoML does not offer a single global toggle to simply turn off numeric scaling. Instead you manage preprocessing by blocking specific transformations and encodings as noted in the correct option.
When a question asks which AutoML controls are available look for answers about restricting model families or blocking transformations. Pay attention to wording that implies explicit configuration like blocking rather than vague global toggles.
NovaChem is a UK specialty materials firm with its headquarters in Manchester and operations in many countries and the company employs thousands of staff across several sites. Priya Kumar from the IT team needs to train a classification machine learning model using Azure Machine Learning and she wants to obtain the highest performing model without writing any code. Which development approach should Priya choose to meet this requirement?
-
✓ C. Azure AutoML
The correct answer is Azure AutoML.
Azure AutoML is designed to run automated experiments for tasks such as classification, and it performs model selection, hyperparameter tuning, preprocessing, and ensembling without requiring you to write code. You can run it from the Azure Machine Learning studio using a no code experience, and it will search many algorithms and configurations to find the highest performing model.
VS Code extensions for Azure ML are focused on a code first workflow inside an integrated development environment and they expect you to write and manage code so they do not meet the requirement to avoid coding.
Azure Machine Learning Designer pipelines provide a drag and drop no code environment for building pipelines but they require you to choose components and configure steps manually and they do not perform the same level of automated model search and tuning that Azure AutoML offers for finding the single best model.
Azure CLI for Machine Learning is a command line tool for scripting and automation and it requires command usage or scripts so it is not a no code GUI solution and therefore it does not satisfy the requirement.
When a question stresses obtaining the highest performing model without writing any code look for an automated training service and emphasize the phrase without writing any code to rule out IDE and CLI options.
A data engineering group has a reusable YAML file that defines an Azure machine learning compute cluster. You need to publish step by step guidance on the intranet so team members can reliably provision the cluster with Azure CLI using the YAML file. Which Azure CLI command should the documentation show?
-
✓ B. az ml compute create -f cluster-config.yml
The correct answer is az ml compute create -f cluster-config.yml.
This command is the Azure Machine Learning CLI command that provisions a compute target from a YAML definition. The az ml compute create -f cluster-config.yml pattern reads the reusable configuration and creates the AmlCompute cluster or other compute target declared in the file, so it is the reliable and repeatable approach the team needs.
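For reference, a minimal sketch of what such a reusable YAML definition might contain; the name, VM size, and instance counts are placeholder values, not part of the question.

```yaml
# cluster-config.yml (illustrative values, assuming the Azure ML CLI v2 schema)
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: analyticsCluster
type: amlcompute
size: STANDARD_DS3_v2
min_instances: 0
max_instances: 4
idle_time_before_scale_down: 120
```

The resource group and workspace can be supplied with --resource-group and --workspace-name on the same command, or set once as CLI defaults, so the documented step remains a single repeatable command.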
az ml resource create -f cluster-config.yml is incorrect because the resource subcommand is not the intended command for creating compute clusters and it does not map to the compute create workflow used for AML compute targets.
az ml compute create --name analyticsCluster --sku Standard_D3s_v2 --min-instances 2 --max-instances 6 --type AmlCompute --resource-group analyticsRg --workspace-name analyticsWorkspace is incorrect for this question because it uses explicit flags to create the cluster rather than the reusable YAML file. The command would create a cluster but it does not demonstrate the YAML driven provisioning the documentation needs.
az ml compute add -f cluster-config.yml is incorrect because there is no supported compute add subcommand in the Azure ML CLI for provisioning compute targets. The correct action to add or create compute resources is az ml compute create -f cluster-config.yml.
When a question mentions a reusable YAML file look for a command that accepts the -f or --file option and that explicitly targets the resource type you need, in this case compute.
You are working in a notebook within Azure Machine Learning studio and you must add Python libraries so they apply only to the notebook’s current kernel and do not alter other kernels or environments. Which notebook magic command should you use to install packages into the active kernel?
-
✓ C. %pip
The correct option is %pip.
%pip is the IPython notebook magic that installs Python packages into the interpreter used by the active notebook kernel. Using the magic ensures the installation targets the kernel that is running the notebook so the newly installed libraries are immediately available to cells without changing other kernels or shared environments.
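For example, running something like the following in a notebook cell installs into the active kernel only; the package names and versions are illustrative.

```python
# Run inside an Azure ML studio notebook cell; installs only into the
# environment backing the active kernel (package names are illustrative)
%pip install scikit-learn==1.3.2 matplotlib
```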
%conda is the conda equivalent and it installs packages into conda environments when conda is available. It is not the best choice here because the question asks for the magic that reliably installs into the active notebook kernel and %pip is the correct, general answer in Azure Machine Learning notebooks.
%load does not install packages. It loads code from a file or URL into a cell so it is unrelated to package installation and is therefore incorrect.
!pip runs the pip command in the shell and it may invoke a different Python interpreter than the one the notebook kernel is using. Because it can target a different environment it is not the reliable way to install packages into the active kernel.
When you need packages available only to a notebook use %pip inside the notebook so the install targets the active kernel. Avoid relying on !pip when you need certainty about which Python is being modified.
| Jira, Scrum & AI Certification |
|---|
| Want to get certified on the most popular software development technologies of the day? These resources will help you get Jira certified, Scrum certified and even AI Practitioner certified so your resume really stands out.
You can even get certified in the latest AI, ML and DevOps technologies. Advance your career today. |
Cameron McKenzie is an AWS Certified AI Practitioner, Machine Learning Engineer, Copilot Expert, Solutions Architect and author of many popular books in the software development and Cloud Computing space. His growing YouTube channel, which trains developers in Java, Spring, AI and ML, has well over 30,000 subscribers.
