Google Professional-Machine-Learning-Engineer Exam Dumps
Google Professional Machine Learning Engineer
572 Reviews
Exam Code
Professional-Machine-Learning-Engineer
Exam Name
Google Professional Machine Learning Engineer
Questions
296 Questions Answers With Explanation
Update Date
April 14, 2026
Price
Was: $81, Today: $45
Was: $99, Today: $55
Was: $117, Today: $65
Why Should You Prepare For Your Google Professional Machine Learning Engineer With MyCertsHub?
At MyCertsHub, we go beyond standard study material. Our platform provides authentic Google Professional-Machine-Learning-Engineer Exam Dumps, detailed exam guides, and reliable practice exams that mirror the actual Google Professional Machine Learning Engineer test. Whether you’re targeting Google certifications or expanding your professional portfolio, MyCertsHub gives you the tools to succeed on your first attempt.
Every set of exam dumps is carefully reviewed by certified experts to ensure accuracy. For the Professional-Machine-Learning-Engineer (Google Professional Machine Learning Engineer) exam, you’ll receive updated practice questions designed to reflect real-world exam conditions. This approach saves time, builds confidence, and focuses your preparation on the most important exam areas.
Realistic Test Prep For The Professional-Machine-Learning-Engineer
You can instantly access downloadable PDFs of Professional-Machine-Learning-Engineer practice exams with MyCertsHub. These include authentic practice questions paired with explanations, making our exam guide a complete preparation tool. By testing yourself before exam day, you’ll walk into the Google Exam with confidence.
Smart Learning With Exam Guides
Our structured Professional-Machine-Learning-Engineer exam guide focuses on the Google Professional Machine Learning Engineer exam's core topics and question patterns. You can concentrate on what really matters for passing the test rather than wasting time on irrelevant content.
Pass the Professional-Machine-Learning-Engineer Exam – Guaranteed
We Offer A 100% Money-Back Guarantee On Our Products.
If you prepare with MyCertsHub's exam dumps for the Google Professional Machine Learning Engineer exam and do not pass, we will issue a full refund. That’s how confident we are in the effectiveness of our study resources.
Try Before You Buy – Free Demo
Still undecided? See for yourself how MyCertsHub has helped thousands of candidates achieve success by downloading a free demo of the Professional-Machine-Learning-Engineer exam dumps.
MyCertsHub – Your Trusted Partner For Google Exams
Whether you’re preparing for Google Professional Machine Learning Engineer or any other professional credential, MyCertsHub provides everything you need: exam dumps, practice exams, practice questions, and exam guides. Passing your Professional-Machine-Learning-Engineer exam has never been easier thanks to our tried-and-true resources.
Google Professional-Machine-Learning-Engineer Sample Question Answers
Question # 1
You are working on a system log anomaly detection model for a cybersecurity organization. You have
developed the model using TensorFlow, and you plan to use it for real-time prediction. You need to
create a Dataflow pipeline to ingest data via Pub/Sub and write the results to BigQuery. You want to
minimize the serving latency as much as possible. What should you do?
A. Containerize the model prediction logic in Cloud Run, which is invoked by Dataflow.
B. Load the model directly into the Dataflow job as a dependency, and use it for prediction.
C. Deploy the model to a Vertex AI endpoint, and invoke this endpoint in the Dataflow job.
D. Deploy the model in a TFServing container on Google Kubernetes Engine, and invoke it in the Dataflow job.
Answer: B
Explanation:
The best option for creating a Dataflow pipeline for real-time anomaly detection is to load the model
directly into the Dataflow job as a dependency, and use it for prediction. This option has the
following advantages:
It minimizes the serving latency, as the model prediction logic is executed within the same Dataflow
pipeline that ingests and processes the data. There is no need to invoke external services or
containers, which can introduce network overhead and latency.
It simplifies the deployment and management of the model, as the model is packaged with the
Dataflow job and does not require a separate service or container. The model can be updated by
redeploying the Dataflow job with a new model version.
It leverages the scalability and reliability of Dataflow, as the model prediction logic can scale up or
down with the data volume and handle failures and retries automatically.
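The in-process pattern behind option B can be sketched as follows. This is a minimal, hypothetical illustration: in a real Dataflow job the class would subclass apache_beam.DoFn and setup() would load the trained TensorFlow model (e.g. with tf.keras.models.load_model); here the model is stubbed with a plain scoring function so the sketch runs standalone.

```python
# Minimal sketch of the "model as a pipeline dependency" pattern.
class PredictDoFn:
    """Loads the model once per worker, then reuses it for every element."""

    def setup(self):
        # Called once per worker before processing begins, so the
        # (potentially large) model is loaded only once, not per element.
        # Stand-in for: self.model = tf.keras.models.load_model(...)
        self.model = lambda record: {"input": record,
                                     "anomaly_score": float(len(str(record)) % 2)}

    def process(self, element):
        # Prediction runs in-process: no RPC to an external serving
        # endpoint, which is what keeps serving latency minimal.
        yield self.model(element)


fn = PredictDoFn()
fn.setup()
results = list(fn.process("log line 42"))
```

The same lifecycle (setup once, process many) is what Beam's DoFn contract provides, so the model-loading cost is amortized across all elements a worker handles before results are written to BigQuery.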
The other options are less optimal for the following reasons:
Option A: Containerizing the model prediction logic in Cloud Run, which is invoked by Dataflow,
introduces additional latency and complexity. Cloud Run is a serverless platform that runs stateless
containers, which means that the model prediction logic needs to be initialized and loaded every
time a request is made. This can increase the cold start latency and reduce the throughput.
Moreover, Cloud Run has a limit on the number of concurrent requests per container, which can
affect the scalability of the model prediction logic. Additionally, this option requires managing two
separate services: the Dataflow pipeline and the Cloud Run container.
Option C: Deploying the model to a Vertex AI endpoint, and invoking this endpoint in the Dataflow
job, also introduces additional latency and complexity. Vertex AI is a managed service that provides
various tools and features for machine learning, such as training, tuning, serving, and monitoring.
However, invoking a Vertex AI endpoint from a Dataflow job requires making an HTTP request, which
can incur network overhead and latency. Moreover, this option requires managing two separate
services: the Dataflow pipeline and the Vertex AI endpoint.
Option D: Deploying the model in a TFServing container on Google Kubernetes Engine, and invoking
it in the Dataflow job, also introduces additional latency and complexity. TFServing is a high-performance serving system for TensorFlow models, which can handle multiple versions and variants
of a model. However, invoking a TFServing container from a Dataflow job requires making a gRPC or
REST request, which can incur network overhead and latency. Moreover, this option requires
managing two separate services: the Dataflow pipeline and the Google Kubernetes Engine cluster.
Reference:
[Dataflow documentation]
[TensorFlow documentation]
[Cloud Run documentation]
[Vertex AI documentation]
[TFServing documentation]
Question # 2
You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data, completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to train a model. You need to update the model's code to allow you to test different algorithms. You want to reduce pipeline execution time and cost, while also minimizing pipeline changes. What should you do?
A. Add a pipeline parameter and an additional pipeline step. Depending on the parameter value, the pipeline step conducts or skips data preprocessing and starts model training.
B. Create another pipeline without the preprocessing step, and hardcode the preprocessed Cloud Storage file location for model training.
C. Configure a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step.
D. Enable caching for the pipeline job, and disable caching for the model training step.
Answer: D
Explanation:
The best option for reducing pipeline execution time and cost, while also minimizing pipeline
changes, is to enable caching for the pipeline job, and disable caching for the model training step.
This option allows you to leverage the power and simplicity of Vertex AI Pipelines to reuse the output
of the data preprocessing step, and avoid unnecessary recomputation. Vertex AI Pipelines is a service
that can orchestrate machine learning workflows using Vertex AI. Vertex AI Pipelines can run
preprocessing and training steps on custom Docker images, and evaluate, deploy, and monitor the
machine learning model. Caching is a feature of Vertex AI Pipelines that can store and reuse the
output of a pipeline step, and skip the execution of the step if the input parameters and the code
have not changed. Caching can help you reduce the pipeline execution time and cost, as you do not
need to re-run the same step with the same input and code. Caching can also help you minimize the
pipeline changes, as you do not need to add or remove any pipeline steps or parameters. By enabling
caching for the pipeline job, and disabling caching for the model training step, you can create a
Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB data, completes in about
1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to
train a model. You can update the model's code to allow you to test different algorithms, and run the
pipeline job with caching enabled. The pipeline job will reuse the output of the data preprocessing
step from the cache, and skip the execution of the step. The pipeline job will run the model training
step with the updated code, and disable the caching for the step. This way, you can reduce the
pipeline execution time and cost, while also minimizing pipeline changes.
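The caching decision described above can be sketched with plain Python. This is a conceptual model, not the Vertex AI implementation: a step's cache key covers its code version and inputs, so changing only the training code re-runs only the training step. In the actual SDK, caching is toggled roughly via `PipelineJob(..., enable_caching=True)` for the job and `task.set_caching_options(False)` on the training task (check the current KFP/Vertex AI docs for exact signatures).

```python
import hashlib
import json

def cache_key(step_name, code_version, inputs):
    # The key covers the step's code and inputs, so changing the training
    # code invalidates only the training step, not preprocessing.
    payload = json.dumps({"step": step_name, "code": code_version,
                          "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

cache = {}

def run_step(name, code_version, inputs, fn):
    key = cache_key(name, code_version, inputs)
    if key in cache:                      # cache hit: skip execution
        return cache[key], True
    result = fn(inputs)                   # cache miss: execute and store
    cache[key] = result
    return result, False

# First run: both steps execute.
data, hit1 = run_step("preprocess", "v1", "gs://bucket/raw",
                      lambda x: x + "/processed")
_, hit2 = run_step("train", "v1", data, lambda x: "model-a")
# Second run with new training code: preprocessing is reused from cache,
# only training re-runs.
data2, hit3 = run_step("preprocess", "v1", "gs://bucket/raw",
                       lambda x: x + "/processed")
_, hit4 = run_step("train", "v2", data2, lambda x: "model-b")
```

The second run skips the hour-long 10 TB preprocessing entirely, which is the cost and time saving option D relies on.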
The other options are not as good as option D, for the following reasons:
Option A: Adding a pipeline parameter and an additional pipeline step, depending on the parameter
value, the pipeline step conducts or skips data preprocessing and starts model training, would
require more skills and steps than enabling caching for the pipeline job, and disabling caching for the
model training step. A pipeline parameter is a variable that can be used to control the input or
output of a pipeline step. A pipeline parameter can help you customize the pipeline logic and
behavior, and experiment with different values. An additional pipeline step is a new instance of a
pipeline component that can perform a part of the pipeline workflow, such as data preprocessing or
model training. An additional pipeline step can help you extend the pipeline functionality and
complexity, and handle different scenarios. However, adding a pipeline parameter and an additional
pipeline step, depending on the parameter value, the pipeline step conducts or skips data
preprocessing and starts model training, would require more skills and steps than enabling caching
for the pipeline job, and disabling caching for the model training step. You would need to write code,
define the pipeline parameter, create the additional pipeline step, implement the conditional logic,
and compile and run the pipeline. Moreover, this option would not reuse the output of the data
preprocessing step from the cache, but rather from the Cloud Storage bucket, which can increase the
data transfer and access costs.
Option B: Creating another pipeline without the preprocessing step, and hardcoding the
preprocessed Cloud Storage file location for model training, would require more skills and steps than
enabling caching for the pipeline job, and disabling caching for the model training step. A pipeline
without the preprocessing step is a pipeline that only includes the model training step, and uses the
preprocessed data from the Cloud Storage bucket as the input. A pipeline without the preprocessing
step can help you avoid running the data preprocessing step every time, and reduce the pipeline
execution time and cost. However, creating another pipeline without the preprocessing step, and
hardcoding the preprocessed Cloud Storage file location for model training, would require more skills
and steps than enabling caching for the pipeline job, and disabling caching for the model training
step. You would need to write code, create a new pipeline, remove the preprocessing step, hardcode
the Cloud Storage file location, and compile and run the pipeline. Moreover, this option would not
reuse the output of the data preprocessing step from the cache, but rather from the Cloud Storage
bucket, which can increase the data transfer and access costs. Furthermore, this option would create
another pipeline, which can increase the maintenance and management costs.
Option C: Configuring a machine with more CPU and RAM from the compute-optimized machine
family for the data preprocessing step, would not reduce the pipeline execution time and cost, while
also minimizing pipeline changes, but rather increase the pipeline execution cost and complexity. A
machine with more CPU and RAM from the compute-optimized machine family is a virtual machine
that has a high ratio of CPU cores to memory, and can provide high performance and scalability for
compute-intensive workloads. A machine with more CPU and RAM from the compute-optimized
machine family can help you optimize the data preprocessing step, and reduce the pipeline execution
time. However, configuring a machine with more CPU and RAM from the compute-optimized
machine family for the data preprocessing step, would not reduce the pipeline execution time and
cost, while also minimizing pipeline changes, but rather increase the pipeline execution cost and
complexity. You would need to write code, configure the machine type parameters for the data
preprocessing step, and compile and run the pipeline. Moreover, this option would increase the
pipeline execution cost, as machines with more CPU and RAM from the compute-optimized machine
family are more expensive than machines with less CPU and RAM from other machine
families. Furthermore, this option would not reuse the output of the data preprocessing step from
the cache, but rather re-run the data preprocessing step every time, which can increase the pipeline
execution time and cost.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 3: MLOps
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production, 3.2 Automating ML workflows
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:
Production ML Systems, Section 6.4: Automating ML Workflows
Vertex AI Pipelines
Caching
Pipeline parameters
Machine types
Question # 3
You have been asked to productionize a proof-of-concept ML model built using Keras. The model was
trained in a Jupyter notebook on a data scientist's local machine. The notebook contains a cell that
performs data validation and a cell that performs model analysis. You need to orchestrate the steps
contained in the notebook and automate the execution of these steps for weekly retraining. You
expect much more training data in the future. You want your solution to take advantage of managed
services while minimizing cost. What should you do?
A. Move the Jupyter notebook to a Notebooks instance on the largest N2 machine type, and schedule the execution of the steps in the Notebooks instance using Cloud Scheduler.
B. Write the code as a TensorFlow Extended (TFX) pipeline orchestrated with Vertex AI Pipelines. Use standard TFX components for data validation and model analysis, and use Vertex AI Pipelines for model retraining.
C. Rewrite the steps in the Jupyter notebook as an Apache Spark job, and schedule the execution of the job on ephemeral Dataproc clusters using Cloud Scheduler.
D. Extract the steps contained in the Jupyter notebook as Python scripts, wrap each script in an Apache Airflow BashOperator, and run the resulting directed acyclic graph (DAG) in Cloud Composer.
Answer: B
Explanation:
The best option for productionizing a Keras model is to use TensorFlow Extended (TFX), a framework
for building end-to-end machine learning pipelines that can handle large-scale data and complex
workflows. TFX provides standard components for data ingestion, transformation, validation,
analysis, training, tuning, serving, and monitoring. TFX pipelines can be orchestrated with Vertex AI
Pipelines, a managed service that runs on Google Cloud Platform and leverages Kubernetes and
Argo. Vertex AI Pipelines allows you to automate the execution of your TFX pipeline steps, schedule
retraining jobs, and scale up or down the resources as needed. By using TFX and Vertex AI Pipelines,
you can take advantage of the following benefits:
You can reuse the existing code in your Jupyter notebook, as TFX supports Keras as a first-class
citizen. You can also use the Keras Tuner to optimize your model hyperparameters.
You can ensure data quality and consistency by using the TFX Data Validation component, which can
detect anomalies, drift, and skew in your data. You can also use the TFX SchemaGen component to
generate a schema for your data and enforce it throughout the pipeline.
You can analyze your model performance and fairness by using the TFX Model Analysis component,
which can produce various metrics and visualizations. You can also use the TFX Model Validation
component to compare your new model with a baseline model and set thresholds for deploying the
model to production.
You can deploy your model to various serving platforms by using the TFX Pusher component, which
can push your model to Vertex AI, Cloud AI Platform, TensorFlow Serving, or TensorFlow Lite. You can
also use the TFX Model Registry to manage the versions and metadata of your models.
You can monitor your model performance and health by using the TFX Model Monitor component,
which can detect data drift, concept drift, and prediction skew in your model. You can also use the
TFX Evaluator component to compute metrics and validate your model against a baseline or a slice of
data.
You can reduce the cost and complexity of managing your own infrastructure by using Vertex AI
Pipelines, which provides a serverless environment for running your TFX pipeline. You can also use
the Vertex AI Experiments and Vertex AI TensorBoard to track and visualize your pipeline runs.
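The notebook-to-pipeline refactor can be sketched as an ordered sequence of steps with declared roles, which is what lets an orchestrator run, cache, and schedule them independently. The component names below are standard TFX ones; the runner is a stub so the example is self-contained rather than a real TFX/Vertex AI Pipelines invocation.

```python
# Each notebook cell becomes a pipeline step with an explicit role.
PIPELINE = [
    ("ExampleGen", "ingest the weekly training data"),
    ("StatisticsGen", "compute dataset statistics"),
    ("SchemaGen", "infer and enforce the data schema"),
    ("ExampleValidator", "data validation (was: notebook validation cell)"),
    ("Transform", "feature engineering"),
    ("Trainer", "retrain the Keras model"),
    ("Evaluator", "model analysis (was: notebook analysis cell)"),
    ("Pusher", "deploy the blessed model"),
]

def run_pipeline(steps):
    """Stub runner: executes steps in dependency order, as the
    orchestrator would on the weekly schedule."""
    return [name for name, _ in steps]

executed = run_pipeline(PIPELINE)
```

The key property the stub preserves: validation gates training, and analysis gates deployment, so a bad weekly dataset or a regressed model never reaches production automatically.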
Reference:
[TensorFlow Extended (TFX)]
[Vertex AI Pipelines]
[TFX User Guide]
Question # 4
You work for a bank. You have created a custom model to predict whether a loan application should be flagged for human review. The input features are stored in a BigQuery table. The model is performing well, and you plan to deploy it to production. Due to compliance requirements, the model must provide explanations for each prediction. You want to add this functionality to your model code with minimal effort and provide explanations that are as accurate as possible. What should you do?
A. Create an AutoML tabular model by using the BigQuery data with integrated Vertex Explainable AI.
B. Create a BigQuery ML deep neural network model, and use the ML.EXPLAIN_PREDICT method with the num_integral_steps parameter.
C. Upload the custom model to Vertex AI Model Registry and configure feature-based attribution by using sampled Shapley with input baselines.
D. Update the custom serving container to include sampled Shapley-based explanations in the prediction outputs.
Answer: C
Explanation:
The best option for adding explanations to your model code with minimal effort and providing
explanations that are as accurate as possible is to upload the custom model to Vertex AI Model
Registry and configure feature-based attribution by using sampled Shapley with input baselines. This
option allows you to leverage the power and simplicity of Vertex Explainable AI to generate feature
attributions for each prediction, and understand how each feature contributes to the model output.
Vertex Explainable AI is a service that can help you understand and interpret predictions made by
your machine learning models, natively integrated with a number of Google's products and services.
Vertex Explainable AI can provide feature-based and example-based explanations to provide better
understanding of model decision making. Feature-based explanations are explanations that show
how much each feature in the input influenced the prediction. Feature-based explanations can help
you debug and improve model performance, build confidence in the predictions, and understand
when and why things go wrong. Vertex Explainable AI supports various feature attribution methods,
such as sampled Shapley, integrated gradients, and XRAI. Sampled Shapley is a feature attribution
method that is based on the Shapley value, which is a concept from game theory that measures how
much each player in a cooperative game contributes to the total payoff. Sampled Shapley
approximates the Shapley value for each feature by sampling different subsets of features, and
computing the marginal contribution of each feature to the prediction. Sampled Shapley can provide
accurate and consistent feature attributions, but it can also be computationally expensive. To reduce
the computation cost, you can use input baselines, which are reference inputs that are used to
compare with the actual inputs. Input baselines can help you define the starting point or the default
state of the features, and calculate the feature attributions relative to the input baselines. By
uploading the custom model to Vertex AI Model Registry and configuring feature-based attribution
by using sampled Shapley with input baselines, you can add explanations to your model code with
minimal effort and provide explanations that are as accurate as possible.
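The configuration described above can be sketched as plain dictionaries. The field names follow the general shape of the Vertex AI ExplanationSpec, but treat them as illustrative and verify against the current API reference; the feature names, baseline values, and tensor name below are hypothetical.

```python
# Sketch of the explanation configuration attached when uploading the
# model to Vertex AI Model Registry (field names approximate the
# ExplanationSpec; values are illustrative).

explanation_parameters = {
    "sampled_shapley_attribution": {
        # More sampled paths -> more accurate Shapley estimates,
        # at higher explanation latency and cost.
        "path_count": 10,
    }
}

explanation_metadata = {
    "inputs": {
        "loan_features": {
            # Input baseline: the reference point attributions are
            # measured against, e.g. the training-set median of each
            # feature (values here are made up).
            "input_baselines": [[0.0, 35.0, 52000.0]],
        }
    },
    "outputs": {"flag_for_review": {"output_tensor_name": "scores"}},
}
```

With this attached to the uploaded model, each online prediction can return per-feature attributions relative to the baseline, which is what the compliance requirement asks for.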
The other options are not as good as option C, for the following reasons:
Option A: Creating an AutoML tabular model by using the BigQuery data with integrated Vertex
Explainable AI would require more skills and steps than uploading the custom model to Vertex AI
Model Registry and configuring feature-based attribution by using sampled Shapley with input
baselines. AutoML tabular is a service that can automatically build and train machine learning
models for structured or tabular data. AutoML tabular can use BigQuery as the data source, and
provide feature-based explanations by using integrated gradients as the feature attribution method.
However, creating an AutoML tabular model by using the BigQuery data with integrated Vertex
Explainable AI would require more skills and steps than uploading the custom model to Vertex AI
Model Registry and configuring feature-based attribution by using sampled Shapley with input
baselines. You would need to create a new AutoML tabular model, import the BigQuery data,
configure the model settings, train and evaluate the model, and deploy the model. Moreover, this
option would not use your existing custom model, which is already performing well, but create a new
model, which may not have the same performance or behavior as your custom model.
Option B: Creating a BigQuery ML deep neural network model, and using the ML.EXPLAIN_PREDICT
method with the num_integral_steps parameter would not allow you to deploy the model to
production, and could provide less accurate explanations than using sampled Shapley with input
baselines. BigQuery ML is a service that can create and train machine learning models by using SQL
queries on BigQuery. BigQuery ML can create a deep neural network model, which is a type of
machine learning model that consists of multiple layers of neurons, and can learn complex patterns
and relationships from the data. BigQuery ML can also provide feature-based explanations by using
the ML.EXPLAIN_PREDICT method, which is a SQL function that returns the feature attributions for
each prediction. The ML.EXPLAIN_PREDICT method uses integrated gradients as the feature
attribution method, which is a method that calculates the average gradient of the prediction output
with respect to the feature values along the path from the input baseline to the input. The
num_integral_steps parameter is a parameter that determines the number of steps along the path
from the input baseline to the input. However, creating a BigQuery ML deep neural network model,
and using the ML.EXPLAIN_PREDICT method with the num_integral_steps parameter would not
allow you to deploy the model to production, and could provide less accurate explanations than
using sampled Shapley with input baselines. BigQuery ML does not support deploying the model to
Vertex AI Endpoints, which is a service that can provide low-latency predictions for individual
instances. BigQuery ML only supports batch prediction, which is a service that can provide high-throughput predictions for a large batch of instances. Moreover, integrated gradients can provide less
accurate and consistent explanations than sampled Shapley, as integrated gradients can be sensitive
to the choice of the input baseline and the num_integral_steps parameter.
Option D: Updating the custom serving container to include sampled Shapley-based explanations in
the prediction outputs would require more skills and steps than uploading the custom model to
Vertex AI Model Registry and configuring feature-based attribution by using sampled Shapley with
input baselines. A custom serving container is a container image that contains the model, the
dependencies, and a web server. A custom serving container can help you customize the prediction
behavior of your model, and handle complex or non-standard data formats. However, updating the
custom serving container to include sampled Shapley-based explanations in the prediction outputs
would require more skills and steps than uploading the custom model to Vertex AI Model Registry
and configuring feature-based attribution by using sampled Shapley with input baselines. You would
need to write code, implement the sampled Shapley algorithm, build and test the container image,
and upload and deploy the container image. Moreover, this option would not leverage the power
and simplicity of Vertex Explainable AI, which can provide feature-based explanations natively
integrated with Vertex AI services.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 4: Evaluation
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production, 3.3 Monitoring ML models in production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:
Production ML Systems, Section 6.3: Monitoring ML Models
Vertex Explainable AI
AutoML Tables
BigQuery ML
Using custom containers for prediction
Question # 5
You recently used XGBoost to train a model in Python that will be used for online serving. Your model prediction service will be called by a backend service implemented in Golang running on a Google Kubernetes Engine (GKE) cluster. Your model requires pre- and postprocessing steps. You need to implement the processing steps so that they run at serving time. You want to minimize code changes and infrastructure maintenance, and deploy your model into production as quickly as possible. What should you do?
A. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server, and deploy it on your organization's GKE cluster.
B. Use FastAPI to implement an HTTP server. Create a Docker image that runs your HTTP server. Upload the image to Vertex AI Model Registry and deploy it to a Vertex AI endpoint.
C. Use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint.
D. Use the XGBoost prebuilt serving container when importing the trained model into Vertex AI. Deploy the model to a Vertex AI endpoint. Work with the backend engineers to implement the pre- and postprocessing steps in the Golang backend service.
Answer: C
Explanation:
The best option for implementing the processing steps so that they run at serving time, minimizing
code changes and infrastructure maintenance, and deploying the model into production as quickly as
possible, is to use the Predictor interface to implement a custom prediction routine. Build the custom
container, upload the container to Vertex AI Model Registry, and deploy it to a Vertex AI endpoint.
This option allows you to leverage the power and simplicity of Vertex AI to serve your XGBoost model
with minimal effort and customization. Vertex AI is a unified platform for building and deploying
machine learning solutions on Google Cloud. Vertex AI can deploy a trained XGBoost model to an
online prediction endpoint, which can provide low-latency predictions for individual instances. A
custom prediction routine (CPR) is a Python script that defines the logic for preprocessing the input
data, running the prediction, and postprocessing the output data. A CPR can help you customize the
prediction behavior of your model, and handle complex or non-standard data formats. A CPR can also
help you minimize the code changes, as you only need to write a few functions to implement the
prediction logic. A Predictor interface is a class that inherits from the base class aiplatform.Predictor,
and implements the abstract methods predict() and preprocess(). A Predictor interface can help you
create a CPR by defining the preprocessing and prediction logic for your model. A container image is
a package that contains the model, the CPR, and the dependencies. A container image can help you
standardize and simplify the deployment process, as you only need to upload the container image to
Vertex AI Model Registry, and deploy it to Vertex AI Endpoints. By using the Predictor interface to
implement a CPR, building the custom container, uploading the container to Vertex AI Model
Registry, and deploying it to a Vertex AI endpoint, you can implement the processing steps so that
they run at serving time, minimize code changes and infrastructure maintenance, and deploy the
model into production as quickly as possible.
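The custom prediction routine described above can be sketched as follows. In the real SDK the class would subclass the Vertex AI CPR Predictor base class and load() would deserialize the XGBoost booster from the artifact URI; here the model, feature scaling, and threshold are all stand-ins so the sketch runs on its own.

```python
# Minimal sketch of a custom prediction routine (CPR) Predictor with
# serving-time pre- and postprocessing. All values are illustrative.

class XgbPredictor:
    def load(self, artifacts_uri):
        # Real code would do roughly:
        #   self._model = xgboost.Booster()
        #   self._model.load_model(<path under artifacts_uri>)
        self._model = lambda rows: [sum(r) for r in rows]

    def preprocess(self, prediction_input):
        # Serving-time preprocessing, e.g. scaling raw feature values,
        # so callers (the Golang backend) send raw features unchanged.
        return [[v / 10.0 for v in row]
                for row in prediction_input["instances"]]

    def predict(self, instances):
        return self._model(instances)

    def postprocess(self, prediction_results):
        # Serving-time postprocessing, e.g. thresholding scores into labels.
        return {"predictions": [{"score": s, "label": int(s > 0.5)}
                                for s in prediction_results]}


p = XgbPredictor()
p.load("gs://bucket/model/")
out = p.postprocess(p.predict(p.preprocess(
    {"instances": [[1, 2, 3], [0, 1, 0]]})))
```

Because the processing lives inside the Predictor, the Golang backend only makes a plain prediction call, and the whole routine ships as one container deployed to a Vertex AI endpoint.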
The other options are not as good as option C, for the following reasons:
Option A: Using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP
server, and deploying it on your organization's GKE cluster would require more skills and steps than
using the Predictor interface to implement a CPR, building the custom container, uploading the
container to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint. FastAPI is a
framework for building web applications and APIs in Python. FastAPI can help you implement an
HTTP server that can handle prediction requests and responses, and perform data preprocessing and
postprocessing. A Docker image is a package that contains the model, the HTTP server, and the
dependencies. A Docker image can help you standardize and simplify the deployment process, as you
only need to build and run the Docker image. GKE is a service that can create and manage
Kubernetes clusters on Google Cloud. GKE can help you deploy and scale your Docker image on
Google Cloud, and provide high availability and performance. However, using FastAPI to implement
an HTTP server, creating a Docker image that runs your HTTP server, and deploying it on your
organization's GKE cluster would require more skills and steps than using the Predictor interface to
implement a CPR, building the custom container, uploading the container to Vertex AI Model
Registry, and deploying it to a Vertex AI endpoint. You would need to write code, create and
configure the HTTP server, build and test the Docker image, create and manage the GKE cluster, and
deploy and monitor the Docker image. Moreover, this option would not leverage the power and
simplicity of Vertex AI, which can provide online prediction natively integrated with Google Cloud
services.
Option B: Using FastAPI to implement an HTTP server, creating a Docker image that runs your HTTP
server, uploading the image to Vertex AI Model Registry, and deploying it to a Vertex AI endpoint
would require more skills and steps than using the Predictor interface to implement a CPR, building
the custom container, uploading the container to Vertex AI Model Registry, and deploying it to a
Vertex AI endpoint. FastAPI is a framework for building web applications and APIs in Python. FastAPI
can help you implement an HTTP server that can handle prediction requests and responses, and
perform data preprocessing and postprocessing. A Docker image is a package that contains the
model, the HTTP server, and the dependencies. A Docker image can help you standardize and
simplify the deployment process, as you only need to build and run the Docker image. Vertex AI
Model Registry is a service that can store and manage your machine learning models on Google
Cloud. Vertex AI Model Registry can help you upload and organize your Docker image, and track the
model versions and metadata. Vertex AI Endpoints is a service that can provide online prediction for
your machine learning models on Google Cloud. Vertex AI Endpoints can help you deploy your
Docker image to an online prediction endpoint, which can provide low-latency predictions for
individual instances. However, this approach again requires more skills and steps than using the
Predictor interface to implement a CPR: you would need to write code, create and configure the
HTTP server, build and test the Docker image, upload it to Vertex AI Model Registry, and deploy it to
a Vertex AI endpoint. Moreover, this option would not leverage the power and simplicity of Vertex
AI's natively integrated online prediction.
Option D: Using the XGBoost prebuilt serving container when importing the trained model into
Vertex AI, deploying the model to a Vertex AI endpoint, and working with the backend engineers to
implement the pre- and postprocessing steps in the Golang backend service would not allow the
processing steps to run at serving time, and could increase the code
changes and infrastructure maintenance. An XGBoost prebuilt serving container is a container image
provided by Google Cloud that contains the XGBoost framework and its dependencies. It can help
you deploy an XGBoost model without writing any code, but it also limits your customization
options: it can only handle standard data formats, such as JSON or CSV, and cannot perform any
preprocessing or postprocessing on the input or output data. If your input data requires any
transformation or normalization before prediction, you cannot use a prebuilt serving container
alone. A Golang backend
service is a service that is implemented in Golang, a programming language that can be used for web
development and system programming. A Golang backend service can help you handle the
prediction requests and responses from the frontend, and communicate with the Vertex AI endpoint.
However, this approach would not run the pre- and postprocessing steps at serving time, and could
increase the code changes and infrastructure maintenance: you would need to import the trained
model into Vertex AI, deploy it to a Vertex AI endpoint, implement the pre- and postprocessing steps
in the Golang backend service, and test and monitor that service. Moreover, this option would not
leverage the power and simplicity of Vertex AI's natively integrated online prediction.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:
Production ML Systems, Section 6.2: Serving ML Predictions
Custom prediction routines
Using pre-built containers for prediction
Using custom containers for prediction
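The Predictor contract recommended by the correct answer can be sketched in plain Python. This is a minimal stand-alone illustration of the load/preprocess/predict/postprocess flow; in the real SDK you would subclass google.cloud.aiplatform.prediction.predictor.Predictor, and the scaling constant, artifact URI, and stand-in model below are hypothetical.

```python
# Stand-alone sketch of a Vertex AI custom prediction routine (CPR)
# Predictor. The real base class lives in the google-cloud-aiplatform SDK;
# it is omitted here so the sketch runs anywhere.

class MyCprPredictor:
    def load(self, artifacts_uri: str) -> None:
        # In a real CPR, deserialize the trained model from artifacts_uri
        # (a Cloud Storage path). A trivial stand-in model is used here.
        self._model = lambda rows: [sum(row) for row in rows]

    def preprocess(self, prediction_input: dict) -> list:
        # Example serving-time transformation: scale raw feature values.
        instances = prediction_input["instances"]
        return [[v / 10.0 for v in row] for row in instances]

    def predict(self, instances: list) -> list:
        return self._model(instances)

    def postprocess(self, prediction_results: list) -> dict:
        # Wrap raw scores in the response shape the endpoint returns.
        return {"predictions": [round(r, 3) for r in prediction_results]}
```

A request flows through the three methods in order, so pre- and postprocessing run at serving time without any changes to the calling application:

```python
predictor = MyCprPredictor()
predictor.load("gs://my-bucket/model/")  # hypothetical artifact URI
response = predictor.postprocess(
    predictor.predict(predictor.preprocess({"instances": [[10, 20], [30, 40]]}))
)
```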
Question # 6
You are an ML engineer on an agricultural research team working on a crop disease detection tool to
detect leaf rust spots in images of crops to determine the presence of a disease. These spots, which
can vary in shape and size, are correlated to the severity of the disease. You want to develop a
solution that predicts the presence and severity of the disease with high accuracy. What should you
do?
A. Create an object detection model that can localize the rust spots.
B. Develop an image segmentation ML model to locate the boundaries of the rust spots.
C. Develop a template matching algorithm using traditional computer vision libraries.
D. Develop an image classification ML model to predict the presence of the disease.
Answer: B
Explanation:
The best option for developing a solution that predicts the presence and severity of the disease with
high accuracy is to develop an image segmentation ML model to locate the boundaries of the rust
spots. Image segmentation is a technique that partitions an image into multiple regions, each
corresponding to a different object or semantic category. Image segmentation can be used to detect
and localize the rust spots in the images of crops, and measure their shape and size. This information
can then be used to determine the presence and severity of the disease, as the rust spots are
correlated to the disease symptoms. Image segmentation can also handle the variability of the rust
spots, as it does not rely on predefined templates or thresholds. Image segmentation can be
implemented using deep learning models, such as U-Net, Mask R-CNN, or DeepLab, which can learn
from large-scale datasets and achieve high accuracy and robustness. The other options are not as
suitable for developing a solution that predicts the presence and severity of the disease with high
accuracy, because:
Creating an object detection model that can localize the rust spots would only provide the bounding
boxes of the rust spots, not their exact boundaries. This would result in less precise measurements of
the shape and size of the rust spots, and might affect the accuracy of the disease prediction. Object
detection models are also more complex and computationally expensive than image segmentation
models, as they have to perform both classification and localization tasks.
Developing a template matching algorithm using traditional computer vision libraries would require
manually designing and selecting the templates for the rust spots, which might not capture the
diversity and variability of the rust spots. Template matching algorithms are also sensitive to noise,
occlusion, rotation, and scale changes, and might fail to detect the rust spots in different scenarios.
Template matching algorithms are also less accurate and robust than deep learning models, as they
do not learn from data.
Developing an image classification ML model to predict the presence of the disease would only
provide a binary or categorical output, not the location or severity of the disease. Image
classification models are also less informative and interpretable than image segmentation models, as
they do not provide any spatial information or visual explanation for the prediction. Image
classification models might also suffer from class imbalance or mislabeling issues, as the presence of
the disease might not be consistent or clear across the images.
Reference:
Image Segmentation | Computer Vision | Google Developers
Crop diseases and pests detection based on deep learning: a review | Plant Methods | Full Text
Using Deep Learning for Image-Based Plant Disease Detection
Computer Vision, IoT and Data Fusion for Crop Disease Detection Using ¦
On Using Artificial Intelligence and the Internet of Things for Crop ¦
Crop Disease Detection Using Machine Learning and Computer Vision
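Once a segmentation model such as U-Net has produced a binary mask of rust-spot pixels, the spot area can be measured directly, which is what makes segmentation suitable for severity estimation. The sketch below is illustrative: the severity thresholds and the function name are assumptions, not values from the source.

```python
import numpy as np

def severity_from_mask(mask: np.ndarray) -> dict:
    """Derive presence and a coarse severity level from a binary rust-spot mask.

    mask: 2-D array where 1 marks a rust-spot pixel (hypothetical model output).
    """
    affected = float(mask.sum()) / mask.size  # fraction of the leaf covered
    present = bool(mask.any())
    # Illustrative thresholds; a real tool would calibrate these on labeled data.
    if affected >= 0.20:
        level = "high"
    elif affected >= 0.05:
        level = "medium"
    elif present:
        level = "low"
    else:
        level = "none"
    return {"present": present, "affected_fraction": affected, "severity": level}
```

A bounding-box detector could only approximate `affected_fraction`, since boxes overestimate the area of irregular spots; per-pixel masks give it exactly.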
Question # 7
You recently deployed a pipeline in Vertex AI Pipelines that trains and pushes a model to a Vertex AI endpoint to serve real-time traffic. You need to continue experimenting and iterating on your pipeline to improve model performance. You plan to use Cloud Build for CI/CD. You want to quickly and easily deploy new pipelines into production, and you want to minimize the chance that the new pipeline implementations will break in production. What should you do?
A. Set up a CI/CD pipeline that builds and tests your source code. If the tests are successful, use the Google Cloud console to upload the built container to Artifact Registry and upload the compiled pipeline to Vertex AI Pipelines.
B. Set up a CI/CD pipeline that builds your source code and then deploys built artifacts into a pre-production environment. Run unit tests in the pre-production environment. If the tests are successful, deploy the pipeline to production.
C. Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, deploy the pipeline to production.
D. Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, rebuild the source code, and deploy the artifacts to production.
Answer: C
Explanation:
The best option for continuing experimenting and iterating on your pipeline to improve model
performance, using Cloud Build for CI/CD, and deploying new pipelines into production quickly and
easily, is to set up a CI/CD pipeline that builds and tests your source code and then deploys built
artifacts into a pre-production environment. After a successful pipeline run in the pre-production
environment, deploy the pipeline to production. This option allows you to leverage the power and
simplicity of Cloud Build to automate, monitor, and manage your pipeline development and
deployment workflow. Cloud Build is a service that can create and run continuous integration and
continuous delivery (CI/CD) pipelines on Google Cloud. Cloud Build can build your source code, run
unit tests, and deploy built artifacts to various Google Cloud services, such as Vertex AI Pipelines,
Vertex AI Endpoints, and Artifact Registry. A CI/CD pipeline is a workflow that can automate the
process of building, testing, and deploying software. A CI/CD pipeline can help you improve the
quality and reliability of your software, accelerate the development and delivery cycle, and reduce
the manual effort and errors. A pre-production environment is an environment that can simulate the
production environment, but is isolated from the real users and data. A pre-production environment
can help you test and validate your software before deploying it to production, and catch any bugs or
issues that may affect the user experience or the system performance. By setting up a CI/CD pipeline
that builds and tests your source code and then deploys built artifacts into a pre-production
environment, you can ensure that your pipeline code is consistent and error-free, and that your
pipeline artifacts are compatible and functional. After a successful pipeline run in the pre-production
environment, you can deploy the pipeline to production, which is the environment where your
software is accessible and usable by the real users and data. By deploying the pipeline to production
after a successful pipeline run in the pre-production environment, you can minimize the chance that
the new pipeline implementations will break in production, and ensure that your software meets the
user expectations and requirements1.
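The flow described above can be sketched as a single Cloud Build configuration. This is an illustrative cloudbuild.yaml only: the Artifact Registry path, the run_pipeline.py script, and its flags are hypothetical, but the structure shows the key property of option C, namely that Cloud Build stops at the first failing step, so production is reached only after the pre-production pipeline run succeeds.

```yaml
# Hypothetical cloudbuild.yaml sketching option C.
steps:
  # 1. Build the pipeline container and run unit tests inside it.
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "us-docker.pkg.dev/$PROJECT_ID/ml/pipeline:$SHORT_SHA", "."]
  - name: us-docker.pkg.dev/$PROJECT_ID/ml/pipeline:$SHORT_SHA
    entrypoint: pytest
    args: ["tests/"]
  # 2. Push the tested artifact, then run the compiled pipeline end to end
  #    in the pre-production environment.
  - name: gcr.io/cloud-builders/docker
    args: ["push", "us-docker.pkg.dev/$PROJECT_ID/ml/pipeline:$SHORT_SHA"]
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: python
    args: ["run_pipeline.py", "--project=$PROJECT_ID", "--env=preprod"]
  # 3. Promote the SAME built artifact to production only after the
  #    pre-production run succeeded (earlier step failures abort the build).
  - name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: python
    args: ["run_pipeline.py", "--project=$PROJECT_ID", "--env=prod"]
images:
  - us-docker.pkg.dev/$PROJECT_ID/ml/pipeline:$SHORT_SHA
```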
The other options are not as good as option C, for the following reasons:
Option A: Setting up a CI/CD pipeline that builds and tests your source code, and if the tests are
successful, using the Google Cloud console to upload the built container to Artifact Registry and
upload the compiled pipeline to Vertex AI Pipelines would not allow you to deploy new pipelines into
production quickly and easily, and could increase the manual effort and errors. The Google Cloud
console is a web-based user interface that can help you access and manage various Google Cloud
services, such as Artifact Registry and Vertex AI Pipelines. Artifact Registry is a service that can store
and manage your container images and other artifacts on Google Cloud. Artifact Registry can help
you upload and organize your container images, and track the image versions and metadata. Vertex
AI Pipelines is a service that can orchestrate machine learning workflows using Vertex AI. Vertex AI
Pipelines can run preprocessing and training steps on custom Docker images, and evaluate, deploy,
and monitor the machine learning model. However, this approach relies on manual uploads through
the Google Cloud console, which slows deployment and increases the opportunity for manual errors.
Moreover, this option would not use a pre-production environment to test and validate your pipeline
before deploying it to production, which could increase the chance that the new pipeline
implementations will break in production.
Option B: Setting up a CI/CD pipeline that builds your source code and then deploys built artifacts
into a pre-production environment, running unit tests in the pre-production environment, and if the
tests are successful, deploying the pipeline to production would not allow you to test and validate
your pipeline before deploying it to production, and could cause errors or poor performance. A unit
test is a type of test that can verify the functionality and correctness of a small and isolated unit of
code, such as a function or a class. A unit test can help you debug and improve your code quality, and
catch any bugs or issues that may affect the code logic or output. However, running only unit tests in
the pre-production environment does not validate the pipeline itself before deploying it to
production, and could therefore cause errors or poor performance. Because this option never runs
the full pipeline in the pre-production environment, it cannot verify the pipeline's end-to-end
functionality and compatibility, or catch bugs or issues that affect the pipeline workflow or output.
Option D: Setting up a CI/CD pipeline that builds and tests your source code and then deploys built
artifacts into a pre-production environment, after a successful pipeline run in the pre-production
environment, rebuilding the source code, and deploying the artifacts to production would not allow
you to deploy new pipelines into production quickly and easily, and could increase the complexity
and cost of the pipeline development and deployment. Rebuilding the source code is a process that
can recompile and repackage the source code into executable artifacts, such as container images and
pipeline files. Rebuilding the source code can help you incorporate any changes or updates that may
have occurred in the source code, and ensure that the artifacts are consistent and up-to-date.
However, rebuilding the source code after the pre-production run adds complexity and cost without
adding safety: the artifacts deployed to production are no longer the exact artifacts that were
validated in pre-production, and the rebuild itself can be a time-consuming and resource-intensive
process that slows pipeline development and deployment.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 3: MLOps
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production, 3.2 Automating ML workflows
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:
Production ML Systems, Section 6.4: Automating ML Workflows
Cloud Build
Vertex AI Pipelines
Artifact Registry
Pre-production environment
Question # 8
While performing exploratory data analysis on a dataset, you find that an important categorical
feature has 5% null values. You want to minimize the bias that could result from the missing values.
How should you handle the missing values?
A. Remove the rows with missing values, and upsample your dataset by 5%.
B. Replace the missing values with the feature's mean.
C. Replace the missing values with a placeholder category indicating a missing value.
D. Move the rows with missing values to your validation dataset.
Answer: C
Explanation:
The best option for handling missing values in a categorical feature is to replace them with a
placeholder category indicating a missing value. This is a type of imputation, which is a method of
estimating the missing values based on the observed data. Imputing the missing values with a
placeholder category preserves the information that the data is missing, and avoids introducing bias
or distortion in the feature distribution. It also allows the machine learning model to learn from the
missingness pattern, and potentially use it as a predictor for the target variable. The other options
are not suitable for handling missing values in a categorical feature, because:
Removing the rows with missing values and upsampling the dataset by 5% would reduce the size of
the dataset and potentially lose important information. It would also introduce sampling bias and
overfitting, as the upsampling process would create duplicate or synthetic observations that do not
reflect the true population.
Replacing the missing values with the feature's mean would not make sense for a categorical feature,
as the mean is a numerical measure that does not capture the mode or frequency of the categories.
It would also create a new category that does not exist in the original data, and might confuse the
machine learning model.
Moving the rows with missing values to the validation dataset would compromise the validity and
reliability of the model evaluation, as the validation dataset would not be representative of the test
or production data. It would also reduce the amount of data available for training the model, and
might introduce leakage or inconsistency between the training and validation datasets.
Reference:
Imputation of missing values
Effective Strategies to Handle Missing Values in Data Analysis
How to Handle Missing Values of Categorical Variables?
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
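The placeholder-category imputation from option C is a one-liner in pandas. The column and category names below are illustrative, not from the question.

```python
import pandas as pd

# Hypothetical categorical feature with ~5% nulls.
df = pd.DataFrame({"soil_type": ["clay", None, "loam", "sand", None]})

# Option C: replace nulls with an explicit "missing" category, preserving
# the missingness signal for the model instead of dropping or averaging it.
df["soil_type"] = df["soil_type"].fillna("missing")
```

The model can now treat "missing" as its own level, so a missingness pattern that correlates with the target remains learnable.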
Question # 9
You work for a bank with strict data governance requirements. You recently implemented a custom model to detect fraudulent transactions. You want your training code to download internal data by using an API endpoint hosted in your project's network. You need the data to be accessed in the most secure way, while mitigating the risk of data exfiltration. What should you do?
A. Enable VPC Service Controls for peerings, and add Vertex AI to a service perimeter.
B. Create a Cloud Run endpoint as a proxy to the data. Use Identity and Access Management (IAM) authentication to secure access to the endpoint from the training job.
C. Configure VPC Peering with Vertex AI and specify the network of the training job.
D. Download the data to a Cloud Storage bucket before calling the training job.
Answer: A
Explanation:
The best option for accessing internal data in the most secure way, while mitigating the risk of data
exfiltration, is to enable VPC Service Controls for peerings, and add Vertex AI to a service perimeter.
This option allows you to leverage the power and simplicity of VPC Service Controls to isolate and
protect your data and services on Google Cloud. VPC Service Controls is a service that can create a
secure perimeter around your Google Cloud resources, such as BigQuery, Cloud Storage, and Vertex
AI. VPC Service Controls can help you prevent unauthorized access and data exfiltration from your
perimeter, and enforce fine-grained access policies based on context and identity. Peerings are
connections that can allow traffic to flow between different networks. Peerings can help you connect
your Google Cloud network with other Google Cloud networks or external networks, and enable
communication between your resources and services. By enabling VPC Service Controls for peerings,
you can allow your training code to download internal data by using an API endpoint hosted in your
project's network, and restrict the data transfer to only authorized networks and services. Vertex AI
is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex
AI can support various types of models, such as linear regression, logistic regression, k-means
clustering, matrix factorization, and deep neural networks. Vertex AI can also provide various tools
and services for data analysis, model development, model deployment, model monitoring, and
model governance. By adding Vertex AI to a service perimeter, you can isolate and protect your
Vertex AI resources, such as models, endpoints, pipelines, and feature store, and prevent data
exfiltration from your perimeter1.
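Adding Vertex AI to a service perimeter can be sketched with one gcloud command. This is a configuration sketch, not a full recipe: the perimeter name, project number, and policy ID are hypothetical, and creating a perimeter also requires an existing Access Context Manager policy and organization-level permissions.

```shell
# Hypothetical names; restricts the Vertex AI API (aiplatform.googleapis.com)
# inside a VPC Service Controls perimeter so data cannot leave the boundary.
gcloud access-context-manager perimeters create ml_perimeter \
  --title="ml-perimeter" \
  --resources=projects/123456789 \
  --restricted-services=aiplatform.googleapis.com \
  --policy=POLICY_ID
```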
The other options are not as good as option A, for the following reasons:
Option B: Creating a Cloud Run endpoint as a proxy to the data, and using Identity and Access
Management (IAM) authentication to secure access to the endpoint from the training job would
require more skills and steps than enabling VPC Service Controls for peerings, and adding Vertex AI
to a service perimeter. Cloud Run is a service that can run your stateless containers on a fully
managed environment or on your own Google Kubernetes Engine cluster. Cloud Run can help you
deploy and scale your containerized applications quickly and easily, and pay only for the resources
you use. A Cloud Run endpoint is a URL that can expose your containerized application to the
internet or to other Google Cloud services. A Cloud Run endpoint can help you access and invoke
your application from anywhere, and handle the load balancing and traffic routing. A proxy is a server
that can act as an intermediary between a client and a target server. A proxy can help you modify,
filter, or redirect the requests and responses between the client and the target server, and provide
additional functionality or security. IAM is a service that can manage access control for Google Cloud
resources. IAM can help you define who (identity) has what access (role) to which resource, and
enforce the access policies. By creating a Cloud Run endpoint as a proxy to the data, and using IAM
authentication to secure access to the endpoint from the training job, you can access internal data by
using an API endpoint hosted in your projects network, and restrict the data access to only
authorized identities and roles. However, this approach requires more skills and steps than enabling
VPC Service Controls: you would need to write the proxy logic, then create, configure, deploy, and
monitor the Cloud Run endpoint, and set up the IAM policies. Moreover, it would not prevent data
exfiltration from your network, as the Cloud Run endpoint can be reached from outside your
network.
Option C: Configuring VPC Peering with Vertex AI and specifying the network of the training job
would not allow you to access internal data by using an API endpoint hosted in your project's
network, and could cause errors or poor performance. VPC Peering is a service that can create a
peering connection between two VPC networks. VPC Peering can help you connect your Google
Cloud network with another Google Cloud network or an external network, and enable
communication between your resources and services. By configuring VPC Peering with Vertex AI and
specifying the network of the training job, you can allow your training code to access Vertex AI
resources, such as models, endpoints, pipelines, and feature store, and use the same network for the
training job. However, VPC Peering alone does not secure access to an API endpoint hosted in your
project's network, and could cause errors or poor performance. Moreover, it would not isolate and
protect your data and services on Google Cloud, as a peering connection can expose your network to
other networks and services.
Option D: Downloading the data to a Cloud Storage bucket before calling the training job would not
allow you to access internal data by using an API endpoint hosted in your project's network, and
could increase the complexity and cost of the data access. Cloud Storage is a service that can store
and manage your data on Google Cloud. Cloud Storage can help you upload and organize your data,
and track the data versions and metadata. A Cloud Storage bucket is a container that can hold your
data on Cloud Storage. A Cloud Storage bucket can help you store and access your data from
anywhere, and provide various storage classes and options. By downloading the data to a Cloud
Storage bucket before calling the training job, you can access the data from Cloud Storage, and use it
as the input for the training job. However, this approach bypasses the API endpoint hosted in your
project's network, and increases the complexity and cost of data access. It also creates an
intermediate copy of the data on Cloud Storage, which adds storage and transfer costs and widens
the surface for unauthorized access or data exfiltration.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 1: Data Engineering
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 1: Framing ML problems,
1.2 Defining data needs
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 2: Data
Engineering, Section 2.2: Defining Data Needs
VPC Service Controls
Cloud Run
VPC Peering
Cloud Storage
Question # 10
You are training an object detection model using a Cloud TPU v2. Training time is taking longer than
expected. Based on this simplified trace obtained with a Cloud TPU profile, what action should you
take to decrease training time in a cost-efficient way?
A. Move from Cloud TPU v2 to Cloud TPU v3 and increase batch size.
B. Move from Cloud TPU v2 to 8 NVIDIA V100 GPUs and increase batch size.
C. Rewrite your input function to resize and reshape the input images.
D. Rewrite your input function using parallel reads, parallel processing, and prefetch.
Answer: D
Explanation:
The trace in the question shows that training time is taking longer than expected, most likely
because the input function is not optimized and the TPU sits idle waiting for data. To decrease
training time in a cost-efficient way, the best option is to rewrite the input function using parallel
reads, parallel processing, and prefetch. This overlaps data loading and preprocessing with
accelerator computation, so the model processes data more efficiently and training time decreases
without moving to more expensive hardware.
Reference:
[Cloud TPU Performance Guide]
[Data input pipeline performance guide]
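The three optimizations from option D map directly onto tf.data transformations. The sketch below uses an in-memory range as a stand-in for TFRecord shards so it runs anywhere; with real data you would start from something like `tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)` for parallel reads.

```python
import tensorflow as tf

def make_dataset():
    # Stand-in source; a real job would read TFRecord shards with
    # num_parallel_reads=tf.data.AUTOTUNE (parallel reads).
    ds = tf.data.Dataset.range(8)
    # Parallel processing: decode/augment records on multiple threads.
    ds = ds.map(lambda x: x * 2, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.batch(4)
    # Prefetch: overlap host-side preprocessing with TPU computation,
    # so the accelerator is not left idle between steps.
    ds = ds.prefetch(tf.data.AUTOTUNE)
    return ds
```

With sharded files, `interleave(..., num_parallel_calls=tf.data.AUTOTUNE)` across shards gives the same parallel-read benefit.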
Question # 11
You are deploying a new version of a model to a production Vertex AI endpoint that is serving traffic. You plan to direct all user traffic to the new model. You need to deploy the model with minimal disruption to your application. What should you do?
A. 1. Create a new endpoint. 2. Create a new model, set it as the default version, and upload the model to Vertex AI Model Registry. 3. Deploy the new model to the new endpoint. 4. Update Cloud DNS to point to the new endpoint.
B. 1. Create a new endpoint. 2. Create a new model, set the parentModel parameter to the model ID of the currently deployed model, set it as the default version, and upload the model to Vertex AI Model Registry. 3. Deploy the new model to the new endpoint and set the new model to 100% of the traffic.
C. 1. Create a new model, set the parentModel parameter to the model ID of the currently deployed model, and upload the model to Vertex AI Model Registry. 2. Deploy the new model to the existing endpoint and set the new model to 100% of the traffic.
D. 1. Create a new model, set it as the default version, and upload the model to Vertex AI Model Registry. 2. Deploy the new model to the existing endpoint.
Answer: C
Explanation:
The best option is C: create a new model with the parentModel parameter set to the model ID of the currently deployed model, upload it to Vertex AI Model Registry, deploy it to the existing endpoint, and route 100% of the traffic to it. This updates the model version and keeps serving online predictions with low latency and minimal disruption. Vertex AI is a unified platform for building and deploying machine learning solutions on Google Cloud; it can deploy a trained model to an online prediction endpoint that provides low-latency predictions for individual instances. A model is a resource that represents a machine learning model you can use for prediction. A model can have one or more versions, which are different implementations of the same model with different parameters, code, or data; versions let you experiment, iterate, and improve performance and accuracy. The parentModel parameter specifies the model ID that the new version is based on, so the new version inherits the settings and metadata of the existing model and you avoid duplicating the model configuration. Vertex AI Model Registry stores and manages your models and tracks their versions and metadata. An endpoint is a resource that provides the service URL you use to request predictions; it can host one or more deployed models, which are instances of model versions backed by physical resources that serve online predictions with low latency and scale with traffic. Because the new version is deployed to the existing endpoint and then given 100% of the traffic, users are switched to the new model without any change to the endpoint URL, so the application is not disrupted1.
The other options are not as good as option C, for the following reasons:
Option A: Creating a new endpoint, deploying the new model there, and updating Cloud DNS to point to it involves more steps and more moving parts than deploying to the existing endpoint. Cloud DNS is a reliable, scalable Domain Name System (DNS) service on Google Cloud that manages DNS records and resolves domain names to IP addresses, so updating it would redirect user traffic to the new endpoint without breaking the existing application. However, you would still need to create and configure the new endpoint and the new model, upload the model to Vertex AI Model Registry, deploy it, and update the DNS records, and DNS changes take time to propagate, during which some clients may still reach the old endpoint. This option also leaves you with an extra endpoint to maintain, which increases management costs2.
Option B: Setting the parentModel parameter is correct here, because the new model then inherits the settings and metadata of the existing model, and marking it as the default version means it is used for prediction whenever no version is specified, which simplifies prediction requests. However, this option still deploys the new model to a new endpoint, so you would need to create and configure that endpoint and repoint your application at it. Compared with deploying to the existing endpoint and shifting 100% of the traffic, this adds steps, risks disrupting the application, and leaves an extra endpoint to maintain, which increases management costs2.
Option D: Creating a new model without setting the parentModel parameter registers it as an unrelated model rather than as a new version of the existing one, so it does not inherit the settings and metadata of the currently deployed model and can cause inconsistencies or conflicts between model versions. Setting it as the default version only removes the need to specify a version in prediction requests; it does not link the new model to the old one. This option also omits the explicit step of routing 100% of the traffic to the new model, which risks leaving user traffic on the old deployment2.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 2: Serving ML Predictions
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production, 3.1 Deploying ML models to production
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 6:
Production ML Systems, Section 6.2: Serving ML Predictions
Vertex AI
Cloud DNS
Question # 12
You manage a team of data scientists who use a cloud-based backend system to submit training jobs.
This system has become very difficult to administer, and you want to use a managed service instead.
The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano,
scikit-learn, and custom libraries. What should you do?
A. Use Vertex AI Training to submit training jobs using any framework.
B. Configure Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob.
C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.
Answer: A
Explanation:
The best option for using a managed service to submit training jobs with different frameworks is to
use Vertex AI Training. Vertex AI Training is a fully managed service that allows you to train custom
models on Google Cloud using any framework, such as TensorFlow, PyTorch, scikit-learn, XGBoost,
etc. You can also use custom containers to run your own libraries and dependencies. Vertex AI
Training handles the infrastructure provisioning, scaling, and monitoring for you, so you can focus on
your model development and optimization. Vertex AI Training also integrates with other Vertex AI
services, such as Vertex AI Pipelines, Vertex AI Experiments, and Vertex AI Prediction. The other
options are not as suitable for using a managed service to submit training jobs with different
frameworks, because:
Configuring Kubeflow to run on Google Kubernetes Engine and submit training jobs through TFJob
would require more infrastructure maintenance, as Kubeflow is not a fully managed service, and you
would have to provision and manage your own Kubernetes cluster. This would also incur more costs,
as you would have to pay for the cluster resources, regardless of the training job usage. TFJob is also
mainly designed for TensorFlow models, and might not support other frameworks as well as Vertex
AI Training.
Creating a library of VM images on Compute Engine, and publishing these images on a centralized
repository would require more development time and effort, as you would have to create and
maintain different VM images for different frameworks and libraries. You would also have to
manually configure and launch the VMs for each training job, and handle the scaling and monitoring
yourself. This would not leverage the benefits of a managed service, such as Vertex AI Training.
Setting up Slurm workload manager to receive jobs that can be scheduled to run on your cloud
infrastructure would require more configuration and administration, as Slurm is not a native Google
Cloud service, and you would have to install and manage it on your own VMs or clusters. Slurm is
also a general-purpose workload manager, and might not have the same level of integration and
optimization for ML frameworks and libraries as Vertex AI Training.
Reference:
Vertex AI Training | Google Cloud
Kubeflow on Google Cloud | Google Cloud
TFJob for training TensorFlow models with Kubernetes | Kubeflow
Compute Engine | Google Cloud
Slurm Workload Manager
Question # 13
You are training an ML model on a large dataset. You are using a TPU to accelerate the training process. You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?
A. Increase the learning rate
B. Increase the number of epochs
C. Decrease the learning rate
D. Increase the batch size
Answer: D
Explanation:
The best option for training an ML model on a large dataset, using a TPU to accelerate the training
process, and discovering that the TPU is not reaching its full capacity, is to increase the batch size.
This option allows you to leverage the power and simplicity of TPUs to train your model faster and
more efficiently. A TPU is a custom-developed application-specific integrated circuit (ASIC) that can
accelerate machine learning workloads. A TPU can provide high performance and scalability for
various types of models, such as linear regression, logistic regression, k-means clustering, matrix
factorization, and deep neural networks. A TPU can also support various tools and frameworks, such
as TensorFlow, PyTorch, and JAX. A batch size is a parameter that specifies the number of training
examples in one forward/backward pass. A batch size can affect the speed and accuracy of the
training process. A larger batch size can help you utilize the parallel processing power of the TPU, and
reduce the communication overhead between the TPU and the host CPU. A larger batch size also
reduces the variance of the gradient updates, which makes each training step more stable. By
increasing the batch size, you can train your model on a large dataset faster and more efficiently,
and make full use of the TPU capacity1.
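The relationship between batch size, step count, and accelerator utilization can be sketched with a toy cost model. All timing numbers below are invented for illustration; real TPU profiling should use the Cloud TPU performance tools rather than this arithmetic.

```python
# Illustrative sketch (not a real TPU benchmark): model each training step as a
# fixed host<->accelerator overhead plus compute time proportional to batch size,
# and see how batch size changes utilization. All numbers are invented.

def epoch_stats(num_examples, batch_size, overhead_s=0.010, compute_per_example_s=0.0001):
    """Return (steps_per_epoch, utilization) under the toy cost model."""
    steps = num_examples // batch_size               # fewer steps with larger batches
    compute = batch_size * compute_per_example_s     # useful work per step
    utilization = compute / (compute + overhead_s)   # fraction of step doing compute
    return steps, utilization

for bs in (64, 512, 4096):
    steps, util = epoch_stats(num_examples=1_000_000, batch_size=bs)
    print(f"batch={bs:5d}  steps/epoch={steps:6d}  utilization={util:.0%}")
```

Under this model, quadrupling the batch size shrinks the number of steps and raises the fraction of each step spent on useful compute, which is why an underutilized TPU often benefits from a larger batch.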
The other options are not as good as option D, for the following reasons:
Option A: Increasing the learning rate would not help you utilize the parallel processing power of the
TPU, and could cause errors or poor performance. A learning rate is a parameter that controls how
much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the
training process. A larger learning rate can help you converge faster, but it can also cause instability,
divergence, or oscillation. By increasing the learning rate, you may not be able to find the optimal
solution, and your model may perform poorly on the validation or test data2.
Option B: Increasing the number of epochs would not help you utilize the parallel processing power
of the TPU, and could increase the complexity and cost of the training process. An epoch is a measure
of the number of times all of the training examples are used once in the training process. An epoch
can affect the speed and accuracy of the training process. A larger number of epochs can help you
learn more from the data, but it can also cause overfitting, underfitting, or diminishing returns. By
increasing the number of epochs, you may not be able to improve the model performance
significantly, and your training process may take longer and consume more resources3.
Option C: Decreasing the learning rate would not help you utilize the parallel processing power of the
TPU, and could slow down the training process. A learning rate is a parameter that controls how
much the model is updated in each iteration. A learning rate can affect the speed and accuracy of the
training process. A smaller learning rate can help you find a more precise solution, but it can also
cause slow convergence or local minima. By decreasing the learning rate, you may not be able to
reach the optimal solution in a reasonable time, and your training process may take longer2.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 2: ML Models and
Architectures, Week 1: Introduction to ML Models and Architectures
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 2: Architecting ML
solutions, 2.1 Designing ML models
Official Google Cloud Certified Professional Machine Learning Engineer Study Guide, Chapter 4: ML
Models and Architectures, Section 4.1: Designing ML Models
Use TPUs
Cloud TPU performance guide
Google TPU: Architecture and Performance Best Practices - Run
Question # 14
You are an ML engineer responsible for designing and implementing training pipelines for ML
models. You need to create an end-to-end training pipeline for a TensorFlow model. The TensorFlow
model will be trained on several terabytes of structured data. You need the pipeline to include data
quality checks before training and model quality checks after training but prior to deployment. You
want to minimize development time and the need for infrastructure maintenance. How should you
build and orchestrate your training pipeline?
A. Create the pipeline using Kubeflow Pipelines domain-specific language (DSL) and predefined Google Cloud components. Orchestrate the pipeline using Vertex AI Pipelines.
B. Create the pipeline using TensorFlow Extended (TFX) and standard TFX components. Orchestrate the pipeline using Vertex AI Pipelines.
C. Create the pipeline using Kubeflow Pipelines domain-specific language (DSL) and predefined Google Cloud components. Orchestrate the pipeline using Kubeflow Pipelines deployed on Google Kubernetes Engine.
D. Create the pipeline using TensorFlow Extended (TFX) and standard TFX components. Orchestrate the pipeline using Kubeflow Pipelines deployed on Google Kubernetes Engine.
Answer: B
Explanation:
The best option for creating and orchestrating an end-to-end training pipeline for a TensorFlow
model is to use TensorFlow Extended (TFX) and standard TFX components, and deploy the pipeline to
Vertex AI Pipelines. TFX is an end-to-end platform for deploying production ML pipelines, which
consists of several built-in components that cover the entire ML lifecycle, from data ingestion and
validation, to model training and evaluation, to model deployment and monitoring. TFX also
supports custom components and integrations with other Google Cloud services, such as BigQuery,
Dataflow, and Cloud Storage. Vertex AI Pipelines is a fully managed service that allows you to run TFX
pipelines on Google Cloud, without having to worry about infrastructure provisioning, scaling, or
maintenance. Vertex AI Pipelines also provides a user-friendly interface to monitor and manage your
pipelines, as well as tools to track and compare experiments. The other options are not as suitable
for creating and orchestrating an end-to-end training pipeline for a TensorFlow model, because:
Creating the pipeline using Kubeflow Pipelines domain-specific language (DSL) and predefined
Google Cloud components would require more development time and effort, as Kubeflow Pipelines
DSL is not as expressive or compatible with TensorFlow as TFX. Predefined Google Cloud components
might not cover all the stages of the ML lifecycle, and might not be optimized for TensorFlow models.
Orchestrating the pipeline using Kubeflow Pipelines deployed on Google Kubernetes Engine would
require more infrastructure maintenance, as Kubeflow Pipelines is not a fully managed service, and
you would have to provision and manage your own Kubernetes cluster. This would also incur more
costs, as you would have to pay for the cluster resources, regardless of the pipeline usage.
Reference:
TFX | ML Production Pipelines | TensorFlow
Vertex AI Pipelines | Google Cloud
Kubeflow Pipelines | Google Cloud
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
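The pipeline shape the answer describes (data quality gate, training, model quality gate, then deployment) maps onto standard TFX components such as ExampleValidator and Evaluator. As a language-agnostic illustration of just the gating logic, here is a toy pure-Python sketch; the function names, thresholds, and "model" are invented stand-ins, not TFX APIs.

```python
# Toy sketch of the gated pipeline structure (NOT TFX): a data quality check blocks
# training, and a model quality check blocks deployment. Thresholds are invented.

def data_quality_check(rows):
    """Gate before training: reject the run if too many rows are malformed."""
    bad = sum(1 for r in rows if r.get("label") is None)
    return bad / len(rows) <= 0.05

def train(rows):
    # Stand-in for model training: "learn" the majority label.
    labels = [r["label"] for r in rows if r["label"] is not None]
    return max(set(labels), key=labels.count)

def model_quality_check(model, rows):
    """Gate before deployment: require a minimum accuracy."""
    correct = sum(1 for r in rows if r["label"] == model)
    return correct / len(rows) >= 0.6

def run_pipeline(rows):
    if not data_quality_check(rows):
        return "blocked: data quality"
    model = train(rows)
    if not model_quality_check(model, rows):
        return "blocked: model quality"
    return f"deployed model predicting {model!r}"

rows = [{"label": "churn"}] * 7 + [{"label": "stay"}] * 3
print(run_pipeline(rows))
```

In the real answer, TFX provides these gates as components and Vertex AI Pipelines runs them without any cluster for you to maintain.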
Question # 15
You are developing an ML model to predict house prices. While preparing the data, you discover that
an important predictor variable, distance from the closest school, is often missing and does not have
high variance. Every instance (row) in your data is important. How should you handle the missing
data?
A. Delete the rows that have missing values.
B. Apply feature crossing with another column that does not have missing values.
C. Predict the missing values using linear regression.
D. Replace the missing values with zeros.
Answer: C
Explanation:
The best option for handling missing data in this case is to predict the missing values using linear
regression. Linear regression is a supervised learning technique that can be used to estimate the
relationship between a continuous target variable and one or more predictor variables. In this case,
the target variable is the distance from the closest school, and the predictor variables are the other
features in the dataset, such as house size, location, number of rooms, etc. By fitting a linear
regression model on the data that has no missing values, we can then use the model to predict the
missing values for the distance from the closest school feature. This way, we can preserve all the
instances in the dataset and avoid introducing bias or reducing variance. The other options are not
suitable for handling missing data in this case, because:
Deleting the rows that have missing values would reduce the size of the dataset and potentially lose
important information. Since every instance is important, we want to keep as much data as possible.
Applying feature crossing with another column that does not have missing values would create a
new feature that combines the values of two existing features. This might increase the complexity of
the model and introduce noise or multicollinearity. It would not solve the problem of missing values,
as the new feature would still have missing values whenever the distance from the closest school
feature is missing.
Replacing the missing values with zeros would distort the distribution of the feature and introduce
bias. It would also imply that the houses with missing values are located at the same distance from
the closest school, which is unlikely to be true. A zero value might also be outside the range of the
feature, as the distance from the closest school is unlikely to be exactly zero for any
house.
Reference:
Linear Regression
Imputation of missing values
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
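The imputation approach in option C can be sketched in a few lines. This is a minimal single-predictor example with invented data (house size predicting school distance); a real pipeline would fit on all available predictors, for instance with scikit-learn.

```python
# Minimal sketch of option C: fit a linear regression on rows where the feature is
# present, then predict it for rows where it is missing. One predictor for brevity.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# (house_size_sqm, distance_to_school_km); None marks a missing distance.
rows = [(50, 1.0), (80, 1.6), (100, 2.0), (120, None), (150, 3.0)]

known = [(x, y) for x, y in rows if y is not None]
a, b = fit_line([x for x, _ in known], [y for _, y in known])
imputed = [(x, y if y is not None else a * x + b) for x, y in rows]
print(imputed)  # every row kept; the missing value is replaced by the estimate
```

Every instance survives, and the filled-in value follows the relationship learned from the complete rows instead of distorting the feature's distribution the way a constant zero would.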
Question # 16
You recently built the first version of an image segmentation model for a self-driving car. After
deploying the model, you observe a decrease in the area under the curve (AUC) metric. When
analyzing the video recordings, you also discover that the model fails in highly congested traffic but
works as expected when there is less traffic. What is the most likely reason for this result?
A. The model is overfitting in areas with less traffic and underfitting in areas with more traffic.
B. AUC is not the correct metric to evaluate this classification model.
C. Too much data representing congested areas was used for model training.
D. Gradients become small and vanish while backpropagating from the output to input nodes.
Answer: A
Explanation:
The most likely reason for the observed result is that the model is overfitting in areas with less traffic
and underfitting in areas with more traffic. Overfitting means that the model learns the specific
patterns and noise in the training data, but fails to generalize well to new and unseen data.
Underfitting means that the model is not able to capture the complexity and variability of the data,
and performs poorly on both training and test data. In this case, the model might have learned to
segment the images well when there is less traffic, but it might not have enough data or features to
handle the more challenging scenarios when there is more traffic. This could lead to a decrease in
the AUC metric, which measures the ability of the model to distinguish between different classes.
AUC is a suitable metric for this classification model, as it is not affected by class imbalance or
threshold selection. The other options are not likely to be the reason for the result, as they are not
related to the traffic density. Too much data representing congested areas would not cause the
model to fail in those areas, but rather help the model learn better. Gradients vanishing or exploding
is a problem that occurs during the training process, not after the deployment, and it affects the
whole model, not specific scenarios.
Reference:
Image Segmentation: U-Net For Self Driving Cars
Intelligent Semantic Segmentation for Self-Driving Vehicles Using Deep Learning
Sharing Pixelopolis, a self-driving car demo from Google I/O built with TensorFlow Lite
Google Cloud launches machine learning engineer certification
Google Professional Machine Learning Engineer Certification
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
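What AUC measures can be made concrete: it is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The scores below are invented; real pipelines would use a library routine such as scikit-learn's roc_auc_score.

```python
# AUC as a pairwise ranking probability: the chance that a random positive
# outranks a random negative (ties count as half). Scores are invented.

def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    pairs = [(p, n) for p in pos for n in neg]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0]
print(auc(scores, labels))
```

A drop in AUC after deployment, as in this question, means the model's scores separate the classes less cleanly on live data, which is consistent with underfitting the congested-traffic scenarios it saw too little of.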
Question # 17
You work for a company that is developing a new video streaming platform. You have been asked to
create a recommendation system that will suggest the next video for a user to watch. After a review
by an AI Ethics team, you are approved to start development. Each video asset in your company's
catalog has useful metadata (e.g., content type, release date, country), but you do not have any
historical user event data. How should you build the recommendation system for the first version of
the product?
A. Launch the product without machine learning. Present videos to users alphabetically, and start collecting user event data so you can develop a recommender model in the future.
B. Launch the product without machine learning. Use simple heuristics based on content metadata to recommend similar videos to users, and start collecting user event data so you can develop a recommender model in the future.
C. Launch the product with machine learning. Use a publicly available dataset such as MovieLens to train a model using the Recommendations AI, and then apply this trained model to your data.
D. Launch the product with machine learning. Generate embeddings for each video by training an autoencoder on the content metadata using TensorFlow. Cluster content based on the similarity of these embeddings, and then recommend videos from the same cluster.
Answer: B
Explanation:
The best option for building a recommendation system without any user event data is to use simple
heuristics based on content metadata. This is a type of content-based filtering, which recommends
items that are similar to the ones that the user has interacted with or selected, based on their
attributes. For example, if a user selects a comedy movie from the US released in 2020, the system
can recommend other comedy movies from the US released in 2020 or nearby years. This approach
does not require any machine learning, but it can leverage the existing metadata of the videos to
provide relevant recommendations. It also allows the system to start collecting user event data, such
as views, likes, ratings, etc., which can be used to train a more sophisticated machine learning model
in the future, such as a collaborative filtering model or a hybrid model that combines content and
collaborative information.
Reference:
Recommendation Systems
Content-Based Filtering
Collaborative Filtering
Hybrid Recommender Systems: A Systematic Literature Review
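A metadata-only heuristic like the one option B describes can be very small. The catalog, field names, and similarity rule below are invented for illustration; any scoring over content metadata works, since no user event data is required.

```python
# Minimal sketch of option B: a metadata-only heuristic recommender.
# Similarity is just the number of matching metadata fields. Catalog is invented.

CATALOG = {
    "v1": {"content_type": "comedy", "country": "US", "release_year": 2020},
    "v2": {"content_type": "comedy", "country": "US", "release_year": 2019},
    "v3": {"content_type": "drama",  "country": "US", "release_year": 2020},
    "v4": {"content_type": "comedy", "country": "FR", "release_year": 2018},
}

def similarity(a, b):
    return sum(1 for k in a if a[k] == b.get(k))

def recommend(just_watched, k=2):
    ref = CATALOG[just_watched]
    others = [v for v in CATALOG if v != just_watched]
    # Rank the rest of the catalog by metadata overlap with the watched video.
    return sorted(others, key=lambda v: similarity(ref, CATALOG[v]), reverse=True)[:k]

print(recommend("v1"))
```

Once the product is live, the views and ratings it collects become the training data for a later collaborative-filtering or hybrid model.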
Question # 18
One of your models is trained using data provided by a third-party data broker. The data broker does
not reliably notify you of formatting changes in the data. You want to make your model training
pipeline more robust to issues like this. What should you do?
A. Use TensorFlow Data Validation to detect and flag schema anomalies.
B. Use TensorFlow Transform to create a preprocessing component that will normalize data to the expected distribution, and replace values that don't match the schema with 0.
C. Use tf.math to analyze the data, compute summary statistics, and flag statistical anomalies.
D. Use custom TensorFlow functions at the start of your model training to detect and flag known formatting errors.
Answer: A
Explanation:
TensorFlow Data Validation (TFDV) is a library that helps you understand, validate, and monitor your
data for machine learning. It can automatically detect and report schema anomalies, such as missing
features, new features, or different data types, in your data. It can also generate descriptive statistics
and data visualizations to help you explore and debug your data. TFDV can be integrated with your
model training pipeline to ensure data quality and consistency throughout the machine learning
lifecycle.
Reference:
TensorFlow Data Validation
Data Validation | TensorFlow
Data Validation | Machine Learning Crash Course | Google Developers
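The kind of schema anomaly TFDV reports can be illustrated in miniature. This toy sketch is not the TFDV API; it only shows the concept of inferring a schema from known-good data and flagging records that deviate from it (missing features, new features, type changes). The record structure below is invented.

```python
# Toy illustration of the schema-anomaly idea behind TensorFlow Data Validation:
# infer feature names and types from good data, then flag deviating records.

def infer_schema(records):
    return {name: type(value) for name, value in records[0].items()}

def find_anomalies(record, schema):
    anomalies = []
    for name, expected in schema.items():
        if name not in record:
            anomalies.append(f"missing feature: {name}")
        elif not isinstance(record[name], expected):
            anomalies.append(f"type change: {name}")
    for name in record:
        if name not in schema:
            anomalies.append(f"new feature: {name}")
    return anomalies

schema = infer_schema([{"age": 34, "country": "US"}])
print(find_anomalies({"age": "34", "zip": "94040"}, schema))
```

TFDV does this at dataset scale with rich descriptive statistics and visualizations, which is why it is the right tool when an upstream broker silently changes formats.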
Question # 19
You have developed a BigQuery ML model that predicts customer churn and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?
A. 1. Enable request-response logging on Vertex AI Endpoints. 2. Schedule a TensorFlow Data Validation job to monitor prediction drift. 3. Execute model retraining if there is significant distance between the distributions.
B. 1. Enable request-response logging on Vertex AI Endpoints. 2. Schedule a TensorFlow Data Validation job to monitor training/serving skew. 3. Execute model retraining if there is significant distance between the distributions.
C. 1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift. 2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected. 3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
D. 1. Create a Vertex AI Model Monitoring job configured to monitor training/serving skew. 2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected. 3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.
Answer: C
Explanation:
The best option for automating the retraining of your model by using minimal additional code when
model feature values change, and minimizing the number of times that your model is retrained to
reduce training costs, is to create a Vertex AI Model Monitoring job configured to monitor prediction
drift, configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is
detected, and use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in
BigQuery. This option allows you to leverage the power and simplicity of Vertex AI, Pub/Sub, and
Cloud Functions to monitor your model performance and retrain your model when needed. Vertex AI
is a unified platform for building and deploying machine learning solutions on Google Cloud. Vertex
AI can deploy a trained model to an online prediction endpoint, which can provide low-latency
predictions for individual instances. Vertex AI can also provide various tools and services for data
analysis, model development, model deployment, model monitoring, and model governance. A
Vertex AI Model Monitoring job is a resource that can monitor the performance and quality of your
deployed models on Vertex AI. A Vertex AI Model Monitoring job can help you detect and diagnose
issues with your models, such as data drift, prediction drift, training/serving skew, or model
staleness. Prediction drift is a type of model monitoring metric that measures the difference
between the distributions of the predictions generated by the model on the training data and the predictions generated by the model on the online data. Prediction drift can indicate that the model performance is degrading, or that the online data is changing over time. By creating a Vertex AI Model Monitoring job configured to monitor prediction drift, you can track the changes in the model predictions, and compare them with the expected predictions.

Alert monitoring is a feature of Vertex AI Model Monitoring that can notify you when a monitoring metric exceeds a predefined threshold. Alert monitoring can help you set up rules and conditions for triggering alerts, and choose the notification channel for receiving alerts.

Pub/Sub is a service that can provide reliable and scalable messaging and event streaming on Google Cloud. Pub/Sub can help you publish and subscribe to messages, and deliver them to various Google Cloud services, such as Cloud Functions. A Pub/Sub queue is a resource that can hold messages that are published to a Pub/Sub topic. A Pub/Sub queue can help you store and manage messages, and ensure that they are delivered to the subscribers. By configuring alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected, you can send a notification to a Pub/Sub topic, and trigger a downstream action based on the alert.

Cloud Functions is a service that can run your stateless code in response to events on Google Cloud. Cloud Functions can help you create and execute functions without provisioning or managing servers, and pay only for the resources you use. A Cloud Function is a resource that can execute a piece of code in response to an event, such as a Pub/Sub message. A Cloud Function can help you perform various tasks, such as data processing, data transformation, or data analysis.

BigQuery is a service that can store and query large-scale data on Google Cloud. BigQuery can help you analyze your data by using SQL queries, and perform various tasks, such as data exploration, data transformation, or data visualization. BigQuery ML is a feature of BigQuery that can create and execute machine learning models in BigQuery by using SQL queries. BigQuery ML can help you build and train various types of models, such as linear regression, logistic regression, k-means clustering, matrix factorization, and deep neural networks.

By using a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery, you can automate the retraining of your model by using minimal additional code when model feature values change. You can write a Cloud Function that listens to the Pub/Sub queue, and executes a SQL query to retrain your model in BigQuery ML when a prediction drift alert is received. By retraining your model in BigQuery ML, you can update your model parameters and improve your model performance and accuracy.
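As a sketch of the glue code involved, the Cloud Function below decodes the Pub/Sub alert message and issues a BigQuery ML `CREATE OR REPLACE MODEL` statement to retrain the model. All project, dataset, table, and model names are hypothetical, and the exact alert payload and model options will vary with your setup, so treat this as a minimal illustration rather than a drop-in implementation:

```python
import base64
import json

# Hypothetical names for illustration -- substitute your own project/dataset.
BQML_MODEL = "my_project.my_dataset.churn_model"
TRAINING_TABLE = "my_project.my_dataset.training_data"


def build_retrain_query(model: str, table: str) -> str:
    """Build the BigQuery ML statement that retrains the model in place."""
    return (
        f"CREATE OR REPLACE MODEL `{model}` "
        "OPTIONS (model_type='logistic_reg', input_label_cols=['label']) AS "
        f"SELECT * FROM `{table}`"
    )


def on_drift_alert(event, context):
    """Cloud Function (Pub/Sub trigger) entry point.

    Vertex AI Model Monitoring publishes the alert to the configured
    Pub/Sub topic; this function decodes it and kicks off retraining
    in BigQuery ML.
    """
    alert = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    print(f"Received monitoring alert: {alert}")

    # Imported here so the module can be inspected without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query(build_retrain_query(BQML_MODEL, TRAINING_TABLE))
    job.result()  # block until the retraining query completes
```

Deployed with a Pub/Sub trigger on the alert topic, this closes the loop: drift alert in, retrained model out, with no manual steps in between.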
The other options are not as good as option C, for the following reasons:
Option A: Enabling request-response logging on Vertex AI Endpoints, scheduling a TensorFlow Data
Validation job to monitor prediction drift, and executing model retraining if there is significant
distance between the distributions would require more skills and steps than creating a Vertex AI
Model Monitoring job configured to monitor prediction drift, configuring alert monitoring to publish
a message to a Pub/Sub queue when a monitoring alert is detected, and using a Cloud Function to
monitor the Pub/Sub queue, and trigger retraining in BigQuery. Request-response logging is a
feature of Vertex AI Endpoints that can record the requests and responses that are sent to and from
the online prediction endpoint. Request-response logging can help you collect and analyze the online
prediction data, and troubleshoot any issues with your model. TensorFlow Data Validation is a tool
that can analyze and validate your data for machine learning. TensorFlow Data Validation can help
you explore, understand, and clean your data, and detect various data issues, such as data drift, data
skew, or data anomalies. Prediction drift is a type of data issue that measures the difference between
the distributions of the predictions generated by the model on the training data and the predictions
generated by the model on the online data. Prediction drift can indicate that the model performance
is degrading, or that the online data is changing over time. By enabling request-response logging on
Vertex AI Endpoints, and scheduling a TensorFlow Data Validation job to monitor prediction drift, you
can collect and analyze the online prediction data, and compare the distributions of the predictions.
However, enabling request-response logging on Vertex AI Endpoints, scheduling a TensorFlow Data
Validation job to monitor prediction drift, and executing model retraining if there is significant
distance between the distributions would require more skills and steps than creating a Vertex AI
Model Monitoring job configured to monitor prediction drift, configuring alert monitoring to publish
a message to a Pub/Sub queue when a monitoring alert is detected, and using a Cloud Function to
monitor the Pub/Sub queue, and trigger retraining in BigQuery. You would need to write code,
enable and configure the request-response logging, create and run the TensorFlow Data Validation
job, define and measure the distance between the distributions, and execute the model
retraining. Moreover, this option would not automate the retraining of your model, as you would
need to manually check the prediction drift and trigger the retraining2.
Option B: Enabling request-response logging on Vertex AI Endpoints, scheduling a TensorFlow Data
Validation job to monitor training/serving skew, and executing model retraining if there is significant
distance between the distributions would not help you monitor the changes in the model feature
values, and could cause errors or poor performance. Training/serving skew is a type of data issue that
measures the difference between the distributions of the features used to train the model and the
features used to serve the model. Training/serving skew can indicate that the model is not trained on
the representative data, or that the data is changing over time. By enabling request-response logging
on Vertex AI Endpoints, and scheduling a TensorFlow Data Validation job to monitor training/serving
skew, you can collect and analyze the online prediction data, and compare the distributions of the
features. However, enabling request-response logging on Vertex AI Endpoints, scheduling a
TensorFlow Data Validation job to monitor training/serving skew, and executing model retraining if
there is significant distance between the distributions would not help you monitor the changes in the
model feature values, and could cause errors or poor performance. You would need to write code,
enable and configure the request-response logging, create and run the TensorFlow Data Validation
job, define and measure the distance between the distributions, and execute the model
retraining. Moreover, this option would not monitor the prediction drift, which is a more direct and
relevant metric for measuring the model performance and quality2.
Option D: Creating a Vertex AI Model Monitoring job configured to monitor training/serving skew,
configuring alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is
detected, and using a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in
BigQuery would not help you monitor the changes in the model feature values, and could cause
errors or poor performance. Training/serving skew is a type of data issue that measures the
difference between the distributions of the features used to train the model and the features used to
serve the model. Training/serving skew can indicate that the model is not trained on the
representative data, or that the data is changing over time. By creating a Vertex AI Model Monitoring
job configured to monitor training/serving skew, you can track the changes in the model features,
and compare them with the expected features. However, creating a Vertex AI Model Monitoring job
configured to monitor training/serving skew, configuring alert monitoring to publish a message to a
Pub/Sub queue when a monitoring alert is detected, and using a Cloud Function to monitor the
Pub/Sub queue, and trigger retraining in BigQuery would not help you monitor the changes in the
model feature values, and could cause errors or poor performance. You would need to write code,
create and configure the Vertex AI Model Monitoring job, configure the alert monitoring, create and
configure the Pub/Sub queue, and write a Cloud Function to trigger the retraining. Moreover, this
option would not monitor the prediction drift, which is a more direct and relevant metric for
measuring the model performance and quality1.
Reference:
Preparing for Google Cloud Certification: Machine Learning Engineer, Course 3: Production ML
Systems, Week 4: ML Governance
Google Cloud Professional Machine Learning Engineer Exam Guide, Section 3: Scaling ML models in
production
Question # 20
You work for a company that provides an anti-spam service that flags and hides spam posts on social
media platforms. Your company currently uses a list of 200,000 keywords to identify suspected spam
posts. If a post contains more than a few of these keywords, the post is identified as spam. You want
to start using machine learning to flag spam posts for human review. What is the main advantage of
implementing machine learning for this business case?
A. Posts can be compared to the keyword list much more quickly.
B. New problematic phrases can be identified in spam posts.
C. A much longer keyword list can be used to flag spam posts.
D. Spam posts can be flagged using far fewer keywords.
Answer: B
Explanation:
The main advantage of implementing machine learning for this business case is that new
problematic phrases can be identified in spam posts. This is because machine learning can learn from
the data and the feedback, and adapt to the changing patterns and trends of spam posts. Machine
learning can also capture the semantic and contextual meaning of the posts, and not just rely on the
presence or absence of keywords. By using machine learning, you can improve the accuracy and
coverage of your anti-spam service, and detect new and emerging types of spam posts that may not
be captured by the keyword list.
The other options are not advantages of implementing machine learning for this business case for
the following reasons:
A) Posts can be compared to the keyword list much more quickly is not an advantage, as it does not
improve the quality or effectiveness of the anti-spam service. It only improves the efficiency of the
service, which is not the primary objective. Moreover, machine learning may not necessarily be
faster than the keyword list, depending on the complexity and size of the model and the data.
C) A much longer keyword list can be used to flag spam posts is not an advantage, as it does not
address the limitations or challenges of the keyword list approach. It only increases the size and
complexity of the keyword list, which can make it harder to maintain and update. Moreover, a longer
keyword list may not improve the accuracy or coverage of the anti-spam service, as it may introduce
more false positives or false negatives, or miss new and emerging types of spam posts.
D) Spam posts can be flagged using far fewer keywords is not an advantage, as it does not reflect the
capabilities or benefits of machine learning. It only reduces the size and complexity of the keyword
list, which can make it easier to maintain and update. However, using fewer keywords may not
improve the accuracy or coverage of the anti-spam service, as it may lose some information or
meaning of the posts, or miss some types of spam posts.
Reference:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
Machine Learning for Spam Detection
Spam Detection Using Machine Learning
Question # 21
You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?
A. Create a Vertex AI Workbench notebook. Use the notebook to submit the Dataproc Serverless feature engineering job. Use the same notebook to submit the custom model training job. Run the notebook cells sequentially to tie the steps together end-to-end.
B. Create a Vertex AI Workbench notebook. Initiate an Apache Spark context in the notebook, and run the PySpark feature engineering code. Use the same notebook to run the custom model training job in TensorFlow. Run the notebook cells sequentially to tie the steps together end-to-end.
C. Use the Kubeflow pipelines SDK to write code that specifies two components: the first is a Dataproc Serverless component that launches the feature engineering job; the second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job. Create a Vertex AI Pipelines job to link and run both components.
D. Use the Kubeflow pipelines SDK to write code that specifies two components: the first component initiates an Apache Spark context that runs the PySpark feature engineering code; the second component runs the TensorFlow custom model training code. Create a Vertex AI Pipelines job to link and run both components.
Answer: C
Explanation:
The best option for creating a scalable and maintainable production process that runs end-to-end and tracks the connections between steps, when moving prototype code to production with feature engineering code in PySpark that runs on Dataproc Serverless and model training that is executed by using a Vertex AI custom training job, is to use the Kubeflow pipelines SDK to write code that specifies two components. The first is a Dataproc Serverless component that launches the feature engineering job. The second is a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job.

This option allows you to leverage the power and simplicity of Kubeflow pipelines to orchestrate and automate your machine learning workflows on Vertex AI. Kubeflow pipelines is a platform that can build, deploy, and manage machine learning pipelines on Kubernetes. Kubeflow pipelines can help you create reusable and scalable pipelines, experiment with different pipeline versions and parameters, and monitor and debug your pipelines. The Kubeflow pipelines SDK is a set of Python packages that can help you build and run Kubeflow pipelines. The SDK can help you define pipeline components, specify pipeline parameters and inputs, and create pipeline steps and tasks.

A component is a self-contained set of code that performs one step in a pipeline, such as data preprocessing, model training, or model evaluation. A component can be created from a Python function, a container image, or a prebuilt component. A custom component is a component that is not provided by Kubeflow pipelines, but is created by the user to perform a specific task. A custom component can be wrapped in a utility function that can help you create a Vertex AI custom training job from the component. A custom training job is a resource that can run your custom training code on Vertex AI. A custom training job can help you train various types of models, such as linear regression, logistic regression, k-means clustering, matrix factorization, and deep neural networks.

By using the Kubeflow pipelines SDK to write code that specifies these two components, you can create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. You can write code that defines the two components, their inputs and outputs, and their dependencies. You can then use the Kubeflow pipelines SDK to create a pipeline that runs the two components in sequence, and submit the pipeline to Vertex AI Pipelines for execution. By using the Dataproc Serverless component, you can run your PySpark feature engineering code on Dataproc Serverless, which is a service that can run Spark batch workloads without provisioning and managing your own cluster. By using the custom component wrapped in the create_custom_training_job_from_component utility, you can run your custom model training code on Vertex AI, which is a unified platform for building and deploying machine learning solutions on Google Cloud.
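A minimal sketch of such a pipeline is shown below. It assumes the `kfp` and `google-cloud-pipeline-components` packages are installed; the project, region, bucket, image, and batch names are all hypothetical, and the component parameter lists are abbreviated (check the component reference for required arguments in your SDK version):

```python
# Sketch only: orchestration config for a two-step Vertex AI pipeline.
from kfp import compiler, dsl
from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp
from google_cloud_pipeline_components.v1.custom_job import (
    create_custom_training_job_from_component,
)

PROJECT, REGION = "my-project", "us-central1"  # hypothetical


@dsl.component(base_image="python:3.10")  # hypothetical training image
def train_model(features_uri: str):
    # Placeholder for the real custom training code.
    print(f"Training on {features_uri}")


# Wrap the component so it executes as a Vertex AI custom training job.
train_job = create_custom_training_job_from_component(
    train_model,
    display_name="model-training",
    machine_type="n1-standard-8",  # hypothetical
)


@dsl.pipeline(name="feature-eng-and-training")
def pipeline(features_uri: str = "gs://my-bucket/features/"):
    feature_eng = DataprocPySparkBatchOp(
        project=PROJECT,
        location=REGION,
        batch_id="feature-eng-batch",  # hypothetical
        main_python_file_uri="gs://my-bucket/feature_engineering.py",
    )
    # .after() records the dependency, so Vertex AI Pipelines tracks the
    # connection (lineage) between the two steps.
    train_job(
        project=PROJECT, location=REGION, features_uri=features_uri
    ).after(feature_eng)


if __name__ == "__main__":
    compiler.Compiler().compile(pipeline, "pipeline.json")
```

The compiled `pipeline.json` would then be submitted for execution as a Vertex AI Pipelines job (for example via `aiplatform.PipelineJob`), which runs the two steps in order and records their lineage.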
The other options are not as good as option C, for the following reasons:
Option A: Creating a Vertex AI Workbench notebook, using the notebook to submit the Dataproc
Serverless feature engineering job, using the same notebook to submit the custom model training
job, and running the notebook cells sequentially to tie the steps together end-to-end would require
more skills and steps than using the Kubeflow pipelines SDK to write code that specifies two
components, the first is a Dataproc Serverless component that launches the feature engineering job,
and the second is a custom component wrapped in the
create_custom_training_job_from_component utility that launches the custom model training job.
Vertex AI Workbench is a service that can provide managed notebooks for machine learning
development and experimentation. Vertex AI Workbench can help you create and run JupyterLab
notebooks, and access various tools and frameworks, such as TensorFlow, PyTorch, and JAX. By
creating a Vertex AI Workbench notebook, using the notebook to submit the Dataproc Serverless
feature engineering job, using the same notebook to submit the custom model training job, and
running the notebook cells sequentially to tie the steps together end-to-end, you can create a
production process that runs end-to-end and tracks the connections between steps. You can write
code that submits the Dataproc Serverless feature engineering job and the custom model training job
to Vertex AI, and run the code in the notebook cells. However, creating a Vertex AI Workbench
notebook, using the notebook to submit the Dataproc Serverless feature engineering job, using the
same notebook to submit the custom model training job, and running the notebook cells
sequentially to tie the steps together end-to-end would require more skills and steps than using the
Kubeflow pipelines SDK to write code that specifies two components, the first is a Dataproc
Serverless component that launches the feature engineering job, and the second is a custom
component wrapped in the create_custom_training_job_from_component utility that launches the
custom model training job. You would need to write code, create and configure the Vertex AI
Workbench notebook, submit the Dataproc Serverless feature engineering job and the custom model
training job, and run the notebook cells. Moreover, this option would not use the Kubeflow pipelines
SDK, which can simplify the pipeline creation and execution process, and provide various features,
such as pipeline parameters, pipeline metrics, and pipeline visualization2.
Option B: Creating a Vertex AI Workbench notebook, initiating an Apache Spark context in the
notebook, and running the PySpark feature engineering code, using the same notebook to run the
custom model training job in TensorFlow, and running the notebook cells sequentially to tie the steps
together end-to-end would not allow you to use Dataproc Serverless to run the feature engineering
job, and could increase the complexity and cost of the production process. Apache Spark is a
framework that can perform large-scale data processing and machine learning. Apache Spark can
help you run various tasks, such as data ingestion, data transformation, data analysis, and data
visualization. PySpark is a Python API for Apache Spark. PySpark can help you write and run Spark
code in Python. An Apache Spark context is a resource that can initialize and configure the Spark
environment. An Apache Spark context can help you create and manage Spark objects, such as
SparkSession, SparkConf, and SparkContext. By creating a Vertex AI Workbench notebook, initiating
an Apache Spark context in the notebook, and running the PySpark feature engineering code, using
the same notebook to run the custom model training job in TensorFlow, and running the notebook
cells sequentially to tie the steps together end-to-end, you can create a production process that runs
end-to-end and tracks the connections between steps. You can write code that initiates an Apache
Spark context and runs the PySpark feature engineering code, and runs the custom model training
job in TensorFlow, and run the code in the notebook cells. However, creating a Vertex AI Workbench
notebook, initiating an Apache Spark context in the notebook, and running the PySpark feature
engineering code, using the same notebook to run the custom model training job in TensorFlow, and
running the notebook cells sequentially to tie the steps together end-to-end would not allow you to
use Dataproc Serverless to run the feature engineering job, and could increase the complexity and
cost of the production process. You would need to write code, create and configure the Vertex AI
Workbench notebook, initiate and configure the Apache Spark context, run the PySpark feature
engineering code, and run the custom model training job in TensorFlow. Moreover, this option would
not use Dataproc Serverless, which is a service that can run Spark batch workloads without
provisioning and managing your own cluster, and provide various benefits, such as autoscaling,
dynamic resource allocation, and serverless billing2.
Option D: Creating a Vertex AI Pipelines job to link and run both components, using the Kubeflow
pipelines SDK to write code that specifies two components, the first component initiates an Apache
Spark context that runs the PySpark feature engineering code, and the second component runs the
TensorFlow custom model training code, would not allow you to use Dataproc Serverless to run the
feature engineering job, and could increase the complexity and cost of the production process.
Vertex AI Pipelines is a service that can run Kubeflow pipelines on Vertex AI. Vertex AI Pipelines can
help you create and manage machine learning pipelines, and integrate with various Vertex AI
services, such as Vertex AI Workbench, Vertex AI Training, and Vertex AI Prediction. A Vertex AI
Pipelines job is a resource that can execute a pipeline on Vertex AI Pipelines. A Vertex AI Pipelines
job can help you run your pipeline steps and tasks, and monitor and debug your pipeline execution.
By creating a Vertex AI Pipelines job to link and run both components, using the Kubeflow pipelines
SDK to write code that specifies two components, the first component initiates an Apache Spark
context that runs the PySpark feature engineering code, and the second component runs the
TensorFlow custom model training code, you can create a scalable and maintainable production
process that runs end-to-end and tracks the connections between steps. You can write code that
defines the two components, their inputs and outputs, and their dependencies. You can then use the
Kubeflow pipelines SDK to create a pipeline that runs the two components in sequence, and submit
the pipeline to Vertex AI Pipelines for execution. However, this approach initiates the Apache Spark context inside a pipeline component rather than using Dataproc Serverless to run the feature engineering job, so you would need to provision and manage the Spark environment yourself, which could increase the complexity and cost of the production process.
Question # 22
You are building a TensorFlow model for a financial institution that predicts the impact of consumer
spending on inflation globally. Due to the size and nature of the data, your model is long-running
across all types of hardware, and you have built frequent checkpointing into the training process.
Your organization has asked you to minimize cost. What hardware should you choose?
A. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4 NVIDIA P100 GPUs
B. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an NVIDIA P100 GPU
C. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a non-preemptible v3-8 TPU
D. A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU
Answer: D
Explanation:
The best hardware to choose for your model while minimizing cost is a Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a preemptible v3-8 TPU. This
hardware configuration can provide you with high performance, scalability, and efficiency for your
TensorFlow model, as well as low cost and flexibility for your long-running and checkpointing
process. The v3-8 TPU is a cloud tensor processing unit (TPU) device, which is a custom ASIC chip
designed by Google to accelerate ML workloads. It can handle large and complex models and
datasets, and offer fast and stable training and inference. The n1-standard-16 is a general-purpose
VM that can support the CPU and memory requirements of your model, as well as the data
preprocessing and postprocessing tasks. By choosing a preemptible v3-8 TPU, you can take advantage
of the lower price and availability of the TPU devices, as long as you can tolerate the possibility of the
device being reclaimed by Google at any time. However, since you have built frequent checkpointing
into your training process, you can resume your model from the last saved state, and avoid losing any
progress or data. Moreover, you can use the Vertex AI Workbench user-managed notebooks to
create and manage your notebooks instances, and leverage the integration with Vertex AI and other
Google Cloud services.
The other options are not optimal for the following reasons:
A) A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with 4
NVIDIA P100 GPUs is not a good option, as it has higher cost and lower performance than the v3-8
TPU. The NVIDIA P100 GPUs are an older generation of NVIDIA GPUs, which have lower
performance, scalability, and efficiency than the latest NVIDIA A100 GPUs or the TPUs. They also
have higher price and lower availability than the preemptible TPUs, which can increase the cost and
complexity of your solution.
B) A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with an
NVIDIA P100 GPU is not a good option, as it has higher cost and lower performance than the v3-8
TPU. It also has less GPU memory and compute power than the option with 4 NVIDIA P100 GPUs,
which can limit the size and complexity of your model, and affect the training and inference speed
and quality.
C) A Vertex AI Workbench user-managed notebooks instance running on an n1-standard-16 with a
non-preemptible v3-8 TPU is not a good option, as it has higher cost and lower flexibility than the
preemptible v3-8 TPU. The non-preemptible v3-8 TPU has the same performance, scalability, and efficiency as the preemptible v3-8 TPU, but it has a higher price, as it is reserved for your exclusive use and cannot be reclaimed. Moreover, since your model is long-running and checkpoints frequently, you do not
need the guarantee of the device not being reclaimed by Google, and you can benefit from the lower
cost and higher availability of the preemptible v3-8 TPU.
Reference:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
Cloud TPU
Vertex AI Workbench user-managed notebooks
Preemptible VMs
NVIDIA Tesla P100 GPU
Question # 23
You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour than expected throughout the day. You want the endpoint to efficiently scale when the demand increases in the future to prevent users from experiencing high latency. What should you do?
A. Deploy two models to the same endpoint and distribute requests among them evenly.
B. Configure an appropriate minReplicaCount value based on expected baseline traffic.
C. Set the target utilization percentage in the autoscalingMetricSpecs configuration to a higher value.
D. Change the model's machine type to one that utilizes GPUs.
Answer: B
Explanation:
The best option for scaling a Vertex AI endpoint efficiently when the demand increases in the future,
using a scikit-learn model that is deployed to a Vertex AI endpoint and tested on live production
traffic, is to configure an appropriate minReplicaCount value based on expected baseline traffic. This
option allows you to leverage the power and simplicity of Vertex AI to automatically scale your
endpoint resources according to the traffic patterns. Vertex AI is a unified platform for building and
deploying machine learning solutions on Google Cloud. Vertex AI can deploy a trained model to an
online prediction endpoint, which can provide low-latency predictions for individual instances.
Vertex AI can also provide various tools and services for data analysis, model development, model
deployment, model monitoring, and model governance. A minReplicaCount value is a parameter
that specifies the minimum number of replicas that the endpoint must always have, regardless of the
load. A minReplicaCount value can help you ensure that the endpoint has enough resources to
handle the expected baseline traffic, and avoid high latency or errors. By configuring an appropriate
minReplicaCount value based on expected baseline traffic, you can scale your endpoint efficiently
when the demand increases in the future. You can set the minReplicaCount value when you deploy
the model to the endpoint, or update it later. Vertex AI will automatically scale up or down the
number of replicas within the range of the minReplicaCount and maxReplicaCount values, based on
the target utilization percentage and the autoscaling metric1.
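As an illustration, one simple way to derive the bounds is to size minReplicaCount from the observed baseline throughput and leave headroom in maxReplicaCount for spikes. The helper and deploy call below are a sketch: the QPS figures, machine type, and peak multiplier are assumptions, and the deploy call requires the google-cloud-aiplatform SDK:

```python
import math
from typing import Tuple


def replica_bounds(baseline_qps: float, qps_per_replica: float,
                   peak_multiplier: float = 3.0) -> Tuple[int, int]:
    """Size min/max replicas from observed baseline traffic.

    minReplicaCount covers the steady baseline so requests never wait
    for a cold scale-up; maxReplicaCount leaves headroom for spikes.
    """
    min_replicas = max(1, math.ceil(baseline_qps / qps_per_replica))
    max_replicas = max(min_replicas, math.ceil(min_replicas * peak_multiplier))
    return min_replicas, max_replicas


def deploy(model, endpoint):
    """Deploy with autoscaling bounds (requires google-cloud-aiplatform)."""
    # Observed traffic was double the estimate: e.g. 40 QPS baseline,
    # with each replica assumed to handle ~10 QPS (hypothetical figures).
    lo, hi = replica_bounds(baseline_qps=40, qps_per_replica=10)
    model.deploy(
        endpoint=endpoint,
        machine_type="n1-standard-4",  # hypothetical
        min_replica_count=lo,
        max_replica_count=hi,
    )
```

With these bounds, Vertex AI keeps at least `lo` replicas warm for the baseline load and autoscales up to `hi` replicas as traffic grows.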
The other options are not as good as option B, for the following reasons:
Option A: Deploying two models to the same endpoint and distributing requests among them evenly
would not allow you to scale your endpoint efficiently when the demand increases in the future, and
could increase the complexity and cost of the deployment process. A model is a resource that
represents a machine learning model that you can use for prediction. A model can have one or more
versions, which are different implementations of the same model. A model version can help you
experiment and iterate on your model, and improve the model performance and accuracy. An
endpoint is a resource that provides the service endpoint (URL) you use to request the prediction. An
endpoint can have one or more deployed models, which are instances of model versions that are
associated with physical resources. A deployed model can help you serve online predictions with low
latency, and scale up or down based on the traffic. By deploying two models to the same endpoint
and distributing requests among them evenly, you can create a load balancing mechanism that can
distribute the traffic across the models, and reduce the load on each model. However, deploying two
models to the same endpoint and distributing requests among them evenly would not allow you to
scale your endpoint efficiently when the demand increases in the future, and could increase the
complexity and cost of the deployment process. You would need to write code, create and configure
the two models, deploy the models to the same endpoint, and distribute the requests among them
evenly. Moreover, this option would not use the autoscaling feature of Vertex AI, which can
automatically adjust the number of replicas based on the traffic patterns, and provide various
benefits, such as optimal resource utilization, cost savings, and performance improvement2.
Option C: Setting the target utilization percentage in the autoscalingMetricSpecs configuration to a
higher value would not allow you to scale your endpoint efficiently when the demand increases in
the future, and could cause errors or poor performance. A target utilization percentage is a
parameter that specifies the desired utilization level of each replica. A target utilization percentage
can affect the speed and accuracy of the autoscaling process. A higher target utilization percentage
can help you reduce the number of replicas, but it can also cause high latency, low throughput, or
resource exhaustion. By setting the target utilization percentage in the autoscalingMetricSpecs
configuration to a higher value, you can increase the utilization level of each replica, and save some
resources. However, setting the target utilization percentage in the autoscalingMetricSpecs
configuration to a higher value would not allow you to scale your endpoint efficiently when the
demand increases in the future, and could cause errors or poor performance. You would need to
write code, create and configure the autoscalingMetricSpecs, and set the target utilization
percentage to a higher value. Moreover, this option would not ensure that the endpoint has enough
resources to handle the expected baseline traffic, which could cause high latency or errors1.
Option D: Changing the model's machine type to one that utilizes GPUs would not let you scale the
endpoint efficiently as demand grows, and would add complexity and cost to the deployment process.
The machine type specifies the virtual machine that the prediction service uses for the deployed
model, and a GPU-equipped machine type can accelerate prediction computation and serve more
requests concurrently, improving per-replica performance. However, a faster replica is still a fixed
amount of capacity: it does not adapt to future demand, and GPUs raise cost. This option also
bypasses the autoscaling feature of Vertex AI, which automatically adjusts the number of replicas to
match traffic patterns and provides benefits such as optimal resource utilization, cost savings, and
better performance [2].
Reference:
Configure compute resources for prediction | Vertex AI | Google Cloud
Deploy a model to an endpoint | Vertex AI | Google Cloud
Question # 24
You are an ML engineer at an ecommerce company and have been tasked with building a model that
predicts how much inventory the logistics team should order each month. Which approach should
you take?
A. Use a clustering algorithm to group popular items together. Give the list to the logistics team so they can increase inventory of the popular items.
B. Use a regression model to predict how much additional inventory should be purchased each month. Give the results to the logistics team at the beginning of the month so they can increase inventory by the amount predicted by the model.
C. Use a time series forecasting model to predict each item's monthly sales. Give the results to the logistics team so they can base inventory on the amount predicted by the model.
D. Use a classification model to classify inventory levels as UNDER_STOCKED, OVER_STOCKED, and CORRECTLY_STOCKED. Give the report to the logistics team each month so they can fine-tune inventory levels.
Answer: C
Explanation:
The best approach to build a model that predicts how much inventory the logistics team should order
each month is to use a time series forecasting model to predict each item's monthly sales. This
approach can capture the temporal patterns and trends in the sales data, such as seasonality,
cyclicality, and autocorrelation. It can also account for the variability and uncertainty in the demand,
and provide confidence intervals and error metrics for the predictions. By using a time series
forecasting model, you can provide the logistics team with accurate and reliable estimates of the
future sales for each item, which can help them optimize the inventory levels and avoid overstocking
or understocking. You can use various methods and tools to build a time series forecasting model,
such as ARIMA, LSTM, Prophet, or BigQuery ML.
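To make the idea concrete, here is a minimal, dependency-free sketch of one of the simplest time series baselines, a seasonal-naive forecast (predict next month's sales from the same month one season earlier). A production system would use ARIMA, Prophet, or BigQuery ML instead; the item's sales figures below are invented for illustration:

```python
def seasonal_naive_forecast(monthly_sales: list, season_length: int = 12) -> int:
    """Forecast the next period as the observed value one full season ago."""
    if len(monthly_sales) < season_length:
        raise ValueError("need at least one full season of history")
    return monthly_sales[-season_length]

# 24 months of (invented) unit sales for one SKU, Jan year 1 .. Dec year 2.
# Note the December spike repeating each year -- the seasonality a plain
# regression model would miss.
sales = [30, 28, 35, 40, 55, 70, 90, 88, 60, 45, 38, 120,
         33, 30, 37, 44, 58, 75, 95, 92, 64, 48, 41, 130]

# Forecast for January of year 3 = January of year 2 -> 33 units
print(seasonal_naive_forecast(sales))  # 33
```

Even this crude baseline is per-item and time-aware, which is exactly what distinguishes option C from the aggregate regression in option B.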
The other options are not optimal for the following reasons:
A) Using a clustering algorithm to group popular items together is not a good approach, as it does
not provide any quantitative or temporal information about the sales or the inventory. It only
provides a qualitative and static categorization of the items based on their similarity or dissimilarity.
Moreover, clustering is an unsupervised learning technique, which does not use any target variable
or feedback to guide the learning process. This can result in arbitrary and inconsistent clusters, which
may not reflect the true demand or preferences of the customers.
B) Using a regression model to predict how much additional inventory should be purchased each
month is not a good approach, as it does not account for the individual differences and dynamics of
each item. It only provides a single aggregated value for the whole inventory, which can be
misleading and inaccurate. Moreover, a regression model is not well-suited for handling time series
data, as it assumes that the data points are independent and identically distributed, which is not the
case for sales data. A regression model can also suffer from overfitting or underfitting, depending on
the choice and complexity of the features and the model.
D) Using a classification model to classify inventory levels as UNDER_STOCKED, OVER_STOCKED, and
CORRECTLY_STOCKED is not a good approach, as it does not provide any numerical or predictive
information about the sales or the inventory. It only provides a discrete and subjective label for the
inventory levels, which can be vague and ambiguous. Moreover, a classification model is not well-suited
for handling time series data, as it assumes that the data points are independent and
identically distributed, which is not the case for sales data. A classification model can also suffer from
class imbalance, misclassification, or overfitting, depending on the choice and complexity of the
features, the model, and the threshold.
Reference:
Professional ML Engineer Exam Guide
Preparing for Google Cloud Certification: Machine Learning Engineer Professional Certificate
Google Cloud launches machine learning engineer certification
Time Series Forecasting: Principles and Practice
BigQuery ML: Time series analysis
Question # 25
You work at a bank. You have a custom tabular ML model that was provided by the bank's vendor. The training data is not available due to its sensitivity. The model is packaged as a Vertex AI Model serving container which accepts a string as input for each prediction instance. In each string the feature values are separated by commas. You want to deploy this model to production for online predictions, and monitor the feature distribution over time with minimal effort. What should you do?
A. 1. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint. 2. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective, and provide an instance schema.
B. 1. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint. 2. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective, and provide an instance schema.
C. 1. Refactor the serving container to accept key-value pairs as input format. 2. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint. 3. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective.
D. 1. Refactor the serving container to accept key-value pairs as input format. 2. Upload the model to Vertex AI Model Registry and deploy the model to a Vertex AI endpoint. 3. Create a Vertex AI Model Monitoring job with feature skew detection as the monitoring objective.
Answer: A
Explanation:
The best option for deploying this vendor-provided tabular model for online predictions, and
monitoring the feature distribution over time with minimal effort, is to upload the model to Vertex AI
Model Registry, deploy it to a Vertex AI endpoint, and create a Vertex AI Model Monitoring job with
feature drift detection as the monitoring objective, providing an instance schema. This lets you serve
and monitor the model with minimal code and configuration.
Vertex AI is a unified platform for building and deploying machine learning solutions on Google
Cloud. It can deploy a trained model to an online prediction endpoint that serves low-latency
predictions for individual instances, and it provides tools for data analysis, model development,
deployment, monitoring, and governance. The Vertex AI Model Registry stores and manages your
models, tracking information such as the model name, description, and labels. A Vertex AI Model
serving container packages the model code and dependencies into a container image that can be
deployed to an online prediction endpoint; it can accept various input formats, including a plain
string in which the feature values are encoded and separated by commas, as in this scenario. You can
upload and deploy the model with the Vertex AI API or the gcloud command-line tool, supplying the
model and endpoint names, descriptions, labels, and compute resources.
A Vertex AI Model Monitoring job monitors the performance and quality of deployed models and
helps you detect issues such as data drift, prediction drift, training/serving skew, or model staleness.
Feature drift measures how the distribution of the serving features changes over time; growing drift
indicates that the online data is shifting and that model performance may be degrading. Because the
training data is unavailable, drift detection, which compares serving data against earlier serving data,
is the appropriate objective. When you create the monitoring job, you specify the objective, the
monitoring frequency, the alerting threshold, and the notification channel. Because the serving
container accepts an unnamed comma-separated string rather than key-value pairs, you must also
provide an instance schema: a JSON file describing the features and their types, which lets Model
Monitoring parse the string input and compute the feature distributions and distance scores [1].
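To see why the schema matters, the sketch below mimics in plain Python (not the actual Model Monitoring implementation) how a comma-separated instance string can only be turned into named, typed features once a schema supplies the column names and types. The feature names and values are invented for illustration:

```python
# Hypothetical instance schema: ordered feature names and types, standing in
# for what a Model Monitoring instance schema file would declare.
SCHEMA = [("age", float), ("balance", float), ("product_code", str)]

def parse_instance(raw: str) -> dict:
    """Map a comma-separated prediction instance onto the schema."""
    values = raw.split(",")
    if len(values) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} features, got {len(values)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, values)}

print(parse_instance("42,1350.75,PLAT"))
# {'age': 42.0, 'balance': 1350.75, 'product_code': 'PLAT'}
```

Without the schema, the monitoring service would see only anonymous comma-separated tokens and could not attribute drift to any particular feature.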
The other options are not as good as option A, for the following reasons:
Option B: Feature skew measures the difference between the distribution of the features used to
train the model and the distribution of the features used to serve it at a given point in time; it
indicates that the model was not trained on representative data, or that the data has shifted since
training. Skew detection therefore needs access to the training data distribution, which is not
available in this scenario, and in any case it compares serving data against a fixed training baseline
rather than tracking how the online data changes over time. Feature drift is the more direct and
relevant metric for measuring those changes and the resulting model performance and quality [1].
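Conceptually, both drift and skew reduce to a distance between two feature distributions; the difference is the baseline (an earlier serving window for drift, the training data for skew). The sketch below computes the L-infinity distance between two categorical distributions in plain Python, one of the distance measures Model Monitoring reports for categorical features; the proportions are invented for illustration:

```python
def linf_distance(p: dict, q: dict) -> float:
    """L-infinity distance between two categorical distributions:
    the largest absolute difference in any single category's probability."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)

# Baseline serving window vs. current serving window for one categorical
# feature (proportions invented).
baseline = {"PLAT": 0.50, "GOLD": 0.30, "BASIC": 0.20}
current  = {"PLAT": 0.35, "GOLD": 0.30, "BASIC": 0.35}

print(round(linf_distance(baseline, current), 6))  # 0.15
```

If this score exceeds the alerting threshold configured on the monitoring job, an alert fires on the configured notification channel.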
Option C: Refactoring the serving container to accept key-value pairs as input format would achieve
the same monitoring objective, but with more skills and steps than simply providing an instance
schema. A key-value pair input format specifies the feature names and values in a JSON object, which
Model Monitoring can parse without a schema. However, it requires modifying and rebuilding the
vendor's container before uploading and deploying the model, whereas supplying an instance
schema lets Model Monitoring parse and analyze the existing comma-separated string input as-is,
with no changes to the container, and still compute the feature distributions and distance scores [1].
Option D: This option combines the drawbacks of options B and C. Refactoring the serving container
to accept key-value pairs requires more skills and steps than providing an instance schema, and
feature skew detection compares serving data against the training distribution, which is unavailable
here, at a single point in time rather than tracking changes in the online data over time. Feature drift
remains the more direct and relevant metric for measuring those changes and the model
performance and quality [1].
Reference:
Using Model Monitoring | Vertex AI | Google Cloud