Was: $90 | Today: $50
Was: $108 | Today: $60
Was: $126 | Today: $70
Why Should You Prepare For Your AWS Certified Machine Learning - Specialty With MyCertsHub?
At MyCertsHub, we go beyond standard study material. Our platform provides authentic Amazon MLS-C01 Exam Dumps, detailed exam guides, and reliable practice exams that mirror the actual AWS Certified Machine Learning - Specialty test. Whether you’re targeting Amazon certifications or expanding your professional portfolio, MyCertsHub gives you the tools to succeed on your first attempt.
Verified MLS-C01 Exam Dumps
Every set of exam dumps is carefully reviewed by certified experts to ensure accuracy. For the MLS-C01 AWS Certified Machine Learning - Specialty exam, you’ll receive updated practice questions designed to reflect real-world exam conditions. This approach saves time, builds confidence, and focuses your preparation on the most important exam areas.
Realistic Test Prep For The MLS-C01
You can instantly access downloadable PDFs of MLS-C01 practice exams with MyCertsHub. These include authentic practice questions paired with explanations, making our exam guide a complete preparation tool. By testing yourself before exam day, you’ll walk into the Amazon Exam with confidence.
Smart Learning With Exam Guides
Our structured MLS-C01 exam guide focuses on the AWS Certified Machine Learning - Specialty's core topics and question patterns. You will be able to concentrate on what really matters for passing the test rather than wasting time on irrelevant content.
Pass The MLS-C01 Exam – Guaranteed
We Offer A 100% Money-Back Guarantee On Our Products.
If you don’t pass the AWS Certified Machine Learning - Specialty exam after preparing with MyCertsHub’s exam dumps, we will issue a full refund. That’s how confident we are in the effectiveness of our study resources.
Try Before You Buy – Free Demo
Still undecided? See for yourself how MyCertsHub has helped thousands of candidates achieve success by downloading a free demo of the MLS-C01 exam dumps.
MyCertsHub – Your Trusted Partner For Amazon Exams
Whether you’re preparing for AWS Certified Machine Learning - Specialty or any other professional credential, MyCertsHub provides everything you need: exam dumps, practice exams, practice questions, and exam guides. Passing your MLS-C01 exam has never been easier thanks to our tried-and-true resources.
Amazon MLS-C01 Sample Question Answers
Question # 1
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values. A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling. Which solution will meet these requirements with the LEAST implementation effort?
A. Use Amazon EMR Serverless with PySpark.
B. Use AWS Glue DataBrew.
C. Use Amazon SageMaker Studio Data Wrangler.
D. Use Amazon SageMaker Studio Notebook with Pandas.
Answer: C
Explanation: Amazon SageMaker Studio Data Wrangler is a visual data preparation tool
that enables users to clean and normalize data without writing any code. Using Data
Wrangler, the data scientist can easily import the time-series data from various sources,
such as Amazon S3, Amazon Athena, or Amazon Redshift. Data Wrangler can
automatically generate data insights and quality reports, which can help identify and fix
missing values, outliers, and anomalies in the data. Data Wrangler also provides over 250
built-in transformations, such as resampling, interpolation, aggregation, and filtering, which
can be applied to the data with a point-and-click interface. Data Wrangler can also export
the prepared data to different destinations, such as Amazon S3, Amazon SageMaker
Feature Store, or Amazon SageMaker Pipelines, for further modeling and analysis. Data
Wrangler is integrated with Amazon SageMaker Studio, a web-based IDE for machine
learning, which makes it easy to access and use the tool. Data Wrangler is a serverless
and fully managed service, which means the data scientist does not need to provision,
configure, or manage any infrastructure or clusters.
Option A is incorrect because Amazon EMR Serverless is a serverless option for running big data analytics applications using open-source frameworks, such as Apache Spark.
However, using Amazon EMR Serverless would require the data scientist to write PySpark
code to perform the data preparation tasks, such as resampling, imputation, and
aggregation. This would require more implementation effort than using Data Wrangler,
which provides a visual and code-free interface for data preparation.
Option B is incorrect because AWS Glue DataBrew is another visual data preparation tool
that can be used to clean and normalize data without writing code. However, DataBrew
does not support time-series data as a data type, and does not provide built-in
transformations for resampling, interpolation, or aggregation of time-series data. Therefore,
using DataBrew would not meet the requirements of the use case.
Option D is incorrect because using Amazon SageMaker Studio Notebook with Pandas
would also require the data scientist to write Python code to perform the data preparation
tasks. Pandas is a popular Python library for data analysis and manipulation, which
supports time-series data and provides various methods for resampling, interpolation, and
aggregation. However, using Pandas would require more implementation effort than using
Data Wrangler, which provides a visual and code-free interface for data preparation.
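For comparison, the Pandas route from option D takes only a few lines, though it is still code the data scientist would have to write and maintain. The dates and prices below are invented for illustration:

```python
import pandas as pd

# Irregular, gappy daily price series; dates and values are invented.
prices = pd.Series(
    [50.0, 52.0, 55.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-06"]),
)

# Resample to a fixed daily frequency (days with no observation become NaN),
# then fill the gaps by linear interpolation.
daily = prices.resample("D").mean().interpolate(method="linear")
```

The resampled series has one row per calendar day, with the missing days filled in (for example, 2024-01-02 becomes 51.0, halfway between its neighbors).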
References:
1: Amazon SageMaker Data Wrangler documentation
2: Amazon EMR Serverless documentation
3: AWS Glue DataBrew documentation
4: Pandas documentation
Question # 2
A company operates large cranes at a busy port. The company plans to use machine learning (ML) for predictive maintenance of the cranes to avoid unexpected breakdowns and to improve productivity. The company already uses sensor data from each crane to monitor the health of the cranes in real time. The sensor data includes rotation speed, tension, energy consumption, vibration, pressure, and temperature for each crane. The company contracts AWS ML experts to implement an ML solution. Which potential findings would indicate that an ML-based solution is suitable for this scenario? (Select TWO.)
A. The historical sensor data does not include a significant number of data points and attributes for certain time periods.
B. The historical sensor data shows that simple rule-based thresholds can predict crane failures.
C. The historical sensor data contains failure data for only one type of crane model that is in operation and lacks failure data for most other types of cranes that are in operation.
D. The historical sensor data from the cranes is available with high granularity for the last 3 years.
E. The historical sensor data contains the most common types of crane failures that the company wants to predict.
Answer: D,E
Explanation: The best indicators that an ML-based solution is suitable for this scenario are
D and E, because they imply that the historical sensor data is sufficient and relevant for building a predictive maintenance model. This model can use machine learning techniques
such as regression, classification, or anomaly detection to learn from the past data and
forecast future failures or issues12. Having high granularity and diversity of data can
improve the accuracy and generalization of the model, as well as enable the detection of
complex patterns and relationships that are not captured by simple rule-based thresholds3.
The other options are not good indicators that an ML-based solution is suitable, because
they suggest that the historical sensor data is incomplete, inconsistent, or inadequate for
building a predictive maintenance model. These options would require additional data
collection, preprocessing, or augmentation to overcome the data quality issues and ensure
that the model can handle different scenarios and types of cranes4.
References:
1: Machine Learning Techniques for Predictive Maintenance
2: A Guide to Predictive Maintenance & Machine Learning
3: Machine Learning for Predictive Maintenance: Reinventing Asset Upkeep
4: Predictive Maintenance with Machine Learning: A Complete Guide
Question # 3
A company is creating an application to identify, count, and classify animal images that are uploaded to the company’s website. The company is using the Amazon SageMaker image classification algorithm with an ImageNetV2 convolutional neural network (CNN). The solution works well for most animal images but does not recognize many animal species that are less common. The company obtains 10,000 labeled images of less common animal species and stores the images in Amazon S3. A machine learning (ML) engineer needs to incorporate the images into the model by using Pipe mode in SageMaker. Which combination of steps should the ML engineer take to train the model? (Choose two.)
A. Use a ResNet model. Initiate full training mode by initializing the network with random weights.
B. Use an Inception model that is available with the SageMaker image classification algorithm.
C. Create a .lst file that contains a list of image files and corresponding class labels. Upload the .lst file to Amazon S3.
D. Initiate transfer learning. Train the model by using the images of less common species.
E. Use an augmented manifest file in JSON Lines format.
Answer: C,D
Explanation: The combination of steps that the ML engineer should take to train the model
are to create a .lst file that contains a list of image files and corresponding class labels,
upload the .lst file to Amazon S3, and initiate transfer learning by training the model using
the images of less common species. This approach will allow the ML engineer to leverage
the existing ImageNetV2 CNN model and fine-tune it with the new data using Pipe mode in
SageMaker.
A .lst file is a text file that contains a list of image files and corresponding class labels,
separated by tabs. The .lst file format is required for using the SageMaker image
classification algorithm with Pipe mode. Pipe mode is a feature of SageMaker that enables
streaming data directly from Amazon S3 to the training instances, without downloading the
data first. Pipe mode can reduce the startup time, improve the I/O throughput, and enable
training on large datasets that exceed the disk size limit. To use Pipe mode, the ML
engineer needs to upload the .lst file to Amazon S3 and specify the S3 path as the input
data channel for the training job1.
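A minimal sketch of producing such a .lst file follows; each line carries a numeric image index, the class label, and the relative image path, tab-separated. The file names and labels below are made up for illustration:

```python
# Hypothetical labeled images of less common species: (relative path, class id).
labeled_images = [
    ("species/okapi_001.jpg", 0),
    ("species/quokka_014.jpg", 1),
]

# Write a .lst file: one line per image, tab-separated as
# <image index> <class label> <relative path>.
with open("train.lst", "w") as f:
    for idx, (path, label) in enumerate(labeled_images):
        f.write(f"{idx}\t{label}\t{path}\n")
```

The resulting train.lst would then be uploaded to Amazon S3 alongside the images and referenced as the training input channel.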
Transfer learning is a technique that enables reusing a pre-trained model for a new task by
fine-tuning the model parameters with new data. Transfer learning can save time and
computational resources, as well as improve the performance of the model, especially
when the new task is similar to the original task. The SageMaker image classification
algorithm supports transfer learning by allowing the ML engineer to specify the number of
output classes and the number of layers to be retrained. The ML engineer can use the
existing ImageNetV2 CNN model, which is trained on 1,000 classes of common objects,
and fine-tune it with the new data of less common animal species, which is a similar task2.
The other options are either less effective or not supported by the SageMaker image
classification algorithm. Using a ResNet model and initiating full training mode would
require training the model from scratch, which would take more time and resources than
transfer learning. Using an Inception model is not possible, as the SageMaker image
classification algorithm only supports ResNet and ImageNetV2 models. Using an
augmented manifest file in JSON Lines format is not compatible with Pipe mode, as Pipe
mode only supports .lst files for image classification1.
References:
1: Using Pipe input mode for Amazon SageMaker algorithms | AWS Machine
Question # 4
A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand Instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model. Which approaches will meet this requirement? (Select TWO.)
A. Replace On-Demand Instances with Spot Instances.
B. Configure model auto scaling dynamically to adjust the number of instances automatically.
C. Replace CPU-based EC2 instances with GPU-based EC2 instances.
D. Use multiple training instances.
E. Use a pre-trained version of the model. Run incremental training.
Answer: C,D
Explanation: The best approaches to decrease the training time of the model are C and D,
because they can improve the computational efficiency and parallelization of the training
process. These approaches have the following benefits:
C: Replacing CPU-based EC2 instances with GPU-based EC2 instances can
speed up the training of the DeepAR algorithm, as it can leverage the parallel
processing power of GPUs to perform matrix operations and gradient
computations faster than CPUs12. The DeepAR algorithm supports GPU-based
EC2 instances such as ml.p2 and ml.p33.
D: Using multiple training instances can also reduce the training time of the
DeepAR algorithm, as it can distribute the workload across multiple nodes and
perform data parallelism4. The DeepAR algorithm supports distributed training with
multiple CPU-based or GPU-based EC2 instances3.
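The two changes map directly to training-job settings. Because constructing a real sagemaker.estimator.Estimator requires an AWS session, the sketch below shows the relevant settings as a plain dictionary of hypothetical values:

```python
# Hypothetical training-job settings reflecting options C and D: a GPU
# instance type and more than one instance. In real code these values would
# be passed to a sagemaker.estimator.Estimator for the DeepAR algorithm.
training_config = {
    "algorithm": "deepar",             # built-in DeepAR forecasting algorithm
    "instance_type": "ml.p3.2xlarge",  # GPU-based instead of CPU-based (option C)
    "instance_count": 2,               # distribute training across nodes (option D)
}
```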
The other options are not effective or relevant, because they have the following drawbacks:
A: Replacing On-Demand Instances with Spot Instances can reduce the cost of
the training, but not necessarily the time, as Spot Instances are subject to
interruption and availability5. Moreover, the DeepAR algorithm does not support
checkpointing, which means that the training cannot resume from the last saved
state if the Spot Instance is terminated3.
B: Configuring model auto scaling dynamically to adjust the number of instances
automatically is not applicable, as this feature is only available for inference
endpoints, not for training jobs6.
E: Using a pre-trained version of the model and running incremental training is not
possible, as the DeepAR algorithm does not support incremental training or
transfer learning3. The DeepAR algorithm requires a full retraining of the model
whenever new data is added or the hyperparameters are changed7.
References:
1: GPU vs CPU: What Matters Most for Machine Learning? | by Louis (What’s AI) Bouchard | Towards Data Science
2: How GPUs Accelerate Machine Learning Training | NVIDIA Developer Blog
7: How the DeepAR Algorithm Works - Amazon SageMaker
Question # 5
A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results. Which modeling approach will deliver the MOST accurate prediction of product quality?
A. Amazon SageMaker DeepAR forecasting algorithm
B. Amazon SageMaker XGBoost algorithm
C. Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm
D. A convolutional neural network (CNN) and ResNet
Answer: D
Explanation: A convolutional neural network (CNN) is a type of deep learning model that
can learn to extract features from images and perform tasks such as classification,
segmentation, and detection1. ResNet is a popular CNN architecture that uses residual
connections to overcome the problem of vanishing gradients and enable very deep
networks2. For the task of predicting product quality based on sensor data, a CNN and
ResNet approach can leverage the spatial structure of the data and learn complex patterns
that distinguish different quality levels.
References:
Convolutional Neural Networks (CNNs / ConvNets)
PyTorch ResNet: The Basics and a Quick Tutorial
Question # 6
A data scientist at a financial services company used Amazon SageMaker to train and deploy a model that predicts loan defaults. The model analyzes new loan applications and predicts the risk of loan default. To train the model, the data scientist manually extracted loan data from a database. The data scientist performed the model training and deployment steps in a Jupyter notebook that is hosted on SageMaker Studio notebooks. The model's prediction accuracy is decreasing over time. Which combination of steps is the MOST operationally efficient way for the data scientist to maintain the model's accuracy? (Select TWO.)
A. Use SageMaker Pipelines to create an automated workflow that extracts fresh data, trains the model, and deploys a new version of the model.
B. Configure SageMaker Model Monitor with an accuracy threshold to check for model drift. Initiate an Amazon CloudWatch alarm when the threshold is exceeded. Connect the workflow in SageMaker Pipelines with the CloudWatch alarm to automatically initiate retraining.
C. Store the model predictions in Amazon S3. Create a daily SageMaker Processing job that reads the predictions from Amazon S3, checks for changes in model prediction accuracy, and sends an email notification if a significant change is detected.
D. Rerun the steps in the Jupyter notebook that is hosted on SageMaker Studio notebooks to retrain the model and redeploy a new version of the model.
E. Export the training and deployment code from the SageMaker Studio notebooks into a Python script. Package the script into an Amazon Elastic Container Service (Amazon ECS) task that an AWS Lambda function can initiate.
Answer: A,B
Explanation:
Option A is correct because SageMaker Pipelines is a service that enables you to
create and manage automated workflows for your machine learning projects. You
can use SageMaker Pipelines to orchestrate the steps of data extraction, model
training, and model deployment in a repeatable and scalable way1.
Option B is correct because SageMaker Model Monitor is a service that monitors
the quality of your models in production and alerts you when there are deviations
in the model quality. You can use SageMaker Model Monitor to set an accuracy
threshold for your model and configure a CloudWatch alarm that triggers when the
threshold is exceeded. You can then connect the alarm to the workflow in
SageMaker Pipelines to automatically initiate retraining and deployment of a new
version of the model2.
Option C is incorrect because it is not the most operationally efficient way to
maintain the model’s accuracy. Creating a daily SageMaker Processing job that
reads the predictions from Amazon S3 and checks for changes in model prediction
accuracy is a manual and time-consuming process. It also requires you to write
custom code to perform the data analysis and send the email notification.
Moreover, it does not automatically retrain and deploy the model when the
accuracy drops.
Option D is incorrect because it is not the most operationally efficient way to
maintain the model’s accuracy. Rerunning the steps in the Jupyter notebook that is
hosted on SageMaker Studio notebooks to retrain the model and redeploy a new
version of the model is a manual and error-prone process. It also requires you to
monitor the model’s performance and initiate the retraining and deployment steps
yourself. Moreover, it does not leverage the benefits of SageMaker Pipelines and
SageMaker Model Monitor to automate and streamline the workflow.
Option E is incorrect because it is not the most operationally efficient way to
maintain the model’s accuracy. Exporting the training and deployment code from
the SageMaker Studio notebooks into a Python script and packaging the script into
an Amazon ECS task that an AWS Lambda function can initiate is a complex and
cumbersome process. It also requires you to manage the infrastructure and
resources for the Amazon ECS task and the AWS Lambda function. Moreover, it
does not leverage the benefits of SageMaker Pipelines and SageMaker Model
Monitor to automate and streamline the workflow.
References:
1: SageMaker Pipelines - Amazon SageMaker
2: Monitor data and model quality - Amazon SageMaker
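The core drift check that Model Monitor automates in option B can be sketched in a few lines of plain Python; the threshold and label values below are invented:

```python
# A toy sketch of the drift check Model Monitor performs at scale: compare
# recent prediction accuracy against a threshold and flag when retraining
# should be initiated. Labels and the 0.8 threshold are invented.
def needs_retraining(y_true, y_pred, threshold=0.8):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy < threshold

# Accuracy here is 2/4 = 0.5, below the threshold, so retraining is flagged.
flag = needs_retraining([1, 0, 1, 1], [1, 0, 0, 0], threshold=0.8)
```

In the managed setup, this comparison is done by Model Monitor itself, with the CloudWatch alarm and SageMaker Pipelines workflow replacing the manual function call.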
Question # 7
A data scientist uses Amazon SageMaker Data Wrangler to define and perform transformations and feature engineering on historical data. The data scientist saves the transformations to SageMaker Feature Store. The historical data is periodically uploaded to an Amazon S3 bucket. The data scientist needs to transform the new historic data and add it to the online feature store. The data scientist needs to prepare the ... historic data for training and inference by using native integrations. Which solution will meet these requirements with the LEAST development effort?
A. Use AWS Lambda to run a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
B. Run an AWS Step Functions step and a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
C. Use Apache Airflow to orchestrate a set of predefined transformations on each new dataset that arrives in the S3 bucket.
D. Configure Amazon EventBridge to run a predefined SageMaker pipeline to perform the transformations when new data is detected in the S3 bucket.
Answer: D
Explanation: The best solution is to configure Amazon EventBridge to run a predefined
SageMaker pipeline to perform the transformations when new data is detected in the S3
bucket. This solution requires the least development effort because it leverages the native
integration between EventBridge and SageMaker Pipelines, which allows you to trigger a
pipeline execution based on an event rule. EventBridge can monitor the S3 bucket for new
data uploads and invoke the pipeline that contains the same transformations and feature
engineering steps that were defined in SageMaker Data Wrangler. The pipeline can then
ingest the transformed data into the online feature store for training and inference.
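The EventBridge rule at the heart of this solution matches S3 "Object Created" events for the bucket; the bucket name below is a placeholder, and in a real setup the rule's target would be the SageMaker pipeline execution:

```python
import json

# Sketch of an EventBridge event pattern that matches S3 object-creation
# events for a specific bucket. The bucket name is a placeholder; the
# bucket must have EventBridge notifications enabled.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-historic-data-bucket"]}},
}

# EventBridge rules take the pattern as a JSON string.
pattern_json = json.dumps(event_pattern)
```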
The other solutions are less optimal because they require more development effort and
additional services. Using AWS Lambda or AWS Step Functions would require writing
custom code to invoke the SageMaker pipeline and handle any errors or retries. Using
Apache Airflow would require setting up and maintaining an Airflow server and DAGs, as
well as integrating with the SageMaker API.
References:
Amazon EventBridge and Amazon SageMaker Pipelines integration
Create a pipeline using a JSON specification
Ingest data into a feature group
Question # 8
A financial services company wants to automate its loan approval process by building a machine learning (ML) model. Each loan data point contains credit history from a third-party data source and demographic information about the customer. Each loan approval prediction must come with a report that contains an explanation for why the customer was approved for a loan or was denied a loan. The company will use Amazon SageMaker to build the model. Which solution will meet these requirements with the LEAST development effort?
A. Use SageMaker Model Debugger to automatically debug the predictions, generate the explanation, and attach the explanation report.
B. Use AWS Lambda to provide feature importance and partial dependence plots. Use the plots to generate and attach the explanation report.
C. Use SageMaker Clarify to generate the explanation report. Attach the report to the predicted results.
D. Use custom Amazon CloudWatch metrics to generate the explanation report. Attach the report to the predicted results.
Answer: C
Explanation:
The best solution for this scenario is to use SageMaker Clarify to generate the explanation
report and attach it to the predicted results. SageMaker Clarify provides tools to help
explain how machine learning (ML) models make predictions using a model-agnostic
feature attribution approach based on SHAP values. It can also detect and measure
potential bias in the data and the model. SageMaker Clarify can generate explanation
reports during data preparation, model training, and model deployment. The reports include
metrics, graphs, and examples that help understand the model behavior and predictions.
The reports can be attached to the predicted results using the SageMaker SDK or the SageMaker API.
The other solutions are less optimal because they require more development effort and
additional services. Using SageMaker Model Debugger would require modifying the
training script to save the model output tensors and writing custom rules to debug and
explain the predictions. Using AWS Lambda would require writing code to invoke the ML
model, compute the feature importance and partial dependence plots, and generate and
attach the explanation report. Using custom Amazon CloudWatch metrics would require
writing code to publish the metrics, create dashboards, and generate and attach the
explanation report.
References:
Bias Detection and Model Explainability - Amazon SageMaker Clarify - AWS
Amazon SageMaker Clarify Model Explainability
Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability
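To make the idea of additive feature attributions concrete, the toy sketch below computes per-feature contributions for a linear model, where the SHAP value of each feature reduces exactly to weight times the feature's deviation from the baseline. The weights, baseline, and applicant values are invented; Clarify computes this kind of attribution for arbitrary models at much larger scale:

```python
# Toy additive attribution for a linear loan-scoring model. All numbers
# are invented for illustration.
weights = {"credit_score": 0.6, "income": 0.3, "debt_ratio": -0.4}
baseline = {"credit_score": 0.5, "income": 0.5, "debt_ratio": 0.5}
applicant = {"credit_score": 0.9, "income": 0.4, "debt_ratio": 0.7}

# For a linear model, each feature's contribution relative to the baseline
# is exactly weight * (value - baseline value).
contributions = {
    f: weights[f] * (applicant[f] - baseline[f]) for f in weights
}
```

A report built from these contributions could state, for instance, that the high credit score pushed the prediction toward approval while the debt ratio pushed it slightly toward denial.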
Question # 9
A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
Answer: B
Explanation: Using AWS Glue to catalogue the data and Amazon Athena to run queries is
the solution that requires the least effort to be able to query the data stored in an Amazon
S3 bucket using SQL. AWS Glue is a service that provides a serverless data integration
platform for data preparation and transformation. AWS Glue can automatically discover,
crawl, and catalogue the data stored in various sources, such as Amazon S3, Amazon
RDS, Amazon Redshift, etc. AWS Glue can also use AWS KMS to encrypt the data at rest
on the Glue Data Catalog and Glue ETL jobs. AWS Glue can handle both structured and
unstructured data, and support various data formats, such as CSV, JSON, Parquet,
etc. AWS Glue can also use built-in or custom classifiers to identify and parse the data
schema and format1 Amazon Athena is a service that provides an interactive query engine
that can run SQL queries directly on data stored in Amazon S3. Amazon Athena can
integrate with AWS Glue to use the Glue Data Catalog as a central metadata repository for
the data sources and tables. Amazon Athena can also use AWS KMS to encrypt the data
at rest on Amazon S3 and the query results. Amazon Athena can query both structured
and unstructured data, and support various data formats, such as CSV, JSON, Parquet,
etc. Amazon Athena can also use partitions and compression to optimize the query
performance and reduce the query cost23
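Once AWS Glue has catalogued the S3 data, an Athena query is ordinary SQL against the catalogued table. The database, table, and column names below are hypothetical:

```python
# Illustrative Athena SQL against a Glue-catalogued table; the database,
# table, column, and partition names are all hypothetical.
query = """
SELECT sensor_id, AVG(reading) AS avg_reading
FROM manufacturing_db.sensor_data
WHERE year = '2024'
GROUP BY sensor_id
"""
```

The `year` predicate illustrates partition pruning, which is one of the ways Athena keeps scanned-data costs down.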
The other options are not valid or require more effort to query the data stored in an Amazon
S3 bucket using SQL. Using AWS Data Pipeline to transform the data and Amazon RDS to
run queries is not a good option, as it involves moving the data from Amazon S3 to
Amazon RDS, which can incur additional time and cost. AWS Data Pipeline is a service that can orchestrate and automate data movement and transformation across various AWS
services and on-premises data sources. AWS Data Pipeline can be integrated with
Amazon EMR to run ETL jobs on the data stored in Amazon S3. Amazon RDS is a service
that provides a managed relational database service that can run various database
engines, such as MySQL, PostgreSQL, Oracle, etc. Amazon RDS can use AWS KMS to
encrypt the data at rest and in transit. Amazon RDS can run SQL queries on the data
stored in the database tables45 Using AWS Batch to run ETL on the data and Amazon
Aurora to run the queries is not a good option, as it also involves moving the data from
Amazon S3 to Amazon Aurora, which can incur additional time and cost. AWS Batch is a
service that can run batch computing workloads on AWS. AWS Batch can be integrated
with AWS Lambda to trigger ETL jobs on the data stored in Amazon S3. Amazon Aurora is
a service that provides a compatible and scalable relational database engine that can run
MySQL or PostgreSQL. Amazon Aurora can use AWS KMS to encrypt the data at rest and
in transit. Amazon Aurora can run SQL queries on the data stored in the database tables.
Using AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run
queries is not a good option, as it is not suitable for querying data stored in Amazon S3
using SQL. AWS Lambda is a service that can run serverless functions on AWS. AWS
Lambda can be integrated with Amazon S3 to trigger data transformation functions on the
data stored in Amazon S3. Amazon Kinesis Data Analytics is a service that can analyze
streaming data using SQL or Apache Flink. Amazon Kinesis Data Analytics can be
integrated with Amazon Kinesis Data Streams or Amazon Kinesis Data Firehose to ingest
streaming data sources, such as web logs, social media, IoT devices, etc. Amazon Kinesis
Data Analytics is not designed for querying data stored in Amazon S3 using SQL.
Question # 10
A data scientist has been running an Amazon SageMaker notebook instance for a few weeks. During this time, a new version of Jupyter Notebook was released along with additional software updates. The security team mandates that all running SageMaker notebook instances use the latest security and software updates provided by SageMaker. How can the data scientist meet these requirements?
A. Call the CreateNotebookInstanceLifecycleConfig API operation.
B. Create a new SageMaker notebook instance and mount the Amazon Elastic Block Store (Amazon EBS) volume from the original instance.
C. Stop and then restart the SageMaker notebook instance.
D. Call the UpdateNotebookInstanceLifecycleConfig API operation.
Answer: C
Explanation: The correct solution for updating the software on a SageMaker notebook
instance is to stop and then restart the notebook instance. This will automatically apply the
latest security and software updates provided by SageMaker1
The other options are incorrect because they either do not update the software or require
unnecessary steps. For example:
Option A calls the CreateNotebookInstanceLifecycleConfig API operation. This
operation creates a lifecycle configuration, which is a set of shell scripts that run
when a notebook instance is created or started. A lifecycle configuration can be
used to customize the notebook instance, such as installing additional libraries or
packages. However, it does not update the software on the notebook instance2
Option B creates a new SageMaker notebook instance and mounts the Amazon
Elastic Block Store (Amazon EBS) volume from the original instance. This option
will create a new notebook instance with the latest software, but it will also incur
additional costs and require manual steps to transfer the data and settings from
the original instance3
Option D calls the UpdateNotebookInstanceLifecycleConfig API operation. This
operation updates an existing lifecycle configuration. As explained in option A, a
lifecycle configuration does not update the software on the notebook instance4
Question # 11
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces. Which combination of services can be used to build this conversational interface? (Select THREE.)
A. Alexa for Business
B. Amazon Connect
C. Amazon Lex
D. Amazon Polly
E. Amazon Comprehend
F. Amazon Transcribe
Answer: C,E,F
Explanation:
To build a conversational interface that can use natural language to get data from
the reports, the company can use a combination of services that can handle both
written and spoken inputs, understand the user's intent and query, and extract the
relevant information from the reports. The services that can be used for this
purpose are:
Amazon Lex, to build the conversational interface that recognizes the executives'
intents and queries from written text.
Amazon Transcribe, to convert the executives' spoken questions into text that
Amazon Lex can process.
Amazon Comprehend, to extract insights and relevant entities from the text so that
answers can be returned from the reports.
Therefore, the company can combine these three services to build the
conversational interface.
References:
What Is Amazon Lex?
What Is Amazon Comprehend?
What Is Amazon Transcribe?
Question # 12
A manufacturing company needs to identify returned smartphones that have been
damaged by moisture. The company has an automated process that produces 2,000
diagnostic values for each phone. The database contains more than five million phone
evaluations. The evaluation process is consistent, and there are no missing values in the
data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner
ML model to classify phones as moisture damaged or not moisture damaged by using all
available features. The model's F1 score is 0.6.
What changes in model training would MOST likely improve the model's F1 score? (Select
TWO.)
A. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the SageMaker principal component analysis (PCA) algorithm.
B. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the scikit-learn multi-dimensional scaling (MDS) algorithm.
C. Continue to use the SageMaker linear learner algorithm. Set the predictor type to regressor.
D. Use the SageMaker k-means algorithm with k of less than 1,000 to train the model.
E. Use the SageMaker k-nearest neighbors (k-NN) algorithm. Set a dimension reduction target of less than 1,000 to train the model.
Answer: A,E
Explanation:
Option A is correct because reducing the number of features with the SageMaker
PCA algorithm can help remove noise and redundancy from the data, and improve
the model's performance. PCA is a dimensionality reduction technique that
transforms the original features into a smaller set of linearly uncorrelated features
called principal components. SageMaker provides PCA as a built-in algorithm, and
its output can be used as the input features for the linear learner algorithm.
Option E is correct because using the SageMaker k-NN algorithm with a
dimension reduction target of less than 1,000 can help the model learn from the
similarity of the data points, and improve the model’s performance. k-NN is a nonparametric
algorithm that classifies an input based on the majority vote of its k
nearest neighbors in the feature space. The SageMaker k-NN algorithm supports
dimension reduction as a built-in feature transformation option.
Option B is incorrect because using the scikit-learn MDS algorithm to reduce the
number of features is not a feasible option, as MDS is a computationally expensive
technique that does not scale well to large datasets. MDS is a dimensionality
reduction technique that tries to preserve the pairwise distances between the
original data points in a lower-dimensional space.
Option C is incorrect because setting the predictor type to regressor would change
the model’s objective from classification to regression, which is not suitable for the given problem. A regressor model would output a continuous value instead of a
binary label for each phone.
Option D is incorrect because using the SageMaker k-means algorithm with k of
less than 1,000 would not help the model classify the phones, as k-means is a
clustering algorithm that groups the data points into k clusters based on their
similarity, without using any labels. A clustering model would not output a binary
label for each phone.
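As an illustration of the two remedies, the sketch below builds hypothetical hyperparameter sets for the SageMaker built-in PCA and k-NN algorithms. The parameter names follow the built-in algorithm documentation; the specific values are placeholders chosen for this 2,000-feature dataset:

```python
# Hypothetical hyperparameters for option A: reduce 2,000 diagnostic values
# with the SageMaker PCA built-in algorithm before training linear learner.
pca_hyperparameters = {
    "feature_dim": "2000",     # original number of diagnostic values per phone
    "num_components": "100",   # reduced feature space (placeholder value)
    "mini_batch_size": "500",
}

# Hypothetical hyperparameters for option E: the SageMaker k-NN built-in
# algorithm with its own dimension-reduction step, targeting fewer than
# 1,000 dimensions as the option requires.
knn_hyperparameters = {
    "feature_dim": "2000",
    "k": "10",
    "predictor_type": "classifier",       # binary moisture-damage label
    "dimension_reduction_type": "sign",   # random-projection reduction
    "dimension_reduction_target": "500",  # below 1,000, per option E
    "sample_size": "200000",
}

assert int(knn_hyperparameters["dimension_reduction_target"]) < 1000
```

Either dictionary would be passed to a SageMaker `Estimator.set_hyperparameters(...)` call when configuring the corresponding training job.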
Question # 13
A beauty supply store wants to understand some characteristics of visitors to the store. The
store has security video recordings from the past several years. The store wants to
generate a report of hourly visitors from the recordings. The report should group visitors by
hair style and hair color.
Which solution will meet these requirements with the LEAST amount of effort?
A. Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
B. Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
C. Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
D. Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
Answer: C
Explanation: The solution that will meet the requirements with the least amount of effort is
to use a semantic segmentation algorithm to identify a visitor’s hair in video frames, and
pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
This solution can leverage the existing Amazon SageMaker algorithms and frameworks to
perform the tasks of hair segmentation and classification.
Semantic segmentation is a computer vision technique that assigns a class label to every
pixel in an image, such that pixels with the same label share certain characteristics.
Semantic segmentation can be used to identify and isolate different objects or regions in an
image, such as a visitor’s hair in a video frame. Amazon SageMaker provides a built-in
semantic segmentation algorithm that can train and deploy models for semantic
segmentation tasks. The algorithm supports three state-of-the-art network architectures:
Fully Convolutional Network (FCN), Pyramid Scene Parsing Network (PSP), and DeepLab
v3. The algorithm can also use pre-trained or randomly initialized ResNet-50 or ResNet-101
as the backbone network. The algorithm can be trained using P2/P3 type Amazon EC2
instances in single machine configurations.
ResNet-50 is a convolutional neural network that is 50 layers deep and can classify images
into 1000 object categories. ResNet-50 is trained on more than a million images from the
ImageNet database and can achieve high accuracy on various image recognition tasks.
ResNet-50 can be used to determine hair style and hair color from the segmented hair
regions in the video frames. Amazon SageMaker provides a built-in image classification
algorithm that can use ResNet-50 as the network architecture. The algorithm can also
perform transfer learning by fine-tuning the pre-trained ResNet-50 model with new
data. The algorithm can be trained using P2/P3 type Amazon EC2 instances in single or
multiple machine configurations.
The other options are either less effective or more complex to implement. Using an object
detection algorithm to identify a visitor’s hair in video frames would not segment the hair at
the pixel level, but only draw bounding boxes around the hair regions. This could result in
inaccurate or incomplete hair segmentation, especially if the hair is occluded or has
irregular shapes. Using an XGBoost algorithm to determine hair style and hair color would
require transforming the segmented hair images into numerical features, which could lose
some information or introduce noise. XGBoost is also not designed for image classification
tasks, and may not achieve high accuracy or performance.
Question # 14
Each morning, a data scientist at a rental car company creates insights about the previous
day's rental car reservation demands. The company needs to automate this process by
streaming the data to Amazon S3 in near real time. The solution must detect high-demand
rental cars at each of the company's locations. The solution also must create a
visualization dashboard that automatically refreshes with the most recent data.
Which solution will meet these requirements with the LEAST development time?
A. Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.
B. Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using a Random Cut Forest (RCF) model trained in Amazon SageMaker. Visualize the data in Amazon QuickSight.
C. Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using a Random Cut Forest (RCF) model trained in Amazon SageMaker. Visualize the data in Amazon QuickSight.
D. Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.
Answer: A
Explanation: The solution that will meet the requirements with the least development time
is to use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon
S3, detect high-demand outliers by using Amazon QuickSight ML Insights, and visualize
the data in QuickSight. This solution does not require any custom development or ML
domain expertise, as it leverages the built-in features of QuickSight ML Insights to
automatically run anomaly detection and generate insights on the streaming data.
QuickSight ML Insights can also create a visualization dashboard that automatically
refreshes with the most recent data, and allows the data scientist to explore the outliers
and their key drivers. References:
1: Simplify and automate anomaly detection in streaming data with Amazon
Lookout for Metrics | AWS Machine Learning Blog
2: Detecting outliers with ML-powered anomaly detection - Amazon QuickSight
3: Real-time Outlier Detection Over Streaming Data - IEEE Xplore
4: Towards a deep learning-based outlier detection … - Journal of Big Data
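To illustrate the streaming half of solution A, the sketch below shapes one reservation-demand event for Kinesis Data Firehose. The field names, stream name, and location IDs are hypothetical, not part of the question:

```python
import json
from datetime import datetime, timezone

def build_reservation_record(location_id, car_class, reservations):
    """Shape one reservation-demand event for Kinesis Data Firehose.

    Firehose's PutRecord API takes {"Data": bytes}; a trailing newline keeps
    the objects Firehose batches into Amazon S3 line-delimited, which
    QuickSight can then read. The schema here is illustrative only.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "location_id": location_id,
        "car_class": car_class,
        "reservations": reservations,
    }
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

# With AWS credentials configured, delivery would be a single call, e.g.:
# boto3.client("firehose").put_record(
#     DeliveryStreamName="reservation-demand", Record=record)
record = build_reservation_record("LAX-01", "suv", 42)
```

Because Firehose delivers directly to S3 with no consumer code to write, this is the lower-development-time choice compared with a Kinesis Data Streams pipeline.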
Question # 15
A company wants to conduct targeted marketing to sell solar panels to homeowners. The
company wants to use machine learning (ML) technologies to identify which houses
already have solar panels. The company has collected 8,000 satellite images as training
data and will use Amazon SageMaker Ground Truth to label the data.
The company has a small internal team that is working on the project. The internal team
has no ML expertise and no ML experience.
Which solution will meet these requirements with the LEAST amount of effort from the
internal team?
A. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
B. Set up a private workforce that consists of the internal team. Use the private workforce to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
C. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
D. Set up a public workforce. Use the public workforce to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
Answer: A
Explanation: The solution A will meet the requirements with the least amount of effort
from the internal team because it uses Amazon SageMaker Ground Truth and Amazon
Rekognition Custom Labels, which are fully managed services that can provide the desired
functionality. The solution A involves the following steps:
Set up a private workforce that consists of the internal team. Use the private
workforce and the SageMaker Ground Truth active learning feature to label the
data. Amazon SageMaker Ground Truth is a service that can create high-quality
training datasets for machine learning by using human labelers. A private
workforce is a group of labelers that the company can manage and control. The
internal team can use the private workforce to label the satellite images as having
solar panels or not. The SageMaker Ground Truth active learning feature can
reduce the labeling effort by using a machine learning model to automatically label
the easy examples and only send the difficult ones to the human labelers [1].
Use Amazon Rekognition Custom Labels for model training and hosting. Amazon
Rekognition Custom Labels is a service that can train and deploy custom machine
learning models for image analysis. Amazon Rekognition Custom Labels can use
the labeled data from SageMaker Ground Truth to train a model that can detect
solar panels in satellite images. Amazon Rekognition Custom Labels can also host
the model and provide an API endpoint for inference [2].
The other options are not suitable because:
Option B: Setting up a private workforce that consists of the internal team, using
the private workforce to label the data, and using Amazon Rekognition Custom
Labels for model training and hosting will incur more effort from the internal team than
using the SageMaker Ground Truth active learning feature. The internal team will
have to label all the images manually, without the assistance of the machine
learning model that can automate some of the labeling tasks [1].
Option C: Setting up a private workforce that consists of the internal team, using
the private workforce and the SageMaker Ground Truth active learning feature to
label the data, using the SageMaker Object Detection algorithm to train a model,
and using SageMaker batch transform for inference will incur more operational
overhead than using Amazon Rekognition Custom Labels. The company will have
to manage the SageMaker training job, the model artifact, and the batch transform
job. Moreover, SageMaker batch transform is not suitable for real-time inference,
as it processes the data in batches and stores the results in Amazon S3 [3].
Option D: Setting up a public workforce, using the public workforce to label the
data, using the SageMaker Object Detection algorithm to train a model, and using
SageMaker batch transform for inference will incur more operational overhead and
cost than using a private workforce and Amazon Rekognition Custom Labels. A
public workforce is a group of labelers from Amazon Mechanical Turk, a
crowdsourcing marketplace. The company will have to pay the public workforce for
each labeling task, and it may not have full control over the quality and security of
the labeled data. The company will also have to manage the SageMaker training
job, the model artifact, and the batch transform job, as explained in option C [4].
References:
1: Amazon SageMaker Ground Truth
2: Amazon Rekognition Custom Labels
3: Amazon SageMaker Object Detection
4: Amazon Mechanical Turk
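For reference, inference against a trained Rekognition Custom Labels model is a single DetectCustomLabels call. The sketch below shows the request shape with a placeholder project-version ARN and S3 location:

```python
# Illustrative request for Amazon Rekognition Custom Labels inference.
# The ARN, bucket, and object key are placeholders; with AWS credentials
# the same dict would be passed to
# boto3.client("rekognition").detect_custom_labels(**request).
request = {
    "ProjectVersionArn": (
        "arn:aws:rekognition:us-east-1:111122223333:project/"
        "solar-panels/version/solar-panels.2024-01-01T00.00.00/1"
    ),
    "Image": {
        "S3Object": {
            "Bucket": "example-satellite-images",  # placeholder bucket
            "Name": "tiles/house-0001.png",        # placeholder key
        }
    },
    "MinConfidence": 80,  # only return labels scored at 80% or higher
}
```

Because Rekognition Custom Labels hosts the model and exposes this API directly, the internal team never manages training jobs, model artifacts, or endpoints.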
Question # 16
A finance company needs to forecast the price of a commodity. The company has compiled
a dataset of historical daily prices. A data scientist must train various forecasting models on
80% of the dataset and must validate the efficacy of those models on the remaining 20% of
the dataset.
How should the data scientist split the dataset into a training dataset and a validation
dataset to compare model performance?
A. Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
B. Pick a date so that 80% of the data points occur after the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
C. Starting from the earliest date in the dataset, pick eight data points for the training dataset and two data points for the validation dataset. Repeat this stratified sampling until no data points remain.
D. Sample data points randomly without replacement so that 80% of the data points are in the training dataset. Assign all the remaining data points to the validation dataset.
Answer: A
Explanation: The best way to split the dataset into a
training dataset and a validation dataset is to pick a date so that 80% of the data points
precede the date and assign that group of data points as the training dataset. This method
preserves the temporal order of the data and ensures that the validation dataset reflects
the most recent trends and patterns in the commodity price. This is important for
forecasting models that rely on time series analysis and sequential data. The other
methods would either introduce bias or lose information by ignoring the temporal structure
of the data.
References:
Time Series Forecasting - Amazon SageMaker
Time Series Splitting - scikit-learn
Time Series Forecasting - Towards Data Science
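The date-based split in option A can be sketched in a few lines of Python on synthetic daily prices:

```python
from datetime import date, timedelta

# Synthetic daily commodity prices for illustration: (date, price) pairs.
prices = [(date(2023, 1, 1) + timedelta(days=i), 100.0 + i * 0.1)
          for i in range(100)]

# Sort by date, then pick the cutoff so 80% of points precede it (option A).
prices.sort(key=lambda row: row[0])
cutoff = int(len(prices) * 0.8)
train, validation = prices[:cutoff], prices[cutoff:]

# Every training date precedes every validation date, so the models are
# validated only on data "from the future" relative to what they saw,
# preserving the temporal structure of the series.
assert train[-1][0] < validation[0][0]
assert len(train) == 80 and len(validation) == 20
```

A random 80/20 split (option D) would interleave future and past observations, leaking information a real forecaster could never have.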
Question # 17
A chemical company has developed several machine learning (ML) solutions to identify
chemical process abnormalities. The time series values of independent variables and the
labels are available for the past 2 years and are sufficient to accurately model the problem.
The regular operation label is marked as 0. The abnormal operation label is marked as 1.
Process abnormalities have a significant negative effect on the company's profits. The
company must avoid these abnormalities.
Which metrics will indicate an ML solution that will provide the GREATEST probability of
detecting an abnormality?
A. Precision = 0.91, Recall = 0.6
B. Precision = 0.61, Recall = 0.98
C. Precision = 0.7, Recall = 0.9
D. Precision = 0.98, Recall = 0.8
Answer: B
Explanation: The metrics that will indicate an ML solution that will provide the greatest
probability of detecting an abnormality are precision and recall. Precision is the ratio of true
positives (TP) to the total number of predicted positives (TP + FP), where FP is false
positives. Recall is the ratio of true positives (TP) to the total number of actual positives (TP
+ FN), where FN is false negatives. A high precision means that the ML solution has a low
rate of false alarms, while a high recall means that the ML solution has a high rate of true
detections. For the chemical company, the goal is to avoid process abnormalities, which
are marked as 1 in the labels. Therefore, the company needs an ML solution that has a
high recall for the positive class, meaning that it can detect most of the abnormalities and
minimize the false negatives. Among the four options, option B has the highest recall for
the positive class, which is 0.98. This means that the ML solution can detect 98% of the
abnormalities and miss only 2%. Option B also has a reasonable precision for the positive
class, which is 0.61. This means that the ML solution has a false alarm rate of 39%, which
may be acceptable for the company, depending on the cost and benefit analysis. The other options have lower recall for the positive class, which means that they have higher false
negative rates, which can be more detrimental for the company than false positive rates.
References:
3: AWS Whitepaper - An Overview of Machine Learning on AWS
4: Precision and recall
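The trade-off can be made concrete with a small calculation. The sketch below derives option B's metrics from hypothetical confusion-matrix counts (98 of 100 actual abnormalities caught, with enough false alarms that only about 61% of positive predictions are real):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)          # how many alarms were real
    recall = tp / (tp + fn)             # how many abnormalities were caught
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts matching option B's profile:
# 100 actual abnormalities, of which 98 are detected (2 missed),
# plus 62 false alarms.
p, r, f1 = precision_recall_f1(tp=98, fp=62, fn=2)
assert abs(r - 0.98) < 1e-9     # recall 0.98: misses only 2% of abnormalities
assert abs(p - 0.6125) < 1e-4   # precision ~0.61: ~39% of alarms are false
```

Because a missed abnormality is far costlier here than a false alarm, the option with the highest recall wins even at the cost of lower precision.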
Question # 18
A machine learning (ML) specialist uploads 5 TB of data to an Amazon SageMaker Studio
environment. The ML specialist performs initial data cleansing. Before the ML specialist
begins to train a model, the ML specialist needs to create and view an analysis report that
details potential bias in the uploaded data.
Which combination of actions will meet these requirements with the LEAST operational
overhead? (Choose two.)
A. Use SageMaker Clarify to automatically detect data bias.
B. Turn on the bias detection option in SageMaker Ground Truth to automatically analyze data features.
C. Use SageMaker Model Monitor to generate a bias drift report.
D. Configure SageMaker Data Wrangler to generate a bias report.
E. Use SageMaker Experiments to perform a data check.
Answer: A,D
Explanation: The combination of actions that will meet the requirements with the least
operational overhead is to use SageMaker Clarify to automatically detect data bias and to
configure SageMaker Data Wrangler to generate a bias report. SageMaker Clarify is a
feature of Amazon SageMaker that provides machine learning (ML) developers with tools
to gain greater insights into their ML training data and models. SageMaker Clarify can
detect potential bias during data preparation, after model training, and in your deployed
model. For instance, you can check for bias related to age in your dataset or in your trained
model and receive a detailed report that quantifies different types of potential bias [1].
SageMaker Data Wrangler is another feature of Amazon SageMaker that enables you to
prepare data for machine learning (ML) quickly and easily. You can use SageMaker Data
Wrangler to identify potential bias during data preparation without having to write your own
code. You specify input features, such as gender or age, and SageMaker Data Wrangler
runs an analysis job to detect potential bias in those features. SageMaker Data Wrangler
then provides a visual report with a description of the metrics and measurements of
potential bias so that you can identify steps to remediate the bias [2]. The other actions either
require more customization (such as using SageMaker Model Monitor or SageMaker
Experiments) or do not meet the requirement of detecting data bias (such as using
SageMaker Ground Truth). References:
1: Bias Detection and Model Explainability – Amazon Web Services
2: Amazon SageMaker Data Wrangler – Amazon Web Services
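To illustrate the kind of pre-training measurement such a bias report contains, the sketch below computes Class Imbalance (CI), one of the pre-training bias metrics SageMaker Clarify documents, in plain Python (the facet counts are made up for the example):

```python
def class_imbalance(n_advantaged, n_disadvantaged):
    """Class Imbalance (CI), a pre-training bias metric reported by
    SageMaker Clarify: (n_a - n_d) / (n_a + n_d), in the range [-1, 1].
    Values near 0 mean the two facet groups are balanced; values near
    +/-1 mean one group dominates the dataset."""
    return (n_advantaged - n_disadvantaged) / (n_advantaged + n_disadvantaged)

# e.g. 700 samples in one age group vs 300 in another (hypothetical counts):
ci = class_imbalance(700, 300)
assert abs(ci - 0.4) < 1e-9   # noticeably imbalanced toward the first group
```

With Clarify or Data Wrangler, this and related metrics are computed and visualized automatically, which is what makes options A and D the lowest-overhead pair.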
Question # 19
A company uses sensors on devices such as motor engines and factory machines to
measure parameters such as temperature and pressure. The company wants to use the
sensor data to predict equipment malfunctions and reduce service outages.
The machine learning (ML) specialist needs to gather the sensor data to train a model to
predict device malfunctions. The ML specialist must ensure that the data does not contain
outliers before training the model.
How can the ML specialist meet these requirements with the LEAST operational
overhead?
A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartiles. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Answer: C
Explanation: Amazon SageMaker Data Wrangler is a tool that helps data scientists and
ML developers to prepare data for ML. One of the features of Data Wrangler is the anomaly
detection visualization, which uses an unsupervised ML algorithm to identify outliers in the
dataset based on statistical properties. The ML specialist can use this feature to quickly
explore the sensor data and find any anomalous values that may affect the model
performance. The ML specialist can then add a transformation to a Data Wrangler data
flow to remove the outliers from the dataset. The data flow can be exported as a script or a
pipeline to automate the data preparation process. This option requires the least
operational overhead compared to the other options.
References:
Amazon SageMaker Data Wrangler - Amazon Web Services (AWS)
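For comparison, the quartile rule that option A describes can be sketched in plain Python; a Data Wrangler transform step would do the equivalent without any custom code, which is why option C carries less operational overhead (the sensor readings below are made up):

```python
import statistics

def remove_iqr_outliers(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] -- the interquartile
    rule, shown here as a plain-Python stand-in for an outlier-removal
    transform in a Data Wrangler data flow."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical temperature readings with two obvious outliers:
readings = [70.1, 70.4, 69.9, 70.2, 70.3, 150.0, 70.0, -20.0]
cleaned = remove_iqr_outliers(readings)
assert 150.0 not in cleaned and -20.0 not in cleaned
```

In Data Wrangler, the same filtering becomes a reusable step in the flow, exportable as a script or pipeline for automation.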
Question # 20
The chief editor for a product catalog wants the research and development team to build a
machine learning system that can be used to detect whether or not individuals in a
collection of images are wearing the company's retail brand. The team has a set of training
data.
Which machine learning algorithm should the researchers use that BEST meets their
requirements?
A. Latent Dirichlet Allocation (LDA) B. Recurrent neural network (RNN) C. K-means D. Convolutional neural network (CNN)
Answer: D
Explanation: The problem of detecting whether or not individuals in a collection of images
are wearing the company’s retail brand is an example of image recognition, which is a type
of machine learning task that identifies and classifies objects in an image. Convolutional
neural networks (CNNs) are a type of machine learning algorithm that are well-suited for
image recognition, as they can learn to extract features from images and handle variations
in size, shape, color, and orientation of the objects. CNNs consist of multiple layers that
perform convolution, pooling, and activation operations on the input images, resulting in a
high-level representation that can be used for classification or detection. Therefore, option
D is the best choice for the machine learning algorithm that meets the requirements of the
chief editor.
Option A is incorrect because latent Dirichlet allocation (LDA) is a type of machine learning
algorithm that is used for topic modeling, which is a task that discovers the hidden themes
or topics in a collection of text documents. LDA is not suitable for image recognition, as it
does not preserve the spatial information of the pixels. Option B is incorrect because
recurrent neural networks (RNNs) are a type of machine learning algorithm that are used
for sequential data, such as text, speech, or time series. RNNs can learn from the temporal
dependencies and patterns in the input data, and generate outputs that depend on the
previous states. RNNs are not suitable for image recognition, as they do not capture the
spatial dependencies and patterns in the input images. Option C is incorrect because
k-means is a type of machine learning algorithm that is used for clustering, which is a task
that groups similar data points together based on their features. K-means is not suitable for
image recognition, as it does not perform classification or detection of the objects in the
images.
References:
Image Recognition Software - ML Image & Video Analysis - Amazon …
Image classification and object detection using Amazon Rekognition …
AWS Amazon Rekognition - Deep Learning Face and Image Recognition …
GitHub - awslabs/aws-ai-solution-kit: Machine Learning APIs for common …
Meet iNaturalist, an AWS-powered nature app that helps you identify …
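The core operation of a CNN's convolutional layers can be illustrated in plain Python. The sketch below applies a hand-written edge-detecting kernel in valid mode (stride 1, no padding, computed as cross-correlation, as deep learning frameworks do); real CNNs learn many such kernels from data:

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (stride 1, no padding) -- the operation a
    CNN layer applies at every position to extract local image features."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A vertical-edge kernel responding to the brightness step in this image:
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
edge_kernel = [[-1, 1],
               [-1, 1]]
feature_map = conv2d(image, edge_kernel)
# The response peaks exactly where the dark-to-bright edge sits.
assert feature_map[0] == [0, 18, 0]
```

Stacking many such learned kernels with pooling and activation layers is what lets a CNN recognize objects, such as a retail brand, regardless of position in the image.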
Question # 21
A wildlife research company has a set of images of lions and cheetahs. The company
created a dataset of the images. The company labeled each image with a binary label that
indicates whether an image contains a lion or cheetah. The company wants to train a
model to identify whether new images contain a lion or cheetah.
Which Amazon SageMaker algorithm will meet this requirement?
A. XGBoost
B. Image Classification - TensorFlow
C. Object Detection - TensorFlow
D. Semantic segmentation - MXNet
Answer: B
Explanation: The best Amazon SageMaker algorithm for this task is Image Classification -
TensorFlow. This algorithm is a supervised learning algorithm that supports transfer
learning with many pretrained models from the TensorFlow Hub. Transfer learning allows
the company to fine-tune one of the available pretrained models on their own dataset, even
if a large amount of image data is not available. The image classification algorithm takes an
image as input and outputs a probability for each provided class label. The company can
choose from a variety of models, such as MobileNet, ResNet, or Inception, depending on
their accuracy and speed requirements. The algorithm also supports distributed training.
References:
Amazon SageMaker Provides New Built-in TensorFlow Image Classification
Algorithm
Image Classification with ResNet :: Amazon SageMaker Workshop
Image classification on Amazon SageMaker | by Julien Simon - Medium
Question # 22
A company's data scientist has trained a new machine learning model that performs better
on test data than the company's existing model performs in the production environment.
The data scientist wants to replace the existing model that runs on an Amazon SageMaker
endpoint in the production environment. However, the company is concerned that the new
model might not work well on the production environment data.
The data scientist needs to perform A/B testing in the production environment to evaluate
whether the new model performs well on production environment data.
Which combination of steps must the data scientist take to perform the A/B testing?
(Choose two.)
A. Create a new endpoint configuration that includes a production variant for each of the two models.
B. Create a new endpoint configuration that includes two target variants that point to different endpoints.
C. Deploy the new model to the existing endpoint.
D. Update the existing endpoint to activate the new model.
E. Update the existing endpoint to use the new endpoint configuration.
Answer: A,E
Explanation: The combination of steps that the data scientist must take to perform the A/B
testing are to create a new endpoint configuration that includes a production variant for
each of the two models, and update the existing endpoint to use the new endpoint
configuration. This approach will allow the data scientist to deploy both models on the same endpoint and split the inference traffic between them based on a specified
distribution.
Amazon SageMaker is a fully managed service that provides developers and data
scientists the ability to quickly build, train, and deploy machine learning models. Amazon
SageMaker supports A/B testing on machine learning models by allowing the data scientist
to run multiple production variants on an endpoint. A production variant is a version of a
model that is deployed on an endpoint. Each production variant has a name, a machine
learning model, an instance type, an initial instance count, and an initial weight. The initial
weight determines the percentage of inference requests that the variant will handle. For
example, if there are two variants with weights of 0.5 and 0.5, each variant will handle 50%
of the requests. The data scientist can use production variants to test models that have
been trained using different training datasets, algorithms, and machine learning
frameworks; test how they perform on different instance types; or a combination of all of the
above [1].
To perform A/B testing on machine learning models, the data scientist needs to create a
new endpoint configuration that includes a production variant for each of the two models.
An endpoint configuration is a collection of settings that define the properties of an
endpoint, such as the name, the production variants, and the data capture configuration.
The data scientist can use the Amazon SageMaker console, the AWS CLI, or the AWS
SDKs to create a new endpoint configuration. The data scientist needs to specify the name,
model name, instance type, initial instance count, and initial variant weight for each
production variant in the endpoint configuration [2].
After creating the new endpoint configuration, the data scientist needs to update the
existing endpoint to use the new endpoint configuration. Updating an endpoint is the
process of deploying a new endpoint configuration to an existing endpoint. Updating an
endpoint does not affect the availability or scalability of the endpoint, as Amazon
SageMaker creates a new endpoint instance with the new configuration and switches the
DNS record to point to the new instance when it is ready. The data scientist can use the
Amazon SageMaker console, the AWS CLI, or the AWS SDKs to update an endpoint. The
data scientist needs to specify the name of the endpoint and the name of the new endpoint
configuration to update the endpoint [3].
The other options are either incorrect or unnecessary. Creating a new endpoint
configuration that includes two target variants that point to different endpoints is not
possible, as target variants are only used to invoke a specific variant on an endpoint, not to
define an endpoint configuration. Deploying the new model to the existing endpoint would
replace the existing model, not run it side-by-side with the new model. Updating the
existing endpoint to activate the new model is not a valid operation, as there is no
activation parameter for an endpoint.
References:
1: A/B Testing ML models in production using Amazon SageMaker | AWS Machine Learning Blog
2: Create an Endpoint Configuration - Amazon SageMaker
3: Update an Endpoint - Amazon SageMaker
Question # 23
A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations.
Which solution will meet these requirements with the MOST operational efficiency?
A. Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Use SageMaker Data Wrangler templates for visualization. Export the feature processing workflow to a SageMaker pipeline for automation.
B. Use an Amazon SageMaker notebook instance to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
C. Use AWS Glue Studio with custom code to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
D. Use Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package each feature transformation step into a separate AWS Lambda function. Use AWS Step Functions for workflow automation.
Answer: A
Explanation: The solution A will meet the requirements with the most operational
efficiency because it uses Amazon SageMaker Data Wrangler, which is a service that
simplifies the process of data preparation and feature engineering for machine learning.
The solution A involves the following steps:
Use Amazon SageMaker Data Wrangler preconfigured transformations to explore
feature transformations. Amazon SageMaker Data Wrangler provides a visual
interface that allows data scientists to apply various transformations to their tabular
data, such as encoding categorical features, scaling numerical features, imputing
missing values, and more. Amazon SageMaker Data Wrangler also supports
custom transformations using Python code or SQL queries1.
Use SageMaker Data Wrangler templates for visualization. Amazon SageMaker
Data Wrangler also provides a set of templates that can generate visualizations of
the data, such as histograms, scatter plots, box plots, and more. These
visualizations can help data scientists to understand the distribution and
characteristics of the data, and to compare the effects of different feature
transformations1.
Export the feature processing workflow to a SageMaker pipeline for automation.
Amazon SageMaker Data Wrangler can export the feature processing workflow as
a SageMaker pipeline, which is a service that orchestrates and automates
machine learning workflows. A SageMaker pipeline can run the feature processing
steps as a preprocessing step, and then feed the output to a training step or an
inference step. This can reduce the operational overhead of managing the feature
processing workflow and ensure its consistency and reproducibility2.
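The kind of categorical encoding the team experiments with can be illustrated in plain Python. This is a conceptual sketch of one-hot encoding only; in the solution above, Data Wrangler supplies this as a preconfigured transform, so no hand-written code is required.

```python
# One-hot encode a categorical column: each distinct category becomes
# a 0/1 indicator column, so models can consume the feature numerically.

def one_hot_encode(values):
    """Return (categories, rows): sorted category names and, for each
    input value, a 0/1 vector aligned with that category list."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot_encode(["red", "blue", "red", "green"])
# cats → ["blue", "green", "red"]; encoded[0] → [0, 0, 1]
```

Visualizing the resulting column distributions (histograms per indicator column) is exactly what the Data Wrangler templates mentioned above automate.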
The other options are not suitable because:
Option B: Using an Amazon SageMaker notebook instance to experiment with
different feature transformations, saving the transformations to Amazon S3, using
Amazon QuickSight for visualization, and packaging the feature processing steps
into an AWS Lambda function for automation will incur more operational overhead
than using Amazon SageMaker Data Wrangler. The data scientist will have to
write the code for the feature transformations, the data storage, the data
visualization, and the Lambda function. Moreover, AWS Lambda has limitations on
the execution time, memory size, and package size, which may not be sufficient
for complex feature processing tasks3.
Option C: Using AWS Glue Studio with custom code to experiment with different
feature transformations, saving the transformations to Amazon S3, using Amazon
QuickSight for visualization, and packaging the feature processing steps into an
AWS Lambda function for automation will incur more operational overhead than
using Amazon SageMaker Data Wrangler. AWS Glue Studio is a visual interface
that allows data engineers to create and run extract, transform, and load (ETL)
jobs on AWS Glue. However, AWS Glue Studio does not provide preconfigured
transformations or templates for feature engineering or data visualization. The data
scientist will have to write custom code for these tasks, as well as for the Lambda
function. Moreover, AWS Glue Studio is not integrated with SageMaker pipelines,
and it may not be optimized for machine learning workflows4.
Option D: Using Amazon SageMaker Data Wrangler preconfigured
transformations to experiment with different feature transformations, saving the
transformations to Amazon S3, using Amazon QuickSight for visualization, packaging each feature transformation step into a separate AWS Lambda function,
and using AWS Step Functions for workflow automation will incur more operational
overhead than using Amazon SageMaker Data Wrangler. The data scientist will
have to create and manage multiple AWS Lambda functions and AWS Step
Functions, which can increase the complexity and cost of the solution. Moreover, orchestrating the workflow with AWS Lambda and AWS Step Functions forgoes the direct export from Data Wrangler to SageMaker Pipelines, which is purpose-built for machine learning workflows5.
References:
1: Amazon SageMaker Data Wrangler
2: Amazon SageMaker Pipelines
3: AWS Lambda
4: AWS Glue Studio
5: AWS Step Functions
Question # 24
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?
A. Initialize the model with random weights in all layers including the last fully connected layer.
B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
C. Initialize the model with random weights in all layers and replace the last fully connected layer.
D. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
Answer: B
Explanation: Transfer learning is a technique that allows us to use a model trained for a
certain task as a starting point for a machine learning model for a different task. For image
classification, a common practice is to use a pre-trained model that was trained on a large
and general dataset, such as ImageNet, and then customize it for the specific task. One
way to customize the model is to replace the last fully connected layer, which is responsible
for the final classification, with a new layer that has the same number of units as the
number of classes in the new task. This way, the model can leverage the features learned
by the previous layers, which are generic and useful for many image recognition tasks, and
learn to map them to the new classes. The new layer can be initialized with random
weights, and the rest of the model can be initialized with the pre-trained weights. This
method is also known as feature extraction, as it extracts meaningful features from the pre-trained model and uses them for the new task.
References:
Transfer learning and fine-tuning
Deep transfer learning for image classification: a survey
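The initialization described in option B can be expressed as a toy sketch in Python. All layer names, shapes, and the 196-class vehicle label count below are invented for illustration; a real implementation would use a deep learning framework's pre-trained model API.

```python
import random

# Toy sketch of transfer-learning initialization: keep pre-trained
# weights in every layer except the final fully connected classifier,
# which is replaced and randomly initialized for the new label set.

def init_for_transfer(pretrained_layers, num_new_classes):
    """Reuse all pre-trained layers except the last; replace the head."""
    # Copy every layer except the final classifier (the feature extractor).
    layers = [dict(layer) for layer in pretrained_layers[:-1]]
    in_features = pretrained_layers[-1]["in_features"]
    # New head: random weights, sized for the new classes (vehicle makes/models).
    new_head = {
        "name": "fc_new",
        "in_features": in_features,
        "out_features": num_new_classes,
        "weights": [[random.gauss(0, 0.01) for _ in range(in_features)]
                    for _ in range(num_new_classes)],
        "pretrained": False,
    }
    return layers + [new_head]

# Hypothetical pre-trained backbone with an ImageNet-style 1000-way head.
backbone = [
    {"name": "conv1", "in_features": 3, "out_features": 64, "pretrained": True},
    {"name": "fc", "in_features": 64, "out_features": 1000, "pretrained": True},
]
model = init_for_transfer(backbone, num_new_classes=196)
```

The key point the sketch captures is that only the final layer's weights are random; every earlier layer keeps the weights learned on the general dataset, which is why option B is correct and options A and C discard useful knowledge.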
Question # 25
A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model.
Which change will create the required transformed records with the LEAST operational overhead?
A. Create an AWS Lambda function that can transform the incoming records. Enable data transformation on the ingestion Kinesis Data Firehose delivery stream. Use the Lambda function as the invocation target.
B. Deploy an Amazon EMR cluster that runs Apache Spark and includes the transformation logic. Use Amazon EventBridge (Amazon CloudWatch Events) to schedule an AWS Lambda function to launch the cluster each day and transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.
C. Deploy an Amazon S3 File Gateway in the stores. Update the in-store software to deliver data to the S3 File Gateway. Use a scheduled daily AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon S3.
D. Launch a fleet of Amazon EC2 instances that include the transformation logic. Configure the EC2 instances with a daily cron job to transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.
Answer: A
Explanation: The solution A will create the required transformed records with the least
operational overhead because it uses AWS Lambda and Amazon Kinesis Data Firehose,
which are fully managed services that can provide the desired functionality. The solution A
involves the following steps:
Create an AWS Lambda function that can transform the incoming records. AWS
Lambda is a service that can run code without provisioning or managing
servers. AWS Lambda can execute the transformation logic on the purchasing
records and add the new attributes to the records1.
Enable data transformation on the ingestion Kinesis Data Firehose delivery
stream. Use the Lambda function as the invocation target. Amazon Kinesis Data
Firehose is a service that can capture, transform, and load streaming data into
AWS data stores. Amazon Kinesis Data Firehose can enable data transformation
and invoke the Lambda function to process the incoming records before delivering
them to Amazon S3. This can reduce the operational overhead of managing the
transformation process and the data storage2.
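A transformation Lambda for Firehose follows a documented record contract: it receives base64-encoded records and must return each one with its `recordId`, a `result` status, and re-encoded `data`. The sketch below follows that contract; the payload fields (`unit_price`, `quantity`) and the derived `revenue` attribute are hypothetical examples of combining existing attributes.

```python
import base64
import json

# Sketch of a Kinesis Data Firehose transformation Lambda. The envelope
# (recordId / result / base64 data) is the Firehose transformation
# contract; the business logic below is a placeholder.

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        # Hypothetical combined attribute built from existing fields.
        payload["revenue"] = payload["unit_price"] * payload["quantity"]
        output.append({
            "recordId": record["recordId"],   # must match the input record
            "result": "Ok",                   # "Dropped" or "ProcessingFailed" otherwise
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode()).decode(),
        })
    return {"records": output}
```

Because Firehose invokes this function inline on the delivery stream, the records land in Amazon S3 already transformed, with no clusters, schedules, or servers to manage.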
The other options are not suitable because:
Option B: Deploying an Amazon EMR cluster that runs Apache Spark and includes
the transformation logic, using Amazon EventBridge (Amazon CloudWatch
Events) to schedule an AWS Lambda function to launch the cluster each day and
transform the records that accumulate in Amazon S3, and delivering the
transformed records to Amazon S3 will incur more operational overhead than
using AWS Lambda and Amazon Kinesis Data Firehose. The company will have to
manage the Amazon EMR cluster, the Apache Spark application, the AWS
Lambda function, and the Amazon EventBridge rule. Moreover, this solution will
introduce a delay in the transformation process, as it will run only once a day3.
Option C: Deploying an Amazon S3 File Gateway in the stores, updating the in-store software to deliver data to the S3 File Gateway, and using a scheduled daily
AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon
S3 will incur more operational overhead than using AWS Lambda and Amazon
Kinesis Data Firehose. The company will have to manage the S3 File Gateway,
the in-store software, and the AWS Glue job. Moreover, this solution will introduce
a delay in the transformation process, as it will run only once a day4.
Option D: Launching a fleet of Amazon EC2 instances that include the
transformation logic, configuring the EC2 instances with a daily cron job to
transform the records that accumulate in Amazon S3, and delivering the
transformed records to Amazon S3 will incur more operational overhead than
using AWS Lambda and Amazon Kinesis Data Firehose. The company will have to manage the EC2 instances, the transformation code, and the cron job. Moreover,
this solution will introduce a delay in the transformation process, as it will run only
once a day5.
References:
1: AWS Lambda
2: Amazon Kinesis Data Firehose
3: Amazon EMR
4: Amazon S3 File Gateway
5: Amazon EC2
Feedback That Matters: Reviews of Our Amazon MLS-C01 Dumps
Maeve WalkerApr 25, 2026
The MLS-C01 preparation provided by MyCertsHub resembled actual AWS mentor training. I was challenged to consider data preprocessing, model optimization, and deployment patterns by the multiple-choice questions. I wasn't just prepared when I took the test; I also felt confident. Easily passed and gained knowledge that I already use in production.
Holden BriggsApr 24, 2026
Last week, I passed MLS-C01. MyCertsHub covered a lot of ground, particularly feature engineering and SageMaker. I was able to make sense of the scenarios because they felt real.
Catalina KimApr 24, 2026
MyCertsHub's explanations went beyond "right or wrong," which I appreciated. For an exam like MLS-C01, understanding why something worked was crucial. Not just exam preparation; actual skill development.
Claire ReidApr 23, 2026
I was nervous about the ML theory parts because I had experience working with AWS infrastructure. The AWS-specific and math-heavy content in MyCertsHub were perfectly balanced. It was manageable thanks to their practice sets' short study bursts.
Don DietrichApr 23, 2026
I was able to connect academic ML concepts with AWS services thanks to MyCertsHub. I now know how to translate theory into actual cloud workflows. I took the exam shortly after finishing my thesis, and I passed it without any anxiety.
Bagwati LachmanApr 22, 2026
The practice tests were excellent. I went from being confused about tuning hyperparameters in SageMaker to explaining them confidently in meetings. Yes, I did pass MLS-C01, too!