Machine Learning on AWS: CloudFormation & SageMaker

Questions and Answers

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host. Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

AWS::SageMaker::Endpoint

AWS::SageMaker::NotebookInstance

AWS::SageMaker::Pipeline

AWS::SageMaker::Model (correct)

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns. The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns. Which solution will meet these requirements in the MOST operationally efficient way?

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns. (correct)

An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data. Which file format will meet these requirements?

Apache Parquet files (correct)

CSV files compressed with Snappy

JSON objects in JSONL format

JSON files compressed with gzip

An ML engineer is evaluating several ML models and must choose one model to use in production. The cost of false negative predictions by the models is much higher than the cost of false positive predictions. Which metric finding should the ML engineer prioritize the MOST when choosing the model?

High recall (correct)

A company has trained and deployed an ML model by using Amazon SageMaker. The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The solution also must provide a notification when the number of API call events breaches a threshold. Which solution will meet these requirements?

Log all the endpoint invocation API events by using AWS CloudTrail. Use an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached. (correct)

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually. The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines. Which solution will meet these requirements with the LEAST operational overhead?

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running. (correct)

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive. A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database. Which solution will meet these requirements with the LEAST implementation effort?

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time. (correct)

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems. The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training. Which solution will meet these requirements with the LEAST operational overhead?

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions. (correct)

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users. Which solution will meet these requirements?

Set up shadow testing with a shadow variant of the new model. (correct)

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models. The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data. Which solution will provide the HIGHEST performance for data retrieval?

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval. (correct)

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results. An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs. Which solution will meet these requirements?

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality. (correct)

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day. The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage. Which solution will meet these requirements?

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically. (correct)

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold. Which solution will meet these requirements?

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached. (correct)

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required. What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Create a data flow in SageMaker Data Wrangler. Configure a transform step. (correct)

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor. Which solution will meet this requirement?

Configure the competitor's name as a blocked phrase in Amazon Q Business. (correct)

An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach. Which solution will meet these requirements?

Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart. (correct)

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running. How should the company deploy the model on Amazon SageMaker to meet these requirements?

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1. (correct)

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents. The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras. Which solution will improve the model's accuracy in the LEAST amount of time?

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option. (correct)

A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months. Which solution will meet this requirement?

Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation. (correct)

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML. Which solution will meet these requirements?

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data. (correct)

A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support. Which modeling approach should the company use to meet this requirement?

Logistic regression (correct)

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning. The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain. Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

The model must be registered in the SageMaker Model Registry. (correct)

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored. (correct)

A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset. Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

Hyperband (correct)

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account. An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses. Which solution will meet these requirements?

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift. (correct)

A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold. Which solution will meet this requirement?

Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message. (correct)

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks. What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Include additional data in the existing training set for the model. Retrain and redeploy the model. (correct)

A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size. An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand. Which solution will meet these requirements with the LEAST operational overhead?

Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image. (correct)

An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents. Which solution will meet these requirements with the LEAST operational overhead?

Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords. (correct)

A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups. The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker. Which solution will provide the ML engineers with the appropriate access?

Create IAM policies. Attach the policies to IAM users or IAM roles. (correct)

A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day. Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities. Which solution will meet these requirements?

Use Amazon SageMaker Serverless Inference with provisioned concurrency. (correct)

Study Notes

CloudFormation and SageMaker

To create an ML model that an Amazon SageMaker endpoint will host, an ML engineer declares an AWS::SageMaker::Model resource in the CloudFormation template.
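As a sketch, the template below declares the model together with the endpoint configuration and endpoint that host it, written as a Python dict and serialized to JSON. The image URI, S3 artifact path, role ARN, instance type, and logical resource names are hypothetical placeholders.

```python
import json

# Hypothetical placeholders: replace with your own image, artifacts, and role.
IMAGE_URI = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"
MODEL_DATA_URL = "s3://example-bucket/model/model.tar.gz"
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleSageMakerRole"

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # The model itself: an inference container image plus trained artifacts.
        "MyModel": {
            "Type": "AWS::SageMaker::Model",
            "Properties": {
                "ExecutionRoleArn": ROLE_ARN,
                "PrimaryContainer": {
                    "Image": IMAGE_URI,
                    "ModelDataUrl": MODEL_DATA_URL,
                },
            },
        },
        # The endpoint configuration refers to the model by name.
        "MyEndpointConfig": {
            "Type": "AWS::SageMaker::EndpointConfig",
            "Properties": {
                "ProductionVariants": [
                    {
                        "ModelName": {"Fn::GetAtt": ["MyModel", "ModelName"]},
                        "VariantName": "AllTraffic",
                        "InitialInstanceCount": 1,
                        "InstanceType": "ml.m5.large",
                        "InitialVariantWeight": 1.0,
                    }
                ]
            },
        },
        # The endpoint hosts whatever the endpoint configuration describes.
        "MyEndpoint": {
            "Type": "AWS::SageMaker::Endpoint",
            "Properties": {
                "EndpointConfigName": {
                    "Fn::GetAtt": ["MyEndpointConfig", "EndpointConfigName"]
                },
            },
        },
    },
}

print(json.dumps(template, indent=2))
```

The endpoint never references the model directly; it points to an endpoint configuration whose production variant names the model. That is why AWS::SageMaker::Model, not AWS::SageMaker::Endpoint, is the resource that creates the model.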

AWS Lake Formation and ML Engineers

An advertising company uses AWS Lake Formation to manage a data lake.
The data lake contains structured and unstructured data.
ML engineers are assigned to specific ad campaigns.
The engineers interact with data through Amazon Athena and by browsing data directly in an Amazon S3 bucket.
The most operationally efficient way to restrict each engineer to the resources for their assigned campaigns is to use Lake Formation to authorize AWS Glue to access the S3 bucket and to configure Lake Formation tags (LF-Tags) that map ML engineers to their campaigns.
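The following is a conceptual Python simulation of tag-based access control, not the Lake Formation API. It assumes a single hypothetical tag key, campaign, and made-up table names. It shows why tags scale better than per-bucket policies: a grant targets a tag expression, so any new resource tagged with a campaign becomes visible to that campaign's engineers without editing any policy.

```python
# Conceptual simulation of tag-based access control; not the Lake Formation API.
# Tag keys, tag values, principals, and table names are all hypothetical.

table_tags = {
    "clicks_campaign_a": {"campaign": "campaign-a"},
    "clicks_campaign_b": {"campaign": "campaign-b"},
    "images_campaign_a": {"campaign": "campaign-a"},
}

# Each engineer is granted a tag expression (key -> allowed values), not tables.
grants = {
    "engineer-1": {"campaign": {"campaign-a"}},
    "engineer-2": {"campaign": {"campaign-b"}},
}


def can_access(principal, table):
    expression = grants.get(principal, {})
    tags = table_tags[table]
    # Every key in the expression must match one of its allowed values (AND
    # across keys, OR across values). Principals with no grant see nothing.
    return bool(expression) and all(
        tags.get(key) in allowed for key, allowed in expression.items()
    )


def visible_tables(principal):
    return sorted(t for t in table_tags if can_access(principal, t))


print(visible_tables("engineer-1"))  # ['clicks_campaign_a', 'images_campaign_a']
```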

SageMaker Canvas and File Format

ML engineers use Amazon SageMaker Canvas with data stored in Amazon S3.
The best file format for minimizing processing time with complex data structures is Apache Parquet files.

Model Evaluation and Metrics

When the cost of false negative predictions is higher than the cost of false positive predictions, prioritize the model with high recall.
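A small sketch of the comparison, using made-up confusion-matrix counts for two hypothetical candidate models. Recall is TP / (TP + FN), so it falls directly as false negatives rise.

```python
# Compare candidate models when false negatives cost more than false positives.
# The confusion-matrix counts are made-up illustration values.


def recall(tp, fn):
    # Fraction of actual positives the model catches: TP / (TP + FN).
    return tp / (tp + fn)


def precision(tp, fp):
    # Fraction of predicted positives that are correct: TP / (TP + FP).
    return tp / (tp + fp)


candidates = {
    "model_a": {"tp": 80, "fp": 5, "fn": 20},
    "model_b": {"tp": 95, "fp": 30, "fn": 5},
}

for name, c in candidates.items():
    print(name, "recall", round(recall(c["tp"], c["fn"]), 2),
          "precision", round(precision(c["tp"], c["fp"]), 2))

# Pick the model that misses the fewest actual positives.
best = max(candidates, key=lambda n: recall(candidates[n]["tp"], candidates[n]["fn"]))
print("prioritize:", best)  # prioritize: model_b
```

Model B has lower precision, but it misses far fewer actual positives, which is what matters when false negatives are the expensive error.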

SageMaker Endpoint Monitoring

To record and monitor all API calls for a SageMaker endpoint and alert when a threshold is breached, log all endpoint invocation API events using AWS CloudTrail and set up a CloudWatch alarm.
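One way to wire up the alerting part, sketched as CloudFormation resources in a Python dict: a metric filter counts SageMaker events that CloudTrail delivers to a CloudWatch Logs log group, and an alarm notifies an Amazon SNS topic when the count crosses a threshold. This assumes the trail is already configured to capture the relevant endpoint events and to send them to that log group; the log group name, metric namespace, threshold, and topic ARN are hypothetical. The same metric can also be placed on a CloudWatch dashboard.

```python
import json

# Assumes CloudTrail already delivers the relevant SageMaker events to this
# (hypothetical) log group. All names, the threshold, and the ARN are placeholders.
resources = {
    "SageMakerApiCallFilter": {
        "Type": "AWS::Logs::MetricFilter",
        "Properties": {
            "LogGroupName": "example-cloudtrail-log-group",
            # Matches any event from SageMaker; narrow it with $.eventName as needed.
            "FilterPattern": '{ $.eventSource = "sagemaker.amazonaws.com" }',
            "MetricTransformations": [
                {
                    "MetricNamespace": "Custom/SageMakerApi",
                    "MetricName": "SageMakerApiCalls",
                    "MetricValue": "1",  # each matching log event counts once
                }
            ],
        },
    },
    "SageMakerApiCallAlarm": {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "Namespace": "Custom/SageMakerApi",
            "MetricName": "SageMakerApiCalls",
            "Statistic": "Sum",
            "Period": 300,  # evaluate 5-minute totals
            "EvaluationPeriods": 1,
            "Threshold": 1000,
            "ComparisonOperator": "GreaterThanThreshold",
            # The notification is delivered through an SNS topic.
            "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:example-alerts"],
        },
    },
}

print(json.dumps({"Resources": resources}, indent=2))
```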

AWS Glue Pipelines and Jobs

For AWS Glue data processing jobs that an AWS Glue workflow orchestrates, the integration with SageMaker Pipelines that has the LEAST operational overhead is a callback step: the step starts the AWS Glue workflow and pauses the pipeline until the AWS Glue jobs finish running.
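The callback handshake can be pictured with a conceptual Python simulation. In the real service the step posts a message containing a callback token to an Amazon SQS queue, and an external worker reports the result to SageMaker with that token (the SendPipelineExecutionStepSuccess API). Here an in-memory queue stands in for SQS, and run_glue_workflow is a hypothetical stub for starting the Glue workflow and waiting for it.

```python
import queue
import threading
import uuid

# Conceptual simulation of a callback step; not the SageMaker SDK.
# An in-memory queue stands in for Amazon SQS.
message_queue = queue.Queue()
step_results = {}


def run_glue_workflow(arguments):
    # Hypothetical stub: "start the AWS Glue workflow and wait for its jobs".
    return {"output_uri": f"s3://example-bucket/processed/{arguments['run_date']}/"}


def worker():
    message = message_queue.get()
    outputs = run_glue_workflow(message["arguments"])
    # Equivalent of reporting success for this token; the paused step resumes.
    step_results[message["token"]] = {"status": "Success", "outputs": outputs}
    message["done"].set()


def callback_step(arguments, timeout=5.0):
    token = str(uuid.uuid4())
    done = threading.Event()
    message_queue.put({"token": token, "arguments": arguments, "done": done})
    # The pipeline stays paused at this step until the worker reports back.
    if not done.wait(timeout):
        return {"status": "Failed", "reason": "timed out waiting for callback"}
    return step_results[token]


threading.Thread(target=worker, daemon=True).start()
result = callback_step({"run_date": "2024-01-01"})
print(result["status"], result["outputs"]["output_uri"])
# Success s3://example-bucket/processed/2024-01-01/
```

Later pipeline steps can then read the Glue output location from the callback step's outputs.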

Data Storage and Access in Redshift

To give a data scientist access to sensitive data in an Amazon Redshift database without transforming the source data or storing anonymized data in the database, configure dynamic data masking policies. The policies control how sensitive data is shown at query time, and this approach requires the least implementation effort.
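A conceptual Python sketch of the idea (not Redshift SQL): stored values are never rewritten, and a masking function is applied per role only when the data is read. The roles, the ssn column, and the masking rule are hypothetical.

```python
# Conceptual sketch of dynamic data masking; not Redshift syntax.
# Roles, columns, and the masking rule are hypothetical.

rows = [
    {"customer_id": 1, "ssn": "123-45-6789", "spend": 420.0},
    {"customer_id": 2, "ssn": "987-65-4321", "spend": 130.5},
]


def mask_ssn(value):
    # Keep only the last four digits visible.
    return "XXX-XX-" + value[-4:]


# Masking policies attached per (column, role); other roles see raw values.
policies = {("ssn", "data_scientist"): mask_ssn}


def identity(value):
    return value


def query(role):
    # Masking happens at read time; `rows` itself is never modified.
    return [
        {col: policies.get((col, role), identity)(val) for col, val in row.items()}
        for row in rows
    ]


print(query("data_scientist")[0]["ssn"])  # XXX-XX-6789
print(query("admin")[0]["ssn"])           # 123-45-6789
```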

SageMaker Training Job Monitoring

To detect issues during model training and react in predefined ways, use SageMaker Debugger with built-in rules to monitor the training job and configure rules to initiate predefined actions. Real-time metrics during training are also provided.
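A conceptual sketch of what a built-in rule does, in plain Python rather than the Debugger API: evaluate a statistic collected at each training step and invoke a predefined action when a condition holds. The threshold, gradient values, and stop action below are hypothetical.

```python
# Conceptual sketch of a vanishing-gradient rule plus a predefined action.
# Not the SageMaker Debugger API; threshold and values are hypothetical.


def vanishing_gradient_rule(mean_abs_grads, threshold=1e-7):
    # Return the first step whose mean absolute gradient is below the threshold.
    for step, g in enumerate(mean_abs_grads):
        if g < threshold:
            return step
    return None


actions_taken = []


def stop_training(step):
    # Stand-in for a predefined action such as stopping the job or notifying.
    actions_taken.append(f"stopped at step {step}")


grads = [3e-3, 8e-5, 2e-6, 4e-8, 1e-9]
step = vanishing_gradient_rule(grads)
if step is not None:
    stop_training(step)
print(actions_taken)  # ['stopped at step 3']
```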

Model Performance Evaluation with Shadow Testing

To evaluate a new model's performance on live data without affecting production end users, use shadow testing with a shadow variant of the new model.
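A sketch of the endpoint configuration for a shadow test, written as keyword arguments for the boto3 SageMaker client's create_endpoint_config call. Model names, variant names, and instance types are hypothetical, and the dict is only built here, not sent to AWS.

```python
# Request parameters for a shadow test; all names and sizes are hypothetical.
endpoint_config_request = {
    "EndpointConfigName": "fraud-detector-shadow-test",
    # The production variant keeps serving every response returned to callers.
    "ProductionVariants": [
        {
            "VariantName": "production",
            "ModelName": "fraud-model-v1",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "InitialVariantWeight": 1.0,
        }
    ],
    # The shadow variant receives a copy of live requests; its responses are
    # logged for comparison but never returned to end users.
    "ShadowProductionVariants": [
        {
            "VariantName": "shadow",
            "ModelName": "fraud-model-v2",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "InitialVariantWeight": 1.0,
        }
    ],
}

# With AWS credentials configured, this could be sent as:
#   boto3.client("sagemaker").create_endpoint_config(**endpoint_config_request)
print(sorted(endpoint_config_request))
```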

Time-Series Data Retrieval and Archiving

For the highest retrieval performance when analyzing click trends in time-series data stored in Amazon S3, organize the data into partitions by daily date prefix, so that Athena reads only the partitions a query needs. To meet the 30-day retention requirement, apply S3 Lifecycle policies that archive partitions older than 30 days to S3 Glacier Flexible Retrieval.
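A sketch of the layout, assuming a hypothetical clicks/ prefix with Hive-style dt=YYYY-MM-DD partitions, plus a lifecycle configuration in the shape accepted by the S3 PutBucketLifecycleConfiguration API. The three-day helper lists the only partitions that a trend query filtered on dt should need to read.

```python
from datetime import date, timedelta

# Hypothetical layout: s3://<bucket>/clicks/dt=YYYY-MM-DD/<files>


def partition_prefix(day):
    return f"clicks/dt={day.isoformat()}/"


def last_n_days_prefixes(today, n):
    # Partitions an Athena query filtered on dt needs for an n-day trend report.
    return [partition_prefix(today - timedelta(days=i)) for i in range(n)]


lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-clicks-after-30-days",
            "Filter": {"Prefix": "clicks/"},
            "Status": "Enabled",
            # "GLACIER" is the API storage-class value for S3 Glacier Flexible Retrieval.
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    ]
}

print(last_n_days_prefixes(date(2024, 3, 10), 3))
# ['clicks/dt=2024-03-10/', 'clicks/dt=2024-03-09/', 'clicks/dt=2024-03-08/']
```

Lifecycle transitions are based on each object's age, not on the date in its prefix; the two line up here only under the assumption that each day's data is written on that day.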

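SageMaker Real-Time Inference and Model Monitor

To reduce inference delays for a real-time fraud detection model that currently uses SageMaker Asynchronous Inference, switch to SageMaker real-time inference, and use SageMaker Model Monitor to send notifications when model quality deviates.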

SageMaker Model Hosting with Variable Demand

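For hosting a trained ML model with variable request rates, deploy the model to a SageMaker endpoint and create auto scaling policies based on Amazon CloudWatch metrics that adjust the number of instances dynamically. This keeps costs low when traffic is light while preserving capacity for peak usage.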
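A sketch of a target-tracking setup, written as request parameters for Application Auto Scaling's RegisterScalableTarget and PutScalingPolicy calls, plus a rough helper that approximates the instance count such a policy settles on. The endpoint and variant names, capacity limits, and target value are hypothetical, and the helper is a simplification, not the service's actual algorithm.

```python
import math

# Hypothetical endpoint/variant; ResourceId follows endpoint/<name>/variant/<variant>.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

register_request = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 8,
}

policy_request = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # desired invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}


def desired_instances(invocations_per_minute, target=100.0, min_cap=1, max_cap=8):
    # Rough approximation: enough instances to bring per-instance load down to
    # the target, clamped to the registered capacity limits.
    needed = math.ceil(invocations_per_minute / target)
    return max(min_cap, min(max_cap, needed))


print(desired_instances(50), desired_instances(450), desired_instances(5000))  # 1 5 8
```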

SageMaker Studio and Cost Threshold Alerts

To send an automated alert when SageMaker compute costs reach a specific threshold, add resource tagging by editing the SageMaker user profile in the SageMaker domain, and configure AWS Budgets to send an alert when the threshold is reached.

SageMaker and Data Wrangling

When a model underperforms in production because image quality varies across cameras, recreate the training dataset by using the Data Wrangler corrupt image transform with the impulse noise option, so that the training images reflect the noisy footage the model sees in production.
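To make "impulse noise" concrete, here is a sketch of salt-and-pepper noise applied to a grayscale image stored as rows of 0-255 integers. It illustrates the kind of corruption the transform adds; it is not Data Wrangler's implementation, and the amount parameter is a hypothetical knob.

```python
import random

# Illustration of impulse (salt-and-pepper) noise; not Data Wrangler's code.


def add_impulse_noise(image, amount=0.05, seed=None):
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the input image is unchanged
    for row in noisy:
        for c in range(len(row)):
            if rng.random() < amount:
                # Replace the pixel with pure black (pepper) or pure white (salt).
                row[c] = 0 if rng.random() < 0.5 else 255
    return noisy


clean = [[128] * 8 for _ in range(8)]
noisy = add_impulse_noise(clean, amount=0.2, seed=42)
changed = sum(p != 128 for row in noisy for p in row)
print(f"{changed} of 64 pixels replaced with 0 or 255")
```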

Amazon S3 Data and File Format

For the fastest performance, organize Amazon S3 data by using daily date prefixes. Use S3 Lifecycle policies to archive data older than 30 days to S3 Glacier Flexible Retrieval.

Data Issues for ML

To prepare datasets that contain missing values, duplicates, and extreme outliers for ML, use Amazon SageMaker Data Wrangler to import the datasets and consolidate them into a single data frame, then use its cleansing and enrichment functionality to prepare the data.
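A plain-Python sketch of the consolidation and cleansing steps, to show what such transforms accomplish. The column names, sample records, median imputation, and 1.5 × IQR clipping rule are hypothetical choices, not Data Wrangler defaults.

```python
import statistics

# Hypothetical sample records; "amount" has a gap and an extreme outlier.
dataset_a = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": None},
             {"id": 3, "amount": 22.0}, {"id": 4, "amount": 19.0}]
dataset_b = [{"id": 4, "amount": 19.0}, {"id": 5, "amount": 21.0},
             {"id": 6, "amount": 23.0}, {"id": 7, "amount": 18.0},
             {"id": 8, "amount": 5000.0}]

# 1. Consolidate into a single "data frame" (copies, so inputs stay unchanged).
combined = [dict(r) for r in dataset_a + dataset_b]

# 2. Drop exact duplicate rows, keeping the first occurrence.
seen, deduped = set(), []
for row in combined:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# 3. Impute missing amounts with the median of the observed values.
observed = [r["amount"] for r in deduped if r["amount"] is not None]
median = statistics.median(observed)
for r in deduped:
    if r["amount"] is None:
        r["amount"] = median

# 4. Clip extreme outliers to the 1.5 * IQR fences.
q1, _, q3 = statistics.quantiles(observed, n=4)
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
for r in deduped:
    r["amount"] = min(max(r["amount"], low), high)

print([(r["id"], r["amount"]) for r in deduped])
# [(1, 20.0), (2, 21.0), (3, 22.0), (4, 19.0), (5, 21.0), (6, 23.0), (7, 18.0), (8, 29.0)]
```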

Model Prediction and Explanation

For explanations of model predictions, use SageMaker Clarify. SageMaker Model Monitor can run Clarify on a schedule to track feature attribution drift in production.

Distributed Training and Instance Communication

To minimize communication overhead during distributed training, place the training instances in the same VPC subnet (and therefore the same Availability Zone), and store the training data in the same AWS Region.

Description

This quiz covers key concepts in deploying machine learning models using AWS services like CloudFormation and SageMaker. It addresses data management with AWS Lake Formation and best practices for file formats in SageMaker Canvas. Test your knowledge on ML engineering and resource management within the AWS ecosystem.