AWS Data Processing and ML Pipelines Quiz

Questions and Answers

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually. The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines. Which solution will meet these requirements with the LEAST operational overhead?

  • Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.
  • Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.
  • Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running. (correct)
  • Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive. A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database. Which solution will meet these requirements with the LEAST implementation effort?

  • Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.
  • Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.
  • Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.
  • Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time. (correct)

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems. The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training. Which solution will meet these requirements with the LEAST operational overhead?

  • Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions. (correct)
  • Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.
  • Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.
  • Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users. Which solution will meet these requirements?

    Set up shadow testing with a shadow variant of the new model. (C)

    A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models. The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data. Which solution will provide the HIGHEST performance for data retrieval?

    Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval. (C)

    A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results. An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs. Which solution will meet these requirements?

    Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality. (B)

    An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day. The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage. Which solution will meet these requirements?

    Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically. (C)

    A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold. Which solution will meet these requirements?

    Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached. (D)

    A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required. What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

    Create a data flow in SageMaker Data Wrangler. Configure a transform step. (C)

    A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor. Which solution will meet this requirement?

    Configure the competitor's name as a blocked phrase in Amazon Q Business. (B)

    An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach. Which solution will meet these requirements?

    Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart. (B)

    A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running. How should the company deploy the model on Amazon SageMaker to meet these requirements?

    Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1. (D)

    An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents. The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras. Which solution will improve the model's accuracy in the LEAST amount of time?

    Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option. (B)

    A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months. Which solution will meet this requirement?

    Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation. (B)

    An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML. Which solution will meet these requirements?

    Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data. (C)

    A company has historical data that shows whether customers needed long-term support from the company staff. The company needs to develop an ML model to predict whether new customers will require long-term support. Which modeling approach should the company use to meet this requirement?

    Logistic regression (B)

    An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning. The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain. Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two)

    The model must be registered in the SageMaker Model Registry. (C) The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored. (D)

    A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset. Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

    Hyperband (C)

    A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account. An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses. Which solution will meet these requirements?

    Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift. (A)

    A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold. Which solution will meet this requirement?

    Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message. (C)

    A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks. What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

    Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations. (B)

    A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size. An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand. Which solution will meet these requirements with the LEAST operational overhead?

    Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image. (B)

    An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents. Which solution will meet these requirements with the LEAST operational overhead?

    Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords. (B)

    A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups. The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker. Which solution will provide the ML engineers with the appropriate access?

    Create IAM policies. Attach the policies to IAM users or IAM roles. (D)

    A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day. Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities. Which solution will meet these requirements?

    Use Amazon SageMaker Serverless Inference with provisioned concurrency. (C)

    A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions. Which solution will provide an explanation for the model's predictions?

    Use SageMaker Clarify on the deployed model. (B)

    An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances. What should the ML engineer do to MINIMIZE the communication overhead between the instances?

    Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed. (A)

    A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model. Which solution will set up the required online validation with the LEAST operational overhead?

    Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch. (A)
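As a rough illustration of how variant weights turn into traffic, the sketch below computes each variant's share as its weight divided by the sum of all variant weights on the endpoint. The variant names and the helper `traffic_shares` are hypothetical and are not part of any AWS SDK.

```python
def traffic_shares(weights):
    """Return each variant's fraction of traffic: weight / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical variant names. Weights of 0.9 and 0.1 give the new model 10%.
shares = traffic_shares({"current-model": 0.9, "new-model": 0.1})

# If the existing variant keeps a weight of 1.0, a weight of 0.1 yields
# about 9.1% rather than exactly 10%, because shares are relative.
uneven = traffic_shares({"current-model": 1.0, "new-model": 0.1})
```

The second call shows why the weight of the existing variant matters: the new variant receives exactly 10% only when its weight is one tenth of the total.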

    A company needs to create a central catalog for all the company's ML models. The models are in AWS accounts where the company developed the models initially. The models are hosted in Amazon Elastic Container Registry (Amazon ECR) repositories. Which solution will meet these requirements?

    Use the Amazon SageMaker Model Registry to create a model group for models hosted in Amazon ECR. Create a new AWS account. In the new account, use the SageMaker Model Registry as the central catalog. Attach a cross-account resource policy to each model group in the initial AWS accounts. (C)

    A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS. Which solution will meet these requirements with the LEAST effort?

    Use SageMaker script mode and premade images for ML frameworks. (B)

    A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance. Which solution will meet these requirements in the LEAST amount of time?

    Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system. (D)

    A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model. Which solution will meet these requirements?

    Prepare the data by using AWS Glue DataBrew. (B)

    An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur. Which solution will meet these requirements?

    Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts. (A)

    An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions. Which solution will meet this requirement?

    Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples. (B)

    A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP storage virtual machine (SVM). The SVM is in the same VPC as SageMaker. An ML engineer must make the training data accessible for ML models that are in the SageMaker environment. Which solution will meet these requirements?

    Mount the FSx for ONTAP file system as a volume to the SageMaker instance. (B)

    A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days. The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket. Which solution will meet these requirements with the LEAST operational effort?

    Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule. (D)
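To make the event pattern concrete, here is a minimal sketch that builds a pattern matching S3 "Object Created" events for one bucket. The bucket name, the key prefix, and the helper `s3_upload_pattern` are assumptions for illustration. The sketch only produces the pattern JSON; creating the rule and attaching the pipeline as its target are not shown, and the bucket would need EventBridge notifications turned on.

```python
import json


def s3_upload_pattern(bucket_name, key_prefix=""):
    """Build an EventBridge event pattern for new objects in one S3 bucket."""
    detail = {"bucket": {"name": [bucket_name]}}
    if key_prefix:
        # Optional: match only object keys that start with the given prefix.
        detail["object"] = {"key": [{"prefix": key_prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


# Hypothetical bucket and prefix where the vendor delivers training data.
pattern = s3_upload_pattern("example-training-data", "vendor/")
pattern_json = json.dumps(pattern, indent=2)
```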

    Study Notes

    Question 46

    • A company uses AWS Glue workflows to orchestrate data processing jobs that can run on a schedule or manually.
    • The company is developing ML model development pipelines in SageMaker Pipelines.
    • The pipelines require integration with the Glue jobs during data processing.
    • The solution with the least operational overhead is to use Callback steps in SageMaker Pipelines.
    • A Callback step starts the AWS Glue workflow and pauses the pipeline until the Glue jobs finish running.

    Question 47

    • A company uses Amazon Redshift as a data source.
    • Some data is sensitive, and a data scientist needs access to it without the source data being transformed and without anonymized data being stored in the database.
    • Dynamic data masking policies, applied at query time, meet these requirements with the least implementation effort.

    Question 48

    • An ML engineer is fine-tuning a deep learning model in SageMaker Studio using a pre-trained model with a similar dataset.
    • Potential issues include vanishing gradients, underutilized GPUs, and overfitting.
    • The solution must detect these issues, react in predefined ways, and provide comprehensive real-time metrics during training.
    • SageMaker Debugger built-in rules monitor the training job and can initiate the predefined actions with the least operational overhead.

    Question 49

    • A credit card company has a fraud detection model in production on SageMaker.
    • A new model version must be assessed on live data without affecting production end users.
    • Shadow testing with a shadow variant sends live requests to the new model while production responses still come from the current model.

    Question 50

    • A company stores time-series data about user clicks in S3 with millions of rows daily.
    • ML engineers need daily reports and analysis of click trends over the past 3 days by using Amazon Athena.
    • The data must be retained for 30 days before it is archived, and data retrieval must have the highest performance.
    • Partitioning the data by date prefix in the S3 bucket lets Athena read only the days it needs. An S3 Lifecycle policy archives partitions older than 30 days to S3 Glacier Flexible Retrieval.
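The partitioning idea can be sketched in a few lines. The example below assumes a Hive-style `dt=YYYY-MM-DD` layout under a hypothetical `clicks/` prefix (the quiz does not specify the exact key format) and lists the prefixes a 3-day report would read. It also shows the shape of a lifecycle rule that moves objects to the `GLACIER` storage class (S3 Glacier Flexible Retrieval) after 30 days.

```python
from datetime import date, timedelta


def partition_prefixes(end_day, days, base="clicks"):
    """Return the date-partition prefixes for the last `days` days, newest first."""
    return [
        f"{base}/dt={(end_day - timedelta(days=offset)).isoformat()}/"
        for offset in range(days)
    ]


# Prefixes a report for the past 3 days would scan (hypothetical layout).
prefixes = partition_prefixes(date(2024, 3, 2), 3)

# Lifecycle rule in the shape used by the S3 lifecycle configuration API.
# The rule ID and prefix are made up for this example.
lifecycle_rule = {
    "ID": "archive-clicks-after-30-days",
    "Status": "Enabled",
    "Filter": {"Prefix": "clicks/"},
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
}
```

Because each day lives under its own prefix, a query that filters on the date column only touches three prefixes instead of the whole bucket.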

    Question 51

    • A banking application uses an ML model to detect fraudulent transactions with SageMaker asynchronous inference.
    • Consumers report delays in receiving the inference results.
    • Switching to SageMaker real-time inference improves inference performance, and SageMaker Model Monitor provides notifications when model quality deviates.

    Question 52

    • An ML engineer needs to host a trained ML model with varying request rates throughout the day.
    • The solution must minimize costs when the model is not in use and must keep enough capacity for peak usage.
    • SageMaker endpoint auto scaling policies that are based on CloudWatch metrics adjust the number of instances dynamically.

    Question 53

    • A company uses SageMaker Studio for ML model development.
    • An automated alert is needed when SageMaker compute costs reach a specific threshold.
    • Add resource tagging through the SageMaker user profile and configure AWS Budgets to send the alert when the threshold is reached.

    Question 54

    • An ML engineer receives a 50 MB Apache Parquet file to build a fraud detection model.
    • The file includes several correlated columns that are not required.
    • The approach with the least effort is a SageMaker Data Wrangler data flow with a transform step that drops the columns.

    Question 55

    • An application that recommends products makes API calls to Amazon Q Business, and responses must not include the name of the main competitor.
    • The solution is to configure the competitor's name as a blocked phrase in Amazon Q Business.

    Question 56

    • An ML engineer needs to fine-tune a large language model (LLM) for text summarization using a low-code/no-code method.
    • Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

    Question 57

    • A company runs a model one time each night to predict the next day's stock values from 3 MB of data collected during the current day.
    • The prediction process takes less than 1 minute.
    • Use a serverless inference endpoint with the MaxConcurrency parameter set to 1.

    Question 58

    • An ML engineer trained a model to detect automobile accidents from closed-circuit TV footage.
    • The model underperforms in production because image quality varies between cameras.
    • Recreate the training dataset with the Data Wrangler enhance image contrast transform and the Gamma contrast option to improve accuracy in the least amount of time.

    Question 59

    • An application uses different APIs to generate embeddings for input text.
    • The API tokens must be rotated automatically every 3 months.
    • Store the tokens in AWS Secrets Manager and create an AWS Lambda function to perform the rotation.

    Question 60

    • An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The datasets must be consolidated into a single data frame and prepared for ML.
    • Use SageMaker Data Wrangler to import and consolidate the datasets, then use its cleansing and enrichment functionalities to prepare the data.

    Question 61

    • A company has historical data that shows whether customers needed long-term support and wants to predict whether new customers will need it.
    • Logistic regression (or a similar binary classification model) suits this yes-or-no prediction.
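For intuition, a logistic regression model scores a customer by passing a weighted sum of features through the sigmoid function and comparing the result with a threshold. The sketch below uses made-up weights, features, and a 0.5 threshold. A real model would learn the weights from the historical data.

```python
import math


def predict_probability(features, weights, bias):
    """Sigmoid of the weighted feature sum: estimated probability of the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical learned parameters and one customer's feature values.
weights = [1.5, -0.5]
bias = -1.0
probability = predict_probability([2.0, 1.0], weights, bias)

# Classify as "needs long-term support" when the probability reaches 0.5.
needs_support = probability >= 0.5
```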

    Question 62

    • An ML engineer developed a binary classification model outside SageMaker and needs to share it with a SageMaker Canvas user in the same domain.
    • The model artifacts are stored in S3. The model must be registered in the SageMaker Model Registry, and the Canvas user must have permissions to access the S3 bucket.

    Question 63

    • A company builds a deep learning model in SageMaker with lots of training data.
    • The objective is to optimize hyperparameters, minimizing validation loss, with the least computation time.
    • Use Hyperband for efficient hyperparameter optimization.
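Hyperband saves computation by stopping weak configurations early. Its core building block is successive halving: train every candidate on a small budget, keep the best fraction, and give the survivors a larger budget. The sketch below shows only that building block in plain Python. It is not SageMaker's implementation, and the toy loss function and candidate learning rates are invented for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs each round while multiplying the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Lower loss is better, so sort ascending and keep the front.
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    """Made-up validation loss: best near 0.01, and lower with more budget."""
    return abs(learning_rate - 0.01) + 1.0 / budget


candidates = [1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
best = successive_halving(candidates, toy_loss)
```

With nine candidates and `eta=3`, only three configurations get the second, larger budget, which is where the savings come from. Full Hyperband runs several such brackets with different starting budgets.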

    Question 64

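    • A company plans to use Amazon Redshift ML in its primary account with source data stored in an S3 bucket in a secondary account.
    • The ML pipeline in the primary account must access the bucket without requiring public IPv4 addresses.
    • Provision the Redshift cluster and SageMaker Studio in a VPC, create an S3 gateway endpoint, update the bucket policy to allow IAM principals from the primary account, and set up interface VPC endpoints for SageMaker and Amazon Redshift.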

    Question 65

    • A Lambda function monitors the metrics from an ML model.
    • An email message must be sent when the metrics breach a threshold.
    • Log the metrics from the Lambda function to Amazon CloudWatch and configure a CloudWatch alarm to send the email message.

    Question 66

    • A company utilizes SageMaker Model Monitor with a deployed predictive model.
    • Data quality problems exist after an update.
    • Using a new baseline from the latest data for Model Monitor evaluation solves the issue.

    Question 67

    • An ML model generates text descriptions from uploaded images of up to 50 MB, and the processing solution must scale with changes in demand.
    • Use a SageMaker Asynchronous Inference endpoint with a scaling policy, and make an inference request for each image.

    Question 68

    • An ML engineer needs to extract unique keywords from documents.
    • Use Amazon Comprehend's Custom Entity Recognition and Key Phrase Extraction for efficient keyword extraction from documents.

    Question 69

    • Data access in S3 for ML engineers needs to be controlled based on business groups.
    • Use IAM policies for granular access control of data resources and restrict access to specific business groups.
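    One way such a policy could look is sketched below, assuming each business group keeps its data under its own prefix in a shared bucket. The bucket name, the group name, and the prefix layout are hypothetical; the quiz does not specify them.

```python
import json

def business_group_policy(bucket, group):
    """Build an IAM policy document that limits S3 access to one group's prefix."""
    prefix = f"{group}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Listing is allowed only under the group's own prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "ReadOwnObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }

# Hypothetical names for illustration only.
policy = business_group_policy("example-training-data", "marketing")
print(json.dumps(policy, indent=2))
```

    Attaching one such policy to each group's IAM role keeps engineers in one group from reading another group's objects, because nothing outside their prefix is allowed.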

    Question 70

    • A company needs an ML model that will perform reliably with forecasted load for daily analysis.
    • Use SageMaker Serverless Inference with provisioned concurrency for efficient and reliable processing.

    Question 71

    • A company's sentiment analysis model requires prediction explanations for stakeholders.
    • SageMaker Clarify provides explanations of model predictions for sentiment analysis.

    Question 72

    • An ML engineer uses SageMaker for distributed training and experiences communication overhead.
    • Placing instances in the same VPC and Availability Zone reduces communication overhead for SageMaker distributed training.

    Question 73

    • A company wants to move custom Python/PyTorch models to AWS with minimal effort.
    • Use SageMaker script mode with pre-built images of familiar ML frameworks to streamline model deployment.

    Question 74

    • A company wants to improve SageMaker training on large files in S3.
    • Using Amazon FSx for Lustre improves training speed by providing a high-performance file system accessible from SageMaker.

    Question 75

    • A company needs to develop an ML model using tabular data with sensitive information.
    • AWS Glue DataBrew provides sensitive data masking capabilities, enabling secure data transformation before model training.

    Question 76

    • ML models must support asynchronous inference on large datasets, with quality monitoring and alerts.
    • Use SageMaker Model Monitor to run scheduled data quality checks, and send email alerts based on the metrics it publishes to Amazon CloudWatch.

    Question 77

    • Production data must be min-max normalized in the same way as the training data.
    • Reuse the minimum and maximum values computed from the training data to normalize the production data; do not recompute them from the production data.
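    The idea can be shown in a few lines of plain Python. The numbers are toy values and the function names are made up for this sketch.

```python
def fit_min_max(values):
    """Compute the statistics once, on the training data only."""
    return min(values), max(values)

def apply_min_max(values, lo, hi):
    """Scale values with previously computed statistics: (v - lo) / (hi - lo)."""
    span = hi - lo
    if span == 0:
        # Constant training feature: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

train = [10.0, 20.0, 30.0, 50.0]
lo, hi = fit_min_max(train)          # statistics come from training data

production = [10.0, 30.0, 60.0]
scaled = apply_min_max(production, lo, hi)
print(scaled)
```

    Production values outside the training range scale to numbers below 0 or above 1 (60.0 maps to 1.25 here). That is expected when the training statistics are reused, and it can be a useful signal of drift.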

    Question 78

    • A company needs to make a 6 TB dataset stored on Amazon FSx for NetApp ONTAP accessible to SageMaker.
    • Mount the FSx for ONTAP file system as a volume in SageMaker to access the data directly from the file system.

    Question 79

    • New data is uploaded to S3 periodically, triggering a SageMaker pipeline.
    • Use Amazon EventBridge with an event rule on S3 to trigger the SageMaker pipeline automatically when new data is uploaded.
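    An EventBridge rule of this kind is driven by an event pattern. The sketch below builds a pattern for S3 "Object Created" events as a Python dictionary. The bucket name and key prefix are placeholders, and the small helper function is just a convenience for this example.

```python
import json

def s3_object_created_pattern(bucket_name, key_prefix):
    """Build an EventBridge event pattern that matches new objects under a prefix in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }

# Placeholder bucket and prefix for illustration.
pattern = s3_object_created_pattern("example-ml-input", "incoming/")
print(json.dumps(pattern, indent=2))
```

    For S3 to send these events, EventBridge notifications must be turned on for the bucket. The rule that uses the pattern then names the SageMaker pipeline as its target.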

    Question 80

    • A fraud detection model performs well on training data but poorly on new data.
    • Decrease the max_depth hyperparameter value to improve the model's performance on unseen data.

    Question 81

    • A binary classification model needs recalibration to maximize correct predictions for positive and negative labels.
    • Use the accuracy metric for recalibration and monitor model performance on positive and negative labels separately if needed.
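    Accuracy counts correct predictions on both labels, which is why it matches this goal. A short sketch with made-up confusion-matrix counts shows the calculation, together with the per-label rates mentioned above.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct, positive and negative alike."""
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that were predicted negative."""
    return tn / (tn + fp)

# Hypothetical counts: true/false positives and negatives on 100 examples.
tp, tn, fp, fn = 40, 45, 5, 10

print(accuracy(tp, tn, fp, fn), recall(tp, fn), specificity(tn, fp))
```

    With these counts, accuracy is 0.85, recall is 0.8, and specificity is 0.9. On heavily imbalanced data, accuracy alone can hide poor performance on the rare label, so the separate rates are worth checking.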

    Question 82

    • A company needs fine-grained control of ML workflows, visualization, and model governance.
    • Use Amazon SageMaker Pipelines with its integrated ML Lineage Tracking to manage workflows, visualize them as directed acyclic graphs (DAGs), and implement model governance.

    Question 83

    • A company has costs associated with containerized ML applications that run on EC2, Lambda, and ECS resources.
    • Use AWS Compute Optimizer to identify inefficient resources and generate cost-saving recommendations.

    Question 84

    • A company needs a central catalog for ML models hosted in ECR across multiple accounts.
    • Use Amazon SageMaker Model Registry to create a centrally-managed catalog of ML models.

    Question 85

    • A model needs online validation on 10% of traffic before full release using an ALB and SageMaker endpoint.
    • Add the new model as a production variant on the existing SageMaker endpoint and weight it to receive 10% of traffic (for example, a weight of 0.1 against 0.9 for the current variant), then validate it on that traffic.
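    A SageMaker endpoint splits traffic across its production variants in proportion to their weights, so a weight of 0.1 means 10% only relative to the other weights. The sketch below works through that arithmetic; the variant names are placeholders.

```python
def traffic_shares(variant_weights):
    """Return each variant's share of traffic: its weight divided by the sum of all weights."""
    total = sum(variant_weights.values())
    return {name: weight / total for name, weight in variant_weights.items()}

# Placeholder variant names. Weights of 0.9 and 0.1 give a 90/10 split.
shares = traffic_shares({"current-model": 0.9, "new-model": 0.1})
print(shares)

# If the current variant keeps a weight of 1.0, a weight of 0.1 yields about 9%, not 10%.
skewed = traffic_shares({"current-model": 1.0, "new-model": 0.1})
print(round(skewed["new-model"], 4))
```

    Checking the weights of every variant on the endpoint avoids the second case, where the new model silently receives less than the intended share.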

    Description

    Test your knowledge on AWS Glue workflows and SageMaker Pipelines for orchestrating data processing jobs. Explore concepts like dynamic data masking in Amazon Redshift and fine-tuning deep learning models. This quiz is essential for those interested in AWS data engineering and machine learning.
