Summary

This document contains sample exam questions and answers about Amazon Web Services (AWS), focusing on machine learning with Amazon SageMaker and supporting services such as AWS CloudFormation, AWS Glue, and Amazon S3.

Full Transcript

Question 41
An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host. Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?
A. AWS::SageMaker::Model
B. AWS::SageMaker::Endpoint
C. AWS::SageMaker::NotebookInstance
D. AWS::SageMaker::Pipeline
Answer: A

Question 42
An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns. The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns. Which solution will meet these requirements in the MOST operationally efficient way?
A. Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.
B. Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.
C. Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.
D. Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.
Answer: C

Question 43
An ML engineer needs to use data with Amazon SageMaker Canvas to train an ML model. The data is stored in Amazon S3 and is complex in structure. The ML engineer must use a file format that minimizes processing time for the data. Which file format will meet these requirements?
A. CSV files compressed with Snappy
B. JSON objects in JSONL format
C. JSON files compressed with gzip
D. Apache Parquet files
Answer: D

Question 44
An ML engineer is evaluating several ML models and must choose one model to use in production. The cost of false negative predictions by the models is much higher than the cost of false positive predictions. Which metric finding should the ML engineer prioritize the MOST when choosing the model?
A. Low precision
B. High precision
C. Low recall
D. High recall
Answer: D

Question 45
A company has trained and deployed an ML model by using Amazon SageMaker. The company needs to implement a solution to record and monitor all the API call events for the SageMaker endpoint. The solution also must provide a notification when the number of API call events breaches a threshold. Which solution will meet these requirements?
A. Use SageMaker Debugger to track the inferences and to report metrics. Create a custom rule to provide a notification when the threshold is breached.
B. Use SageMaker Debugger to track the inferences and to report metrics. Use the tensor_variance built-in rule to provide a notification when the threshold is breached.
C. Log all the endpoint invocation API events by using AWS CloudTrail. Use an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.
D. Add the Invocations metric to an Amazon CloudWatch dashboard for monitoring. Set up a CloudWatch alarm to provide notification when the threshold is breached.
Answer: C
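As a concrete illustration of Question 41 above, here is a minimal sketch of a CloudFormation template that declares an AWS::SageMaker::Model resource and submits it with boto3. The role ARN, container image, and model artifact path are placeholders, not values from the question.

```python
import json
import boto3

# Minimal template declaring the model resource that a SageMaker endpoint
# will later host. All ARNs, URIs, and paths below are placeholders.
template = {
    "Resources": {
        "MyModel": {
            "Type": "AWS::SageMaker::Model",
            "Properties": {
                "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
                "PrimaryContainer": {
                    "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
                    "ModelDataUrl": "s3://my-bucket/model.tar.gz",
                },
            },
        }
    }
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="ml-model-stack", TemplateBody=json.dumps(template))
```

An AWS::SageMaker::Endpoint and AWS::SageMaker::EndpointConfig resource would then reference this model by name to host it.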
Question 46
A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually. The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines. Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.
B. Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.
C. Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.
D. Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.
Answer: C

Question 47
A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive. A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database. Which solution will meet these requirements with the LEAST implementation effort?
A. Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.
B. Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.
C. Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.
D. Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.
Answer: A

Question 48
An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems. The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training. Which solution will meet these requirements with the LEAST operational overhead?
A. Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.
B. Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.
C. Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.
D. Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.
Answer: D
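For Question 48 above, a minimal sketch of option D using the SageMaker Python SDK. The training script, role ARN, instance type, and S3 path are placeholders, and the specific rule list is an assumption based on the three issues named in the question.

```python
from sagemaker.debugger import ProfilerRule, Rule, rule_configs
from sagemaker.pytorch import PyTorch

# Predefined reaction when a rule fires: stop the training job.
actions = rule_configs.ActionList(rule_configs.StopTraining())

# Built-in rules matching the expected issues: vanishing gradients,
# overfitting, and underutilized GPUs (the last via a profiler rule).
rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient(), actions=actions),
    Rule.sagemaker(rule_configs.overfit(), actions=actions),
    ProfilerRule.sagemaker(rule_configs.low_gpu_utilization()),
]

estimator = PyTorch(
    entry_point="train.py",  # hypothetical fine-tuning script
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    instance_count=1,
    instance_type="ml.g5.2xlarge",
    framework_version="2.1",
    py_version="py310",
    rules=rules,
)
estimator.fit("s3://my-bucket/fine-tuning-data/")
```

Debugger evaluates the rules continuously while the job runs, which is what satisfies the "comprehensive real-time metrics" requirement without any custom monitoring glue.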
Question 49
A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users. Which solution will meet these requirements?
A. Set up SageMaker Debugger and create a custom rule.
B. Set up blue/green deployments with all-at-once traffic shifting.
C. Set up blue/green deployments with canary traffic shifting.
D. Set up shadow testing with a shadow variant of the new model.
Answer: D

Question 50
A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models. The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data. Which solution will provide the HIGHEST performance for data retrieval?
A. Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.
B. Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.
C. Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.
D. Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.
Answer: C

Question 51
A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results. An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs. Which solution will meet these requirements?
A. Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.
B. Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.
C. Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.
D. Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.
Answer: A
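For Question 49 above, a minimal boto3 sketch of shadow testing (option D). Endpoint, config, and model names are placeholders. The shadow variant receives a copy of live traffic; its responses are logged for comparison rather than returned to callers.

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="fraud-shadow-config",
    ProductionVariants=[{
        "VariantName": "production",
        "ModelName": "fraud-model-v1",        # current production model
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
    # Shadow variant: gets a copy of live requests, never answers end users.
    ShadowProductionVariants=[{
        "VariantName": "shadow",
        "ModelName": "fraud-model-v2",        # new model under evaluation
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

sm.update_endpoint(
    EndpointName="fraud-endpoint",
    EndpointConfigName="fraud-shadow-config",
)
```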
Question 52
An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day. The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage. Which solution will meet these requirements?
A. Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.
B. Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.
C. Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.
D. Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.
Answer: D

Question 53
A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold. Which solution will meet these requirements?
A. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.
B. Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.
C. Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.
D. Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.
Answer: B

Question 54
A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required. What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?
A. Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.
B. Create an Apache Spark job that uses a custom processing script on Amazon EMR.
C. Create a SageMaker processing job by calling the SageMaker Python SDK.
D. Create a data flow in SageMaker Data Wrangler. Configure a transform step.
Answer: D

Question 55
A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor. Which solution will meet this requirement?
A. Configure the competitor's name as a blocked phrase in Amazon Q Business.
B. Configure an Amazon Q Business retriever to exclude the competitor's name.
C. Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.
D. Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.
Answer: A

Question 56
An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code/no-code (LCNC) approach. Which solution will meet these requirements?
A. Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.
B. Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.
C. Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.
D. Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.
Answer: D
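For Question 52 above, a minimal sketch of option D with boto3 and Application Auto Scaling, assuming an already-deployed endpoint named my-endpoint with a variant named AllTraffic (both placeholders).

```python
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder names

# Register the variant's instance count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking on the built-in invocations-per-instance CloudWatch metric.
aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```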
Question 57
A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running. How should the company deploy the model on Amazon SageMaker to meet these requirements?
A. Use a multi-model serverless endpoint. Enable caching.
B. Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.
C. Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.
D. Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.
Answer: D

Question 58
An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents. The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras. Which solution will improve the model's accuracy in the LEAST amount of time?
A. Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.
B. Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.
C. Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.
D. Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.
Answer: B

Question 59
A company has an application that uses different APIs to generate embeddings for input text. The company needs to implement a solution to automatically rotate the API tokens every 3 months. Which solution will meet this requirement?
A. Store the tokens in AWS Secrets Manager. Create an AWS Lambda function to perform the rotation.
B. Store the tokens in AWS Systems Manager Parameter Store. Create an AWS Lambda function to perform the rotation.
C. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS managed key to perform the rotation.
D. Store the tokens in AWS Key Management Service (AWS KMS). Use an AWS owned key to perform the rotation.
Answer: A

Question 60
An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML. Which solution will meet these requirements?
A. Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.
B. Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.
C. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.
D. Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.
Answer: A

Question 61
A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support. Which modeling approach should the company use to meet this requirement?
A. Anomaly detection
B. Linear regression
C. Logistic regression
D. Semantic segmentation
Answer: C
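For Question 57 above, a minimal boto3 sketch of option D: a serverless endpoint that scales to zero between the nightly runs, so the company pays only for the roughly one minute of daily inference. Names and the memory size are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="nightly-forecast-config",   # placeholder names
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "stock-forecast-model",
        # Serverless: no instances to manage; billed per invocation.
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,
            "MaxConcurrency": 1,   # one nightly run needs no parallelism
        },
    }],
)

sm.create_endpoint(
    EndpointName="nightly-forecast",
    EndpointConfigName="nightly-forecast-config",
)
```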
Question 62
An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning. The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain. Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)
A. The ML engineer and the Canvas user must be in separate SageMaker domains.
B. The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.
C. The model must be registered in the SageMaker Model Registry.
D. The ML engineer must host the model on AWS Marketplace.
E. The ML engineer must deploy the model to a SageMaker endpoint.
Answer: B, C

Question 63
A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset. Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?
A. Hyperband
B. Grid search
C. Bayesian optimization
D. Random search
Answer: A

Question 64
A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account. An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses. Which solution will meet these requirements?
A. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.
B. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.
C. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.
D. Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.
Answer: D

Question 65
A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold. Which solution will meet this requirement?
A. Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.
B. Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.
C. Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.
D. Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.
Answer: C
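For Question 63 above, a minimal sketch of a Hyperband tuning job with the SageMaker Python SDK. Hyperband's advantage is that it stops under-performing training jobs early, which is what makes it cheaper than grid, random, or Bayesian search on large datasets. The image URI, role, metric regex, and ranges are all placeholders.

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Any Estimator works here; image URI, role, and bucket are placeholders.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    metric_definitions=[{"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-2),
        "batch_size": IntegerParameter(32, 256),
    },
    strategy="Hyperband",   # early-stops weak trials to save compute time
    max_jobs=50,
    max_parallel_jobs=5,
)

tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})
```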
Question 66
A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks. What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?
A. Adjust the model's parameters and hyperparameters.
B. Initiate a manual Model Monitor job that uses the most recent production data.
C. Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.
D. Include additional data in the existing training set for the model. Retrain and redeploy the model.
Answer: D

Question 67
A company has an ML model that generates text descriptions based on images that customers upload to the company's website. The images can be up to 50 MB in total size. An ML engineer decides to store the images in an Amazon S3 bucket. The ML engineer must implement a processing solution that can scale to accommodate changes in demand. Which solution will meet these requirements with the LEAST operational overhead?
A. Create an Amazon SageMaker batch transform job to process all the images in the S3 bucket.
B. Create an Amazon SageMaker Asynchronous Inference endpoint and a scaling policy. Run a script to make an inference request for each image.
C. Create an Amazon Elastic Kubernetes Service (Amazon EKS) cluster that uses Karpenter for auto scaling. Host the model on the EKS cluster. Run a script to make an inference request for each image.
D. Create an AWS Batch job that uses an Amazon Elastic Container Service (Amazon ECS) cluster. Specify a list of images to process for each AWS Batch job.
Answer: B

Question 68
An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents. Which solution will meet these requirements with the LEAST operational overhead?
A. Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords.
B. Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords.
C. Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords.
D. Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.
Answer: D

Question 69
A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups. The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker. Which solution will provide the ML engineers with the appropriate access?
A. Enable S3 bucket versioning.
B. Configure S3 Object Lock settings for each user.
C. Add cross-origin resource sharing (CORS) policies to the S3 buckets.
D. Create IAM policies. Attach the policies to IAM users or IAM roles.
Answer: D
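For Question 68 above, key phrase extraction with Amazon Comprehend is a single managed API call, which is what makes option D the lowest-overhead choice. A minimal boto3 sketch with an illustrative input string:

```python
import boto3

comprehend = boto3.client("comprehend")

document = (
    "Amazon Comprehend extracts key phrases from documents "
    "without any ML infrastructure to manage."
)
response = comprehend.detect_key_phrases(Text=document, LanguageCode="en")

# Each key phrase comes back with a confidence score and character offsets.
for phrase in response["KeyPhrases"]:
    print(phrase["Text"], round(phrase["Score"], 3))
```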
Question 70
A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day. Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities. Which solution will meet these requirements?
A. Schedule an Amazon SageMaker batch transform job by using AWS Lambda.
B. Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.
C. Use Amazon SageMaker Serverless Inference with provisioned concurrency.
D. Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.
Answer: C

Question 71
A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions. Which solution will provide an explanation for the model's predictions?
A. Use SageMaker Model Monitor on the deployed model.
B. Use SageMaker Clarify on the deployed model.
C. Show the distribution of inferences from A/B testing in Amazon CloudWatch.
D. Add a shadow endpoint. Analyze prediction differences on samples.
Answer: B

Question 72
An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances. What should the ML engineer do to MINIMIZE the communication overhead between the instances?
A. Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.
B. Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.
C. Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.
D. Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.
Answer: C

Question 73
A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS. Which solution will meet these requirements with the LEAST effort?
A. Use SageMaker built-in algorithms to train on the proprietary datasets.
B. Use SageMaker script mode and premade images for ML frameworks.
C. Build a container on AWS that includes custom packages and a choice of ML frameworks.
D. Purchase similar production models through AWS Marketplace.
Answer: B

Question 74
A company is using Amazon SageMaker and millions of files to train an ML model. Each file is several megabytes in size. The files are stored in an Amazon S3 bucket. The company needs to improve training performance. Which solution will meet these requirements in the LEAST amount of time?
A. Transfer the data to a new S3 bucket that provides S3 Express One Zone storage. Adjust the training job to use the new S3 bucket.
B. Create an Amazon FSx for Lustre file system. Link the file system to the existing S3 bucket. Adjust the training job to read from the file system.
C. Create an Amazon Elastic File System (Amazon EFS) file system. Transfer the existing data to the file system. Adjust the training job to read from the file system.
D. Create an Amazon ElastiCache (Redis OSS) cluster. Link the Redis OSS cluster to the existing S3 bucket. Stream the data from the Redis OSS cluster directly to the training job.
Answer: B
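For Question 74 above, a sketch of feeding an S3-linked FSx for Lustre file system to a SageMaker training job with the Python SDK. FSx for Lustre presents the S3 objects as a POSIX file system, so training avoids millions of individual S3 reads per epoch. The training job must run in the same VPC as the file system; every ID, ARN, and path below is a placeholder (the real directory path starts with the file system's mount name).

```python
from sagemaker.estimator import Estimator
from sagemaker.inputs import FileSystemInput

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    # VPC config so the training instances can reach the file system.
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)

train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="FSxLustre",
    directory_path="/fsx/training-data",
    file_system_access_mode="ro",
)

estimator.fit({"train": train_input})
```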
Question 75
A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model. Which solution will meet these requirements?
A. Use Amazon Macie to categorize the sensitive data.
B. Prepare the data by using AWS Glue DataBrew.
C. Run an AWS Batch job to change the sensitive data to random values.
D. Run an Amazon EMR job to change the sensitive data to random values.
Answer: B

Question 76
An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur. Which solution will meet these requirements?
A. Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.
B. Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.
C. Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.
D. Deploy the models by using Amazon SageMaker batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.
Answer: D

Question 77
An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions. Which solution will meet this requirement?
A. Apply statistics from a well-known dataset to normalize the production samples.
B. Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.
C. Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.
D. Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.
Answer: B

Question 78
A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP storage virtual machine (SVM). The SVM is in the same VPC as SageMaker. An ML engineer must make the training data accessible for ML models that are in the SageMaker environment. Which solution will meet these requirements?
A. Mount the FSx for ONTAP file system as a volume to the SageMaker instance.
B. Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.
C. Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.
D. Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.
Answer: A
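Question 77's answer (option B) in miniature: fit the min-max statistics on the training data once, persist them, and reuse the same values unchanged at inference time. A plain NumPy sketch with made-up numbers:

```python
import numpy as np

# Training time: compute and persist the normalization statistics.
train = np.array([[10.0], [20.0], [30.0]])
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale values onto the training set's [0, 1] range."""
    return (x - train_min) / (train_max - train_min)

# Inference time: apply the SAME statistics to production samples.
production_sample = np.array([[25.0]])
print(normalize(production_sample))  # [[0.75]] on the training scale
```

Recomputing the statistics from production data (options C and D) would put inference inputs on a different scale than the one the model was trained on.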
Question 79
A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days. The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket. Which solution will meet these requirements with the LEAST operational effort?
A. Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.
B. Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.
C. Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.
D. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.
Answer: C

Question 80
An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate. During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions. What should the ML engineer do to improve the fraud detection for new transactions?
A. Increase the learning rate.
B. Remove some irrelevant features from the training dataset.
C. Increase the value of the max_depth hyperparameter.
D. Decrease the value of the max_depth hyperparameter.
Answer: D
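For Question 79 above, a minimal boto3 sketch of option C: an EventBridge rule that matches S3 object uploads and starts the SageMaker pipeline directly, with no Lambda glue code. The bucket must have EventBridge notifications enabled, and every name and ARN below is a placeholder.

```python
import json
import boto3

events = boto3.client("events")

# Match S3 "Object Created" events for the vendor's delivery bucket.
events.put_rule(
    Name="start-retraining-pipeline",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["vendor-training-data"]}},
    }),
)

# The SageMaker pipeline is a native EventBridge target, which is what
# makes this the least-operational-effort option.
events.put_targets(
    Rule="start-retraining-pipeline",
    Targets=[{
        "Id": "retraining-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/retraining",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeStartPipelineRole",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```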
