Flashcards
AWS Glue Workflow: Orchestrates AWS Glue jobs either on a schedule or manually.
Amazon SageMaker Pipelines: Tools for building ML workflows, integrating data processing and model training.
Dynamic Data Masking: Controls data visibility at query time without altering the source data.
SageMaker Model Monitor
Blue/Green Deployment
Amazon EventBridge
Auto Scaling
Shadow Testing
Asynchronous Inference
Error Metric for Models
SageMaker Debugger
Feature Selection
Data Normalization
Amazon SageMaker Autopilot
SageMaker Data Wrangler
IAM Policies
SageMaker Canvas
Hyperparameter Tuning
Amazon Comprehend
AWS Secrets Manager
Amazon S3 Lifecycle
AWS Glue DataBrew
SageMaker Endpoint
Model Bias
Precision-Recall Tradeoff
Model Validation
Data Drift
AWS Lambda
Amazon S3 Buckets
Study Notes
Question 46
- A company uses AWS Glue workflows to orchestrate data processing jobs.
- The jobs can be scheduled or launched manually.
- Pipelines in Amazon SageMaker are used for ML model development.
- The pipelines need data processed by the AWS Glue jobs.
- The best solution with the least overhead is to use a SageMaker Pipelines callback step.
- The callback step pauses the pipeline until the Glue jobs complete.
Question 47
- A company uses Amazon Redshift for data storage.
- Some data is sensitive.
- A data scientist needs access to sensitive data without transformation or storing anonymized data.
- The solution with the least effort is to configure dynamic data masking policies.
- This controls sharing sensitive data at query time.
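In Amazon Redshift, dynamic data masking is defined with SQL masking policies attached to columns. The idea of masking at query time, without touching stored data, can be sketched in plain Python (the column name, masking rule, and permission flag below are illustrative assumptions, not Redshift syntax):

```python
def mask_value(value: str) -> str:
    """Replace all but the last four characters with '*' (illustrative policy)."""
    return "*" * max(len(value) - 4, 0) + value[-4:]

def query(rows, user_can_see_sensitive: bool):
    """Apply masking at query time; the stored rows are never modified."""
    if user_can_see_sensitive:
        return rows
    return [{**row, "card_number": mask_value(row["card_number"])} for row in rows]

table = [{"name": "Ana", "card_number": "4111111111111111"}]
print(query(table, user_can_see_sensitive=False))
# the source row in `table` is unchanged
```

The key property, mirrored from the question: the data scientist's queries see masked values, while the underlying storage is neither transformed nor duplicated into an anonymized copy.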
Question 48
- An ML engineer is fine-tuning a deep learning model in SageMaker Studio.
- The engineer expects problems like vanishing gradients, underutilized GPUs, and overfitting.
- The solution needs to detect these issues and provide comprehensive, real-time metrics during training.
- Using SageMaker Debugger built-in rules is the best option for monitoring and predefined actions.
Question 49
- A credit card company has a fraud detection model in production in SageMaker.
- A new model version needs to be assessed without impacting production users.
- The best solution is shadow testing with a shadow variant of the new model.
Question 50
- A company has time-series data in Amazon S3.
- The data consists of user clicks (millions of rows per day).
- ML engineers access the data for modeling.
- Reports are needed for the last 3 days, using Amazon Athena.
- Data retention is 30 days before archiving.
- The best option for the highest retrieval performance is to partition the data by date and archive it to S3 Glacier Flexible Retrieval after 30 days.
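The archiving half of this answer is an S3 lifecycle rule. A minimal sketch of the configuration dict (the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the bucket prefix and rule ID are hypothetical) looks like this:

```python
# Hypothetical prefix; assumes data is partitioned as clicks/date=YYYY-MM-DD/
# so Athena can prune partitions when querying the last 3 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-clickstream",
            "Filter": {"Prefix": "clicks/"},
            "Status": "Enabled",
            "Transitions": [
                # "GLACIER" is the storage-class name for S3 Glacier Flexible Retrieval
                {"Days": 30, "StorageClass": "GLACIER"}
            ],
        }
    ]
}
print(lifecycle_config["Rules"][0]["Transitions"][0])
```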
Question 51
- A banking application uses an ML model with SageMaker Asynchronous Inference.
- Consumers report delays in receiving inference results.
- The solution should improve performance and provide notifications.
- Use real-time inference with SageMaker Model Monitor for quality notifications.
Question 52
- An ML engineer needs a solution for hosting a trained ML model.
- The model receives inconsistent request rates throughout the day.
- The solution needs to minimize costs when not in use while maintaining capacity during peak usage.
- Deploying the model to an Amazon SageMaker endpoint with auto scaling is the best fit.
Question 53
- A SageMaker Studio domain needs automated alerts for compute cost exceeding a threshold.
- Configuring AWS Budgets is the straightforward solution.
Question 54
- A company uses SageMaker for ML tasks.
- A 50 MB Apache Parquet file contains correlated columns that are not needed.
- Using SageMaker Data Wrangler to configure a transform step is the most efficient approach for dropping columns.
Question 55
- An application uses Amazon Q Business APIs to recommend products.
- Responses must exclude competitor names.
- Configuring the competitor's name as a blocked phrase in Amazon Q Business meets the requirement.
Question 56
- An ML engineer needs to fine-tune a large language model (LLM) for summarization.
- A low-code or no-code approach is preferred.
- SageMaker Autopilot fine-tuning on an LLM deployed by SageMaker JumpStart is a good option.
Question 57
- A model needs to run once per night to predict stock values.
- The input is 3 MB of data, collected daily.
- Prediction takes less than 1 minute.
- Using a serverless inference endpoint with a configured MaxConcurrency is suitable for this requirement.
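For a once-nightly, sub-minute prediction on 3 MB of input, the relevant piece is the `ServerlessConfig` block passed when creating the endpoint configuration (via `create_endpoint_config` in boto3). A sketch with illustrative values — the memory size and concurrency below are assumptions, not tuned recommendations:

```python
# Illustrative serverless endpoint configuration for a nightly batch-of-one job.
serverless_config = {
    "MemorySizeInMB": 2048,  # allowed range is 1024-6144, in 1 GB steps
    "MaxConcurrency": 1,     # a single nightly prediction run at a time
}
print(serverless_config)
```

Serverless inference bills only while requests are being processed, which fits a workload that is idle almost all day.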
Question 58
- An ML model detects automobile accidents from camera images; SageMaker Data Wrangler is used for data preparation.
- The model underperforms in production due to image quality variations between cameras.
- Image contrast enhancement using Data Wrangler is the best solution to improve model accuracy quickly.
Question 59
- An application needs API token rotation every 3 months.
- Using AWS Secrets Manager and a Lambda function for rotation is suitable.
Question 60
- An ML engineer has multiple datasets with missing values, duplicates and outliers.
- The datasets need to be consolidated into a single data frame and prepared for ML.
- Using SageMaker Data Wrangler to import, consolidate, and prepare the data is the best approach.
Question 61
- A company needs a model to predict if customers require extended support.
- Use historical data.
- Logistic regression is the suitable modeling approach for classifying customer support needs.
Question 62
- An ML engineer developed a binary classification model outside of SageMaker, and needs SageMaker Canvas access for tuning.
- The model artifacts are in an S3 bucket.
- The engineer and Canvas user are in the same SageMaker domain.
- The solution requires permissions to access the S3 bucket that contains the model artifacts, and the model must be registered in the SageMaker Model Registry.
Question 63
- A company optimizes a large-scale, deep-learning model's hyperparameters using SageMaker.
- Hyperband is the optimal strategy because it minimizes compute time.
Question 64
- A company uses Amazon Redshift ML in its primary AWS account. Source data is in a secondary account S3 bucket.
- The solution needs to access the secondary account's S3 bucket without public IPv4 addresses.
- An S3 gateway endpoint in the primary account, with appropriate security configuration, is a suitable solution.
Question 65
- An AWS Lambda function monitors an ML model.
- An email needs to be sent when the model metrics exceed a threshold.
- Log the metrics to Amazon CloudWatch and configure a CloudWatch alarm to send the email notification.
Question 66
- After a model update in SageMaker, data quality issues are detected by Model Monitor.
- Creating a new baseline for Model Monitor, using the latest data, is the appropriate solution.
Question 67
- A company needs a scalable solution to process images (up to 50 MB) uploaded to a website.
- Images will be stored in an Amazon S3 bucket.
- Using a SageMaker batch transform job to handle the image processing tasks is appropriate.
Question 68
- A company needs to give ML engineers access to training data from their specific business groups without wider access.
- Training data is stored in Amazon S3 buckets in a single AWS account.
- Using IAM policies to grant access to appropriate users or roles is the best solution to control access granularly.
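The granular-access pattern from Questions 68-69 is typically a prefix-scoped IAM policy. A minimal sketch as a Python dict (the bucket name and `group-a/` prefix are hypothetical placeholders for a business group's data path):

```python
import json

# Hypothetical bucket and prefix; each business group gets a policy
# scoped to its own prefix, so engineers cannot read other groups' data.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::training-data-bucket/group-a/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::training-data-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["group-a/*"]}},
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Attaching this policy to the group's IAM role grants read access to its own prefix only; the `ListBucket` condition keeps even listing scoped to that prefix.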
Question 69
- A company uses SageMaker for ML model training and training data is stored in Amazon S3.
- The solution must allow granular access control.
- Utilizing IAM policies to apply specific access controls to user roles and to the Amazon S3 buckets is required.
Question 70
- A company needs an ML inference solution with predictable load and immediate responses for analysis, and it must scale automatically.
- The best-suited solution is SageMaker Serverless Inference with provisioned concurrency.
Question 71
- A company wants to explain how its SageMaker sentiment analysis model makes predictions for stakeholders.
- Using SageMaker Clarify on the deployed model provides this functionality.
Question 72
- An ML engineer uses SageMaker to train a distributed deep learning model and experiences communication overhead between instances.
- The solution needs to reduce communication overhead.
- Placing the instances in the same VPC subnet, within the same Availability Zone as the data, resolves the problem.
Question 73
- A company needs to move ML models running in Python scripts and proprietary data to AWS.
- The goal is a solution involving the least operational effort.
- Using SageMaker script mode with pre-built images for frameworks (like PyTorch) is a suitable approach.
Question 74
- A company needs to improve the performance of training an ML model using several, large files in Amazon S3.
- Creating an Amazon FSx for Lustre file system and attaching it to the training job is an effective way to improve performance.
Question 75
- A company must build a model from tabular data including sensitive information.
- The solution needs to mask sensitive data.
- Using AWS Glue DataBrew is the best fit for preparing the sensitive data by masking it.
Question 76
- An ML engineer needs to deploy models and get inferences from large datasets asynchronously.
- The solution needs scheduled monitoring of data quality, with alerts on quality changes.
- Using SageMaker Model Monitor with SageMaker batch transform is a suitable solution.
Question 77
- An ML engineer normalized training data in AWS Glue DataBrew using min-max normalization.
- The engineer needs production inference data to be normalized using the training set's normalization statistics.
- Using the min-max normalization statistics from the training set is the best approach to normalize production inference data.
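The point of Question 77 is that the min and max come from the training set and are reused at inference time, never recomputed on production data. A minimal sketch (plain Python; function names are illustrative):

```python
def fit_min_max(train):
    """Compute min-max statistics on the training set only."""
    return min(train), max(train)

def transform(values, stats):
    """Normalize new values with the stored training statistics,
    not with the new data's own min and max."""
    lo, hi = stats
    return [(v - lo) / (hi - lo) for v in values]

train = [10.0, 20.0, 30.0, 40.0]
stats = fit_min_max(train)        # (10.0, 40.0), saved alongside the model
print(transform([25.0], stats))   # production value scaled with training stats
```

Recomputing the statistics on inference data would shift the feature scale and degrade predictions, which is exactly the mismatch the question warns against.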
Question 78
- A company uses 6 TB of training data in an Amazon FSx for NetApp ONTAP system.
- The data is in the same VPC as SageMaker.
- Mounting the FSx for ONTAP file system as a volume to the SageMaker instance enables accessing it.
Question 79
- New training data is uploaded to an Amazon S3 bucket every few days.
- A SageMaker pipeline needs to be triggered to retrain the model using the new data.
- Creating an Amazon EventBridge rule to automatically trigger the pipeline when new data is uploaded to the S3 bucket provides an efficient solution.
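The EventBridge rule in Question 79 matches S3 "Object Created" events and targets the SageMaker pipeline. A sketch of the event pattern (the bucket name is hypothetical; the bucket must have EventBridge notifications enabled):

```python
import json

# Hypothetical bucket name; this pattern would be attached to an
# EventBridge rule whose target starts the SageMaker pipeline execution.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["training-data-bucket"]}},
}
print(json.dumps(event_pattern))
```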
Question 80
- An ML engineer uses SageMaker XGBoost for fraud detection.
- The model performs well in training but poorly in real-world scenarios with new transactions.
- The ML engineer should decrease the max_depth hyperparameter value to reduce overfitting.
Question 81
- A binary classification model is in production.
- A new model version needs to achieve higher performance.
- Maximizing prediction correctness for both positive and negative classes requires using accuracy as the evaluation metric.
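Accuracy is simply the fraction of predictions that are correct, counting both classes equally — equivalently, (TP + TN) / (TP + TN + FP + FN). A minimal sketch:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that are correct, across both classes."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))  # 3 of 4 correct -> 0.75
```

This is why accuracy fits the stated goal: unlike precision or recall alone, it rewards correctness on positives and negatives alike.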
Question 82
- A company needs greater control of their SageMaker ML workflows.
- They need a visualization of jobs as a DAG (Directed Acyclic Graph).
- They need to maintain a model history for discovery and establish model governance.
- Utilizing SageMaker Pipelines, in combination with SageMaker Experiments and ML lineage tracking, provides the complete solution to meet the needs.
Question 83
- A company wants to reduce the cost of its containerized ML applications that run on Amazon EC2, AWS Lambda, and Amazon ECS.
- The ML engineer must identify inefficient resources and provide recommendations for cost reduction.
- Running AWS Compute Optimizer provides these recommendations.
Question 84
- A central catalog of ML models hosted in ECR repositories across different accounts needs to be created.
- Using Amazon SageMaker Model Registry allows centralized cataloging capabilities, including cross-account replication.
Question 85
- A company needs to validate a new ML model on a portion of traffic, to ensure its performance before deployment.
- Validation should be done on 10% of the traffic.
- The best option is to use production variants on the existing endpoint, assigning a 10% weight to the new model, and monitoring with Amazon CloudWatch.
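The 90/10 traffic split is expressed through the `ProductionVariants` list passed to `create_endpoint_config` in boto3. A sketch with hypothetical variant and model names (weights are relative, so 0.9/0.1 routes roughly 10% of requests to the candidate):

```python
# Hypothetical variant and model names; InitialVariantWeight values are
# relative shares, so this sends about 10% of traffic to the new model.
production_variants = [
    {"VariantName": "current", "ModelName": "fraud-model-v1", "InitialVariantWeight": 0.9},
    {"VariantName": "candidate", "ModelName": "fraud-model-v2", "InitialVariantWeight": 0.1},
]
total = sum(v["InitialVariantWeight"] for v in production_variants)
print(total)
```

Per-variant invocation and latency metrics are emitted to Amazon CloudWatch automatically, which is how the new model's performance is monitored before shifting more weight to it.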
Description
Test your knowledge on AWS services like Glue, Redshift, and SageMaker. This quiz covers data processing, machine learning model development, and best practices for managing sensitive data. Get ready to dive into practical scenarios faced by data professionals.