AWS Data Processing and Machine Learning Quiz

Flashcards

AWS Glue Workflow

Orchestrates AWS Glue jobs either on a schedule or manually.

Amazon SageMaker Pipelines

Service for building ML workflows that integrate data processing and model training.

Dynamic Data Masking

Controls data visibility at query time without altering the source data.

SageMaker Model Monitor

Monitors production models for data and prediction quality.

Blue/Green Deployment

Two environments to minimize downtime when updating applications.

Amazon EventBridge

Service for event-driven architectures, reacting to changes from AWS resources.

Auto Scaling

Automatically adjusts the number of Amazon SageMaker instances based on demand.

Shadow Testing

Testing a new model's predictions alongside an existing model without impacting users.

Asynchronous Inference

Non-blocking method to get model predictions, useful for high-volume requests.

Error Metric for Models

Quantitative method to evaluate model accuracy such as precision or recall.

SageMaker Debugger

Tool for monitoring and debugging training jobs, providing insights into model performance.

Feature Selection

Process of identifying the most relevant features for model training to improve performance.

Data Normalization

Adjusting the range of data values to a standard scale to improve model learning.

Amazon SageMaker Autopilot

Automatically creates ML models, simplifying the machine learning process.

SageMaker Data Wrangler

Tool for data preparation, making it easier to clean, visualize, and transform data for ML.

IAM Policies

Access management rules defining permissions for AWS users and resources.

SageMaker Canvas

No-code interface for building and deploying ML models without needing programming skills.

Hyperparameter Tuning

Optimizing the model's settings to improve performance.

Amazon Comprehend

Natural language processing service for extracting insights and understanding text.

AWS Secrets Manager

Service for securely storing, retrieving, and rotating secrets such as API keys and database credentials.

Amazon S3 Lifecycle

Rules that automate the transitioning of objects in your S3 buckets between different storage classes.

AWS Glue DataBrew

Visual data preparation tool to clean and transform data for analytics.

SageMaker Endpoint

Service that allows real-time or batch predictions from deployed models.

Model Bias

A systematic error in the predictions of a model based on data skew.

Precision-Recall Tradeoff

Balancing precision (the fraction of positive predictions that are correct) against recall (the fraction of actual positives that are found).
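As a quick illustration of the tradeoff, precision and recall can be computed from confusion-matrix counts in a few lines of plain Python (the example counts are made up):

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)  # fraction of positive predictions that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are found
    return precision, recall

# Example: 80 true positives, 20 false positives, 40 false negatives
p, r = precision_recall(80, 20, 40)
# precision = 0.8, recall ≈ 0.667
```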

Model Validation

Process of assessing a model’s performance using new data.

Data Drift

When the statistical properties of the input data change over time, causing model performance degradation.

AWS Lambda

Serverless compute service that runs code in response to events.

Amazon S3 Buckets

Storage containers for Amazon S3 that hold data files.

Study Notes

Question 46

  • A company uses AWS Glue workflows to orchestrate data processing jobs.
  • The jobs can be scheduled or launched manually.
  • Pipelines in Amazon SageMaker are used for ML model development.
  • The pipelines need data processed by the AWS Glue jobs.
  • The solution with the least overhead is to use SageMaker Pipelines callback steps.
  • A callback step pauses the pipeline until the Glue jobs complete.

Question 47

  • A company uses Amazon Redshift for data storage.
  • Some data is sensitive.
  • A data scientist needs access to sensitive data without transformation or storing anonymized data.
  • The solution with the least effort is to configure dynamic data masking policies.
  • This controls sharing sensitive data at query time.

Question 48

  • An ML engineer is fine-tuning a deep learning model in SageMaker Studio.
  • The engineer expects problems like vanishing gradients, underutilized GPUs, and overfitting.
  • The solution needs to detect these issues and provide comprehensive, real-time metrics during training.
  • Using SageMaker Debugger built-in rules is the best option for monitoring and predefined actions.

Question 49

  • A credit card company has a fraud detection model in production in SageMaker.
  • A new model version needs to be assessed without impacting production users.
  • The best solution is shadow testing with a shadow variant of the new model.

Question 50

  • A company has time-series data in Amazon S3.
  • The data consists of user clicks (millions of rows per day).
  • ML engineers access the data for modeling.
  • Reports are needed for the last 3 days, using Amazon Athena.
  • Data retention is 30 days before archiving.
  • The best option, for highest retrieval performance, is to partition the data by date and archive to S3 Glacier Flexible Retrieval after 30 days.

Question 51

  • A banking application uses an ML model with SageMaker Asynchronous Inference.
  • Consumers report delays in receiving inference results.
  • The solution should improve performance and provide notifications.
  • Use real-time inference with SageMaker Model Monitor for quality notifications.

Question 52

  • An ML engineer needs a solution for hosting a trained ML model.
  • The model receives inconsistent request rates throughout the day.
  • The solution needs to minimize costs when not in use while maintaining capacity during peak usage.
  • Deploying the model to an Amazon SageMaker endpoint with auto scaling is the best fit.
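As a sketch of the auto scaling setup above, a SageMaker endpoint variant is registered with Application Auto Scaling and given a target-tracking policy. The parameter shapes below match the Application Auto Scaling API for SageMaker variants; the endpoint and variant names are hypothetical:

```python
# Hypothetical endpoint/variant names; the dimension and metric type are the
# real values used for SageMaker variant auto scaling.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # assumed names

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,   # keep one instance to serve off-peak traffic
    "MaxCapacity": 4,   # cap the fleet (and cost) at peak
}

scaling_policy = {
    "PolicyName": "InvocationsPerInstance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 1000.0,  # assumed target invocations per instance
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}

# With boto3 these would be passed as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```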

Question 53

  • A SageMaker Studio domain needs automated alerts for compute cost exceeding a threshold.
  • Configuring AWS Budgets is the straightforward solution.

Question 54

  • A company uses SageMaker for ML tasks.
  • A 50 MB Apache Parquet file contains correlated columns not needed.
  • Using SageMaker Data Wrangler to configure a transform step is the most efficient approach for dropping columns.

Question 55

  • An application uses Amazon Q Business APIs to recommend products.
  • Responses must exclude competitor names.
  • Configuring the competitor's name as a blocked phrase in Amazon Q Business meets the requirement.

Question 56

  • An ML engineer needs to fine-tune an LLM (large language model) for summarization.
  • A low-code or no-code approach is preferred.
  • SageMaker Autopilot fine-tuning on an LLM deployed by SageMaker JumpStart is a good option.

Question 57

  • A model needs to run once per night to predict stock values.
  • The input is 3 MB of data, collected daily.
  • Prediction takes less than 1 minute.
  • Using a serverless inference endpoint with a configured MaxConcurrency is suitable for this requirement.
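A serverless endpoint for this nightly, low-volume workload can be sketched as an endpoint configuration with a `ServerlessConfig` block (real SageMaker API fields; the config and model names are hypothetical):

```python
# Sketch of a serverless endpoint configuration. MemorySizeInMB and
# MaxConcurrency are the real ServerlessConfig fields; names are assumed.
endpoint_config = {
    "EndpointConfigName": "nightly-stock-prediction",  # assumed name
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": "stock-model",  # assumed name
            "ServerlessConfig": {
                "MemorySizeInMB": 2048,  # memory allocated per invocation
                "MaxConcurrency": 1,     # one nightly run, no parallel requests
            },
        }
    ],
}

# With boto3:
#   boto3.client("sagemaker").create_endpoint_config(**endpoint_config)
```

Because the job runs once per night on a 3 MB input, `MaxConcurrency` of 1 avoids paying for idle capacity the rest of the day.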

Question 58

  • An ML model detects automobile accidents using SageMaker Data Wrangler.
  • The model underperforms in production due to image quality variations between cameras.
  • Image contrast enhancement using Data Wrangler is the best solution to improve model accuracy quickly.

Question 59

  • An application needs API token rotation every 3 months.
  • Using AWS Secrets Manager and a Lambda function for rotation is suitable.

Question 60

  • An ML engineer has multiple datasets with missing values, duplicates and outliers.
  • The datasets need to be consolidated into a single data frame and prepared for ML.
  • Using SageMaker Data Wrangler to import, consolidate, and prepare the data is the best approach.

Question 61

  • A company needs a model to predict if customers require extended support.
  • Use historical data.
  • Logistic regression is a suitable modeling approach for this binary classification of customer support needs.

Question 62

  • An ML engineer developed a binary classification model outside of SageMaker, and needs SageMaker Canvas access for tuning.
  • The model artifacts are in an S3 bucket.
  • The engineer and Canvas user are in the same SageMaker domain.
  • The solution requires permissions to access the S3 bucket for model artifacts and the model be registered in SageMaker Model Registry.

Question 63

  • A company optimizes a large-scale, deep-learning model's hyperparameters using SageMaker.
  • Hyperband is the optimal strategy because it minimizes compute time by stopping underperforming training trials early.

Question 64

  • A company uses Amazon Redshift ML in its primary AWS account. Source data is in a secondary account S3 bucket.
  • The solution needs to access the secondary account's S3 bucket without public IPv4 addresses.
  • An S3 gateway endpoint in the primary account, with appropriate security configuration, is a suitable solution.

Question 65

  • An AWS Lambda function monitors an ML model.
  • An email must be sent when the model metrics exceed a threshold.
  • Log the metrics to Amazon CloudWatch and configure an alarm that sends the email notification.
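The alarm half of this setup can be sketched as the parameters for CloudWatch's `put_metric_alarm` call (the field names are the real API shape; the custom namespace, metric name, threshold, and SNS topic ARN are hypothetical):

```python
# Sketch of a CloudWatch alarm on a custom model metric logged by the Lambda.
# Namespace, MetricName, Threshold, and the SNS ARN are assumed values.
alarm = {
    "AlarmName": "model-metric-threshold",
    "Namespace": "Custom/MLModel",   # assumed custom namespace
    "MetricName": "ErrorRate",       # assumed metric published by the Lambda
    "Statistic": "Average",
    "Period": 300,                   # evaluate over 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 0.05,               # assumed threshold
    "ComparisonOperator": "GreaterThanThreshold",
    # SNS topic with email subscribers delivers the notification
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:model-alerts"],
}

# With boto3:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```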

Question 66

  • After a model update in SageMaker, data quality issues are detected by Model Monitor.
  • Creating a new baseline for Model Monitor, using the latest data, is the appropriate solution.

Question 67

  • A company needs a scalable solution to process images (up to 50 MB) uploaded to a website.
  • Images will be stored in an Amazon S3 bucket.
  • Using a SageMaker batch transform job to handle the image processing tasks is appropriate.

Question 68

  • A company needs to give ML engineers access to training data from their specific business groups without wider access.
  • Training data is stored in Amazon S3 buckets in a single AWS account.
  • Using IAM policies to grant access to appropriate users or roles is the best solution to control access granularly.
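A per-group IAM policy of the kind described above can be sketched as follows; the policy document structure is standard IAM, while the bucket name and `group-a/` prefix are hypothetical:

```python
# Sketch of an IAM policy scoping one business group's ML engineers to their
# own S3 prefix. Bucket and prefix names are assumed.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read objects only under this group's prefix
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::company-training-data/group-a/*"],
        },
        {
            # Allow listing, but only within the group's prefix
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::company-training-data"],
            "Condition": {"StringLike": {"s3:prefix": ["group-a/*"]}},
        },
    ],
}
```

Attaching one such policy per business group to the matching users or roles gives the granular, prefix-level access control the scenario asks for.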

Question 69

  • A company uses SageMaker for ML model training and training data is stored in Amazon S3.
  • The solution must allow granular access control.
  • Utilizing IAM policies to create and apply specific access control to user roles and to the Amazon S3 buckets are required.

Question 70

  • A company needs an ML inference solution with predictable load and immediate responses for analysis, which needs to automatically scale.
  • The best-suited solution is to use SageMaker Serverless Inference with provisioned concurrency.

Question 71

  • A company wants to explain how its SageMaker sentiment analysis model makes predictions for stakeholders.
  • Using SageMaker Clarify on the deployed model provides this functionality.

Question 72

  • An ML engineer uses SageMaker to train a distributed deep learning model and experiences communication overhead between instances.
  • The solution needs to reduce communication overhead.
  • Placing the training instances in the same VPC subnet, in the same Availability Zone as the data, reduces the communication overhead.

Question 73

  • A company needs to move ML models running in Python scripts and proprietary data to AWS.
  • The goal is a solution involving the least operational effort.
  • Using SageMaker script mode with pre-built images for frameworks (like PyTorch) is a suitable approach.

Question 74

  • A company needs to improve the performance of training an ML model using several, large files in Amazon S3.
  • Creating an Amazon FSx for Lustre file system, and coupling it with the training job, is the time effective way to improve performance.

Question 75

  • A company must build a model from tabular data including sensitive information.
  • The solution needs to mask sensitive data.
  • Using SageMaker DataBrew is the best fit to prepare sensitive data by masking.

Question 76

  • An ML engineer needs to deploy models and get inferences from large datasets asynchronously.
  • The solution needs scheduled monitoring of data quality, with alerts on quality changes.
  • Using SageMaker Model Monitor with SageMaker batch transform is a suitable solution.

Question 77

  • An ML engineer normalized training data in AWS Glue DataBrew using min-max normalization.
  • The engineer needs production inference data to be normalized using the training set's normalization statistics.
  • Using the min-max normalization statistics from the training set is the best approach to normalize production inference data.
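The key point above — reuse the *training* statistics at inference time — can be sketched in plain Python (the sample values are made up):

```python
def fit_min_max(values):
    """Record min/max from the training set; persist and reuse these stats."""
    return min(values), max(values)

def normalize(x, lo, hi):
    """Scale x to [0, 1] using the training statistics, not production stats."""
    return (x - lo) / (hi - lo)

train = [10.0, 20.0, 50.0]
lo, hi = fit_min_max(train)  # lo=10.0, hi=50.0
normalize(30.0, lo, hi)      # 0.5 — a production value scaled by training stats
```

Recomputing min/max on the production data instead would shift the scale and feed the model inputs inconsistent with what it was trained on.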

Question 78

  • A company uses 6 TB of training data in an Amazon FSx for NetApp ONTAP system.
  • The data is in the same VPC as SageMaker.
  • Mounting the FSx for ONTAP file system as a volume to the SageMaker instance enables accessing it.

Question 79

  • New training data is uploaded to an Amazon S3 bucket every few days.
  • A SageMaker pipeline needs to be triggered to retrain the model using the new data.
  • Creating an Amazon EventBridge rule to automatically trigger the pipeline when new data is uploaded to the S3 bucket provides an efficient solution.
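The EventBridge rule's event pattern for this trigger can be sketched as below; the pattern shape matches S3's EventBridge integration (the bucket must have EventBridge notifications enabled), while the bucket name is hypothetical:

```python
# Sketch of an EventBridge event pattern matching new S3 uploads.
# The bucket name is assumed; "aws.s3" / "Object Created" are the real values.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["training-data-bucket"]}},
}

# The rule's target would be the SageMaker pipeline (referenced by ARN), so
# each upload of new training data kicks off retraining automatically.
```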

Question 80

  • An ML engineer uses SageMaker XGBoost for fraud detection.
  • The model performs well in training but poorly in real-world scenarios with new transactions.
  • The model is overfitting the training data; the ML engineer should decrease the max_depth hyperparameter value to reduce tree complexity and improve generalization.

Question 81

  • A binary classification model is in production.
  • A new model version must achieve higher performance.
  • Accuracy is the appropriate metric when prediction correctness must be maximized for both positive and negative values.
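Accuracy treats correct positives and correct negatives symmetrically, as a tiny plain-Python example shows (the counts are made up):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions, positive and negative, that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# 40 true positives + 45 true negatives out of 100 predictions
accuracy(40, 45, 5, 10)  # 0.85
```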

Question 82

  • A company needs greater control of their SageMaker ML workflows.
  • They need a visualization of jobs as a DAG (Directed Acyclic Graph).
  • Need to maintain a model discovery history, and establish model governance.
  • Utilizing SageMaker Pipelines, in combination with SageMaker Experiments and ML lineage tracking, provides the complete solution to meet the needs.

Question 83

  • A company wants to reduce the cost of its containerized ML applications that run on EC2, Lambda and ECS.
  • The ML engineer must identify inefficient resources and provide recommendations for cost reduction.
  • Running AWS Compute Optimizer.

Question 84

  • A central catalog of ML models hosted in ECR repositories across different accounts needs to be created.
  • Using Amazon SageMaker Model Registry allows centralized cataloging capabilities, including cross-account replication.

Question 85

  • A company needs to validate a new ML model on a portion of traffic, to ensure its performance before deployment.
  • Validation should be done on 10% of the traffic.
  • Using production variants (with 10% weight for the new model) on an existing endpoint directly, using Amazon CloudWatch for monitoring, is the best option.
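The 90/10 traffic split can be sketched as an endpoint configuration with two production variants; the field names are the real SageMaker API shape, while the model names, instance type, and counts are hypothetical:

```python
# Sketch of an endpoint config splitting traffic between current and new
# models via InitialVariantWeight. Names and instance settings are assumed.
endpoint_config = {
    "EndpointConfigName": "fraud-detection-ab",
    "ProductionVariants": [
        {
            "VariantName": "Current",
            "ModelName": "model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.9,  # 90% stays on the proven model
        },
        {
            "VariantName": "Candidate",
            "ModelName": "model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.1,  # 10% validates the new model
        },
    ],
}
```

Traffic is routed in proportion to the weights, and per-variant CloudWatch metrics let the team compare the candidate against the current model before shifting more traffic.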

Description

Test your knowledge on AWS services like Glue, Redshift, and SageMaker. This quiz covers data processing, machine learning model development, and best practices for managing sensitive data. Get ready to dive into practical scenarios faced by data professionals.
