Questions and Answers
A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models. Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)
- Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.
- Store the resulting data back in Amazon S3.
- Use Amazon Athena to infer the schemas and available columns.
- Use AWS Glue crawlers to infer the schemas and available columns.
- Use AWS Glue DataBrew for data cleaning and feature engineering.
An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model. Select and order the correct steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)
- Access the store to build datasets for training.
- Create a feature group.
- Ingest the records.
A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket. Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)
- An S3 event notification invokes the pipeline when new data is uploaded.
- An S3 Lifecycle rule invokes the pipeline when new data is uploaded.
- SageMaker retrains the model by using the data in the S3 bucket.
- The pipeline deploys the model to a SageMaker endpoint.
- The pipeline deploys the model to SageMaker Model Registry.
An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs). Select the correct generative AI term for each of the following descriptions. Each term should be selected one time or not at all. (Select three.)
Text representation of basic units of data processed by LLMs
High-dimensional vectors that contain the semantic meaning of text
Enrichment of information from additional data sources to improve a generated response
- Embedding
- Retrieval Augmented Generation (RAG)
- Temperature
- Token
An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:
- Feature splitting
- Logarithmic transformation
- One-hot encoding
- Standardized distribution
Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)
City (name)
Type_year (type of home and year the home was built)
Size of the building (square feet or square meters)
Study Notes
Data Preparation Steps for Machine Learning Models
- Data Source: Historical data in .csv files stored in Amazon S3.
- Data Quality: Some rows and columns contain missing data; columns are unlabeled.
- Goal: Prepare the data for machine learning models.
- Step 1: Use AWS Glue crawlers to infer the schemas and available columns from the unlabeled .csv files.
- Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering.
- Step 3: Store the resulting data back in Amazon S3.
- Not selected: Amazon Athena queries data but relies on the AWS Glue Data Catalog for schemas, and a SageMaker batch transform job runs batch inference rather than data cleaning.
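The Glue crawler and DataBrew options from the list above can be sketched as the boto3 request payloads an ML engineer might send. This is a hedged sketch: the bucket, database, project, and role ARNs are placeholder assumptions, not values from the quiz.

```python
# Sketch of the data-preparation flow, expressed as boto3 request payloads.
# All names and ARNs below are illustrative placeholders.

S3_INPUT = "s3://example-bucket/historical-csv/"   # assumed input location
S3_OUTPUT = "s3://example-bucket/prepared/"        # assumed output location

# A Glue crawler infers schemas and columns from the unlabeled .csv files.
crawler_request = {
    "Name": "historical-csv-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    "DatabaseName": "historical_data",
    "Targets": {"S3Targets": [{"Path": S3_INPUT}]},
}

# A DataBrew recipe job cleans the data and performs feature engineering,
# writing the result back to S3.
databrew_job_request = {
    "Name": "clean-historical-csv",
    "ProjectName": "historical-csv-project",  # assumed existing DataBrew project
    "RoleArn": "arn:aws:iam::123456789012:role/DataBrewRole",  # placeholder
    "Outputs": [{"Location": {"Bucket": "example-bucket", "Key": "prepared/"}}],
}

# With AWS credentials configured, these payloads would be sent with:
#   boto3.client("glue").create_crawler(**crawler_request)
#   boto3.client("databrew").create_recipe_job(**databrew_job_request)
```

The AWS calls are left as comments so the payloads can be inspected without an AWS account.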
Feature Store Creation Steps
- Goal: Create and manage features to train a machine learning model using Amazon SageMaker Feature Store.
- Step 1: Create a feature group.
- Step 2: Ingest the records.
- Step 3: Access the store to build datasets for training.
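The Feature Store steps above can be sketched in Python. The feature group name and feature names are illustrative assumptions; the pure helper builds the `Record` payload that `PutRecord` expects, and the AWS calls themselves are shown as comments.

```python
from datetime import datetime, timezone

FEATURE_GROUP = "home-features"  # placeholder feature group name

def build_record(record_id: str, features: dict) -> list:
    """Convert a feature dict into the Record format PutRecord expects:
    a list of {"FeatureName": ..., "ValueAsString": ...} entries."""
    record = [
        {"FeatureName": "record_id", "ValueAsString": record_id},
        {"FeatureName": "event_time",
         "ValueAsString": datetime.now(timezone.utc).isoformat()},
    ]
    record += [{"FeatureName": k, "ValueAsString": str(v)}
               for k, v in features.items()]
    return record

# Step 1 - create a feature group (sagemaker client):
#   boto3.client("sagemaker").create_feature_group(
#       FeatureGroupName=FEATURE_GROUP,
#       RecordIdentifierFeatureName="record_id",
#       EventTimeFeatureName="event_time",
#       FeatureDefinitions=[...],
#       OnlineStoreConfig={"EnableOnlineStore": True})
# Step 2 - ingest records (sagemaker-featurestore-runtime client):
#   boto3.client("sagemaker-featurestore-runtime").put_record(
#       FeatureGroupName=FEATURE_GROUP,
#       Record=build_record("h-1", {"size_sqft": 1800}))
# Step 3 - query the offline store (S3 via Athena) to build training datasets.
```

Keeping the record-building logic in a pure function makes it easy to unit test without AWS credentials.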
Continuous Integration and Continuous Delivery (CI/CD) Pipeline for ML Model Deployment
- Goal: Configure a CI/CD pipeline in AWS CodePipeline for automatic deployment of an ML model hosted in Amazon SageMaker. The pipeline triggers upon new data upload into Amazon S3.
- Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
- Step 2: SageMaker retrains the model using the data from S3.
- Step 3: The pipeline deploys the model to a SageMaker endpoint.
- Not selected: The S3 Lifecycle rule and SageMaker Model Registry options are distractors. Lifecycle rules manage object storage transitions, and the Model Registry catalogs model versions rather than serving them.
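A hedged sketch of how the S3-upload trigger described above might be wired up: an Amazon EventBridge rule matches S3 "Object Created" events and starts the CodePipeline pipeline. The bucket name, pipeline ARN, and role ARN are placeholder assumptions (the bucket must also have EventBridge notifications enabled).

```python
import json

# EventBridge rule pattern: match object uploads to the training-data bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-training-data-bucket"]}},  # placeholder
}

rule_request = {
    "Name": "start-retrain-pipeline-on-upload",
    "EventPattern": json.dumps(event_pattern),
    "State": "ENABLED",
}

# Target: the CodePipeline pipeline that retrains and deploys the model.
target = {
    "Id": "codepipeline",
    "Arn": "arn:aws:codepipeline:us-east-1:123456789012:retrain-pipeline",  # placeholder
    "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokePipeline",  # placeholder
}

# With AWS credentials configured:
#   events = boto3.client("events")
#   events.put_rule(**rule_request)
#   events.put_targets(Rule=rule_request["Name"], Targets=[target])
```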
Generative AI Terms
- Token: Text representation of basic units of data processed by LLMs (Large Language Models).
- Embedding: High-dimensional vectors containing the semantic meaning of text.
- Retrieval Augmented Generation (RAG): Enrichment of information from additional data sources to improve a generated response.
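The three terms can be illustrated with a toy, library-free example. The "tokenizer" and "embedding" here are deliberately simplistic stand-ins to show the shape of each concept, not how an LLM actually computes them.

```python
def tokenize(text: str) -> list:
    """Token: a basic unit of text an LLM processes (here, just words)."""
    return text.lower().split()

def embed(text: str, dims: int = 8) -> list:
    """Embedding: a fixed-length numeric vector standing in for meaning.
    Real models learn these vectors; this hash-based toy only shows the shape."""
    vec = [0.0] * dims
    for tok in tokenize(text):
        vec[hash(tok) % dims] += 1.0
    return vec

def rag_prompt(question: str, documents: list) -> str:
    """RAG: enrich the prompt with retrieved context before generation."""
    context = "\n".join(documents)
    return f"Context:\n{context}\n\nQuestion: {question}"

tokens = tokenize("Amazon Bedrock hosts foundation models")
vector = embed("Amazon Bedrock hosts foundation models")
prompt = rag_prompt("What is Bedrock?", ["Bedrock is a managed service."])
```

Temperature, the unused term, would control how randomly the model samples its next token during generation.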
Feature Engineering Techniques for Home Price Prediction
- One-Hot Encoding: City (name), a categorical feature with no ordinal relationship between values.
- Feature Splitting: Type_year, a composite feature that should be split into home type and year built.
- Standardized Distribution: Size of the building, a numerical feature that is roughly normally distributed for similarly sized homes.
- Not selected: Logarithmic transformation, which suits heavily right-skewed numerical features.
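The techniques listed above can be sketched on the three home features with plain Python. The sample values are made up for illustration.

```python
import statistics

# One-hot encoding for City (categorical, no inherent order):
cities = ["Seattle", "Austin", "Seattle"]
categories = sorted(set(cities))                      # stable category order
one_hot = [[1 if c == cat else 0 for cat in categories] for c in cities]

# Feature splitting for Type_year (a composite "type_year" value):
type_year = "condo_1998"
home_type, build_year = type_year.split("_")

# Standardized distribution for building size (z-score: mean 0, stdev 1):
sizes = [1800.0, 2000.0, 2200.0]
mean, stdev = statistics.mean(sizes), statistics.pstdev(sizes)
standardized = [(s - mean) / stdev for s in sizes]
```

In practice these transformations would usually come from a library such as scikit-learn; the versions here only show what each technique does to the raw values.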
Description
This quiz covers essential steps in preparing data for machine learning models, including data cleaning, feature engineering, and using Amazon SageMaker Feature Store. You'll learn how to manage data effectively and ensure high quality for your machine learning projects. Test your understanding of these crucial processes.