Machine Learning Data Preparation Techniques

Questions and Answers

A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML engineer needs to prepare and store the data so that the company can use the data to train ML models. Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)

  • Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.
  • Store the resulting data back in Amazon S3.
  • Use Amazon Athena to infer the schemas and available columns.
  • Use AWS Glue crawlers to infer the schemas and available columns.
  • Use AWS Glue DataBrew for data cleaning and feature engineering.

  • Use AWS Glue crawlers to infer the schemas and available columns. (correct, step 1)
  • Use AWS Glue DataBrew for data cleaning and feature engineering. (correct, step 2)
  • Store the resulting data back in Amazon S3. (correct, step 3)
  • Not selected: the Amazon SageMaker batch transform job and Amazon Athena options are distractors.

An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model. Select and order the correct steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)

    • Access the store to build datasets for training.
    • Create a feature group.
    • Ingest the records.

  • Create a feature group. (correct, step 1)
  • Ingest the records. (correct, step 2)
  • Access the store to build datasets for training. (correct, step 3)

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket. Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

    • An S3 event notification invokes the pipeline when new data is uploaded.
    • An S3 Lifecycle rule invokes the pipeline when new data is uploaded.
    • SageMaker retrains the model by using the data in the S3 bucket.
    • The pipeline deploys the model to a SageMaker endpoint.
    • The pipeline deploys the model to SageMaker Model Registry.

  • An S3 event notification invokes the pipeline when new data is uploaded. (correct, step 1)
  • SageMaker retrains the model by using the data in the S3 bucket. (correct, step 2)
  • The pipeline deploys the model to a SageMaker endpoint. (correct, step 3)
  • Not selected: the S3 Lifecycle rule and SageMaker Model Registry options are distractors.

An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs). Select the correct generative AI term for each of the following descriptions. Each term should be selected one time or not at all. (Select three.)

    • Text representation of basic units of data processed by LLMs
    • High-dimensional vectors that contain the semantic meaning of text
    • Enrichment of information from additional data sources to improve a generated response

    • Embedding
    • Retrieval Augmented Generation (RAG)
    • Temperature
    • Token

    Token (description 1), Embedding (description 2), Retrieval Augmented Generation (RAG) (description 3)

    An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

    • Feature splitting
    • Logarithmic transformation
    • One-hot encoding
    • Standardized distribution

    Select the correct feature engineering technique for each of the following features. Each technique should be selected one time or not at all. (Select three.)

    • City (name)
    • Type_year (type of home and year the home was built)
    • Size of the building (square feet or square meters)

    One-hot encoding (City), Feature splitting (Type_year), Standardized distribution (Size of the building)

    Study Notes

    Data Preparation Steps for Machine Learning Models

    • Data Source: Historical data in .csv files stored in Amazon S3.
    • Data Quality: Some rows and columns contain missing data; columns are unlabeled.
    • Goal: Prepare the data for machine learning models.
    • Step 1: Use AWS Glue crawlers to infer the schemas and available columns. The columns are unlabeled, so the schema must be discovered before any cleaning can happen.
    • Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering. (A SageMaker batch transform job runs inference against an already trained model, so it is a distractor here.)
    • Step 3: Store the resulting data back in Amazon S3.
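The three steps above can be sketched as boto3 request payloads. This is a minimal illustration, not a deployable setup: the bucket, database, role ARNs, crawler, and recipe names are all hypothetical placeholders, and the dicts would be passed to `glue.create_crawler` and `databrew.create_recipe_job` respectively.

```python
# Sketch of the data-preparation flow: crawl -> clean with DataBrew -> write back to S3.
# All names (bucket, database, roles, crawler, recipe) are hypothetical placeholders.

def crawler_request(bucket: str) -> dict:
    """Step 1 (glue.create_crawler): infer schemas of the unlabeled .csv files."""
    return {
        "Name": "csv-schema-crawler",
        "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
        "DatabaseName": "historical_data",
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/raw/"}]},
    }

def databrew_job_request(bucket: str) -> dict:
    """Step 2 (databrew.create_recipe_job): clean rows/columns and engineer features."""
    return {
        "Name": "clean-historical-csv",
        "RoleArn": "arn:aws:iam::123456789012:role/DataBrewRole",
        "DatasetName": "historical_data",
        "RecipeReference": {"Name": "fill-missing-and-encode"},
        # Step 3: the cleaned output lands back in Amazon S3.
        "Outputs": [{"Location": {"Bucket": bucket, "Key": "prepared/"}}],
    }

crawl = crawler_request("company-ml-data")
job = databrew_job_request("company-ml-data")
```

Note that step 3 is not a separate API call here: the DataBrew job's `Outputs` location is what writes the prepared data back to S3.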

    Feature Store Creation Steps

    • Goal: Create and manage features to train a machine learning model using Amazon SageMaker Feature Store.
    • Step 1: Create a feature group that defines the feature names and types.
    • Step 2: Ingest the data records into the feature group.
    • Step 3: Access the store to build datasets for training.
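As a rough sketch, the first two steps map to the `sagemaker.create_feature_group` and `sagemaker-featurestore-runtime.put_record` APIs; the group name and feature schema below are made up for illustration.

```python
# Sketch of the Feature Store steps as request payloads (names are hypothetical).

def feature_group_request() -> dict:
    """Step 1 (sagemaker.create_feature_group): define the group and its schema."""
    return {
        "FeatureGroupName": "home-features",
        "RecordIdentifierFeatureName": "home_id",
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "home_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "String"},
            {"FeatureName": "size_sqft", "FeatureType": "Fractional"},
        ],
    }

def put_record_request(home_id: str, size_sqft: float, event_time: str) -> dict:
    """Step 2 (featurestore-runtime.put_record): ingest one record into the group."""
    return {
        "FeatureGroupName": "home-features",
        "Record": [
            {"FeatureName": "home_id", "ValueAsString": home_id},
            {"FeatureName": "event_time", "ValueAsString": event_time},
            {"FeatureName": "size_sqft", "ValueAsString": str(size_sqft)},
        ],
    }

# Step 3 is a read path: query the offline store (for example, through Athena)
# to assemble training datasets from the ingested records.
group = feature_group_request()
record = put_record_request("h-001", 1830.0, "2024-01-01T00:00:00Z")
```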

    Continuous Integration and Continuous Delivery (CI/CD) Pipeline for ML Model Deployment

    • Goal: Configure a CI/CD pipeline in AWS CodePipeline for automatic deployment of an ML model hosted in Amazon SageMaker. The pipeline triggers upon new data upload into Amazon S3.
    • Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
    • Step 2: SageMaker retrains the model using the data from S3.
    • Step 3: The pipeline deploys the model to a SageMaker Endpoint.
    • Distractors: an S3 Lifecycle rule manages object transitions and expiration rather than invoking pipelines, and the SageMaker Model Registry catalogs model versions rather than serving predictions.
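Step 1 is typically wired up with an EventBridge rule whose target is the CodePipeline pipeline. A minimal sketch of the event pattern that matches new uploads, assuming a placeholder bucket name:

```python
# Sketch: EventBridge event pattern that starts the pipeline when a new object
# lands in the training-data bucket. The bucket name is a placeholder.
import json

def s3_upload_event_pattern(bucket: str) -> str:
    """Event pattern for an EventBridge rule targeting the CodePipeline pipeline."""
    pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket]}},
    }
    return json.dumps(pattern)

pattern = json.loads(s3_upload_event_pattern("training-data-bucket"))
```

This also shows why the Lifecycle-rule option is wrong: Lifecycle rules have no notion of invoking a target, whereas S3 "Object Created" events are exactly what EventBridge rules match on.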

    Generative AI Terms

    • Token: Text representation of basic units of data processed by LLMs (Large Language Models).
    • Embedding: High-dimensional vectors containing the semantic meaning of text.
    • Retrieval Augmented Generation (RAG): Enrichment of information from additional data sources to improve a generated response.
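A toy illustration of all three terms without any LLM: whitespace "tokens" (real tokenizers use subwords), tiny hand-made "embedding" vectors, and a nearest-neighbor lookup standing in for RAG retrieval. The words and vector values are invented for the example.

```python
# Toy illustration of token, embedding, and RAG retrieval (no model involved).
import math

def tokenize(text: str) -> list[str]:
    """Tokens: the basic units of text an LLM processes (simplified to words here)."""
    return text.lower().split()

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Embeddings: vectors that encode semantic meaning; these toy 3-d vectors are
# hand-picked so that "cat" and "kitten" point in similar directions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "invoice": [0.0, 0.1, 0.9],
}

def retrieve(query: str) -> str:
    """RAG-style retrieval: fetch the most similar stored item to enrich a prompt."""
    q = embeddings[query]
    return max((k for k in embeddings if k != query), key=lambda k: cosine(q, embeddings[k]))

tokens = tokenize("Cats are great")
nearest = retrieve("kitten")  # semantically closest stored item
```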

    Feature Engineering Techniques for Home Price Prediction

    • One-Hot Encoding: encodes the categorical City (name) feature as binary indicator columns.
    • Feature Splitting: splits the combined Type_year feature into separate home-type and build-year features.
    • Standardized Distribution: rescales the numerical building-size feature to zero mean and unit variance.
    • Logarithmic Transformation: compresses heavily skewed numerical features; it is not needed for these three features, so it is the distractor.
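The three selected techniques can be sketched on a single home record. The city vocabulary, the `Type_year` value, and the size figures are invented for illustration:

```python
# Toy sketch of the three selected feature engineering techniques.
import statistics

def one_hot(city: str, vocab: list[str]) -> list[int]:
    """One-hot encoding: City (name) becomes one binary column per known city."""
    return [1 if city == v else 0 for v in vocab]

def split_type_year(type_year: str) -> tuple[str, int]:
    """Feature splitting: Type_year splits into a home type and a build year."""
    home_type, year = type_year.rsplit("_", 1)
    return home_type, int(year)

def standardize(values: list[float]) -> list[float]:
    """Standardized distribution: rescale sizes to zero mean and unit variance."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

city_vec = one_hot("Austin", ["Austin", "Boston", "Chicago"])   # -> [1, 0, 0]
home_type, year = split_type_year("condo_1998")                 # -> ("condo", 1998)
scaled = standardize([1200.0, 1500.0, 1800.0])                  # middle value -> 0.0
```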


    Description

    This quiz covers essential steps in preparing data for machine learning models, including data cleaning, feature engineering, and using Amazon SageMaker Feature Store. You'll learn how to manage data effectively and ensure high quality for your machine learning projects. Test your understanding of these crucial processes.
