Data Engineering Overview
5 Questions

Created by
@BrotherlyAshcanSchool

Questions and Answers

Which service is primarily involved in online inference within Amazon SageMaker?

  • Step Functions
  • SageMaker Hosting (correct)
  • Glue
  • SageMaker Training

Which of the following services is NOT associated with feature authoring in Amazon SageMaker?

  • SageMaker Training (correct)
  • DataBrew
  • Glue
  • SageMaker Data Wrangler

What technology is used for streaming data in Amazon SageMaker?

  • SageMaker Processing
  • Kinesis (correct)
  • Athena
  • Apache Airflow

Which service would you use for batch scoring in Amazon SageMaker?

  • SageMaker Batch Transform (correct)

Which of these services is primarily responsible for feature processing?

  • EMR (correct)

    Study Notes

    Feature Authoring

    • SageMaker Data Wrangler: A tool for data preparation and feature engineering, enabling users to visualize, transform, and clean data for machine learning.
    • Glue: A fully managed ETL (extract, transform, load) service that prepares data for analytics and machine learning.
    • DataBrew: An interactive data preparation tool that allows users to clean and normalize data without writing code.

    Feature Processing

    • EMR: Amazon EMR (formerly Amazon Elastic MapReduce), a managed big data platform that runs frameworks such as Apache Spark and Hive to process large datasets quickly; often used for feature processing workflows.
    • Glue: Also runs the ETL jobs themselves on a serverless engine and can chain them into multi-step data workflows.
    • SageMaker Processing: Runs data processing workloads on managed infrastructure, including pre-processing, feature engineering, and model evaluation.

    Streaming

    • Kinesis: A managed AWS service for collecting and processing streaming data at scale, so applications can consume and analyze records as they arrive.
    • Kafka: Apache Kafka, an open-source distributed event streaming platform for building real-time data pipelines and streaming applications.
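
As a rough illustration of how an application might publish an event to a Kinesis data stream, the sketch below builds the arguments for the boto3 `put_record` call. The stream name `clickstream-events`, the event fields, and the helper `build_kinesis_record` are hypothetical. The actual send is left as a comment because it needs AWS credentials.

```python
import json


def build_kinesis_record(stream_name: str, event: dict, key_field: str) -> dict:
    """Build keyword arguments for the Kinesis Data Streams put_record call."""
    return {
        "StreamName": stream_name,
        # Kinesis carries an opaque byte blob, so the event is serialized to JSON.
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


# Hypothetical click event and stream name, for illustration only.
record = build_kinesis_record(
    "clickstream-events",
    {"user_id": 42, "action": "add_to_cart"},
    key_field="user_id",
)

# With credentials configured, the record could be sent like this:
#   import boto3
#   boto3.client("kinesis").put_record(**record)
```

Keeping the payload construction separate from the network call makes the record format easy to check without touching a live stream.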

    Feature Discovery

    • SageMaker Studio: An integrated development environment (IDE) for machine learning that provides tools for feature discovery, allowing data scientists to explore and manage datasets more effectively.

    Feature Pipelines

    • Step Functions: A serverless orchestration service that allows building complex workflows by coordinating multiple AWS services.
    • SageMaker Pipelines: A workflow service built into SageMaker that defines and automates the steps of an end-to-end machine learning workflow, from data preparation through training and deployment.
    • Apache Airflow: An open-source platform for orchestrating complex computational workflows, allowing for scheduling and monitoring of tasks.
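
To make the idea of orchestration more concrete, here is a minimal sketch of a two-step Step Functions workflow written as an Amazon States Language definition in a Python dict. The Glue job name `prepare-features`, the Lambda function name `ingest-features`, and the helper `state_order` are assumptions for illustration; only the definition is built here, nothing is deployed.

```python
import json

# Hypothetical resource names; substitute ones from your own account.
GLUE_JOB_NAME = "prepare-features"
LAMBDA_FUNCTION_NAME = "ingest-features"

definition = {
    "Comment": "Sketch of a feature pipeline: run ETL, then ingest the results",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # The .sync suffix makes the workflow wait for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": GLUE_JOB_NAME},
            "Next": "IngestFeatures",
        },
        "IngestFeatures": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": LAMBDA_FUNCTION_NAME, "Payload.$": "$"},
            "End": True,
        },
    },
}


def state_order(defn: dict) -> list:
    """Follow Next links from StartAt and return state names in run order.

    Only handles linear workflows like the one above (no Choice or Parallel).
    """
    order, name = [], defn["StartAt"]
    while True:
        order.append(name)
        state = defn["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


# The JSON string is what would be passed when creating the state machine.
definition_json = json.dumps(definition, indent=2)
```

A similar sequence of steps could also be expressed as a SageMaker Pipelines definition or an Airflow DAG; the choice mostly depends on which services the workflow needs to coordinate.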

    Amazon SageMaker Feature Store

    • A fully managed repository for storing, retrieving, and sharing machine learning features across models, with an online store for low-latency lookups and an offline store for training and batch scoring.
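
The Feature Store runtime API takes a record as a list of feature name and string value pairs. The sketch below shows one way to build that shape from a plain dict. The feature group name `customers`, the feature names, and the helper `to_feature_record` are hypothetical, and the boto3 calls are left as comments because they need AWS credentials and an existing feature group.

```python
def to_feature_record(features: dict) -> list:
    """Convert a plain dict into the list-of-pairs shape used by PutRecord."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]


# Hypothetical customer features, for illustration only.
record = to_feature_record(
    {
        "customer_id": "C-1001",
        "avg_basket_value": 37.5,
        "event_time": "2024-01-15T12:00:00Z",
    }
)

# With credentials and a feature group in place, writing and reading could look like:
#   import boto3
#   runtime = boto3.client("sagemaker-featurestore-runtime")
#   runtime.put_record(FeatureGroupName="customers", Record=record)
#   runtime.get_record(
#       FeatureGroupName="customers",
#       RecordIdentifierValueAsString="C-1001",
#   )
```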

    Training, Batch Scoring

    • SageMaker Training: Provides built-in algorithms and frameworks to train machine learning models at scale with managed infrastructure.
    • SageMaker Batch Transform: Runs inference over an entire dataset in bulk, without a persistent endpoint, and writes the predictions to Amazon S3.
    • Athena: An interactive query service that allows running SQL queries on data stored in Amazon S3, often used for data exploration and analysis.
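
As a sketch of what a batch scoring job involves, the function below assembles a request for the SageMaker `create_transform_job` API. The job name, model name, S3 locations, instance type, and the helper `build_transform_job` are all placeholder assumptions, and CSV input is assumed. The call itself is commented out because it needs credentials and a registered model.

```python
def build_transform_job(job_name: str, model_name: str,
                        input_s3: str, output_s3: str) -> dict:
    """Assemble keyword arguments for a SageMaker batch transform job."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}
            },
            # Assumes CSV input, scored one line at a time.
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }


# Hypothetical names and buckets, for illustration only.
request = build_transform_job(
    "churn-scoring-2024-01-15",
    "churn-model",
    "s3://example-bucket/scoring/input/",
    "s3://example-bucket/scoring/output/",
)

# With credentials configured, the job could be started like this:
#   import boto3
#   boto3.client("sagemaker").create_transform_job(**request)
```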

    Online Inference

    • SageMaker Hosting: Deploys models to fully managed endpoints that return real-time predictions through an API.
    • Lambda: A serverless compute service that runs code in response to events, often used for executing inference logic without needing to manage servers.
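
One common way to combine the two services is a Lambda function that forwards a request to a hosted endpoint. The sketch below assumes a CSV-accepting model behind a hypothetical endpoint named `churn-model-endpoint`; the `FakeRuntime` class and its canned response stand in for `boto3.client("sagemaker-runtime")` so the example runs without AWS access.

```python
import io
import json


def make_handler(runtime_client, endpoint_name: str):
    """Return a Lambda-style handler that forwards features to an endpoint."""

    def handler(event, context=None):
        # Serialize the feature values as a single CSV row.
        payload = ",".join(str(v) for v in event["features"])
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=payload,
        )
        # The response body is a stream, so read and decode it.
        prediction = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}

    return handler


class FakeRuntime:
    """Offline stand-in for the SageMaker runtime client (canned response)."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.last_body = Body
        return {"Body": io.BytesIO(b"0.87")}


fake = FakeRuntime()
handler = make_handler(fake, "churn-model-endpoint")
result = handler({"features": [3, 0.5, 12]})
```

In a deployed function, the fake client would be replaced by the real runtime client, created once outside the handler so it is reused across invocations.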

    Description

    Explore the key tools and services for data preparation, processing, and streaming in data engineering. This quiz covers tools like SageMaker Data Wrangler, Glue, and Kinesis to help you understand their functionalities and applications effectively.
