Data Engineering Overview

Questions and Answers

Which service is primarily used for online (real-time) inference in Amazon SageMaker?

Step Functions
SageMaker Hosting (correct)
Glue
SageMaker Training

Which of the following services is NOT used for feature authoring in an Amazon SageMaker workflow?

SageMaker Training (correct)
DataBrew
Glue
SageMaker Data Wrangler

Which technology is used to ingest streaming data in an Amazon SageMaker-based ML architecture?

SageMaker Processing
Kinesis (correct)
Athena
Apache Airflow

Which service would you use for batch scoring in Amazon SageMaker?

SageMaker Batch Transform (correct)

Which of these services is primarily responsible for feature processing?

EMR (correct)

Study Notes

Feature Authoring

SageMaker Data Wrangler: A tool for data preparation and feature engineering, enabling users to visualize, transform, and clean data for machine learning.
Glue: A fully managed ETL (extract, transform, load) service that prepares data for analytics and machine learning.
DataBrew: An interactive data preparation tool that allows users to clean and normalize data without writing code.
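Data Wrangler and DataBrew apply transforms like these through a visual interface. The sketch below is plain Python, not either tool's API; it only illustrates two typical authoring steps (median imputation plus min-max scaling, and one-hot encoding) on hypothetical `age` and `plan` columns.

```python
from statistics import median


def impute_and_scale(rows, column):
    """Fill missing values in `column` with the median, then min-max scale to [0, 1]."""
    present = [r[column] for r in rows if r.get(column) is not None]
    fill = median(present)
    filled = [r[column] if r.get(column) is not None else fill for r in rows]
    lo, hi = min(filled), max(filled)
    span = hi - lo
    out = []
    for r, v in zip(rows, filled):
        new = dict(r)  # copy so the input rows are left unchanged
        # A constant column would divide by zero, so map it to 0.0 instead
        new[column] = (v - lo) / span if span else 0.0
        out.append(new)
    return out


def one_hot(rows, column):
    """Replace a categorical `column` with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1 if r[column] == c else 0
        out.append(new)
    return out
```

For rows with ages 20, missing, and 40, the missing value is filled with the median 30 and the scaled ages become 0.0, 0.5, and 1.0.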

Feature Processing

EMR: Amazon EMR (formerly Amazon Elastic MapReduce), a managed cluster platform for running big data frameworks such as Apache Spark and Apache Hadoop over large volumes of data; often used for feature processing workflows.
Glue: Also serves as a serverless processing engine (typically Apache Spark) for ETL jobs; Glue workflows and triggers can chain several jobs into larger data pipelines.
SageMaker Processing: Runs data processing workloads, such as feature engineering, pre-processing, post-processing, and model evaluation, in containers on managed infrastructure.
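A SageMaker Processing job runs a script inside a container, with S3 inputs copied to local paths and local outputs uploaded back to S3. The sketch below assumes the commonly used `/opt/ml/processing/input` and `/opt/ml/processing/output` paths (the real paths are whatever the job configuration specifies) and hypothetical CSV columns `customer_id` and `amount`. It aggregates transactions into per-customer features.

```python
import csv
import os
from collections import defaultdict

# Conventional local paths inside a Processing container (assumption: the job's
# ProcessingInputs/ProcessingOutputs map S3 locations to these directories).
DEFAULT_INPUT_DIR = "/opt/ml/processing/input"
DEFAULT_OUTPUT_DIR = "/opt/ml/processing/output"


def build_customer_features(input_dir=DEFAULT_INPUT_DIR, output_dir=DEFAULT_OUTPUT_DIR):
    """Aggregate transaction CSVs into one feature row per customer."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".csv"):
            continue  # ignore non-CSV files copied from S3
        with open(os.path.join(input_dir, name), newline="") as f:
            for row in csv.DictReader(f):
                cid = row["customer_id"]
                totals[cid] += float(row["amount"])
                counts[cid] += 1

    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "customer_features.csv")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "txn_count", "total_amount", "avg_amount"])
        for cid in sorted(totals):
            writer.writerow([cid, counts[cid], totals[cid], totals[cid] / counts[cid]])
    return out_path
```

In a real job the container's entry point would call `build_customer_features()` with the default paths; taking the directories as parameters keeps the logic testable locally.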

Streaming

Kinesis: Amazon Kinesis, a family of services for collecting, processing, and analyzing streaming data at scale, letting applications consume data streams as they arrive.
Kafka: Apache Kafka, a distributed event streaming platform for building real-time data pipelines and streaming applications.
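A minimal producer sketch for Kinesis Data Streams, assuming `kinesis_client` is a boto3 Kinesis client (`boto3.client("kinesis")`) and a hypothetical stream named `clickstream-events`. `put_record` takes the stream name, the record bytes, and a partition key; records that share a partition key go to the same shard.

```python
import json


def send_event(kinesis_client, stream_name, event):
    """Publish one JSON event to a Kinesis data stream.

    Using the customer id as the partition key sends all of one customer's
    events to the same shard, where Kinesis keeps them in order.
    """
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event["customer_id"]),
    )
```

Usage might look like `send_event(boto3.client("kinesis"), "clickstream-events", {"customer_id": 42, "page": "/pricing"})`. Passing the client in lets the function be tested with a stub.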

Feature Discovery

SageMaker Studio: An integrated development environment (IDE) for machine learning. Its Feature Store view lets data scientists search, browse, and inspect existing feature groups, which supports feature discovery and reuse across teams.
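Studio is a UI, but the same catalog can be queried programmatically. This sketch assumes `sm_client` is a boto3 SageMaker client (`boto3.client("sagemaker")`). It pages through `list_feature_groups` with a name filter and reads feature names and types from `describe_feature_group`. The name fragment `customers` in the usage note is hypothetical.

```python
def find_feature_groups(sm_client, name_fragment):
    """Return the names of all feature groups whose name contains `name_fragment`."""
    names = []
    kwargs = {"NameContains": name_fragment}
    while True:
        resp = sm_client.list_feature_groups(**kwargs)
        names.extend(s["FeatureGroupName"] for s in resp["FeatureGroupSummaries"])
        token = resp.get("NextToken")
        if not token:  # no more pages
            return names
        kwargs["NextToken"] = token


def describe_features(sm_client, feature_group_name):
    """Map each feature name in a feature group to its declared type."""
    resp = sm_client.describe_feature_group(FeatureGroupName=feature_group_name)
    return {d["FeatureName"]: d["FeatureType"] for d in resp["FeatureDefinitions"]}
```

For example, `find_feature_groups(boto3.client("sagemaker"), "customers")` would list candidate groups before a data scientist inspects one with `describe_features`.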

Feature Pipelines

Step Functions: A serverless orchestration service for building workflows that coordinate multiple AWS services, including SageMaker jobs.
SageMaker Pipelines: A SageMaker capability for defining, automating, and managing end-to-end machine learning workflows, from data processing through training, evaluation, and deployment.
Apache Airflow: An open-source platform for authoring, scheduling, and monitoring complex workflows defined as code.
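A Step Functions workflow is defined in Amazon States Language (JSON). The sketch below builds a two-step skeleton, a SageMaker Processing job followed by a training job, using the `.sync` service integrations so each step waits for its job to finish. The state names and input keys are hypothetical. The `Parameters` blocks are deliberately incomplete: a real request also needs fields such as `RoleArn` and the container or algorithm configuration.

```python
import json


def build_feature_pipeline_definition():
    """Return a minimal ASL skeleton: process features, then train a model."""
    return {
        "Comment": "Hypothetical feature pipeline: processing job, then training job",
        "StartAt": "ProcessFeatures",
        "States": {
            "ProcessFeatures": {
                "Type": "Task",
                # .sync: Step Functions waits for the processing job to complete
                "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
                # "Key.$" reads the value from the execution input via JSONPath;
                # other required CreateProcessingJob fields are omitted here
                "Parameters": {"ProcessingJobName.$": "$.processing_job_name"},
                # Store the result beside the input so later states can still read it
                "ResultPath": "$.processing_result",
                "Next": "TrainModel",
            },
            "TrainModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {"TrainingJobName.$": "$.training_job_name"},
                "ResultPath": "$.training_result",
                "End": True,
            },
        },
    }


definition_json = json.dumps(build_feature_pipeline_definition(), indent=2)
```

The resulting JSON string is what would be passed as the `definition` when creating the state machine. Setting `ResultPath` instead of letting the output replace the state input is what lets `TrainModel` still read `$.training_job_name`.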

Amazon SageMaker Feature Store

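A fully managed repository for storing, sharing, and retrieving the features used by machine learning models. It provides an online store for low-latency lookups at inference time and an offline store (in Amazon S3) for building training sets and batch scoring, so the same feature definitions can be reused across models.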
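A minimal read/write sketch against the Feature Store online store, assuming `fs_runtime` is a boto3 client for `sagemaker-featurestore-runtime` and a hypothetical feature group `customers` whose record identifier is `customer_id` and whose event-time feature is `event_time`. Feature Store passes every value as a string (`ValueAsString`).

```python
def put_features(fs_runtime, feature_group_name, features):
    """Write one record; `features` must include the record identifier and event time."""
    record = [{"FeatureName": k, "ValueAsString": str(v)} for k, v in features.items()]
    fs_runtime.put_record(FeatureGroupName=feature_group_name, Record=record)


def get_features(fs_runtime, feature_group_name, record_id):
    """Read the latest record for one identifier from the online store."""
    resp = fs_runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(record_id),
    )
    # Guard against a response without a "Record" key (e.g. unknown identifier)
    return {f["FeatureName"]: f["ValueAsString"] for f in resp.get("Record", [])}
```

Values come back as strings, so a model-serving path would convert them back to numbers before building the model input.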

Training, Batch Scoring

SageMaker Training: Provides built-in algorithms and frameworks to train machine learning models at scale with managed infrastructure.
SageMaker Batch Transform: Runs inference over an entire dataset (for example, files in Amazon S3) in bulk, writing predictions back to S3 without keeping a persistent endpoint running.
Athena: An interactive query service that allows running SQL queries on data stored in Amazon S3, often used for data exploration and analysis.
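A batch scoring run is started with a CreateTransformJob request. The sketch below only builds the request dictionary; it assumes an already registered SageMaker model, CSV input with one record per line, and hypothetical job, model, and bucket names. The default instance type is an illustrative choice, not a recommendation.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for a SageMaker CreateTransformJob call."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,  # an existing SageMaker model
        "TransformInput": {
            "DataSource": {
                # Score every object under this S3 prefix
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            # Split input files on newlines so each CSV row is one record
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_uri,
            # Write one prediction per line in the output files
            "AssembleWith": "Line",
        },
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
    }
```

With boto3 the request would be submitted as `boto3.client("sagemaker").create_transform_job(**request)`. Building it separately makes the configuration easy to review or to reuse inside a pipeline step.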

Online Inference

SageMaker Hosting: Deploys models to fully managed endpoints that return real-time predictions over HTTPS requests.
Lambda: A serverless compute service that runs code in response to events; often used to run lightweight inference logic, or to call a SageMaker endpoint, without managing servers.
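The two services are often combined: a Lambda function receives a request and forwards it to a SageMaker endpoint. This sketch assumes a hypothetical endpoint `churn-model-endpoint` whose model accepts one CSV row and returns a plain-text score. The optional `runtime_client` argument exists only so the handler can be tested with a stub; Lambda itself calls `lambda_handler(event, context)`.

```python
import json

ENDPOINT_NAME = "churn-model-endpoint"  # hypothetical endpoint name


def lambda_handler(event, context, runtime_client=None):
    """Forward a feature vector to a SageMaker endpoint and return its score."""
    if runtime_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        runtime_client = boto3.client("sagemaker-runtime")

    # Assumption: the model container expects one CSV row of features
    payload = ",".join(str(x) for x in event["features"])
    resp = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    # The response body is a stream; this assumes it holds a single number
    score = float(resp["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

Keeping the model behind a SageMaker endpoint and the request handling in Lambda lets each side scale and be updated independently.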
