Questions and Answers
Which service is primarily involved in online inference within Amazon SageMaker?
Which of the following services is NOT associated with feature authoring in Amazon SageMaker?
What technology is used for streaming data in Amazon SageMaker?
Which service would you use for batch scoring in Amazon SageMaker?
Which of these services is primarily responsible for feature processing?
Study Notes
Feature Authoring
- SageMaker Data Wrangler: A tool for data preparation and feature engineering, enabling users to visualize, transform, and clean data for machine learning.
- AWS Glue: A fully managed, serverless ETL (extract, transform, load) service that prepares data for analytics and machine learning.
- DataBrew: AWS Glue DataBrew, a visual data preparation tool that lets users clean and normalize data without writing code.
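Data Wrangler and DataBrew apply transforms like the ones below through a visual interface. This sketch uses plain Python and only the standard library to show what three common feature-authoring transforms do: mean imputation, min-max scaling, and one-hot encoding. It is not code generated by either tool. The column values are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Map each distinct category to its own 0/1 indicator column."""
    categories = sorted(set(values))
    return {f"is_{c}": [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical raw data: one missing age and a categorical plan column.
ages = [20, None, 40]
plans = ["basic", "pro", "basic"]

ages_filled = impute_mean(ages)            # [20, 30, 40]
ages_scaled = min_max_scale(ages_filled)   # [0.0, 0.5, 1.0]
plan_columns = one_hot(plans)              # {'is_basic': [1, 0, 1], 'is_pro': [0, 1, 0]}
```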
Feature Processing
- EMR: Amazon EMR (formerly Amazon Elastic MapReduce), a managed big data platform that runs frameworks such as Apache Spark and Hadoop to process large datasets quickly. It is often used for data processing workflows.
- AWS Glue: Also serves as a serverless, Apache Spark-based engine for ETL jobs. Its workflows and triggers can chain multiple jobs together.
- SageMaker Processing: Runs data processing workloads on managed infrastructure, such as feature engineering, data pre-processing and post-processing, and model evaluation.
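A SageMaker Processing job runs a script in a container. Inputs are copied from Amazon S3 into local paths under /opt/ml/processing/, and files written to the configured output path are uploaded back to S3 when the job ends. The sketch below is a minimal example of such a script. It assumes CSV inputs and the commonly used paths /opt/ml/processing/input and /opt/ml/processing/output, which in practice are whatever the job's input and output configuration specifies. The column names and the derived feature are hypothetical.

```python
import csv
import os
from pathlib import Path

# Assumed container paths; the real ones come from the job's input/output configuration.
INPUT_DIR = "/opt/ml/processing/input"
OUTPUT_DIR = "/opt/ml/processing/output"

def process(input_dir, output_dir):
    """Read every CSV in input_dir, add a derived feature, write one CSV out."""
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    for row in rows:
        # Hypothetical derived feature: average order value per customer.
        orders = int(row["orders"])
        value = float(row["revenue"]) / orders if orders else 0.0
        row["avg_order_value"] = f"{value:.2f}"
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    if rows:
        with open(Path(output_dir) / "features.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)

# The input directory exists only inside the processing container; elsewhere this is a no-op.
if __name__ == "__main__" and os.path.isdir(INPUT_DIR):
    process(INPUT_DIR, OUTPUT_DIR)
```

Taking the paths as parameters keeps the transformation testable on a laptop. The container entry point then only supplies the container paths.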
Streaming
- Kinesis: Amazon Kinesis, a platform for collecting and processing streaming data at scale, enabling applications to consume and analyze data streams in real time.
- Kafka: Apache Kafka, a distributed event streaming platform for building real-time data pipelines and streaming applications. AWS offers it as a managed service, Amazon MSK.
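The sketch below writes one event to an Amazon Kinesis data stream. It uses the boto3 Kinesis client's put_record call, and the client is passed in so the function can be tested without AWS access. The choice of customer_id as the partition key is an illustrative assumption. Records that share a partition key go to the same shard, and Kinesis preserves ordering within a shard, so one customer's events stay in order.

```python
import json

def send_event(kinesis_client, stream_name, event, key_field="customer_id"):
    """Serialize an event as JSON and write it to a Kinesis data stream."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        PartitionKey=str(event[key_field]),
    )
```

Usage might look like send_event(boto3.client("kinesis"), "clickstream-events", {"customer_id": "c1", "page": "/pricing"}), where the stream name is hypothetical. A Kafka producer follows the same pattern: it uses a message key so that related events land in the same partition.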
Feature Discovery
- SageMaker Studio: An integrated development environment (IDE) for machine learning. Its Feature Store interface lets data scientists search and browse existing feature groups, so features can be discovered and reused rather than rebuilt.
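Studio's feature browsing has a programmatic counterpart in the SageMaker ListFeatureGroups API. The sketch below is a minimal example under that assumption. It takes a boto3 SageMaker client, filters by a name substring, and follows NextToken until every page has been read. The search term in the usage note is hypothetical.

```python
def find_feature_groups(sagemaker_client, name_contains):
    """Return names of feature groups whose name contains name_contains."""
    names, token = [], None
    while True:
        kwargs = {"NameContains": name_contains}
        if token:
            kwargs["NextToken"] = token
        page = sagemaker_client.list_feature_groups(**kwargs)
        names.extend(s["FeatureGroupName"] for s in page["FeatureGroupSummaries"])
        # A missing NextToken means this was the last page.
        token = page.get("NextToken")
        if not token:
            return names
```

For example, find_feature_groups(boto3.client("sagemaker"), "customer") would list every feature group with "customer" in its name.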
Feature Pipelines
- Step Functions: A serverless orchestration service that allows building complex workflows by coordinating multiple AWS services.
- SageMaker Pipelines: A SageMaker capability for building and automating end-to-end machine learning workflows, from data processing through training, evaluation, and model deployment.
- Apache Airflow: An open-source platform for orchestrating complex computational workflows, allowing for scheduling and monitoring of tasks.
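Step Functions, SageMaker Pipelines, and Airflow all run a dependency graph of steps. The toy sketch below shows that core idea using graphlib from the Python standard library (Python 3.9+). It is not the API of any of these services, and it leaves out the retries, parallelism, scheduling, and state tracking that real orchestrators provide. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

def run_pipeline(steps, dependencies):
    """Run each step only after all of its upstream steps have run.

    steps: step name -> zero-argument callable
    dependencies: step name -> set of step names that must finish first
    Returns the order in which the steps ran.
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        steps[name]()
        order.append(name)
    return order
```

For example, given dependencies {"transform": {"extract"}, "ingest": {"transform"}, "train": {"ingest"}}, the steps run in the order extract, transform, ingest, train. This mirrors a feature pipeline that extracts raw data, computes features, ingests them into a feature store, and then trains a model.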
Amazon SageMaker Feature Store
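- A fully managed repository for storing, sharing, and retrieving the features used by machine learning models. It provides an online store for low-latency lookups at inference time and an offline store in Amazon S3 for training and batch scoring, so the same feature definitions can be reused across models.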
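At inference time, the online store is read one record at a time. The sketch below is a minimal example under that assumption. It calls get_record on a boto3 "sagemaker-featurestore-runtime" client and turns the returned list of feature entries into a plain dictionary. The feature group name and record identifier in the test are hypothetical. Values come back as strings, so the caller converts types as needed.

```python
def get_online_features(featurestore_runtime, feature_group, record_id, feature_names=None):
    """Fetch the latest record for one entity from the online store
    and return it as {feature_name: value_as_string}."""
    kwargs = {
        "FeatureGroupName": feature_group,
        "RecordIdentifierValueAsString": str(record_id),
    }
    if feature_names:
        # Optionally restrict the response to specific features.
        kwargs["FeatureNames"] = list(feature_names)
    response = featurestore_runtime.get_record(**kwargs)
    # Treat a response without a "Record" entry as "no features found".
    return {f["FeatureName"]: f["ValueAsString"] for f in response.get("Record", [])}
```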
Training, Batch Scoring
- SageMaker Training: Provides built-in algorithms and frameworks to train machine learning models at scale with managed infrastructure.
- SageMaker Batch Transform: Runs inference over an entire dataset (for example, files in Amazon S3) in bulk, without keeping a persistent endpoint running.
- Athena: An interactive query service for running SQL queries on data stored in Amazon S3. In this workflow, it is typically used to query the Feature Store offline store to assemble training and batch-scoring datasets.
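The two helpers below sketch the arguments for the calls these services expose through boto3: SageMaker's create_transform_job for a Batch Transform job, and Athena's start_query_execution for querying a feature group's offline store. The offline store is typically registered as a table in the AWS Glue Data Catalog, which is what Athena queries. All job, model, bucket, database, and table names are hypothetical, and CSV input split by line is an assumption about the data format.

```python
def transform_job_request(job_name, model_name, input_s3, output_s3,
                          instance_type="ml.m5.xlarge", instance_count=1):
    """Arguments for CreateTransformJob: score every line of the CSV files
    under input_s3 and write predictions under output_s3."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",
            "SplitType": "Line",  # send records to the model line by line
        },
        "TransformOutput": {"S3OutputPath": output_s3, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
    }

def offline_store_query(database, table, output_s3):
    """Arguments for Athena StartQueryExecution against an offline-store table."""
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT 100',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }
```

These would be passed as keyword arguments, for example boto3.client("sagemaker").create_transform_job(**transform_job_request(...)) and boto3.client("athena").start_query_execution(**offline_store_query(...)).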
Online Inference
- SageMaker Hosting: Offers a fully managed environment for deploying models, enabling real-time predictions via APIs.
- Lambda: A serverless compute service that runs code in response to events. It is often used to run lightweight inference logic, such as pre-processing a request and calling a SageMaker endpoint, without managing servers.
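A common pattern pairs the two services: a Lambda function receives a request and forwards it to a SageMaker real-time endpoint. The sketch below assumes three things: an API Gateway-style event whose JSON body holds a "features" list, a model that accepts CSV, and the invoke_endpoint call of a boto3 "sagemaker-runtime" client. The payload shape and endpoint name are hypothetical. A factory injects the client so the handler can be tested without AWS.

```python
import json

def make_handler(runtime_client, endpoint_name):
    """Return a Lambda-style handler that sends features to a SageMaker endpoint."""
    def handler(event, context):
        # Hypothetical request body: {"features": [1.0, 2.0, 3.0]}
        features = json.loads(event["body"])["features"]
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=",".join(str(x) for x in features).encode("utf-8"),
        )
        # The endpoint's reply arrives as a stream of bytes.
        prediction = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
    return handler
```

In a deployed function, the module would typically create the handler once at import time, for example handler = make_handler(boto3.client("sagemaker-runtime"), os.environ["ENDPOINT_NAME"]). The client is then reused across invocations. The ENDPOINT_NAME environment variable is an assumption.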
Description
Explore the key AWS tools and services for data preparation, processing, and streaming in machine learning data pipelines. This quiz covers tools such as SageMaker Data Wrangler, AWS Glue, and Amazon Kinesis, and what each one is used for.