Questions and Answers
What functionality does SageMaker Feature Store provide?
Which statement accurately describes AWS CloudTrail?
What is a limitation of Amazon Macie?
In AWS Step Functions, what does a Pass state do?
What is a primary benefit of using Apache Parquet files over CSV?
What is managed by Amazon Managed Workflows for Apache Airflow (MWAA)?
Which operation does SageMaker Processing NOT perform?
What distinguishes Amazon CloudWatch from AWS CloudTrail?
Which aspect of SageMaker ML Lineage Tracking is crucial?
What is a feature of Amazon Managed Service for Apache Flink?
What is the primary function of the Fail state in a state machine?
Which of the following best describes the role of AWS CloudHSM?
What is the primary purpose of the Wait state in a state machine?
What is the benefit of using Local Secondary Indexes (LSIs) in DynamoDB?
Which AWS service is primarily used to automate the movement of large datasets between on-premises storage and AWS?
In terms of Kafka terminology, what role do producers play?
What does Amazon AppFlow primarily enable?
How does AWS Graviton improve performance for cloud services?
What does the term 'MSK Serverless' refer to in the context of Kafka?
What is the function of Cipher-based Message Authentication Codes (CMACs) in CloudHSM?
Study Notes
SageMaker Data Tools
- SageMaker Data Wrangler facilitates importing, preparing, transforming, featurizing, and analyzing data, including running Exploratory Data Analysis (EDA).
- Lacks built-in functionality to ensure data accuracy, completeness, and trustworthiness or to identify and mask Personally Identifiable Information (PII).
SageMaker Feature Store
- Provides a storage and data management layer for machine learning (ML).
- Enables creation, storage, and sharing of features for ML models.
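As a rough illustration of how a feature group is defined, the sketch below only builds the keyword arguments for the SageMaker `CreateFeatureGroup` API (boto3 `create_feature_group`) and does not call AWS. The feature group name, the features, the S3 URI, and the IAM role ARN are hypothetical placeholders.

```python
import json

# Hypothetical names: replace with your own feature group, bucket, and IAM role.
FEATURE_GROUP_NAME = "customers-feature-group"
OFFLINE_S3_URI = "s3://example-bucket/feature-store/"
ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleFeatureStoreRole"


def build_feature_group_request() -> dict:
    """Build keyword arguments for SageMaker's CreateFeatureGroup API."""
    return {
        "FeatureGroupName": FEATURE_GROUP_NAME,
        # Each record is identified by this feature...
        "RecordIdentifierFeatureName": "customer_id",
        # ...and versioned by this timestamp feature.
        "EventTimeFeatureName": "event_time",
        "FeatureDefinitions": [
            {"FeatureName": "customer_id", "FeatureType": "String"},
            {"FeatureName": "event_time", "FeatureType": "Fractional"},
            {"FeatureName": "total_orders", "FeatureType": "Integral"},
            {"FeatureName": "avg_basket_value", "FeatureType": "Fractional"},
        ],
        # Online store: low-latency lookups at inference time.
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        # Offline store: historical feature values in S3 for training and batch jobs.
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": OFFLINE_S3_URI}},
        "RoleArn": ROLE_ARN,
    }


if __name__ == "__main__":
    request = build_feature_group_request()
    print(json.dumps(request, indent=2))
    # With credentials configured, the request could be sent with:
    # import boto3
    # boto3.client("sagemaker").create_feature_group(**request)
```

The record identifier and event time features let the online store serve the latest values per record, while the offline store keeps the history that training jobs can reuse.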
SageMaker ML Lineage Tracking
- Creates and stores information regarding steps in the ML workflow.
- Supports model governance and audit standards by recording how each model was produced (the datasets, jobs, and other artifacts involved).
SageMaker Processing
- Managed service for executing processing workloads, data validation, and model evaluation.
Amazon Macie
- Utilizes machine learning to automatically discover sensitive data and privacy issues within datasets.
- Can discover but not mask PII and is restricted to working on Amazon S3.
- Generates detailed reports on data findings, including source information.
Amazon AppFlow
- Securely transfers data between Software as a Service (SaaS) applications, such as Salesforce, and AWS services such as Amazon S3 and Amazon Redshift.
Amazon Managed Workflows for Apache Airflow (MWAA)
- Managed orchestration service for Apache Airflow, an open-source tool for programmatically authoring, scheduling, and monitoring workflows.
- Used to set up and operate scalable data pipelines in the cloud.
Amazon Aurora DB
- Fully managed relational database service that is compatible with MySQL and PostgreSQL, providing high performance.
Fault Injection Techniques
- Aurora supports fault injection queries that simulate failures, such as an instance crash, a read replica failure, a disk failure, or disk congestion, to test an application's fault tolerance.
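Fault injection in Aurora MySQL is driven by SQL statements run from a client connected to the cluster. The helper below is a sketch that only assembles such statements as strings, following the `ALTER SYSTEM SIMULATE ... PERCENT READ REPLICA FAILURE` and `... DISK FAILURE` syntax as we understand it from the Aurora documentation; check the exact grammar (and the PostgreSQL-compatible equivalents) there before running anything. The function names are hypothetical.

```python
# Sketch: build Aurora MySQL fault injection statements as strings only.
# Syntax is assumed from the Aurora MySQL docs; verify before use.

VALID_UNITS = {"YEAR", "QUARTER", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND"}


def _interval(quantity: int, unit: str) -> str:
    """Render the FOR INTERVAL clause that bounds how long the fault lasts."""
    unit = unit.upper()
    if unit not in VALID_UNITS:
        raise ValueError(f"unsupported interval unit: {unit}")
    if quantity <= 0:
        raise ValueError("interval quantity must be positive")
    return f"FOR INTERVAL {quantity} {unit}"


def _percent(value: int) -> int:
    """Share of requests that should fail, from 0 to 100."""
    if not 0 <= value <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return value


def simulate_replica_failure(percent: int, quantity: int, unit: str, replica=None) -> str:
    """Simulate failed requests to all read replicas, or to one named replica."""
    target = "TO ALL" if replica is None else f'TO "{replica}"'
    return (f"ALTER SYSTEM SIMULATE {_percent(percent)} PERCENT READ REPLICA FAILURE "
            f"{target} {_interval(quantity, unit)};")


def simulate_disk_failure(percent: int, quantity: int, unit: str) -> str:
    """Simulate a disk failure in the cluster's storage volume."""
    return (f"ALTER SYSTEM SIMULATE {_percent(percent)} PERCENT DISK FAILURE "
            f"{_interval(quantity, unit)};")


if __name__ == "__main__":
    print(simulate_replica_failure(100, 10, "second"))
    print(simulate_disk_failure(25, 1, "minute"))
```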
Amazon CloudWatch
- Monitors AWS resources and applications by storing logs and tracking metrics.
- Subscription filters deliver a real-time feed of log events from CloudWatch Logs to other services, such as Kinesis Data Streams (KDS), Kinesis Data Firehose, AWS Lambda, or (through Lambda) Amazon OpenSearch Service.
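When a subscription filter targets a Kinesis data stream, each record's data is gzip-compressed JSON describing a batch of log events. The sketch below shows a consumer-side decoder using only the standard library; the function name and the sample values are made up, and reading records from the stream itself (for example, with boto3 or the Kinesis Client Library) is omitted.

```python
import gzip
import json


def decode_subscription_record(data: bytes) -> list:
    """Decode one CloudWatch Logs subscription record read from a Kinesis stream.

    The payload is gzip-compressed JSON. CONTROL_MESSAGE records are sent by
    CloudWatch Logs to check that the destination is reachable and carry no
    log events, so only DATA_MESSAGE records are unpacked here.
    """
    payload = json.loads(gzip.decompress(data))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [event["message"] for event in payload["logEvents"]]


if __name__ == "__main__":
    # Synthetic payload shaped like a subscription record (values are made up).
    sample = {
        "messageType": "DATA_MESSAGE",
        "owner": "123456789012",
        "logGroup": "/example/app",
        "logStream": "stream-1",
        "subscriptionFilters": ["example-filter"],
        "logEvents": [
            {"id": "1", "timestamp": 1700000000000, "message": "ERROR disk full"},
            {"id": "2", "timestamp": 1700000001000, "message": "INFO retrying"},
        ],
    }
    record = gzip.compress(json.dumps(sample).encode("utf-8"))
    print(decode_subscription_record(record))  # ['ERROR disk full', 'INFO retrying']
```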
AWS CloudTrail
- Records user activity and API calls across AWS services; it is enabled by default on every AWS account.
- The event history is a searchable, downloadable, immutable record of the last 90 days of management events, and viewing it is free of charge.
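Event history can be queried through the `LookupEvents` API (boto3 `lookup_events`). This sketch only builds the request, clamping the time range to the 90-day history window; the event name in the example is an arbitrary choice, and the AWS call itself is left commented out.

```python
from datetime import datetime, timedelta, timezone

# Event history covers roughly the last 90 days of management events.
EVENT_HISTORY_DAYS = 90


def build_lookup_request(event_name: str, days_back: int, now=None) -> dict:
    """Build keyword arguments for CloudTrail's LookupEvents API."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Asking for more than the retention window cannot return older events, so clamp.
    days_back = min(days_back, EVENT_HISTORY_DAYS)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        "StartTime": now - timedelta(days=days_back),
        "EndTime": now,
        "MaxResults": 50,  # API maximum per page
    }


if __name__ == "__main__":
    request = build_lookup_request("DeleteBucket", days_back=365)
    print(request["StartTime"], "->", request["EndTime"])
    # With credentials configured:
    # import boto3
    # events = boto3.client("cloudtrail").lookup_events(**request)["Events"]
```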
CloudTrail Lake
- Managed data lake dedicated to storing user and API activities.
- CloudTrail Insights learns normal API usage patterns and generates Insights events when it detects anomalies, such as unusual API call volume or error rates.
Management vs Data Events
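- Management events cover control-plane (administrative) operations, such as creating or deleting resources.
- Data events cover data-plane operations performed on or within a resource, such as S3 GetObject/PutObject or Lambda Invoke; they are opt-in and incur additional charges.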
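On a trail, the split between the two event types is configured with event selectors (`PutEventSelectors`, boto3 `put_event_selectors`). The sketch keeps management events on and optionally adds S3 object-level data events for one bucket; the trail and bucket names are hypothetical, and nothing is sent to AWS.

```python
# Hypothetical trail and bucket names, for illustration only.
TRAIL_NAME = "example-trail"
BUCKET_NAME = "example-bucket"


def build_event_selectors(log_s3_data_events: bool) -> dict:
    """Build keyword arguments for CloudTrail's PutEventSelectors API."""
    selector = {
        "ReadWriteType": "All",
        # Management (control-plane) events, e.g. CreateBucket or DeleteTrail.
        "IncludeManagementEvents": True,
    }
    if log_s3_data_events:
        # Data (data-plane) events: GetObject/PutObject etc. on objects in one bucket.
        # These are opt-in and billed separately.
        selector["DataResources"] = [
            {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{BUCKET_NAME}/"]}
        ]
    return {"TrailName": TRAIL_NAME, "EventSelectors": [selector]}


if __name__ == "__main__":
    print(build_event_selectors(log_s3_data_events=True))
    # import boto3
    # boto3.client("cloudtrail").put_event_selectors(**build_event_selectors(True))
```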
Amazon Managed Service for Apache Flink
- A managed service for running Apache Flink, an open-source framework for stream and batch processing that supports various programming languages (Java, Scala, Python, SQL).
- Capable of processing streaming and static data for time-series analytics.
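Time-series analytics on streams often groups events into fixed, non-overlapping (tumbling) windows. The plain-Python sketch below illustrates that idea only; it is not the Flink API, and it ignores watermarks, late events, and state management, which Flink handles for you. The sensor readings are made up.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_ms):
    """Sum values per key in fixed, non-overlapping (tumbling) event-time windows.

    events: iterable of (timestamp_ms, key, value) tuples.
    Returns {(window_start_ms, key): total}.
    """
    totals = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - (ts % window_ms)  # align the timestamp to its window
        totals[(window_start, key)] += value
    return dict(totals)


if __name__ == "__main__":
    readings = [
        (1_000, "sensor-a", 2.0),
        (45_000, "sensor-a", 3.0),
        (61_000, "sensor-a", 5.0),
        (30_000, "sensor-b", 1.5),
    ]
    print(tumbling_window_sums(readings, window_ms=60_000))
```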
Apache Parquet vs CSV
- Parquet is a columnar data format, enhancing efficiency and speed of data read operations compared to the row-oriented CSV format.
- Stores its schema with the data, supports predicate pushdown, and uses a binary (rather than text) format.
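The toy model below, which is not the real Parquet encoding, contrasts the two layouts: the CSV path parses every field of every row, while the columnar path reads only the needed column and uses per-row-group min/max statistics to skip groups that cannot match the filter (predicate pushdown). The sample rows and row-group size are made up; real Parquet files would be read with a library such as pyarrow.

```python
import csv
import io

ROWS = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "FR", "amount": 250},
    {"id": 3, "country": "DE", "amount": 40},
    {"id": 4, "country": "US", "amount": 900},
]


def to_csv(rows):
    """Row-oriented layout: every line holds all columns of one record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_columnar(rows, row_group_size=2):
    """Toy columnar layout: rows split into groups, each storing its columns
    separately plus min/max statistics per column."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]
        columns = {name: [r[name] for r in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        groups.append({"columns": columns, "stats": stats})
    return groups


def csv_sum_amount_over(text, threshold):
    # CSV: every line is parsed in full, even though only "amount" is needed.
    return sum(int(r["amount"]) for r in csv.DictReader(io.StringIO(text))
               if int(r["amount"]) > threshold)


def columnar_sum_amount_over(groups, threshold):
    total, groups_read = 0, 0
    for group in groups:
        # Predicate pushdown: skip a whole group whose max cannot pass the filter.
        if group["stats"]["amount"][1] <= threshold:
            continue
        groups_read += 1
        # Column projection: only the "amount" column is touched.
        total += sum(v for v in group["columns"]["amount"] if v > threshold)
    return total, groups_read


if __name__ == "__main__":
    print(csv_sum_amount_over(to_csv(ROWS), 300))          # 900
    print(columnar_sum_amount_over(to_columnar(ROWS), 300))  # (900, 1)
```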
AWS Step Functions
- Serverless orchestration service enabling integration with AWS Lambda and other services to build applications with visual workflows.
- Workflows are built from states of type Pass, Task, Choice, Wait, Succeed, Fail, Parallel, and Map.
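A state machine is defined in the Amazon States Language (JSON). The sketch below builds a small hypothetical definition as a Python dict (the state names and input fields such as `$.order.amount` are made up) that uses Pass, Choice, Wait, Succeed, and Fail states, and checks that every transition points at an existing state. Serialized with `json.dumps`, it could be passed as the `definition` argument of boto3's `create_state_machine`, which is not done here.

```python
import json

# Hypothetical state machine: state names and the input shape are made up.
definition = {
    "Comment": "Sketch of Pass, Choice, Wait, Succeed, and Fail states",
    "StartAt": "AddDefaults",
    "States": {
        # Pass: passes input to output, here injecting fixed data at $.meta.
        "AddDefaults": {
            "Type": "Pass",
            "Result": {"retries": 0},
            "ResultPath": "$.meta",
            "Next": "CheckAmount",
        },
        # Choice: branches on a value in the input.
        "CheckAmount": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.order.amount", "NumericGreaterThan": 1000, "Next": "Reject"}
            ],
            "Default": "CoolOff",
        },
        # Wait: pauses the execution for a fixed number of seconds.
        "CoolOff": {"Type": "Wait", "Seconds": 5, "Next": "Approve"},
        # Succeed: ends the execution successfully.
        "Approve": {"Type": "Succeed"},
        # Fail: ends the execution as failed, with an error name and cause.
        "Reject": {"Type": "Fail", "Error": "OrderTooLarge", "Cause": "Amount exceeds limit"},
    },
}


def check_transitions(sm: dict) -> list:
    """Return names referenced by StartAt/Next/Default/Choices that are not defined."""
    states = sm["States"]
    targets = [sm["StartAt"]]
    for state in states.values():
        targets += [state[k] for k in ("Next", "Default") if k in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return [t for t in targets if t not in states]


if __name__ == "__main__":
    print(check_transitions(definition))  # []
    print(json.dumps(definition, indent=2))
```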
AWS CloudHSM
- Combines cloud infrastructure with the security of Hardware Security Modules (HSMs) for cryptographic operations and key storage.
- Use cases include managing private keys, encrypting/decrypting data, and supporting message authentication via CMACs and HMACs.
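CloudHSM can compute CMACs (block-cipher based, typically AES) and HMACs (hash based) with keys that never leave the HSM. Python's standard library has no CMAC, so the sketch below uses HMAC-SHA256 to show the sign-then-verify pattern; the local random key is only a stand-in for an HSM-resident key, and CMAC itself is available in third-party packages such as `cryptography`.

```python
import hashlib
import hmac
import secrets

# Stand-in key: with CloudHSM the key would be generated and kept inside the HSM,
# and the MAC computed there rather than in application memory.
key = secrets.token_bytes(32)


def sign(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message), tag)


if __name__ == "__main__":
    msg = b"transfer 100 to account 42"
    tag = sign(msg)
    print(verify(msg, tag))                            # True
    print(verify(b"transfer 900 to account 42", tag))  # False
```

Any change to the message (or the tag) makes verification fail, which is what lets a receiver holding the same key detect tampering.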
Managed Streaming for Apache Kafka (MSK)
- Fully managed service for building and running applications using Kafka, facilitating operations such as creating, updating, and deleting clusters.
- Replicates messages across brokers for fault tolerance and uses Apache ZooKeeper for broker management.
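In Kafka, producers publish records to topics, and a record's key decides which partition of the topic it lands in, so records with the same key stay in order within one partition. The toy producer below illustrates this; it hashes keys with `zlib.crc32`, whereas Kafka's default partitioner uses murmur2, and the partition count and keys are arbitrary.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # arbitrary partition count for the example topic


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition (crc32 here; Kafka's default uses murmur2)."""
    return zlib.crc32(key) % num_partitions


def produce(records):
    """Toy producer: append each (key, value) to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[choose_partition(key)].append(value)
    return partitions


if __name__ == "__main__":
    log = produce([(b"user-1", "login"), (b"user-2", "view"), (b"user-1", "logout")])
    for partition, values in sorted(log.items()):
        print(partition, values)
```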
AWS DataSync
- Automates the transfer of large datasets between on-premises storage and AWS storage services such as Amazon S3, Amazon EFS, and Amazon FSx.
AWS Schema Conversion Tool (SCT)
- Converts database schemas from one engine to another, facilitating database migration.
DynamoDB
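- Fast NoSQL key-value database; each item's primary key is either a partition key alone or a partition key combined with a sort key.
- Local Secondary Indexes (LSIs) keep the table's partition key but use an alternate sort key, so items within a partition can be queried and sorted by a different attribute; LSIs must be defined when the table is created.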
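A table with one LSI can be sketched as `CreateTable` parameters (boto3 `create_table`): the index reuses the base table's partition key with a different sort key and is declared in the same request that creates the table. The table, attribute, and index names are hypothetical, and the AWS call is commented out.

```python
def build_table_with_lsi() -> dict:
    """Build keyword arguments for DynamoDB's CreateTable API with one LSI."""
    return {
        "TableName": "Orders",  # hypothetical table
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
        ],
        # Base table: partition key + sort key.
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # LSI: same partition key, alternate sort key; declared at table creation.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "ByOrderTotal",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def _hash_key(key_schema: list) -> str:
    """Return the partition (HASH) key attribute name of a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def lsi_shares_partition_key(params: dict) -> bool:
    """Check that every LSI uses the base table's partition key."""
    base = _hash_key(params["KeySchema"])
    return all(_hash_key(idx["KeySchema"]) == base for idx in params["LocalSecondaryIndexes"])


if __name__ == "__main__":
    params = build_table_with_lsi()
    print(lsi_shares_partition_key(params))  # True
    # import boto3
    # boto3.client("dynamodb").create_table(**params)
    # A query with IndexName="ByOrderTotal" would then return one customer's
    # orders sorted by order_total instead of order_id.
```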
Amazon Managed Grafana
- Fully managed service for the open-source Grafana, used to query and visualize operational metrics, logs, and traces from multiple data sources.
Workflow Types
- Step Functions offers broader AWS service integration, Glue Workflows focus on orchestrating Glue ETL jobs and crawlers, and Apache Airflow (for example, through MWAA) is the more complex option.
- Amazon AppFlow is specifically designed to link SaaS applications with AWS services.
AWS PrivateLink
- Provides private connectivity between consumer Virtual Private Clouds (VPCs) and services hosted in service provider VPCs, without exposing traffic to the public internet.
CloudShell
- Provides a browser-based, pre-authenticated shell accessible from the AWS Management Console for executing AWS CLI commands without local installation.
AWS Cloud9
- Web-based Integrated Development Environment (IDE) for coding and collaboration.
Ephemeral Volume
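- Temporary block storage (EC2 instance store) physically attached to the instance's host, providing low-latency access; its data is lost when the instance is stopped or terminated.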
AWS Graviton
- AWS-designed, Arm-based processors that, according to AWS, offer up to 40% better price performance than comparable x86-based instances.
Description
This quiz covers the functionalities of SageMaker Data Wrangler, including data import, preparation, transformation, and analysis for machine learning workflows. It also highlights features such as Feature Store and ML Lineage Tracking, critical for model governance and data trustworthiness.