Questions and Answers
Which formats can Amazon Kinesis Data Firehose convert input data to?
Apache Parquet and Apache ORC
Amazon Managed Workflows for Apache Airflow is primarily used for converting log files into Apache Parquet format.
False
What service can be used to transform input formats like CSV to JSON before using Kinesis Data Firehose?
AWS Lambda
Amazon Kinesis Data Firehose delivers real-time streaming data to destinations such as Amazon S3, Amazon Redshift, and __________.
Match the following services with their primary function:
Which of the following options would likely involve the least operational overhead?
Using an EC2-based Amazon EMR cluster requires more maintenance than using Amazon EMR Serverless.
What is the recommended storage format for log files in the external Amazon S3 table when using Hive?
When setting up an external table on Amazon S3 in Hive, the file format should be set to __________.
Match the following options with their corresponding operational characteristics:
Study Notes
Amazon Kinesis Data Firehose Overview
- Can convert the format of input data from JSON to a columnar format, Apache Parquet or Apache ORC, before storing it in Amazon S3.
- Columnar formats save space and enhance query performance compared to row-oriented formats like JSON.
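As a hedged sketch of how the JSON-to-columnar conversion is switched on, the Python function below builds the DataFormatConversionConfiguration block that boto3's firehose.create_delivery_stream accepts inside ExtendedS3DestinationConfiguration. The function name, the role ARN, and the AWS Glue database and table names are hypothetical placeholders; Firehose reads the schema for the incoming JSON records from that Glue table. Check the key names against the current Firehose API reference before relying on them.

```python
from typing import Any, Dict


def build_format_conversion_config(
    role_arn: str,
    glue_database: str,
    glue_table: str,
    output_format: str = "parquet",
) -> Dict[str, Any]:
    """Hypothetical helper: return a DataFormatConversionConfiguration
    block (JSON in, Parquet or ORC out) for an Extended S3 destination."""
    serializers = {
        "parquet": {"ParquetSerDe": {}},
        "orc": {"OrcSerDe": {}},
    }
    if output_format not in serializers:
        raise ValueError("output_format must be 'parquet' or 'orc'")

    return {
        "Enabled": True,
        # The record schema comes from an AWS Glue Data Catalog table.
        "SchemaConfiguration": {
            "RoleARN": role_arn,
            "DatabaseName": glue_database,
            "TableName": glue_table,
        },
        # Input must be JSON; OpenXJsonSerDe is one of the supported deserializers.
        "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
        # Output is the chosen columnar format.
        "OutputFormatConfiguration": {"Serializer": serializers[output_format]},
    }
```

Passing output_format="orc" only swaps the serializer to OrcSerDe; the rest of the block stays the same, since both columnar formats take the same JSON input.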
Transformation of Other Input Formats
- If the input is in another format (e.g., CSV or structured text), AWS Lambda can first transform it to JSON so that Kinesis Data Firehose can then convert it.
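The Lambda step can be sketched as a Firehose data-transformation handler. Firehose passes the function a batch of base64-encoded records, and the function returns each recordId with a result of Ok, Dropped, or ProcessingFailed plus the base64-encoded output data. The CSV column names below are hypothetical and every value is kept as a string; a real handler would follow the actual log layout and types.

```python
import base64
import csv
import io
import json

# Hypothetical column names for one CSV log line; adjust to the real log layout.
FIELDS = ["timestamp", "symbol", "price", "volume"]


def lambda_handler(event, context):
    """Firehose transformation: one CSV line in, one JSON object out."""
    output = []
    for record in event["records"]:
        try:
            raw = base64.b64decode(record["data"]).decode("utf-8")
            # csv.reader handles quoted fields; use the first row of the payload.
            row = next(csv.reader(io.StringIO(raw)))
            if len(row) != len(FIELDS):
                raise ValueError("unexpected column count")
            doc = dict(zip(FIELDS, row))
            # Newline-terminate so objects stay separable if delivered as plain JSON.
            payload = (json.dumps(doc) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload).decode("utf-8"),
            })
        except (StopIteration, ValueError):
            # Return the original data unchanged and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Marking malformed lines as ProcessingFailed, rather than dropping them, lets Firehose route them to its error output so they can be inspected later.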
Data Delivery Destinations
- Amazon Kinesis Data Firehose is a fully managed service that delivers real-time streaming data to multiple destinations:
- Amazon S3
- Amazon Redshift
- Amazon OpenSearch Service
- Amazon OpenSearch Serverless
- Splunk
- Custom HTTP endpoints, including third-party services like Datadog, Dynatrace, LogicMonitor, MongoDB, New Relic, Coralogix, and Elastic.
Recommended Solution
- Send the cryptocurrency log files directly to Amazon Kinesis Data Firehose.
- Configure Kinesis Data Firehose to invoke a Lambda function so that the log files are converted to Apache Parquet format and delivered to a centralized S3 bucket.
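For the first step of the solution, a producer might send log lines in batches with the PutRecordBatch API. This is a sketch under stated assumptions: the stream name is hypothetical, the limits (500 records and 4 MiB per request, 1,000 KiB per record) reflect the Firehose documentation at the time of writing, and the size check counts raw bytes only. The client argument is assumed to be a boto3 Firehose client, such as boto3.client("firehose"), which keeps the batching logic testable without AWS access.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500            # PutRecordBatch record limit per request
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # 4 MiB request limit (raw bytes counted here)
MAX_BYTES_PER_RECORD = 1000 * 1024     # 1,000 KiB per record


def batch_records(lines: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Group log lines into batches that respect the count and size limits."""
    batch: List[bytes] = []
    size = 0
    for line in lines:
        if len(line) > MAX_BYTES_PER_RECORD:
            raise ValueError("record exceeds the per-record size limit")
        # Start a new batch when adding this line would break either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_BATCH
                      or size + len(line) > MAX_BYTES_PER_BATCH):
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += len(line)
    if batch:
        yield batch


def send_logs(client, stream_name: str, lines: Iterable[bytes]) -> int:
    """Send log lines with PutRecordBatch; return how many records failed."""
    failed = 0
    for batch in batch_records(lines):
        resp = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        # PutRecordBatch can partially fail; FailedPutCount reports how many.
        failed += resp["FailedPutCount"]
    return failed
```

This sketch only counts failures; a production producer would retry the individual records whose RequestResponses entries carry an ErrorCode.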
Incorrect Options Explained
- Amazon Managed Workflows for Apache Airflow (Amazon MWAA): primarily an orchestration service; it is not designed to convert log files to Apache Parquet format.
- Amazon Kinesis Data Streams with EC2 instances: requires running and maintaining a Kinesis Client Library (KCL) application on an EC2 Auto Scaling group, which adds significant management and upkeep and conflicts with the requirement for minimal operational overhead.
- Apache Hive on Amazon EMR: an EMR cluster requires ongoing maintenance and operational work, so this option is not viable unless it explicitly specifies Amazon EMR Serverless, which removes the need to manage clusters.
Description
This quiz covers Amazon Kinesis Data Firehose and its ability to convert input data formats, specifically from JSON to Apache Parquet or ORC for more efficient storage. It also covers using AWS Lambda to transform other formats, such as CSV, into JSON. Test your knowledge of this fully managed service and its applications in data handling.