Data Engineering with AWS
3 Questions

Questions and Answers

Which solution will meet the requirements for orchestrating a data pipeline with the least management overhead?

  • Use an Apache Airflow workflow that is deployed on an Amazon EC2 instance.
  • Use an AWS Glue workflow to run the Lambda function and then the AWS Glue job.
  • Use an Apache Airflow workflow that is deployed on Amazon Elastic Kubernetes Service.
  • Use an AWS Step Functions workflow that includes a state machine. (correct)

Which solution will meet the requirements for setting up a data catalog and metadata management with the least operational overhead?

  • Use the AWS Glue Data Catalog as the central metadata repository. (correct)
  • Use Amazon DynamoDB as the data catalog.
  • Use Amazon Aurora as the data catalog.
  • Use the AWS Glue Data Catalog and extract the schema for Amazon RDS and Amazon Redshift sources.

What is the capacity mode of the Amazon DynamoDB table described in the scenario?

Provisioned capacity mode (correct)

    Study Notes

    Orchestrating Data Pipelines

    • A data engineer needs to orchestrate a data pipeline with an AWS Lambda function and an AWS Glue job.
    • The solution should have minimal management overhead.
    • Using an AWS Step Functions workflow with a state machine to run the Lambda function and then the AWS Glue job is the least management-intensive option.
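
As a sketch of what this approach looks like, the state machine below (written in Amazon States Language) runs the Lambda function and then the Glue job. The account ID, function ARN, and job name are made-up placeholders; substitute your own.

```python
import json

# Hypothetical resource names -- substitute your own ARNs and job name.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:prepare-data"
GLUE_JOB_NAME = "transform-data"

# Amazon States Language definition: invoke the Lambda function first,
# then start the Glue job synchronously (.sync waits for it to finish).
state_machine = {
    "StartAt": "RunLambda",
    "States": {
        "RunLambda": {
            "Type": "Task",
            "Resource": LAMBDA_ARN,
            "Next": "RunGlueJob",
        },
        "RunGlueJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": GLUE_JOB_NAME},
            "End": True,
        },
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```

This JSON would be passed to `create_state_machine` on boto3's Step Functions client; the service runs it fully managed, with no servers to patch, unlike self-hosted Airflow on EC2 or EKS.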

    Data Catalog and Metadata Management

    • A company needs to set up a data catalog and metadata management for its data sources in AWS.
    • The data sources include structured sources like Amazon RDS and Amazon Redshift, as well as semi-structured sources like JSON and XML files stored in Amazon S3.
    • The company needs a solution that updates the data catalog regularly and detects changes to the source metadata, with minimal operational overhead.
    • Using the AWS Glue Data Catalog as the central metadata repository is the least operationally intensive solution.
    • AWS Glue crawlers can connect to multiple data stores, update the Data Catalog, and infer schemas for data in Amazon S3.
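
A minimal sketch of such a crawler configuration is below; every name, path, and connection is a hypothetical example. The dict mirrors the keyword arguments that boto3's `glue.create_crawler(**crawler_config)` accepts.

```python
# Hypothetical crawler covering one S3 path and one JDBC (e.g. RDS)
# source; a real setup would list each data store to be cataloged.
crawler_config = {
    "Name": "central-catalog-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "central_catalog",
    "Targets": {
        # Semi-structured JSON/XML files in S3 -- schemas are inferred.
        "S3Targets": [{"Path": "s3://example-bucket/raw/"}],
        # Structured sources reached through a Glue connection.
        "JdbcTargets": [
            {"ConnectionName": "rds-connection", "Path": "appdb/%"},
        ],
    },
    # Run nightly so the Data Catalog stays current.
    "Schedule": "cron(0 2 * * ? *)",
    # Pick up schema changes in the sources automatically.
    "SchemaChangePolicy": {
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "DEPRECATE_IN_DATABASE",
    },
}
print(sorted(crawler_config))
```

The schedule plus the schema-change policy is what satisfies the "updates regularly and detects source changes" requirement without any infrastructure to operate.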

    DynamoDB Workload Optimization

    • A company stores data from an application in an Amazon DynamoDB table operating in provisioned capacity mode.
    • The application has predictable throughput loads on a regular schedule, with an immediate increase in activity early every Monday morning.
    • Weekends experience very low usage.
    • Because the workload is predictable, the company should schedule adjustments to the table's provisioned read and write capacity to match the expected traffic pattern.
    • Configure lower capacity for the low-usage weekends and higher capacity ahead of the Monday-morning spike.
    • This approach right-sizes resources and minimizes cost.
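
One way to implement such a schedule is with Application Auto Scaling scheduled actions. The sketch below only builds the request payloads; the table name, capacity numbers, and cron expressions are illustrative assumptions, and each payload would be sent with boto3's `application-autoscaling` client via `put_scheduled_action`.

```python
# Hypothetical table; scheduled actions target read capacity here, and a
# real setup would repeat them for dynamodb:table:WriteCapacityUnits.
TABLE = "table/AppEvents"

def scheduled_action(name, cron, min_cap, max_cap):
    """Build one put_scheduled_action payload for the table's read capacity."""
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": TABLE,
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "ScheduledActionName": name,
        "Schedule": cron,
        "ScalableTargetAction": {"MinCapacity": min_cap, "MaxCapacity": max_cap},
    }

# Raise capacity before the Monday-morning spike (times in UTC)...
monday_spike = scheduled_action("monday-scale-up", "cron(0 6 ? * MON *)", 200, 1000)
# ...and drop it for the low-traffic weekend.
weekend_low = scheduled_action("weekend-scale-down", "cron(0 0 ? * SAT *)", 5, 25)
print(monday_spike["ScheduledActionName"], weekend_low["ScheduledActionName"])
```

Because the capacity changes are driven by the calendar rather than by reacting to load, the table is already scaled up when the Monday spike arrives.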

    Description

    This quiz covers key concepts in orchestrating data pipelines and managing data catalogs in AWS. You will explore the use of AWS Lambda, AWS Glue, and AWS Step Functions for efficient data processing. Additionally, learn about setting up a data catalog and regular updates using AWS Glue Crawlers.
