Data Engineering with AWS
3 Questions

Questions and Answers

Which solution will meet the requirements for orchestrating a data pipeline with the least management overhead?

  • Use an Apache Airflow workflow that is deployed on an Amazon EC2 instance.
  • Use an AWS Glue workflow to run the Lambda function and then the AWS Glue job.
  • Use an Apache Airflow workflow that is deployed on Amazon Elastic Kubernetes Service.
  • Use an AWS Step Functions workflow that includes a state machine. (correct)

Which solution will meet the requirements for setting up a data catalog and metadata management with the least operational overhead?

  • Use the AWS Glue Data Catalog as the central metadata repository. (correct)
  • Use Amazon DynamoDB as the data catalog.
  • Use Amazon Aurora as the data catalog.
  • Use the AWS Glue Data Catalog and extract the schema for Amazon RDS and Amazon Redshift sources.

What is the capacity mode of the Amazon DynamoDB table described in the scenario?

  • Provisioned capacity mode (correct)

    Study Notes

    Orchestrating Data Pipelines

    • A data engineer needs to orchestrate a data pipeline with an AWS Lambda function and an AWS Glue job.
    • The solution should have minimal management overhead.
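    • Using an AWS Step Functions workflow with a state machine to run the Lambda function and then the AWS Glue job is the option with the least management overhead: Step Functions is serverless, whereas Apache Airflow on Amazon EC2 or Amazon EKS leaves the underlying infrastructure to provision and maintain.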
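
As an illustration of the orchestration described above, the following is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dictionary. The function name `prepare-input` and the job name `transform-data` are hypothetical placeholders and do not come from the scenario; the two `Resource` values are the Step Functions service integrations for invoking Lambda and for starting a Glue job run and waiting for it to finish.

```python
import json

# Hypothetical resource names, used only for illustration.
LAMBDA_FUNCTION_NAME = "prepare-input"
GLUE_JOB_NAME = "transform-data"


def build_pipeline_definition(function_name: str, glue_job_name: str) -> dict:
    """Return a state machine definition that runs a Lambda function, then a Glue job."""
    return {
        "Comment": "Run a Lambda function, then an AWS Glue job",
        "StartAt": "RunLambda",
        "States": {
            "RunLambda": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "RunGlueJob",
            },
            "RunGlueJob": {
                "Type": "Task",
                # The ".sync" suffix makes the state wait until the job run completes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            },
        },
    }


definition = build_pipeline_definition(LAMBDA_FUNCTION_NAME, GLUE_JOB_NAME)
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string could then be supplied as the `definition` argument when creating the state machine, for example through the boto3 Step Functions client's `create_state_machine` call together with a name and an IAM role ARN.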

    Data Catalog and Metadata Management

    • A company needs to set up a data catalog and metadata management for their data sources in AWS.
    • The data sources include structured sources like Amazon RDS and Amazon Redshift, as well as semi-structured sources like JSON and XML files stored in Amazon S3.
    • The company needs a solution that updates the data catalog regularly and detects changes to the source metadata, with minimal operational overhead.
    • Using the AWS Glue Data Catalog as the central metadata repository is the least operationally intensive solution.
    • AWS Glue crawlers can connect to multiple data stores, update the Data Catalog, and infer schemas for data in Amazon S3.
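
To make the crawler setup above more concrete, here is a minimal sketch of the parameters one might pass to the AWS Glue `create_crawler` API. Every value (crawler name, role ARN, database, bucket path, connection name, schedule) is a hypothetical placeholder, not something stated in the scenario. The sketch only builds the parameter dictionary, so it runs without AWS credentials.

```python
def build_crawler_params(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    jdbc_connection: str,
    jdbc_path: str,
    schedule: str,
) -> dict:
    """Build keyword arguments for a Glue crawler covering S3 and JDBC sources."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # Semi-structured files (JSON, XML) stored in Amazon S3.
            "S3Targets": [{"Path": s3_path}],
            # Structured sources reached through a Glue connection.
            "JdbcTargets": [{"ConnectionName": jdbc_connection, "Path": jdbc_path}],
        },
        # Running on a schedule keeps the Data Catalog updated regularly.
        "Schedule": schedule,
        # Write detected schema changes back to the catalog; only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


# All values below are assumptions made for this example.
crawler_params = build_crawler_params(
    name="example-catalog-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="example_catalog_db",
    s3_path="s3://example-bucket/raw/",
    jdbc_connection="example-rds-connection",
    jdbc_path="exampledb/%",
    schedule="cron(0 2 * * ? *)",  # daily at 02:00 UTC
)
print(crawler_params["Targets"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("glue").create_crawler(**crawler_params)`.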

    DynamoDB Workload Optimization

    • A company stores data from an application in an Amazon DynamoDB table operating in provisioned capacity mode.
    • The application has predictable throughput loads on a regular schedule, with an immediate increase in activity early every Monday morning.
    • Weekends experience very low usage.
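    • Because the load is predictable, the company should stay in provisioned capacity mode and schedule capacity changes to match the expected workload pattern, for example with scheduled scaling in AWS Application Auto Scaling.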
    • Configure lower capacity for weekends and higher capacity for Mondays.
    • This approach optimizes resources and minimizes costs.
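
One way to express the schedule above is with Application Auto Scaling scheduled actions. The sketch below assumes that mechanism and builds the parameters for two `put_scheduled_action` calls. The table name, capacity values, and exact times are hypothetical; the scenario only says that activity rises early on Monday and is very low on weekends.

```python
def build_scheduled_actions(
    table_name: str, weekday_capacity: int, weekend_capacity: int
) -> list:
    """Build parameters for a Monday scale-up and a weekend scale-down action."""
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    }
    return [
        {
            **common,
            "ScheduledActionName": "scale-up-monday",
            # Assumed time: 05:00 UTC every Monday, ahead of the spike.
            "Schedule": "cron(0 5 ? * MON *)",
            "ScalableTargetAction": {
                "MinCapacity": weekday_capacity,
                "MaxCapacity": weekday_capacity,
            },
        },
        {
            **common,
            "ScheduledActionName": "scale-down-weekend",
            # Assumed time: 00:00 UTC every Saturday.
            "Schedule": "cron(0 0 ? * SAT *)",
            "ScalableTargetAction": {
                "MinCapacity": weekend_capacity,
                "MaxCapacity": weekend_capacity,
            },
        },
    ]


# Hypothetical table name and capacity units.
actions = build_scheduled_actions("example-app-table", 500, 25)
for action in actions:
    print(action["ScheduledActionName"], action["Schedule"])
```

Each dictionary could be passed to `boto3.client("application-autoscaling").put_scheduled_action(**action)` once the table has been registered as a scalable target. A matching pair of actions with `dynamodb:table:ReadCapacityUnits` would cover read capacity.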

    Description

    This quiz covers key concepts in orchestrating data pipelines and managing data catalogs in AWS. It explores the use of AWS Lambda, AWS Glue, and AWS Step Functions for data processing, as well as setting up a data catalog and keeping it updated with AWS Glue crawlers.
