Data Collection and Processing Overview

Questions and Answers

Which processing zone is responsible for verifying data quality and accuracy in a data lake architecture?

  • Trusted zone (correct)
  • Refined zone
  • Transient zone
  • Raw zone

What describes the function of the discovery sandbox in a data treatment process?

  • To enrich data and automate workflows
  • To ingest data from various sources
  • To support exploratory analysis and experimentation (correct)
  • To store unaltered data in its original form

Which of the following is NOT considered a consumer system in big data architecture?

  • Data Warehouse (correct)
  • External Connectors
  • Data Prep Tools
  • Data Visualization Tools

In a serverless big data architecture, which of the following is a common cloud service provider?

  • Google Cloud Platform

What is the primary role of metadata in big data architecture?

  • To provide context and information about data

Which of the following describes the process of data capturing?

  • It involves collecting and entering data from various sources.

What distinguishes a data warehouse from a data lake?

  • In a data warehouse, data is cleaned and organized before ingestion.

Which of the following is NOT a main type of data source?

  • Social Media Platforms

Which aspect is NOT considered key in effective data presentation?

  • Predictive Analytics

What is a responsibility of the user in an on-premise data management setup?

  • Managing all aspects of hardware and software.

Which statement regarding data processing is incorrect?

  • Data storage does not require consideration of access speed.

What is one disadvantage of manual data entry?

  • It can be time-consuming and prone to human error.

Which method is typically used for automated data collection?

  • Mobile Data Capture

    Study Notes

    Data Collection and Processing

    • Data is captured from various sources: web browsers, smartphones, search engines, internet, banking, and gaming.
    • Data is collected by government agencies, pharmaceutical companies, consumer product companies, big box stores, and credit card companies.
    • The data pipeline runs through five stages: collection, ingestion, preparation, computation, and presentation.
    • A data source is any system providing raw material for analysis, reporting, or applications. Sources can be structured or unstructured (databases, APIs, flat files, cloud storage, data warehouses, spreadsheets).
    • Data capturing methods: manual data entry (e.g., spreadsheets), which is time-consuming and prone to human error, and automated data collection (e.g., sensors, IoT, APIs, surveys, forms, point-of-sale systems, mobile devices). Captured data must be accurate, kept private and secure, and integrated with other systems.
    • Data processing steps: collection, cleaning, transformation, storage, and analysis.
    • Data presentation displays data clearly, engagingly, and understandably, emphasizing key insights (visualization, clarity, context, interactivity, storytelling).
    • Data warehouses organize incoming data into a single schema for analysis.
    • Data lakes store raw data for flexible future use.
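
The five processing steps listed above (collection, cleaning, transformation, storage, analysis) can be sketched as a chain of small functions. This is only an illustrative sketch: the sensor readings, field names, and the use of a dictionary in place of a real database are all assumptions made for the example, not part of the lesson.

```python
from statistics import mean

# Hypothetical raw readings, as they might arrive from a sensor or a form.
raw_records = [
    {"sensor": "t1", "temp_c": "21.5"},
    {"sensor": "t2", "temp_c": ""},      # missing value
    {"sensor": "t1", "temp_c": "22.5"},
    {"sensor": "t2", "temp_c": "bad"},   # malformed value
    {"sensor": "t2", "temp_c": "19.0"},
]


def collect():
    """Collection: gather records from a source (here, an in-memory list)."""
    return list(raw_records)


def clean(records):
    """Cleaning: drop records whose reading is missing or not numeric."""
    cleaned = []
    for rec in records:
        try:
            float(rec["temp_c"])
        except ValueError:
            continue
        cleaned.append(rec)
    return cleaned


def transform(records):
    """Transformation: convert text readings into numeric values."""
    return [{"sensor": r["sensor"], "temp_c": float(r["temp_c"])} for r in records]


def store(records):
    """Storage: group readings by sensor (a dict stands in for a database)."""
    table = {}
    for r in records:
        table.setdefault(r["sensor"], []).append(r["temp_c"])
    return table


def analyze(table):
    """Analysis: compute the average reading per sensor."""
    return {sensor: mean(values) for sensor, values in table.items()}


result = analyze(store(transform(clean(collect()))))
print(result)
```

Keeping each step as its own function mirrors the list above and makes it easy to swap one stage (for example, a real database in `store`) without touching the others.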

    Data Architecture

    • On-Premise: Users manage all components, including networking, storage, servers, virtualization, operating systems, middleware, runtime, data, and applications (e.g., private data centers).
    • Software as a Service (SaaS): Vendors manage everything; users access applications via the internet (e.g., Gmail, Microsoft Office 365).
    • Serverless Big Data Architecture: Cloud service providers (AWS, Azure, GCP) handle the infrastructure.
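
The three models above differ mainly in who manages each layer of the stack. The sketch below encodes that split as a lookup table. The on-premise and SaaS rows follow the notes directly; the serverless row is an assumption (the notes only say the provider "handles the infrastructure", so the example assumes the user keeps data and applications). The names `LAYERS`, `PROVIDER_MANAGED`, and `user_managed` are hypothetical.

```python
# Stack layers as listed in the on-premise bullet above.
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "data", "applications",
]

# Layers managed by the vendor or cloud provider under each model.
PROVIDER_MANAGED = {
    "on-premise": set(),         # the user manages everything
    "saas": set(LAYERS),         # the vendor manages everything
    # Assumption: "infrastructure" means every layer except data and applications.
    "serverless": set(LAYERS) - {"data", "applications"},
}


def user_managed(model):
    """Return the layers the user is still responsible for, in stack order."""
    managed = PROVIDER_MANAGED[model]
    return [layer for layer in LAYERS if layer not in managed]


for model in PROVIDER_MANAGED:
    print(model, "->", user_managed(model))
```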

    Data Treatment (Two Approaches)

    • Data Treatment 1:

    • Data Sources: Streaming, file data, and relational data.

    • Data Lake Zones:

      • Transient: Ingest, tag, catalog data.
      • Raw: Attach metadata, identify data, and protect sensitive data.
      • Trusted: Check data quality and validate data so that its accuracy is verified.
      • Refined: Enrich data, automate workflows.
    • Consumer Systems: Data catalog, data prep tools, data visualization, external connectors.
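
The four zones of the first approach can be read as successive stages that a record passes through. The following is a minimal sketch under stated assumptions: the `Record` class, the field names (`email`, `amount`), the masking rule, and the validation rule are all invented for illustration and are not taken from any particular data lake product.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    payload: dict
    tags: dict = field(default_factory=dict)


def transient_zone(payload, source):
    """Transient: ingest the payload and tag it with its source for cataloging."""
    return Record(payload=dict(payload), tags={"source": source})


def raw_zone(record, sensitive_fields=("email",)):
    """Raw: attach metadata and mask sensitive fields (hypothetical rule)."""
    masked = {
        key: ("***" if key in sensitive_fields else value)
        for key, value in record.payload.items()
    }
    tags = {**record.tags, "fields": sorted(masked)}
    return Record(masked, tags)


def trusted_zone(record):
    """Trusted: validate the record; return None if it fails the quality check."""
    amount = record.payload.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        return None
    return Record(record.payload, {**record.tags, "validated": True})


def refined_zone(record):
    """Refined: enrich the record with a derived field."""
    band = "high" if record.payload["amount"] >= 100 else "low"
    return Record({**record.payload, "amount_band": band}, record.tags)


def run_pipeline(payload, source):
    """Pass one payload through all four zones in order."""
    record = raw_zone(transient_zone(payload, source))
    record = trusted_zone(record)
    return refined_zone(record) if record is not None else None


good = run_pipeline({"email": "a@example.com", "amount": 120}, source="stream")
bad = run_pipeline({"email": "b@example.com", "amount": -5}, source="file")
print(good)
print(bad)
```

In this sketch a record that fails validation never reaches the refined zone, which is the point of placing the trusted zone in front of it.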

    • Data Treatment 2:

    • Data Sources: OLTP/ODS, enterprise data warehouse, logs, cloud services, streaming, and file data.

    • Steps:

      • Transient loading zone: Ingest data without transformation.
      • Raw data zone: Store original, unaltered data and tokenize.
      • Refined data zone: Integrate data into consistent format.
      • Trusted data zone: Provides high-quality data.
      • Discovery sandbox: Enables exploratory analysis and experimentation.
    • Consumers: Business analysts, data scientists.

    • Foundation Layers: Metadata, data quality, data catalog, and security.
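
The raw data zone in the second approach is described as tokenizing data. One common reading of that step is replacing sensitive values with opaque tokens while the real values are held in a separate, secured store. The sketch below assumes that reading; `TokenVault`, `tokenize_record`, and the `card_number` field are hypothetical names, and an in-memory dictionary stands in for a real secured store.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a secured token store (hypothetical)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a stable random token for a value, creating one if needed."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token):
        """Look up the original value; only trusted callers should do this."""
        return self._token_to_value[token]


def tokenize_record(record, sensitive_fields, vault):
    """Replace sensitive fields with tokens and leave other fields unchanged."""
    return {
        key: (vault.tokenize(value) if key in sensitive_fields else value)
        for key, value in record.items()
    }


vault = TokenVault()
original = {"card_number": "4111111111111111", "amount": 42.0}
stored = tokenize_record(original, {"card_number"}, vault)
print(stored)
```

Because the same value always maps to the same token, downstream zones can still join or count on the tokenized field without seeing the underlying value.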

    Data Technologies and Software Architecture

    • Data technologies span the whole pipeline: collection through apps and IoT devices, ingestion through third-party services (e.g., over the MQTT messaging protocol), preparation and computation in a data lake, loading into a data warehouse, and presentation.
    • Software architecture considers design, quality attributes, IT environment, business strategy, and human dynamics.

    Description

    This quiz explores the fundamentals of data collection and processing, covering various methods used to capture and handle data. It highlights the importance of accurate and secure data collection from different sources, along with the necessary steps in processing and presenting data effectively.
