Data Collection and Processing Overview

Questions and Answers

Gmail and Microsoft Office 365 are examples of software that is directly delivered by servers rather than the internet.

False

The trusted zone in a data lake ensures that data has been verified for accuracy.

True

In the data treatment process, the raw data zone is where data is transformed and standardized.

False

Amazon Web Services, Microsoft Azure, and Google Cloud Platform are examples of OLTP systems.

False

The transient loading zone is the area where data is transformed and integrated into a consistent format.

False

Data sources can only be structured and not unstructured.

False

Automated data collection can include the use of sensors and APIs.

True

A data lake stores incoming data in a cleaned and organized manner.

False

Data presentation is solely focused on the accuracy of the data.

False

In a Software as a Service (SaaS) model, the vendor manages everything, while users only interact with the application.

True

Effective data capturing is valuable for building unreliable datasets.

False

Data processing encompasses data collection, cleaning, and storage.

True

Manual data entry involves automated methods to input data into systems.

False

Study Notes

Data Collection and Processing Overview

  • Data is collected from various sources including web browsers, smartphones, search engines, and more.
  • Government agencies, pharmaceutical companies, consumer product companies, and retailers collect data.
  • Big data infrastructure involves collection, ingestion, preparation, computation, and presentation.
  • A data source provides raw material for analysis, reporting, and other applications. Sources can be structured or unstructured.
  • Examples of data sources include databases, APIs, flat files, cloud storage, data warehouses, and spreadsheets.
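A minimal sketch of how two of these source types, a flat file and an API response, might be read into one common record shape. The field names (`id`, `amount`) and the inline sample data are hypothetical stand-ins, not taken from the lesson.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a flat file and an API response.
FLAT_FILE = "id,amount\n1,9.50\n2,12.00\n"
API_PAYLOAD = '{"orders": [{"id": 3, "amount": 4.25}]}'


def read_flat_file(text):
    """Parse CSV text (structured, tabular) into a list of typed dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "amount": float(row["amount"])} for row in reader]


def read_api_payload(text):
    """Parse a JSON payload (nested) into the same record shape."""
    payload = json.loads(text)
    return [{"id": int(o["id"]), "amount": float(o["amount"])} for o in payload["orders"]]


# Both sources end up as the same kind of record, ready for analysis.
records = read_flat_file(FLAT_FILE) + read_api_payload(API_PAYLOAD)
print(records)
```

Normalizing every source into one shape early keeps the later processing steps independent of where the data came from.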

Data Capturing Methods

  • Data capturing involves collecting and entering data from various sources into a system.
  • Methods include manual data entry (humans inputting data) and automated collection (sensors, IoT, APIs).
  • Other capture channels include surveys, forms, point-of-sale systems, and mobile data capture.
  • Data capture should be accurate, private, secure, and integrated.

Data Processing Steps

  • Data processing involves data collection, cleaning, transformation, storage, and analysis.
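The five steps can be sketched as a chain of small functions. This is an illustrative toy under stated assumptions: the readings, the Celsius-to-Fahrenheit conversion, and the in-memory list used as storage are all hypothetical choices, not part of the lesson.

```python
import statistics


def collect():
    # Hypothetical raw readings, e.g. from a sensor; None marks a missing value.
    return [" 21.5", "22.0", None, "bad", "23.5 "]


def clean(raw):
    """Drop missing or unparseable values and convert the rest to floats."""
    cleaned = []
    for value in raw:
        if value is None:
            continue
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            continue
    return cleaned


def transform(celsius):
    """Standardize units: convert Celsius readings to Fahrenheit."""
    return [round(c * 9 / 5 + 32, 1) for c in celsius]


def store(values, storage):
    """Append processed values to storage (a plain list stands in for a database)."""
    storage.extend(values)
    return storage


def analyze(storage):
    """Compute a simple summary statistic over the stored values."""
    return statistics.mean(storage)


storage = []
store(transform(clean(collect())), storage)
average = analyze(storage)
print(storage, average)
```

Each step takes the previous step's output as input, so a bad value is dropped at cleaning and never reaches storage or analysis.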

Data Presentation Methods

  • Data presentation means displaying data in a clear, engaging, and understandable way.
  • Aspects include visualization, clarity, context, interactivity, and storytelling.

Data Warehousing and Data Lakes

  • Data warehouses organize incoming data into a single schema before analysis. Analysis then runs on the curated data.
  • Data lakes store incoming data in its raw form; data is selected and organized based on needs.
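The contrast between the two approaches is often described as schema-on-write (warehouse) versus schema-on-read (lake). A minimal sketch, assuming a hypothetical two-column schema and made-up records:

```python
# Hypothetical incoming records with inconsistent shapes.
incoming = [
    {"user": "ana", "amount": "10.5"},
    {"user": "bo", "amount": "3", "note": "gift"},
    {"usr": "cy"},  # does not fit the warehouse schema
]

SCHEMA = {"user": str, "amount": float}  # assumed warehouse schema


def load_warehouse(records):
    """Schema-on-write: conform each record before storing; set misfits aside."""
    table, rejected = [], []
    for rec in records:
        try:
            table.append({col: cast(rec[col]) for col, cast in SCHEMA.items()})
        except (KeyError, ValueError):
            rejected.append(rec)
    return table, rejected


def load_lake(records):
    """Store every record exactly as received, with no shaping."""
    return list(records)


def query_lake(lake, field):
    """Schema-on-read: select and organize only what this analysis needs."""
    return [rec[field] for rec in lake if field in rec]


warehouse, rejected = load_warehouse(incoming)
lake = load_lake(incoming)
print(warehouse)
print(query_lake(lake, "note"))
```

The warehouse keeps only the columns its schema defines, so the `note` field is lost at load time, while the lake still holds it for any later analysis that turns out to need it.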

Data Infrastructure Models

  • On-Premise: User manages all components (networking, storage, servers, etc.).
  • SaaS (Software as a Service): Vendor manages all components; users access applications.

Big Data Architecture

  • Cloud-based architectures on platforms such as AWS, Azure, and GCP can be serverless: the cloud provider manages the underlying servers.

Data Treatment (Data Lake Approach)

  • Data Sources: Includes streaming, file, and relational data.
  • Data Lake Zones (Data Treatment 1):
    • Transient: Ingest, tag, catalog data.
    • Raw: Apply metadata, protect sensitive data.
    • Trusted: Data quality and validation.
    • Refined: Enrich data, automate workflows.
  • Consumer Systems: Data catalog, data prep tools, visualization, external connectors.
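The four data lake zones can be sketched as functions that a record passes through in order. Everything specific here is an assumption for illustration: the email masking rule, the non-negative `amount` check, the `amount_band` enrichment, and the `orders_file` source name are hypothetical.

```python
from datetime import datetime, timezone


def transient(record, source):
    """Transient zone: ingest and tag the record with its source and arrival time."""
    tags = {"source": source, "ingested_at": datetime.now(timezone.utc).isoformat()}
    return {"data": record, "tags": tags}


def raw(entry):
    """Raw zone: apply metadata and protect sensitive data (here, mask the email)."""
    data = dict(entry["data"])
    if "email" in data:
        name, _, domain = data["email"].partition("@")
        data["email"] = name[:1] + "***@" + domain
    return {**entry, "data": data, "metadata": {"fields": sorted(data)}}


def trusted(entry):
    """Trusted zone: validate quality; return the entry, or None if it fails."""
    amount = entry["data"].get("amount")
    if isinstance(amount, (int, float)) and amount >= 0:
        return entry
    return None


def refined(entry):
    """Refined zone: enrich the record with a derived field."""
    data = dict(entry["data"])
    data["amount_band"] = "high" if data["amount"] >= 100 else "low"
    return {**entry, "data": data}


def run_zones(records, source):
    """Move each record through the zones, dropping those that fail validation."""
    out = []
    for rec in records:
        entry = trusted(raw(transient(rec, source)))
        if entry is not None:
            out.append(refined(entry))
    return out


result = run_zones(
    [{"email": "ana@example.com", "amount": 120}, {"email": "bo@example.com", "amount": -5}],
    source="orders_file",
)
print(result)
```

Only the first record reaches the refined zone; the second fails validation in the trusted zone, which is the point of separating verified data from merely ingested data.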

Data Treatment (Data Warehouse Approach)

  • Data Sources: OLTP/ODS, enterprise data warehouse, logs, cloud services, streaming, file data.
  • Zones (Data Treatment 2):
    • Transient loading: Data ingested without transformation.
    • Raw data: Store unaltered data, tokenize.
    • Refined data: Integrate data consistently.
    • Trusted data: Reliable, high-quality data.
    • Discovery sandbox: Exploratory analysis, experimentation.
  • Consumers: Business analysts, data scientists.
  • Foundation Layers: Metadata, data quality, catalog, security.
  • Technologies: Data collection, ingestion, preparation, computation, and presentation.

Quiz Team

Description

Explore the essential concepts of data collection and processing methods in this quiz. Learn about various data sources, capturing techniques, and the importance of accuracy and privacy in data handling. Test your understanding of both structured and unstructured data systems.
