Data Warehouse characteristics and design
48 Questions
1 Views

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

According to the provided content, which of the following is NOT a characteristic of a data warehouse?

  • Subject-oriented organization
  • Historical data storage
  • Non-volatile data
  • Frequent updates to data (correct)

Query-driven approaches are always inferior to data warehouses for accessing rapidly changing information.

False (B)

In the context provided, what key problem does a data warehouse aim to solve regarding data?

data integration problem

Bill Inmon coined the term 'Data Warehouse' in the year ______.

<p>1990</p> Signup and view all the answers

Compared to standard databases, data warehouses are characterized by which of the following?

<p>Queries are long and complex (D)</p> Signup and view all the answers

What is the primary focus of a data warehouse user interface as mentioned in the content?

<p>Executive analysis and decision-making (B)</p> Signup and view all the answers

Which of the following is a definition of a data warehouse according to Ralph Kimball?

<p>A copy of transaction data specifically structured for query and analysis. (E)</p> Signup and view all the answers

Match the following data characteristics with the database type they best describe:

<p>Mostly updates, small transactions = Standard DB Mostly reads, complex queries = Warehouse $Gb - $Tb of data, history = Warehouse $Mb - $Gb of data, current snapshot = Standard DB</p> Signup and view all the answers

Which of the following best describes the primary goal of digital data management?

<p>To provide secure, efficient access to digital data for analysis and use. (B)</p> Signup and view all the answers

Continuous data can only be expressed in whole numbers, without fractions or decimals.

<p>False (B)</p> Signup and view all the answers

Name three key components of a digital data management system.

<p>Optimized storage, data security, and backup systems.</p> Signup and view all the answers

Data with distinct and separate values is known as ______ data.

<p>discrete</p> Signup and view all the answers

In data classification, what does 'stability' refer to?

<p>The consistency of data items within their assigned groups, unaffected by investigations. (D)</p> Signup and view all the answers

Match the data type with its appropriate statistical representation:

<p>Discrete Data = Bar charts Continuous Data = Line graphs</p> Signup and view all the answers

Which data warehouse architecture stores every data element only once?

<p>Single-Layer (B)</p> Signup and view all the answers

Which feature of data classification allows for changes in the classification based on different purposes?

<p>Elasticity (A)</p> Signup and view all the answers

Data security in digital data management focuses solely on preventing external threats and does not include measures for internal privacy.

<p>False (B)</p> Signup and view all the answers

Real-time data is considered the ultimate source of all business data.

<p>True (A)</p> Signup and view all the answers

What type of data is created from existing data sets to generate new insights?

<p>Derived data</p> Signup and view all the answers

Data that is produced and stored for its own intrinsic value is known as data as a ______.

<p>product</p> Signup and view all the answers

Match each data extraction method with its description:

<p>Replication tool = Duplicates data from a source to the warehouse. Dump file = Extracts data into a static file for loading into the warehouse. Create report = Generates a formatted report from the source data. ODBC Wrappers = Uses standard interfaces to extract data.</p> Signup and view all the answers

A multinational retailer analyzing global sales data would most likely use which data warehouse architecture?

<p>Three-Tier (D)</p> Signup and view all the answers

Industry has primarily focused on the process of getting information into the data warehouse rather than what to do with the data once it's there.

<p>False (B)</p> Signup and view all the answers

What is the term for metadata that describes how data is being used?

<p>Usage metadata</p> Signup and view all the answers

Which factor most significantly dictates the 'thickness' of a source-specific adapter (wrapper)?

<p>The data model used by the source and the interface it exposes. (B)</p> Signup and view all the answers

Hard coding a wrapper for each data source is the most scalable and maintainable solution for wrapper generation in large data integration projects.

<p>False (B)</p> Signup and view all the answers

Name three typical data transformations that might be performed by a wrapper during data integration.

<p>Convert data to uniform format; remove, add, and reorder attributes; add data to get history; sort tuples.</p> Signup and view all the answers

A key goal of a data monitor is to detect _________ of interest and propagate them to the integrator.

<p>changes</p> Signup and view all the answers

Match the following data integration actions with their descriptions:

<p>Resolve inconsistencies = Identify and correct conflicting data values from different sources. Eliminate duplicates = Remove redundant data entries to ensure data uniqueness. Integrate into warehouse = Incorporate transformed data into the central data repository. Summarize data = Aggregate data to provide higher-level insights and reduce data volume.</p> Signup and view all the answers

Which of the following scenarios typically requires more complex data integration techniques?

<p>Integrating data from unstructured sources with limited schema information. (A)</p> Signup and view all the answers

Digital technology stores and processes data using an infinite number of states.

<p>False (B)</p> Signup and view all the answers

Briefly explain the role of data cleansing within the context of a data warehouse.

<p>Ensuring data accuracy and reliability by identifying and correcting inconsistencies, errors, and missing values.</p> Signup and view all the answers

Which of the following is NOT a typical function of digital data platforms?

<p>Designing new marketing campaigns (A)</p> Signup and view all the answers

Digital data management is limited to use in marketing and advertising only.

<p>False (B)</p> Signup and view all the answers

Name three benefits of using a Digital Management Platform (DMP).

<p>Unifies data and breaks down silos, helps identify audiences and customers, and provides continuous results.</p> Signup and view all the answers

__________ modeling techniques are great features offered in some DMPs that let you discover and target new customer groups.

<p>lookalike</p> Signup and view all the answers

Match the following DMPs with their highlighted feature:

<p>Google Marketing Platform = Flexible scaling, improved data analytics, automation and connectivity Nielsen DMP = Seamless media-buying platform that forms a complete view of customer behavior Oracle BlueKai DMP = Unifying advertising data, building effective audiences, connecting across devices, activating digital campaigns, and analyzing audiences</p> Signup and view all the answers

What is the primary function of frequency capping, as offered by some DMPs like Nielsen?

<p>To limit the number of times an ad is shown to a unique user. (A)</p> Signup and view all the answers

DMPs are one-off reporting solutions, and do not enable long-term strategies.

<p>False (B)</p> Signup and view all the answers

Which of the following is a function highlighted by Oracle BlueKai DMP for its users?

<p>Building effective audiences (B)</p> Signup and view all the answers

Which of the following best describes the primary purpose of a data warehouse (DW)?

<p>To provide a large collection of organized and clean business data for decision-making. (B)</p> Signup and view all the answers

Data marts are typically broader in scope than data warehouses, encompassing data from across the entire organization.

<p>False (B)</p> Signup and view all the answers

Explain how the warehousing approach differs from querying data directly from operational systems in terms of workload impact and data currency.

<p>The warehousing approach involves copying and integrating data in advance, which reduces the load on operational systems and allows for complex queries without interfering with real-time operations. However, the data in the warehouse may not be the most current, as it is periodically updated.</p> Signup and view all the answers

Vertical fragmentation of informational systems, also known as vertical stove pipes, is a key problem in large enterprises that leads to difficulties in data management. This fragmentation is primarily driven by application-driven development of _________ systems.

<p>operational</p> Signup and view all the answers

Match the following data systems with their typical data characteristics:

<p>Operational Systems = Current, real-time data Data Warehouses = Historical, summarized data Data Lakes = Raw, structured and unstructured data</p> Signup and view all the answers

Which of the following is NOT an advantage of the warehousing approach?

<p>Guaranteed access to the most current information in real-time. (D)</p> Signup and view all the answers

A key goal of data warehousing is to provide uniform user interface and integrated data access for improved decision-making.

<p>True (A)</p> Signup and view all the answers

Describe the types of systems analysts and decision-makers interact with and the types of systems that operational staff primarily use.

<p>Analysts and decision-makers primarily use informational systems like BI tools and data warehouses for decision-making. Operational staff primarily use operational systems like POS, ERP, and inventory systems for day-to-day operations.</p> Signup and view all the answers

Flashcards

Data Lake

A location to dump all sorts of raw data, both structured and unstructured, in its native format.

Data Warehouse (DW)

A large, organized collection of cleaned business data used to assist organizations in making informed decisions.

Data Mart

A subset of a data warehouse, specific to a particular business domain, such as HR or Finance.

Operational Systems

Systems designed for day-to-day operations with current, real-time data and high-speed transactional processing.

Signup and view all the flashcards

Informational Systems

Systems used for decision-making and analysis, dealing with historical, summarized data and complex queries.

Signup and view all the flashcards

Unified Access to Data

Collecting and combining information to provide an integrated view and uniform interface, supporting data sharing.

Signup and view all the flashcards

Warehousing Approach

An approach where information is integrated in advance and stored for direct querying and analysis.

Signup and view all the flashcards

OLTP

Online Transaction Processing: High-speed transactional processing

Signup and view all the flashcards

Data Warehouse

Complete, consistent data store from various sources, understandable and usable for end users in a business context.

Signup and view all the flashcards

Data Warehouse Definition

Subject-oriented, integrated, time-variant, and non-volatile data collection supporting management's decision-making.

Signup and view all the flashcards

Subject-Oriented

Data organized by business subject (e.g., customer, product) rather than application.

Signup and view all the flashcards

Historical Data

Historical data is stored, and time attributes are crucial for analysis.

Signup and view all the flashcards

Single Repository

A single place for information, solving data integration problems.

Signup and view all the flashcards

Warehouse Queries

Long and complex queries that read large volumes of data, used especially by decision-makers.

Signup and view all the flashcards

Non-Volatile Data

Append-only updates that are infrequent

Signup and view all the flashcards

Query-Driven Approach

Extracting answers directly each time from the source instead of relying on pre-written notes.

Signup and view all the flashcards

Wrapper

Converts data and queries between different data models.

Signup and view all the flashcards

Source-Specific Adapter

Adapters specific to data sources, translating data and queries.

Signup and view all the flashcards

Data Transformations

Converting data to a uniform format, resolving inconsistencies like byte ordering.

Signup and view all the flashcards

Monitors

Detect changes of interest and propagate them to the integrator using triggers.

Signup and view all the flashcards

Data Integration

Receive data changes from multiple wrappers, resolve inconsistencies, and integrate into warehouse.

Signup and view all the flashcards

Data Cleansing

Find and remove duplicate tuples and detect inconsistent/wrong data.

Signup and view all the flashcards

Digital

Electronic technology processing data in two states: positive and non-positive.

Signup and view all the flashcards

Data Cleansing

Finding and correcting dirty data.

Signup and view all the flashcards

Business Data

Data that represents meaning within a business context.

Signup and view all the flashcards

Derived Data

Data created from existing datasets to provide new information.

Signup and view all the flashcards

Metadata

Data that describes other data, providing context and meaning.

Signup and view all the flashcards

Single-Layer Architecture

Architecture where every data element is stored only once.

Signup and view all the flashcards

Two-Layer Architecture

Comprises real-time and derived data layers, a common industry approach.

Signup and view all the flashcards

Three-Layer Architecture

Involves transforming real-time data to derived data in two steps.

Signup and view all the flashcards

Data Warehousing

The process of getting information into the data warehouse.

Signup and view all the flashcards

Issues in Data Warehousing

Challenges like warehouse design, extraction, integration, and maintenance.

Signup and view all the flashcards

Discrete Data

Data with distinct, separate values that can be counted.

Signup and view all the flashcards

Continuous Data

Data that can take any value within a range and can be measured.

Signup and view all the flashcards

Digital Data Management

Collecting, storing, and providing access to information for analysis and use.

Signup and view all the flashcards

Optimized storage

Expediting retrieval of digital data in computer memory, clouds or on-premises systems.

Signup and view all the flashcards

Data Security

Protecting digital assets from unauthorized use and maximizing privacy for sensitive data.

Signup and view all the flashcards

Backup Systems

Systems for digital data safety and to restore data in case of loss.

Signup and view all the flashcards

Data Retention Policies

Managing policies for how long data is kept, including archiving and deletion.

Signup and view all the flashcards

Data Transfer Tools

Moving data between systems, users, applications, analytics, and algorithms.

Signup and view all the flashcards

Digital Data Platform (DDP)

Securely and efficiently handle digital data, offering tools for fault resolution, resource allocation and performance optimization.

Signup and view all the flashcards

Fault Identification & Resolution

Tools to find and fix problems in the digital data platform or linked systems.

Signup and view all the flashcards

Resource Allocation

Assigning memory and storage space for digital data.

Signup and view all the flashcards

Performance Optimization

Improving system speed by optimizing responses to queries.

Signup and view all the flashcards

Data Management Platform (DMP)

A platform that houses digital data like customer data and campaign data.

Signup and view all the flashcards

Audience Extension

Finding new audience and customers using audience extension or lookalike modeling techniques.

Signup and view all the flashcards

Continuous Reporting

Platforms that enable long-term strategies through constant reporting, not one-off reporting.

Signup and view all the flashcards

Oracle BlueKai DMP

Consolidating advertising data, creating audiences, connecting across devices, activating campaigns, and analyzing audiences.

Signup and view all the flashcards

Study Notes

  • Data warehousing is the first topic

Data Lakes

  • These are locations used to store all types of raw data, whether structured or unstructured

Data Warehouse (DW)

  • A data warehouse involves is a organized and clean dataset
  • Data warehouses help organizations in making informed decisions

Data Marts

  • Data marts are subsets of data warehouses

  • They are business-domain specific, such as HR, Operations, and Finance

  • Heterogeneous information sources are a problem due to different interfaces, data representations, and potential for duplication and inconsistency

  • Data management in large enterprises faces issues like vertical fragmentation of informational systems

Systems Used in Organizations: Operational vs. Informational

  • Operational systems focus on day-to-day operations with current, real-time, high-speed transactional processing

  • Informational systems are for decision-making and analysis, using historical, summarized data with complex queries

  • Operational systems are typically used by operational staff, while analysts and decision-makers use informational systems

  • Examples of operational systems include POS, ERP, and inventory systems, which require immediate response times and highly structured, detailed data

  • Informational systems include BI tools and data warehouses, which are less time-sensitive and use aggregated, multidimensional data

  • The goal is unified acess to data through collection, combination, integrated views, uniform user interfaces, and sharing capabilities

The Warehousing Approach

  • Involves integrating information in advance and storing it for direct querying and analysis

Advantages of Warehousing

  • High query performance
  • Does not interfere with local processing at sources
  • Enables complex queries at the warehouse
  • Involves copying information to the warehouse
  • Allows modification, annotation, summarization, and restructuring of data
  • Provides security without auditing
  • Stores historical information
  • OLTP, or online transaction processing, occurs at information sources

Other Considerations

  • A query-driven approach is suitable for rapidly changing information and sources, truly vast amounts of data from numerous sources, and clients with unpredictable data needs

  • Barry Devlin from IBM says that a data warehouse is a complete store of data, obtained from a variety of sources, available to end-users in a business context

  • Bill Inmon coined the term "data warehouse" in 1990

  • Query-driven approaches directly collect answers from sources

  • The data warehouse data is subject-oriented, integrated, time-variant, and non-volatile, supporting management's decision-making process

  • A data warehouse is a repository containing cleaned, integrated, and reconciled data from various sources

Data Warehouse Characteristics

  • Involves a stored collection of diverse data

  • Solves data integration problems by acting as a single repository of information

  • Is subject-oriented and organized by subject (not application)

  • Intended for analysis and data mining

  • Features a user interface designed for executives

  • Involves large data volumes (GB, TB) that are non-volatile and historical

  • Includes the importance of time attributes

  • All transactions at places are included

  • Warehouse queries are long and complex that contains data that is summarized, historical and reconciled that are mainly reads

  • Standard databases focus on mostly updates, many small transactions, current snapshots and are raw

  • Data warehousing market insights project a global market size of $51.18 billion by 2028

  • Snowflake holds the largest market share in data warehousing with 3,174 domains

  • Approximately 2.5 quintillion bytes of data are created daily

Types of Data

  • Business data represents the meaning of collected data
  • Real-time data is the ultimate source of all business data
  • Reconciled data represents harmonized information
  • Derived data is information created from existing datasets
  • Metadata describes the meaning of the data

Metadata Types

  • Build-time
  • Control
  • Usage

Data as a product

  • Data has intrinsic meaning, it can also be produced for its own reasons
  • An example is the content of a textbook

Data Warehouse Architectures

  • Single-layer: every data element is stored once
  • Two-layer: a structure composed of of real-time plus derived data
  • Three-layer: transforming data from real-time to derived forms

Architecture Comparison

  • Single-Tier: Small retail tracking with sales in excel

  • Two-Tier: Marketing agency that analyzes campaign data through moderate analytics

  • Three-Tier: Multinational retailer that analyzes global sales data with advanced analytics

  • Data warehousing distinct issues contain topics on how to get information into a warehouse and what to do with the data once it is in the warehouse

Data Warehouse Issues

  • Warehouse design
  • Extraction
  • Monitoring
  • Integration
  • Warehousing specification and Maintenance
  • Optimizations

Data Extraction

  • Deals with various source types like relational databases and flat files
  • Tools include replication tools, dump files, created reports, and wrappers

First-Issue Considerations

  • It must be determined if warehouse uses relational or multi-dimensional data model must be used to store data
  • Source-type include legacy, relational, and hierarchical structured data
  • Flat files can be semistructured

How to maintain the Data Warehouse

  • A warehouse must be kept current with data source changes, and also detect updates in the sources.
  • A wrapper converts data and queries from one data model to another and extends query capabilities for sources with limited capabilities

Wrapper Generation Types

  • Hard coding for each source

  • Automatic wrapper generation

  • Source-specific adapters translates different sources of data by use of interfaces

  • Cooperation between cooperative and uncooperative sources will effect the way data is translated

Data Management (Standard v Non Standard Interactions)

  • Standard interactions has many tools and standard sources such as commercial DBMSs, or ODBC-compliant systems.
  • The data is transferred through extraction or replication
  • Non-standard interactions requires semi structured sources with only small schemas which requires the use of data transformations for translation

Data Transformation

  • Format uniformity
  • Byte ordering
  • String termination
  • Internal layout

Monitors

  • Monitors act as detectors for changes of interest
  • It propagates integration through triggers and comparing data

Data integration

  • A process of receiving data (changes) from multiple wrappers/monitors
  • It resolves inconsistencies, eliminates duplicates, and integrates with any existing data
  • Fetch more data from sources

Data Cleansing

  • Remove duplicate tuples and detect incorrect data

Topic 2: Digital Data Warehousing

  • Digital technology generates, stores, and processes data with positive and non-positive states
  • Digital can provide value by executing user experience or building foundational cababilities

Data

  • Data is information translated for efficient movement or processing, especially into binary digital form

  • Raw data is data in its more basic digital format

  • Digital data is numeric codes (0,1) stored in computer systems and software

  • Systems convert this to human-readable information

Digital Data Management

  • Activities involves digital data collection, storing, and providing acess to information
  • Goal: Provide secure user access
  • Key components include optimized storage, security, backups, data retention policies, and tools for data transfer
  • The organization of data provides easy acess to the information for data

Data Classification Features

  • Homogeneity
  • Clarity
  • Stability

Data Types

  • Discrete: Data with separate values
  • Nature: Countable values
  • Values: Fixed, specific values
  • Precision: Finite amounts
  • Statistics: Bar charts and Pie charts

Digital Data Management Classifications

  • 80%: Unstructured Data
  • 10%: Semi-structured Data
  • 10%: Structured Data

How to Manage Data

  • Unstructured: Data that doesnt conform with common formats
  • Structured: Use graph-based data models and schemas for better readability
  • Schema-less

Digital Data Platforms

  • Digital data platforms helps retrieve large volumes of data
  • It provides secure and efficient functions that can allocate memory and storage resources
  • DMPs help increase data security and enhance system performances

DMP (Data-Management Platform) Benefits

  • Brings multiple data resources into one platform to make data analyzation more cohesive
  • Helps find specific audiances and customers

Examples of DMPs

  • Google Marketing Platform
  • Nielsen DMP
  • Oracle BlueKai DMP

Managing Data

  • Creates acess through all data tiers that store private data

  • Acesses data through multiple clouds while providing high availability data

  • Archives data with schedule retention

  • A database model contains constraints to manage data acess

  • Defines a structure that will support the analytical needs for analyzation

Data Model Types

  • Relational
  • Hierarchical
  • Network
  • Object-oriented database
  • Object-relational
  • Hierarchical
  • Network

Relational Model

  • Data model sorts data into tables or relation which contains columns and rows

Hierarchical Model

  • Organizes data into tree like data where each record is linked to a root

Network Data Model

  • Builds linked records onto sets based on mathamatical set theory

Object Model

  • Object-oriented model contains reasuable software elements with their features
  • Its also known as a multimedia database

Object Relational Model

  • Combines the simpliticy of advance functionalities from common models with a table structure that can have different language configurations

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

Data Warehousing Tutorial PDF

Description

Explore data warehouse characteristics, design principles, and key functionalities. Understand the advantages and disadvantages of using data warehouses. Learn about components of digital data management systems.

More Like This

Use Quizgecko on...
Browser
Browser