Data Warehousing Flashcards
84 Questions
100 Views

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What does dealing with changing dimension attributes refer to?

Dealing with changing dimension attributes involves managing the changes in characteristics of dimension data in databases.

What 3 types are there to change the dimension? (Select all that apply)

  • Adding a new attribute/column (correct)
  • Adding a new dimension record/row (correct)
  • Overwrite the dimension attribute (correct)
  • Deleting a dimension record

Explain the Type-1 dimension changing.

Overwriting the dimension attribute; it's easy to maintain and is often used for processing corrections.

What are the pros and cons of Type-1 dimension changing? (Select all that apply)

<p>You lose historic information (B), Doesn't create large tables (C), Easy to maintain records (D)</p> Signup and view all the answers

Explain the Type-2 dimension changing.

<p>Adding a new dimension record/row to capture attribute changes; both prior and new rows are maintained.</p> Signup and view all the answers

What are the pros and cons of Type-2 dimension changing? (Select all that apply)

<p>Expensive database option due to size (B), Creates large dimension tables (C), All history of dimension changes is kept (D)</p> Signup and view all the answers

Explain the Type-3 dimension changing.

<p>Adding a new attribute/column while preserving historical information without adding a new row.</p> Signup and view all the answers

What are the pros and cons of Type-3 dimension changing? (Select all that apply)

<p>Preserves historical values (A), Limited to the number of columns (B)</p> Signup and view all the answers

What is the rapidly changing large dimension problem?

<p>It occurs when there is an extremely large dimension with attributes that are constantly changing.</p> Signup and view all the answers

How could we apply a Type-2 change in the context of rapidly changing dimension tables?

<p>By using mini-dimensions.</p> Signup and view all the answers

What are mini-dimensions?

<p>Mini-dimensions contain rapidly changing attributes of the original dimension and are treated as standalone dimensions.</p> Signup and view all the answers

Which entities might have rapidly changing large dimensions? (Select all that apply)

<p>Large retail stores (B), Insurance companies (C), Government agencies (D)</p> Signup and view all the answers

The logical data model is usually structured as a _____. Why?

<p>Star Schema</p> Signup and view all the answers

What is a fact table?

<p>A fact table is the <em>central table in a star schema of a data warehouse</em>. It stores quantitative information for analysis and is often denormalized.</p> Signup and view all the answers

A fact table holds...... A dimension table stores.....

<p>the data to be analyzed; data about the ways in which the data in the fact table can be analyzed.</p> Signup and view all the answers

The fact table consists of ____ types of columns. Name and describe them.

<ol> <li>The <em>foreign key column</em> to allow joins with the other dimensional tables in the STAR Schema, 2. The <em>measures column</em> that contains the data being analyzed.</li> </ol> Signup and view all the answers

What is an issue with performing relational queries using the star schema?

<p>Access time</p> Signup and view all the answers

Define Measures.

<p>Measures are the core of the dimensional model and are data elements that can be summed, averaged, or mathematically manipulated.</p> Signup and view all the answers

What are the constructs of a STAR Schema?

<p>Fact Node, Dimension Node, Dimension Edge.</p> Signup and view all the answers

Define Dimensions.

<p>They're the set of companion tables to a fact table that contain measures.</p> Signup and view all the answers

Define facts.

<p>A fact is a value, or measurement, which represents a fact about the managed entity or system.</p> Signup and view all the answers

Define Granularity.

<p>Granularity means 'Level of Division'. It indicates how detailed data can be in terms of attributes.</p> Signup and view all the answers

What is an OLAP Cube?

<p>Online Analytical Processing Engine, a three-dimensional array for analyzing data.</p> Signup and view all the answers

What is an issue with the OLAP Cube?

<p>Data Sparsity (too many empty cells)</p> Signup and view all the answers

A ____________ _____________ is a good relational DB for building a ______________ ____________________.

<p>Star Schema; OLAP Cube.</p> Signup and view all the answers

OLAP cubes offer _____.

<p>sophisticated security options.</p> Signup and view all the answers

What are key decisions made during the design of a dimensional model?

<p>Select the 'business' process, declare the granularity, identify the measures (facts), identify the dimensions.</p> Signup and view all the answers

What are Additive Measures?

<p>Measures that can be added along all dimensions.</p> Signup and view all the answers

What are Semi-Additive Measures?

<p>Measures that can be added only along some dimensions.</p> Signup and view all the answers

What are Non-additive measures?

<p>Measures that can't be added along any dimension.</p> Signup and view all the answers

What 2 types of fact nodes/tables are there?

<ol> <li>Transactional, 2. Snapshot.</li> </ol> Signup and view all the answers

What is a Transactional Fact Node?

<p>Measures in the fact table represent an event that occurred at a point in time.</p> Signup and view all the answers

What is a Snapshot Fact Node?

<p>Measures in the fact node represent a 'picture' of the activity at the end of a given period.</p> Signup and view all the answers

Snapshots usually only have....

<p>One fact table row per time period (at the occurrence of each snapshot).</p> Signup and view all the answers

A _____________ fact node represents events occurring at a point in time.

<p>transactional.</p> Signup and view all the answers

Does a transactional or snapshot fact node have more dimensions?

<p>Transactional.</p> Signup and view all the answers

Does a transactional or snapshot fact node have more measures?

<p>Snapshot.</p> Signup and view all the answers

What is a Surrogate Key? Why are they important?

<p>System generated key values; important because all data warehouse keys should represent surrogate keys.</p> Signup and view all the answers

Smaller surrogate keys translate into.....

<p>smaller fact table rows.</p> Signup and view all the answers

Databases adopt an entity relationship diagram model, while data warehouses adopt a.....

<p>multidimensional data model.</p> Signup and view all the answers

What is a multidimensional data model?

<p>A multidimensional model views data in the form of a data cube, defined by dimensions and facts.</p> Signup and view all the answers

Explain Online Analytical Processing (OLAP).

<p>OLAP pre-calculates queries that are hard to execute through tabular databases, improving efficiency.</p> Signup and view all the answers

Who has access to databases vs. data warehouses?

<p>Data warehouses are accessed by decision makers and data analysts; databases are typically accessed by non-management employees.</p> Signup and view all the answers

Data warehouses provide..... Databases provide....

<p>Information to support decision making; information to support day-to-day operations.</p> Signup and view all the answers

Data warehouses manage _____________ data. Databases manage ______________ data.

<p>historical; current.</p> Signup and view all the answers

Which contains different levels of granularity?

<p>Data warehouses (A)</p> Signup and view all the answers

While data warehouses support complex transactions, databases can only do....

<p>Short and canned transactions.</p> Signup and view all the answers

Transactions for data warehouses are:

<p>read-only (A)</p> Signup and view all the answers

What are Senior Business Management Sponsor(s)? Why are they important?

<p>Someone who funds the data warehouse (B)</p> Signup and view all the answers

What is Compelling Business Motivation?

<p>It ensures the data warehouse systems align with strategic business motivations and initiatives.</p> Signup and view all the answers

Define feasibility.

<p>It is the availability of clean data, emphasizing on <em>clean</em> data!</p> Signup and view all the answers

What's a good economical justification for developing a data warehouse?

<p>Identifying costs and return on investment for the implementation of the data warehouse.</p> Signup and view all the answers

When end-users are using the data warehouse, what is the main challenge? What's a potential solution?

<p>For end-users to define what information they want from the warehouse and how to use it; focus on 'business' dimensions used for decision making.</p> Signup and view all the answers

What should be the first step when developing a data warehouse?

<p>Figure out what information the end-users want to collect from the DW; focus on the dimensions and measures.</p> Signup and view all the answers

What are the 4 Steps to the Dimensional Modeling Design Process?

<ol> <li>Select business process, 2. Declare granularity, 3. Identify dimensions, 4. Identify facts.</li> </ol> Signup and view all the answers

Explain 1. Select business process.

<p>These are the operational activities performed by your organization. This is the measurement event to be modeled.</p> Signup and view all the answers

Explain 2. Declare granularity.

<p>How many data attributes an entity has; examples include high and low levels of granularity.</p> Signup and view all the answers

Explain 3. Identify Dimensions.

<p>Provide the 'who, what, where, why, and how' context and what hierarchies the foreign keys in your fact table lead to.</p> Signup and view all the answers

Explain 4. Identify Facts.

<p>A <em>fact</em> is literally an entry in the fact table—quantitative measurements that result from your 'business process' occurring.</p> Signup and view all the answers

What question should you ask to determine if an entry is a fact or not?

<p>'Does the attribute take on lots of values, and is it used in calculations?' If yes... FACT.</p> Signup and view all the answers

What are dimensions?

<p>They simply provide <em>attributes</em> that provide more data about a fact in the fact table.</p> Signup and view all the answers

What are the 2 main purposes of dimensions?

<ol> <li>Constrain querying, 2. Filter the query result set.</li> </ol> Signup and view all the answers

Define understandability regarding dimension modeling.

<p>Keeping everything as simple as possible, but not watered down.</p> Signup and view all the answers

What's the effect on query performance when adding dimensions?

<p>No adverse effect—DWs are typically optimized for this kind of design paradigm.</p> Signup and view all the answers

How should a dimension modeling/design be extensible?

<p>It should easily allow for the integration of new data.</p> Signup and view all the answers

What do we use if no property of the fact can fulfill the purpose of the primary key?

<p>A <em>surrogate key</em> - a unique key artificially assigned to identify the row.</p> Signup and view all the answers

Although surrogate keys seem like they're replacements for primary keys, all DWs should....

<p>represent surrogate keys.</p> Signup and view all the answers

What is Normalization? Why do it?

<p>The process of removing redundant attributes from a de-normalized dimension table; it simplifies updates and saves storage space.</p> Signup and view all the answers

What is snowflaking?

<p>A method of normalizing the dimension tables in a star schema.</p> Signup and view all the answers

What are disadvantages of normalization/snowflaking?

<p>The scheme is less intuitive, browsing is more difficult, and additional joins may degrade query performance.</p> Signup and view all the answers

What are the 3 Measure Types?

<ol> <li>Additive, 2. Semi-additive, 3. Non-additive.</li> </ol> Signup and view all the answers

Describe Additive Measures.

<p>Facts that can be added with any dimension in the fact table.</p> Signup and view all the answers

Describe Semi-Additive Measures.

<p>Facts that can be summed for some dimensions in the fact table.</p> Signup and view all the answers

Describe Non-Additive Measures.

<p>Facts that you can never sum, regardless of dimensions present.</p> Signup and view all the answers

What is a Periodic/Snapshot Fact Table?

<p>A row in a periodic snapshot fact table summarizes many measurement events occurring over a standard period.</p> Signup and view all the answers

Measures in the fact table represent an event that..... Measures in the fact node represent.....

<p>occurred at a point in time; a 'picture' of the activity at the end of a given period.</p> Signup and view all the answers

Periodic fact tables allow one to measure the __________ of an organization.

<p>status.</p> Signup and view all the answers

In a PFT, here is _______ fact table row per ________ __________.

<p>one; time period.</p> Signup and view all the answers

In a TFT, there are more ________________, but less _______________ than a PFT. Why?

<p>dimensions; measures; because more data is collected in TFTs.</p> Signup and view all the answers

What is a Factless Fact Table?

<p>A factless fact table is a fact table that does not contain facts. It captures events at the information level.</p> Signup and view all the answers

What is an example of a Factless Fact Table being needed?

<p>Factless fact tables are used for tracking processes like attendance or events.</p> Signup and view all the answers

How do you determine events that did not occur?

<p>Coverage - activity = events which did not occur.</p> Signup and view all the answers

SURROGATE KEYS SHOULD BE -__ INTEGERS.

<p>Auto-Incremented.</p> Signup and view all the answers

How do surrogate keys increase performance?

<p>Surrogate keys are simple integers, allowing for more efficient joins and smaller indices.</p> Signup and view all the answers

Study Notes

Data Warehousing Concepts

  • The logical data model structure commonly uses a Star Schema due to its user-friendly navigation and better performance in relational queries.
  • A fact table serves as the central table in a data warehouse, containing quantitative information for analysis, often stored in a denormalized format.
  • Fact tables hold the data to be analyzed while dimension tables provide context about how the data can be analyzed.
  • Fact tables consist of two types of columns:
    • Foreign key columns for joining with dimension tables.
    • Measures columns containing data to be analyzed.
  • Access time can be an issue when performing relational queries using star schema structures.
  • Measures are core elements in the dimensional model, representing quantitative values, and can be summed, averaged, or manipulated mathematically.

Star Schema Constructs

  • A Star Schema includes:
    • Fact Node: Central table with measures.
    • Dimension Node: Dependent tables that provide context.
    • Dimension Edge: Relationships between dimension tables and facts.
  • Dimensions are companion tables containing characteristics that describe facts.
  • Facts are specific measurements about the entity being managed, e.g., financial metrics like totals or success rates.

Granularity and OLAP

  • Granularity refers to the level of division of an entity into attributes, impacting data detail.
  • An OLAP Cube is a three-dimensional array that allows for complex data analysis.
  • Data sparsity is a common issue with OLAP cubes, often resulting in many empty cells.

Measure Types

  • Three types of measures:
    • Additive: Can be summed across all dimensions (e.g., total sales).
    • Semi-Additive: Can be summed in some dimensions only (e.g., account balances).
    • Non-additive: Cannot be summed at all (e.g., percentages).

Fact Tables

  • Two types of fact nodes:
    • Transactional Fact Nodes: Represent events at a specific time (e.g., retail sales).
    • Snapshot Fact Nodes: Represent data snapshots at the end of a period (e.g., monthly account balances).
  • A periodic/snapshot fact table summarizes measurement events over standard time periods, whereas transactional tables capture every single transaction event.

Data Warehouse vs. Database

  • Databases focus on current data and support operational transactions, while data warehouses manage historical data and offer insights for decision-making.
  • Data warehouses often contain multiple granularity levels compared to databases.

Project Planning and Design

  • Senior Business Management Sponsors are essential for funding and utilizing the data warehouse.
  • Compelling Business Motivation aligns the data warehouse with strategic goals.
  • Feasibility emphasizes the importance of clean data for operational success.
  • Developing a data warehouse involves understanding user requirements and focusing on dimensions and measures.

Dimensional Modeling

  • The four-step design process for dimensional modeling:
    • Select business processes.
    • Declare the granularity of data attributes.
    • Identify dimensions providing context.
    • Identify facts representing quantitative data.

Change Management in Dimensions

  • Types of dimension changes:
    • Type-1: Overwrites existing dimensions (loses historical data).
    • Type-2: Adds a new record for changes (keeps historical data but uses more space).
    • Type-3: Adds new attributes/columns to keep current and previous values while minimizing new rows.

Surrogate Keys

  • Surrogate keys are system-generated unique keys beneficial for identifying rows in fact tables where no single primary key is available. They enhance performance through efficient indexing and joining.
  • Dealing with attributes in data dimensions often requires consideration of whether to overwrite, add, or change attributes to maintain data integrity and historical context.### Rapidly Changing Large Dimension Problem
  • Involves extremely large dimensions where attributes undergo constant change.
  • Affects maintenance and performance as dimensions grow, especially when using Type-2 changes to track historical data.
  • Relevant examples include:
    • Government agencies managing dimensions with over 100 million people.
    • Large retail stores with several million product entries.
    • Insurance companies maintaining millions of automobile records.

Type-2 Change Application

  • Type-2 changes involve adding new records to capture historical data.
  • Implemented through the use of mini-dimensions to manage attributes that frequently change.

Mini-Dimensions

  • Mini-dimensions isolate rapidly changing attributes of a larger dimension.
  • These mini-dimensions function as stand-alone entities and connect directly to fact tables without snowflaking, optimizing database structure.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Description

Test your knowledge of data warehousing concepts with these flashcards. Covering key elements such as star schema and fact tables, this quiz helps reinforce your understanding of data models and their structures. Ideal for students and professionals in data management.

More Like This

Star Schema in Data Warehousing
5 questions

Star Schema in Data Warehousing

AffirmativeHippopotamus avatar
AffirmativeHippopotamus
Star Schema vs Galaxy Schema Quiz
10 questions
Data Warehousing and OLAP Fundamentals
37 questions
Use Quizgecko on...
Browser
Browser