Data Warehouse: Facts and Dimensions

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson
Download our mobile app to listen on the go
Get App

Questions and Answers

In dimensional modeling, what is the primary purpose of modeling a business process?

  • To support the analysis of a business operation. (correct)
  • To replace operational systems.
  • To create data silos within the organization.
  • To complicate the analysis of business operations.

Within the context of dimensional modeling, what role do facts play?

  • They define the relationships between dimensions.
  • They act as filters for dimensions.
  • They serve as measurements of the business process. (correct)
  • They provide the context for measurements.

What is the role of dimensions in the context of data warehousing?

  • To store the numerical measurements of a business process.
  • To aggregate data for faster querying.
  • To provide context to the facts in a data warehouse. (correct)
  • To define the security protocols for data access.

Which of the following best describes the 'grain' of a fact table?

<p>The level of detail represented in each row of the fact table. (A)</p> Signup and view all the answers

When identifying facts and dimensions in a spoken statement, which word often indicates the presence of a dimension?

<p>By (D)</p> Signup and view all the answers

In the context of data warehousing, what is the significance of identifying facts and dimensions in reports?

<p>It helps in understanding how data would be stored and used in a database. (B)</p> Signup and view all the answers

In a dimensional model, what are dimensions used for in reports?

<p>Specifying groupings or 'break levels' in the report. (A)</p> Signup and view all the answers

In the context of dimensional modelling, what are 'filters' most closely associated with?

<p>Dimensions (B)</p> Signup and view all the answers

In the context of dimensional modeling, what is meant by 'grouping dimensions and facts'?

<p>Organizing facts and dimensions in a database to share relationships. (B)</p> Signup and view all the answers

What is a key characteristic of facts that are grouped together in a dimensional model?

<p>They are at the same level of detail. (B)</p> Signup and view all the answers

What is the primary purpose of surrogate keys in dimension tables within a star schema?

<p>To enable the analytic system to respond to changes in operational systems. (B)</p> Signup and view all the answers

What is the role of the fact table in a star schema?

<p>To act as the engine for business process measurement. (D)</p> Signup and view all the answers

In a star schema, what do surrogate keys in the fact table provide?

<p>Links to related dimensions. (B)</p> Signup and view all the answers

When querying a star schema, what is the purpose of 'browsing dimensions'?

<p>To explore the details within a dimension. (D)</p> Signup and view all the answers

What is the primary purpose of surrogate keys in a fact table when querying a star schema?

<p>To join fact tables with dimension tables. (A)</p> Signup and view all the answers

When querying a fact table in a star schema, which of the following elements control the sorting of results?

<p>Dimension values (D)</p> Signup and view all the answers

How are nonadditive facts typically computed in a dimensional model?

<p>As the ratio of additive facts. (B)</p> Signup and view all the answers

What does it mean for a fact to be 'additive' in a data warehouse?

<p>It can be summed across any and all of the dimensions. (B)</p> Signup and view all the answers

What is a 'degenerate dimension' in the context of dimensional modeling?

<p>A dimension stored directly in the fact table. (D)</p> Signup and view all the answers

What is the main purpose of implementing Slowly Changing Dimensions (SCD) in a data warehouse?

<p>To track and manage changes to dimension data over time. (A)</p> Signup and view all the answers

In Slowly Changing Dimensions (SCD), what does a Type 0 dimension signify?

<p>The dimension values never change and no action is required. (A)</p> Signup and view all the answers

In the context of Slowly Changing Dimensions (SCD), what action is taken in Type 1 SCDs when a dimension value changes?

<p>The dimension is simply overwritten with the new value. (D)</p> Signup and view all the answers

What is the main characteristic of Type 2 Slowly Changing Dimensions (SCD)?

<p>They create a new record in the dimension table for each change. (B)</p> Signup and view all the answers

What is the purpose of documenting slow change rules for a dimension table?

<p>To ensure that the appropriate slow change response is chosen for each dimension attribute. (A)</p> Signup and view all the answers

In a star schema, what type of table is always on the "one" side of a one-to-many relationship?

<p>Dimension table (B)</p> Signup and view all the answers

Why is it important to distinguish between additive, non-additive and semi-additive facts in dimensional modelling?

<p>To ensure that aggregations are calculated correctly according to the properties of each fact. (C)</p> Signup and view all the answers

In the context of Slowly Changing Dimensions (SCD), what is the most common downside of choosing type 1?

<p>No history is kept. (C)</p> Signup and view all the answers

If a data warehouse designer is uncertain about how dimension values will change over time, what type of SCD response is generally considered the safest approach?

<p>Type 2 (D)</p> Signup and view all the answers

Which of the following is LEAST likely to be a benefit of using surrogate keys?

<p>Increased storage for the dimension table. (C)</p> Signup and view all the answers

How does dimensional modeling contribute to the effectiveness of decision support systems?

<p>By denormalizing data in a way that aligns with how users analyze data. (D)</p> Signup and view all the answers

When a source system captures the reason for a change in dimension data, how might this impact the choice of SCD Type?

<p>A single attribute may drive either type response. (D)</p> Signup and view all the answers

Which of the following is true when choosing measures for dimensional modelling?

<p>Facts should support business questions. (D)</p> Signup and view all the answers

In a dimensional data warehouse, you need to track changes in customer addresses. You want to maintain a history of all addresses for each customer, and you need to be able to accurately report on past sales based on the customer's address at the time of the sale. Which SCD type is most appropriate?

<p>Type 2 (A)</p> Signup and view all the answers

A large retail company wants to efficiently analyze its sales data across various dimensions such as product category, region, and time. The data warehouse team is considering different dimensional models. Which of the following is a key consideration when grouping dimensions to create a performant model for analysis?

<p>Organizing dimensions so they can share relationships independent of facts. (D)</p> Signup and view all the answers

A data warehousing team is designing a star schema for a sales analysis system. The team needs to decide on the granularity of the fact table. Which of the following considerations is MOST important when determining the grain of the fact table?

<p>Balancing the level of detail with the ability to answer key business questions. (A)</p> Signup and view all the answers

An organization uses a Type 2 SCD to track changes in customer demographics. A customer moves from California to Texas. In addition to inserting a new record for the customer with the updated address, what else needs to be done?

<p>The old record must have its <code>end_date</code> updated to indicate when was the <em>last</em> day that it was current. (B)</p> Signup and view all the answers

A company is designing a data warehouse for sales analysis. The sales team needs to analyze sales by product, customer, and region. Which of the following is the MOST important consideration when determining the data types for columns in the fact table?

<p>Selecting data types that can accurately represent the data and support aggregations. (A)</p> Signup and view all the answers

A company implements a dimensional model for its sales data, but discovers that query performance is poor due to the size of the fact table. Which of the following strategies would reduce the size of the fact table?

<p>Decreasing the level of granularity of the fact table. (B)</p> Signup and view all the answers

You are designing a Type 2 SCD for customer data in a data warehouse. The customer dimension table includes attributes such as customer_id, name, address, start_date, and end_date. How would you query the customer name and address relevant on January 1, 2024?

<p><code>SELECT name, address FROM customer WHERE customer_id = 123 AND start_date &lt;= '2024-01-01' AND (end_date &gt;= '2024-01-01' OR end_date IS NULL)</code> (C)</p> Signup and view all the answers

In a dimensional model, a fact table contains order_date_key, product_key, customer_key, sales_amount, and cost_amount. Which SQL statement to determine the total sales amount and cost amount for each product in January 2024?

<pre><code class="language-sql">SELECT p.product_name, SUM(ft.sales_amount), SUM(ft.cost_amount) FROM fact_table ft JOIN product_dim p ON ft.product_key = p.product_key WHERE ft.order_date_key BETWEEN '2024-01-01' AND '2024-01-31' GROUP BY p.product_name; ``` (C) </code></pre> Signup and view all the answers

Flashcards

What supports business process analysis?

Analysis of a business process using modeling.

Facts vs. Dimensions

Measurements are called facts, context descriptors are dimensions.

What typically supports business processes?

Operational system such as billing/purchasing system

Declare the Grain

One row per scan of an individual product on a customer's sales transaction, One row per line item on a bill from a doctor or One row individual boarding pass scnanned at an airport gate.

Signup and view all the flashcards

What does a dimentional design do?

Facts connected by dimensions in a database.

Signup and view all the flashcards

Dimensions in Reports

Filters define these elements.

Signup and view all the flashcards

Dimensional Design

Designed to organize facts and dimensions in a database; share relationships across dimensions independent of facts.

Signup and view all the flashcards

Star Schema

Product, Salesperson, Day, Customer and Facts.

Signup and view all the flashcards

Fact Table

The facts are accompanied by these, which provide the dimensional context for each measurement

Signup and view all the flashcards

What does a fact table contain?

At core of star schema, engine for business process measurement, stores detailed measurements.

Signup and view all the flashcards

What is browsing dimensions?

A querying technique over dimensions, browsing the details within the dimension.

Signup and view all the flashcards

Querying a Fact Table

Used to filter results, specify aggregation scope and control sorting.

Signup and view all the flashcards

Additive Fact

Facts at one level but expressed at many summary levels.

Signup and view all the flashcards

Nonadditive Facts

Calculated from ratio of additive facts.

Signup and view all the flashcards

Degenerate dimensions

Can't sort all dimensions into neat tables, so store dimensions in the fact table.

Signup and view all the flashcards

Slowly Changing Dimensions (SCD)

Capturing and representing source data changes in dimension tables.

Signup and view all the flashcards

SCD Type 0

Dimension values never change.

Signup and view all the flashcards

SCD Type 1

Overwrites the dimension when the source value changes.

Signup and view all the flashcards

SCD Type 2

Create a new record into the dimension table when the source data changes.

Signup and view all the flashcards

How to document changes?

For each dimension attribute, choose the right slow change response

Signup and view all the flashcards

Cardinality of a Star Schema

One-to-many/inverse. One side is always the dimension table, many side is the fact table.

Signup and view all the flashcards

Dimensional Modeling

A design approach optimized for analytical systems.

Signup and view all the flashcards

Star Schema Dimension Tables

Surrogate keys manage change

Signup and view all the flashcards

Study Notes

  • Dimensional design concerns stars and cubes.

Modeling the Data Warehouse

  • Analysis of a business process is supported by modeling.
  • Measurement and context form two simple concepts.
  • Measurements are called "Facts".
  • Context descriptors are called "Dimensions."

Dimensional Design Process

  • Business processes are typically supported by an operational system like billing/purchasing.
  • The grain should be declared.
  • Focus on identifying dimensions and facts.

Identifying Facts & Dimensions

  • The word "by" is almost always followed by a dimension.
  • The word "for" is a good indicator of a dimension, where the next word is likely an instance value of said dimension.
  • Facts are usually numeric in value.

Facts & Dimensions in Reports

  • Facts and dimensions can be recognized, by how they would be used, if stored in a database and shown on a report.
  • Dimensions serve as "filters" or "query predicates".
  • Filters imply dimensions, such as region, month and year.
  • Dimensions specify groupings or "break levels" to identify levels of sub-totals.
  • Facts can be found in queries or reports based on their use.
  • Elements aggregated, summarized, or subtotaled are facts.

Grouping Dimensions and Facts

  • Dimensional designs organize facts and dimensions for storage in a database.
  • Dimensions can share relationships independent of facts.
  • Dimensions may be grouped together.
  • Facts available at the same level of detail are grouped together.
  • 1..n means that "This is very important"

Fact Table

  • It lies at the core of a star schema
  • It acts as the engine for business process measurement
  • The facts are accompanied by foreign keys.
  • Fact table includes surrogate keys referring to associated dimension tables.

Querying a Fact Table

  • Dimension values provide context.
  • Dimensions specify the scope of SUM() aggregation.
  • The fact is aggregated.
  • Each row summarizes order lines in the fact table.

Additive & Nonadditive Facts

  • A fact table stores facts at a specific level of detail.
  • Facts can be expressed at various levels of summarization.
  • Facts can be combined with dimensions to answer business questions.
  • Some facts are fully additive and may be summed across dimensions.
  • Not all measurements exhibit additivity.
  • Rates or percentages are not additive.

Degenerate Dimensions

  • Sometimes, all the dimensions cannot be sorted into a neat set of tables.
  • In such cases, one or more dimensions may be in the fact table.
  • A dimension column, when in the fact table, becomes a dimensions.
  • The dimension's values can filter, control aggregation, order data, and define master-detail relationships.

Slowly Changing Dimensions (SCD)

  • Dimension tables get their data from operational systems.
  • In dimensional data warehouses or stand-alone data marts, it is gotten directly from the operational system.
  • Dimensions may change in the operational source.
  • It is crucial to identify how changes in source data transform into dimension tables.
  • The phenomenon is referred to as slowly changing dimensions (SCD).

Types of SCD’s

  • Type 0: Dimension values never change and nothing needs to be done.
  • Type 1: The dimension is overwritten with the new value to preserve the history in the star schema.
  • Type 2: A new record/row is inserted into the dimension table.
  • All previously existing records are kept.
  • Type 3: The affected field is updated with the new data value (overwrite), and a new column is added containing the old value.

Dimension tables with Slow change rules

  • For documentation purposes: To document the slow changes rules for a dimension table is a good practice.
  • For each dimension attribute, choose and document the appropriate slow change response.
  • If uncertain, the type 2 response is the safest.
  • When a source system captures the reason for a change, a single attribute may drive either type of response.

Cardinality of a Star Schema

  • A common relationship cardinality is one-to-many or its inverse, many-to-one.
  • The "one" side is always a dimension table.
  • The "many" side is always a fact table.
  • A retention period, for example, is 5 Years.

Storage Sizing of a Star Schema

  • Relationships have a cardinality of one-to-many or its inverse many-to-one.
    • "One" side: Dimension table.
    • "Many" side: fact table.
  • Retention Period Example is 5 Years.

Summary

  • Dimensional modeling is a design approach optimized for analytic systems.
  • A dimensional model captures how a process is measured.
  • Data elements that represent measurements are called facts.
  • Data elements that provide context for measurements are called dimensions.
  • Dimensions and Facts are grouped into dimension tables and fact tables.
  • Implemented in a relational database, the design is called a star schema.
  • Dimension tables in a star schema use surrogate keys in order to enable the analytic system to respond to changes in the operational data
  • The granular facts can be queried at various levels of detail.
  • Exploring the details within a dimension is called browsing.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Use Quizgecko on...
Browser
Browser