Star Schema Cardinality and Storage

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson
Download our mobile app to listen on the go
Get App

Questions and Answers

In a star schema, which side of the relationship cardinality (one-to-many) is always a dimension table?

  • The 'many' side
  • The 'one' side (correct)
  • Both sides
  • Neither side

Given a star schema, which side of the relationship cardinality (one-to-many) is represented by a fact table?

  • The 'few' side
  • The 'one' side
  • The 'none' side
  • The 'many' side (correct)

In dimensional modeling, what does 'drilling' refer to?

  • Summarizing low-level data
  • The process of adding dimensional detail (correct)
  • Aggregating unrelated facts
  • Removing irrelevant data

What does the term 'snowflake' refer to in the context of dimensional modeling?

<p>Dimension tables whose attributes are made explicit in the dimensional design (B)</p> Signup and view all the answers

What is the primary characteristic of facts that should be considered when deciding whether to place them in the same fact table?

<p>Whether they occur simultaneously and at the same level of detail (B)</p> Signup and view all the answers

Why is it generally recommended to use a single fact table for each process in dimensional design?

<p>To allow each process to be analyzed individually without complications (B)</p> Signup and view all the answers

In dimensional modeling, what does the practice of 'drilling into data' involve?

<p>Analyzing the same data at different levels of detail (B)</p> Signup and view all the answers

What is the primary benefit of separating facts with different timing into separate fact tables?

<p>It allows each process to be studied more easily and avoids extraneous zeros (D)</p> Signup and view all the answers

If a data warehouse designer chooses to implement 'snowflaking' in a dimensional model, what is a likely consequence?

<p>Increased complexity in querying and potentially reduced query performance (B)</p> Signup and view all the answers

What is the result of making the relationships between dimension attributes explicit within a dimensional design?

<p>A snowflake schema (D)</p> Signup and view all the answers

Which type of schema consists of a fact table and a group of dimensions, with each dimension table joined directly to the fact table?

<p>Star schema (B)</p> Signup and view all the answers

What fundamentally distinguishes a snowflake schema from a star schema?

<p>The normalization of dimension tables (C)</p> Signup and view all the answers

A data warehouse architect is designing a dimensional model. They have a choice between combining two processes into a single fact table or using two separate fact tables. Which of the following questions is most critical in making this decision?

<p>Are the facts for both processes available at the same level of detail and do they occur simultaneously? (A)</p> Signup and view all the answers

A dimensional model contains a Date dimension and a Sales fact table. Analysts want to analyze sales trends at the Year, Quarter, and Month levels. What technique would enable this type of analysis most directly?

<p>Hierarchies &amp; Drilling (C)</p> Signup and view all the answers

Which of the following is a primary advantage of using separate fact tables for 'Quantity Ordered' and 'Quantity Shipped' in a sales data warehouse?

<p>Avoids potentially misleading zero values when analyzing data (D)</p> Signup and view all the answers

In the context of dimensional modeling, an attribute hierarchy provides a shorthand vocabulary used to do what?

<p>Define the grain of each cube (C)</p> Signup and view all the answers

What should a cube that is designed for holding 'order dollars' at the 'brand (product)' and 'quarter (day)' levels use to define the grain of the data?

<p>Attribute hierarchy (C)</p> Signup and view all the answers

What is the number of total rows in a fact table, given the following information? # Product = 1,000 rows, # Day = 365/year rows, a retention period of 5 years, # Salesperson = 10,000 rows and # Customer = 1,000,000 rows.

<p>1,825 x 10^16 (A)</p> Signup and view all the answers

A data warehouse contains sales data. The design includes a single fact table with a 'generic fact' column and a 'fact_type' dimension. What is a potential drawback of this approach?

<p>Complicated cross-process analysis due to the need to filter by fact_type (D)</p> Signup and view all the answers

In a snowflake schema, what is the primary motivation for normalizing dimension tables?

<p>To reduce data redundancy and improve data integrity (D)</p> Signup and view all the answers

Why might implementing a fully snowflaked schema be detrimental within an analytic setting?

<p>It detracts from usability, complicates ETL, and may negatively impact performance (C)</p> Signup and view all the answers

How is the storage size for a customer table with 1,000,000 rows determined, given the following column data types and sizes? Customer_id (INT, 4 bytes), Customer_name (VARCHAR(50), 50 bytes), Email (VARCHAR(100), 100 bytes), City (VARCHAR(50), 50 bytes), Country (VARCHAR(50), 50 bytes), Signup_date (DATE, 8 bytes).

<p>Approximately 250 MB (B)</p> Signup and view all the answers

What is the initial step to answer the question of computing the facts table size?

<p>Exercise (C)</p> Signup and view all the answers

When two or more facts describe events that do not take place at the same time, what do they describe?

<p>Different Processes (C)</p> Signup and view all the answers

What does the acronym 'ETL' refer to in the context of data warehousing?

<p>Extract, Transform, Load (D)</p> Signup and view all the answers

Which of the following is true about attribute hierarchies??

<p>Attribute hierarchies are best documented graphically. (A)</p> Signup and view all the answers

In the context of data warehouse design, what is the consequence of placing facts that have different timings into a single fact table?

<p>Difficulty in analyzing the individual processes (B)</p> Signup and view all the answers

You are tasked with designing a data warehouse for a retail company. You need to model the relationship between Products, Salespersons, and Customers. You have a fact table, and plan to use dimension tables. Which side of the one to many cardinality is the fact table on?

<p>Many (B)</p> Signup and view all the answers

Consider a data warehouse containing information about product sales. Each product belongs to a category and a brand. You want to enable users to 'drill down' from a general category to the specific brands within that category. Which dimensional modeling concept directly supports this?

<p>Hierarchies (C)</p> Signup and view all the answers

A data warehouse project is considering snowflaking the product dimension, which includes Category, Brand, and individual Products. What is the most likely impact this change have?

<p>Increased storage requirements due to the creation of additional tables and keys (A)</p> Signup and view all the answers

You are designing a dimensional model for tracking student enrollment in courses. Each student belongs to a department and has demographic attributes like age and gender. Each course belongs to a category and has a credit hour value. You want to enable analysis of enrollment trends by department, course category, student age group, and course credit hours. How do you plan your fact tables?

<p>Create a fact table for enrollments (B)</p> Signup and view all the answers

In a star schema, what is the purpose of a dimension table?

<p>To store descriptive attributes (B)</p> Signup and view all the answers

Which of the following describes a situation where using a single fact table is usually inappropriate, and multiple fact tables should be used?

<p>The facts have different grains.. (C)</p> Signup and view all the answers

A very large retail company is trying to build out their snowflake schema. How does increasing normalization complicate ETL processes?

<p>It increases dependencies between tables, complicating the ETL Process (A)</p> Signup and view all the answers

How does an analytic database interact with data usage?

<p>Analytic databases do not share this usage pattern (C)</p> Signup and view all the answers

Consider a scenario where you want to track both quantity ordered and quantity shipped for sales transactions. To avoid common pitfalls, what is the optimal approach?

<p>Store them on separate fact tables (C)</p> Signup and view all the answers

A database architect must choose between a star schema and a snowflake schema. What is the best argument against the snowflake schema?

<p>Snowflake schemas add complexity (B)</p> Signup and view all the answers

When should you consider moving to 3NF (Third Normal Form) after snowflaking?

<p>Never (B)</p> Signup and view all the answers

Flashcards

Star Schema

A design pattern where a fact table relates to multiple dimension tables in a central, radial structure.

Fact table

The measurement, metric, or fact that will be analyzed

Dimension Table

A table that contains attributes used to define the context of the facts.

Cardinality

The relationship cardinality is one-to-many between dimension and fact tables

Signup and view all the flashcards

Storage Sizing

Data that must be stored

Signup and view all the flashcards

One Fact Table Per Process

Designs include a single fact table for each process to be studied.

Signup and view all the flashcards

Drilling (Data)

The ability to navigate through levels of data granularity, from summary to detail.

Signup and view all the flashcards

Attribute Hierarchy

A functional dependency among attributes within the same relation.

Signup and view all the flashcards

Snowflake Schema

A data model where dimension tables are further normalized into multiple related tables.

Signup and view all the flashcards

Normalization

The process of organizing data to reduce redundancy and improve data integrity.

Signup and view all the flashcards

Study Notes

Cardinality of a Star Schema

  • A common relationship cardinality is one-to-many, or its inverse, many-to-one
  • In a star schema, the "one" side corresponds to a dimension table
  • The "many" side corresponds to a fact table
  • For calculations, a retention period of 5 years is used in subsequent examples
  • In an example, the number of products is 1,000 rows, the days are 365/year rows, the number of salesperson is 10,000 rows, and the number of customers is 1,000,000 rows
  • An example total row total comes to 1,000 x (365 x 5) x 10,000 x 1,000,000
  • In total the rows come to = 1.825 x 10^16 rows

Storage Sizing of a Star Schema

  • The previous definition of a common relationship cardinality is one-to-many, or its inverse, many-to-one
  • The "one" side is always a dimension table, while the "many" side is always a fact table.
  • Data rention period of 5 years is used in further calculations
  • The number of products is 1,000 rows, the number of days are 365/year rows, and the number of salespeople is 10,000 rows in this eample
  • The customer number is 1,000,000 rows, which equates to 262 bytes to 250 MB (approx.)
  • The calculation of Total Rows (Fact Table) = 1,000 x (365 x 5) x 10,000 x 1,000,000
  • The total rows = 1.825 x 10^16 rows
  • The individual storage sizes can be calcluated: customer_id (PK) to INT = 4 bytes, customer_name to VARCHAR(50) = 50 bytes, email to VARCHAR(100) = 100 bytes, city to VARCHAR(50) = 50 bytes, country to VARCHAR(50) = 50 bytes, signup_date to DATE = 8 bytes for a "Total row size = 4 + 50 + 100 + 50 + 50 + 8 = 262 bytes per record/row"

Storage Sizing Exercise

  • An exercise prompts you to calculate the storage of one more table
  • The fact table size should be computed
  • In this example # Product = 1 000 rows, # Day = 365/year rows, # Salesperson = 10 000 rows, # Customer = 1 000 000 rows x 262 bytes = 250 MB (approx.)
  • Example Total Rows (Fact Table) = 1 000 x (365 x 5) x 10 000 x 1 000 000
  • The example Total Rows (Fact Table) = 1,825 x 10^16 rows

Fact Tables

  • It is rare to find a subject area fully described by a single fact table
  • It's impossible for a single fact table to cover an entire enterprise
  • Multiple fact tables are necessary in almost every practical application
  • Dimensional designs generally include a single fact table for each process under study
  • Designs allow each process to be analyzed individually, avoiding complications from a single fact table covering multiple processes

Determining How Many Fact Tables

  • For a pair of facts, determine if the facts occur simultaneously and if the facts are available at the same level of detail (or grain)
  • If the answer to either question is no, the facts represent different processes

Facts with Different Timing

  • When two or more facts describe events at different times, they describe different processes
  • Analysis is hampered if facts are placed in a single fact table
  • Separate fact tables allows processes to be studied more easily

Facts Analysis

  • Quantity Ordered should be analyzed by date, customer, and product
  • Quantity Shipped should be analyzed by date, customer, and product

Fact Tables Solutions

  • One solution is to select product_key, sum(quantity_shipped) from sales_facts, group by product_key and having sum(quantity_shipped) > 0
  • Another solution is to record only one generic fact
  • An alternative dimension will determine whether it represents an order or a shippment
  • This approach solves the zero-valued row problem, because each row is either an order or a shipment.

Separate Fact Tables

  • Each fact table shares the common dimensions day, product, and customer
  • Each stores only the appropriate facts which means there is no need to record extraneous zeros
  • It is now possible to study the individual process without additional bother

Hierarchies and Drilling

  • It is possible to describe a dimension table as a series of parent-child relationships among groups of attributes.
  • Days make up months, months fall into quarters, and quarters fall into years

Drill Down Analysis

  • Analysis is talked about as the process of "drilling into data."
  • The word drill connotes digging deeper into something
  • In a dimensional context, that something is a fact
  • A generic concept of drilling is simply the addition of dimensional detail

Attribute Hierarchies

  • Attribute hierarchies are best documented graphically
  • This example includes important information like the names for each level of the hierarchies and attributes present at each level
  • One-to-many relationships exist between instances of each level
  • Drilldown analysis is achieved through the addition of dimensional detail

Cubes of Data

  • Attribute hierarchy provides a shorthand vocabulary to define the grain of each cube
  • A cube may be defined to hold "order dollars at the brand (product) and quarter (day) levels.
  • Lining up all the hierarchies is useful for a given star on a single diagram, and draw a horizontal line to define the aggregation level represented by a cube

Snowflakes

  • A star schema consists of a fact table and a group of dimension tables, which each dimension table joined directly to the fact table.
  • When relationships between dimension attributes are made explicit in a dimensional design, it becomes a snowflake

Normalization

  • Snowflaking a dimension = Normalization, following the design of operational systems
  • Normalization is not useful in an analytic setting.
  • Normalization is suited to the usage profile of an operational system, which supports a wide variety of simultaneous transactions that can modify data
  • An analytic database does not share this usage pattern, so it does not call for this level of normalization.
  • The added complexity is not necessary to guarantee data integrity and it also detracts from usability, complicates ETL, and may negatively impact performance.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Star Schema in Data Warehousing
5 questions

Star Schema in Data Warehousing

AffirmativeHippopotamus avatar
AffirmativeHippopotamus
Star Schema vs Galaxy Schema Quiz
10 questions
Data Warehousing: Definitions and Objectives
20 questions
Use Quizgecko on...
Browser
Browser