Podcast
Questions and Answers
In a star schema, which side of the relationship cardinality (one-to-many) is always a dimension table?
In a star schema, which side of the relationship cardinality (one-to-many) is always a dimension table?
- The 'many' side
- The 'one' side (correct)
- Both sides
- Neither side
Given a star schema, which side of the relationship cardinality (one-to-many) is represented by a fact table?
Given a star schema, which side of the relationship cardinality (one-to-many) is represented by a fact table?
- The 'few' side
- The 'one' side
- The 'none' side
- The 'many' side (correct)
In dimensional modeling, what does 'drilling' refer to?
In dimensional modeling, what does 'drilling' refer to?
- Summarizing low-level data
- The process of adding dimensional detail (correct)
- Aggregating unrelated facts
- Removing irrelevant data
What does the term 'snowflake' refer to in the context of dimensional modeling?
What does the term 'snowflake' refer to in the context of dimensional modeling?
What is the primary characteristic of facts that should be considered when deciding whether to place them in the same fact table?
What is the primary characteristic of facts that should be considered when deciding whether to place them in the same fact table?
Why is it generally recommended to use a single fact table for each process in dimensional design?
Why is it generally recommended to use a single fact table for each process in dimensional design?
In dimensional modeling, what does the practice of 'drilling into data' involve?
In dimensional modeling, what does the practice of 'drilling into data' involve?
What is the primary benefit of separating facts with different timing into separate fact tables?
What is the primary benefit of separating facts with different timing into separate fact tables?
If a data warehouse designer chooses to implement 'snowflaking' in a dimensional model, what is a likely consequence?
If a data warehouse designer chooses to implement 'snowflaking' in a dimensional model, what is a likely consequence?
What is the result of making the relationships between dimension attributes explicit within a dimensional design?
What is the result of making the relationships between dimension attributes explicit within a dimensional design?
Which type of schema consists of a fact table and a group of dimensions, with each dimension table joined directly to the fact table?
Which type of schema consists of a fact table and a group of dimensions, with each dimension table joined directly to the fact table?
What fundamentally distinguishes a snowflake schema from a star schema?
What fundamentally distinguishes a snowflake schema from a star schema?
A data warehouse architect is designing a dimensional model. They have a choice between combining two processes into a single fact table or using two separate fact tables. Which of the following questions is most critical in making this decision?
A data warehouse architect is designing a dimensional model. They have a choice between combining two processes into a single fact table or using two separate fact tables. Which of the following questions is most critical in making this decision?
A dimensional model contains a Date
dimension and a Sales
fact table. Analysts want to analyze sales trends at the Year
, Quarter
, and Month
levels. What technique would enable this type of analysis most directly?
A dimensional model contains a Date
dimension and a Sales
fact table. Analysts want to analyze sales trends at the Year
, Quarter
, and Month
levels. What technique would enable this type of analysis most directly?
Which of the following is a primary advantage of using separate fact tables for 'Quantity Ordered' and 'Quantity Shipped' in a sales data warehouse?
Which of the following is a primary advantage of using separate fact tables for 'Quantity Ordered' and 'Quantity Shipped' in a sales data warehouse?
In the context of dimensional modeling, an attribute hierarchy provides a shorthand vocabulary used to do what?
In the context of dimensional modeling, an attribute hierarchy provides a shorthand vocabulary used to do what?
What should a cube that is designed for holding 'order dollars' at the 'brand (product)' and 'quarter (day)' levels use to define the grain of the data?
What should a cube that is designed for holding 'order dollars' at the 'brand (product)' and 'quarter (day)' levels use to define the grain of the data?
What is the number of total rows in a fact table, given the following information? # Product = 1,000 rows, # Day = 365/year rows, a retention period of 5 years, # Salesperson = 10,000 rows and # Customer = 1,000,000 rows.
What is the number of total rows in a fact table, given the following information? # Product = 1,000 rows, # Day = 365/year rows, a retention period of 5 years, # Salesperson = 10,000 rows and # Customer = 1,000,000 rows.
A data warehouse contains sales data. The design includes a single fact table with a 'generic fact' column and a 'fact_type' dimension. What is a potential drawback of this approach?
A data warehouse contains sales data. The design includes a single fact table with a 'generic fact' column and a 'fact_type' dimension. What is a potential drawback of this approach?
In a snowflake schema, what is the primary motivation for normalizing dimension tables?
In a snowflake schema, what is the primary motivation for normalizing dimension tables?
Why might implementing a fully snowflaked schema be detrimental within an analytic setting?
Why might implementing a fully snowflaked schema be detrimental within an analytic setting?
How is the storage size for a customer table with 1,000,000 rows determined, given the following column data types and sizes? Customer_id (INT, 4 bytes), Customer_name (VARCHAR(50), 50 bytes), Email (VARCHAR(100), 100 bytes), City (VARCHAR(50), 50 bytes), Country (VARCHAR(50), 50 bytes), Signup_date (DATE, 8 bytes).
How is the storage size for a customer table with 1,000,000 rows determined, given the following column data types and sizes? Customer_id (INT, 4 bytes), Customer_name (VARCHAR(50), 50 bytes), Email (VARCHAR(100), 100 bytes), City (VARCHAR(50), 50 bytes), Country (VARCHAR(50), 50 bytes), Signup_date (DATE, 8 bytes).
What is the initial step to answer the question of computing the facts table size?
What is the initial step to answer the question of computing the facts table size?
When two or more facts describe events that do not take place at the same time, what do they describe?
When two or more facts describe events that do not take place at the same time, what do they describe?
What does the acronym 'ETL' refer to in the context of data warehousing?
What does the acronym 'ETL' refer to in the context of data warehousing?
Which of the following is true about attribute hierarchies??
Which of the following is true about attribute hierarchies??
In the context of data warehouse design, what is the consequence of placing facts that have different timings into a single fact table?
In the context of data warehouse design, what is the consequence of placing facts that have different timings into a single fact table?
You are tasked with designing a data warehouse for a retail company. You need to model the relationship between Products
, Salespersons
, and Customers
. You have a fact table, and plan to use dimension tables. Which side of the one to many cardinality is the fact table on?
You are tasked with designing a data warehouse for a retail company. You need to model the relationship between Products
, Salespersons
, and Customers
. You have a fact table, and plan to use dimension tables. Which side of the one to many cardinality is the fact table on?
Consider a data warehouse containing information about product sales. Each product belongs to a category and a brand. You want to enable users to 'drill down' from a general category to the specific brands within that category. Which dimensional modeling concept directly supports this?
Consider a data warehouse containing information about product sales. Each product belongs to a category and a brand. You want to enable users to 'drill down' from a general category to the specific brands within that category. Which dimensional modeling concept directly supports this?
A data warehouse project is considering snowflaking the product dimension, which includes Category
, Brand
, and individual Products
. What is the most likely impact this change have?
A data warehouse project is considering snowflaking the product dimension, which includes Category
, Brand
, and individual Products
. What is the most likely impact this change have?
You are designing a dimensional model for tracking student enrollment in courses. Each student belongs to a department and has demographic attributes like age and gender. Each course belongs to a category and has a credit hour value. You want to enable analysis of enrollment trends by department, course category, student age group, and course credit hours. How do you plan your fact tables?
You are designing a dimensional model for tracking student enrollment in courses. Each student belongs to a department and has demographic attributes like age and gender. Each course belongs to a category and has a credit hour value. You want to enable analysis of enrollment trends by department, course category, student age group, and course credit hours. How do you plan your fact tables?
In a star schema, what is the purpose of a dimension table?
In a star schema, what is the purpose of a dimension table?
Which of the following describes a situation where using a single fact table is usually inappropriate, and multiple fact tables should be used?
Which of the following describes a situation where using a single fact table is usually inappropriate, and multiple fact tables should be used?
A very large retail company is trying to build out their snowflake schema. How does increasing normalization complicate ETL processes?
A very large retail company is trying to build out their snowflake schema. How does increasing normalization complicate ETL processes?
How does an analytic database interact with data usage?
How does an analytic database interact with data usage?
Consider a scenario where you want to track both quantity ordered and quantity shipped for sales transactions. To avoid common pitfalls, what is the optimal approach?
Consider a scenario where you want to track both quantity ordered and quantity shipped for sales transactions. To avoid common pitfalls, what is the optimal approach?
A database architect must choose between a star schema and a snowflake schema. What is the best argument against the snowflake schema?
A database architect must choose between a star schema and a snowflake schema. What is the best argument against the snowflake schema?
When should you consider moving to 3NF (Third Normal Form) after snowflaking?
When should you consider moving to 3NF (Third Normal Form) after snowflaking?
Flashcards
Star Schema
Star Schema
A design pattern where a fact table relates to multiple dimension tables in a central, radial structure.
Fact table
Fact table
The measurement, metric, or fact that will be analyzed
Dimension Table
Dimension Table
A table that contains attributes used to define the context of the facts.
Cardinality
Cardinality
Signup and view all the flashcards
Storage Sizing
Storage Sizing
Signup and view all the flashcards
One Fact Table Per Process
One Fact Table Per Process
Signup and view all the flashcards
Drilling (Data)
Drilling (Data)
Signup and view all the flashcards
Attribute Hierarchy
Attribute Hierarchy
Signup and view all the flashcards
Snowflake Schema
Snowflake Schema
Signup and view all the flashcards
Normalization
Normalization
Signup and view all the flashcards
Study Notes
Cardinality of a Star Schema
- A common relationship cardinality is one-to-many, or its inverse, many-to-one
- In a star schema, the "one" side corresponds to a dimension table
- The "many" side corresponds to a fact table
- For calculations, a retention period of 5 years is used in subsequent examples
- In an example, the number of products is 1,000 rows, the days are 365/year rows, the number of salesperson is 10,000 rows, and the number of customers is 1,000,000 rows
- An example total row total comes to 1,000 x (365 x 5) x 10,000 x 1,000,000
- In total the rows come to = 1.825 x 10^16 rows
Storage Sizing of a Star Schema
- The previous definition of a common relationship cardinality is one-to-many, or its inverse, many-to-one
- The "one" side is always a dimension table, while the "many" side is always a fact table.
- Data rention period of 5 years is used in further calculations
- The number of products is 1,000 rows, the number of days are 365/year rows, and the number of salespeople is 10,000 rows in this eample
- The customer number is 1,000,000 rows, which equates to 262 bytes to 250 MB (approx.)
- The calculation of Total Rows (Fact Table) = 1,000 x (365 x 5) x 10,000 x 1,000,000
- The total rows = 1.825 x 10^16 rows
- The individual storage sizes can be calcluated: customer_id (PK) to INT = 4 bytes, customer_name to VARCHAR(50) = 50 bytes, email to VARCHAR(100) = 100 bytes, city to VARCHAR(50) = 50 bytes, country to VARCHAR(50) = 50 bytes, signup_date to DATE = 8 bytes for a "Total row size = 4 + 50 + 100 + 50 + 50 + 8 = 262 bytes per record/row"
Storage Sizing Exercise
- An exercise prompts you to calculate the storage of one more table
- The fact table size should be computed
- In this example # Product = 1 000 rows, # Day = 365/year rows, # Salesperson = 10 000 rows, # Customer = 1 000 000 rows x 262 bytes = 250 MB (approx.)
- Example Total Rows (Fact Table) = 1 000 x (365 x 5) x 10 000 x 1 000 000
- The example Total Rows (Fact Table) = 1,825 x 10^16 rows
Fact Tables
- It is rare to find a subject area fully described by a single fact table
- It's impossible for a single fact table to cover an entire enterprise
- Multiple fact tables are necessary in almost every practical application
- Dimensional designs generally include a single fact table for each process under study
- Designs allow each process to be analyzed individually, avoiding complications from a single fact table covering multiple processes
Determining How Many Fact Tables
- For a pair of facts, determine if the facts occur simultaneously and if the facts are available at the same level of detail (or grain)
- If the answer to either question is no, the facts represent different processes
Facts with Different Timing
- When two or more facts describe events at different times, they describe different processes
- Analysis is hampered if facts are placed in a single fact table
- Separate fact tables allows processes to be studied more easily
Facts Analysis
- Quantity Ordered should be analyzed by date, customer, and product
- Quantity Shipped should be analyzed by date, customer, and product
Fact Tables Solutions
- One solution is to select product_key, sum(quantity_shipped) from sales_facts, group by product_key and having sum(quantity_shipped) > 0
- Another solution is to record only one generic fact
- An alternative dimension will determine whether it represents an order or a shippment
- This approach solves the zero-valued row problem, because each row is either an order or a shipment.
Separate Fact Tables
- Each fact table shares the common dimensions day, product, and customer
- Each stores only the appropriate facts which means there is no need to record extraneous zeros
- It is now possible to study the individual process without additional bother
Hierarchies and Drilling
- It is possible to describe a dimension table as a series of parent-child relationships among groups of attributes.
- Days make up months, months fall into quarters, and quarters fall into years
Drill Down Analysis
- Analysis is talked about as the process of "drilling into data."
- The word drill connotes digging deeper into something
- In a dimensional context, that something is a fact
- A generic concept of drilling is simply the addition of dimensional detail
Attribute Hierarchies
- Attribute hierarchies are best documented graphically
- This example includes important information like the names for each level of the hierarchies and attributes present at each level
- One-to-many relationships exist between instances of each level
- Drilldown analysis is achieved through the addition of dimensional detail
Cubes of Data
- Attribute hierarchy provides a shorthand vocabulary to define the grain of each cube
- A cube may be defined to hold "order dollars at the brand (product) and quarter (day) levels.
- Lining up all the hierarchies is useful for a given star on a single diagram, and draw a horizontal line to define the aggregation level represented by a cube
Snowflakes
- A star schema consists of a fact table and a group of dimension tables, which each dimension table joined directly to the fact table.
- When relationships between dimension attributes are made explicit in a dimensional design, it becomes a snowflake
Normalization
- Snowflaking a dimension = Normalization, following the design of operational systems
- Normalization is not useful in an analytic setting.
- Normalization is suited to the usage profile of an operational system, which supports a wide variety of simultaneous transactions that can modify data
- An analytic database does not share this usage pattern, so it does not call for this level of normalization.
- The added complexity is not necessary to guarantee data integrity and it also detracts from usability, complicates ETL, and may negatively impact performance.
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.