Data Warehousing Flashcards
84 Questions
100 Views

Data Warehousing Flashcards

Created by
@ProfoundPearTree

Questions and Answers

What does dealing with changing dimension attributes refer to?

Dealing with changing dimension attributes involves managing the changes in characteristics of dimension data in databases.

What 3 types are there to change the dimension? (Select all that apply)

  • Adding a new attribute/column (correct)
  • Adding a new dimension record/row (correct)
  • Overwrite the dimension attribute (correct)
  • Deleting a dimension record
  • Explain the Type-1 dimension changing.

    Overwriting the dimension attribute; it's easy to maintain and is often used for processing corrections.

    What are the pros and cons of Type-1 dimension changing? (Select all that apply)

    <p>You lose historic information</p> Signup and view all the answers

    Explain the Type-2 dimension changing.

    <p>Adding a new dimension record/row to capture attribute changes; both prior and new rows are maintained.</p> Signup and view all the answers

    What are the pros and cons of Type-2 dimension changing? (Select all that apply)

    <p>Expensive database option due to size</p> Signup and view all the answers

    Explain the Type-3 dimension changing.

    <p>Adding a new attribute/column while preserving historical information without adding a new row.</p> Signup and view all the answers

    What are the pros and cons of Type-3 dimension changing? (Select all that apply)

    <p>Preserves historical values</p> Signup and view all the answers

    What is the rapidly changing large dimension problem?

    <p>It occurs when there is an extremely large dimension with attributes that are constantly changing.</p> Signup and view all the answers

    How could we apply a Type-2 change in the context of rapidly changing dimension tables?

    <p>By using mini-dimensions.</p> Signup and view all the answers

    What are mini-dimensions?

    <p>Mini-dimensions contain rapidly changing attributes of the original dimension and are treated as standalone dimensions.</p> Signup and view all the answers

    Which entities might have rapidly changing large dimensions? (Select all that apply)

    <p>Large retail stores</p> Signup and view all the answers

    The logical data model is usually structured as a _____. Why?

    <p>Star Schema</p> Signup and view all the answers

    What is a fact table?

    <p>A fact table is the <em>central table in a star schema of a data warehouse</em>. It stores quantitative information for analysis and is often denormalized.</p> Signup and view all the answers

    A fact table holds...... A dimension table stores.....

    <p>the data to be analyzed; data about the ways in which the data in the fact table can be analyzed.</p> Signup and view all the answers

    The fact table consists of ____ types of columns. Name and describe them.

    <ol> <li>The <em>foreign key column</em> to allow joins with the other dimensional tables in the STAR Schema, 2. The <em>measures column</em> that contains the data being analyzed.</li> </ol> Signup and view all the answers

    What is an issue with performing relational queries using the star schema?

    <p>Access time</p> Signup and view all the answers

    Define Measures.

    <p>Measures are the core of the dimensional model and are data elements that can be summed, averaged, or mathematically manipulated.</p> Signup and view all the answers

    What are the constructs of a STAR Schema?

    <p>Fact Node, Dimension Node, Dimension Edge.</p> Signup and view all the answers

    Define Dimensions.

    <p>They're the set of companion tables to a fact table that contain measures.</p> Signup and view all the answers

    Define facts.

    <p>A fact is a value, or measurement, which represents a fact about the managed entity or system.</p> Signup and view all the answers

    Define Granularity.

    <p>Granularity means 'Level of Division'. It indicates how detailed data can be in terms of attributes.</p> Signup and view all the answers

    What is an OLAP Cube?

    <p>Online Analytical Processing Engine, a three-dimensional array for analyzing data.</p> Signup and view all the answers

    What is an issue with the OLAP Cube?

    <p>Data Sparsity (too many empty cells)</p> Signup and view all the answers

    A ____________ _____________ is a good relational DB for building a ______________ ____________________.

    <p>Star Schema; OLAP Cube.</p> Signup and view all the answers

    OLAP cubes offer _____.

    <p>sophisticated security options.</p> Signup and view all the answers

    What are key decisions made during the design of a dimensional model?

    <p>Select the 'business' process, declare the granularity, identify the measures (facts), identify the dimensions.</p> Signup and view all the answers

    What are Additive Measures?

    <p>Measures that can be added along all dimensions.</p> Signup and view all the answers

    What are Semi-Additive Measures?

    <p>Measures that can be added only along some dimensions.</p> Signup and view all the answers

    What are Non-additive measures?

    <p>Measures that can't be added along any dimension.</p> Signup and view all the answers

    What 2 types of fact nodes/tables are there?

    <ol> <li>Transactional, 2. Snapshot.</li> </ol> Signup and view all the answers

    What is a Transactional Fact Node?

    <p>Measures in the fact table represent an event that occurred at a point in time.</p> Signup and view all the answers

    What is a Snapshot Fact Node?

    <p>Measures in the fact node represent a 'picture' of the activity at the end of a given period.</p> Signup and view all the answers

    Snapshots usually only have....

    <p>One fact table row per time period (at the occurrence of each snapshot).</p> Signup and view all the answers

    A _____________ fact node represents events occurring at a point in time.

    <p>transactional.</p> Signup and view all the answers

    Does a transactional or snapshot fact node have more dimensions?

    <p>Transactional.</p> Signup and view all the answers

    Does a transactional or snapshot fact node have more measures?

    <p>Snapshot.</p> Signup and view all the answers

    What is a Surrogate Key? Why are they important?

    <p>System generated key values; important because all data warehouse keys should represent surrogate keys.</p> Signup and view all the answers

    Smaller surrogate keys translate into.....

    <p>smaller fact table rows.</p> Signup and view all the answers

    Databases adopt an entity relationship diagram model, while data warehouses adopt a.....

    <p>multidimensional data model.</p> Signup and view all the answers

    What is a multidimensional data model?

    <p>A multidimensional model views data in the form of a data cube, defined by dimensions and facts.</p> Signup and view all the answers

    Explain Online Analytical Processing (OLAP).

    <p>OLAP pre-calculates queries that are hard to execute through tabular databases, improving efficiency.</p> Signup and view all the answers

    Who has access to databases vs. data warehouses?

    <p>Data warehouses are accessed by decision makers and data analysts; databases are typically accessed by non-management employees.</p> Signup and view all the answers

    Data warehouses provide..... Databases provide....

    <p>Information to support decision making; information to support day-to-day operations.</p> Signup and view all the answers

    Data warehouses manage _____________ data. Databases manage ______________ data.

    <p>historical; current.</p> Signup and view all the answers

    Which contains different levels of granularity?

    <p>Data warehouses</p> Signup and view all the answers

    While data warehouses support complex transactions, databases can only do....

    <p>Short and canned transactions.</p> Signup and view all the answers

    Transactions for data warehouses are:

    <p>read-only</p> Signup and view all the answers

    What are Senior Business Management Sponsor(s)? Why are they important?

    <p>Someone who funds the data warehouse</p> Signup and view all the answers

    What is Compelling Business Motivation?

    <p>It ensures the data warehouse systems align with strategic business motivations and initiatives.</p> Signup and view all the answers

    Define feasibility.

    <p>It is the availability of clean data, emphasizing on <em>clean</em> data!</p> Signup and view all the answers

    What's a good economical justification for developing a data warehouse?

    <p>Identifying costs and return on investment for the implementation of the data warehouse.</p> Signup and view all the answers

    When end-users are using the data warehouse, what is the main challenge? What's a potential solution?

    <p>For end-users to define what information they want from the warehouse and how to use it; focus on 'business' dimensions used for decision making.</p> Signup and view all the answers

    What should be the first step when developing a data warehouse?

    <p>Figure out what information the end-users want to collect from the DW; focus on the dimensions and measures.</p> Signup and view all the answers

    What are the 4 Steps to the Dimensional Modeling Design Process?

    <ol> <li>Select business process, 2. Declare granularity, 3. Identify dimensions, 4. Identify facts.</li> </ol> Signup and view all the answers

    Explain 1. Select business process.

    <p>These are the operational activities performed by your organization. This is the measurement event to be modeled.</p> Signup and view all the answers

    Explain 2. Declare granularity.

    <p>How many data attributes an entity has; examples include high and low levels of granularity.</p> Signup and view all the answers

    Explain 3. Identify Dimensions.

    <p>Provide the 'who, what, where, why, and how' context and what hierarchies the foreign keys in your fact table lead to.</p> Signup and view all the answers

    Explain 4. Identify Facts.

    <p>A <em>fact</em> is literally an entry in the fact table—quantitative measurements that result from your 'business process' occurring.</p> Signup and view all the answers

    What question should you ask to determine if an entry is a fact or not?

    <p>'Does the attribute take on lots of values, and is it used in calculations?' If yes... FACT.</p> Signup and view all the answers

    What are dimensions?

    <p>They simply provide <em>attributes</em> that provide more data about a fact in the fact table.</p> Signup and view all the answers

    What are the 2 main purposes of dimensions?

    <ol> <li>Constrain querying, 2. Filter the query result set.</li> </ol> Signup and view all the answers

    Define understandability regarding dimension modeling.

    <p>Keeping everything as simple as possible, but not watered down.</p> Signup and view all the answers

    What's the effect on query performance when adding dimensions?

    <p>No adverse effect—DWs are typically optimized for this kind of design paradigm.</p> Signup and view all the answers

    How should a dimension modeling/design be extensible?

    <p>It should easily allow for the integration of new data.</p> Signup and view all the answers

    What do we use if no property of the fact can fulfill the purpose of the primary key?

    <p>A <em>surrogate key</em> - a unique key artificially assigned to identify the row.</p> Signup and view all the answers

    Although surrogate keys seem like they're replacements for primary keys, all DWs should....

    <p>represent surrogate keys.</p> Signup and view all the answers

    What is Normalization? Why do it?

    <p>The process of removing redundant attributes from a de-normalized dimension table; it simplifies updates and saves storage space.</p> Signup and view all the answers

    What is snowflaking?

    <p>A method of normalizing the dimension tables in a star schema.</p> Signup and view all the answers

    What are disadvantages of normalization/snowflaking?

    <p>The scheme is less intuitive, browsing is more difficult, and additional joins may degrade query performance.</p> Signup and view all the answers

    What are the 3 Measure Types?

    <ol> <li>Additive, 2. Semi-additive, 3. Non-additive.</li> </ol> Signup and view all the answers

    Describe Additive Measures.

    <p>Facts that can be added with any dimension in the fact table.</p> Signup and view all the answers

    Describe Semi-Additive Measures.

    <p>Facts that can be summed for some dimensions in the fact table.</p> Signup and view all the answers

    Describe Non-Additive Measures.

    <p>Facts that you can never sum, regardless of dimensions present.</p> Signup and view all the answers

    What is a Periodic/Snapshot Fact Table?

    <p>A row in a periodic snapshot fact table summarizes many measurement events occurring over a standard period.</p> Signup and view all the answers

    Measures in the fact table represent an event that..... Measures in the fact node represent.....

    <p>occurred at a point in time; a 'picture' of the activity at the end of a given period.</p> Signup and view all the answers

    Periodic fact tables allow one to measure the __________ of an organization.

    <p>status.</p> Signup and view all the answers

    In a PFT, here is _______ fact table row per ________ __________.

    <p>one; time period.</p> Signup and view all the answers

    In a TFT, there are more ________________, but less _______________ than a PFT. Why?

    <p>dimensions; measures; because more data is collected in TFTs.</p> Signup and view all the answers

    What is a Factless Fact Table?

    <p>A factless fact table is a fact table that does not contain facts. It captures events at the information level.</p> Signup and view all the answers

    What is an example of a Factless Fact Table being needed?

    <p>Factless fact tables are used for tracking processes like attendance or events.</p> Signup and view all the answers

    How do you determine events that did not occur?

    <p>Coverage - activity = events which did not occur.</p> Signup and view all the answers

    SURROGATE KEYS SHOULD BE -__ INTEGERS.

    <p>Auto-Incremented.</p> Signup and view all the answers

    How do surrogate keys increase performance?

    <p>Surrogate keys are simple integers, allowing for more efficient joins and smaller indices.</p> Signup and view all the answers

    Study Notes

    Data Warehousing Concepts

    • The logical data model structure commonly uses a Star Schema due to its user-friendly navigation and better performance in relational queries.
    • A fact table serves as the central table in a data warehouse, containing quantitative information for analysis, often stored in a denormalized format.
    • Fact tables hold the data to be analyzed while dimension tables provide context about how the data can be analyzed.
    • Fact tables consist of two types of columns:
      • Foreign key columns for joining with dimension tables.
      • Measures columns containing data to be analyzed.
    • Access time can be an issue when performing relational queries using star schema structures.
    • Measures are core elements in the dimensional model, representing quantitative values, and can be summed, averaged, or manipulated mathematically.

    Star Schema Constructs

    • A Star Schema includes:
      • Fact Node: Central table with measures.
      • Dimension Node: Dependent tables that provide context.
      • Dimension Edge: Relationships between dimension tables and facts.
    • Dimensions are companion tables containing characteristics that describe facts.
    • Facts are specific measurements about the entity being managed, e.g., financial metrics like totals or success rates.

    Granularity and OLAP

    • Granularity refers to the level of division of an entity into attributes, impacting data detail.
    • An OLAP Cube is a three-dimensional array that allows for complex data analysis.
    • Data sparsity is a common issue with OLAP cubes, often resulting in many empty cells.

    Measure Types

    • Three types of measures:
      • Additive: Can be summed across all dimensions (e.g., total sales).
      • Semi-Additive: Can be summed in some dimensions only (e.g., account balances).
      • Non-additive: Cannot be summed at all (e.g., percentages).

    Fact Tables

    • Two types of fact nodes:
      • Transactional Fact Nodes: Represent events at a specific time (e.g., retail sales).
      • Snapshot Fact Nodes: Represent data snapshots at the end of a period (e.g., monthly account balances).
    • A periodic/snapshot fact table summarizes measurement events over standard time periods, whereas transactional tables capture every single transaction event.

    Data Warehouse vs. Database

    • Databases focus on current data and support operational transactions, while data warehouses manage historical data and offer insights for decision-making.
    • Data warehouses often contain multiple granularity levels compared to databases.

    Project Planning and Design

    • Senior Business Management Sponsors are essential for funding and utilizing the data warehouse.
    • Compelling Business Motivation aligns the data warehouse with strategic goals.
    • Feasibility emphasizes the importance of clean data for operational success.
    • Developing a data warehouse involves understanding user requirements and focusing on dimensions and measures.

    Dimensional Modeling

    • The four-step design process for dimensional modeling:
      • Select business processes.
      • Declare the granularity of data attributes.
      • Identify dimensions providing context.
      • Identify facts representing quantitative data.

    Change Management in Dimensions

    • Types of dimension changes:
      • Type-1: Overwrites existing dimensions (loses historical data).
      • Type-2: Adds a new record for changes (keeps historical data but uses more space).
      • Type-3: Adds new attributes/columns to keep current and previous values while minimizing new rows.

    Surrogate Keys

    • Surrogate keys are system-generated unique keys beneficial for identifying rows in fact tables where no single primary key is available. They enhance performance through efficient indexing and joining.
    • Dealing with attributes in data dimensions often requires consideration of whether to overwrite, add, or change attributes to maintain data integrity and historical context.### Rapidly Changing Large Dimension Problem
    • Involves extremely large dimensions where attributes undergo constant change.
    • Affects maintenance and performance as dimensions grow, especially when using Type-2 changes to track historical data.
    • Relevant examples include:
      • Government agencies managing dimensions with over 100 million people.
      • Large retail stores with several million product entries.
      • Insurance companies maintaining millions of automobile records.

    Type-2 Change Application

    • Type-2 changes involve adding new records to capture historical data.
    • Implemented through the use of mini-dimensions to manage attributes that frequently change.

    Mini-Dimensions

    • Mini-dimensions isolate rapidly changing attributes of a larger dimension.
    • These mini-dimensions function as stand-alone entities and connect directly to fact tables without snowflaking, optimizing database structure.

    Studying That Suits You

    Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

    Quiz Team

    Description

    Test your knowledge of data warehousing concepts with these flashcards. Covering key elements such as star schema and fact tables, this quiz helps reinforce your understanding of data models and their structures. Ideal for students and professionals in data management.

    More Quizzes Like This

    Data Warehousing Project Management Quiz
    10 questions
    Star Schema in Data Warehousing
    5 questions

    Star Schema in Data Warehousing

    AffirmativeHippopotamus avatar
    AffirmativeHippopotamus
    Star Schema vs Galaxy Schema Quiz
    10 questions
    Data Warehousing Concepts Quiz
    24 questions
    Use Quizgecko on...
    Browser
    Browser