BI Architectures and Components

Questions and Answers

In the context of BI Architecture, explain the primary role of the ETL layer. What key processes does it involve?

The ETL layer extracts data from sources, transforms it into a usable format, and loads it into a central repository.

Differentiate between 'Agglomerative' and 'Divisive' clustering approaches in terms of their fundamental methodology.

Agglomerative clustering is a bottom-up approach starting with individual data points, while divisive clustering is a top-down approach starting with all data in one cluster.

Describe the purpose of the 'Model Management Subsystem' within a Decision Support System (DSS). Give one example of a model it might contain.

The Model Management Subsystem houses statistical, mathematical, or analytical models to process data and assist in decision-making. An example is a forecasting model.

What are Multilevel Association Rules used for, and how do they differ from single-level association rules?

Multilevel Association Rules discover patterns or associations at multiple levels of abstraction in a dataset, compared to single-level rules that only operate at one level.

Briefly explain the core principle behind 'Density-Based' clustering methods and provide example algorithms.

Density-based methods form clusters based on the density of data points, identifying arbitrarily shaped clusters and noise. Examples include DBSCAN and OPTICS.

Describe the concept of a 'Contextual Outlier'. Give an example to illustrate your explanation.

A contextual outlier is a data point that is unusual depending on contextual attributes like time or location. For example, a temperature of 30°C is normal in summer but an outlier in winter.

In association rule mining, what does the 'Lift' measure signify? How is it calculated?

Lift measures how much more likely item B is to occur with item A than alone. Lift (A ⇒ B) = Confidence (A ⇒ B) / Support(B).

How does the K-Medoids algorithm differ from the K-Means algorithm in its approach to clustering?

K-Medoids uses actual data points (medoids) as cluster centers, while K-Means uses the mean (centroid).

Outline two reasons why association rules are valuable in data analysis and decision-making.

Association rules help uncover hidden patterns and relationships in large datasets, and they enable businesses to improve their marketing strategies.

Explain the 'Apriori property' as it relates to the Apriori algorithm.

If an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, none of its supersets can be frequent, which lets the algorithm prune candidates early.

Briefly differentiate between content-based and collaborative filtering approaches in recommendation systems.

Content-based filtering recommends items similar to those a user liked before, based on item attributes. Collaborative filtering recommends items based on the preferences of similar users.

In Market Basket Analysis (MBA), define 'Support' and explain its importance

Support measures how frequently items appear together: the fraction of transactions in the dataset that contain the itemset. It is essential for determining the significance of an association rule.

Explain the purpose of the Data Warehouse Layer in a Business Intelligence (BI) architecture.

The Data Warehouse Layer stores large volumes of historical and integrated data for analysis and reporting.

What is a 'Dendrogram' and how is it used in the context of hierarchical clustering?

A dendrogram is a tree-like diagram that represents the hierarchy of clusters. It visually illustrates how clusters merge or split at different stages of the hierarchical clustering process.

What is the role of the User Interface (Dialog Management Subsystem) in a Decision Support System (DSS)?

It acts as the communication bridge between the user and the system. It provides tools, menus, dashboards, and visualization to interact with data and models.

Give two key differences between Business Intelligence (BI) and Decision Support Systems (DSS).

BI focuses on data analysis and reporting with a broader scope, while DSS focuses on supporting decision-making with a narrower scope. BI handles structured data, while DSS handles semi-structured and unstructured problems.

Describe what are 'Global Outliers (Point Outliers)'. Provide one example.

Data points that are far from the rest of the dataset. Example: A person with age 150 in a survey.

In the Apriori algorithm, what is the purpose of the 'Scan and Count' step?

To count the support for each candidate itemset and keep those that satisfy minimum support.

What is 'Hybrid Filtering' in the context of recommendation systems, and what are its advantages?

Hybrid filtering combines content-based and collaborative filtering for better accuracy. It reduces the limitations of either approach alone, such as the cold-start problem or data sparsity.

Describe in your own words how Market Basket Analysis (MBA) is used to improve business strategy.

MBA helps businesses understand customer buying behaviour by identifying relationships between items purchased together. This is used to improve product placement, marketing promotions, and recommendation systems.

Flashcards

BI Architecture

Framework defining data collection, storage, and analysis for organizational decision-making.

Data Source Layer

Gathers raw data from various sources (databases, files, ERP, CRM).

ETL Layer

Extracts, transforms, and loads data into a central repository.

Data Warehouse Layer

Stores large volumes of historical and integrated data.

Metadata Layer

Provides descriptive details about the data for understanding and management.

Data Presentation Layer

Offers tools to visualize and analyze data (dashboards, reports, OLAP).

User Layer (Decision Support)

The area where end-users interact with the BI system for informed decisions.

Agglomerative Clustering

Bottom-up clustering approach starting with individual data points.

Divisive Clustering

Top-down clustering approach starting with one big cluster.

Decision Support System (DSS)

Interactive system analyzing large data volumes for decision support.

Data Management Subsystem

Stores and manages data required for decision-making.

Model Management Subsystem

Contains statistical, mathematical, and analytical models.

User Interface (Dialog Management)

Bridge between user and system, providing data interaction tools.

Knowledge-based Subsystem

Expert knowledge or AI to enhance decision-making capabilities.

Association Rule Mining

Data mining to find interesting relationships between data items.

Support (Association Rule)

Frequency of the rule in the dataset.

Confidence (Association Rule)

Likelihood of B occurring when A occurs.

Lift (Association Rule)

How much more likely B is to occur with A than by chance.

Global Outliers

Outliers far from the dataset's norm.

Apriori Algorithm

The algorithm mines frequent itemsets and creates rules.

Study Notes

BI Architectures

  • Business Intelligence (BI) Architecture is a framework
  • Defines how data is collected, stored, and analyzed
  • Supports decision-making in an organization
  • Composed of various layers and components

BI Architecture Components

  • Data Source Layer gathers raw data
  • Collects data from databases, files, ERP, and CRM systems
  • ETL (Extract, Transform, Load) Layer extracts data from sources
  • Transforms it into a usable format and loads it into a central repository
  • Data Warehouse Layer stores large volumes of historical and integrated data
  • Allows for analysis and reporting
  • Metadata Layer contains descriptive information about the data
  • Eases understanding and management
  • Data Presentation Layer provides tools
  • Uses dashboards, reports, and OLAP for visualization and analysis
  • User Layer (Decision Support) facilitates interaction with the BI system
  • Enables end-users to make informed business decisions
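The three ETL steps can be sketched with only the Python standard library. This is a minimal illustration under assumptions: the CSV export, its column names, the cleaning rules, and the `sales` table are all hypothetical, and an in-memory SQLite database stands in for the central repository (the data warehouse).

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g., a CRM or ERP dump).
RAW_CSV = """order_id,customer,amount,date
1, alice ,19.99,2024-01-05
2,BOB,,2024-01-06
3,carol,5.50,2024-01-07
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount, clean names, cast types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["date"],
        ))
    return clean

def load(records, conn):
    """Load: insert the cleaned records into a central repository table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM sales ORDER BY order_id").fetchall())
```

The incomplete row for order 2 is rejected during the transform step, so only two cleaned rows reach the repository.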

Agglomerative Clustering

  • Approach is bottom-up
  • Starts with individual data points
  • Merges clusters step by step
  • More commonly used and easier to implement
  • Dendrogram is built by merging nodes
  • Suitable for small to medium datasets
  • Less flexible with complex data structures
  • Sensitive to noise and outliers
  • Uses a greedy approach, so can lead to suboptimal clustering
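A minimal sketch of bottom-up clustering, assuming one-dimensional points and single linkage (cluster distance = distance between the closest members); complete or average linkage are equally common alternatives. Each recorded merge corresponds to one internal node of the dendrogram, and merges are never undone, which is the greedy behaviour noted above. The naive pair search is slow, one reason the method suits small to medium datasets.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering of 1-D points.

    Starts with each point in its own cluster and repeatedly merges the two
    closest clusters, returning the merge history (left, right, distance).
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = sorted(clusters[i] + clusters[j])
        # Greedy step: this merge is final and is never revisited.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

for left, right, dist in agglomerative([1.0, 2.0, 6.0, 7.0, 15.0]):
    print(left, "+", right, "at distance", dist)
```

For these sample points, 1.0 and 2.0 merge first (distance 1.0) and the outlying 15.0 joins last (distance 8.0).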

Divisive Clustering

  • Approach is top-down
  • Begins with all data in one big cluster
  • Splits clusters step by step
  • Less commonly used and more computationally complex
  • Dendrogram is built by splitting nodes
  • Good for smaller datasets due to complexity
  • Can capture complex data structures if tuned well
  • Can isolate noise early in the splitting process
  • Produces better global clusters with the right strategy
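Divisive clustering can be sketched with a simplified, DIANA-style split rule (DIANA is a classic divisive algorithm; this is an illustration, not a faithful implementation): pick the cluster with the largest diameter, seed a "splinter group" with its most dissimilar point, then move over points that are on average closer to the splinter group than to the rest. One-dimensional points and absolute distance are assumptions.

```python
def mean_dist(p, group):
    """Average absolute distance from point p to the points in group."""
    return sum(abs(p - q) for q in group) / len(group)

def diana_split(cluster):
    """Split one cluster (at least 2 points) into a splinter group and the rest."""
    rest = sorted(cluster)
    # Seed: the point with the largest average distance to all the others.
    seed = max(range(len(rest)),
               key=lambda i: mean_dist(rest[i], rest[:i] + rest[i + 1:]))
    splinter = [rest.pop(seed)]
    while len(rest) > 1:
        # Gain > 0 means the point is closer, on average, to the splinter group.
        gains = [mean_dist(p, rest[:i] + rest[i + 1:]) - mean_dist(p, splinter)
                 for i, p in enumerate(rest)]
        best = max(range(len(rest)), key=lambda i: gains[i])
        if gains[best] <= 0:
            break
        splinter.append(rest.pop(best))
    return sorted(splinter), rest

def divisive(points, k):
    """Top-down: start with one cluster, split the widest until there are k."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [sorted(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        widest = max(splittable, key=lambda c: c[-1] - c[0])  # largest diameter
        clusters.remove(widest)
        clusters.extend(diana_split(widest))
    return sorted(clusters)

print(divisive([1.0, 2.0, 6.0, 7.0, 15.0], k=3))
```

On this data the isolated value 15.0 is split off in the very first step, mirroring the point above that divisive methods can isolate noise early.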

Decision Support System (DSS)

  • DSS is an interactive, computer-based system
  • Supports decision-making through the analysis of large data volumes

DSS Components

  • Data Management Subsystem stores and manages data
  • Needed for decision-making, includes internal and external databases
  • Model Management Subsystem contains models that process data
  • Employs statistical, mathematical, or analytical models
  • User Interface (Dialog Management Subsystem) acts as a communication bridge
  • Provides tools, menus, dashboards, and visualizations to interact with data and models
  • Knowledge-based Subsystem (Optional) enhances decision-making with expert knowledge or AI
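The subsystems above can be mirrored in a toy Python sketch. Everything in it is a hypothetical illustration: the monthly sales figures, the two forecasting models, and the `ask` function standing in for the dialog layer. The optional knowledge-based subsystem is omitted.

```python
from statistics import mean

# Data Management Subsystem: hypothetical monthly unit sales.
DATA = {"sales": [120, 130, 125, 140, 150, 160]}

# Model Management Subsystem: named forecasting models that process the data.
def moving_average(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    return mean(series[-window:])

def naive(series):
    """Forecast the next value as the most recent value."""
    return series[-1]

MODELS = {"moving_average": moving_average, "naive": naive}

# Dialog Management Subsystem: the user's single entry point to data and models.
def ask(dataset, model, **params):
    """Run a named model on a named dataset with optional model parameters."""
    return MODELS[model](DATA[dataset], **params)

print(ask("sales", "moving_average", window=3))  # mean of 140, 150, 160
print(ask("sales", "naive"))
```

Keeping data, models, and the user-facing entry point separate lets a new model be registered in `MODELS` without touching the data or the interface.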

Business Intelligence (BI) vs. Decision Support System (DSS)

  • BI focuses on data analysis and reporting, DSS supports decision-making
  • BI handles structured and historical data, DSS deals with semi-structured and unstructured problems
  • BI is mainly for strategic decisions, DSS for tactical and operational decisions
  • BI uses dashboards and data mining, DSS uses models and simulations
  • BI provides insights and trends, DSS provides recommendations and alternatives
  • BI is a passive system, DSS is an interactive system
  • BI often integrates with data warehouses, DSS uses internal models and databases
  • BI requires historical data, DSS uses real-time and predictive models
  • BI helps in performance monitoring, DSS in problem-solving and what-if analysis
  • BI has a broader scope, DSS is narrowly focused

Multilevel Association Rules

  • Used to discover interesting patterns or associations
  • Patterns occur at multiple levels of abstraction in a dataset

Multilevel Association Rules Details

  • Rules organize items in a hierarchical structure (category → subcategory → item)
  • Allows mining more detailed and specific patterns than single-level rules
  • Different minimum support/confidence thresholds can be set for different levels
  • Concept hierarchies or taxonomies define abstraction levels
  • Useful in market basket analysis to study customer buying behavior
  • Top-down progressive deepening starts mining from higher levels and moves down
  • Rules provide insights for decision-making and business strategies
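A small sketch of top-down progressive deepening, assuming a hypothetical two-level hierarchy (category → item), toy transactions, and hypothetical per-level thresholds. The pair {milk, bread} is first checked at the category level; only if it is frequent does the search descend to specific items, where a lower minimum support is used because specific items are rarer.

```python
from itertools import product

# Hypothetical concept hierarchy: item -> category.
HIERARCHY = {
    "2% milk": "milk", "skim milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
}
TRANSACTIONS = [
    {"2% milk", "wheat bread"},
    {"skim milk", "white bread"},
    {"2% milk", "white bread"},
    {"skim milk"},
]
# Hypothetical per-level minimum supports (lower at the more specific level).
MIN_SUPPORT = {"category": 0.5, "item": 0.25}

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def generalize(transactions):
    """Map each item to its category (climb one level of the hierarchy)."""
    return [{HIERARCHY[i] for i in t} for t in transactions]

frequent_items = []
if support({"milk", "bread"}, generalize(TRANSACTIONS)) >= MIN_SUPPORT["category"]:
    # Progressive deepening: only examine children of a frequent category pair.
    milks = [i for i, c in HIERARCHY.items() if c == "milk"]
    breads = [i for i, c in HIERARCHY.items() if c == "bread"]
    for m, b in product(milks, breads):
        s = support({m, b}, TRANSACTIONS)
        if s >= MIN_SUPPORT["item"]:
            frequent_items.append(((m, b), s))

print(frequent_items)
```

Here {milk, bread} has support 0.75 at the category level, while no specific pair exceeds 0.25, so a single threshold of 0.5 would miss every item-level pattern.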

Clustering Methods

  • Clustering groups similar data objects into clusters

Partitioning Methods

  • Subdivides data into k non-overlapping subsets (clusters)
  • K-Means and K-Medoids are examples
  • Simple and efficient but requires predefining k

Hierarchical Methods

  • Builds a hierarchy of clusters in a tree-like structure
  • Agglomerative (bottom-up) and divisive (top-down) are the two types
  • Represented using a dendrogram

Density Based Methods

  • Forms clusters based on data point density
  • Can find arbitrarily shaped clusters and noise
  • DBSCAN and OPTICS are examples
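A compact DBSCAN sketch in pure Python follows. It assumes 2-D points, Euclidean distance (`math.dist`, Python 3.8+), and an eps-neighbourhood that includes the point itself; the parameter values in the usage line are hypothetical. Core points (at least `min_pts` neighbours) grow clusters, border points join a cluster without expanding it, and whatever remains is labeled -1 as noise. OPTICS is not shown.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # eps-neighbourhood of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is also core: keep expanding
                queue.extend(neighbours)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The two tight groups become clusters 0 and 1, and the isolated point (50, 50) is labeled as noise, without k having to be chosen in advance.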

Grid-Based Methods

  • Divides data space into a grid structure for clustering
  • Processes quickly, since processing time depends on the number of grid cells rather than the number of data points
  • STING and CLIQUE are examples
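The core grid idea can be sketched as follows. This is a simplified illustration, not STING (which keeps hierarchical statistics per cell) or CLIQUE (which searches subspaces): points are quantized into square cells of a hypothetical `cell_size`, cells with at least `min_count` points are kept as dense, and edge-adjacent dense cells are joined into clusters.

```python
from collections import Counter, deque

def grid_clusters(points, cell_size, min_count):
    """Cluster 2-D points by joining adjacent dense grid cells."""
    # Quantize: one pass over the points to count them per grid cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    # Keep dense cells only; everything after this works on cells, not points.
    dense = {cell for cell, n in counts.items() if n >= min_count}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over edge-adjacent dense cells.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            cx, cy = queue.popleft()
            component.append((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters

points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.2, 0.8),
          (5.5, 5.5), (5.1, 5.9), (9.0, 0.5)]
print(grid_clusters(points, cell_size=1.0, min_count=2))
```

After the single counting pass, the search touches only cells, which is where the speed comes from; the sparse cell containing (9.0, 0.5) is simply dropped.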

Model-Based Methods

  • Assumes a model (e.g., a probability distribution) for each cluster and finds the best fit of the data to that model
  • Example is the EM (Expectation Maximization) algorithm
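Model-based clustering with EM can be sketched for a one-dimensional mixture of Gaussians, an assumed model choice. The E-step computes each point's probability of belonging to each component; the M-step re-estimates weights, means, and variances from those soft assignments. The starting means, the fixed iteration count, and the variance floor are assumptions made to keep the sketch short.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, init_means, iters=50):
    """Fit a 1-D Gaussian mixture with one component per initial mean."""
    k = len(init_means)
    means, variances, weights = list(init_means), [1.0] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(xs, init_means=[0.0, 6.0])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

For this data the fitted means settle near 1.0 and 5.0 with roughly equal weights; unlike K-Means, each point keeps a probability of membership in every component.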

Types of Outliers

  • Outliers are data points that deviate significantly from other observations

Global Outliers

  • Data points are far from the rest of the dataset
  • Person with age 150 in a survey is an example

Contextual Outliers

  • Outliers depend on contextual attributes like time or location
  • Temperature that is normal in summer but an outlier in winter is an example

Collective Outliers

  • A group of data points deviates together from the overall pattern, even if each point alone may not be an outlier
  • Sudden spike indicating a DDoS attack is an example
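Global and contextual outliers can be contrasted with one simple detector. The sketch assumes the modified z-score rule of thumb (0.6745 × |x − median| / MAD > 3.5, where MAD is the median absolute deviation), and the ages and seasonal temperatures are made-up data. Collective outliers need sequence- or group-level analysis and are not covered.

```python
from collections import defaultdict
from statistics import median

def robust_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Global outlier: judged against the whole dataset.
ages = [23, 35, 41, 29, 52, 150, 38, 45]
print(robust_outliers(ages))  # [150]

# Contextual outlier: judged only against readings from the same season.
readings = [("summer", 30), ("summer", 31), ("summer", 29), ("summer", 32),
            ("winter", 2), ("winter", 4), ("winter", 3), ("winter", 30)]
by_season = defaultdict(list)
for season, temp in readings:
    by_season[season].append(temp)
print({s: robust_outliers(t) for s, t in by_season.items()})  # {'summer': [], 'winter': [30]}

# Without context, 30 looks ordinary and the cold readings stand out instead.
print(robust_outliers([t for _, t in readings]))  # [2, 4, 3]
```

The same value (30) is flagged only when compared within its context, which is exactly what distinguishes a contextual outlier from a global one.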

Outlier Applications

  • Fraud Detection for detecting unusual transactions
  • Network Security to identify intrusions or attacks
  • Medical Diagnosis for spotting rare patterns in health data
  • Industrial Fault Detection for detecting machine failures
  • Market Analysis for finding unusual buying patterns

Association Rule Mining

  • Discovers relationships or patterns between items in large datasets
  • Consists of an implication expression in the form of A ⇒ B
  • A and B are itemsets, and A ∩ B = ∅
  • Indicates if A occurs, then B is likely to occur

Association Rule Key Measures

  • Support represents the frequency of the rule in the dataset, P(A ∪ B), i.e., the probability that a transaction contains all items of both A and B
  • Confidence is the likelihood of B occurring when A occurs, P(B|A)
  • Lift measures how much more likely B is to occur with A than alone, Confidence (A ⇒ B) / Support(B)
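The three measures can be checked on a handful of made-up transactions for the rule {Milk, Bread} ⇒ {Butter}. `fractions.Fraction` keeps the arithmetic exact, and the set union `a | b` plays the role of A ∪ B in P(A ∪ B).

```python
from fractions import Fraction

TRANSACTIONS = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Eggs"},
]

def support(itemset):
    """Fraction of transactions containing every item of itemset."""
    return Fraction(sum(itemset <= t for t in TRANSACTIONS), len(TRANSACTIONS))

def confidence(a, b):
    """P(B | A) = support(A ∪ B) / support(A)."""
    return support(a | b) / support(a)

def lift(a, b):
    """Confidence(A ⇒ B) / support(B); above 1 means positive association."""
    return confidence(a, b) / support(b)

A, B = {"Milk", "Bread"}, {"Butter"}
print(support(A | B), confidence(A, B), lift(A, B))  # 2/5 2/3 10/9
```

A lift of 10/9 > 1 means buying milk and bread makes butter somewhat more likely than its baseline support of 3/5; a lift of exactly 1 would indicate independence, and below 1 a negative association.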

Association Rule Example

  • {Milk, Bread} ⇒ {Butter} in a supermarket
  • Customers who buy milk and bread are also likely to buy butter

Association Rule Applications

  • Market Basket Analysis
  • Cross-selling and recommendation systems
  • Web usage mining
  • Medical data analysis

K Medoids Algorithm

  • Partition-based clustering algorithm similar to K-Means
  • Uses actual data points (medoids) as cluster centers instead of the mean (centroid)
  • More robust to noise and outliers

K-Medoids Algorithm Steps

  • Initialization: k random data points are selected as the initial medoids
  • Assignment: each remaining data point is assigned to the nearest medoid
  • A distance metric (e.g., Manhattan or Euclidean distance) determines nearness
  • Update: in each cluster, the point with the smallest total distance to the other members becomes the new medoid
  • Assignment and Update steps are repeated until the medoids stabilize
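The steps above can be sketched as the simple alternating variant of K-Medoids; PAM, listed under the key features, instead searches over medoid/non-medoid swaps. The sketch assumes distinct one-dimensional points and Manhattan (absolute) distance, and the random seed is arbitrary. The last lines contrast a K-Means-style mean with a medoid on a cluster containing an outlier.

```python
import random

def total_distance(c, members):
    """Sum of absolute distances from candidate centre c to all members."""
    return sum(abs(c - q) for q in members)

def k_medoids(points, k, seed=0, max_iter=100):
    """Alternating K-Medoids for distinct 1-D points; returns sorted medoids."""
    medoids = random.Random(seed).sample(points, k)  # k random data points
    for _ in range(max_iter):
        # Assignment: each point joins its nearest medoid (a medoid joins itself).
        clusters = {m: [] for m in medoids}
        for p in points:
            clusters[min(medoids, key=lambda m: abs(p - m))].append(p)
        # Update: the member with the smallest total distance becomes the medoid.
        new_medoids = [min(ms, key=lambda c: total_distance(c, ms))
                       for ms in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids have stabilized
        medoids = new_medoids
    return sorted(medoids)

print(k_medoids([1, 2, 3, 10, 11, 12], k=2))  # [2, 11]

# Robustness: an outlier drags the mean but not the medoid.
cluster = [1, 2, 3, 100]
print(sum(cluster) / len(cluster))                              # 26.5
print(min(cluster, key=lambda c: total_distance(c, cluster)))  # 2
```

Because every center is a real data point, the medoid of [1, 2, 3, 100] stays among the typical values, whereas the mean is pulled far toward the outlier.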

K-Medoids Key Features

  • Uses real data points as centers (unlike K-Means)
  • More robust to outliers and noise
  • Common algorithm is PAM (Partitioning Around Medoids)

K-Medoids Applications

  • Customer segmentation
  • Anomaly detection
  • Document or text clustering

Association Rules Importance

  • Association rules discover hidden patterns and relationships in large datasets

Importance of Association Rules

  • Pattern Discovery uncovers meaningful associations
  • Customer Behavior Analysis helps businesses understand what products are often bought together
  • Improved Recommendations suggest items based on previous purchases
  • Efficient Marketing Strategies help design targeted promotions
  • Data Summarization simplifies large datasets into interpretable rules
  • Domain Versatility makes them applicable in retail, healthcare, finance, web usage mining, etc.

Apriori Algorithm

  • Classic algorithm
  • Mines frequent itemsets and generates association rules in transactional databases
  • It uses the "apriori property" that if an itemset is frequent, all its subsets are also frequent

Apriori Algorithm Steps

  • Setting minimum support and confidence thresholds
  • Generating frequent 1-itemsets by scanning the database
  • Generating candidate k-itemsets (Ck) from frequent (k-1)-itemsets
  • Self-joining and pruning are used to form the candidates
  • Scanning and counting support for each candidate itemset
  • Generating association rules from frequent itemsets
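The steps can be sketched in plain Python. Assumptions: transactions are sets, minimum support is given as an absolute transaction count, and the self-join simply unions pairs of frequent (k-1)-itemsets that produce a k-itemset (simpler than the textbook's shared-prefix join; after pruning, the candidate sets are the same). The toy transactions and thresholds are made up.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    # First scan: count 1-itemsets and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_count}
    result, k = dict(frequent), 2
    while frequent:
        prev = list(frequent)
        # Self-join: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Scan and count: support of each surviving candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

def rules(frequent, min_confidence):
    """Yield (A, B, confidence) for rules A => B built from frequent itemsets."""
    for itemset, n in frequent.items():
        for r in range(1, len(itemset)):
            for a in map(frozenset, combinations(itemset, r)):
                conf = n / frequent[a]  # support(A ∪ B) / support(A)
                if conf >= min_confidence:
                    yield a, itemset - a, conf

T = [{"Milk", "Bread", "Butter"}, {"Milk", "Bread"}, {"Milk", "Butter"},
     {"Bread", "Butter"}, {"Milk", "Bread", "Butter"}]
freq = apriori(T, min_count=2)
for itemset in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), freq[itemset])
for a, b, conf in rules(freq, min_confidence=0.7):
    print(sorted(a), "=>", sorted(b), conf)
```

The rule step can look up `frequent[a]` safely because, by the Apriori property, every subset of a frequent itemset is itself in the result.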

Apriori Algorithm Example

  • If {Milk, Bread, Butter} is frequent, then {Milk, Bread} and {Milk} must also be frequent

Apriori Applications

  • Market basket analysis
  • Product recommendations
  • Customer behavior analysis

Recommendation Systems

  • A data filtering technique that suggests relevant items
  • Suggestions are based on preferences, behavior, or historical data
  • Widely used in e-commerce, entertainment, and social media

Content-Based Filtering

  • Recommends items similar to those the user liked before
  • Based on item attributes and the user profile
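A minimal content-based sketch, assuming hypothetical movies described by three genre attributes and a user profile built as the average of the liked items' vectors. Cosine similarity is one common choice of similarity measure among several.

```python
import math

# Hypothetical item attribute vectors: (action, comedy, romance).
ITEMS = {
    "Movie A": (1, 0, 0),
    "Movie B": (1, 1, 0),
    "Movie C": (0, 0, 1),
    "Movie D": (0, 1, 1),
}

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_content(liked, items, top_n=2):
    """Rank unseen items by similarity to the profile (mean of liked items)."""
    dims = len(next(iter(items.values())))
    profile = [sum(items[i][d] for i in liked) / len(liked) for d in range(dims)]
    scores = {i: cosine(profile, v) for i, v in items.items() if i not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_content(["Movie B"], ITEMS))
```

A user who liked the action-comedy Movie B is steered toward Movie A (shared action) and Movie D (shared comedy), using only item attributes and no other users' data.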

Collaborative Filtering

  • Recommends items based on similar users' preferences
    • User-based: Suggests items liked by similar users
    • Item-based: Suggests items similar to those the user liked.
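A user-based collaborative filtering sketch follows. The rating data is made up, similarity is cosine over co-rated items, and an unseen item's score is the similarity-weighted sum of other users' ratings; these are simplifying assumptions (practical systems often mean-center ratings and restrict to the top-k neighbours). Item-based filtering would instead compare items by how users rated them.

```python
import math

# Hypothetical user -> {item: rating} data; a missing key means "not rated".
RATINGS = {
    "ann": {"i1": 5, "i2": 4, "i3": 1},
    "bob": {"i1": 4, "i2": 5, "i4": 4},
    "cat": {"i1": 1, "i3": 5, "i5": 4},
}

def user_similarity(a, b):
    """Cosine similarity between two users over the items both have rated."""
    common = RATINGS[a].keys() & RATINGS[b].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[a][i] * RATINGS[b][i] for i in common)
    norm_a = math.sqrt(sum(RATINGS[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(RATINGS[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def recommend_user_based(user, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other in RATINGS:
        if other == user:
            continue
        sim = user_similarity(user, other)
        for item, rating in RATINGS[other].items():
            if item not in RATINGS[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend_user_based("ann"))
```

Bob rates i1 and i2 much like Ann does, so his item i4 outranks i5, which comes from the less similar Cat.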

Hybrid Filtering

  • Combines both content-based and collaborative filtering
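One simple way to combine the two is a weighted hybrid, sketched here under assumptions: both inputs are {item: score} dicts already on a comparable 0 to 1 scale, the weight `alpha` and the scores are hypothetical, and a user with no collaborative scores (cold start) falls back to content-based scores alone. Other hybrid designs, such as switching or cascading hybrids, also exist.

```python
def hybrid_recommend(content, collaborative, alpha=0.5, top_n=2):
    """Blend scores as alpha * content + (1 - alpha) * collaborative."""
    if not collaborative:
        alpha = 1.0  # cold start: no rating history, rely on item attributes only
    items = content.keys() | collaborative.keys()
    blended = {
        i: alpha * content.get(i, 0.0) + (1 - alpha) * collaborative.get(i, 0.0)
        for i in items
    }
    # Sort by score (descending), breaking ties by item name for stable output.
    return sorted(blended, key=lambda i: (-blended[i], i))[:top_n]

content_scores = {"x": 0.9, "y": 0.2}
collab_scores = {"y": 0.8, "z": 0.6}
print(hybrid_recommend(content_scores, collab_scores))  # ['y', 'x']
print(hybrid_recommend(content_scores, {}))             # ['x', 'y']
```

Item y wins the blend because both signals support it moderately, while the fallback branch shows how the content side covers users the collaborative side knows nothing about.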

Recommendation Systems Applications

  • E-commerce (Amazon) for product suggestions
  • Streaming platforms (Netflix, Spotify) for movie/song recommendations
  • Social media for friend or content suggestions
  • Online education for course or content recommendations

Market Basket Analysis (MBA)

  • A data mining technique used to identify patterns or relationships between items purchased together in a transaction
  • Helps businesses understand customer buying behavior

How MBA Works

  • Uses association rule mining to find combinations of items that frequently co-occur
  • Based on transactional data from point-of-sale systems or online orders
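Counting co-occurring item pairs is the first step of MBA on point-of-sale data. A minimal sketch, assuming made-up baskets and looking only at pairs; a full analysis would continue to larger itemsets and to confidence and lift.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets (one list of items per transaction).
BASKETS = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["bread", "butter", "jam"],
    ["milk", "bread", "butter"],
    ["eggs"],
]

# Count every unordered pair of distinct items that shares a basket.
pair_counts = Counter(
    pair
    for basket in BASKETS
    for pair in combinations(sorted(set(basket)), 2)
)
for (a, b), n in pair_counts.most_common(3):
    print(f"{a} + {b}: {n} of {len(BASKETS)} baskets (support {n / len(BASKETS):.0%})")
```

Sorting each basket before forming pairs makes ("bread", "milk") and ("milk", "bread") count as the same combination.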

MBA Measures

  • Support measures how frequently items appear together
  • Confidence measures the likelihood that item B is bought when item A is bought
  • Lift measures strength of the rule compared to random chance

MBA Example

  • Supermarket transaction analysis
  • Customers who buy milk and bread also buy butter
