Data Storage and Management Fundamentals Quiz

Created by
@StellarPrehistoricArt

Questions and Answers

What aspect of data is 'Volume' concerned with?

The sheer amount of data, which calls for storage repositories that can hold large amounts of data

Which factor is related to dealing with data arriving very fast?

Velocity

What does 'Veracity' mainly focus on?

Data quality, accuracy, and availability

In the Data Analytics Project Plan, what is the purpose of step 2 - 'Defining the objectives of the project'?

Defining the goals of the project

What does a 'Hyper-Parameter' refer to in data analytics?

A value set by the user or by some external optimization method, rather than learned from the data

'Small Data' can be defined as:

Data whose volume and format can be processed by a person or small organization

What is the primary goal of Descriptive Analytics?

Summarize and condense raw data to extract patterns

In Data Visualization, what does high cardinality refer to?

The uniqueness of data values contained in an attribute
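To make cardinality concrete, the minimal sketch below counts the distinct values an attribute takes. The records and the attribute names `account_id` and `gender` are hypothetical, chosen only to mirror a unique-identifier attribute and a low-variety attribute; `cardinality` is an illustrative helper, not a library function.

```python
# Hypothetical records: attribute names and values are illustrative only.
records = [
    {"account_id": "A-1001", "gender": "F"},
    {"account_id": "A-1002", "gender": "M"},
    {"account_id": "A-1003", "gender": "F"},
    {"account_id": "A-1004", "gender": "F"},
]


def cardinality(rows, attribute):
    """Return the number of distinct values `attribute` takes across `rows`."""
    return len({row[attribute] for row in rows})


print(cardinality(records, "account_id"))  # 4: every value is unique (high cardinality)
print(cardinality(records, "gender"))      # 2: values repeat heavily (low cardinality)
```

An attribute whose distinct count approaches the number of rows, like an account identifier, is high-cardinality; one with only a handful of repeating values is low-cardinality.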

What is the key consideration for Data Visualization's understandability according to the text?

The arrangement of elements in a graph

What is the primary function of a Histogram in data analytics?

Show the distribution of values for attributes by dividing them into bins
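A minimal sketch of equal-width binning, assuming the bins evenly span the range [min, max] of the data. The `histogram` helper and the `ages` values are hypothetical; each value is assigned to the bin its position falls into, and the maximum value is placed in the last bin so nothing is lost.

```python
def histogram(values, num_bins):
    """Count values in `num_bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1  # avoid a zero width when all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


ages = [23, 25, 31, 35, 38, 41, 44, 52, 58, 67]  # hypothetical attribute values
edges, counts = histogram(ages, 4)
print(edges)   # [23.0, 34.0, 45.0, 56.0, 67.0]
print(counts)  # [3, 4, 1, 2]
```

NumPy's `numpy.histogram` performs the same equal-width binning (its last bin also includes the right edge), so in practice one would normally call it rather than hand-roll the loop.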

Which visualization technique is most suitable for showing how one or more quantitative attributes change over time?

Area Chart

What is the main difference between Vertical and Horizontal Bar Charts according to the text?

Vertical Bar Charts emphasize magnitude over time, while Horizontal Bar Charts compare different qualitative categories

What is the purpose of Frequent Pattern Mining?

To discover interesting relationships hidden in large data sets

How are spurious patterns typically addressed in Frequent Pattern Mining?

By using evaluation metrics to identify patterns that occur by chance

In the context of Frequent Pattern Mining, what does the 'Support' measure refer to?

How often a rule applies to a given dataset

What does 'Association Rule Denoted by X → Y' signify in Frequent Pattern Mining?

Implication expression between itemsets X and Y

How is 'Confidence' calculated in the context of Association Rules?

As σ(X ∪ Y) / σ(X): it measures how frequently items in Y appear in transactions that contain X

What is the primary difference between 'Support' and 'Confidence' measures in data mining?

Support is about rule applicability, while Confidence is about inference reliability

What does it mean if a rule is classified as 'uninteresting'?

The items in the rule seldom occur together (the rule has low support)

In the context of rule discovery, what does higher confidence indicate?

Higher likelihood for Y to be present in transactions containing X

What is the primary drawback of using the 'Brute-Force Approach' for rule discovery?

It is computationally expensive

What does it mean to 'prune' the rules in data mining?

To disregard infrequent itemsets without computing confidence

When computing the total number of rules for a dataset containing d items, what does $R = 3^d - 2^{d+1} + 1$ represent?

The total number of possible association rules that can be extracted from the d items
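The formula counts every rule X → Y in which X and Y are non-empty, disjoint itemsets drawn from d items: each item can go to X, to Y, or to neither ($3^d$ assignments); subtracting the $2^d$ assignments with X empty and the $2^d$ with Y empty, then adding back the one assignment where both are empty, gives $3^d - 2^{d+1} + 1$. The sketch below, assuming that definition, checks the closed form against a direct enumeration for small d; both function names are illustrative.

```python
from itertools import combinations


def rule_count_formula(d):
    """Closed form for the number of possible rules over d items."""
    return 3 ** d - 2 ** (d + 1) + 1


def rule_count_brute_force(d):
    """Enumerate rules X -> Y (X, Y non-empty and disjoint) and count them."""
    count = 0
    for k in range(2, d + 1):                        # size of the itemset X ∪ Y
        for itemset in combinations(range(d), k):
            for r in range(1, k):                    # size of the antecedent X
                # Each size-r subset of the itemset is one choice of X; Y is the rest.
                count += sum(1 for _ in combinations(itemset, r))
    return count


print(rule_count_formula(6))  # 602
```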

Why are more than 80% of rules discarded when minsup = 20% and minconf = 50%?

Most candidate rules fail the support or confidence threshold, so only strong associations survive; under the brute-force approach, the computation spent on the discarded rules is wasted

What is the main focus of the A priori algorithm in data mining?

Generating frequent itemsets that satisfy a minimum support threshold

According to Theorem 1, what can be inferred about subsets of a frequent itemset?

All subsets of a frequent itemset are also frequent

What does the anti-monotone property in data mining ensure?

Support for an itemset never exceeds the support for any of its subsets

How does lowering the support threshold affect the number of itemsets declared as frequent?

Increases the number of frequent itemsets

What can be inferred about infrequent itemsets and their supersets?

All supersets of an infrequent itemset are also infrequent

How does the average number of items in a transaction impact the A priori algorithm's runtime?

A larger average number of items per transaction (wider transactions) increases runtime
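A rough way to see how transaction width affects cost: a transaction with w items contains C(w, k) itemsets of size k, and 2^w − 1 non-empty itemsets overall, each of which may have to be matched against the candidates during support counting. The helper names below are illustrative.

```python
from math import comb


def k_itemsets_in_transaction(width, k):
    """Number of size-k itemsets contained in one transaction of `width` items."""
    return comb(width, k)


def all_itemsets_in_transaction(width):
    """Number of non-empty itemsets contained in one transaction."""
    return 2 ** width - 1


for w in (3, 5, 10, 20):
    # Both counts grow quickly as transactions get wider.
    print(w, k_itemsets_in_transaction(w, 3), all_itemsets_in_transaction(w))
```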

Study Notes

Rule Discovery

  • Given a set of transactions T, find all rules having support ≥ minsup and confidence ≥ minconf.
  • Two approaches: Brute-Force and Prune the Rules.
  • Brute-Force Approach: Compute support and confidence for every possible rule, but it's computationally expensive.
  • Prune the Rules: Decouple the support and confidence requirements. Because a rule's support depends only on its itemset X ∪ Y, rules generated from infrequent itemsets can be pruned without computing their confidence.
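The brute-force approach can be sketched directly: enumerate every possible rule, compute its support and confidence, and keep those that meet both thresholds. The five transactions are a hypothetical toy dataset, the helper names are illustrative, and comparing with ≥ against the thresholds is an assumption of this sketch.

```python
from itertools import combinations

# Hypothetical toy transactions, for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def brute_force_rules(transactions, minsup, minconf):
    """Evaluate every possible rule X -> Y; keep those meeting both thresholds."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    kept, evaluated = [], 0
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sigma_xy = support_count(itemset, transactions)
            # Every non-empty proper subset X of the itemset gives a rule X -> itemset - X.
            for r in range(1, k):
                for x_items in combinations(combo, r):
                    x = frozenset(x_items)
                    evaluated += 1
                    sigma_x = support_count(x, transactions)
                    support = sigma_xy / n
                    confidence = sigma_xy / sigma_x if sigma_x else 0.0
                    if support >= minsup and confidence >= minconf:
                        kept.append((x, itemset - x, support, confidence))
    return kept, evaluated


rules, evaluated = brute_force_rules(transactions, minsup=0.2, minconf=0.5)
print(f"evaluated {evaluated} rules, kept {len(rules)}")
```

With six distinct items this evaluates all 602 possible rules. Every rule derived from the same itemset X ∪ Y has the same support, so if that itemset is infrequent, all of its rules could be discarded without computing any confidence; the brute-force loop does not exploit this.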

Rule Analysis

  • {Milk, Diapers} → {Beer} and {Beer, Milk} → {Bread} are examples of rules.
  • Support and confidence thresholds (minsup and minconf) are used to filter out uninteresting rules.
  • A rule that passes both thresholds suggests a strong co-occurrence relationship between the items in its antecedent and its consequent; co-occurrence alone does not imply causality.

Data Analytics

  • Descriptive Analytics: Summarizing and condensing raw data to extract patterns, using data aggregation and data mining to provide insight into the past.
  • Tasks: data visualization, summary measures, frequent pattern mining, and clustering.
  • Data Visualization: Display of information in a graph, picture, or table, providing an accessible way to see and understand trends, outliers, and patterns in data.

Data Visualization

  • Types of graphs: Pie chart, Bar chart, Line chart, Histogram, Box and Whiskers Plot.
  • Univariate Visualization: describes a single attribute; a Pie chart suits nominal scales with fewer than six categories.
  • Cardinality (uniqueness of data values): High cardinality (e.g., bank account numbers), Low cardinality (e.g., Gender).

Frequent Pattern Mining

  • Also known as “Association Rule Mining” or “Market-Basket Analysis”.
  • Used to discover interesting relationships hidden in large data sets.
  • Challenges: Spurious patterns, computationally expensive.
  • Applications: Supermarket Purchases, Banking Services, Insurance Claims, Earth Science Data, Medical Patient Histories.

Association Rules

  • X → Y: Implication expression, where X is the antecedent and Y is the consequent.
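  • Support count σ(X): number of transactions containing itemset X. The support of a rule is s(X → Y) = σ(X ∪ Y) / N, where N is the total number of transactions.
  • Confidence c(X → Y) = σ(X ∪ Y) / σ(X): measures the reliability of the inference made by a rule and approximates the conditional probability of Y given X.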
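Support and confidence can be computed directly from a list of transactions. A minimal sketch, assuming each transaction is a Python set; the transactions are hypothetical, and `sigma` and `rule_metrics` are illustrative names that mirror the σ notation.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def sigma(itemset, transactions):
    """Support count: number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(x, y, transactions):
    """Return (support, confidence) of the rule X -> Y."""
    both = sigma(x | y, transactions)
    sigma_x = sigma(x, transactions)
    if sigma_x == 0:
        raise ValueError("confidence is undefined when X never occurs")
    support = both / len(transactions)  # s(X -> Y) = sigma(X ∪ Y) / N
    confidence = both / sigma_x         # c(X -> Y) = sigma(X ∪ Y) / sigma(X)
    return support, confidence


s, c = rule_metrics({"Milk", "Diapers"}, {"Beer"}, transactions)
print(f"support={s:.2f}, confidence={c:.2f}")  # support=0.40, confidence=0.67
```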

A priori Algorithm

  • Step 1: Frequent Itemset Generation: find all itemsets that satisfy the minsup threshold.
  • A priori Principle: Theorem 1 (if an itemset is frequent, then all of its subsets must also be frequent) and Theorem 2 (if a rule X → Y − X does not satisfy minconf, then no rule X′ → Y − X′, where X′ is a subset of X, satisfies minconf either).
  • Factors Affecting the Computational Complexity of the A priori Algorithm: Support Threshold, Dimensionality (number of items), Number of Transactions, Average Transaction Width.
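A compact sketch of Step 1 (frequent itemset generation) in the style of the A priori algorithm: pairs of frequent (k−1)-itemsets are joined into size-k candidates, candidates with an infrequent subset are pruned using Theorem 1, and the survivors are counted against the transactions. This is a simplified illustration under stated assumptions (transactions as Python sets, minsup as a fraction of the number of transactions, a plain scan per candidate for support counting); the toy data is hypothetical.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {frequent itemset (frozenset): support count} for the given minsup.

    Simplified sketch: transactions are Python sets and minsup is a fraction
    of the number of transactions (e.g. 0.6 for 60%).
    """
    min_count = minsup * len(transactions)

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets whose union has k items.
        prev = list(frequent)
        candidates = {
            prev[i] | prev[j]
            for i in range(len(prev))
            for j in range(i + 1, len(prev))
            if len(prev[i] | prev[j]) == k
        }
        # Candidate pruning (Theorem 1): drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            cand for cand in candidates
            if all(frozenset(sub) in frequent for sub in combinations(cand, k - 1))
        }
        # Support counting: one scan of the transactions per candidate.
        counts = {cand: sum(1 for t in transactions if cand <= t) for cand in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy transactions, for illustration only.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

frequent = apriori(transactions, minsup=0.6)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With minsup = 0.6 on this toy data (a support count of at least 3), only one size-3 candidate, {Bread, Diapers, Milk}, survives pruning, and it turns out to be infrequent, so the search stops after the pairs. Lowering minsup can only add frequent itemsets, never remove them.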
