Data Mining Concepts

Questions and Answers

Which of the following BEST describes the primary function of data mining?

  • Managing and organizing large databases efficiently.
  • Storing historical data for future reference.
  • Predicting future trends and behaviors to facilitate proactive decision-making. (correct)
  • Reporting past performance and generating summaries.

Business intelligence and data warehousing commonly support which activity?

  • Encrypting sensitive data.
  • Managing network security.
  • Forecasting future sales trends. (correct)
  • Designing user interfaces.

In the context of decision trees, where are classification rules typically extracted from?

  • Sibling nodes.
  • The root node.
  • The entire decision tree structure. (correct)
  • Leaf nodes.

Which of the following BEST describes dimensionality reduction?

  • Removing unimportant attributes to reduce data set size. (correct)

What condition defines class conditional independence?

  • The effect of one attribute value on a given class is independent of the values of other attributes. (correct)

Which data transformation process aims to reduce the number of attributes in a dataset?

  • Projection. (correct)

Customer Relationship Management (CRM) systems are MOST closely related to which technology area?

  • Personalization. (correct)

Which of the following is NOT typically associated with the data cleaning process?

  • Segmentation. (correct)

What type of models does data mining MOST often strive to build?

  • Predictive. (correct)

The process of determining the most common purchase among customers is known as:

  • Association. (correct)

What is the MOST significant strategic value offered by data mining?

  • Time-sensitive decision-making. (correct)

What does the acronym 'KDD' stand for?

  • Knowledge Discovery in Databases. (correct)

What data quality issue is addressed by removing duplicate records from a dataset?

  • Data cleaning. (correct)

Discovery of cross-sales opportunities is called:

  • Association. (correct)

The ability of a self-learning system to adapt and improve over time is PRIMARILY dependent on its:

  • Simplicity. (correct)

Flashcards

Data Mining

Predicts future trends & behaviors, enabling proactive decisions.

Business Intelligence and Data Warehousing

Used for forecasting and analyzing large data volumes.

Decision Tree

Classification rules originate from this data structure.

Dimensionality Reduction

Reduces dataset size by removing irrelevant attributes.

Class Conditional Independence

Effect of one attribute is independent of other attribute values on a class.

CRM (Customer Relationship Management)

Associated with specialization, generalization and personalization.

Data Mining

Capability to construct predictive models.

Preferencing

Process of determining customer's majority preference.

Time-sensitive

Strategic benefit of extracting timely information from data.

Data Cleansing

Process of eliminating duplicate entries.

Highly Summarized Data

Process of data distillation from low-level detail.

Exploratory Data Analysis

Another name for data mining.

Regression

Data mining function for predicting numeric values.

Descriptive Model

A model that identifies patterns or relationships.

Outliers

Extreme values that occur infrequently.

Study Notes

Data Mining Basics

  • Data mining predicts future trends and behaviors, enabling proactive, knowledge-driven decision-making for business managers.
  • Business Intelligence and data warehousing facilitate the analysis of large data volumes.

Classification and Attributes

  • Classification rules are extracted from the decision tree structure.
  • Dimensionality reduction decreases data set size by eliminating irrelevant attributes.
  • Class conditional independence holds when the effect of an attribute value on a given class is independent of the values of the other attributes.
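Class conditional independence is the assumption behind the naive Bayes classifier: the probability of a whole record given a class is taken to be the product of the per-attribute probabilities. The sketch below illustrates that product on a tiny, made-up data set. The function names (`fit_naive_bayes`, `class_scores`) and the weather-style values are hypothetical, and no smoothing is applied, so an unseen value drives a class score to zero.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Estimate class counts and per-class attribute value counts."""
    class_counts = Counter(labels)
    # cond_counts[(class, attribute_index)][value] = number of matching rows
    cond_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(label, i)][value] += 1
    return class_counts, cond_counts

def class_scores(row, class_counts, cond_counts):
    """Score each class as P(class) times the product of P(value_i | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for i, value in enumerate(row):
            # Independence assumption: each attribute contributes its own factor.
            score *= cond_counts[(label, i)][value] / n
        scores[label] = score
    return scores

# Hypothetical toy data: (outlook, temperature) -> class label
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
class_counts, cond_counts = fit_naive_bayes(rows, labels)
scores = class_scores(("rainy", "mild"), class_counts, cond_counts)
```

The class with the highest score would be the predicted class for the record.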

Data Transformation and CRM

  • Projection is a data transformation process that reduces the number of attributes in a dataset.
  • Personalization is a technology area linked to Customer Relationship Management (CRM).

Data Cleaning and Mining Capabilities

  • Segmentation is not part of the data cleaning process.
  • Data mining's ability to build predictive models is a core capability.

Customer Preference and Data Mining Value

  • Preferencing determines customer majority preferences.
  • The strategic value of data mining lies in time-sensitive decision-making.

Knowledge Discovery and Data Handling

  • KDD expands to Knowledge Discovery in Databases.
  • Removing duplicate records aligns with data cleaning/cleansing.

Data Distillation and Modeling

  • Association uncovers cross-sales opportunities.
  • Self-learning systems are powerful due to their accuracy.
  • Highly summarized data is distilled from detailed levels and is compact and easily accessible.
  • Transaction is not a primary grain in analytical modeling.

Data Mining Synonyms and Models

  • Exploratory data analysis is another term for data mining.
  • Regression constitutes a predictive model, while association rules are descriptive.

Regression and Model Types

  • Regression predicts numeric values along a continuum.
  • A descriptive model, like association rules, identifies patterns or relationships.
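As a minimal illustration of regression predicting a numeric value along a continuum, the sketch below fits a straight line by ordinary least squares in plain Python. The function name `fit_line` and the sample numbers are made up for the example. Real data mining tools offer many other regression models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data used to build the predictive model
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Predict a numeric value for an unseen input
predicted = slope * 5 + intercept
```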

Predictive Models and Data Mapping

  • Predictive models utilize historical data.
  • Classification maps data into predefined groups.

Data Analysis Over Time

  • Regression maps data items to real-valued prediction variables.
  • Time series analysis examines attribute values as they vary over time.

Grouping Data

  • Clustering involves non-predefined groups.
  • Link analysis is also called affinity analysis.

Knowledge Discovery Inputs & Outputs

  • Data is the input to KDD, and useful information is the output.
  • The KDD process consists of six steps.

Data Handling

  • Handling inaccurate or missing data is part of preprocessing.
  • Transformation converts data from different sources into a common format for processing.

Visualisation and values

  • Various visualization techniques are used in the interpretation step of KDD.
  • Extreme values that occur infrequently are called outliers.
  • Box plots and scatter diagrams are graphical techniques.
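One common way a box plot flags outliers is the 1.5 × IQR rule. That rule is a convention assumed here, not something stated in these notes. The sketch below applies it with the standard library. The function name `iqr_outliers` and the sample values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one extreme, infrequent value
values = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(values)
```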

Knowledge Induction

  • Induction moves from specific knowledge to general information.
  • Summarization describes data characteristics using a general model.

Data Uncovering & Requirements

  • Summarization reveals hidden data information.
  • Users are needed to identify both the training data and the results.

Model Fit

  • Overfitting occurs when a model fits the training data closely but does not fit future states.
  • The curse of dimensionality arises when attributes interfere with data mining tasks or increase complexity.
  • Incorrect or invalid data is called noisy data.

Investment and Data

  • ROI stands for return on investment.
  • Unauthorized data use risks disclosing confidential information.

Data States and Metrics

  • Real-world data is noisy with many missing values.
  • Return on investment (ROI) is not a data mining metric.

Dimensionality and Interest

  • Dimensionality reduction reduces attributes to address high dimensionality.
  • Data not of interest to the data mining task is irrelevant data.

Scalability

  • Sampling and parallelization are effective ways to address the scalability problem.
  • Data mining supports inventory, sales promotions, and marketing strategies.

Transaction Proportions and Counts

  • The proportion of transactions supporting X in T is called support.
  • The absolute number of transactions supporting X in T is called support count.

Transaction Support Value and Rule Sides

  • Confidence is the proportion of transactions supporting X that also support Y.
  • In association rules, the left-hand side is called the antecedent, and the right-hand side is the consequent.
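Support count, support, and confidence can be computed directly from a list of transactions. The sketch below is a minimal illustration for a rule X → Y. The function names and the grocery items are made up for the example.

```python
def support_count(itemset, transactions):
    """Absolute number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Proportion of transactions that contain itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions supporting the antecedent, the share also supporting the consequent."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_count = support_count({"bread"}, transactions)
bread_milk_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # bread -> milk
```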

Algorithm Efficiency

  • A less efficient algorithm is characterized by maximal code length.
  • A frequent set is an itemset whose support is at least the user-specified minimum support.

Data Structures

  • If a frequent set has no frequent supersets, it's a maximal frequent set.
  • Any subset of a frequent set is also frequent (Downward closure property).
  • Any superset of an infrequent set is infrequent (Upward closure property).
  • An itemset that is not frequent but whose proper subsets are all frequent is called a border set.
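The definitions above can be checked by brute force on a small example. The sketch below enumerates every itemset, so it is only practical for a handful of items. It takes a border set to be an itemset that is not frequent but whose proper subsets are all frequent, and it treats the empty set as frequent. Both choices are stated here as assumptions. The function names and the data are hypothetical.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def classify_itemsets(transactions, min_count):
    """Return (frequent, maximal frequent, border) itemsets by exhaustive search."""
    items = set().union(*transactions)
    frequent = {
        s for s in all_itemsets(items)
        if sum(1 for t in transactions if s <= t) >= min_count
    }
    # Maximal frequent set: frequent with no frequent proper superset.
    maximal = {s for s in frequent if not any(s < other for other in frequent)}
    # Border set: not frequent, yet every immediate subset is frequent.
    # By downward closure, checking subsets one item smaller is enough.
    border = {
        s for s in all_itemsets(items)
        if s not in frequent
        and all(len(s) == 1 or (s - {item}) in frequent for item in s)
    }
    return frequent, maximal, border

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
frequent, maximal, border = classify_itemsets(transactions, min_count=2)
```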

A Priori Algorithm

  • The Apriori algorithm is also known as the level-wise algorithm.
  • Apriori performs a bottom-up, breadth-first search.
  • Candidate generation and itemset generation are phases of the Apriori algorithm.
  • Pruning eliminates extensions of infrequent itemsets.
  • The Apriori frequent itemset discovery algorithm moves upwards in the lattice.
  • After the pruning step of Apriori, only candidate sets remain.
  • The number of iterations in Apriori increases with both the size of the maximum frequent set and the size of the data.
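The level-wise idea can be sketched in a few lines: start from frequent single items, join them into larger candidates, prune candidates that extend an infrequent itemset, then count the survivors. This is a simplified illustration, not an optimized implementation. The function name `apriori`, the join strategy, and the data are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frequent itemset: support count}, discovered level by level."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Count how many transactions contain each candidate of this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = keep_frequent({frozenset([item]) for item in items})
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Pruning: discard candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = keep_frequent(candidates)
        k += 1
    return frequent

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
result = apriori(transactions, min_count=2)
```

Each pass of the loop handles itemsets one item larger than the previous pass, which is why the number of iterations grows with the size of the largest frequent set.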

Abbreviation

  • MFCS expands to Maximum Frequent Candidate Set.
  • Itemsets in dashed structures, which are still being counted, carry a counter and a stop number.
  • Itemsets in solid structures are not subjected to any further counting.

Dashed Circles

  • Certain itemsets in dashed circles move into dashed boxes once they reach sufficient support.
  • Itemsets that enter the system afresh go into dashed circles; they are essentially the supersets of the itemsets that move from the dashed circle to the dashed box.
  • Itemsets that have completed a full pass move from a dashed circle to a solid circle.

FP Growth phases & Data structures

  • FP-growth algorithm has two phases.
  • A frequent pattern tree consists of an item-prefix-tree and a frequent-item-header table.
  • The non-root node of item-prefix-tree consists of three fields.
  • The frequent-item-header-table consists of two fields.
  • Paths from the root node to nodes labeled 'a' are called transformed prefix paths.
  • Transformed prefix paths of node 'a' form a truncated database of patterns co-occurring with 'a', creating the conditional pattern base.
  • Clustering aims to discover dense and sparse regions within a dataset.
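The FP-tree structures described above can be sketched as follows. The three node fields are taken here to be item name, count, and node-link, and the header table maps each item name to the head of its node-link chain. These field names follow the usual textbook presentation and are an assumption, since the notes give only the number of fields. The `parent` and `children` attributes are implementation bookkeeping, and all names and data are hypothetical.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # field 1: item name
        self.count = 0          # field 2: number of transactions through this node
        self.node_link = None   # field 3: next node carrying the same item
        self.parent = parent    # bookkeeping for walking prefix paths
        self.children = {}      # bookkeeping for insertion

def build_fp_tree(transactions, min_count):
    """Phase 1: build the item-prefix tree and the frequent-item header table."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    header = {}  # item name -> head of node-link chain
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(t) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.node_link = header.get(item)
                header[item] = child
            child.count += 1
            node = child
    return root, header

def conditional_pattern_base(item, header):
    """Collect the transformed prefix paths of item, each with the node's count."""
    base = []
    node = header.get(item)
    while node is not None:
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.node_link
    return base

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
root, header = build_fp_tree(transactions, min_count=2)
base_c = conditional_pattern_base("c", header)
```

The second phase of FP-growth would mine each conditional pattern base recursively; that step is omitted here.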

Clustering

  • Clustering is used for genetic algorithms.
  • CLARA is an algorithm used for clustering.
  • Agglomerative clustering starts with as many clusters as there are records, with one record per cluster.
  • Divisive clustering techniques start with all records in one cluster and then split it into pieces.
  • MUSHROOM is a dataset in machine-learning repositories.
  • In k-means, a cluster is represented by its center of gravity.
  • In k-medoids, a cluster is represented by one of its objects that lies near its center.
  • PAM is a k-medoid algorithm.
  • BIRCH is a hierarchical clustering algorithm.
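The difference between the k-means and k-medoid representatives can be seen on a single cluster. In the sketch below the centroid is the coordinate-wise mean and need not be an actual object, while the medoid is the member with the smallest total distance to the others. Euclidean distance, the function names, and the points are assumptions chosen for the example.

```python
def centroid(points):
    """Center of gravity: the coordinate-wise mean of the cluster's points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

def medoid(points):
    """The cluster member with the smallest total Euclidean distance to all members."""
    def total_distance(p):
        return sum(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for q in points
        )
    return min(points, key=total_distance)

# Hypothetical cluster with one far-away object
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0), (10.0, 10.0)]
center = centroid(cluster)
representative = medoid(cluster)
```

Because the medoid must be one of the objects, it is pulled less by the far-away point than the centroid is.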

Algorithms and Clustering

  • CLARANS expands to Clustering Large Applications based on RANdomized Search.
  • BIRCH constitutes a hierarchical-agglomerative algorithm.
  • Cluster features of subclusters are maintained in a CF tree (Clustering Feature Tree).

A Priori Algorithm

  • The Apriori algorithm relies on the observation that frequent sets are normally very few in number compared with the set of all itemsets.
  • Clustering and association rules are data analysis techniques.

Data Scans & Algorithm

  • The partition algorithm uses two database scans to discover all frequent sets.

K-means and Neural Networks

  • The Apriori algorithm generates candidate itemsets and scans the database.
  • Apriori is the best-known and most commonly used association rule algorithm.
  • Apriori-gen generates candidate itemsets after the first pass.
  • The partition algorithm reduces the number of database scans to two by dividing the database into partitions.
  • Estimation and prediction may be viewed as types of classification.
  • Prediction focuses on attribute values in possible classes.
  • Training data includes sample input data and classification assignments.
  • Neural networks draw inspiration from neuroscience for computing.

Neuron Connectivity

  • The human brain consists of a network of neurons.
  • Each neuron has a number of nerve fibres called dendrites.
  • An axon fibre originates from the cell body.
  • A single axon makes thousands of synapses with other neurons.
  • Signal transmission between neurons is a complex chemical process.
  • The connectivity of neurons gives these simple devices their real power.
  • Artificial neurons are simplified models of biological neurons.
  • The biological neuron's output is a continuous function rather than a step function.
  • The continuous functions that replace the threshold function are called activation functions.
  • The sigmoid function is also known as the logistic function.
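The contrast between a threshold (step) function and a continuous activation function can be shown with a single artificial neuron. The sketch below is a simplified model; the function names, weights, and inputs are made up for the example.

```python
import math

def step(x, threshold=0.0):
    """Threshold function: output jumps from 0 to 1 at the threshold."""
    return 1.0 if x >= threshold else 0.0

def sigmoid(x):
    """Logistic (sigmoid) function: a smooth, continuous output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# Hypothetical inputs and weights; the weighted sum here is exactly 0.
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
out_step = neuron(inputs, weights, 0.0, step)
out_sigmoid = neuron(inputs, weights, 0.0, sigmoid)
```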

Abbreviation & Architecture

  • MLP stands for multilayer perceptron, a perceptron network with many layers.
  • Connections in a feed-forward network are unidirectional.
  • The network topology is constrained to be feed-forward.

Functions

  • RBF stands for radial basis function.
  • RBF networks have only one hidden layer.
  • An RBF network may be used when a clear link between input data sets and target output values does not exist.
  • The hidden-layer units of an RBF network have a receptive field.
  • MLP (multilayer perceptron) is the most widely applied neural network technique.

Map and models

  • SOM is an acronym for self-organizing map; SOMs are among the most popular neural network models in the unsupervised framework.
  • The actual amount of reduction at each learning step may be guided by the learning rate.
  • The SOM is a neural network model developed by Teuvo Kohonen during 1980-90.
  • In investment analysis, neural networks are used to predict the movement of stocks.
  • Moths Medical Dataset
  • Genetic algorithm which is general algorithm called

Genetic Algorithms

  • The genetic algorithm was introduced in 1975.
  • Genetic algorithms are search algorithms based on the mechanics of nature.
  • GA systems were developed in early
  • The RSES system was developed in Poland.
  • Crossover recombines the population's genetic material.
  • New genetic population
  • Mutation creates new genetic structures.
  • Genetic operators include crossover, mutation, and inversion.
  • LERS is a rule induction system.
  • NLP is the acronym for Natural Language Processing.
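The crossover and mutation operators listed above can be sketched on bit-string individuals. The example below uses one-point crossover, bit-flip mutation, and a simple keep-the-better-half selection. All of these choices, along with the function names and the toy fitness function (number of 1 bits), are assumptions made for illustration.

```python
import random

def crossover(parent_a, parent_b, rng):
    """One-point crossover: swap the tails of two parents at a random cut point."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def mutate(bits, rate, rng):
    """Flip each bit independently with probability rate."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(population, fitness, generations, rate, rng):
    """Build a new population each generation from the fitter half of the old one."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]
        children = []
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            for child in crossover(a, b, rng):
                children.append(mutate(child, rate, rng))
        population = children[: len(population)]
    return max(population, key=fitness)

rng = random.Random(0)  # fixed seed so runs are repeatable
population = [[rng.randint(0, 1) for _ in range(10)] for _ in range(8)]
best = evolve(population, fitness=sum, generations=20, rate=0.05, rng=rng)
```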

Web-based learning

  • Web content mining is the mining of web content.
  • Research in web content mining extends to multimedia data.
  • Web structure mining is concerned with discovering the model underlying the link structures of the web.
  • is the way of studying the web link structure.
  • A measure of the standing of a node has been proposed, based on path counting.
  • In web mining, clustering finds natural groups, and sequence analysis examines the order in which URLs tend to be requested.
  • Web content mining describes content and structure through models that can be used in practical forms such as maps, charts, and other representations, which allow a compressed form.
