Data Mining

WinningTropicalRainforest

Questions and Answers

What is one of the reasons for the enormous data growth in commercial and scientific databases?

Advances in data generation and collection technologies

Which company is mentioned as having petabytes of web data?

Yahoo

What is one example of the competitive pressure mentioned in the text?

Providing better, customized services for an edge in Customer Relationship Management

What is the new mantra (slogan) mentioned in the text regarding data gathering?

Gather whatever data you can whenever and wherever possible

What is the primary purpose of data mining?

To automate analysis of massive datasets

Which fields contribute ideas to data mining?

Machine learning, AI, and statistics

What are examples of classification tasks in data mining?

Categorizing news stories and predicting tumor cells

What are some applications of classification tasks in data mining?

Fraud detection in credit card transactions and churn prediction for telephone customers

What is the primary goal of data mining in the context of sky survey cataloging?

To predict the class of sky objects based on telescopic survey images

What are the tasks involved in data mining?

Prediction methods and finding human-interpretable patterns in data

What is the volume of earth science data archived by NASA EOSDIS per year?

On the order of petabytes

What does data mining involve?

The nontrivial extraction of potentially useful information from data

What potential opportunities does data mining offer?

Improving productivity and solving societal problems

What are examples of tasks in data mining?

Prediction methods and finding human-interpretable patterns in data

What is the focus of classification tasks in data mining?

Identifying intruders in cyberspace and predicting credit worthiness

What is the primary use of data mining in fraud detection?

Detecting fraud in credit card transactions

What is an application of association rule discovery in data mining?

Market-basket analysis

What is the primary purpose of deviation/anomaly/change detection in data mining?

Detecting significant deviations from normal behavior
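Detecting a "significant deviation from normal behavior" requires some definition of normal. The minimal sketch below makes one assumption: normal behavior is summarized by the mean and standard deviation of a single numeric attribute. It flags any value whose z-score exceeds a threshold. The function name `zscore_anomalies`, the 2.0 threshold, and the spending figures are all hypothetical. Real fraud or intrusion detectors are considerably more elaborate.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose |z-score| exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # every value is identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical daily card spending for one customer; the last day is unusual.
spend = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 250.0]
print(zscore_anomalies(spend, threshold=2.0))  # [6]
```

With only seven values, no population z-score can exceed sqrt(6), which is about 2.45. That is why this example uses 2.0 rather than the often-quoted 3.0.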

What is the main focus of clustering in data mining?

Finding groups of similar objects

In the galaxy classification example, what is the class based on?

Stages of formation (early, intermediate, late)

What is the primary use of regression in data mining?

Predicting continuous-valued variables
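Regression predicts a continuous value, such as a sales amount, from other variables. The simplest case fits a straight line by ordinary least squares. The sketch below assumes a single predictor. The name `fit_line` and the numbers are illustrative and do not come from the text.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one predictor.

    Requires at least two distinct x values (otherwise sxx is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) and resulting sales amounts (y).
ads = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ads, sales)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # predicted sales at x = 5.0: 11.0
```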

What are the motivating challenges in data mining?

Scalability, high dimensionality, heterogeneous and complex data

What is an example of association analysis mentioned in the text?

Subspace differential coexpression pattern enriched with the TNF/NF-κB signaling pathway related to lung cancer

What is an application of clustering in data mining?

Custom profiling for targeted marketing

What does market segmentation aim to achieve in data mining?

Subdividing a market into distinct subsets of customers

What is the primary application of deviation/anomaly/change detection in data mining?

Credit card fraud detection

What is an example of a regression task in data mining?

Predicting sales amounts

What are the applications of association rule discovery in data mining?

Market-basket analysis, telecommunication alarm diagnosis, medical informatics
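Association rules are usually judged by two measures. Support is how often all items in the rule appear together. Confidence is how often the consequent appears when the antecedent does. The sketch below computes both for a rule such as {Milk, Diaper} → {Beer}. The basket contents and the function names are illustrative assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical market-basket transactions, one set of items per basket.
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
print(support({"Milk", "Diaper", "Beer"}, baskets))                 # 0.4
print(round(confidence({"Milk", "Diaper"}, {"Beer"}, baskets), 2))  # 0.67
```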

Which type of attribute captures only the order properties of length?

Ordinal

What type of attribute has distinctness, order, and addition properties?

Interval

Which type of attribute has all four properties: distinctness, order, addition, and multiplication?

Ratio

Which type of attribute permits any permutation of values as a transformation?

Nominal

Which type of attribute permits transformations of the form new_value = a * old_value + b?

Interval
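Temperature in Celsius or Fahrenheit is a standard interval attribute. Converting between the two is exactly a transformation of the form new_value = a * old_value + b, with a = 9/5 and b = 32. The sketch below shows why this kind of transformation is permissible for interval data. Ratios of differences are preserved, but ratios of raw values are not. The function name and the temperatures are illustrative.

```python
def celsius_to_fahrenheit(c):
    """Interval-scale transformation new_value = a * old_value + b with a = 9/5, b = 32."""
    return c * 9 / 5 + 32


temps_c = [10.0, 20.0, 30.0]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]
print(temps_f)  # [50.0, 68.0, 86.0]

# Ratios of differences are unchanged by the transformation...
print((temps_c[2] - temps_c[0]) / (temps_c[1] - temps_c[0]))  # 2.0
print((temps_f[2] - temps_f[0]) / (temps_f[1] - temps_f[0]))  # 2.0

# ...but ratios of values change, so "20 C is twice as warm as 10 C" is not meaningful.
print(temps_c[1] / temps_c[0], temps_f[1] / temps_f[0])  # 2.0 1.36
```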

What type of attribute is typically represented as floating-point variables?

Continuous

Which type of attribute has only a finite or countably infinite set of values?

Discrete

What type of attribute has real numbers as attribute values?

Continuous

Which type of attribute is one for which only the presence of a non-zero value is regarded as important?

Asymmetric

What type of attribute provides enough information to order objects but supports neither addition nor multiplication?

Ordinal

What type of attribute encompasses only the order properties of length?

Ordinal

Which type of attribute permits transformations of the form new_value = f(old_value), where f is a monotonic function?

Ordinal

What is the purpose of aggregation in data preprocessing?

Data reduction

What is the key principle for effective sampling in data mining?

Using a sample that is representative of the original data

What is the main drawback of high dimensionality in data mining?

Increased sparsity of data

What is the primary goal of dimensionality reduction in data mining?

Avoiding the curse of dimensionality

What technique in dimensionality reduction aims to capture the largest amount of variation in data?

Principal Component Analysis (PCA)

What is the major issue when merging data from heterogeneous sources?

Duplicate data

What is the purpose of feature subset selection in data preprocessing?

Reducing dimensionality

What is the technique that combines two or more attributes into a single attribute in data preprocessing?

Aggregation

What is the primary reason for using sampling in data mining?

Processing the entire data set is too expensive or time-consuming

What is the main purpose of data cleaning in data preprocessing?

Dealing with duplicate data issues

What is the characteristic of a sample that makes it representative of the original data set?

Having approximately the same properties of interest as the original data

What is the purpose of stratified sampling in data mining?

Balancing representation of different groups in the sample
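Stratified sampling splits the data into groups (strata) and samples from each group separately. This keeps small groups from vanishing out of the sample. The minimal sketch below draws the same fraction from every stratum. Two of its behaviors are assumptions, not rules from the text: it keeps at least one record per group, and it uses a fixed seed for reproducibility. The customer records are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from each stratum defined by key(record)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    strata = defaultdict(list)
    for r in records:
        strata[key(r)].append(r)
    sample = []
    for group in strata.values():
        k = max(1, round(fraction * len(group)))  # assumption: keep at least one per group
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical customer records: 90 "regular" and 10 "premium".
customers = [("regular", i) for i in range(90)] + [("premium", i) for i in range(10)]
s = stratified_sample(customers, key=lambda r: r[0], fraction=0.1)
print(sum(1 for r in s if r[0] == "regular"), sum(1 for r in s if r[0] == "premium"))  # 9 1
```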

Which type of data set represents data objects as points in a multi-dimensional space?

Data matrix

What does noise refer to in the context of data quality problems?

Modification of original values

Which of the important characteristics of data (dimensionality, sparsity, resolution, and size) refers to the number of attributes?

Dimensionality

What type of data involves sequences of transactions, genomic sequence data, and spatio-temporal data?

Ordered data

What type of attributes does association analysis typically use?

Asymmetric attributes

What does transaction data involve?

Sets of items in each record

What is an example of graph data mentioned in the text?

A molecule

What do missing values refer to in the context of data quality problems?

Information not being collected

How is document data represented?

Each document becomes a term vector

What type of data set consists of a collection of records, each with a fixed set of attributes?

Record data

What are the important characteristics of data mentioned in the text?

Dimensionality, sparsity, resolution, and size

What is the impact of poor data quality on data processing efforts mentioned in the text?

It negatively affects data processing efforts

Which technique involves creating new attributes to capture essential information more efficiently?

Feature Creation

What does discretization involve?

Converting continuous attributes into ordinal attributes

Which technique maps continuous or categorical attributes into one or more binary variables?

Binarization
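One common binarization scheme for a categorical attribute is one-hot encoding. Each distinct value gets its own binary variable, and exactly one of those variables is 1 per record. Other encodings, such as packing the category index into fewer bits, also exist. The sketch below is illustrative, and the function name `one_hot` is hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a binary vector with a single 1 for its category."""
    categories = sorted(set(values))  # fixed column order: alphabetical
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


cats, rows = one_hot(["good", "awful", "ok", "good"])
print(cats)  # ['awful', 'good', 'ok']
print(rows)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```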

What does attribute transformation involve?

Mapping the entire set of attribute values to a new set of replacement values

What are the various methods for discretization without using class labels?

Equal interval width, equal frequency, and K-means approaches
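The first two unsupervised methods can be sketched in a few lines of Python. The K-means approach is not shown. The function names and data are hypothetical. The single outlier in the data shows how equal-width bins can leave most values crowded into one interval, while equal-frequency bins stay balanced.

```python
def equal_width_bins(values, k):
    """Label each value with one of k intervals of equal width spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # no spread: everything falls in one bin
    width = (hi - lo) / k
    # The maximum would land in bin k, so clamp it into the last bin, k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Label each value with one of k bins holding (roughly) equal numbers of values.

    Tied values may be split across neighboring bins in this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(equal_width_bins(data, 3))      # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(equal_frequency_bins(data, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```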

What is used for discretization using class labels?

Entropy-based approach
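An entropy-based approach chooses split points so that the resulting intervals are as class-pure as possible. The minimal sketch below finds the single binary cut that minimizes the size-weighted entropy of the two intervals. A full method would typically apply this recursively with some stopping criterion, which is not shown. The petal-length values and function names are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (cut, score): the cut point minimizing weighted entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            best_score = score
    return best_cut, best_score


# Hypothetical petal lengths with two classes.
x = [1.4, 1.3, 1.5, 4.5, 4.7, 4.9]
y = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]
print(best_split(x, y))  # (3.0, 0.0)
```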

What does attribute transformation include?

Standardization and normalization techniques
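"Normalization" is used loosely in the literature. One common form is min-max scaling, which linearly maps an attribute's range onto a target interval such as [0, 1]. The sketch below assumes that interpretation. The function name and values are illustrative.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min] * len(values)  # constant attribute: nothing to rescale
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


print(min_max([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```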

What is the Iris Sample Data Set primarily composed of?

Three flower types and four non-class attributes

What is the primary purpose of feature subset selection?

Eliminating redundant and irrelevant features

What does mapping data to a new space involve?

Employing techniques like Fourier and wavelet transforms

What is the process of converting continuous attributes into ordinal attributes often used in?

Classification

What is binarization typically used for?

Association analysis

What is another term for an attribute in the context of data mining?

Field

What is the primary purpose of discretization in data mining?

To transform continuous attributes into categorical attributes

Which term describes a collection of attributes that describe an object in data mining?

Record

What is the distinction between attributes and attribute values?

The same attribute can be mapped to different attribute values

What does the measurement-of-length example illustrate?

That the way an attribute is measured may not match the attribute's properties

What is the term for the numbers or symbols assigned to an attribute in data mining?

Attribute values

What is the primary use of attribute values in data mining?

To represent the properties or characteristics of data objects

What term is used in data mining to describe a property or characteristic of an object?

Attribute

What is the formula for Euclidean Distance?

$\mathrm{dist} = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$
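The formula translates directly into Python. Here `p` and `q` are two data objects with the same `n` attributes, and the function name `euclidean` is a hypothetical choice.

```python
from math import sqrt


def euclidean(p, q):
    """dist = sqrt(sum over k of (p_k - q_k)^2) across the n attributes of p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


print(euclidean((0, 0), (3, 4)))        # 5.0
print(euclidean((1, 2, 3), (2, 4, 5)))  # 3.0
```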

What is the purpose of standardization in statistics?

To remove the mean and divide by the standard deviation
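Standardization (the z-score) subtracts the mean and divides by the standard deviation, so the result has mean 0 and standard deviation 1. The sketch below uses the population standard deviation (`statistics.pstdev`). That choice is an assumption: using the sample standard deviation would give slightly different values.

```python
from statistics import mean, pstdev


def standardize(values):
    """z-score standardization: subtract the mean, then divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0] * len(values)  # constant attribute: every value equals the mean
    return [(v - mu) / sigma for v in values]


print(standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```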

What is the typical range of similarity values?

Often [0, 1]

What transformation maps dissimilarity values of 0, 1, 10, 100 to similarity values?

$s = 1/(1 + d)$, which yields similarity values of 1, 0.5, 0.09, and 0.01, respectively
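The transformation s = 1/(1 + d) maps a dissimilarity of 0 to a similarity of 1. As the dissimilarity grows, the similarity decays toward 0. The sketch below reproduces the four values quoted above after rounding to two decimals. The function name is hypothetical.

```python
def dissim_to_sim(d):
    """Map a non-negative dissimilarity d to a similarity in (0, 1] via s = 1 / (1 + d)."""
    return 1 / (1 + d)


print([round(dissim_to_sim(d), 2) for d in (0, 1, 10, 100)])  # [1.0, 0.5, 0.09, 0.01]
```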

What is the formula for Minkowski Distance?

$\mathrm{dist} = \left(\sum_{k=1}^{n} |p_k - q_k|^r\right)^{1/r}$

What type of distance is calculated by setting r = 1 in the Minkowski Distance formula?

City block (Manhattan, taxicab, L1 norm) distance
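In the Minkowski formula, r = 1 gives city block distance and r = 2 gives Euclidean distance. The limiting case r → ∞ is the standard supremum (L_max) distance. The sketch below handles all three. The function name and points are illustrative.

```python
def minkowski(p, q, r):
    """Minkowski distance (sum_k |p_k - q_k|^r)^(1/r); r = 1 is city block, r = 2 is Euclidean."""
    if r < 1:
        raise ValueError("r must be >= 1")
    if r == float("inf"):
        return max(abs(pk - qk) for pk, qk in zip(p, q))  # supremum (L_max) distance
    return sum(abs(pk - qk) ** r for pk, qk in zip(p, q)) ** (1 / r)


p, q = (0, 2), (3, 6)
print(minkowski(p, q, 1))             # 7.0
print(minkowski(p, q, 2))             # 5.0
print(minkowski(p, q, float("inf")))  # 4
```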

What does proximity refer to in the context of data mining?

Similarity or dissimilarity

What is the purpose of normalizing using a monthly z-score in the context of plant growth data?

To subtract off the monthly mean and divide by the monthly standard deviation

What is the correlation value between Atlanta and Sao Paulo in the original time series?

-0.5739
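The Atlanta and Sao Paulo series themselves are not included here, so the sketch below only shows how a Pearson correlation is computed, using made-up series. The values and the function name are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # The 1/n factors in covariance and standard deviations cancel, so they are omitted.
    return cov / sqrt(sxx * syy)


# Hypothetical monthly temperatures for two cities with opposite seasonal cycles.
north = [5, 10, 20, 25]
south = [25, 20, 10, 5]
print(pearson(north, south))  # -1.0
```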

What is the distance between points p2 and p4 in the Euclidean Distance matrix?

5.099

What is the maximum value of dissimilarity for two data objects?

∞

Study Notes

Introduction to Data Mining

  • NASA EOSDIS archives on the order of petabytes of earth science data per year from remote sensors on a satellite.
  • Data mining helps scientists in automated analysis of massive datasets and hypothesis formation.
  • Opportunities to improve productivity and solve societal problems exist through data mining.
  • Data mining involves the nontrivial extraction of potentially useful information from data.
  • Data mining draws ideas from machine learning, AI, pattern recognition, statistics, and database systems.
  • Data mining tasks include prediction methods and finding human-interpretable patterns in data.
  • Classification tasks involve predicting credit worthiness and identifying intruders in cyberspace.
  • Examples of classification tasks include categorizing news stories and predicting tumor cells.
  • Classification applications include fraud detection in credit card transactions and churn prediction for telephone customers.
  • Sky survey cataloging uses data mining to predict the class of sky objects based on telescopic survey images.
  • In the galaxy classification example, the class is the stage of formation (early, intermediate, or late); the example also describes the data size, an object catalog, and an image database.
  • Regression is used to predict continuous valued variables based on other variables, with examples like predicting sales amounts and time series prediction of stock market indices.
  • Clustering involves finding groups of similar objects, with applications in custom profiling for targeted marketing, grouping related documents for browsing, and reducing the size of large data sets.
  • Market segmentation and document clustering are applications of clustering, aimed at subdividing markets and finding groups of similar documents, respectively.
  • Association rule discovery involves producing dependency rules to predict the occurrence of an item based on occurrences of other items, with applications in market-basket analysis, telecommunication alarm diagnosis, and medical informatics.
  • An example of association analysis is a subspace differential coexpression pattern, enriched with the TNF/NF-κB signaling pathway, related to lung cancer.
  • Deviation/anomaly/change detection is used to detect significant deviations from normal behavior, with applications in credit card fraud detection, network intrusion detection, and identifying abnormal behavior from sensor networks.
  • The motivating challenges in data mining include scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution, and non-traditional analysis.

Data Preprocessing Techniques in Data Mining

  • Feature Subset Selection is a method to reduce data dimensionality by eliminating redundant and irrelevant features.
  • Feature Creation involves creating new attributes to capture essential information more efficiently, including feature extraction, feature construction, and mapping data to a new space.
  • Mapping Data to a New Space employs techniques like Fourier and wavelet transforms to represent data in a different domain.
  • Discretization is the process of converting continuous attributes into ordinal attributes, often used in classification.
  • The Iris Sample Data Set, available from the UCI Machine Learning Repository, consists of three flower types and four non-class attributes.
  • Discretization in the Iris Example involves determining the best discretization method, which can be unsupervised or supervised.
  • Binarization maps continuous or categorical attributes into one or more binary variables, typically used for association analysis.
  • Attribute Transformation involves mapping the entire set of attribute values to a new set of replacement values, including simple functions and normalization techniques.
  • Various methods for discretization without using class labels include equal interval width, equal frequency, and K-means approaches.
  • An entropy-based approach is used for supervised discretization (using class labels): split points are chosen so that the resulting intervals are as pure as possible with respect to class.
  • Attribute Transformation includes standardization and normalization techniques to adjust for differences among attributes in terms of frequency of occurrence, mean, variance, and range.
