Data Mining Classification: Binary and Symmetric Data

Study Notes

Data mining is the process of extracting knowledge or insights from large amounts of data using various statistical and computational techniques.

Binary Data: data with only two possible values, often coded as 0 and 1 (e.g., yes/no, true/false, pass/fail); used in classification and association rule mining tasks.
Symmetric Attribute: a binary attribute whose two states are equally important and carry the same weight, so it does not matter which state is coded as 1 (e.g., gender: male/female).
Asymmetric Attribute: a binary attribute whose two states are not equally important (e.g., result: pass/fail, where passing may hold greater significance); by convention, the more important outcome is coded as 1.
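The symmetric/asymmetric distinction matters most when comparing objects. A common approach (as in standard data mining textbooks) counts, for two objects, the attributes where both are 1 (q), only the first is 1 (r), only the second is 1 (s), and both are 0 (t). For symmetric attributes every match counts, giving the simple matching distance (r + s) / (q + r + s + t); for asymmetric attributes the 0/0 matches are ignored, giving the Jaccard distance (r + s) / (q + r + s). The sketch below assumes plain Python lists of 0/1 values; the function names and sample vectors are hypothetical.

```python
def contingency(x, y):
    """Count the four match/mismatch cases for two equal-length 0/1 vectors."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # 1 in x, 0 in y
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # 0 in x, 1 in y
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)  # both 0
    return q, r, s, t


def symmetric_binary_distance(x, y):
    # Simple matching: mismatches over ALL attributes (0/0 counts as agreement).
    q, r, s, t = contingency(x, y)
    return (r + s) / (q + r + s + t)


def asymmetric_binary_distance(x, y):
    # Jaccard distance: 0/0 matches are ignored because they carry little information.
    q, r, s, t = contingency(x, y)
    denom = q + r + s
    # Assumption: two all-zero vectors are treated as identical (distance 0).
    return 0.0 if denom == 0 else (r + s) / denom


x = [1, 0, 0, 0, 0, 1]
y = [1, 0, 1, 0, 0, 0]
print(symmetric_binary_distance(x, y))   # 2 mismatches out of 6 attributes
print(asymmetric_binary_distance(x, y))  # 2 mismatches out of 3 non-0/0 attributes
```

Because three of the six attributes are 0 in both objects, the asymmetric distance is twice the symmetric one here: the shared "absences" no longer make the objects look similar.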

Interval Data: quantitative data measured on a scale of equal-sized units, so differences between values are meaningful, but with no true zero point, so ratios cannot be meaningfully computed (e.g., temperature in °C or °F, calendar dates, IQ scores); used in clustering and prediction tasks.
Ratio Data: like interval data, but with a true (absolute) zero point, so ratios are meaningful (e.g., a weight of 10 kg is twice a weight of 5 kg; other examples include height and income); used in prediction and association rule mining tasks.
Text Data: unstructured data in the form of text (e.g., social media posts, customer reviews, news articles), used in sentiment analysis, text classification, and topic modeling tasks.
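The interval/ratio difference can be checked numerically: converting between units of an interval scale shifts the zero point, so ratios change, while converting between units of a ratio scale only rescales, so ratios survive. A minimal sketch, assuming Celsius/Fahrenheit for interval data and kilograms/pounds for ratio data; the conversion factor 2.20462 is an approximation, and the helper names are hypothetical.

```python
def c_to_f(c):
    # Interval-scale conversion: scale AND shift (the zero point moves).
    return c * 9 / 5 + 32


def kg_to_lb(kg):
    # Ratio-scale conversion: pure scaling (approximate factor), zero stays zero.
    return kg * 2.20462


t1, t2 = 10.0, 20.0  # temperatures in degrees Celsius
w1, w2 = 5.0, 10.0   # weights in kilograms

# Differences between interval values are preserved up to the scale factor.
print(c_to_f(t2) - c_to_f(t1))  # 18.0 (= 10 * 9/5)

# Ratios are not: "twice as hot" depends on the arbitrary zero point.
print(t2 / t1, c_to_f(t2) / c_to_f(t1))  # 2.0 1.36

# Ratio data keeps its ratios under any change of unit.
print(w2 / w1, kg_to_lb(w2) / kg_to_lb(w1))  # 2.0 2.0
```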

Data: a collection of data objects and their attributes.
Attribute: a property or characteristic of an object (also known as variable, field, characteristic, or feature).
Data Object: a collection of attributes that describe an object (also known as record, point, case, sample, entity, or instance).
Data Set: an organized collection of data, typically covering one topic at a time.
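The relationship between these terms can be sketched with plain Python structures: a data set as a list of data objects, each data object as a dictionary mapping attribute names to values. The student records below are hypothetical, chosen to reuse attributes mentioned earlier in these notes.

```python
# A tiny, hypothetical data set on one topic (students).
# Each dict is a data object (record/instance); each key is an attribute.
data_set = [
    {"student_id": 1, "gender": "F", "result": "pass", "score": 82},
    {"student_id": 2, "gender": "M", "result": "fail", "score": 41},
    {"student_id": 3, "gender": "F", "result": "pass", "score": 67},
]

# The attributes shared by all data objects.
attributes = list(data_set[0].keys())
print(len(data_set), "data objects")  # 3 data objects
print(attributes)  # ['student_id', 'gender', 'result', 'score']

# The values of one attribute across all objects form a column (a variable).
scores = [obj["score"] for obj in data_set]
print(scores)  # [82, 41, 67]
```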

Nominal Data: qualitative (categorical) data whose values are names or labels for categories with no inherent order or hierarchy, so they cannot be meaningfully measured or ranked (e.g., gender, race, religion, occupation); used in classification and clustering tasks.
Ordinal Data: categorical data with an inherent order or hierarchy, so values can be ranked, but the distances between consecutive values are unknown or not uniform (e.g., education level, social status); used in ranking and classification tasks.
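One common way to compare objects on these attribute types is sketched below. For nominal attributes, only equality is meaningful, so the distance is the fraction of mismatching attributes, (p − m) / p, where p is the number of attributes and m the number of matches. For ordinal attributes, each value is replaced by its rank r in 1..M and rescaled to (r − 1) / (M − 1). Note that this rescaling assumes equally spaced ranks, which is a simplification given that ordinal distances are not uniform. The education ordering and the function names are hypothetical.

```python
def nominal_distance(x, y):
    # Fraction of attributes whose categories differ: (p - m) / p.
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p


# Hypothetical ordering of an ordinal attribute, lowest to highest.
EDUCATION_LEVELS = ["primary", "secondary", "bachelor", "master", "doctorate"]


def ordinal_to_unit(value, levels):
    # Replace the category by its rank r in 1..M, then map to [0, 1]
    # with (r - 1) / (M - 1) so different ordinal attributes are comparable.
    r = levels.index(value) + 1
    return (r - 1) / (len(levels) - 1)


def ordinal_distance(a, b, levels):
    # After rescaling, the values are compared as if they were interval data.
    return abs(ordinal_to_unit(a, levels) - ordinal_to_unit(b, levels))


# Nominal: (occupation, gender); one of two attributes differs.
print(nominal_distance(["teacher", "F"], ["nurse", "F"]))  # 0.5
# Ordinal: ranks 1 and 4 of 5 map to 0.0 and 0.75.
print(ordinal_distance("primary", "master", EDUCATION_LEVELS))  # 0.75
```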

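Clustering: grouping data objects so that objects in the same group are similar to each other and dissimilar to objects in other groups, without using predefined class labels (e.g., customer segmentation).
Classification: assigning data objects to predefined classes, using a model learned from training data whose class labels are already known.
Regression Analysis: predicting a continuous numeric value (e.g., a price) from other attributes; used for prediction tasks.
Association Rule Mining: discovering if-then relationships between items that frequently occur together in large data sets.
Anomaly Detection: identifying data objects that deviate significantly from the rest of the data (e.g., fraudulent transactions).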
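The key difference between classification and clustering is whether labeled examples are available. As an illustration of classification only, the sketch below uses a 1-nearest-neighbour classifier, one simple technique among many (not the only or standard choice): a new object receives the label of the closest labeled training object. The features, labels, and function name are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_neighbor_classify(training, query):
    """Return the label of the training example closest to `query`.

    training: list of (feature_vector, class_label) pairs, i.e. labeled data.
    """
    best_label, best_dist = None, math.inf
    for features, label in training:
        d = math.dist(features, query)  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical labeled examples: (hours studied, classes attended) -> result
training = [
    ((2.0, 3.0), "fail"),
    ((3.0, 4.0), "fail"),
    ((8.0, 9.0), "pass"),
    ((9.0, 10.0), "pass"),
]

print(nearest_neighbor_classify(training, (7.5, 8.0)))  # pass
print(nearest_neighbor_classify(training, (2.5, 3.0)))  # fail
```

A clustering algorithm would receive only the feature vectors, without the "pass"/"fail" labels, and would have to discover the two groups by itself.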

Marketing: used to identify customer segments and target marketing campaigns.
Finance: used to identify potential investment opportunities and predict stock prices.
Healthcare: used to identify risk factors for diseases and develop personalized treatment plans.
Telecommunications: used to analyze customer behavior and optimize network performance.

Market Basket Analysis: analyzing customer purchases to identify items frequently purchased together, and making recommendations or suggestions to customers.
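Market basket analysis is usually built on two measures from association rule mining: the support of an itemset (the fraction of transactions containing all of its items) and the confidence of a rule A → B (support of A ∪ B divided by support of A, an estimate of how often B is bought when A is bought). A minimal sketch, assuming transactions are Python sets of item names; the transactions and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (one set of purchased items per basket).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def count(itemset, transactions):
    # Number of transactions that contain every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    # support(A ∪ B) / support(A); the len(transactions) factors cancel.
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


# How often each pair of items is bought together.
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)

print(pair_counts[("bread", "milk")])                 # 3
print(support({"bread", "milk"}, transactions))       # 0.6
print(confidence({"bread"}, {"milk"}, transactions))  # 0.75
```

Here the rule bread → milk holds in 3 of the 4 baskets that contain bread, which is the kind of pattern that could drive a "customers who bought bread also bought milk" suggestion.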