16 Questions
Data preprocessing helps in dealing with ______ data in the real world.
dirty
Incomplete data may have missing attribute values, lack of certain attributes of interest, or contain only aggregate data, making it ______.
incomplete
No quality data, no quality mining results. Quality decisions must be based on ______ data.
quality
Data preparation, cleaning, and transformation comprise the majority of the work in a data mining application, approximately ______ %.
90
A well-accepted multi-dimensional view of data quality includes measures like accuracy, completeness, consistency, timeliness, believability, value added, ______, and accessibility.
interpretability
Data preprocessing helps in dealing with discrepancies, such as inconsistencies in codes or names, making the data ______.
inconsistent
Data preprocessing involves four major tasks: data cleaning, data integration, data transformation (normalization and aggregation), and ______ reduction.
data
Data cleaning includes filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving ______.
inconsistencies
Data preprocessing aims to obtain a representation that is reduced in volume but produces the same or similar analytical results. This process is known as data ______.
reduction
One of the tasks in data preprocessing is the integration of multiple databases or files, which is known as data ______.
integration
One method of handling missing data is to fill in every missing value with the same placeholder, such as 'unknown'; this placeholder is known as a ______ constant.
global
Another method of handling missing data is to fill in missing values with the attribute ______.
mean
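The two fill-in strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `fill_with_constant` and `fill_with_mean`, the use of `None` to mark a missing value, and the sample values are all assumptions made for the example.

```python
def fill_with_constant(values, constant="unknown"):
    """Replace each missing entry (None) with one global constant."""
    return [constant if v is None else v for v in values]


def fill_with_mean(values):
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean: every value is missing")
    attribute_mean = sum(observed) / len(observed)
    return [attribute_mean if v is None else v for v in values]


# Hypothetical sample attributes with missing entries.
cities = ["Oslo", None, "Lima"]
ages = [20, None, 40, None]

filled_cities = fill_with_constant(cities)
filled_ages = fill_with_mean(ages)
```

A global constant suits categorical attributes, while the attribute mean only makes sense for numeric ones. Both are simple heuristics that can bias later analysis when many values are missing.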
Noisy data may be due to random error or variance in a measured variable, which is known as ______.
noise
Problems such as duplicate records, incomplete data, and inconsistent data are addressed by data ______.
cleaning
One of the methods used in handling noisy data is the binning method, which involves sorting the data and partitioning it into (equi-depth) ______.
bins
In the binning method for data smoothing, one can smooth by bin means, smooth by bin medians, or smooth by bin ______.
boundaries
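The binning steps described in the last two questions can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the bin depth is passed in by the caller, ties in boundary smoothing go to the lower boundary, and the price list is illustrative sample data.

```python
from statistics import mean, median


def equi_depth_bins(values, depth):
    """Sort the data and split it into bins of `depth` values each.

    The last bin is shorter when len(values) is not a multiple of depth.
    """
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with that bin's median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are sorted, so these are min and max
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


# Illustrative sample data: twelve prices, three bins of depth 4.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equi_depth_bins(prices, depth=4)
by_boundaries = smooth_by_boundaries(bins)
# by_boundaries == [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```

Smoothing by means or medians collapses each bin to a single value, whereas smoothing by boundaries keeps two distinct values per bin and so preserves a little more of the original spread.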
This quiz covers the importance of data preprocessing and various techniques involved such as data cleaning, integration, transformation, reduction, and discretization. It also discusses the challenges associated with real-world data and the need for preprocessing.