Data Preprocessing: Importance and Techniques

BraveNaïveArt

16 Questions

Data preprocessing helps in dealing with ______ data in the real world.

dirty

Data that is missing attribute values, lacks certain attributes of interest, or contains only aggregate data is called ______.

incomplete

No quality data, no quality mining results. Quality decisions must be based on ______ data.

quality

Data preparation, cleaning, and transformation comprise the majority of the work in a data mining application, approximately ______%.

90

A well-accepted multi-dimensional view of data quality includes measures like accuracy, completeness, consistency, timeliness, believability, value added, ______, and accessibility.

interpretability

Data containing discrepancies, such as inconsistencies in codes or names, is called ______ data; data preprocessing helps resolve these discrepancies.

inconsistent
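A minimal Python sketch of resolving inconsistent codes, assuming the variant spellings are known in advance. The mapping `CANONICAL_GENDER` and the function `resolve_codes` are hypothetical names chosen for illustration; values with no known mapping are passed through unchanged so they can be reviewed later.

```python
# Hypothetical mapping from variant spellings (lowercased) to one canonical code.
CANONICAL_GENDER = {
    "m": "M",
    "male": "M",
    "f": "F",
    "female": "F",
}


def resolve_codes(values, mapping):
    """Map each raw value to its canonical code; unknown values pass through unchanged."""
    resolved = []
    for v in values:
        key = v.strip().lower()  # ignore case and surrounding whitespace
        resolved.append(mapping.get(key, v))
    return resolved


raw = ["M", "male", " Female", "f", "X"]
print(resolve_codes(raw, CANONICAL_GENDER))
# ['M', 'M', 'F', 'F', 'X']
```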

Major tasks in data preprocessing include data cleaning, data integration, data transformation (normalization and aggregation), and ______ reduction.

data
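Normalization, one of the transformation steps named above, can be sketched in Python as min-max normalization, which rescales an attribute linearly from its observed range [min, max] to a new range [new_min, new_max]. This is one common normalization method, chosen here as an assumption; `min_max_normalize` and the sample incomes are illustrative.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: avoid division by zero by mapping everything to new_min.
        return [new_min for _ in values]
    # v' = (v - min) / (max - min) * (new_max - new_min) + new_min
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))
# approximately [0.0, 0.716, 1.0]
```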

Data cleaning includes filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving ______.

inconsistencies

Obtaining a representation of the data that is reduced in volume but produces the same or similar analytical results is known as data ______.

reduction

The data preprocessing task of combining multiple databases or files is known as data ______.

integration
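A minimal Python sketch of data integration, assuming records are dicts and that the hypothetical fields `cust_id` and `customer_no` identify the same customer in two sources. It performs an inner join on those keys; real integration must also discover such correspondences and resolve conflicting values, which this sketch does not attempt.

```python
# Two hypothetical sources describing the same customers under different key names.
crm = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 2, "name": "Ben"}]
billing = [{"customer_no": 1, "balance": 250.0}, {"customer_no": 3, "balance": 40.0}]


def integrate(left, left_key, right, right_key):
    """Inner-join two lists of dicts whose key columns have different names."""
    index = {row[right_key]: row for row in right}  # look up right-hand rows by key
    merged = []
    for row in left:
        match = index.get(row[left_key])
        if match is not None:
            combined = dict(row)
            # Copy the right-hand fields, dropping its now-redundant key column.
            combined.update({k: v for k, v in match.items() if k != right_key})
            merged.append(combined)
    return merged


print(integrate(crm, "cust_id", billing, "customer_no"))
# [{'cust_id': 1, 'name': 'Ana', 'balance': 250.0}]
```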

One way to handle missing data is to replace every missing value with the same constant, such as 'unknown'; this is known as filling with a ______ constant.

global
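A minimal Python sketch of filling missing values with a global constant, assuming `None` marks a missing entry. The function name `fill_with_constant` is illustrative.

```python
def fill_with_constant(values, constant="unknown"):
    """Replace every missing entry (None) with the same global constant."""
    return [constant if v is None else v for v in values]


colors = ["red", None, "blue", None]
print(fill_with_constant(colors))
# ['red', 'unknown', 'blue', 'unknown']
```

Because every gap receives the same label, a later mining step may treat "unknown" as a meaningful category, so this method is simple but crude.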

Another way to handle missing data is to fill in missing values with the attribute ______.

mean
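A minimal Python sketch of filling missing values with the attribute mean, again assuming `None` marks a missing entry. The mean is computed only from the observed values; `fill_with_mean` is an illustrative name.

```python
def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean: every value is missing")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


ages = [20, None, 30, 40, None]
print(fill_with_mean(ages))
# [20, 30.0, 30, 40, 30.0]
```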

Random error or variance in a measured variable, which makes data noisy, is known as ______.

noise

Problems such as duplicate records, incomplete data, and inconsistent data are handled by data ______.

cleaning
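A minimal Python sketch of one cleaning step, removing duplicate records, assuming records are dicts with hashable values and that two records are duplicates only when all their fields match exactly. The function name `drop_duplicates` is illustrative.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for rec in records:
        # Sorting the items gives a hashable fingerprint independent of key order.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


rows = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}, {"city": "Oslo", "id": 1}]
print(drop_duplicates(rows))
# [{'id': 1, 'city': 'Oslo'}, {'id': 2, 'city': 'Rome'}]
```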

One method for handling noisy data is binning: sort the data and partition it into (equi-depth) ______.

bins
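A minimal Python sketch of equi-depth (equal-frequency) partitioning: sort the values, then split them into bins that each hold the same number of items. When the count does not divide evenly, this sketch assumes the remainder is spread over the first bins; `equal_depth_bins` is an illustrative name.

```python
def equal_depth_bins(values, n_bins):
    """Sort values and split them into n_bins bins holding (roughly) the same number of items."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional item to absorb the remainder.
        end = start + size + (1 if i < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_depth_bins(prices, 3))
# [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
```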

In the binning method for data smoothing, one can smooth by bin means, by bin medians, or by bin ______.

boundaries
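The three smoothing options can be sketched in Python as follows, assuming the data has already been partitioned into bins. Smoothing by means or medians replaces every value in a bin with that statistic; smoothing by boundaries replaces each value with the closer of the bin's minimum and maximum (ties go to the minimum here, an arbitrary choice). The function names are illustrative.

```python
from statistics import mean, median


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace each value with whichever bin boundary (min or max) is closer."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        # Ties go to the lower boundary.
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


bins = [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_medians(bins))     # [[8, 8, 8], [21, 21, 21], [28, 28, 28]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```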

This quiz covers the importance of data preprocessing and the techniques it involves, such as data cleaning, integration, transformation, reduction, and discretization. It also discusses the challenges posed by real-world data and why preprocessing is needed.
