Data Preprocessing: Importance and Techniques
16 Questions

Questions and Answers

Data preprocessing helps in dealing with ______ data in the real world.

dirty

Data that have missing attribute values, lack certain attributes of interest, or contain only aggregate data are called ______ data.

incomplete

No quality data, no quality mining results. Quality decisions must be based on ______ data.

quality

Data preparation, cleaning, and transformation comprise the majority of the work in a data mining application, approximately ______%.

90

A well-accepted multi-dimensional view of data quality includes measures like accuracy, completeness, consistency, timeliness, believability, value added, ______, and accessibility.

interpretability
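
Of these dimensions, completeness is the simplest to measure. The sketch below is a minimal illustration, assuming records are Python dicts and that a missing value is either `None` or an absent key (both representation choices are assumptions, not part of the lesson); it reports, per attribute, the fraction of records that actually carry a value.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], attributes: list[str]) -> dict[str, float]:
    """Fraction of records with a non-missing value, per attribute.

    A value counts as missing if it is None or if the key is absent
    (assumed representations for "missing value" and "lacking an attribute").
    """
    n = len(records)
    if n == 0:
        raise ValueError("completeness is undefined for an empty record set")
    return {
        a: sum(1 for r in records if r.get(a) is not None) / n
        for a in attributes
    }


records = [
    {"name": "Ann", "age": 42, "income": None},
    {"name": "Bob", "age": None, "income": 51000},
    {"name": "Cy", "age": 35, "income": 48000},
    {"name": "Di", "age": 29, "income": None},
]
print(completeness(records, ["name", "age", "income"]))
# {'name': 1.0, 'age': 0.75, 'income': 0.5}
```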

Data that contain discrepancies, such as inconsistent codes or names, are called ______ data; data preprocessing helps resolve these discrepancies.

inconsistent
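
Resolving such discrepancies usually means mapping every variant onto one canonical code. A minimal sketch, assuming a hypothetical rating attribute that one source codes as 1/2/3 and another as A/B/C; the mapping table `CANONICAL_RATING` is invented for illustration.

```python
# Hypothetical mapping: one source codes ratings as "1"/"2"/"3",
# another as "A"/"B"/"C"; both are mapped onto a single canonical code.
CANONICAL_RATING = {"1": "A", "2": "B", "3": "C", "A": "A", "B": "B", "C": "C"}


def resolve_rating(code: str) -> str:
    """Map a raw rating code to its canonical form; reject unknown codes."""
    key = code.strip().upper()  # tolerate stray whitespace and lowercase letters
    if key not in CANONICAL_RATING:
        raise ValueError(f"unknown rating code: {code!r}")
    return CANONICAL_RATING[key]


raw = ["1", "b", " C ", "2"]
print([resolve_rating(c) for c in raw])  # ['A', 'B', 'C', 'B']
```

Raising on unknown codes, rather than passing them through, keeps new inconsistencies from slipping silently into the cleaned data.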

The major tasks in data preprocessing include data cleaning, data integration, data transformation (normalization and aggregation), and ______ reduction.

data
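
Normalization rescales an attribute onto a common range. One common variant is min-max normalization, v' = (v - min) / (max - min) * (new_max - new_min) + new_min; the function below is a sketch of that variant only (the lesson does not say which normalization it means), and the income values are sample data.

```python
def min_max_normalize(values: list[float], new_min: float = 0.0, new_max: float = 1.0) -> list[float]:
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale, so map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


incomes = [12000, 54000, 73600, 98000]
print([round(x, 3) for x in min_max_normalize(incomes)])  # [0.0, 0.488, 0.716, 1.0]
```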

Data cleaning includes filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving ______.

inconsistencies
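
Identifying outliers requires a concrete rule, which the lesson does not specify. One widely used choice, swapped in here purely for illustration, is Tukey's fences: flag any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. This sketch uses the standard-library `statistics.quantiles` with the `inclusive` method; other quartile conventions can move the fences slightly.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 11, 12, 12, 13, 13, 14, 15, 102]
print(iqr_outliers(readings))  # [102]
```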

Obtaining a representation of the data that is much smaller in volume yet produces the same or similar analytical results is known as data ______.

reduction
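
Aggregation is one way to obtain such a reduced representation: if an analysis only needs monthly totals, daily records can be collapsed without changing those totals. A minimal sketch with hypothetical (ISO date, amount) sales records:

```python
from collections import defaultdict


def aggregate_monthly(daily_sales: list[tuple[str, float]]) -> dict[str, float]:
    """Collapse (ISO date, amount) records into one total per YYYY-MM month."""
    totals: dict[str, float] = defaultdict(float)
    for date, amount in daily_sales:
        totals[date[:7]] += amount  # "2024-01-15" -> "2024-01"
    return dict(totals)


daily = [
    ("2024-01-03", 120.0), ("2024-01-17", 80.0), ("2024-01-29", 50.0),
    ("2024-02-02", 200.0), ("2024-02-20", 40.0),
]
monthly = aggregate_monthly(daily)
print(monthly)  # {'2024-01': 250.0, '2024-02': 240.0}
```

Five records shrink to two, and any query at monthly (or coarser) granularity returns the same answer as before; queries about individual days can no longer be answered.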

One of the tasks in data preprocessing, combining multiple databases or files, is known as data ______.

integration
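
A recurring integration problem is that sources name the same attribute differently. The sketch below assumes two hypothetical sources, a CRM table keyed by `cust_id` and a billing table keyed by `customer_no`, and performs a left join on that identity; it also assumes `customer_no` is unique in the billing data.

```python
def integrate(crm: list[dict], billing: list[dict]) -> list[dict]:
    """Left-join billing rows onto CRM rows, matching cust_id to customer_no."""
    # Index billing rows by their key so each CRM row is matched in O(1).
    by_id = {row["customer_no"]: row for row in billing}
    merged = []
    for row in crm:
        match = by_id.get(row["cust_id"], {})
        combined = dict(row)
        # Copy billing attributes, dropping the redundant key under its other name.
        combined.update({k: v for k, v in match.items() if k != "customer_no"})
        merged.append(combined)
    return merged


crm = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
billing = [{"customer_no": 1, "balance": 30.5}]
print(integrate(crm, billing))
# [{'cust_id': 1, 'name': 'Ann', 'balance': 30.5}, {'cust_id': 2, 'name': 'Bob'}]
```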

One method of handling missing data is to fill in every missing value with the same constant, such as 'unknown'; this is called a ______ constant.

global
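
Filling with a global constant can be sketched as follows, assuming missing values are stored as `None` and using 'unknown' as the constant, as in the question.

```python
def fill_with_constant(values: list[object], constant: object = "unknown") -> list[object]:
    """Replace every missing (None) entry with one global constant."""
    return [constant if v is None else v for v in values]


occupations = ["engineer", None, "teacher", None]
print(fill_with_constant(occupations))
# ['engineer', 'unknown', 'teacher', 'unknown']
```

A known drawback: a mining algorithm may treat 'unknown' as a genuine category and find spurious patterns in it.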

Another method of handling missing data is to fill in missing values with the attribute ______.

mean
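
Mean imputation replaces each missing value of an attribute with the mean of that attribute's observed values. A minimal sketch, again assuming missing entries are `None`:

```python
from typing import Optional


def fill_with_mean(values: list[Optional[float]]) -> list[float]:
    """Replace missing (None) entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mean: every value is missing")
    avg = sum(observed) / len(observed)
    return [avg if v is None else float(v) for v in values]


ages = [23, None, 35, 41, None]
print(fill_with_mean(ages))  # [23.0, 33.0, 35.0, 41.0, 33.0]
```

Because every gap receives the same value, this approach shrinks the attribute's variance.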

Random error or variance in a measured variable is known as ______; it is what makes data noisy.

noise

Duplicate records, incomplete data, and inconsistent data are problems that require data ______.

cleaning
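
Removing duplicate records is one such cleaning step. A minimal sketch, assuming two records are duplicates when they agree on a chosen set of key fields (the field names are hypothetical) and keeping the first occurrence:

```python
def drop_duplicates(records: list[dict], key_fields: tuple[str, ...]) -> list[dict]:
    """Keep the first record for each distinct combination of key_fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)  # key values must be hashable
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


rows = [
    {"id": 7, "name": "Ann", "city": "Oslo"},
    {"id": 8, "name": "Bob", "city": "Lima"},
    {"id": 7, "name": "Ann", "city": "Oslo"},
]
print(drop_duplicates(rows, ("id", "name", "city")))
# [{'id': 7, 'name': 'Ann', 'city': 'Oslo'}, {'id': 8, 'name': 'Bob', 'city': 'Lima'}]
```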

One method of handling noisy data is binning: the data are sorted and then partitioned into (equi-depth) ______.

bins
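
Equi-depth (equal-frequency) partitioning sorts the values and gives each bin the same number of them. A minimal sketch; the price list is sample data, and when the count does not divide evenly the remainder goes to the first bins, which is one arbitrary choice among several.

```python
def equi_depth_bins(values: list[float], n_bins: int) -> list[list[float]]:
    """Sort values and split them into n_bins bins holding (nearly) equal counts."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # spread any remainder over the first bins
        bins.append(data[start:end])
        start = end
    return bins


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34, 9, 26, 29]
print(equi_depth_bins(prices, 3))
# [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
```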

In the binning method for data smoothing, one can smooth by bin means, by bin medians, or by bin ______.

boundaries
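
The three smoothing rules can be sketched on three sample equi-depth bins of sorted prices. Smoothing by boundaries replaces each value with whichever bin boundary (minimum or maximum) is closer; sending ties to the minimum is an assumption of this sketch.

```python
from statistics import median


def smooth_by_means(b: list[float]) -> list[float]:
    """Replace every value in the bin with the bin mean."""
    m = sum(b) / len(b)
    return [m] * len(b)


def smooth_by_medians(b: list[float]) -> list[float]:
    """Replace every value in the bin with the bin median."""
    return [median(b)] * len(b)


def smooth_by_boundaries(b: list[float]) -> list[float]:
    """Replace each value with the closer of the bin's min and max (ties go to min)."""
    lo, hi = min(b), max(b)
    return [lo if v - lo <= hi - v else hi for v in b]


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
for b in bins:
    print(smooth_by_means(b), smooth_by_medians(b), smooth_by_boundaries(b))
```

For the first bin this prints `[9.0, 9.0, 9.0, 9.0] [8.5, 8.5, 8.5, 8.5] [4, 4, 4, 15]`.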
