Data Preprocessing: Overview and Major Tasks

SuperCthulhu

17 Questions

What is the primary purpose of data integration according to the text?

To combine data from multiple sources into a coherent store

Which of the following is NOT mentioned as a major task in data preprocessing according to the text?

Data Sampling

What technique is mentioned in the text for detecting data value conflicts?

Correlation and clustering to find outliers

What is the purpose of data scrubbing?

To detect errors and make corrections using simple domain knowledge

Which of the following is mentioned in the text as a tool for data migration and integration?

Data migration tools that allow transformations to be specified

What is the relationship between data cleaning and data auditing according to the text?

Data auditing is one of the methods used in data cleaning; it analyzes data to discover rules and relationships and to detect violators

What is one of the major reasons for data being considered dirty in the real world?

Contains noise, errors, or outliers

Which of the following is an example of inconsistent data according to the text?

Age=“42”, Birthday=“03/07/2010”

Why might missing data need to be inferred according to the text?

Because missing values may be intentionally disguised, for example entered as a default value

What is a common cause of missing data according to the text?

Certain data not considered important at the time of entry

Which process involves reducing the dimensionality or numerosity of the data, according to the text?

Data reduction

What is a characteristic of noisy data as described in the text?

Containing noise, errors, or outliers

What is the main purpose of data cleaning in data preprocessing?

To fill in missing values and smooth noisy data

Which of the following is NOT a measure of data quality?

Scalability

What is the purpose of data integration in data preprocessing?

To combine data from different sources into a coherent dataset

Which of the following is NOT a task in data preprocessing?

Data visualization

What is the purpose of data reduction in data preprocessing?

To remove irrelevant or redundant data

Study Notes

Data Preprocessing

  • Data preprocessing involves data cleaning, data integration, data reduction, and data transformation and discretization.

Data Quality

  • Data quality is measured by accuracy, completeness, consistency, timeliness, believability, and interpretability.
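  • Inaccurate data contain incorrect values, such as errors or noise.
  • Incomplete data have missing or unavailable attribute values.
  • Inconsistent data arise when some copies of a value are modified but others are not, or when references are left dangling.
  • Untimely data are outdated by the time they are used.
  • Unbelievable data are not trusted by users to be correct.
  • Uninterpretable data are difficult to understand.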
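
Two of these measures, completeness and consistency, can be checked mechanically. The sketch below is illustrative only: the records, the field names, the fixed reference date, and the helper functions are all assumptions, not part of the source material. The second record follows the pattern of a stated age that disagrees with the birthday, reading 03/07/2010 as March 7, 2010.

```python
from datetime import date

# Hypothetical records; field names are assumptions for illustration.
records = [
    {"name": "Ann", "age": 30, "birthday": date(1994, 5, 1)},
    {"name": "Bob", "age": 42, "birthday": date(2010, 3, 7)},
    {"name": "Cy", "age": None, "birthday": date(1980, 1, 15)},
]

AS_OF = date(2024, 6, 1)  # fixed reference date so the check is reproducible


def age_on(birthday, as_of):
    """Whole years elapsed between birthday and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birthday.month, birthday.day)
    return as_of.year - birthday.year - (0 if had_birthday else 1)


def completeness(rows, field):
    """Fraction of rows in which the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)


def inconsistent(rows, as_of):
    """Names of rows whose stated age disagrees with the birthday."""
    return [
        r["name"]
        for r in rows
        if r["age"] is not None and r["age"] != age_on(r["birthday"], as_of)
    ]


print(completeness(records, "age"))
print(inconsistent(records, AS_OF))
```

Rows with a missing age are counted against completeness but skipped by the consistency check, so each problem is reported once.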

Data Cleaning

  • Data cleaning involves detecting and correcting errors, inconsistencies, and inaccuracies.
  • Data cleaning methods include combined computer and human inspection, data discrepancy detection, data scrubbing, and data auditing.
  • Data scrubbing uses simple domain knowledge to detect errors and make corrections.
  • Data auditing analyzes data to discover rules and relationships, then detects the records that violate them.
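
The difference between scrubbing and auditing can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 0 to 120 age range is a made-up domain rule, and the auditing step uses a median-based modified z-score as a simple stand-in for outlier detection, not a correlation or clustering method. The function names and the 3.5 threshold are assumptions as well.

```python
from statistics import median


def scrub_ages(ages, low=0, high=120):
    """Scrubbing: apply a known domain rule; out-of-range values become None."""
    return [a if a is not None and low <= a <= high else None for a in ages]


def audit_outliers(values, threshold=3.5):
    """Auditing: learn what is typical from the data, then flag violators.

    Uses the modified z-score, 0.6745 * |v - median| / MAD.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


print(scrub_ages([25, -3, 40, 250, None]))
print(audit_outliers([50, 52, 49, 51, 48, 500]))
```

Scrubbing needs the rule to be known in advance, while auditing derives its notion of normal from the data itself, which is why it can catch errors no rule anticipated.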

Data Integration

  • Data integration combines data from multiple sources into a coherent store.
  • Data integration involves entity identification and the detection and resolution of data value conflicts.
  • Entity identification matches records from multiple data sources that refer to the same real-world entity.
  • Data value conflicts occur when different sources record different attribute values for the same real-world entity.
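
As a rough sketch of both steps, the code below matches records from two sources by key and reports attributes whose values disagree. The source data, the key names `cust_id` and `customer_number`, and the `integrate` helper are hypothetical. Letting the first source win a conflict is an arbitrary choice made for the example, not a recommended resolution policy.

```python
def integrate(source_a, source_b, key_a, key_b):
    """Match records across sources by key and report differing attribute values."""
    by_key = {row[key_b]: row for row in source_b}
    merged, conflicts = [], []
    for row in source_a:
        other = by_key.get(row[key_a])
        if other is None:
            continue  # entity not present in source_b
        # Compare only attributes that both sources record.
        for attr in sorted(row.keys() & other.keys()):
            if row[attr] != other[attr]:
                conflicts.append((row[key_a], attr, row[attr], other[attr]))
        merged.append({**other, **row})  # source_a wins on conflict (assumption)
    return merged, conflicts


# Hypothetical sources that name the same key differently.
crm = [{"cust_id": 1, "name": "Bill", "city": "Austin"}]
billing = [{"customer_number": 1, "name": "William", "city": "Austin"}]

merged, conflicts = integrate(crm, billing, "cust_id", "customer_number")
print(conflicts)
```

Exact key matching is the simplest possible form of entity identification; real sources often need fuzzy matching on names or addresses.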

Data Reduction

  • Data reduction produces a smaller representation of the data that yields the same, or nearly the same, analytical results.
  • Data reduction techniques include dimensionality reduction, numerosity reduction, and data compression.
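
Two of these techniques can be illustrated in their simplest forms. The sketch below is an assumption-laden example, not a standard implementation: dimensionality reduction is shown as dropping attributes that take a single value (a trivial case of attribute subset selection), and numerosity reduction as a simple random sample without replacement. The data and function names are hypothetical.

```python
import random


def drop_constant_attributes(rows):
    """Dimensionality reduction, simplest case: drop single-valued attributes."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]


def sample_rows(rows, n, seed=0):
    """Numerosity reduction: simple random sample without replacement."""
    return random.Random(seed).sample(rows, n)


# Hypothetical data: "country" carries no information because it never varies.
rows = [{"id": i, "country": "US", "x": i * 2} for i in range(10)]

reduced = drop_constant_attributes(rows)
subset = sample_rows(reduced, 3)
print(list(reduced[0]))
print(len(subset))
```

The fixed seed keeps the sample reproducible between runs, which helps when comparing results computed on the reduced data.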

Data Transformation and Discretization

  • Data transformation involves converting data into a suitable format for analysis.
  • Data discretization involves converting continuous data into discrete data.
  • Normalization and concept hierarchy generation are techniques used in data transformation and discretization.
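
Normalization and discretization are easy to show on a small list of numbers. The sketch below uses min-max normalization and equal-width binning, chosen here as common examples of each idea; the function names and sample values are assumptions made for illustration.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # all values identical
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Discretization: assign each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


print(min_max([10, 20, 30]))          # [0.0, 0.5, 1.0]
print(equal_width_bins([0, 5, 10], 2))  # [0, 1, 1]
```

Bin indices like these could then be mapped to labels such as "low" and "high", which is one simple way to build the lowest level of a concept hierarchy.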

Learn about data preprocessing techniques such as cleaning, integration, reduction, and transformation. This quiz covers topics like data quality, data cleaning, data integration, and data reduction.
