ITBAN 3 – Fundamentals of Analytics Modelling Data Preprocessing Quiz
28 Questions

Questions and Answers

What is one of the major tasks in data preprocessing?

  • Model deployment
  • Data cleaning (correct)
  • Algorithm training
  • Data visualization

Which task involves scaling data to a specific range during data preprocessing?

  • Data reduction
  • Data transformation (correct)
  • Data discretization
  • Data integration
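Scaling to a specific range is commonly done with min-max normalization, v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A. The Python sketch below is a minimal illustration; the function name `min_max_normalize` and the sample income values are assumptions, not part of the quiz.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: avoid dividing by zero and map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


# Hypothetical income values (min 12,000, max 98,000).
incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))
```

Here 73,600 maps to roughly 0.716 on the [0, 1] scale.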

Why might missing data occur in a dataset?

  • High importance placed on entering all data
  • Perfect data entry by all employees
  • Due to consistent recording practices
  • Equipment malfunction (correct)

Which preprocessing task aims to reduce the volume of data while maintaining analytical results?

  • Data reduction (correct; option D)
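Sampling is one common way to reduce data volume while keeping analytical results representative. A minimal sketch of simple random sampling without replacement, assuming the records fit in memory as a Python list; `simple_random_sample` is a hypothetical helper name.

```python
import random


def simple_random_sample(records, fraction, seed=None):
    """Return a simple random sample without replacement containing
    roughly `fraction` of the records (at least one, if any exist)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = min(len(records), max(1, round(len(records) * fraction)))
    rng = random.Random(seed)  # seeded generator for reproducible samples
    return rng.sample(records, k)


data = list(range(1000))
subset = simple_random_sample(data, 0.1, seed=42)
print(len(subset))  # 100
```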

What is a common reason for missing data in sales records according to the text?

  • Misunderstanding of data importance (correct; option C)

Which step of data preprocessing involves integrating multiple databases or files?

  • Data integration (correct; option B)
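A toy illustration of integrating two sources, assuming each source is a list of dict records and that the two sources name the same customer identifier differently (`cust_id` vs. `cust_no`, both hypothetical). Renaming the key before merging is a small example of schema integration.

```python
def integrate(source_a, source_b, key_a, key_b):
    """Combine records from two sources into one store keyed by key_a.

    Schema integration step: source B calls the same identifier key_b,
    so that column is renamed to key_a before merging.
    """
    merged = {}
    for rec in source_a:
        merged[rec[key_a]] = dict(rec)
    for rec in source_b:
        row = {(key_a if col == key_b else col): val for col, val in rec.items()}
        # Attributes from B are added to (or overwrite) those from A for the same entity.
        merged.setdefault(row[key_a], {}).update(row)
    return merged


crm = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 2, "name": "Ben"}]
billing = [{"cust_no": 1, "balance": 250.0}, {"cust_no": 3, "balance": 90.0}]
store = integrate(crm, billing, key_a="cust_id", key_b="cust_no")
print(store[1])  # {'cust_id': 1, 'name': 'Ana', 'balance': 250.0}
```

Note that `update` simply lets the second source win when both give the same attribute; a real pipeline would resolve such value conflicts explicitly.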

Which of the following is the usual way to handle a tuple whose class label is missing in a classification task?

  • Ignore the tuple (correct; option A)
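A minimal sketch of ignoring tuples whose class label is missing, assuming tuples are dicts and the label attribute is called `class` (a hypothetical name). Keep in mind that dropping a tuple also discards all of its other attribute values.

```python
def drop_unlabeled(rows, label="class"):
    """Ignore (drop) tuples whose class label is missing (None or absent)."""
    return [row for row in rows if row.get(label) is not None]


rows = [
    {"age": 35, "income": 52000, "class": "yes"},
    {"age": 41, "income": 61000, "class": None},  # label recorded as missing
    {"age": 29, "income": 38000},                 # label never entered
]
print(drop_unlabeled(rows))  # only the first tuple remains
```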

What is the definition of noise in the context of data?

  • Random error in a measured variable (correct; option D)

Which of the following is a common cause of incorrect attribute values in data?

  • Inconsistency in variable naming (correct; option D)

What is the purpose of the binning method in handling noisy data?

  • To smooth the data by replacing values with bin means or medians (correct; option A)

Which of the following is a more sophisticated approach to filling in missing values compared to using a global constant or attribute mean?

  • Using the most probable value based on inference (correct; option D)
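For comparison, the two simpler strategies named in the question can be sketched in Python as follows; `fill_constant` and `fill_mean` are hypothetical helpers, and `None` is assumed to mark a missing value. Inference-based filling (for example, predicting the value with a Bayesian or decision-tree model) would replace the mean with a model prediction and is not shown.

```python
from statistics import mean


def fill_constant(values, constant):
    """Replace each missing value (None) with a single global constant."""
    return [constant if v is None else v for v in values]


def fill_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    attr_mean = mean(observed)
    return [attr_mean if v is None else v for v in values]


income = [52000, None, 61000, 38000, None]
print(fill_constant(income, 0))  # [52000, 0, 61000, 38000, 0]
print(fill_mean(income))         # gaps become the mean of 52000, 61000, 38000
```

`fill_mean` raises `statistics.StatisticsError` if every value is missing.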

Which of the following is not a common data quality issue that requires data cleaning?

  • Unstructured text data (correct; option C)

What is the main purpose of using external references for manual correction in data preprocessing?

  • To correct inconsistent data (correct)

What is the primary goal of data integration?

  • Combining data from multiple sources into a coherent store (correct); schema integration is one of the steps toward this goal

What is the entity identification problem in data integration?

  • Identifying real-world entities from multiple data sources (correct; option D)

Why might attribute values for the same real world entity be different in data integration?

  • Due to different scales used in the data (correct; option B)

What is the purpose of resolving data value conflicts in data integration?

  • To ensure consistency when attribute values differ for the same entity (correct; option D)
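A sketch of resolving a value conflict caused by different scales, assuming one hypothetical source records weight in kilograms and another in pounds. Values are converted to a common unit and then checked for agreement; the 0.01 kg tolerance and the averaging rule are illustrative choices, not a standard.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound


def to_kg(value, unit):
    """Convert a weight reported in 'kg' or 'lb' to kilograms."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unknown unit: {unit}")


def resolve(values_with_units, tol=0.01):
    """Convert each source's (value, unit) pair to kg and return their average
    if all converted values agree within tol; otherwise report the conflict."""
    converted = [to_kg(v, u) for v, u in values_with_units]
    if max(converted) - min(converted) > tol:
        raise ValueError(f"conflicting values after conversion: {converted}")
    return sum(converted) / len(converted)


# Source A records the entity's weight in kilograms, source B in pounds.
print(round(resolve([(2.0, "kg"), (4.409, "lb")]), 3))  # 2.0
```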

What is the purpose of clustering in data preprocessing?

  • Detect and remove outliers (correct; option A)
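One way to see how clustering exposes outliers: group values into clusters and treat values that land in very small clusters as outliers. The sketch below uses a simple 1-D single-linkage (gap-based) clustering for brevity rather than, say, k-means; the `max_gap` and `min_size` thresholds and the extra price of 250 are assumptions.

```python
def gap_clusters(values, max_gap):
    """1-D single-linkage clustering: after sorting, a value joins the current
    cluster when it is within max_gap of the previous value."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= max_gap:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters


def cluster_outliers(values, max_gap, min_size=2):
    """Values in clusters with fewer than min_size members are flagged as outliers."""
    return [v for c in gap_clusters(values, max_gap) if len(c) < min_size for v in c]


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34, 250]
outliers = cluster_outliers(prices, max_gap=10)
cleaned = [p for p in prices if p not in outliers]  # "remove" step
print(outliers)  # [250]
```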

Which method divides the range into N intervals of equal size in data discretization?

  • Equal-width partitioning (correct; option B)

What issue might arise when using equal-width partitioning for discretization?

  • Outliers dominating presentation (correct; option A)
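A sketch of equal-width partitioning, where the interval width is W = (B - A) / N for the smallest value A and the largest value B. Running it on a sample price list with and without an added outlier (the value 250 is hypothetical) shows how a single extreme value stretches the intervals so that most data falls into one bin.

```python
def equal_width_bins(values, n):
    """Partition values into n intervals of equal width W = (B - A) / n,
    where A = min(values) and B = max(values); the last interval includes B."""
    a, b = min(values), max(values)
    width = (b - a) / n
    bins = [[] for _ in range(n)]
    for v in sorted(values):
        # min(..., n - 1) keeps v == B in the last bin; width 0 means all values are equal.
        idx = min(int((v - a) / width), n - 1) if width > 0 else 0
        bins[idx].append(v)
    return bins


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_width_bins(prices, 3))
# [[4, 8, 9], [15, 21, 21], [24, 25, 26, 28, 29, 34]]
print([len(b) for b in equal_width_bins(prices + [250], 3)])
# [12, 0, 1]: the outlier stretches W to 82, so one bin holds almost everything
```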

Which method involves dividing the range into intervals with approximately the same number of samples?

  • Equal-depth partitioning (correct; option D)
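A matching sketch of equal-depth (equal-frequency) partitioning. When the number of values is not a multiple of N, this version gives the first bins one extra value each, which is one of several reasonable conventions; the sample prices are illustrative.

```python
def equal_depth_bins(values, n):
    """Split sorted values into n bins holding (approximately) the same number of samples."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n)
    bins, start = [], 0
    for i in range(n):
        # The first `extra` bins take one additional value when len % n != 0.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_depth_bins(prices, 3))
# [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
```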

What does smoothing by bin means entail?

  • Replacing each value in a bin with the average (mean) value of that bin (correct; option A)
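A minimal sketch of smoothing by bin means, applied to equal-depth bins of sample price data; each value is replaced by the mean of its bin.

```python
def smooth_by_means(bins):
    """Replace every value in each bin by that bin's mean (empty bins stay empty)."""
    return [[sum(b) / len(b)] * len(b) if b else [] for b in bins]


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
print(smooth_by_means(bins))
# [[9.0, 9.0, 9.0, 9.0], [22.75, 22.75, 22.75, 22.75], [29.25, 29.25, 29.25, 29.25]]
```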

In what scenario would smoothing by bin boundaries be preferred?

  • Maintaining original range information (correct; option C)
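Smoothing by bin boundaries instead replaces each value with the nearer of its bin's minimum and maximum, so every bin keeps its original range. A sketch on the same kind of sample bins; sending ties to the lower boundary is an arbitrary choice here.

```python
def smooth_by_boundaries(bins):
    """Replace each value by the closer of its bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        if not b:
            smoothed.append([])  # nothing to smooth in an empty bin
            continue
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]
print(smooth_by_boundaries(bins))
# [[4, 4, 4, 15], [21, 21, 25, 25], [26, 26, 26, 34]]
```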

What is the primary reason for data preprocessing?

  • To improve the quality of the data (correct; option B)

Which of the following is NOT a key aspect of data quality according to the passage?

  • Scalability (correct; option A)

What is the relationship between data quality and mining results according to the passage?

  • Data quality and mining results are directly related (correct; option C)

Which of the following is an example of a data quality issue mentioned in the passage?

  • Data is incomplete (correct; option B)
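Incompleteness can be measured before cleaning by counting missing values per attribute. A sketch assuming dict records in which a missing value is either `None` or an absent key; `missing_report` and the sample records are hypothetical.

```python
def missing_report(rows, attributes):
    """Count missing (None or absent) values per attribute across dict records."""
    return {a: sum(1 for r in rows if r.get(a) is None) for a in attributes}


rows = [
    {"name": "Ana", "occupation": None, "income": 52000},
    {"name": "Ben", "income": None},
    {"name": "Cy", "occupation": "clerk", "income": 38000},
]
print(missing_report(rows, ["name", "occupation", "income"]))
# {'name': 0, 'occupation': 2, 'income': 1}
```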

What is the primary purpose of a data warehouse according to the passage?

  • To integrate and consolidate quality data (correct; option B)
