ITBAN 3 – Fundamentals of Analytics Modelling Data Preprocessing Quiz
28 Questions

Questions and Answers

What is one of the major tasks in data preprocessing?

  • Model deployment
  • Data cleaning (correct)
  • Algorithm training
  • Data visualization

Which task involves scaling data to a specific range during data preprocessing?

  • Data reduction
  • Data transformation (correct)
  • Data discretization
  • Data integration
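
As a study aid, scaling to a specific range is often done with min-max normalization. The sketch below is one minimal way to do it, not necessarily the method the lesson uses; the function name `min_max_scale` and the sample `incomes` values are made up for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (min-max normalization)."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values are identical: map everything to the lower bound.
        return [new_min for _ in values]
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]


incomes = [12000, 54000, 73600, 98000]  # hypothetical sample values
scaled = min_max_scale(incomes)
print(scaled)
```

The smallest input maps to `new_min` and the largest to `new_max`; everything else keeps its relative position in between.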

Why might missing data occur in a dataset?

  • High importance placed on entering all data
  • Perfect data entry by all employees
  • Due to consistent recording practices
  • Equipment malfunction (correct)

Which preprocessing task aims to reduce the volume of data while maintaining analytical results?

  • Data reduction (correct)

What is a common reason for missing data in sales records according to the text?

  • Misunderstanding of data importance (correct)

Which step of data preprocessing involves integrating multiple databases or files?

  • Data integration (correct)

Which of the following is the most effective way to handle a missing class label in a classification task?

  • Ignore the tuple (correct)

What is the definition of noise in the context of data?

  • Random error in a measured variable (correct)

Which of the following is not a common cause of incorrect attribute values in data?

  • Inconsistency in variable naming (correct)

What is the purpose of the binning method in handling noisy data?

  • To smooth the data by replacing values with bin means or medians (correct)

Which of the following is a more sophisticated approach to filling in missing values compared to using a global constant or attribute mean?

  • Using the most probable value based on inference (correct)
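
To make the contrast concrete, the sketch below fills a missing value three ways: with a global constant, with the attribute mean, and with the mean of records in the same class. The class mean is only a simple stand-in for inference here; a genuinely inference-based fill would use something like regression, a decision tree, or a Bayesian method. The `records` data and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical records: (customer_class, income); None marks a missing value.
records = [
    ("low", 20000), ("low", None), ("low", 24000),
    ("high", 90000), ("high", 110000), ("high", None),
]


def fill_constant(rows, constant=-1):
    """Replace every missing value with the same global constant."""
    return [(c, constant if v is None else v) for c, v in rows]


def fill_attribute_mean(rows):
    """Replace missing values with the mean of all known values."""
    overall = mean(v for _, v in rows if v is not None)
    return [(c, overall if v is None else v) for c, v in rows]


def fill_class_mean(rows):
    """Replace missing values with the mean of known values in the same class.

    Assumes every class has at least one known value.
    """
    by_class = {}
    for c, v in rows:
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [(c, class_means[c] if v is None else v) for c, v in rows]


print(fill_constant(records)[1])
print(fill_attribute_mean(records)[1])
print(fill_class_mean(records)[1])
```

With this data the attribute mean sits between the two groups and fits neither, while the class mean stays close to the other records in the same class.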

Which of the following is not a common data quality issue that requires data cleaning?

  • Unstructured text data (correct)

What is the main purpose of using external references for manual correction in data preprocessing?

  • To correct redundant data (correct)

What is the primary goal of data integration?

  • Schema integration (correct)

What is the entity identification problem in data integration?

  • Identifying real-world entities from multiple data sources (correct)

Why might attribute values for the same real-world entity be different in data integration?

  • Due to different scales used in the data (correct)

What is the purpose of resolving data value conflicts in data integration?

  • To ensure consistency when attribute values differ for the same entity (correct)
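
One way to picture a data value conflict caused by different scales: two sources record the same product's weight in different units. The sketch below converts both to kilograms before comparing them. The records, field names, and tolerance are all hypothetical.

```python
LB_TO_KG = 0.45359237  # definition of the international pound in kilograms

# Hypothetical sources describing the same product with different units.
source_a = {"product_id": "P-100", "weight": 2.0, "unit": "kg"}
source_b = {"product_id": "P-100", "weight": 4.41, "unit": "lb"}


def to_kilograms(record):
    """Return the record's weight expressed in kilograms."""
    if record["unit"] == "kg":
        return record["weight"]
    if record["unit"] == "lb":
        return record["weight"] * LB_TO_KG
    raise ValueError(f"unknown unit: {record['unit']}")


def weights_agree(a, b, tolerance_kg=0.01):
    """True when both records describe the same weight within a tolerance."""
    return abs(to_kilograms(a) - to_kilograms(b)) <= tolerance_kg


print(weights_agree(source_a, source_b))
```

Comparing the raw numbers (2.0 against 4.41) would suggest a conflict; after conversion to a common unit the two sources agree.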

What is the purpose of clustering in data preprocessing?

  • Detect and remove outliers (correct)

Which method divides the range into N intervals of equal size in data discretization?

  • Equal-width partitioning (correct)

What issue might arise when using equal-width partitioning for discretization?

  • Outliers dominating presentation (correct)
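
The two questions above can be illustrated together. In equal-width partitioning, each of the N intervals has width (max - min) / N. The sketch below assigns values to bin indexes under that rule; the function name and the sample data are assumptions for illustration.

```python
def equal_width_bins(values, n_bins):
    """Return, for each value, the index of its equal-width interval."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        # All values are identical, so they share a single bin.
        return [0 for _ in values]
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # hypothetical sample
print(equal_width_bins(prices, 3))

with_outlier = [1, 2, 3, 4, 100]  # one extreme value stretches the range
print(equal_width_bins(with_outlier, 2))
```

In the second call the single outlier widens the range so much that every other value falls into the first bin, which is the sense in which outliers can dominate the result.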

Which method involves dividing the range into intervals with approximately the same number of samples?

  • Equal-depth partitioning (correct)
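
A minimal sketch of equal-depth (equal-frequency) partitioning, assuming the values are simply sorted and cut into consecutive groups of near-equal size. The function name and the sample data are illustrative only.

```python
def equal_depth_bins(values, n_bins):
    """Split sorted values into n_bins groups of roughly equal size."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # Spread any remainder over the first `extra` bins.
        size = base + (1 if i < extra else 0)
        bins.append(ordered[start:start + size])
        start += size
    return bins


prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]  # hypothetical sample
print(equal_depth_bins(prices, 3))
```

Each bin here holds four values regardless of how wide a range those values cover, which is the key difference from equal-width partitioning.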

What does smoothing by bin means entail?

  • Finding the average value in each bin (correct)
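
As a small illustration, smoothing by bin means replaces every value in a bin with that bin's average. The bins below are hypothetical and assumed to be non-empty.

```python
from statistics import mean


def smooth_by_bin_means(bins):
    """Replace every value in each bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]  # hypothetical bins
print(smooth_by_bin_means(bins))
```

The three bin means are 9, 22.75, and 29.25, so each bin collapses to four copies of its own mean.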

In what scenario would smoothing by bin boundaries be preferred?

  • Maintaining original range information (correct)
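
For comparison, smoothing by bin boundaries replaces each value with the nearer of its bin's minimum or maximum, so each bin's original range is kept. This is a sketch under assumptions: the bins are hypothetical, and breaking ties toward the lower boundary is an arbitrary choice made here.

```python
def smooth_by_bin_boundaries(bins):
    """Replace each value with the closer of its bin's minimum or maximum."""
    smoothed = []
    for b in bins:
        low, high = min(b), max(b)
        # Ties go to the lower boundary (an arbitrary choice for this sketch).
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


bins = [[4, 8, 9, 15], [21, 21, 24, 25], [26, 28, 29, 34]]  # hypothetical bins
print(smooth_by_bin_boundaries(bins))
```

Every bin still spans its original minimum and maximum after smoothing, unlike smoothing by bin means, where that range information is lost.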

What is the primary reason for data preprocessing?

  • To improve the quality of the data (correct)

Which of the following is NOT a key aspect of data quality according to the passage?

  • Scalability (correct)

What is the relationship between data quality and mining results according to the passage?

  • Data quality and mining results are directly related (correct)

Which of the following is an example of a data quality issue mentioned in the passage?

  • Data is incomplete (correct)

What is the primary purpose of a data warehouse according to the passage?

  • To integrate and consolidate quality data (correct)
