Questions and Answers
Which of the following is not part of the KDD process?
- Data interpretation
- Data mining
- Data pre-processing
- Data collection (correct)
What is the primary focus of data mining?
- Result interpretation and reporting
- Testing models and hypotheses on the dataset
- Uncovering hidden patterns in large volumes of data (correct)
- Data collection and preparation
What were terms like data fishing and data dredging used for in the 1960s?
- Referring to bad practice of analyzing data without an a-priori hypothesis (correct)
- Identifying patterns in data
- Referring to good practices in data analysis
- Analyzing data with an a-priori hypothesis
What has the proliferation of computer technology increased in the context of data mining?
What does the CRISP-DM methodology refer to?
What are terms interchangeably used with data mining?
What do data dredging, data fishing, and data snooping refer to in the context of data mining?
What is the overall goal of data mining?
What does data mining bridge the gap from and to?
What is another notable standard methodology used by data miners?
What is the difference between data analysis and data mining?
Which fields does data mining intersect with?
Why is the term 'data mining' considered a misnomer?
What does data mining encompass apart from the raw analysis step?
What is the primary purpose of data cleaning in the context of data mining?
What does association rule learning aim to do in data mining?
What is the key task of regression in data mining?
What is the potential consequence of overfitting in data mining?
What is the purpose of using a test set in data mining evaluation?
What is the main concern with unintentional misuse of data mining?
What is the purpose of results validation in the context of data mining?
What is the potential issue with data mining algorithms finding patterns in the training set that are not present in the general data set?
What is the task of discovering groups and structures in the data that are 'similar' without using known structures in the data?
Which task involves attempting to find a function that models the data with the least error for estimating the relationships among data or datasets?
What is the task of generalizing known structure to apply to new data, such as classifying an e-mail as 'legitimate' or as 'spam'?
Which task involves providing a more compact representation of the data set, including visualization and report generation?
What is the identification of unusual data records, which might be interesting or data errors that require further investigation due to being out of standard range?
Which task searches for relationships between variables, such as a supermarket gathering data on customer purchasing habits?
What is the unintentional misuse of data mining, producing results that appear significant but do not actually predict future behavior and cannot be reproduced on a new sample of data?
What is the final step of knowledge discovery from data, involving verifying that the patterns produced by the data mining algorithms occur in the wider data set?
Study Notes
Data Mining: Practical Machine Learning Tools and Techniques with Java
- The book Data Mining: Practical Machine Learning Tools and Techniques with Java was initially intended to be named Practical Machine Learning, with the term data mining added for marketing purposes.
- Data mining involves the semi-automatic or automatic analysis of large data sets to extract previously unknown patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining).
- The data mining step itself does not include data collection, data preparation, or result interpretation and reporting; these belong to the overall KDD process as additional steps.
- Data analysis tests models and hypotheses on the dataset, while data mining uses machine learning and statistical models to uncover hidden patterns in large volumes of data.
- Related terms include data dredging, data fishing, and data snooping, which refer to the use of data mining methods on samples of a larger population data set that are (or may be) too small for reliable statistical inferences about the validity of any patterns discovered.
- In the 1960s, terms like data fishing and data dredging were used to refer to the bad practice of analyzing data without an a-priori hypothesis.
- The term "data mining" was initially used critically by economist Michael Lovell but gained positive connotations in the 1990s in the database community.
- The term data mining is often used interchangeably with knowledge discovery; other terms include data archaeology, information harvesting, and knowledge extraction.
- Early methods of identifying patterns in data include Bayes' theorem and regression analysis, and the proliferation of computer technology has increased data collection, storage, and manipulation ability.
- Data mining bridges the gap from applied statistics and artificial intelligence to database management, applying methods to ever-larger data sets.
- The knowledge discovery in databases (KDD) process includes stages such as selection, pre-processing, transformation, data mining, and interpretation/evaluation.
- Polls show that the CRISP-DM methodology is the leading methodology used by data miners, with SEMMA being another notable standard.
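The KDD stages listed above (selection, pre-processing, transformation, data mining, interpretation/evaluation) can be sketched as a chain of small functions. This is only an illustrative sketch under stated assumptions: the customer records, the field names, the z-score threshold, and the choice of a simple z-score anomaly detector as the "data mining" step are all hypothetical and not taken from the notes.

```python
from statistics import mean, stdev

# Hypothetical raw records: (customer_id, age, monthly_spend).
# None marks a missing value. All values are made up for illustration.
RAW = [
    (1, 34, 120.0),
    (2, 29, 95.0),
    (3, None, 110.0),
    (4, 41, 105.0),
    (5, 38, 900.0),
    (6, 25, 100.0),
    (7, 52, 115.0),
    (8, 47, None),
]


def select(records):
    """Selection: keep only the fields relevant to the question (id, spend)."""
    return [(cid, spend) for cid, _age, spend in records]


def preprocess(rows):
    """Pre-processing: drop rows whose spend value is missing."""
    return [(cid, spend) for cid, spend in rows if spend is not None]


def transform(rows):
    """Transformation: rescale spend to z-scores (sample standard deviation)."""
    values = [spend for _cid, spend in rows]
    mu, sigma = mean(values), stdev(values)
    return [(cid, (spend - mu) / sigma) for cid, spend in rows]


def mine(rows, threshold=2.0):
    """Data mining: flag ids whose |z-score| exceeds the (assumed) threshold."""
    return [cid for cid, z in rows if abs(z) > threshold]


def interpret(flagged_ids, records):
    """Interpretation/evaluation: map flagged ids back to full records for review."""
    by_id = {record[0]: record for record in records}
    return [by_id[cid] for cid in flagged_ids]


def run_pipeline(records):
    """Run the five stages in order and return the records flagged as unusual."""
    return interpret(mine(transform(preprocess(select(records)))), records)


if __name__ == "__main__":
    print(run_pipeline(RAW))
```

Only the `mine` function corresponds to the data mining step proper; the other functions stand in for the surrounding KDD stages. In practice each stage is far more involved, and the flagged records would still need validation against data the detector has not seen.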