Questions and Answers
What is the estimated daily volume of data generated by NASA's current Earth observation satellites?
Approximately how many users are there on Facebook?
What is the estimated number of tweets sent daily on Twitter?
What is the estimated number of websites?
What type of data is recorded by CCTV recordings?
What is the purpose of a Data Warehouse?
What is a consequence of the vast amounts of data being stored?
What is the potential of machine learning technology?
What is the goal of knowledge discovery?
What is the role of data mining in knowledge discovery?
What is the outcome of the knowledge discovery process?
What happens to most of the data that is stored?
What is the current state of the world in terms of data and knowledge?
What is a potential application of knowledge discovery?
What is the primary goal of using labelled data in data mining?
What is the term for data mining using unlabelled data?
What is the task called when the designated attribute is categorical?
What is the term for a dataset of examples, each comprising the values of a number of variables?
What is the goal of data mining when using unlabelled data?
What is the term for the process of predicting a numerical outcome?
What is the primary goal of classification in data mining?
What is the term for data that has a specially designated attribute?
What is the goal of the analysis in the given dataset?
What method involves identifying the closest examples to an unclassified instance?
What is the purpose of a classification tree?
What type of structure is used to generate classification rules?
What is the form of the dataset?
What is the purpose of the classification rules?
What is the result of applying the nearest neighbour matching method?
What is the relationship between the attributes in the dataset?
What is the primary goal of market basket analysis?
What is the purpose of stating association rules with additional information?
What is the main difference between supervised and unsupervised learning?
What is the purpose of clustering algorithms?
What is an example of a clustering application?
What is the concept of 'IF variable 1 > 85 and switch 6 = open THEN variable 23 < 47.5 and switch 8 = closed (probability = 0.8)' an example of?
What is the term for the type of prediction where the value to be predicted is a label?
What is the term for the process of finding relationships between product purchases?
Study Notes
The Data Explosion
- Modern computer systems are accumulating data at an unimaginable rate from a wide variety of sources, including point-of-sale machines, machines logging cheque clearance, bank cash withdrawals, credit card transactions, and Earth observation satellites.
- The volume of data is enormous. Examples include:
  - NASA Earth observation satellites generating a terabyte (10^12 bytes) of data every day.
  - The Human Genome project storing thousands of bytes for each of several billion genetic bases.
  - Data warehouses containing over a hundred million customer transactions.
  - Vast amounts of data recorded every day on automatic recording devices, such as credit card transaction files and web logs, as well as non-symbolic data such as CCTV recordings.
  - Over 650 million websites, some of them extremely large.
  - Over 900 million Facebook users, with an estimated 3 billion postings a day, and 150 million Twitter users sending 350 million tweets a day.
Knowledge Discovery
- Knowledge Discovery is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
- Data mining is the central part of the Knowledge Discovery process, but only one part of it.
- The Knowledge Discovery process involves:
  - Data coming in from many sources.
  - Data integration and storage in a common data store.
  - Pre-processing of data into a standard format.
  - Applying a data mining algorithm to produce rules or patterns.
  - Interpreting the output to gain new and potentially useful knowledge.
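
The stages listed above can be sketched as a small pipeline. This is only an illustrative sketch: the two data sources, the record formats, and the function names (`integrate`, `preprocess`, `mine`, `interpret`) are all hypothetical, and simple item counting stands in for a real data mining algorithm.

```python
from collections import Counter

# Hypothetical raw records arriving from two sources in different formats.
till_records = ["cheese;milk;bread", "milk;bread"]
web_orders = [{"items": ["Cheese", "Bread"]}, {"items": ["milk"]}]


def integrate(till, web):
    """Bring data from several sources into one common store (a list)."""
    store = [line.split(";") for line in till]
    store += [order["items"] for order in web]
    return store


def preprocess(store):
    """Convert every record to a standard format: a set of lower-case names."""
    return [frozenset(item.strip().lower() for item in record) for record in store]


def mine(records):
    """Stand-in for a data mining algorithm: count how often each item occurs."""
    return Counter(item for record in records for item in record)


def interpret(patterns, n_records):
    """Turn raw counts into the share of records that contain each item."""
    return {item: count / n_records for item, count in patterns.items()}


records = preprocess(integrate(till_records, web_orders))
shares = interpret(mine(records), len(records))
print(shares)
```

Each function corresponds to one stage, so a different algorithm could replace `mine` without touching the integration or pre-processing steps.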
Types of Data and Data Mining
- There are two types of data: labelled and unlabelled data.
- Labelled data is used for supervised learning, where the aim is to predict the value of a designated attribute for unseen instances.
- Unlabelled data is used for unsupervised learning, where the aim is to extract the most information possible from the available data.
- Data mining applications can be divided into four main types:
  - Classification: predicting a categorical value, such as classifying medical patients into high, medium, or low risk of acquiring an illness.
  - Numerical Prediction: predicting a numerical value, such as the expected sale price of a house.
  - Association: finding relationships amongst variables, such as in market basket analysis.
  - Clustering: grouping items that are similar, such as customers according to income, age, and types of policy purchased.
Classification
- Classification is one of the most common applications of data mining. It predicts a categorical value (a label) for each unseen instance.
- Examples include:
  - Classifying medical patients into high, medium, or low risk of acquiring an illness.
  - Classifying people into those likely to vote for different political parties.
  - Classifying student projects into distinction, merit, pass, or fail.
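
One simple way to classify an unseen instance is nearest neighbour matching: find the labelled examples closest to it and take their majority label. The sketch below is a minimal version under stated assumptions: the attributes (coursework and exam marks), the training data, the Euclidean distance measure, and the choice of `k=3` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def knn_classify(training, instance, k=3):
    """Return the majority label among the k training examples nearest to instance."""
    by_distance = sorted(training, key=lambda example: math.dist(example[0], instance))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labelled data: (coursework mark, exam mark) -> project grade.
training = [
    ((82, 88), "distinction"),
    ((78, 85), "distinction"),
    ((65, 70), "merit"),
    ((62, 66), "merit"),
    ((50, 52), "pass"),
    ((45, 48), "pass"),
]

print(knn_classify(training, (80, 84)))
```

Here the two closest examples to `(80, 84)` are both labelled "distinction", so that label wins the vote. In practice the attributes would usually need scaling so that no single one dominates the distance.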
Association Rules
- Association rules involve finding relationships amongst variables, such as in market basket analysis.
- An example of an association rule is: IF cheese AND milk THEN bread (probability = 0.7), indicating that 70% of customers who buy cheese and milk also buy bread.
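
The probability attached to a rule like this (commonly called its confidence) can be computed directly from transaction data: of the baskets that contain everything on the IF side, what fraction also contain the THEN side? The following sketch assumes a made-up set of baskets constructed so that the rule above comes out at 0.7; the function name `rule_confidence` is hypothetical.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing the antecedent that also contain the consequent.

    Returns None when no basket contains the antecedent.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [basket for basket in baskets if antecedent <= basket]
    if not matching:
        return None
    hits = sum(1 for basket in matching if consequent <= basket)
    return hits / len(matching)


# Hypothetical transactions: 10 baskets contain cheese and milk, 7 of those also bread.
baskets = (
    [{"cheese", "milk", "bread"}] * 7
    + [{"cheese", "milk"}] * 3
    + [{"milk", "bread"}] * 5
)

print(rule_confidence(baskets, {"cheese", "milk"}, {"bread"}))
```

The five baskets without cheese do not affect the result, because the rule only makes a claim about customers who bought both cheese and milk.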
Clustering
- Clustering algorithms examine data to find groups of items that are similar.
- Examples include:
  - Grouping customers according to income, age, and types of policy purchased.
  - Grouping electrical faults according to the values of certain key variables.
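
As an illustration of grouping similar items, the sketch below uses k-means, one widely used clustering algorithm; the notes above do not name a specific method, so this choice is an assumption. The customer data, the starting centroids, and the function name `kmeans` are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers as (age, income in thousands); no labels are given.
customers = [(25, 20), (27, 22), (30, 25), (55, 60), (58, 65), (60, 62)]
centroids, clusters = kmeans(customers, centroids=[(25, 20), (60, 62)])
print(clusters)
```

Unlike the classification example, no labels are supplied: the algorithm only sees the attribute values and the groups emerge from the distances between them. Starting centroids are passed in explicitly here to keep the result reproducible.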
Description
Explore the rapid accumulation of data in modern computer systems from various sources, including point-of-sale machines, bank transactions, and Earth observation satellites.