The Data Explosion

Questions and Answers

What is the estimated daily volume of data generated by NASA's current Earth observation satellites?

  • 100 exabytes
  • 1 terabyte (correct)
  • 100 gigabytes
  • 10 petabytes

Approximately how many users are there on Facebook?

  • 100 million
  • 900 million (correct)
  • 500 million
  • 1.5 billion

What is the estimated number of tweets sent daily on Twitter?

  • 350 million (correct)
  • 100 million
  • 500 million
  • 200 million

What is the estimated number of websites?

    650 million

    What type of data is recorded by CCTV recordings?

    Non-symbolic data

    What is the purpose of a Data Warehouse?

    To store and analyze customer transactions

    What is a consequence of the vast amounts of data being stored?

    Most of the data is not examined in detail.

    What is the potential of machine learning technology?

    To solve the problem of the tidal wave of data.

    What is the goal of knowledge discovery?

    To extract implicit, previously unknown and potentially useful information from data.

    What is the role of data mining in knowledge discovery?

    It is a central part of the knowledge discovery process.

    What is the outcome of the knowledge discovery process?

    New and potentially useful knowledge.

    What happens to most of the data that is stored?

    It is merely stored and never examined.

    What is the current state of the world in terms of data and knowledge?

    Data rich but knowledge poor.

    What is a potential application of knowledge discovery?

    All of the above.

    What is the primary goal of using labelled data in data mining?

    To predict the value of a designated attribute for unseen instances

    What is the term for data mining using unlabelled data?

    Unsupervised learning

    What is the task called when the designated attribute is categorical?

    Classification

    What is the term for the examples in a dataset, each comprising the values of a number of variables?

    Instances

    What is the goal of data mining when using unlabelled data?

    To extract the most information from the data available

    What is the term for the process of predicting a numerical outcome?

    Regression

    What is the primary goal of classification in data mining?

    To predict the value of a categorical attribute

    What is the term for data that has a specially designated attribute?

    Labelled data

    What is the goal of the analysis in the given dataset?

    To predict the degree classification for other students given their grade profiles

    What method involves identifying the closest examples to an unclassified instance?

    Nearest Neighbour Matching

    What is the purpose of a classification tree?

    To generate classification rules

    What type of structure is used to generate classification rules?

    Decision Tree

    What is the form of the dataset?

    A table containing students' grades on five subjects

    What is the purpose of the classification rules?

    To predict the degree classification of an unseen instance

    What is the result of applying the nearest neighbour matching method?

    A predicted degree classification for an unseen instance

    What is the relationship between the attributes in the dataset?

    The attributes are used to predict the degree classification

    What is the primary goal of market basket analysis?

    To find relationships between product purchases

    What is the purpose of stating association rules with additional information?

    To indicate the reliability of the rules

    What is the main difference between supervised and unsupervised learning?

    The presence of labeled data

    What is the purpose of clustering algorithms?

    To find groups of similar items

    What is an example of a clustering application?

    Fault diagnosis

    What is the concept of 'IF variable 1 > 85 and switch 6 = open THEN variable 23 < 47.5 and switch 8 = closed (probability = 0.8)' an example of?

    Association rule

    What is the term for the type of prediction where the value to be predicted is a label?

    Classification

    What is the term for the process of finding relationships between product purchases?

    Market basket analysis

    Study Notes

    The Data Explosion

    • Modern computer systems are accumulating data at an unimaginable rate from a wide variety of sources, including point-of-sale machines, machines logging cheque clearance, bank cash withdrawals, credit card transactions, and Earth observation satellites.
    • The volume of data is enormous, with examples including:
      • NASA Earth observation satellites generating a terabyte (10^12 bytes) of data every day.
      • The Human Genome project storing thousands of bytes for each of several billion genetic bases.
      • Data warehouses containing over a hundred million customer transactions.
      • Large volumes of automatically recorded data, such as credit card transaction files and web logs, as well as non-symbolic data such as CCTV recordings.
      • Over 650 million websites, with some extremely large sites.
      • Over 900 million Facebook users, with an estimated 3 billion postings a day, and 150 million Twitter users, sending 350 million tweets a day.
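
To put these figures in perspective, the short calculation below works out a yearly volume and some per-user rates. It is only a back-of-the-envelope sketch: the input numbers are the ones quoted in the notes, a terabyte is taken as 10^12 bytes, and the variable names are illustrative.

```python
# Figures quoted in the study notes (approximate, as stated there).
TERABYTE = 10**12                      # bytes
nasa_bytes_per_day = 1 * TERABYTE
facebook_users = 900_000_000
facebook_posts_per_day = 3_000_000_000
twitter_users = 150_000_000
tweets_per_day = 350_000_000

# Derived quantities.
bytes_per_year = nasa_bytes_per_day * 365
posts_per_facebook_user = facebook_posts_per_day / facebook_users
tweets_per_twitter_user = tweets_per_day / twitter_users

print(f"NASA satellites: {bytes_per_year / 10**15:.3f} petabytes per year")
print(f"Facebook: {posts_per_facebook_user:.2f} postings per user per day")
print(f"Twitter: {tweets_per_twitter_user:.2f} tweets per user per day")
```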

    Knowledge Discovery

    • Knowledge Discovery is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
    • Data mining is a central part of the Knowledge Discovery process, but it is only one of several stages.
    • The Knowledge Discovery process involves:
      • Data coming in from many sources.
      • Data integration and storage in a common data store.
      • Pre-processing of data into a standard format.
      • Applying a data mining algorithm to produce rules or patterns.
      • Interpreting the output to gain new and potentially useful knowledge.
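
The stages listed above can be sketched as a chain of small functions. This is a toy illustration rather than a real system: the function names, the two data sources, and the "mining" step (simply counting frequent items) are all assumptions made for the example.

```python
from collections import Counter

def integrate(sources):
    """Merge records from several sources into one common store (a list)."""
    return [record for source in sources for record in source]

def preprocess(records):
    """Put every record into a standard format: a set of lower-case item names."""
    return [frozenset(item.strip().lower() for item in record) for record in records]

def mine(records, min_count):
    """Stand-in for a data mining algorithm: keep items seen in at least min_count records."""
    counts = Counter(item for record in records for item in record)
    return {item: n for item, n in counts.items() if n >= min_count}

def interpret(patterns, total):
    """Turn the raw patterns into readable statements, most frequent first."""
    ranked = sorted(patterns.items(), key=lambda pair: (-pair[1], pair[0]))
    return [f"{item} appears in {n} of {total} records" for item, n in ranked]

# Hypothetical data arriving from two sources with inconsistent formatting.
till_data = [["Milk", "bread"], ["cheese ", "milk"]]
web_orders = [["BREAD", "milk"], ["eggs"]]

store = integrate([till_data, web_orders])
clean = preprocess(store)
patterns = mine(clean, min_count=2)
knowledge = interpret(patterns, len(clean))
print("\n".join(knowledge))
```

Keeping each stage as a separate function mirrors the list above: the mining step can be swapped for a different algorithm without touching integration or pre-processing.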

    Types of Data and Data Mining

    • There are two types of data: labelled and unlabelled data.
    • Labelled data is used for supervised learning, where the aim is to predict the value of a designated attribute for unseen instances.
    • Unlabelled data is used for unsupervised learning, where the aim is to extract the most information possible from the available data.
    • Data mining applications can be divided into four main types:
      • Classification: predicting a categorical value, such as classifying medical patients into high, medium, or low risk of acquiring an illness.
      • Numerical Prediction: predicting a numerical value, such as the expected sale price of a house.
      • Association: finding relationships amongst variables, such as in market basket analysis.
      • Clustering: grouping items that are similar, such as customers according to income, age, and types of policy purchased.
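
As an illustration of numerical prediction, the sketch below fits a straight line to a handful of labelled examples and uses it to predict a house price. The notes do not name a method; ordinary least squares is assumed here as one simple option, and the floor areas and prices are made-up values.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical labelled data: floor area (square metres) and sale price (thousands).
areas = [50, 70, 90, 110]
prices = [150, 200, 250, 300]

intercept, slope = fit_line(areas, prices)

# Predict the designated attribute (price) for an unseen instance.
predicted = intercept + slope * 100
print(f"Predicted price for 100 square metres: {predicted:.1f} thousand")
```

Here the designated attribute is numerical (price), which is what separates this task from classification, where the designated attribute is a category.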

    Classification

    • Classification is a common application of data mining, involving predicting a categorical value.
    • Examples include:
      • Classifying medical patients into high, medium, or low risk of acquiring an illness.
      • Classifying people into those likely to vote for different political parties.
      • Classifying student projects into distinction, merit, pass, or fail.
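
One simple way to classify an unseen instance is nearest neighbour matching, which the quiz questions mention: find the labelled examples closest to the unseen one and take the most common label among them. The sketch below assumes Euclidean distance and k = 3; the marks and labels are invented for illustration.

```python
import math
from collections import Counter

def nearest_neighbour_classify(training, unseen, k=3):
    """Predict a label by majority vote among the k training instances closest to `unseen`."""
    by_distance = sorted(training, key=lambda inst: math.dist(inst[0], unseen))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled instances: (marks on five subjects, classification).
training = [
    ((85, 80, 90, 88, 92), "distinction"),
    ((82, 86, 84, 90, 87), "distinction"),
    ((68, 65, 70, 66, 72), "merit"),
    ((64, 70, 62, 69, 66), "merit"),
    ((52, 50, 55, 48, 53), "pass"),
    ((45, 51, 49, 50, 47), "pass"),
]

unseen = (66, 68, 65, 70, 69)
print(nearest_neighbour_classify(training, unseen))
```

The choice of k and of the distance measure both affect the result; the values used here are defaults chosen for the example, not recommendations.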

    Association Rules

    • Association rules involve finding relationships amongst variables, such as in market basket analysis.
    • An example of an association rule is: IF cheese AND milk THEN bread (probability = 0.7), indicating that 70% of customers who buy cheese and milk also buy bread.
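
The probability attached to a rule like this can be estimated directly from transaction data: count the baskets that contain cheese and milk, then see what fraction of those also contain bread. The sketch below does that for a made-up set of baskets constructed so that the answer is 0.7. The terms "confidence" and "support" are the usual names for these measures; they are not used in the notes themselves.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)

# Hypothetical shopping baskets (20 in total).
transactions = (
    [{"cheese", "milk", "bread"}] * 7
    + [{"cheese", "milk"}] * 3
    + [{"milk", "bread"}] * 5
    + [{"eggs"}] * 5
)

# IF cheese AND milk THEN bread
confidence = rule_confidence(transactions, {"cheese", "milk"}, {"bread"})

# Support: fraction of all baskets in which the whole rule holds.
full_rule = {"cheese", "milk", "bread"}
support = sum(1 for t in transactions if full_rule <= t) / len(transactions)

print(f"confidence = {confidence:.2f}, support = {support:.2f}")
```

Reporting support alongside the probability helps indicate how reliable a rule is: a rule that holds 70% of the time but applies to only a handful of baskets is weaker evidence than one seen in a large share of them.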

    Clustering

    • Clustering algorithms examine data to find groups of items that are similar.
    • Examples include:
      • Grouping customers according to income, age, and types of policy purchased.
      • Grouping electrical faults according to the values of certain key variables.
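
The notes do not name a particular clustering algorithm. As one common example, the sketch below uses a very small version of k-means to group customers by income and age. The customer values and starting centroids are invented, and in practice the attributes would usually be rescaled first so that no single attribute dominates the distance.

```python
import math

def k_means(points, centroids, iterations=10):
    """Tiny k-means: repeatedly assign points to the nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (income in thousands, age).
customers = [(20, 25), (22, 30), (25, 28), (60, 50), (65, 55), (62, 48)]

# Starting centroids chosen by hand so the example is deterministic.
centroids, clusters = k_means(customers, centroids=[(20, 25), (60, 50)])

for centre, members in zip(centroids, clusters):
    print(f"centre {centre}: {members}")
```

No labels are supplied anywhere in this example, which is what makes it unsupervised: the groups emerge only from the similarity of the attribute values.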

    Description

    Explore the rapid accumulation of data in modern computer systems from various sources, including point-of-sale machines, bank transactions, and earth observation satellites.