Big Data Trends and Measurements

Questions and Answers

What is the approximate size of data projected to be created globally by 2025?

  • 200 zettabytes
  • 175 zettabytes (correct)
  • 100 zettabytes
  • 50 zettabytes

How many bytes are there in one zettabyte?

  • 1 trillion bytes
  • 1 sextillion bytes (correct)
  • 1 quintillion bytes
  • 1 septillion bytes

Which of the following data sizes comes immediately before a zettabyte in the data size hierarchy?

  • Exabyte (correct)
  • Petabyte
  • Gigabyte
  • Terabyte

If 40 zettabytes are equated to the total number of grains of sand on Earth multiplied by 75, what does this suggest about zettabytes?

  • Zettabytes represent a vast amount of data.
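The unit questions above rest on the decimal (SI) size hierarchy, in which each unit is 1,000 times the previous one. The following is a minimal sketch of that hierarchy, assuming decimal rather than binary units; the names `UNITS`, `bytes_in`, and `previous_unit` are illustrative and do not come from the lecture.

```python
# Decimal (SI) data-size units; each is 1000 times the one before it.
# Assumption: decimal units (10**3 steps), not binary units (2**10 steps).
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one decimal unit."""
    return 1000 ** (UNITS.index(unit) + 1)


def previous_unit(unit: str) -> str:
    """Return the unit immediately below `unit` in the hierarchy."""
    position = UNITS.index(unit)
    if position == 0:
        raise ValueError("kilobyte is the smallest unit in this list")
    return UNITS[position - 1]


print(bytes_in("zettabyte") == 10**21)   # True: one sextillion bytes
print(previous_unit("zettabyte"))        # exabyte
# The 175 ZB projection expressed in exabytes:
print(175 * bytes_in("zettabyte") // bytes_in("exabyte"))  # 175000
```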

In 2024, what is the global internet penetration rate?

  • 66.2%

What is a key trend influencing the growth of Big Data?

  • Rise of Internet of Things (IoT)

Which of the following is an example of Big Data?

  • Data generated by millions of IoT devices

What plays a crucial role in analyzing Big Data effectively?

  • Analytic flow processes

Which statement best reflects the nature of Big Data?

  • Big Data involves a variety of data formats and sources.

Which of the following is NOT a type of Big Data?

  • Fixed-data

What percentage of activities are expected to be cloud-based due to Big Data and IoT?

  • 92%

What is a significant source of Big Data in modern society?

  • Social media interactions

Which of the following refers to the increasing amount of data generated globally over time?

  • Data accumulation

What is a defining characteristic of big data?

  • Data volume, velocity, or variety must be large.

What does 'velocity' in the context of big data refer to?

  • The speed at which data is generated.

Which of the following describes 'volume' in big data?

  • The significant amount of data accumulated.

An example of a high-velocity data source is:

  • Social media feeds.

Which statement about the volume aspect of big data is true?

  • Volume refers to the large amount of diverse datasets.

Why is traditional database technology often insufficient for managing big data?

  • It cannot handle large volumes, high velocities, and various types of data effectively.

What term describes collections of datasets that are too large for traditional data processing tools?

  • Big data.

What does the term 'Variety' in big data refer to?

  • The types of data and their forms

Which of the following is an example of structured data?

  • Financial records

Which scenario exemplifies a challenge posed by big data?

  • A website receiving millions of user interactions every day.

Why is cleansing data important in big data applications?

  • To filter out incorrect and faulty data

What does the 'Value' in big data signify?

  • The usefulness of data for its intended purpose

Which statement best describes unstructured data?

  • It cannot be easily organized or analyzed traditionally.

What is the implication of having inaccurate data in data-driven applications?

  • It can lead to misleading outcomes from analysis.

How does big data relate to mobile phone usage?

  • Mobile usage generates digital interactions that contribute to big data.

What is a key characteristic of semi-structured data?

  • It exists within a defined field but allows some flexibility.

What should be chosen if results are required to be updated every few seconds?

  • Real-time analytics mode

Which analysis type would be most appropriate for discovering patterns in data?

  • Pattern Mining

Which method could be a good choice for batch analytics when performing basic statistics?

  • MapReduce
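To make the MapReduce answer above concrete, here is a minimal sketch of the map, shuffle, and reduce phases computing basic statistics per key. It runs in plain Python on one machine and is not a real MapReduce framework; the function names and the `sales` sample data are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Map: emit (key, partial statistics) for every input record."""
    for key, value in records:
        yield key, (1, value, value, value)  # count, sum, min, max


def shuffle(pairs):
    """Shuffle: group the emitted partial statistics by key."""
    groups = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def combine(a, b):
    """Reduce step: merge two partial statistics into one."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))


def basic_stats(records):
    """Run map, shuffle, and reduce, then derive the mean per key."""
    result = {}
    for key, partials in shuffle(map_phase(records)).items():
        count, total, low, high = reduce(combine, partials)
        result[key] = {"count": count, "mean": total / count,
                       "min": low, "max": high}
    return result


# Hypothetical sample input: (region, sale amount) pairs.
sales = [("north", 120.0), ("south", 80.0), ("north", 60.0), ("south", 100.0)]
stats = basic_stats(sales)
print(stats["north"])
```

Because `combine` is associative, the partial statistics can be merged in any order, which is what lets a real framework spread the work across many machines in batch mode.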

What type of visualization is best for displaying results that update regularly?

  • Dynamic visualization

If a user wants to actively engage with the application for input on results, which visualization is required?

  • Interactive visualization

Which analytics mode is suitable for applications that only need results generated on a daily or monthly basis?

  • Batch mode

Which analysis type would use techniques to categorize data into distinct classes?

  • Classification

If an application needs to process only data meeting specific criteria and exclude bad records, what is the technique employed?

  • Sampling and Filtering

What type of data is represented by user-generated content such as Facebook posts or tweets?

  • Unstructured data

Which process involves transforming data from one raw format to another?

  • Data wrangling

What is the primary issue that data cleansing addresses?

  • Corrupt records and missing values

Which type of data is generated every time a customer makes a purchase?

  • Transactional data

What does normalization in data preparation aim to resolve?

  • Inconsistent units or scales

Which of the following is an example of captured data?

  • Google searches

What is the purpose of de-duplication in data preparation?

  • To create a single version of data

Which of the following kinds of data is experimental in nature?

  • Data gathered from focus groups

Study Notes

GFQR 1026: Big Data in "X" - Lecture 1

  • Big data is a collection of datasets whose volume, velocity, or variety is so large that traditional database and data processing tools struggle to manage it.
  • The concept of big data gained momentum in the early 2000s, when Doug Laney characterized it by the three Vs: volume, velocity, and variety.
  • Volume: The amount of data, counted in units such as terabytes, records, transactions, tables, or files.
  • Velocity: The speed at which data is generated and analyzed (e.g., near real time, real time, streams, batches).
  • Variety: Different forms of data (e.g., structured, semi-structured, unstructured, mixed).
  • Organizations collect data from various sources, such as business transactions, social media, and machine-to-machine data.
  • Big data is often massive-scale data that is difficult to store, manage, and process with traditional databases.
  • There is no fixed threshold of data volume above which data counts as big data.
  • Data generated at high velocity accumulates into large volumes within short periods.
  • Real-time data analysis is essential in some applications (such as fraud detection).
  • Big data systems need the flexibility to handle different data types (structured, unstructured, semi-structured).
  • Structured data is data located in fixed fields within records or files (e.g., sales, financial, student data).
  • Unstructured and semi-structured data are hard to organize into rows and columns (e.g., photos, videos, websites, emails, PDFs, social media posts, presentations).
  • Gartner estimates that about 20% of enterprise data is structured and about 80% is unstructured.
  • Veracity/Validity: The accuracy and meaningfulness of data. Data cleansing is crucial for filtering out incorrect and faulty data.
  • Value: The usefulness of data for the intended purpose. The goal of big data analytics is to extract value from data.
  • Data is now mined from activities, conversations, photos and videos, sensors, and the Internet of Things.
  • Daily data generation from mobile phones is massive (texts, emails, photos, social media interactions).
  • The number of connected (IoT) devices is growing rapidly: an estimated 14.4 billion by 2022, up from more than 9.7 billion in 2020.
  • These trends indicate a global increase in the scale and volume of generated data.
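The distinction between structured, semi-structured, and unstructured data in the notes above can be illustrated with one small record in three forms. This is only a sketch: the student record, its field names, and the email address are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed fields in rows and columns (here, a CSV table).
structured = "student_id,name,grade\n1001,Ada,88\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: named fields, but nesting and optional keys are allowed.
semi_structured = (
    '{"student_id": 1001, "name": "Ada", '
    '"tags": ["honours"], "contact": {"email": "ada@example.com"}}'
)
document = json.loads(semi_structured)

# Unstructured: free text with no fields; finding facts needs text analysis.
unstructured = "Ada (student 1001) emailed to say she enjoyed the lecture."

print(rows[0]["grade"])              # direct lookup by column name
print(document["contact"]["email"])  # lookup through a nested field
print("1001" in unstructured)        # only a crude substring search is possible
```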

Types of Big Data (examples)

  • Facebook generates over 30 petabytes of data daily.
  • Over 230 million tweets are created every day.
  • YouTube users upload 48 hours of new video every minute.
  • 294 billion emails are sent per day.
  • IoT devices generate large volumes of data (600 ZB per year in 2020).
  • Large companies such as Google, eBay, Facebook, Microsoft, Alibaba Group, Amazon, Twitter, YouTube, and Yahoo! are big data generators.
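The daily figures in the list above are easier to read as velocities once converted to per-second rates. The following sketch does that arithmetic, taking the quoted numbers at face value and assuming decimal petabytes; the variable and function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily counts quoted in the list above.
daily_counts = {
    "tweets": 230_000_000,
    "emails": 294_000_000_000,
}


def per_second(per_day: float) -> float:
    """Convert a per-day count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


for name, count in daily_counts.items():
    print(f"{name}: about {per_second(count):,.0f} per second")
# tweets: about 2,662 per second
# emails: about 3,402,778 per second

# 48 hours of video uploaded every minute, accumulated over one day.
video_hours_per_day = 48 * 60 * 24
print(video_hours_per_day)  # 69120

# 30 petabytes per day as bytes per second (assumption: 1 PB = 10**15 bytes).
facebook_bytes_per_second = per_second(30 * 10**15)
```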

Analytic Flow for Big Data (steps)

  1. Data Collection (various sources, structured and unstructured)
  2. Data Preparation (cleansing, wrangling, de-duplication, normalization, sampling and filtering)
  3. Analysis Types (e.g., basic statistics, regression, recommendation, dimensionality reduction, graph analytics, classification, time series analysis, text analysis, pattern mining)
  4. Analytics Modes (batch, real-time, interactive)
  5. Visualizations (static, dynamic, interactive)
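The data preparation step in the flow above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the lecture's own method: the temperature records, field names, and the plausible range used for filtering are all hypothetical.

```python
# Hypothetical raw sensor readings with typical quality problems.
raw = [
    {"id": 1, "temp": 21.5, "unit": "C"},
    {"id": 2, "temp": 70.7, "unit": "F"},
    {"id": 2, "temp": 70.7, "unit": "F"},   # duplicate record
    {"id": 3, "temp": None, "unit": "C"},   # missing value
    {"id": 4, "temp": 999.0, "unit": "C"},  # faulty reading
]


def cleanse(records):
    """Cleansing: drop records with missing values."""
    return [r for r in records if r.get("temp") is not None]


def deduplicate(records):
    """De-duplication: keep a single version of each record id."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


def normalize(records):
    """Normalization: convert inconsistent units to Celsius."""
    normalized = []
    for r in records:
        temp = r["temp"]
        if r["unit"] == "F":
            temp = (temp - 32) * 5 / 9
        normalized.append({"id": r["id"], "temp_c": round(temp, 1)})
    return normalized


def filter_valid(records, low=-50.0, high=60.0):
    """Filtering: keep only readings inside an assumed plausible range."""
    return [r for r in records if low <= r["temp_c"] <= high]


prepared = filter_valid(normalize(deduplicate(cleanse(raw))))
print(prepared)
```

Only records 1 and 2 survive: record 3 is removed by cleansing, the repeated record 2 by de-duplication, and record 4 by filtering.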

Description

Test your knowledge of the current state and projected trends of Big Data. This quiz covers essential statistics, definitions, and examples relevant to the world of Big Data and internet data growth. Challenge yourself to understand the magnitude and implications of data in our digital age.