Big Data Overview

Questions and Answers

What is the estimated global volume of data generated by 2025?

  • 175 zettabytes (correct)
  • 175 terabytes
  • 175 petabytes
  • 175 gigabytes

Which of the following statements about the 5 Vs of Big Data is incorrect?

  • Variety includes only structured data. (correct)
  • Volume relates to the amount of generated data.
  • Velocity refers to the speed of data processing.
  • Veracity addresses the trustworthiness of data.

Which platform generates approximately 500,000 tweets per minute?

  • Twitter (correct)
  • Facebook
  • LinkedIn
  • Instagram

What percentage of global data is typically stored in relational databases?

  • 10% (correct)

What is the primary characteristic of data lakes?

  • They store raw data without transformation. (correct)

What technology is specifically designed to store large volumes of data across multiple servers?

  • Hadoop Distributed File System (HDFS) (correct)

What main challenge does the 80% of global data that is unstructured present?

  • It requires advanced analytics techniques. (correct)

Which of the following describes veracity in the context of Big Data?

  • The authenticity and trustworthiness of data. (correct)

    Study Notes

    Big Data

    • Data is essential for decision-making in all aspects of business
    • Kathleen Hogan, Microsoft Chief People Officer, highlights its importance

    Global Volume of Data

    • By 2025, the world is projected to generate 175 zettabytes (ZB) of data (1 ZB = 10^21 bytes, or 1 trillion gigabytes)
    • In 2010, the data volume was significantly lower
    • Internet users generate about 2.5 million gigabytes of data every day
    • 90% of today's data has been created in the last two years
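
The unit arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` list and the `convert` helper are illustrative names, not part of the lesson.

```python
# Decimal (SI) byte units: each step up the list is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between SI byte units, e.g. ZB -> GB."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps


# Gigabytes in one zettabyte, and in the projected 175 ZB.
gb_per_zb = convert(1, "ZB", "GB")
projected_gb = convert(175, "ZB", "GB")

print(f"1 ZB   = {gb_per_zb:.0e} GB")
print(f"175 ZB = {projected_gb:.2e} GB")
```

Under this convention one zettabyte is four steps above a gigabyte, so 1 ZB works out to 1000^4 = 10^12 GB.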

    The 5 Vs of Big Data

    • Velocity: The speed at which data arrives and is processed (batch, near real-time, real-time, and streaming)
    • Variety: The types of data (structured, unstructured, and semi-structured)
    • Volume: The amount of data generated (terabytes, records, transactions)
    • Veracity: The trustworthiness of data (authenticity, origin, reputation, and accountability)
    • Value: What can be drawn from the data (statistics, events, correlations, and potential insights)

    Sources of Data

    • Facebook
    • Twitter: about 500,000 tweets per minute
    • Instagram: about 347,000 posts per minute
    • Internet of Things (IoT) devices: 75 million connected devices, including sensors, generating data

    Storage of Generated Data

    • Less than 20% of global data is stored in relational databases, which remain important for bank, hospital, and customer records
    • 80% of global data is unstructured (text, images, video) and is stored in big data architectures such as cloud storage and NoSQL databases

    Big Data Storage (HDFS)

    • Hadoop Distributed File System (HDFS): A storage system designed for large volumes of data across multiple servers
    • Data is divided into fixed-size blocks (typically 128 MB or 256 MB) that are spread across multiple nodes (servers)
    • Each block is stored as several copies on different nodes, so the data survives the failure of a node
    • Suitable for unstructured or semi-structured data
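
The block-and-copy idea above can be sketched in a few lines. This is a toy model, not HDFS code: the replication factor of 3 is an assumption (the notes do not give a number), the round-robin placement is a simplification of what a real cluster does, and the node names and function names are hypothetical.

```python
BLOCK_SIZE_MB = 128  # block size from the notes (256 MB is also common)
REPLICATION = 3      # assumed number of copies per block; not stated in the notes


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block; only the last one may be smaller."""
    full, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: copies of a block go to distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one copy after a single node fails."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


blocks = split_into_blocks(1000)  # a hypothetical 1000 MB file
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks with sizes", blocks)
print("blocks still readable without node-b:", surviving_blocks(placement, "node-b"))
```

Because every block has copies on more than one node, losing any single node in this model leaves all blocks readable.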

    Data Lakes

    • Centralized repositories storing various data formats (structured, semi-structured, and unstructured) as raw data, with no transformation
    • Ideal for long-term analysis when the analysis type isn't known beforehand
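
A minimal sketch of the idea, assuming a toy in-memory "lake": data is stored exactly as it arrives, and parsing happens only when a specific analysis needs it. The `lake` list, the `ingest` and `read_amounts` functions, and the sample records are all hypothetical.

```python
import csv
import io
import json

# A toy in-memory "lake": each entry keeps the payload exactly as it arrived.
lake = []


def ingest(source, fmt, payload):
    """Store the raw payload untouched, with minimal metadata."""
    lake.append({"source": source, "format": fmt, "raw": payload})


def read_amounts():
    """One possible later analysis: parse only what this question needs."""
    amounts = []
    for item in lake:
        if item["format"] == "json":
            amounts.append(float(json.loads(item["raw"])["amount"]))
        elif item["format"] == "csv":
            for row in csv.DictReader(io.StringIO(item["raw"])):
                amounts.append(float(row["amount"]))
        # Free text is skipped here but stays in the lake for other analyses.
    return amounts


ingest("web-shop", "json", '{"order": 1, "amount": 19.5}')          # semi-structured
ingest("pos-export", "csv", "order,amount\n2,5.0\n3,10.5\n")         # structured
ingest("support", "text", "Customer asked about a late delivery.")   # unstructured

print(read_amounts())
```

Nothing is transformed at ingestion time, so a different analysis (for example, one over the support text) could be added later without re-collecting the data.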
