Big Data Overview and 5 Vs
8 Questions

Questions and Answers

What is the projected global data generation volume by 2025?

  • 100 zettabytes
  • 500 zettabytes
  • 175 zettabytes (correct)
  • 250 zettabytes

Which of the following best describes the concept of Veracity in Big Data?

  • The reliability and authenticity of data (correct)
  • The different forms of data present
  • The speed at which data is generated
  • The overall volume of data managed

Which percentage of global data is typically stored in Relational Databases?

  • Over 75%
  • About 20% (correct)
  • Approximately 50%
  • Less than 5%

What is the primary purpose of HDFS (Hadoop Distributed File System)?

  • To provide high redundancy and handle large volumes of data

What type of data does a Data Lake primarily store?

  • Raw data of all types without transformation

Which of the following is a characteristic of the data generated by IoT devices?

  • Large volumes of unstructured data

What does the term Variety in Big Data refer to?

  • The various formats of data captured

Which of the following platforms is NOT typically associated with Big Data generation?

  • Microsoft Word

    Study Notes

    Big Data

    • Data is crucial for decision-making in all aspects of business
    • By 2025, the world is estimated to generate 175 zettabytes (ZB) of data (1 ZB = 1 trillion gigabytes)
    • In 2010, data generation was significantly lower (just 2 ZB)
    • Daily, internet users generate roughly 2.5 quintillion bytes of data (about 2.5 billion gigabytes)
    • The majority (90%) of current data was generated in the past two years
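
To make the scale concrete, the figures in these notes can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) prefixes, so 1 ZB = 10^21 bytes and 1 GB = 10^9 bytes. The variable names are illustrative, and the yearly growth rate is a simple compound-rate estimate derived from the two data points in the notes (2 ZB in 2010, 175 ZB in 2025), not a figure taken from the source.

```python
# Decimal (SI) unit sizes in bytes; an assumption, since the notes do not
# say whether decimal or binary prefixes are meant.
GB = 10**9
TB = 10**12
ZB = 10**21

# How many gigabytes and terabytes fit in one zettabyte.
gb_per_zb = ZB // GB   # 10**12, i.e. one trillion
tb_per_zb = ZB // TB   # 10**9, i.e. one billion

# Figures quoted in the notes.
zb_2010 = 2
zb_2025 = 175

# Overall growth factor between the two years.
growth_factor = zb_2025 / zb_2010

# Implied compound annual growth rate over the 15 years (an estimate only).
years = 2025 - 2010
annual_growth = growth_factor ** (1 / years) - 1

print(f"1 ZB = {gb_per_zb:,} GB = {tb_per_zb:,} TB")
print(f"Growth 2010 to 2025: {growth_factor:.1f}x")
print(f"Implied yearly growth: {annual_growth:.1%}")
```

The compound rate is only a smoothing of two data points; real year-to-year growth will not have been constant.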

    The 5 Vs of Big Data

    • Velocity: Data arrives at different speeds (batch, near real-time, real-time, streams)
    • Variety: Data comes in various structures (structured, semi-structured, unstructured, or a mix of all three)
    • Volume: Data amounts are massive (terabytes and beyond, spread across records, transactions, tables, and files)
    • Veracity: Data reliability concerns trustworthiness, authenticity, origin, reputation, and accountability
    • Value: Data's worth can be derived from statistical patterns, events, correlations, and hypothetical connections
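
As an illustration of Variety, the sketch below handles one small sample of each structure with Python's standard library. The sample values and field names (`id`, `amount`, `tags`, `reading`, `temp_c`) are invented for the example and do not come from the lesson.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical data).
structured = "id,amount\n1,9.99\n2,25.00\n"

# Semi-structured: self-describing, nested, fields may vary per record.
semi_structured = '{"id": 3, "tags": ["sensor", "iot"], "reading": {"temp_c": 21.5}}'

# Unstructured: free text with no predefined fields.
unstructured = "Customer called to say the delivery arrived late but intact."

# Structured data maps directly onto rows and named columns.
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured data is parsed into nested dictionaries and lists.
doc = json.loads(semi_structured)

# Unstructured data needs its own processing; here, a naive word count.
word_count = len(unstructured.split())

print(rows[1]["amount"], doc["reading"]["temp_c"], word_count)
```

Each shape needs a different access pattern, which is one reason a single relational schema rarely fits all three.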

    Sources of Data

    • Facebook: 500,000 posts per minute
    • Twitter: 347,222 tweets per minute
    • Internet of Things (IoT): 75 million connected devices generating data, including sensors
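
Per-minute rates are easier to compare with other figures once they are scaled to a day. The sketch below does that conversion for the two platform rates quoted above; the dictionary and variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute rates as quoted in the notes.
per_minute = {"Facebook": 500_000, "Twitter": 347_222}

# Scale each rate to a full day.
per_day = {name: rate * MINUTES_PER_DAY for name, rate in per_minute.items()}

for name, total in per_day.items():
    print(f"{name}: {total:,} items per day")
```

At these rates, the Twitter figure works out to just under 500 million items per day.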

    Storage of Generated Data

    • Less than 20% of global data is stored in Relational Databases (important for banks, hospitals, customers)
    • 80% is unstructured (text, images, video), stored in Big Data Architectures, in the cloud, and NoSQL databases

    Big Data Storage (HDFS)

    • Hadoop Distributed File System (HDFS) is designed for handling large data volumes across multiple servers
    • Data is broken into smaller blocks (128 MB or 256 MB) and distributed across different nodes (servers)
    • Data replication provides high redundancy to avoid data loss if a node fails
    • Ideal for storing unstructured or semi-structured large amounts of data
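
The block-and-replica idea can be sketched in plain Python. This is a simplified model, not HDFS code: it assumes a 128 MB block size, a replication factor of 3 (HDFS's usual default; the notes give no number), and a naive round-robin placement, whereas real HDFS uses a rack-aware placement policy. All function and node names are hypothetical.

```python
BLOCK_SIZE_MB = 128   # one of the block sizes mentioned in the notes
REPLICATION = 3       # assumption: the notes do not state a replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Naive round-robin placement: block b goes to `replication` consecutive nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def survives_failure(placement, failed_node):
    """True if every block still has at least one replica on a healthy node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Usage: a 1000 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(1000)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
print(len(blocks), blocks[-1])
print(survives_failure(placement, "n2"))
```

Because every block lives on several distinct nodes, losing any single node leaves at least one copy of each block readable, which is the redundancy property the notes describe.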

    Data Lakes

    • Centralized repository for various data types (structured, semi-structured, unstructured)
    • Data is stored in raw format as it is generated, without transformation
    • Useful for large volumes of data kept for long-term analysis
    • Suitable when the type of analysis is not yet known or the data will be reused several times later on
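
The "store raw, interpret later" idea can be sketched with a local directory standing in for the lake. This is a minimal illustration, not a real data lake API: the `raw/<source>/<name>` layout, the function names, and the `temp_c` field are all hypothetical. Payloads are written byte-for-byte as they arrive, and parsing happens only when a specific analysis needs it.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_root, source, name, payload):
    """Store the payload exactly as received, with no parsing or transformation."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_temperatures(lake_root):
    """Interpret the sensor files only at analysis time."""
    readings = []
    for path in sorted((lake_root / "raw" / "sensors").glob("*.json")):
        record = json.loads(path.read_bytes())
        # Values may arrive as numbers or strings; normalise when reading.
        readings.append(float(record["temp_c"]))
    return readings


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)
    # Different data types land side by side in their original form.
    ingest_raw(lake, "sensors", "r1.json", b'{"device": "a1", "temp_c": 21.5}')
    ingest_raw(lake, "sensors", "r2.json", b'{"device": "a2", "temp_c": "19.0"}')
    log_path = ingest_raw(lake, "logs", "app.log", b"2024-01-01 INFO started\n")
    stored_log = log_path.read_bytes()
    temps = read_temperatures(lake)

print(temps)
```

Keeping the original bytes means a later analysis with different needs can reread the same files without depending on decisions made at ingestion time.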


    Description

    Explore the significance of big data and its impact on decision-making in business. Understand the 5 Vs of big data: Velocity, Variety, Volume, Veracity, and Value. Discover how data sources contribute to the ever-growing data landscape.
