Big Data Overview

Questions and Answers

What is the estimated volume of data the world will generate by 2025?

  • 175 exabytes
  • 175 petabytes
  • 175 terabytes
  • 175 zettabytes (correct)

Which of the following is NOT one of the 5 Vs of big data?

  • Velocity
  • Validity (correct)
  • Value
  • Volume

What percentage of global data is typically structured and stored in relational databases?

  • Around 40%
  • More than 80%
  • Less than 10%
  • Less than 20% (correct)

What type of storage is HDFS known for?

  • Handling large volumes of unstructured data

Which statement regarding data lakes is accurate?

  • They keep raw data as it is generated.

Which of the following is a primary source of big data?

  • Internet of Things (IoT) devices

How many tweets are generated per minute on Twitter?

  • 500,000

What is the primary purpose of big data architectures?

  • To process and analyze large volumes of data

    Study Notes

    Big Data

    • Data is crucial for decision-making in all business aspects.
    • Global data volume is estimated to reach 175 zettabytes (ZB) by 2025; 1 ZB = 1 trillion gigabytes (10^12 GB).
    • In 2010, global data volume was far smaller (about 2 ZB).
    • Daily internet data generation is approximately 2.5 quintillion bytes (about 2.5 billion GB).
    • 90% of current data was generated in the last two years.
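
The unit arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes; the `UNITS` table and the `convert` helper are illustrative names, not part of the lesson.

```python
# Decimal (SI) storage units expressed in bytes (assumption: SI, not binary, prefixes).
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(amount, from_unit, to_unit):
    """Convert a quantity between decimal storage units (hypothetical helper)."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


# One zettabyte expressed in gigabytes: 10^21 / 10^9 = 10^12 (one trillion).
gb_per_zb = convert(1, "ZB", "GB")

# The 2025 estimate from the notes, expressed in gigabytes.
gb_2025 = convert(175, "ZB", "GB")

# Growth factor between the 2010 figure (2 ZB) and the 2025 estimate (175 ZB).
growth_factor = 175 / 2

print(f"1 ZB = {gb_per_zb:.0e} GB")
print(f"175 ZB = {gb_2025:.3e} GB")
print(f"Growth 2010 to 2025: {growth_factor}x")
```

Under these assumptions, the 2025 estimate is 87.5 times the 2010 figure.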

    The 5 Vs of Big Data

    • Velocity: Batch, near real-time, real-time, streams.
    • Variety: Structured, unstructured, semi-structured data.
    • Volume: Terabytes, records, transactions, tables.
    • Veracity: Trustworthiness, authenticity, origin.
    • Value: Statistical, events, correlations, hypothetical.

    Data Sources

    • Facebook: 500,000 posts per minute.
    • Twitter: 500,000 tweets per minute.
    • Instagram: 347,222 posts per minute.
    • Internet of Things (IoT): a projected 75 billion connected devices (sensors) generating data.

    Data Storage

    • Less than 20% of global data is structured and stored in relational databases (e.g., bank, hospital, and customer records).
    • 80% of data is unstructured (text, images, video).
    • This data is stored in big data architectures, cloud platforms and NoSQL databases.

    Big Data Storage Technologies

    • Hadoop Distributed File System (HDFS): Splits files into fixed-size blocks (e.g., 128 MB or 256 MB) and replicates them across multiple servers for redundancy. Ideal for large unstructured or semi-structured data.

    • Data Lakes: Centralized repositories for all data types (structured, semi-structured, unstructured) stored as raw data for long-term analysis.

    • NoSQL Databases: Flexible, fast storage for unstructured data like logs, social media, IoT.

    • Relational Databases (SQL): High consistency, best for well-structured data, transactions.
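
The HDFS block idea above can be sketched in a few lines. This is a simplified illustration, not how HDFS is implemented: the replication factor of 3 is assumed (it is a common HDFS default), the round-robin placement stands in for the real rack-aware policy, and the function and node names are hypothetical.

```python
BLOCK_SIZE_MB = 128  # block size taken from the notes above
REPLICATION = 3      # assumed replication factor (a common HDFS default)


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified stand-in for illustration; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# A hypothetical 1000 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(1000)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])

print(f"{len(blocks)} blocks, last block is {blocks[-1]} MB")
for block, replicas in placement.items():
    print(f"block {block}: {replicas}")
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves two copies of each block, which is the redundancy the notes refer to.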

    Description

    Explore the fundamental concepts of big data, including its significance in decision-making and the astounding volume of data generated today. Learn about the 5 Vs of big data—velocity, variety, volume, veracity, and value—and discover the various data sources driving this phenomenon.