Big Data Overview

Questions and Answers

What is the estimated global volume of data generated by 2025?

  • 175 zettabytes (correct)
  • 175 terabytes
  • 175 petabytes
  • 175 gigabytes

Which of the following statements about the 5 Vs of Big Data is incorrect?

  • Variety includes only structured data. (correct)
  • Volume relates to the amount of generated data.
  • Velocity refers to the speed of data processing.
  • Veracity addresses the trustworthiness of data.

Which platform generates approximately 500,000 tweets per minute?

  • Twitter (correct)
  • Facebook
  • LinkedIn
  • Instagram

What percentage of global data is typically stored in relational databases?

  • 10% (correct)

What is the primary characteristic of data lakes?

  • They store raw data without transformation. (correct)

What technology is specifically designed to store large volumes of data across multiple servers?

  • Hadoop Distributed File System (HDFS) (correct)

What main challenge arises from the fact that 80% of global data is unstructured?

  • It requires advanced analytics techniques. (correct)

Which of the following describes veracity in the context of Big Data?

  • The authenticity and trustworthiness of data. (correct)

Flashcards

Global Data Volume

The total amount of data generated worldwide, projected to reach 175 zettabytes in 2025, reflecting the exponential growth of data production.

5Vs of Big Data

A set of characteristics used to describe the nature of Big Data, encompassing volume, velocity, variety, veracity, and value.

Data Lake

A centralized repository that stores data in its raw format, without any transformation, allowing flexible analysis and future insights.

HDFS (Hadoop Distributed File System)

A distributed file system designed to handle large volumes of data across multiple servers, ideal for storing unstructured or semi-structured data.

NoSQL

Databases that don't adhere to traditional relational database structures, offering flexibility and scalability for diverse data types.

Velocity (Big Data)

The speed at which data is generated and processed, ranging from batch processing to real-time analysis.

Variety (Big Data)

The range of data formats encountered in Big Data, including structured, unstructured, and semi-structured data.

Veracity (Big Data)

The trustworthiness and reliability of Big Data, considering factors like authenticity, origin, and accountability.

Study Notes

Big Data

  • Data is essential for decision-making in all aspects of business
  • Kathleen Hogan, Microsoft Chief People Officer, highlights its importance

Global Volume of Data

  • In 2025, the world is projected to generate 175 zettabytes (ZB) of data (1 ZB = 10^21 bytes, or 1 trillion gigabytes)
  • In 2010, the data volume was significantly lower
  • Internet users generate about 2.5 quintillion bytes (roughly 2.5 billion gigabytes) of data every day
  • 90% of today's data has been created in the last two years
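The unit conversion above can be checked with a few lines of Python. This is a minimal sketch assuming decimal (SI) prefixes, where each step from gigabyte to terabyte, petabyte, exabyte and zettabyte is a factor of 1,000; the function name is illustrative only.

```python
# Decimal (SI) units: GB -> TB -> PB -> EB -> ZB, each step a factor of 1,000.
BYTES_PER_GB = 10**9   # 1 gigabyte
BYTES_PER_ZB = 10**21  # 1 zettabyte


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert whole zettabytes to gigabytes using decimal (SI) prefixes."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


print(f"1 ZB = {zettabytes_to_gigabytes(1):,} GB")      # 1 ZB = 1,000,000,000,000 GB
print(f"175 ZB = {zettabytes_to_gigabytes(175):,} GB")  # 175 ZB = 175,000,000,000,000 GB
```

With binary prefixes (1 ZiB = 2^70 bytes) the results would differ slightly, so it matters which convention a statistic uses.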

The 5 Vs of Big Data

  • Velocity: Batch, near real-time, real-time, and streaming data
  • Variety: Structured, unstructured, and semi-structured data
  • Volume: Terabytes, records, transactions, and substantial amounts of data
  • Veracity: Trustworthiness, authenticity, origin, reputation, and accountability
  • Value: Statistical data, events, correlations, and potential insights
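To make the Velocity distinction concrete, here is a minimal Python sketch contrasting batch processing (compute once over a complete dataset) with streaming processing (update the result as each record arrives). The sensor readings and function names are hypothetical.

```python
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch: wait until the full dataset is available, then compute once."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: emit an updated result after every new reading."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental (running) mean update
        yield mean


sensor_readings = [20.0, 22.0, 24.0, 26.0]
print(batch_average(sensor_readings))            # 23.0
print(list(streaming_average(sensor_readings)))  # [20.0, 21.0, 22.0, 23.0]
```

Both approaches reach the same final value; the streaming version simply makes an up-to-date answer available after every reading instead of only at the end.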

Sources of Data

  • Main sources include social media such as Facebook, Twitter (about 500,000 tweets per minute), and Instagram (about 347,000 posts per minute)
  • Internet of Things (IoT) devices, including sensors, are another major source, with an estimated 75 billion connected devices generating data
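The per-minute rates above translate into daily volumes with simple arithmetic. A rough sketch, assuming the rates stay constant over a full day (real traffic varies by hour); the function name is illustrative.

```python
MINUTES_PER_DAY = 60 * 24  # 1,440


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute rate to a daily total, assuming a constant rate."""
    return rate_per_minute * MINUTES_PER_DAY


rates_per_minute = {"Twitter tweets": 500_000, "Instagram posts": 347_000}
for source, rate in rates_per_minute.items():
    print(f"{source}: {per_day(rate):,} per day")
# Twitter tweets: 720,000,000 per day
# Instagram posts: 499,680,000 per day
```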

Storage of Generated Data

  • Less than 20% of global data is stored in relational databases, which remain essential for structured records such as banking, hospital, and customer data
  • 80% of global data is unstructured (text, images, video) and stored in big data architectures (cloud and NoSQL databases)
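The difference between the two storage models can be sketched in Python. This illustration assumes the standard-library sqlite3 module as a stand-in for a relational database and a plain list of JSON strings as a stand-in for a document-oriented NoSQL store; the table, fields and values are hypothetical.

```python
import json
import sqlite3

# Structured data: a relational table with a fixed schema (every row has the same columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL)")
conn.execute("INSERT INTO customers (name, balance) VALUES (?, ?)", ("Alice", 120.5))
rows = conn.execute("SELECT id, name, balance FROM customers").fetchall()
print(rows)  # [(1, 'Alice', 120.5)]
conn.close()

# Semi-structured data: JSON documents whose fields can differ from record to record,
# as in a document-oriented NoSQL store (simulated here with a plain Python list).
documents = [
    json.dumps({"user": "alice", "text": "Hello!", "tags": ["greeting"]}),
    json.dumps({"user": "bob", "image_url": "https://example.com/cat.jpg"}),
]
for doc in documents:
    print(sorted(json.loads(doc)))  # field names present in this document
# ['tags', 'text', 'user']
# ['image_url', 'user']
```

The relational table rejects rows that do not fit its columns, while the document list accepts records of any shape; that flexibility is what makes NoSQL-style storage attractive for varied, unstructured sources.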

Big Data Storage (HDFS)

  • Hadoop Distributed File System (HDFS): A storage system designed for large volumes of data across multiple servers
  • Data is divided into fixed-size blocks (128 MB by default, often configured as 256 MB) and distributed across multiple nodes (servers)
  • Each block is replicated on several nodes (three copies by default), so data survives the failure of a node
  • Suitable for unstructured or semi-structured data
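The block-splitting and replication ideas can be simulated in a few lines. This is a simplified sketch, not HDFS code: it assumes the 128 MB default block size and a replication factor of 3 (the HDFS default), and places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. Node names and helper functions are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128      # HDFS default block size (often configured to 256 MB)
REPLICATION_FACTOR = 3   # HDFS default number of copies per block


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    remainder = file_size_mb % block_size_mb
    if n_blocks and remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks: int, nodes: list[str],
                   replication: int = REPLICATION_FACTOR) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(replication)]
        for block_id in range(n_blocks)
    }


blocks = split_into_blocks(300)  # a hypothetical 300 MB file
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes)
print(blocks)     # [128, 128, 44]
print(placement)

# Simulate a node failure: every block must still have at least one live copy.
failed = "node2"
surviving = {b: [n for n in reps if n != failed] for b, reps in placement.items()}
print(all(surviving.values()))  # True
```

Because each block lives on three different nodes, losing any single node leaves at least two copies of every block available.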

Data Lakes

  • Centralized repositories storing various data formats (structured, semi-structured, and unstructured) as raw data, with no transformation
  • Ideal for long-term analysis when the analysis type isn't known beforehand
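Storing raw data and deciding how to interpret it later is often called schema-on-read. A minimal Python sketch of the idea, assuming an in-memory list stands in for the lake's storage and using hypothetical sensor records and function names:

```python
import json

# Stand-in for the data lake: raw payloads are appended exactly as received.
raw_zone: list[str] = []


def ingest(raw_record: str) -> None:
    """Store the raw payload unchanged (no parsing or validation on write)."""
    raw_zone.append(raw_record)


def read_temperatures() -> list[float]:
    """Schema-on-read: interpret raw records only when this analysis needs them."""
    temps = []
    for line in raw_zone:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip records this particular analysis cannot interpret
        if isinstance(record, dict) and "temp_c" in record:
            temps.append(float(record["temp_c"]))
    return temps


ingest('{"sensor": "s1", "temp_c": 21.5}')
ingest('{"sensor": "s2", "humidity": 40}')
ingest("plain-text log line: door opened")
print(read_temperatures())  # [21.5]
```

All three records are kept in their original form, so a later analysis (for example, of humidity or door events) can still use them even though this one ignores them.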
