Questions and Answers
What is the estimated global volume of data generated by 2025?
- 250 zettabytes
- 200 zettabytes
- 175 zettabytes (correct)
- 150 zettabytes
Which of the following best describes the data type that accounts for 80% of global data?
- Unstructured data (correct)
- Hierarchical data
- Transactional data
- Structured data
How is data typically divided and distributed in Hadoop Distributed File System (HDFS)?
- Into small blocks of 128 MB or 256 MB (correct)
- In whole files of 1 GB
- In random unspecified sizes
- Into chunks of 64 MB
Which of the following is NOT one of the 5 Vs of Big Data?
What percentage of global data is stored in relational databases?
What defines a data lake in the context of data storage?
Which source generates approximately 500,000 units of data per minute?
What is one of the main characteristics of Veracity in the 5 Vs of Big Data?
Flashcards
Global Data Volume
The sheer volume of data generated worldwide, measured in zettabytes (ZB). A zettabyte is one trillion gigabytes (10^21 bytes). In 2025, it's estimated that the world will generate 175 ZB of data.
Velocity
The speed at which data is generated and processed. Big Data processing falls into four categories: batch, near real-time, real-time, and streaming, each with a different processing speed.
Variety
The variety of data types, including structured, unstructured, and semi-structured data. Structured data fits into rows and columns (like a database), unstructured data is in formats like images or text, and semi-structured data has some organization but isn't fully tabular.
Volume
Veracity
Value
Data Lake
HDFS (Hadoop Distributed File System)
Study Notes
Big Data
- Data is crucial for decision-making in all business aspects.
- In 2025, the world will produce an estimated 175 zettabytes of data (1 ZB = 1 trillion gigabytes).
- In 2010, data production was significantly lower.
- 2.5 million GB of data is created daily by internet users.
- About 90% of the world's data was created in recent years.
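The zettabyte figure above is easier to check with the arithmetic written out. The sketch below assumes decimal (SI) prefixes, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the function name is illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes (SI prefixes)."""
    return zb * (ZETTABYTE // GIGABYTE)


gb_per_zb = zettabytes_to_gigabytes(1)    # 10**12, i.e. one trillion GB
gb_in_2025 = zettabytes_to_gigabytes(175)  # the 175 ZB estimate, in GB

print(f"1 ZB   = {gb_per_zb:,} GB")
print(f"175 ZB = {gb_in_2025:,} GB")
```

With binary prefixes (1 GiB = 2^30 bytes) the numbers shift slightly, but the order of magnitude is the same.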
The 5 Vs of Big Data
- Velocity: Batch, near real-time, real-time, and streaming data
- Variety: Structured, unstructured, and semi-structured data, including different formats (e.g. tables, files, transactions)
- Volume: Gigabytes, terabytes, and petabytes of data
- Veracity: Trustworthiness, authenticity, origin, reputation, and accountability of the data
- Value: Statistical analysis, event identification, correlations, and potential insights from data
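The three kinds of data named under Variety can be illustrated with a small sketch. The sample records below are invented for illustration and do not come from any real dataset; only Python's standard library is used.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a table).
structured_csv = "customer_id,amount\n1,25.50\n2,13.00\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields may differ per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "note": "no tags"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Customer called to say the delivery arrived late."

# Structured data can be aggregated directly by column name.
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured records must be inspected to learn which fields exist.
keys_per_record = [sorted(record.keys()) for record in semi_structured]

# Unstructured text needs processing (here, naive whitespace tokenizing)
# before anything can be measured.
word_count = len(unstructured.split())

print(total_amount, keys_per_record, word_count)
```

The point of the sketch is the amount of work needed before analysis: none for the table, field discovery for the JSON records, and text processing for the free-form sentence.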
Sources of Data
- Facebook: 500,000 posts per minute
- Twitter: 500,000 tweets per minute
- Instagram: 347,222 posts per minute
- Internet of Things (IoT): 75 million connected devices generating data and sensor readings
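To get a feel for these rates, the per-minute figures can be scaled to a full day. This is a back-of-the-envelope sketch that takes the numbers in the list above at face value and assumes the rate stays constant over 24 hours.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Per-minute figures as quoted in the notes above.
items_per_minute = {
    "Facebook": 500_000,
    "Twitter": 500_000,
    "Instagram": 347_222,
}

# Scale each constant per-minute rate to a daily total.
items_per_day = {
    source: rate * MINUTES_PER_DAY for source, rate in items_per_minute.items()
}

for source, total in items_per_day.items():
    print(f"{source}: {total:,} items per day")
```

Under that assumption, 500,000 items per minute is 720 million per day, and the Instagram figure works out to just under 500 million per day.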
Storage of Data
- Less than 20% of global data is stored in relational databases; this structured data is critical for organizations such as banks and hospitals and for customer records.
- About 80% of data is unstructured (text, images, videos) and is stored in big data architectures and NoSQL databases.
Big Data Storage Technologies (Hadoop Distributed File System - HDFS)
- Divides data into small blocks (128 MB or 256 MB) and distributes them across multiple servers.
- Provides redundancy (copies of data) to ensure data safety.
- Ideal for storing large amounts of unstructured or semi-structured data.
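The block-splitting and redundancy described above can be sketched as simple arithmetic. This is not HDFS code and calls no Hadoop API; `plan_blocks` is a hypothetical helper, and the replication factor of 3 is an assumption (it is the usual HDFS default, but the notes do not state a value).

```python
MB = 1024 * 1024  # bytes per mebibyte


def plan_blocks(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate how a file would be split into fixed-size blocks.

    Returns the block count, the size of the final (possibly partial)
    block, and the total bytes stored once every block is replicated.
    """
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    block_sizes = [block_size_bytes] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        block_sizes.append(remainder)
    return {
        "blocks": len(block_sizes),
        "last_block_bytes": block_sizes[-1] if block_sizes else 0,
        "stored_bytes": file_size_bytes * replication,
    }


# A 300 MB file with 128 MB blocks: two full blocks plus one 44 MB block.
plan = plan_blocks(300 * MB)
print(plan)
```

Each block, including its copies, can live on a different server, which is what lets the file survive the loss of a single machine.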
Data Lakes
- Centralized repository storing all types of data (structured, semi-structured, and unstructured) in raw format.
- Used when needing long-term storage and analysis of diverse, raw data.
- Ideal when the specific analysis type is unknown.
NoSQL Databases
- Designed for flexibility, high speed, and handling unstructured data.
Description
Explore the transformative nature of big data and its crucial role in decision-making across various business functions. Learn about the 5 Vs of Big Data: Velocity, Variety, Volume, Veracity, and Value, and understand the exponential growth of data generated from different sources. Test your knowledge on how big data influences modern analytics.