Questions and Answers
What is the estimated global volume of data generated by 2025?
Which of the following best describes the data type that accounts for 80% of global data?
How is data typically divided and distributed in Hadoop Distributed File System (HDFS)?
Which of the following is NOT one of the 5 Vs of Big Data?
What percentage of global data is stored in relational databases?
What defines a data lake in the context of data storage?
Which source generates approximately 500,000 units of data per minute?
What is one of the main characteristics of Veracity in the 5 Vs of Big Data?
Study Notes
Big Data
- Data is crucial for decision-making in all business aspects.
- By 2025, the world is estimated to produce 175 zettabytes of data (1 ZB = 1 trillion gigabytes).
- In 2010, data production was significantly lower.
- 2.5 million GB of data is created daily by internet users.
- About 90% of all global data has been created in recent years.
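The unit relationship behind these figures can be checked with a little arithmetic. The sketch below assumes decimal (SI) prefixes, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the function name `zettabytes_to_gigabytes` is made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
ZB = 10**21


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert zettabytes to gigabytes using SI (decimal) prefixes."""
    return zb * ZB / GB


# Number of gigabytes in one zettabyte.
gb_per_zb = ZB // GB
print(gb_per_zb)  # 1000000000000 (one trillion)

# The 175 ZB estimate for 2025, expressed in gigabytes.
print(zettabytes_to_gigabytes(175))  # 175000000000000.0
```

With binary prefixes (ZiB and GiB) the ratio would be 2^40 instead of 10^12, so the exact number depends on which convention a source uses.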
The 5 Vs of Big Data
- Velocity: Batch, near real-time, real-time, and streaming data
- Variety: Structured, unstructured, and semi-structured data, including different formats (e.g. tables, files, transactions)
- Volume: Gigabytes, terabytes, and petabytes of data
- Veracity: Trustworthiness, authenticity, origin, reputation, accountability in the data
- Value: Statistical analysis, event identification, correlations, and potential insights from data
Sources of Data
- Facebook: 500,000 posts per minute
- Twitter: 500,000 tweets per minute
- Instagram: 347,222 posts per minute
- Internet of Things (IoT): 75 million connected devices generating data and sensor readings
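To get a feel for these per-minute rates, it can help to scale them to a full day. The snippet below is plain arithmetic on the figures listed above; the helper name `per_day` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a day


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a per-day count."""
    return per_minute * MINUTES_PER_DAY


print(per_day(347_222))  # 499999680, roughly 500 million per day
print(per_day(500_000))  # 720000000, i.e. 720 million per day
```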
Storage of Data
- Less than 20% of global data is stored in relational databases, which remain important for businesses such as banks and hospitals and for customer data.
- 80% of data is unstructured (text, images, videos) and is stored in big data architectures and NoSQL databases.
Big Data Storage Technologies (Hadoop Distributed File System - HDFS)
- Divides data into fixed-size blocks (typically 128 MB or 256 MB) and distributes them across multiple servers.
- Provides redundancy (copies of data) to ensure data safety.
- Ideal for storing large amounts of unstructured or semi-structured data.
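The splitting and copying described above can be illustrated with a toy model. This is only a sketch, not HDFS code: the function names, the server names, the replication factor of 3, and the simple round-robin placement are all assumptions made for the example, and a real cluster chooses block locations with its own placement policy.

```python
BLOCK_SIZE_MB = 128  # block size from the notes; 256 MB is the other value mentioned
REPLICATION = 3      # assumed number of copies per block for this sketch


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct servers, round-robin."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    return {
        block: [servers[(block + i) % len(servers)] for i in range(replication)]
        for block in range(num_blocks)
    }


# A hypothetical 1000 MB file on a hypothetical four-server cluster.
blocks = split_into_blocks(1000)
placement = place_replicas(len(blocks), ["s1", "s2", "s3", "s4"])

print(len(blocks), blocks[-1])  # 8 104
print(placement[0])             # ['s1', 's2', 's3']
```

Because every block lives on several servers, losing one server in this model still leaves at least two copies of each block available.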
Data Lakes
- Centralized repository storing all types of data (structured, semi-structured, and unstructured) in raw format.
- Used when needing long-term storage and analysis of diverse, raw data.
- Ideal when the specific analysis type is unknown.
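A small sketch can make the "raw format" idea concrete. Here an in-memory dictionary stands in for the lake, data is stored exactly as received, and a format is applied only when an analysis reads it back. All names (`lake`, `ingest`, `read_json`, `read_csv`) and the sample paths and records are hypothetical, chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for a data lake: path -> raw bytes.
lake: dict[str, bytes] = {}


def ingest(path: str, raw: bytes) -> None:
    """Store data exactly as received, with no parsing and no schema."""
    lake[path] = raw


def read_json(path: str) -> dict:
    """Interpret a stored object as JSON at read time."""
    return json.loads(lake[path].decode("utf-8"))


def read_csv(path: str) -> list[dict]:
    """Interpret a stored object as CSV rows at read time."""
    text = lake[path].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Structured, semi-structured, and unstructured data all land side by side.
ingest("sales/2024.csv", b"id,amount\n1,9.50\n2,20.00\n")
ingest("events/click.json", b'{"user": 1, "action": "click"}')
ingest("notes/call.txt", b"Customer asked about delivery times.")

# The analysis is decided later, when the data is read.
total = sum(float(row["amount"]) for row in read_csv("sales/2024.csv"))
print(total)                                      # 29.5
print(read_json("events/click.json")["action"])   # click
```

Nothing about the analysis had to be known at ingest time, which is the situation the notes describe as a good fit for a data lake.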
NoSQL Databases
- Designed for flexibility, high speed and handling unstructured data.
Description
Explore the transformative nature of big data and its crucial role in decision-making across various business functions. Learn about the 5 Vs of Big Data: Velocity, Variety, Volume, Veracity, and Value, and understand the exponential growth of data generated from different sources. Test your knowledge on how big data influences modern analytics.