Questions and Answers
What is the estimated volume of data the world will generate by 2025?
Which of the following is NOT one of the 5 Vs of big data?
What percentage of global data is typically structured and stored in relational databases?
What type of storage is HDFS known for?
Which statement regarding data lakes is accurate?
Which of the following is a primary source of big data?
How many tweets are generated per minute on Twitter?
What is the primary purpose of big data architectures?
Study Notes
Big Data
- Data is crucial for decision-making in all business aspects.
- Global data volume is estimated to reach 175 zettabytes (ZB) by 2025, where 1 ZB = 1 trillion gigabytes (10^21 bytes).
- In 2010, global data was far smaller: about 2 ZB.
- Roughly 2.5 quintillion bytes (2.5 exabytes, or about 2.5 billion GB) of data are generated on the internet every day.
- 90% of current data was generated in the last two years.
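The unit arithmetic behind these figures can be checked with a minimal Python sketch. It assumes decimal (SI) prefixes (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes) and treats the 2010 and 2025 totals as the endpoints of a constant compound growth rate, which is a simplification; the function names are illustrative.

```python
# Assumption: decimal (SI) prefixes, so 1 GB = 10**9 bytes and 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zb_to_gb(zettabytes: float) -> float:
    """Convert zettabytes to gigabytes using decimal units."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_GB


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart.

    Assumes growth was constant from year to year (a simplification)."""
    return (end / start) ** (1 / years) - 1


gb_per_zb = zb_to_gb(1)                      # one trillion GB
total_2025_gb = zb_to_gb(175)                # the 2025 estimate in GB
growth_factor = 175 / 2                      # 2010 (2 ZB) -> 2025 (175 ZB)
annual_growth = cagr(2, 175, 2025 - 2010)    # average yearly growth rate

print(f"1 ZB = {gb_per_zb:.0e} GB")
print(f"175 ZB = {total_2025_gb:.2e} GB")
print(f"Growth 2010 -> 2025: {growth_factor}x, about {annual_growth:.1%} per year")
```

Under these assumptions, an 87.5-fold increase over 15 years averages out to roughly 35% growth per year; actual year-to-year growth was not uniform.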
The 5 Vs of Big Data
- Velocity: Batch, near real-time, real-time, streams.
- Variety: Structured, unstructured, semi-structured data.
- Volume: Terabytes, records, transactions, tables.
- Veracity: Trustworthiness, authenticity, origin.
- Value: Statistical, events, correlations, hypothetical.
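The velocity dimension (batch vs. streams) can be illustrated with a minimal Python sketch, assuming a toy list of hypothetical sensor readings. A batch job waits for the complete dataset and computes an answer once; a streaming job updates its answer as each record arrives and keeps only a small running state. The function names are illustrative and not taken from any particular framework.

```python
from typing import Iterable, Iterator


def batch_average(values: list[float]) -> float:
    """Batch processing: wait until all data is available, then compute once."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Stream processing: emit an updated average after every new value.

    Only a running count and mean are kept, not the full history."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental (running) mean update
        yield mean


readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sensor readings

print(batch_average(readings))             # a single answer at the end
print(list(streaming_average(readings)))   # an up-to-date answer per reading
```

Both approaches reach the same final value; the streaming version simply makes intermediate results available immediately, which is what near-real-time and real-time workloads need.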
Data Sources
- Facebook: roughly 500,000 posts and comments per minute.
- Twitter: 500,000 tweets per minute.
- Instagram: 347,222 posts per minute.
- Internet of Things (IoT): sensors in an estimated 75 billion connected devices (projected for 2025) generate data.
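Per-minute rates are easier to compare with daily or yearly totals once they are scaled. A minimal Python sketch, assuming each rate stays constant around the clock (real traffic varies by time of day):

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Per-minute rates as listed in the notes above.
per_minute = {
    "Twitter (tweets)": 500_000,
    "Instagram (posts)": 347_222,
}


def per_day(rate_per_minute: int) -> int:
    """Scale a per-minute rate to a daily total, assuming a constant rate."""
    return rate_per_minute * MINUTES_PER_DAY


for source, rate in per_minute.items():
    print(f"{source}: {per_day(rate):,} per day")
```

Note that 347,222 per minute is almost exactly 500 million per day (500,000,000 / 1,440 ≈ 347,222), so per-minute and per-day statistics are easy to mix up when they are copied between sources.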
Data Storage
- Less than 20% of global data is structured and stored in relational databases (e.g., banking, hospital, and customer records).
- About 80% of data is unstructured (text, images, video).
- Unstructured data is stored in big data architectures, cloud platforms, and NoSQL databases.
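Why structured data fits relational databases while unstructured data needs other tooling can be shown with a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in relational database; the customer table and free-text notes are hypothetical examples.

```python
import re
import sqlite3

# Structured data: a fixed schema, so a question becomes a declarative query.
conn = sqlite3.connect(":memory:")
with conn:  # the connection's context manager commits the inserts as one transaction
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Ana", "Paris"), ("Ben", "Lyon"), ("Chloe", "Paris")],
    )
(paris_sql,) = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE city = ?", ("Paris",)
).fetchone()
conn.close()

# Unstructured data: the same facts buried in free text. With no schema,
# we must guess a pattern (here a simple word match) to extract them.
notes = [
    "Called Ana today, she lives in Paris.",
    "Ben moved to Lyon last year.",
    "Chloe (Paris office) asked about her invoice.",
]
paris_text = sum(1 for note in notes if re.search(r"\bParis\b", note))

print(paris_sql, paris_text)
```

Both counts agree here only because the notes happen to fit the pattern; a note such as "Ben visited Paris once" would be miscounted by the text heuristic, while the SQL query stays exact because the city is stored in its own column.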
Big Data Storage Technologies
- Hadoop Distributed File System (HDFS): Splits files into fixed-size blocks (e.g., 128 MB, the default, or 256 MB) and distributes them across multiple servers, replicating each block (three copies by default) for redundancy. Ideal for large unstructured and semi-structured datasets.
- Data Lakes: Centralized repositories that store all data types (structured, semi-structured, unstructured) in raw form for long-term analysis.
- NoSQL Databases: Flexible, fast storage for unstructured and semi-structured data such as logs, social media content, and IoT readings.
- Relational Databases (SQL): High consistency; best for well-structured data and transactions.
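How HDFS splits a file into blocks and spreads replicas across servers can be sketched in a few lines of Python. The block size and replication factor match HDFS defaults (dfs.blocksize = 128 MB, dfs.replication = 3), but the placement logic is a deliberate simplification: it assigns replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The node names and file size are hypothetical.

```python
import math

MB = 1024 * 1024       # HDFS "128 MB" means 128 * 2**20 bytes
BLOCK_SIZE = 128 * MB  # HDFS default block size (Hadoop 2.x and later)
REPLICATION = 3        # HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    n_blocks = math.ceil(file_size / block_size)
    sizes = [block_size] * n_blocks
    if n_blocks and file_size % block_size:
        sizes[-1] = file_size % block_size
    return sizes


def place_replicas(
    n_blocks: int, nodes: list[str], replication: int = REPLICATION
) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification for illustration: real HDFS placement is rack-aware."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(n_blocks)
    }


file_size = 300 * MB                   # hypothetical 300 MB file
blocks = split_into_blocks(file_size)  # two full 128 MB blocks plus a 44 MB tail
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

for block, replicas in layout.items():
    print(f"block {block} ({blocks[block] // MB} MB) -> {replicas}")
```

Because every block lives on three different nodes, losing any single server still leaves at least two copies of each block, which is the redundancy the HDFS bullet refers to.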
Description
Explore the fundamental concepts of big data, including its role in decision-making and the enormous volume of data generated today. Learn the 5 Vs of big data (velocity, variety, volume, veracity, and value) and the main data sources driving this growth.