Questions and Answers
What is the estimated global volume of data generated by 2025?
Which of the following statements about the 5 Vs of Big Data is incorrect?
Which platform generates approximately 500,000 tweets per minute?
What percentage of global data is typically stored in relational databases?
What is the primary characteristic of data lakes?
What technology is specifically designed to store large volumes of data across multiple servers?
What main challenge does 80% of global unstructured data present?
Which of the following describes veracity in the context of Big Data?
Study Notes
Big Data
- Data is essential for decision-making in all aspects of business
- Kathleen Hogan, Microsoft Chief People Officer, highlights its importance
Global Volume of Data
- In 2025, the world is projected to generate 175 zettabytes (ZB) of data (1 ZB = 1 trillion gigabytes, or 1 billion terabytes)
- In 2010, the data volume was significantly lower
- Internet users generate about 2.5 quintillion bytes (roughly 2.5 billion gigabytes) of data every day, according to a commonly cited estimate
- 90% of today's data has been created in the last two years
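The unit arithmetic behind the 175 ZB projection can be checked with a short sketch. It assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the function name is illustrative only.

```python
# Decimal (SI) byte units: an assumption, binary units would differ slightly.
GB = 10**9    # bytes in one gigabyte
ZB = 10**21   # bytes in one zettabyte


def zettabytes_to_gigabytes(zb):
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZB // GB


# The 2025 projection quoted in the notes.
projected_2025_gb = zettabytes_to_gigabytes(175)
print(f"1 ZB   = {zettabytes_to_gigabytes(1):,} GB")
print(f"175 ZB = {projected_2025_gb:,} GB")
```

Under these assumptions one zettabyte works out to 10^12 gigabytes, so 175 ZB is 175 trillion gigabytes.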
The 5 Vs of Big Data
- Velocity: Batch, near real-time, real-time, and streaming data
- Variety: Structured, unstructured, and semi-structured data
- Volume: Terabytes, records, transactions, and substantial amounts of data
- Veracity: Trustworthiness, authenticity, origin, reputation, and accountability
- Value: Statistical data, events, correlations, and potential insights
Sources of Data
- Main sources include Facebook, Twitter (about 500,000 tweets per minute), Instagram (about 347,000 posts per minute), and Internet of Things (IoT) devices such as sensors (a commonly cited projection puts the number of connected devices at roughly 75 billion)
Storage of Generated Data
- Less than 20% of global data is stored in relational databases, which remain important for structured records such as bank, hospital, and customer data
- 80% of global data is unstructured (text, images, video) and stored in big data architectures (cloud and NoSQL databases)
Big Data Storage (HDFS)
- Hadoop Distributed File System (HDFS): A storage system designed for large volumes of data across multiple servers
- Data is divided into large, fixed-size blocks (typically 128 MB or 256 MB) and spread across various nodes (servers)
- Each block is replicated on several nodes (three copies by default), so the data survives the failure of a node
- Suitable for unstructured or semi-structured data
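The block-and-replica idea described above can be illustrated with a minimal sketch. This is a toy simulation in plain Python, not the HDFS API: the function names, node names, and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy), and the replication factor of 3 is an assumed setting.

```python
BLOCK_SIZE_MB = 128   # block size mentioned in the notes
REPLICATION = 3       # assumed number of copies kept for each block


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file is cut into."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has a copy on some surviving node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


# Hypothetical cluster and file size, chosen only for the example.
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(1000)            # a 1000 MB file
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks; last block is", blocks[-1], "MB")
print("block 0 stored on:", placement[0])
print("still readable if node2 fails:", readable_after_failure(placement, "node2"))
```

Because every block lives on more than one node, losing a single server leaves at least one copy of each block available, which is the resilience property the notes describe.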
Data Lakes
- Centralized repositories storing various data formats (structured, semi-structured, and unstructured) as raw data, with no transformation
- Ideal for long-term analysis when the analysis type isn't known beforehand
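As a rough illustration of storing data raw and deciding how to interpret it only at analysis time, the sketch below uses an in-memory dictionary as a stand-in for a data lake. The paths, sample records, and the `read_with_schema` helper are all hypothetical; a real lake would sit on object storage or a distributed file system.

```python
import csv
import io
import json

# A toy "lake": raw payloads are kept exactly as they arrived, with no transformation.
lake = {
    "events/clicks.jsonl": '{"user": "a", "page": "/home"}\n{"user": "b", "page": "/cart"}\n',
    "crm/customers.csv": "id,name\n1,Ada\n2,Linus\n",
    "notes/memo.txt": "Free-form text stays as plain text.",
}


def read_with_schema(path):
    """Interpret a raw payload only when it is read, based on its format."""
    raw = lake[path]
    if path.endswith(".jsonl"):          # semi-structured: one JSON object per line
        return [json.loads(line) for line in raw.splitlines() if line]
    if path.endswith(".csv"):            # structured: rows with named columns
        return list(csv.DictReader(io.StringIO(raw)))
    return raw                           # unstructured: returned untouched


clicks = read_with_schema("events/clicks.jsonl")
customers = read_with_schema("crm/customers.csv")
memo = read_with_schema("notes/memo.txt")
print(clicks[0]["page"], customers[1]["name"], len(memo))
```

The stored payloads never change; only the reading step differs per format, which is why the analysis type does not need to be known when the data is loaded.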
Description
Explore the vast world of big data and its significance in business decision-making. Learn about the projected global data volume, the 5 Vs of big data, and key sources generating massive amounts of information in today's digital landscape.