Questions and Answers
What is the projected global data generation volume by 2025?
Which of the following best describes the concept of Veracity in Big Data?
Which percentage of global data is typically stored in Relational Databases?
What is the primary purpose of HDFS (Hadoop Distributed File System)?
What type of data does a Data Lake primarily store?
Which of the following is a characteristic of the data generated by IoT devices?
What does the term Variety in Big Data refer to?
Which of the following platforms is NOT typically associated with Big Data generation?
Study Notes
Big Data
- Data is crucial for decision-making in all aspects of business
- By 2025, the world is estimated to generate 175 zettabytes (ZB) of data (1 ZB = 1 trillion gigabytes)
- In 2010, data generation was significantly lower (just 2 ZB)
- Daily, internet users generate roughly 2.5 quintillion bytes (about 2.5 billion gigabytes) of data
- The majority (90%) of current data was generated in the past two years
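The unit arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) prefixes, so 1 GB = 10^9 bytes and 1 ZB = 10^21 bytes; the function names are illustrative only.

```python
# Decimal (SI) prefixes are assumed here: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zb_to_gb(zettabytes: float) -> float:
    """Convert zettabytes to gigabytes."""
    return zettabytes * BYTES_PER_ZB / BYTES_PER_GB


def growth_factor(start_zb: float, end_zb: float) -> float:
    """How many times larger the end volume is than the start volume."""
    return end_zb / start_zb


gb_2025 = zb_to_gb(175)          # projected 2025 volume, in gigabytes
factor = growth_factor(2, 175)   # 2010 (2 ZB) compared with 2025 (175 ZB)

print(f"175 ZB = {gb_2025:.3e} GB")
print(f"Growth from 2010 to 2025: {factor:.1f}x")
```

Under these assumptions one zettabyte equals 10^12 (one trillion) gigabytes, and the projected 2025 volume is 87.5 times the 2010 volume.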
The 5 Vs of Big Data
- Velocity: Data arrives at different speeds (batch, near real-time, real-time, streaming)
- Variety: Data comes in various structures (structured, semi-structured, unstructured, or a mix of all three)
- Volume: Data amounts are massive (terabytes, records, transactions, tables, files)
- Veracity: How reliable the data is (trustworthiness, authenticity, origin, reputation, and accountability)
- Value: Data's worth can be derived from statistical patterns, events, correlations, and hypothetical connections
Sources of Data
- Facebook: 500,000 posts per minute
- Twitter: 347,222 tweets per minute
- Internet of Things (IoT): 75 million connected devices generating data, including sensors
Storage of Generated Data
- Less than 20% of global data is stored in relational databases (typically structured records such as bank, hospital, and customer data)
- About 80% is unstructured (text, images, video) and is stored in Big Data architectures, in the cloud, and in NoSQL databases
Big Data Storage (HDFS)
- Hadoop Distributed File System (HDFS) is designed for handling large data volumes across multiple servers
- Data is broken into smaller blocks (128 MB or 256 MB) and distributed across different nodes (servers)
- Data replication provides high redundancy to avoid data loss if a node fails
- Ideal for storing unstructured or semi-structured large amounts of data
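The block splitting and replication described above can be illustrated with a minimal sketch. This is not HDFS code: the round-robin placement, the node names, and the function names are assumptions made for illustration (real HDFS placement is rack-aware), and the replication factor of 3 mirrors the HDFS default.

```python
MB = 1024 * 1024


def split_into_blocks(file_size_bytes: int, block_size_bytes: int = 128 * MB) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (simplified round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Hypothetical 1000 MB file on a hypothetical four-node cluster.
blocks = split_into_blocks(1000 * MB)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

print(len(blocks), "blocks; last block is", blocks[-1] // MB, "MB")
for block, holders in placement.items():
    print(f"block {block}: {holders}")
```

Because every block lives on three different nodes in this sketch, losing any single node still leaves at least two copies of each block.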
Data Lakes
- Centralized repository for various data types (structured, semi-structured, unstructured)
- Data is stored in raw format as it is generated, without transformation
- Useful for long-term analysis of large volumes, when the analysis type is not yet known, or when the same data will be reused for several analyses later on
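Storing data raw and deciding on its structure only at analysis time is often called schema-on-read. The sketch below illustrates that idea with plain JSON-lines files in a temporary directory. The helper names and sample records are made up for illustration; a real data lake would sit on distributed or cloud storage.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Append records exactly as received, one JSON object per line."""
    path = lake_dir / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_with_schema(path: Path, fields: list[str]) -> list[dict]:
    """Project each raw record onto the fields one analysis needs (missing -> None)."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            rows.append({name: record.get(name) for name in fields})
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sensor records with differing shapes, stored untransformed.
    raw_path = ingest_raw(Path(tmp), "sensors", [
        {"device": "t-1", "temp_c": 21.5, "ts": "2024-01-01T00:00:00Z"},
        {"device": "t-2", "humidity": 40},
    ])
    # Two later analyses read the same raw file with different schemas.
    temps = read_with_schema(raw_path, ["device", "temp_c"])
    humidity = read_with_schema(raw_path, ["device", "humidity"])

print(temps)
print(humidity)
```

The same raw file serves both reads, which is the reuse case the notes describe: nothing was discarded or reshaped at ingestion, so an analysis defined later can still pick the fields it needs.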
Description
Explore the significance of big data and its impact on decision-making in business. Understand the 5 Vs of big data: Velocity, Variety, Volume, Veracity, and Value. Discover how data sources contribute to the ever-growing data landscape.