Questions and Answers
What is the projected global data generation volume by 2025?
Which of the following best describes the concept of Veracity in Big Data?
Which percentage of global data is typically stored in Relational Databases?
What is the primary purpose of HDFS (Hadoop Distributed File System)?
What type of data does a Data Lake primarily store?
Which of the following is a characteristic of the data generated by IoT devices?
What does the term Variety in Big Data refer to?
Which of the following platforms is NOT typically associated with Big Data generation?
Flashcards
Big Data
The massive amount of data generated and stored in the digital world, often characterized by its high volume, variety, velocity, veracity, and value.
Velocity (in Big Data)
The rate at which data is generated and processed. It can be categorized as batch, near real-time, real-time, and streams.
Variety (in Big Data)
The diverse forms of data that are collected, ranging from structured (database tables) to unstructured (text, images, videos) and semi-structured (JSON, XML).
Volume (in Big Data)
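The massive amounts of data generated and stored, measured in units such as terabytes and counted in records, transactions, tables, and files.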
Veracity (in Big Data)
The reliability of data: its trustworthiness, authenticity, origin, reputation, and accountability.
Value (in Big Data)
The worth that can be derived from data through statistical patterns, events, correlations, and hypothetical connections.
HDFS (Hadoop Distributed File System)
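A file system designed to handle large data volumes across multiple servers by splitting data into blocks and replicating them across nodes.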
Data Lake
A centralized repository that stores structured, semi-structured, and unstructured data in raw format, as it is generated.
Study Notes
Big Data
- Data is crucial for decision-making in all aspects of business
- By 2025, the world is estimated to generate 175 zettabytes (ZB) of data (1 ZB = 1 trillion gigabytes)
- In 2010, data generation was significantly lower (just 2 ZB)
- Every day, internet users generate roughly 2.5 quintillion bytes (about 2.5 billion gigabytes) of data
- The majority (90%) of current data was generated in the past two years
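The unit arithmetic behind these figures can be checked with a short sketch. It assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the table and function names are illustrative and not part of the notes.

```python
# Decimal (SI) byte units. Binary units (GiB, ZiB) would give slightly
# different numbers; SI units are an assumption of this sketch.
BYTES_PER_UNIT = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal byte units."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]

# One zettabyte expressed in gigabytes: 10^21 / 10^9 = 10^12 (one trillion).
gb_per_zb = convert(1, "ZB", "GB")

# Growth between the 2010 figure (2 ZB) and the 2025 estimate (175 ZB).
growth_factor = 175 / 2
```

Under these assumptions, the 2025 estimate is 87.5 times the 2010 figure.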
The 5 Vs of Big Data
- Velocity: The rate at which data is generated and processed (batch, near real-time, real-time, streams)
- Variety: Data comes in various structures (structured, semi-structured, unstructured, or a mix of these)
- Volume: Data amounts are massive (terabytes, records, transactions, tables, files)
- Veracity: How reliable the data is, covering trustworthiness, authenticity, origin, reputation, and accountability
- Value: Data's worth can be derived from statistical patterns, events, correlations, and hypothetical connections
Sources of Data
- Facebook: 500,000 posts per minute
- Twitter: 347,222 tweets per minute
- Internet of Things (IoT): 75 million connected devices generating data, including sensors
Storage of Generated Data
- Less than 20% of global data is stored in relational databases (important for banks, hospitals, and customer records)
- About 80% is unstructured (text, images, video) and is stored in Big Data architectures, in the cloud, and in NoSQL databases
Big Data Storage (HDFS)
- Hadoop Distributed File System (HDFS) is designed for handling large data volumes across multiple servers
- Data is broken into smaller blocks (128 MB or 256 MB) and distributed across different nodes (servers)
- Data replication provides high redundancy to avoid data loss if a node fails
- Ideal for storing unstructured or semi-structured large amounts of data
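As a rough illustration of the block splitting and replication described above, the sketch below computes how many blocks a file occupies and assigns each block to several nodes. It is a toy model, not HDFS's actual placement policy: the round-robin placement, the node names, and the replication factor of 3 are assumptions made for illustration.

```python
BLOCK_SIZE_MB = 128   # block size from the notes (256 MB is the other option mentioned)
REPLICATION = 3       # assumed replication factor for this sketch

def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block; the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block is copied to `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }

def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one copy after a single node fails."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]

nodes = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical cluster
blocks = split_into_blocks(1000)                   # a 1000 MB file
placement = place_replicas(len(blocks), nodes)
still_available = surviving_blocks(placement, "node-a")
```

In this toy setup a 1000 MB file becomes seven full 128 MB blocks plus one 104 MB block, and every block remains readable after `node-a` fails because its other copies live on different nodes.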
Data Lakes
- Centralized repository for various data types (structured, semi-structured, unstructured)
- Data is stored in raw format as it is generated, without transformation
- Useful for large volumes of data kept for long-term analysis, when the type of analysis is not yet known, or when the data will be reused several times later
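The idea of keeping data in raw form and deciding how to interpret it only at analysis time can be sketched as follows. This is a minimal local-filesystem stand-in, assuming a hypothetical `raw/<source>/` directory layout and made-up function names; a real data lake would typically sit on distributed or cloud storage.

```python
import json
import tempfile
from pathlib import Path

def ingest_raw(lake_root, source, name, payload):
    """Store the payload bytes exactly as received, without parsing or transforming."""
    target_dir = Path(lake_root) / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / name
    path.write_bytes(payload)
    return path

def read_json_events(lake_root, source):
    """Parse the raw files only when an analysis needs them."""
    source_dir = Path(lake_root) / "raw" / source
    return [json.loads(p.read_bytes()) for p in sorted(source_dir.glob("*.json"))]

lake = tempfile.mkdtemp()  # temporary directory standing in for the lake

# Semi-structured and unstructured data land side by side, untouched.
ingest_raw(lake, "sensors", "reading-001.json", b'{"device": "t1", "temp_c": 21.5}')
ingest_raw(lake, "sensors", "reading-002.json", b'{"device": "t2", "temp_c": 19.0}')
image_path = ingest_raw(lake, "images", "photo-001.bin", b"\x00\x01\x02 opaque image bytes")

# Later, one particular analysis decides how to read the sensor data.
events = read_json_events(lake, "sensors")
avg_temp = sum(e["temp_c"] for e in events) / len(events)
```

Because nothing is transformed on the way in, a different analysis can later reread the same files with a different interpretation.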
Description
Explore the significance of big data and its impact on decision-making in business. Understand the 5 Vs of big data: Velocity, Variety, Volume, Veracity, and Value. Discover how data sources contribute to the ever-growing data landscape.