Questions and Answers
What is the approximate size of data projected to be created globally by 2025?
- 200 zettabytes
- 175 zettabytes (correct)
- 100 zettabytes
- 50 zettabytes
How many bytes are there in one zettabyte?
- 1 trillion bytes
- 1 sextillion bytes (correct)
- 1 quintillion bytes
- 1 septillion bytes
Which of the following data sizes comes immediately before a zettabyte in the data size hierarchy?
- Exabyte (correct)
- Petabyte
- Gigabyte
- Terabyte
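The two unit questions above can be checked with a short calculation. The following is a minimal sketch, assuming decimal (SI) units in which each step up the hierarchy is a factor of 1000; the names `UNITS`, `bytes_in`, and `unit_before` are hypothetical helpers, not part of the course material. Binary units such as the zebibyte (2^70 bytes) are a different scale and are not covered here.

```python
# Decimal (SI) storage units, each 1000 times larger than the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1000."""
    return 1000 ** UNITS.index(unit)


def unit_before(unit: str) -> str:
    """Return the unit immediately below `unit` in the hierarchy."""
    index = UNITS.index(unit)
    if index == 0:
        raise ValueError("no unit comes before the byte")
    return UNITS[index - 1]


zettabyte = bytes_in("zettabyte")
print(zettabyte == 10 ** 21)                   # True: one sextillion bytes
print(unit_before("zettabyte"))                # exabyte
print(175 * zettabyte // bytes_in("exabyte"))  # 175000 exabytes in 175 ZB
```

Python integers have arbitrary precision, so the 22-digit byte count is exact rather than a floating-point approximation.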
If 40 zettabytes are equated to the total number of grains of sand on Earth multiplied by 75, what does this suggest about zettabytes?
In 2024, what is the global internet penetration rate?
What is a key trend influencing the growth of Big Data?
Which of the following is an example of Big Data?
What plays a crucial role in analyzing Big Data effectively?
Which statement best reflects the nature of Big Data?
Which of the following is NOT a type of Big Data?
What percentage of activities are expected to be cloud-based due to Big Data and IoT?
What is a significant source of Big Data in modern society?
Which of the following refers to the increasing amount of data generated globally over time?
What is a defining characteristic of big data?
What does 'velocity' in the context of big data refer to?
Which of the following describes 'volume' in big data?
An example of a high-velocity data source is:
Which statement about the volume aspect of big data is true?
Why is traditional database technology often insufficient for managing big data?
What term describes collections of datasets that are too large for traditional data processing tools?
What does the term 'Variety' in big data refer to?
Which of the following is an example of structured data?
Which scenario exemplifies a challenge posed by big data?
Why is cleansing data important in big data applications?
What does the 'Value' in big data signify?
Which statement best describes unstructured data?
What is the implication of having inaccurate data in data-driven applications?
How does big data relate to mobile phone usage?
What is a key characteristic of semi-structured data?
What should be chosen if results are required to be updated every few seconds?
Which analysis type would be most appropriate for discovering patterns in data?
Which method could be a good choice for batch analytics when performing basic statistics?
What type of visualization is best for displaying results that update regularly?
If a user wants to actively engage with the application for input on results, which visualization is required?
Which analytics mode is suitable for applications that only need results generated on a daily or monthly basis?
Which analysis type would use techniques to categorize data into distinct classes?
If an application needs to process only data meeting specific criteria and exclude bad records, what is the technique employed?
What type of data is represented by user-generated content such as Facebook posts or tweets?
Which process involves transforming data from one raw format to another?
What is the primary issue that data cleansing addresses?
Which type of data is generated every time a customer makes a purchase?
What does normalization in data preparation aim to resolve?
Which of the following is an example of captured data?
What is the purpose of de-duplication in data preparation?
Which of the following kinds of data is experimental in nature?
Flashcards
Big Data
Large datasets with volume, velocity, or variety too significant for traditional databases to manage and process.
Volume (Big Data)
The sheer size of big data, often from various sources, making storage and processing complex.
Velocity (Big Data)
The speed at which the data is generated, especially in high-volume sources. Critical for applications that need quick analysis.
Variety (Big Data)
Traditional Databases
Analytic Flow for Big Data
Data Sources (Big Data)
Three Vs
Big Data Variety
Structured Data
Unstructured Data
Veracity/Validity
Data Cleansing
Data Value
Real-time Analysis
Big Data Sources
Big Data
Global Trends of Big Data
Types of Big Data
Analytic Flow for Big Data
Internet of Things (IoT)
Semantic Search
Data Volume
Data Variety
Zettabyte (ZB)
Data Growth Rate
Data in 2025
Internet Penetration
Data Sizes Progression
Transaction Data
Compiled Data
Experimental Data
Captured Data
User-Generated Data
Data Cleansing
Data Wrangling
Data Deduplication
Weather Data Scales
Sampling and Filtering
Analysis Types
Batch Analytics
Real-time Analytics
Interactive Analytics
Data Processing Methods
Visualizations (Static, Dynamic, Interactive)
Study Notes
GFQR 1026: Big Data in "X" - Lecture 1
- Big data is a collection of datasets whose volume, velocity, or variety is so large that traditional database and data processing tools struggle to manage it.
- The concept of big data gained momentum in the early 2000s, when Doug Laney characterized it by three Vs: volume, velocity, and variety.
- Volume: The amount and form of data (e.g., terabytes, records, transactions, tables, files).
- Velocity: The speed at which data is generated and analyzed (e.g., near time, real time, streams, batches).
- Variety: Different forms of data (e.g., structured, semi-structured, unstructured, mixed).
- Organizations collect data from various sources like business transactions, social media, and machine-to-machine data.
- Big data is often massive-scale data difficult to store, manage, and process with traditional databases.
- There's no fixed threshold for data volume to be considered big data.
- Data generated at high velocity contributes to large volumes of accumulated data in short periods.
- Real-time data analysis is essential in some applications (like fraud detection).
- Big data systems need flexibility to handle different data types (structured, unstructured, semi-structured).
- Structured data is data located in fixed fields within records or files (e.g., sales, financial, student data).
- Unstructured and semi-structured data is hard to organize into rows/columns (e.g., photos, videos, websites, emails, PDFs, social media posts, presentations).
- Gartner estimates ~20% of enterprise data is structured and ~80% is unstructured.
- Veracity/Validity: Refers to the accuracy and meaningfulness of data. Data cleansing is crucial to filter out incorrect and faulty data.
- Value: The usefulness of data for the intended purpose. The goal of big data analytics is to extract value from data.
- Data is now mined from activities, conversations, photos/videos, sensors, and the Internet of Things.
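The difference between structured and semi-structured data in the notes above can be made concrete with a small sketch. It assumes JSON as the semi-structured format; the sample records, the `COLUMNS` tuple, and the `to_row` helper are all invented for illustration and do not come from the lecture.

```python
import json

# Hypothetical semi-structured input: each record is self-describing JSON,
# but the set of fields differs from record to record.
raw_records = [
    '{"id": 1, "user": "ana", "text": "hello", "tags": ["intro"]}',
    '{"id": 2, "user": "ben", "likes": 3}',
    '{"id": 3, "text": "no user field here"}',
]

# Fixed fields of the structured target, like the columns of a table.
COLUMNS = ("id", "user", "text")


def to_row(raw: str) -> tuple:
    """Map one JSON record onto the fixed columns, using None for gaps."""
    record = json.loads(raw)
    return tuple(record.get(column) for column in COLUMNS)


rows = [to_row(raw) for raw in raw_records]
for row in rows:
    print(row)
```

Forcing the records into fixed fields leaves gaps (`None`) and silently drops fields such as `tags` and `likes`, which is one reason big data systems need the flexibility to store data in forms other than rows and columns.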
Global Trends in Big Data
- Mobile phones generate a massive amount of data every day (texts, emails, photos, social media interactions).
- The number of connected (IoT) devices is growing rapidly: an estimated 14.4 billion devices by 2022, up from more than 9.7 billion in 2020.
- These trends indicate a global increase in the scale and volume of generated data.
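To put the device counts quoted above (9.7 billion in 2020 and 14.4 billion in 2022) into a yearly rate, here is a rough sketch of a compound annual growth rate calculation. It assumes steady growth between the two figures; the function name is a hypothetical helper.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


devices_2020 = 9.7e9    # figure quoted in the notes
devices_2022 = 14.4e9   # figure quoted in the notes

rate = compound_annual_growth_rate(devices_2020, devices_2022, years=2)
print(f"{rate:.1%}")    # roughly 22% per year
```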
Types of Big Data (examples)
- Facebook generates over 30 petabytes of data daily.
- Over 230 million tweets are created every day.
- YouTube users upload 48 hours of new video every minute.
- 294 billion emails are sent per day.
- IoT devices generate large volumes of data (an estimated 600 ZB per year in 2020).
- Large companies like Google, eBay, Facebook, Microsoft, Alibaba Group, Amazon, Twitter, YouTube, and Yahoo! are big data generators.
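The daily figures above are easier to read as velocity once they are converted to other time scales. The arithmetic below is a sketch that takes the quoted numbers at face value and assumes the activity is spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

# Figures quoted in the notes.
tweets_per_day = 230_000_000
emails_per_day = 294_000_000_000
video_hours_uploaded_per_minute = 48

tweets_per_second = tweets_per_day / SECONDS_PER_DAY
emails_per_second = emails_per_day / SECONDS_PER_DAY
video_hours_per_day = video_hours_uploaded_per_minute * MINUTES_PER_DAY

print(f"{tweets_per_second:,.0f} tweets per second")      # 2,662
print(f"{emails_per_second:,.0f} emails per second")      # 3,402,778
print(f"{video_hours_per_day:,} hours of video per day")  # 69,120
```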
Analytic Flow for Big Data (steps)
- Data Collection (various sources, structured and unstructured)
- Data Preparation (cleaning, wrangling, de-duplication, normalization, sampling)
- Analysis Types (e.g., basic stats, regression, recommendation, dimensionality reduction, graph analytics, classification, time series analysis, text analysis, pattern mining)
- Analytics Modes (Batch, real-time, interactive)
- Visualizations (static, dynamic, interactive)
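The steps above can be sketched end to end on a toy dataset. Everything here is an assumption made for illustration: the sensor readings, the plausibility range in `is_valid`, and the choice of min-max scaling, which is only one common meaning of "normalization" and may not be the one the lecture intends.

```python
from statistics import mean, stdev

# Data collection: hypothetical sensor readings. None and -999.0 stand in
# for bad records, and the reading with id 2 was collected twice.
collected = [
    {"id": 1, "temp_c": 21.5},
    {"id": 2, "temp_c": 23.0},
    {"id": 2, "temp_c": 23.0},
    {"id": 3, "temp_c": None},
    {"id": 4, "temp_c": -999.0},
    {"id": 5, "temp_c": 19.0},
]


def is_valid(record: dict) -> bool:
    """Filtering rule (an assumption): keep plausible temperatures only."""
    value = record["temp_c"]
    return value is not None and -90.0 <= value <= 60.0


def deduplicate(records: list) -> list:
    """Keep the first record seen for each id."""
    seen, unique = set(), []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def min_max_normalize(values: list) -> list:
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Data preparation: filter out bad records, then remove duplicates.
prepared = deduplicate([r for r in collected if is_valid(r)])
temperatures = [r["temp_c"] for r in prepared]

# Analysis in batch mode: basic statistics over the whole prepared set.
summary = {
    "count": len(temperatures),
    "mean": mean(temperatures),
    "stdev": stdev(temperatures),
}
normalized = min_max_normalize(temperatures)

# Static "visualization": a plain-text report of the results.
print(summary)
print(normalized)
```

This is batch analytics because the statistics are computed once over the complete prepared set. A real-time mode would instead update the results as each new reading arrives.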