Questions and Answers
What is the approximate size of data projected to be created globally by 2025?
How many bytes are there in one zettabyte?
Which of the following data sizes comes immediately before a zettabyte in the data size hierarchy?
If 40 zettabytes are equated to the total number of grains of sand on Earth multiplied by 75, what does this suggest about zettabytes?
In 2024, what is the global internet penetration rate?
What is a key trend influencing the growth of Big Data?
Which of the following is an example of Big Data?
What plays a crucial role in analyzing Big Data effectively?
Which statement best reflects the nature of Big Data?
Which of the following is NOT a type of Big Data?
What percentage of activities are expected to be cloud-based due to Big Data and IoT?
What is a significant source of Big Data in modern society?
Which of the following refers to the increasing amount of data generated globally over time?
What is a defining characteristic of big data?
What does 'velocity' in the context of big data refer to?
Which of the following describes 'volume' in big data?
An example of a high-velocity data source is:
Which statement about the volume aspect of big data is true?
Why is traditional database technology often insufficient for managing big data?
What term describes collections of datasets that are too large for traditional data processing tools?
What does the term 'Variety' in big data refer to?
Which of the following is an example of structured data?
Which scenario exemplifies a challenge posed by big data?
Why is cleansing data important in big data applications?
What does the 'Value' in big data signify?
Which statement best describes unstructured data?
What is the implication of having inaccurate data in data-driven applications?
How does big data relate to mobile phone usage?
What is a key characteristic of semi-structured data?
What should be chosen if results are required to be updated every few seconds?
Which analysis type would be most appropriate for discovering patterns in data?
Which method could be a good choice for batch analytics when performing basic statistics?
What type of visualization is best for displaying results that update regularly?
If a user wants to actively engage with the application for input on results, which visualization is required?
Which analytics mode is suitable for applications that only need results generated on a daily or monthly basis?
Which analysis type would use techniques to categorize data into distinct classes?
If an application needs to process only data meeting specific criteria and exclude bad records, what is the technique employed?
What type of data is represented by user-generated content such as Facebook posts or tweets?
Which process involves transforming data from one raw format to another?
What is the primary issue that data cleansing addresses?
Which type of data is generated every time a customer makes a purchase?
What does normalization in data preparation aim to resolve?
Which of the following is an example of captured data?
What is the purpose of de-duplication in data preparation?
Which of the following kinds of data is experimental in nature?
Study Notes
GFQR 1026: Big Data in "X" - Lecture 1
- Big data is a collection of datasets whose volume, velocity, or variety is so large that traditional database and data processing tools struggle to manage it.
- The concept of big data gained momentum in the early 2000s, when industry analyst Doug Laney characterized it by three Vs:
- Volume: The amount and form of data (e.g., terabytes, records, transactions, tables, files).
- Velocity: The speed at which data is generated and analyzed (e.g., near real time, real time, streams, batches).
- Variety: Different forms of data (e.g., structured, semi-structured, unstructured, mixed).
- Organizations collect data from various sources like business transactions, social media, and machine-to-machine data.
- Big data is often massive-scale data difficult to store, manage, and process with traditional databases.
- There's no fixed threshold for data volume to be considered big data.
- Data generated at high velocity contributes to large volumes of accumulated data in short periods.
- Real-time data analysis is essential in some applications (like fraud detection).
- Big data systems need flexibility to handle different data types (structured, unstructured, semi-structured).
- Structured data is data located in fixed fields within records or files (e.g., sales, financial, student data).
- Unstructured and semi-structured data is hard to organize into rows/columns (e.g., photos, videos, websites, emails, PDFs, social media posts, presentations).
- Gartner estimates ~20% of enterprise data is structured and ~80% is unstructured.
- Veracity/Validity: Refers to the accuracy and meaningfulness of data. Data cleansing is crucial to filter out incorrect and faulty data.
- Value: The usefulness of data for the intended purpose. The goal of big data analytics is to extract value from data.
- Data is now mined from activities, conversations, photos/videos, sensors, and the Internet of Things.
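The three forms of data named under Variety can be illustrated with a minimal Python sketch. The sample records, field names, and the `field_names` helper below are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
import csv
import io
import json

# Structured: every record has the same fixed fields, so it maps onto rows and columns.
structured_csv = "student_id,name,score\n1,Ana,88\n2,Ben,92\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but the fields can differ between records.
semi_structured = [
    json.loads('{"user": "ana", "text": "hello", "tags": ["intro"]}'),
    json.loads('{"user": "ben", "text": "hi", "location": {"city": "Paris"}}'),
]

# Unstructured: free text with no predefined fields; any structure has to be derived.
unstructured = "Loved the lecture today, the zettabyte example was wild!"


def field_names(records):
    """Return the sorted union of keys that appear across all records."""
    return sorted({key for record in records for key in record})


structured_fields = field_names(rows)                  # identical for every row
semi_structured_fields = field_names(semi_structured)  # union of differing keys
word_count = len(unstructured.split())                 # a crude derived feature

print(structured_fields)
print(semi_structured_fields)
print(word_count)
```

The structured rows all share one schema, while the semi-structured records only agree on some keys, which is one reason big data systems need flexible storage and processing.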
Global Trends in Big Data
- Daily data generation from mobile phones is massive (texts, emails, photos, social media interactions).
- The number of connected (IoT) devices is growing rapidly, reaching 14.4 billion devices by 2022, up from 9.7 billion in 2020.
- These trends indicate a global increase in the scale and volume of generated data.
Types of Big Data (examples)
- Facebook generates over 30 petabytes of data daily.
- Over 230 million tweets are created every day.
- YouTube users upload 48 hours of new video every minute.
- 294 billion emails are sent per day.
- IoT devices generate large volumes of data (600 ZB per year in 2020).
- Large companies like Google, eBay, Facebook, Microsoft, Alibaba Group, Amazon, Twitter, YouTube, and Yahoo! are big data generators.
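To put the units in these examples in perspective, the following Python sketch converts between data sizes. It assumes decimal (SI) prefixes, where each step up the hierarchy is a factor of 1000 and one zettabyte is 10^21 bytes. The helper names `to_bytes` and `convert` are illustrative, and the yearly figure simply extrapolates the "30 petabytes daily" example above, so it should be read as a rough lower bound rather than a reported statistic.

```python
# Decimal (SI) data-size units, smallest to largest; each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value expressed in `unit` to bytes, using decimal prefixes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a value from one unit to another."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


one_zb_in_bytes = to_bytes(1, "ZB")   # 10**21 bytes
unit_before_zb = UNITS[UNITS.index("ZB") - 1]

# Extrapolating 30 PB per day over 365 days, expressed in exabytes.
facebook_per_year_eb = convert(30 * 365, "PB", "EB")

print(one_zb_in_bytes, unit_before_zb, facebook_per_year_eb)
```

Under these assumptions, a year of data at 30 PB per day is still only about 11 exabytes, roughly one hundredth of a zettabyte.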
Analytic Flow for Big Data (steps)
- Data Collection (various sources, structured and unstructured)
- Data Preparation (cleaning, wrangling, de-duplication, normalization, sampling)
- Analysis Types (e.g., basic stats, regression, recommendation, dimensionality reduction, graph analytics, classification, time series analysis, text analysis, pattern mining)
- Analytics Modes (Batch, real-time, interactive)
- Visualizations (static, dynamic, interactive)
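The data preparation and basic-statistics steps above can be sketched as a small batch pipeline in Python. Everything here is an assumption made for illustration: the purchase records are invented, "bad" records are taken to be those with a missing or negative amount, and normalization is shown as min-max scaling, which is only one possible reading of the term (it can also mean reconciling units or abbreviations across sources).

```python
import random
import statistics

# Hypothetical raw purchase records; None and negative amounts stand in for bad data.
raw = [
    {"id": 1, "amount": 20.0},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 20.0},  # duplicate of the first record
    {"id": 3, "amount": -5.0},
    {"id": 4, "amount": 60.0},
    {"id": 5, "amount": 40.0},
]


def clean(records):
    """Cleansing/filtering: keep only records with a present, non-negative amount."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]


def deduplicate(records):
    """De-duplication: keep the first record seen for each id."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(r)
    return unique


def normalize(records):
    """Normalization (assumed here to be min-max scaling of amounts to [0, 1])."""
    amounts = [r["amount"] for r in records]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    return [{**r, "amount_scaled": (r["amount"] - lo) / span} for r in records]


prepared = normalize(deduplicate(clean(raw)))

# Sampling: draw a fixed-size random subset (seeded so the run is repeatable).
sample = random.Random(0).sample(prepared, k=2)

# Basic statistics in batch mode: computed once over the whole prepared dataset.
mean_amount = statistics.mean(r["amount"] for r in prepared)

print(prepared)
print(mean_amount)
```

This runs as a single batch over data that is already collected; a real-time mode would instead update results incrementally as new records arrive.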
Description
Test your knowledge on the current state and projected trends of Big Data. This quiz covers essential statistics, definitions, and examples relevant to the world of Big Data and internet data growth. Challenge yourself to understand the magnitude and implications of data in our digital age.