Big Data Trends and Measurements

Questions and Answers

What is the approximate size of data projected to be created globally by 2025?

  • 200 zettabytes
  • 175 zettabytes (correct)
  • 100 zettabytes
  • 50 zettabytes

How many bytes are there in one zettabyte?

  • 1 trillion bytes
  • 1 sextillion bytes (correct)
  • 1 quintillion bytes
  • 1 septillion bytes

Which of the following data sizes comes immediately before a zettabyte in the data size hierarchy?

  • Exabyte (correct)
  • Petabyte
  • Gigabyte
  • Terabyte

If 40 zettabytes are equated to the total number of grains of sand on Earth multiplied by 75, what does this suggest about zettabytes?

  • Zettabytes represent a vast amount of data. (correct)

In 2024, what is the global internet penetration rate?

  • 66.2% (correct)

What is a key trend influencing the growth of Big Data?

  • Rise of Internet of Things (IoT) (correct)

Which of the following is an example of Big Data?

  • Data generated by millions of IoT devices (correct)

What plays a crucial role in analyzing Big Data effectively?

  • Analytic flow processes (correct)

Which statement best reflects the nature of Big Data?

  • Big Data involves a variety of data formats and sources. (correct)

Which of the following is NOT a type of Big Data?

  • Fixed-data (correct)

What percentage of activities are expected to be cloud-based due to Big Data and IoT?

  • 92% (correct)

What is a significant source of Big Data in modern society?

  • Social media interactions (correct)

Which of the following refers to the increasing amount of data generated globally over time?

  • Data accumulation (correct)

What is a defining characteristic of big data?

  • Data volume, velocity, or variety must be large. (correct)

What does 'velocity' in the context of big data refer to?

  • The speed at which data is generated. (correct)

Which of the following describes 'volume' in big data?

  • The significant amount of data accumulated. (correct)

An example of a high-velocity data source is:

  • Social media feeds. (correct)

Which statement about the volume aspect of big data is true?

  • Volume refers to the large amount of diverse datasets. (correct)

Why is traditional database technology often insufficient for managing big data?

  • They cannot handle large volumes, high velocities, and various types of data effectively. (correct)

What term describes collections of datasets that are too large for traditional data processing tools?

  • Big data. (correct)

What does the term 'Variety' in big data refer to?

  • The types of data and their forms (correct)

Which of the following is an example of structured data?

  • Financial records (correct)

Which scenario exemplifies a challenge posed by big data?

  • A website receiving millions of user interactions every day. (correct)

Why is cleansing data important in big data applications?

  • To filter out incorrect and faulty data (correct)

What does the 'Value' in big data signify?

  • The usefulness of data for its intended purpose (correct)

Which statement best describes unstructured data?

  • It cannot be easily organized or analyzed traditionally. (correct)

What is the implication of having inaccurate data in data-driven applications?

  • It can lead to misleading outcomes from analysis. (correct)

How does big data relate to mobile phone usage?

  • Mobile usage generates digital interactions that contribute to big data. (correct)

What is a key characteristic of semi-structured data?

  • It exists within a defined field but allows some flexibility. (correct)

What should be chosen if results are required to be updated every few seconds?

  • Real-time analytics mode (correct)

Which analysis type would be most appropriate for discovering patterns in data?

  • Pattern Mining (correct)

Which method could be a good choice for batch analytics when performing basic statistics?

  • MapReduce (correct)

What type of visualization is best for displaying results that update regularly?

  • Dynamic visualization (correct)

If a user wants to actively engage with the application for input on results, which visualization is required?

  • Interactive visualization (correct)

Which analytics mode is suitable for applications that only need results generated on a daily or monthly basis?

  • Batch mode (correct)

Which analysis type would use techniques to categorize data into distinct classes?

  • Classification (correct)

If an application needs to process only data meeting specific criteria and exclude bad records, what is the technique employed?

  • Sampling and Filtering (correct)

What type of data is represented by user-generated content such as Facebook posts or tweets?

  • Unstructured data (correct)

Which process involves transforming data from one raw format to another?

  • Data wrangling (correct)

What is the primary issue that data cleansing addresses?

  • Corrupt records and missing values (correct)

Which type of data is generated every time a customer makes a purchase?

  • Transactional data (correct)

What does normalization in data preparation aim to resolve?

  • Inconsistent units or scales (correct)

Which of the following is an example of captured data?

  • Google searches (correct)

What is the purpose of de-duplication in data preparation?

  • To create a single version of data (correct)

Which of the following kinds of data is experimental in nature?

  • Data gathered from focus groups (correct)

Flashcards

Big Data

Large datasets with volume, velocity, or variety too significant for traditional databases to manage and process.

Volume (Big Data)

The sheer size of big data, often from various sources, making storage and processing complex.

Velocity (Big Data)

The speed at which the data is generated, especially in high-volume sources. Critical for applications that need quick analysis.

Variety (Big Data)

The different forms of data involved, from structured data (databases) to unstructured data (text, images).

Traditional Databases

Databases designed for structured data, not optimized for large, high-speed datasets typical in big data.

Analytic Flow for Big Data

The process of analyzing large datasets using specialized tools and techniques.

Data Sources (Big Data)

The various places where big data comes from, including business transactions, social media, and sensors.

Three Vs

The core characteristics of big data: volume, velocity, and variety.

Big Data Variety

Big data comes in different formats: structured, unstructured, and semi-structured data.

Structured Data

Data organized in rows and columns, like spreadsheets.

Unstructured Data

Data not organized in a predefined format, like images or social media posts.

Veracity/Validity

The accuracy of big data; ensuring that data is correct in order to avoid errors.

Data Cleansing

Removing incorrect or faulty data to improve analysis accuracy

Data Value

The usefulness of data for a specific purpose.

Real-time Analysis

Analyzing data immediately as it's generated

Big Data Sources

Various sources contributing to big data, including mobile phone activity, such as messages, photos, and social media.

Big Data

A large volume of data that is difficult to process using traditional data processing tools.

Global Trends of Big Data

The increasing amount and variety of data being generated worldwide.

Types of Big Data

Big data can include various formats, such as structured, semi-structured, and unstructured data.

Analytic Flow for Big Data

The process of collecting, storing, and analyzing big data to drive insights.

Internet of Things (IoT)

Network of physical objects embedded with sensors, software, and other technologies.

Semantic Search

Online search using the meaning of words.

Data Volume

The amount of data being generated globally.

Data Variety

Different types of data being generated (structured, semi-structured, and unstructured).

Zettabyte (ZB)

A unit of digital information equal to 10 to the power of 21 bytes, or 1 sextillion bytes.

Data Growth Rate

The rate at which the amount of data is increasing globally.

Data in 2025

Predicted to reach approximately 175 zettabytes of data in 2025.

Internet Penetration

The percentage of people globally who have access to the Internet.

Data Sizes Progression

The increasing scale of data that started with bytes, progressing through kilobytes, megabytes, gigabytes, terabytes, petabytes, exabytes, zettabytes, and yottabytes.
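
As a rough illustration of this progression, the Python sketch below assumes decimal (SI) prefixes, where each unit is 1,000 times the previous one. The names `UNITS` and `to_bytes` are made up for this example and do not come from the lesson.

```python
# Decimal (SI) data-size units: each step is 1,000 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes (hypothetical helper)."""
    return amount * 1000 ** UNITS.index(unit)


one_zb = to_bytes(1, "zettabyte")
print(one_zb == 10 ** 21)  # checks that a zettabyte is one sextillion bytes

# The 175 ZB projection for 2025, expressed in exabytes.
print(to_bytes(175, "zettabyte") // to_bytes(1, "exabyte"))
```

Keeping the units in an ordered list makes the "comes immediately before" relationship explicit: the unit before a zettabyte is simply the previous list entry.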

Transaction Data

Data generated from customer actions like online or in-store purchases.

Compiled Data

Data collected from multiple outside sources, like a credit report.

Experimental Data

Data from experiments, often combining created and transactional data, such as data gathered from a focus group.

Captured Data

Data gathered from sources like Google searches or GPS.

User-Generated Data

Data created by users, like social media posts and videos.

Data Cleansing

Fixes problems in data such as bad formatting, missing values, or corrupted records.

Data Wrangling

Transforming raw data into consistent formats, addressing format differences across sources.

Data Deduplication

Removing duplicate data records to prevent errors and inefficiencies.

Weather Data Scales

Weather data may be recorded in different temperature units (Celsius or Fahrenheit), an example of the inconsistent scales that normalization resolves.

Sampling and Filtering

Selecting data based on criteria or eliminating incorrect data.

Analysis Types

Methods for analyzing data, including statistics, regression, and more.

Batch Analytics

Analyzing data over longer time intervals, like daily or monthly.

Real-time Analytics

Analyzing data as it's generated, often within seconds.

Interactive Analytics

Data analysis allowing user input and interactive query processing.

Data Processing Methods

Techniques used to process data for analysis, such as MapReduce and Stream Processing.
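
To make the contrast between these two methods concrete, here is a minimal single-machine sketch in Python. It only imitates the map, shuffle, and reduce steps of MapReduce and the record-at-a-time style of stream processing; it is not a distributed implementation. The sample `readings` and all function names are invented for illustration.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sensor readings: (device_id, temperature in Celsius).
readings = [("a", 20.0), ("b", 30.0), ("a", 22.0), ("b", 34.0), ("a", 24.0)]


def map_phase(record):
    """Emit a (key, (sum, count)) pair for one input record."""
    device, value = record
    return device, (value, 1)


def reduce_phase(left, right):
    """Combine two partial (sum, count) pairs for the same key."""
    return left[0] + right[0], left[1] + right[1]


def batch_mean(records):
    """MapReduce-style batch job: mean reading per device."""
    groups = defaultdict(list)
    for key, pair in map(map_phase, records):  # map, then shuffle by key
        groups[key].append(pair)
    totals = {k: reduce(reduce_phase, v) for k, v in groups.items()}
    return {k: total / count for k, (total, count) in totals.items()}


def stream_mean(values):
    """Stream-style processing: yield the running mean after each value."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


batch_result = batch_mean(readings)
running = list(stream_mean(v for d, v in readings if d == "a"))
print(batch_result)
print(running)
```

The batch job needs the whole dataset before it reports anything, which suits daily or monthly results. The streaming version produces an updated answer after every record, which suits results that must refresh every few seconds.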

Visualizations (Static, Dynamic, Interactive)

Representations of analysis results: static visualizations display stored data, dynamic visualizations update regularly, and interactive visualizations let users explore the results.

Study Notes

GFQR 1026: Big Data in "X" - Lecture 1

  • Big data is a collection of datasets whose volume, velocity, or variety is so large that traditional database and data processing tools struggle to manage it.
  • The concept of big data gained momentum in the early 2000s, when Doug Laney characterized it by the three Vs.
  • Volume: The amount of data, measured in units such as terabytes or counted in records, transactions, tables, or files.
  • Velocity: The speed at which data is generated and analyzed (e.g., near real time, real time, streams, batches).
  • Variety: Different forms of data (e.g., structured, semi-structured, unstructured, mixed).
  • Organizations collect data from various sources like business transactions, social media, and machine-to-machine data.
  • Big data is often massive-scale data difficult to store, manage, and process with traditional databases.
  • There's no fixed threshold for data volume to be considered big data.
  • Data generated at high velocity contributes to large volumes of accumulated data in short periods.
  • Real-time data analysis is essential in some applications (like fraud detection).
  • Big data systems need flexibility to handle different data types (structured, unstructured, semi-structured).
  • Structured data is data located in fixed fields within records or files (e.g., sales, financial, student data).
  • Unstructured and semi-structured data is hard to organize into rows/columns (e.g., photos, videos, websites, emails, PDFs, social media posts, presentations).
  • Gartner estimates ~20% of enterprise data is structured and ~80% is unstructured.
  • Veracity/Validity: Refers to the accuracy and meaningfulness of data. Data cleansing is crucial to filter out incorrect and faulty data.
  • Value: The usefulness of data for the intended purpose. The goal of big data analytics is to extract value from data.
  • Data is now mined from activities, conversations, photos/videos, sensors, and the Internet of Things.
  • Daily data generation from mobile phones is massive (texts, emails, photos, social media interactions).
  • The number of connected (IoT) devices is growing rapidly, reaching 14.4 billion by 2022, up from 9.7 billion in 2020.
  • These trends indicate a global increase in the scale and volume of generated data.
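
The distinction between structured, semi-structured, and unstructured data in the notes above can be sketched with a small Python example. All of the sample values are invented for illustration. The point is only that semi-structured records carry named fields that may be absent, while unstructured text has no fields at all.

```python
import json

# Structured: every record has the same fixed fields.
structured_row = ("S001", "Ada", 3.7)  # (student_id, name, gpa)

# Semi-structured: tagged fields, but not every record has all of them.
semi_structured = json.loads(
    '[{"user": "ann", "text": "hello", "tags": ["intro"]},'
    ' {"user": "bob", "text": "big data!"}]'
)

# Unstructured: free text with no predefined fields.
unstructured = "Loved the lecture on big data, see you all next week!"

# Flattening semi-structured records into fixed fields means choosing
# a default for anything that is missing (here: zero tags).
rows = [(p["user"], p["text"], len(p.get("tags", []))) for p in semi_structured]

# Without fields, unstructured text can only be summarized indirectly,
# for example by counting whitespace-separated words.
word_count = len(unstructured.split())
print(rows)
print(word_count)
```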

Types of Big Data (examples)

  • Facebook generates over 30 petabytes of data daily.
  • Over 230 million tweets are created every day.
  • YouTube users upload 48 hours of new videos every minute.
  • 294 billion emails are sent per day.
  • IoT devices generate large volumes of data (600 ZB per year in 2020).
  • Large companies like Google, eBay, Facebook, Microsoft, Alibaba Group, Amazon, Twitter, YouTube, and Yahoo! are big data generators.

Analytic Flow for Big Data (steps)

  1. Data Collection (various sources, structured and unstructured)
  2. Data Preparation (cleaning, wrangling, de-duplication, normalization, sampling)
  3. Analysis Types (e.g., basic stats, regression, recommendation, dimensionality reduction, graph analytics, classification, time series analysis, text analysis, pattern mining)
  4. Analytics Modes (Batch, real-time, interactive)
  5. Visualizations (static, dynamic, interactive)
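
The data preparation step of this flow can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not a prescribed pipeline: the `raw` weather records, the field names, and the helper functions are all hypothetical. It applies cleansing, de-duplication, normalization, and filtering in turn, then computes one basic statistic.

```python
import statistics

# Hypothetical raw weather records from two sources with different units.
raw = [
    {"city": "Oslo", "temp": 41.0, "unit": "F"},
    {"city": "Oslo", "temp": 41.0, "unit": "F"},   # duplicate record
    {"city": "Cairo", "temp": 30.0, "unit": "C"},
    {"city": "Lima", "temp": None, "unit": "C"},   # missing value
    {"city": "Pune", "temp": 86.0, "unit": "F"},
]


def cleanse(records):
    """Drop records with missing temperature values."""
    return [r for r in records if r["temp"] is not None]


def deduplicate(records):
    """Keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for r in records:
        key = (r["city"], r["temp"], r["unit"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def normalize(records):
    """Convert every temperature to Celsius so all values share one scale."""
    out = []
    for r in records:
        temp = (r["temp"] - 32) * 5 / 9 if r["unit"] == "F" else r["temp"]
        out.append({"city": r["city"], "temp_c": round(temp, 1)})
    return out


def filter_records(records, min_c):
    """Keep only records meeting a criterion (here: at least min_c degrees)."""
    return [r for r in records if r["temp_c"] >= min_c]


prepared = normalize(deduplicate(cleanse(raw)))
warm = filter_records(prepared, min_c=20.0)
print(prepared)
print(statistics.mean(r["temp_c"] for r in prepared))
```

Running cleansing before de-duplication and normalization keeps each helper simple, since later steps can assume every record has a usable value.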
