Introduction to Big Data Concepts

Questions and Answers

What is the estimated global volume of data generated by 2025?

  • 250 zettabytes
  • 200 zettabytes
  • 175 zettabytes (correct)
  • 150 zettabytes

Which of the following best describes the data type that accounts for 80% of global data?

  • Unstructured data (correct)
  • Hierarchical data
  • Transactional data
  • Structured data

How is data typically divided and distributed in Hadoop Distributed File System (HDFS)?

  • Into small blocks of 128 MB or 256 MB (correct)
  • In whole files of 1 GB
  • In random unspecified sizes
  • Into chunks of 64 MB

Which of the following is NOT one of the 5 Vs of Big Data?

  • Variability (correct)

What percentage of global data is stored in relational databases?

  • Less than 20% (correct)

What defines a data lake in the context of data storage?

  • A place where raw data is stored without transformation (correct)

Which source generates approximately 500,000 units of data per minute?

  • Tweets (correct)

What is one of the main characteristics of Veracity in the 5 Vs of Big Data?

  • The trustworthiness and authenticity of data (correct)

Flashcards

Global Data Volume

The total amount of data generated worldwide, measured in zettabytes (ZB). A zettabyte is one trillion gigabytes (10^21 bytes). It's estimated that the world will generate 175 ZB of data in 2025.

Velocity

The speed at which data is generated and processed. Big data is commonly grouped by processing speed into four categories: batch, near real-time, real-time, and streaming.

Variety

The range of data types, including structured, unstructured, and semi-structured data. Structured data fits into rows and columns (like a database table). Unstructured data comes in formats such as images or free text. Semi-structured data has some organization but isn't fully tabular.

Volume

The massive size of data sets, often measured in terabytes (TB), petabytes (PB), and even exabytes (EB).

Veracity

The quality and trustworthiness of data. This includes factors like data origin, authenticity, and accuracy.

Value

The potential value that can be extracted from data. This could include statistical insights, event patterns, correlations, or even hypothetical scenarios.

Data Lake

A centralized repository designed to store all types of data (structured, semi-structured, and unstructured) in its raw form. This allows for long-term analysis and flexibility as you may not know what type of analysis you'll want to do in the future.

HDFS (Hadoop Distributed File System)

A distributed file system that manages large volumes of data spread across multiple servers. It divides data into blocks and replicates them across nodes, so that data is not lost if a server fails.

Study Notes

Big Data

  • Data is crucial for decision-making in all business aspects.
  • In 2025, the world will produce 175 zettabytes of data (1 ZB = 1 trillion gigabytes).
  • In 2010, data production was significantly lower.
  • 2.5 million GB of data is created daily by internet users.
  • About 90% of the world's data has been created in recent years.
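
The scale of a zettabyte is easier to grasp with the arithmetic written out. The sketch below assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the constant and function names are illustrative only.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9    # gigabyte
TB = 10**12   # terabyte
PB = 10**15   # petabyte
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert a whole number of zettabytes to gigabytes."""
    return zb * ZB // GB


print(f"1 ZB   = {zettabytes_to_gigabytes(1):,} GB")
print(f"175 ZB = {zettabytes_to_gigabytes(175):,} GB")
# prints:
# 1 ZB   = 1,000,000,000,000 GB
# 175 ZB = 175,000,000,000,000 GB
```

One zettabyte is therefore a trillion gigabytes, or equivalently a billion terabytes.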

The 5 Vs of Big Data

  • Velocity: Batch, near real-time, real-time, and streaming data
  • Variety: Structured, unstructured, and semi-structured data, including different formats (e.g. tables, files, transactions)
  • Volume: Gigabytes, terabytes, and petabytes of data
  • Veracity: Trustworthiness, authenticity, origin, reputation, accountability in the data
  • Value: Statistical analysis, event identification, correlations, and potential insights from data

Sources of Data

  • Facebook: 500,000 posts per minute
  • Twitter: 500,000 tweets per minute
  • Instagram: 347,222 posts per minute
  • Internet of Things (IoT): 75 million connected devices generating data and sensor readings
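
Per-minute figures are easier to compare with daily totals once they are scaled up. The sketch below converts two of the rates quoted in this lesson (500,000 tweets and 347,222 Instagram posts per minute) to per-day counts; the helper name `per_day` is illustrative, and a constant rate across the whole day is assumed.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes


def per_day(per_minute: int) -> int:
    """Scale a per-minute rate to a per-day total, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


# Rates taken from the lesson; the dictionary labels are illustrative.
rates_per_minute = {
    "tweets": 500_000,
    "Instagram posts": 347_222,
}

for name, rate in rates_per_minute.items():
    print(f"{name}: {per_day(rate):,} per day")
# prints:
# tweets: 720,000,000 per day
# Instagram posts: 499,999,680 per day
```

The Instagram figure works out to roughly 500 million posts per day.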

Storage of Data

  • Less than 20% of global data is stored in relational databases, which remain important for organizations such as banks and hospitals and for customer records.
  • 80% of data is unstructured (text, images, videos) and is stored in big data architectures and NoSQL databases.

Big Data Storage Technologies (Hadoop Distributed File System - HDFS)

  • Divides data into fixed-size blocks (typically 128 MB or 256 MB) and distributes them across multiple servers.
  • Provides redundancy (copies of data) to ensure data safety.
  • Ideal for storing large amounts of unstructured or semi-structured data.
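
The block-and-replica idea can be illustrated with a small simulation. This is a simplified sketch, not HDFS code: the 128 MB block size comes from the notes, the replication factor of 3 is an assumption for this example, and the round-robin placement and node names are hypothetical stand-ins for HDFS's actual placement policy.

```python
BLOCK_SIZE_MB = 128  # block size mentioned in the notes
REPLICATION = 3      # assumed number of copies per block for this sketch


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_copies(placement: dict[int, list[str]], failed_node: str) -> dict[int, int]:
    """Count how many copies of each block remain after one node fails."""
    return {b: sum(n != failed_node for n in ns) for b, ns in placement.items()}


blocks = split_into_blocks(1000)  # a 1,000 MB file
nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
placement = place_replicas(len(blocks), nodes)

print(len(blocks), "blocks, last block:", blocks[-1], "MB")
print(surviving_copies(placement, "node-b"))
```

In this example the 1,000 MB file becomes seven 128 MB blocks plus one 104 MB block, and every block still has at least two copies after any single node fails.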

Data Lakes

  • Centralized repository storing all types of data (structured, semi-structured, and unstructured) in raw format.
  • Used when needing long-term storage and analysis of diverse, raw data.
  • Ideal when the specific analysis type is unknown.
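
One way to picture a data lake is as a store that keeps every record exactly as it arrived and applies structure only when a question is asked. The sketch below is a toy illustration of that idea using only the Python standard library; the `raw_zone` list, the file names, and the `read_orders` helper are all hypothetical.

```python
import csv
import io

# Raw records kept exactly as they arrived: structured (CSV),
# semi-structured (JSON), and unstructured (free text) side by side.
raw_zone = [
    ("orders.csv", "order_id,amount\n1,19.99\n2,5.00\n"),
    ("click.json", '{"user": "u1", "page": "/home"}'),
    ("note.txt", "Customer called about a late delivery."),
]


def read_orders(zone):
    """Apply a schema at read time, touching only the CSV payloads."""
    for name, payload in zone:
        if name.endswith(".csv"):
            for row in csv.DictReader(io.StringIO(payload)):
                yield {"order_id": int(row["order_id"]),
                       "amount": float(row["amount"])}


orders = list(read_orders(raw_zone))
total = sum(order["amount"] for order in orders)
print(f"{len(orders)} orders, total {total:.2f}")
```

The JSON and text records stay untouched in the raw zone, so a later analysis can interpret them in whatever way it needs.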

NoSQL Databases

  • Designed for flexibility, high speed and handling unstructured data.

Description

Explore the transformative nature of big data and its crucial role in decision-making across various business functions. Learn about the 5 Vs of Big Data: Velocity, Variety, Volume, Veracity, and Value, and understand the exponential growth of data generated from different sources. Test your knowledge on how big data influences modern analytics.