Understanding Big Data: Concepts and Evolution

Questions and Answers

Which of the following best describes big data?

  • Small datasets easily processed by traditional techniques
  • A type of database management system
  • Large datasets that cannot be adequately processed using traditional techniques (correct)
  • A statistical tool used for data analysis

Big data only includes structured data, like that found in traditional databases.

False (B)

What are the three 'V's initially used to define big data, as identified by Doug Laney?

Volume, velocity, and variety

The 'V' representing the speed at which data is generated and processed is known as ______.

velocity

Match the following big data 'V' characteristics with their descriptions:

Volume = The amount of data
Velocity = The speed of data processing
Variety = Different data types and sources
Veracity = The trustworthiness of data

Which of the following is an example of a 'Volume' aspect of big data?

The total amount of customer transaction records (D)

Hadoop is a technology that has increased the burden of data storage.

False (B)

What does the term 'Variety' refer to in the context of big data?

Different forms of data

The characteristic of big data that relates to the consistency and accuracy of the data is known as ______.

veracity

What is the significance of 'Velocity' in the context of big data?

The rate at which data is being generated and processed (A)

The main purpose of big data analytics is to complicate decision-making processes for organizations.

False (B)

Name one technology that is often associated with addressing the 'Volume' aspect of big data.

Hadoop

The 5 V's of Big Data are Volume, Velocity, Variety, Veracity and ______.

value

Which of the following is an example of 'variety' in big data?

The different types of data, such as text, audio, and video (D)

According to the information, most companies in the U.S. store less than 1 Terabyte of data.

False (B)

What is the primary purpose of analyzing 'big data' for an organization?

Better decision making

RFID tags, sensors, and smart metering contribute significantly to the ______ aspect of big data.

velocity

Which of the following is NOT typically considered a category of 'Big Data'?

Quantum Data (C)

Recalculating risk portfolios can take days, even with big data technologies.

False (B)

Name one industry sector that utilizes big data technology.

Banking

Using analytics to identify how consumers feel about products is called ______ analysis.

sentiment

Which of the following describes 'Black Box Data'?

Data detailing communications from technical staff (A)

Big data only helps in generating coupons and is not used to detect fraudulent behavior.

False (B)

Name two barriers that are imposed on big data.

Data capture and storage capacity

The power grid data holds information consumed by a node in terms of ______.

base station

Which of the following is a use of business analytics/business intelligence?

Storing (B)

In order to capitalize on big data, one should require infrastructure that only manages structured data.

False (B)

Match the following data unit to the corresponding number of bytes:

Kilobyte (KB) = 1,000 bytes
Megabyte (MB) = 1,000,000 bytes
Gigabyte (GB) = 1,000,000,000 bytes
Terabyte (TB) = 1,000,000,000,000 bytes

What is the difference between Operational Big Data and Analytical Big Data?

Operational big data provides operational capabilities for interactive, real-time workloads. Analytical big data comprises systems such as Massively Parallel Processing (MPP) database systems and MapReduce, which provide analytical capabilities for collective and complex analysis.

The New York Stock Exchange captures 1 ______ of trade information during each trading session.

TB

Flashcards

What is Big Data?

Large datasets that are difficult to process using traditional methods, involving various tools, techniques, and frameworks.

Volume in Big Data

The amount of data collected from various sources like business transactions and social media. Technologies like Hadoop help manage this.

Velocity in Big Data

Describes the speed at which data is generated and processed. Sources such as RFID tags and sensors often require the data to be handled in near real time.

Variety in Big Data

The different forms of data, from structured numeric data to unstructured text documents, email, video, and audio.

Veracity in Big Data

Refers to the trustworthiness and accuracy of data, considering factors like origin, authenticity, and completeness.

Value in Big Data

The potential usefulness and insights that can be derived from data, turning data into something valuable.

Black Box Data

This involves conversations between crew members, alert messages, or any orders passed by technical staff.

Social Media Data

Data from social networking sites like Facebook and Twitter, including information and views posted by millions of people.

Power Grid Data

Information on the power consumed by a particular node with respect to a base station.

Transport Data

Data from various transport sectors including model, capacity, distance, and availability of vehicles.

Importance of Big Data

Enables finding the root cause of failures, generating targeted coupons, recalculating risk, and detecting fraud.

Big Data Technology Requirements

Infrastructure to manage and process structured and unstructured data in real-time, ensuring data privacy and security.

Operational Big Data

Applications that provide operational capabilities for interactive and real-time workloads where data is captured and stored.

Analytical Big Data

Systems that provide analytical capabilities for collective and complex analysis, often using systems like MapReduce.

MapReduce

A method for analyzing data that complements the capabilities of SQL. Systems based on MapReduce can be scaled up from single servers to thousands of machines.

Big Data Technology Users

Banking, government, education, health care, manufacturing, and retail.

Barriers of Big Data

Data capture, storage capacity, searching, sharing, transfer, analysis, and presentation.

Sources of Big Data

Email, enterprise 'dark data,' public sector, commercial activities, and social media.

Business Intelligence

Business Intelligence (BI) is a category of applications and technologies for gathering, storing, accessing, and analyzing data.

Sentiment Analysis

A type of analysis that helps companies understand, better and more quickly, what customers are saying about them and their products.

Study Notes

What is Big Data?

  • Big Data is a collection of large datasets that cannot be adequately processed using traditional processing techniques.
  • Big Data is more than just data; it has evolved into a complete subject that utilizes various tools, techniques, and frameworks.
  • The term "Big Data" refers to the large volume of structured and unstructured data that businesses handle day to day.
  • Analyzing Big Data in depth can lead to better decisions and strategic development for organizations.

The Evolution of Big Data

  • The idea of Big Data emerged in the early 2000s.
  • Doug Laney defined big data using three categories: volume, velocity, and variety.
  • Organizations gather data from various sources like business transactions, social media, and machine-to-machine data.
  • Technologies like Hadoop have alleviated storage concerns.
  • Data now streams in at unprecedented speed and must be dealt with in a timely manner.
  • RFID tags, sensors, and smart metering drive the need to process data in near real time.
  • Big Data comes in many forms, including structured, numeric data in databases and unstructured text documents, emails, videos, audio, stock tickers, and financial transactions.
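
The velocity point above (data that must be handled as it arrives) can be illustrated with a small sketch. It assumes a hypothetical stream of sensor readings and keeps a running average over the most recent few values rather than storing the whole stream. The class name, window size, and sample readings are illustrative only.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `size` readings of a stream."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # oldest reading is dropped automatically
        self.total = 0.0

    def add(self, value):
        # If the window is full, the oldest reading is about to be evicted,
        # so remove it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Hypothetical sensor readings arriving one at a time.
readings = [10, 12, 11, 50, 13]
monitor = SlidingWindowAverage(size=3)
averages = [monitor.add(r) for r in readings]
print(averages)
```

Each reading is processed once, in constant time, which is the property that matters when data arrives faster than it can be reprocessed in bulk.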

The 5 V's of Big Data

  • Volume: the sheer amount of data generated every second.
  • Velocity: the speed at which data is generated and changes.
  • Veracity: the trustworthiness of the data.
  • Variety: the different forms of data.
  • Value: turning big data into something useful.
  • An estimated 40 zettabytes of data were projected to be created by 2020, a 300-fold increase over 2005.
  • An estimated 2.5 quintillion bytes of data are collected every day.
  • Modern cars have around 100 sensors.
  • By 2016, there were 18.9 billion network connections.
  • As of 2011, global healthcare data totaled 150 exabytes.
Volume: Scale of Data

Unit | Value | Size
bit | 0 or 1 | 1/8 of a byte
byte | 8 bits | 1 byte
kilobyte | 1000^1 bytes | 1,000 bytes
megabyte | 1000^2 bytes | 1,000,000 bytes
gigabyte | 1000^3 bytes | 1,000,000,000 bytes
terabyte | 1000^4 bytes | 1,000,000,000,000 bytes
petabyte | 1000^5 bytes | 1,000,000,000,000,000 bytes
exabyte | 1000^6 bytes | 1,000,000,000,000,000,000 bytes
zettabyte | 1000^7 bytes | 1,000,000,000,000,000,000,000 bytes
yottabyte | 1000^8 bytes | 1,000,000,000,000,000,000,000,000 bytes

  • 90% of today's data was created in the last two years.
  • About 2.5 quintillion bytes of data are created every day, reportedly enough to fill 10 million Blu-ray discs.
  • 40 zettabytes (40 trillion gigabytes) of data were projected to be generated by 2020, 300 times more than in 2005. That works out to about 5,200 gigabytes for every person on Earth.
  • Most companies in the US store over 100 terabytes (100,000 gigabytes) of data.
  • The New York Stock Exchange captures 1 TB of trade information in each trading session.
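
The unit table and the figures above can be cross-checked with a short sketch. It uses the decimal (power-of-1000) units from the table. The function names are illustrative, and the population figure is not an input: it is only what the "5,200 gigabytes per person" claim implies.

```python
# Decimal (SI) byte units from the table: each step up is a factor of 1000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(amount, unit):
    """Convert an amount expressed in `unit` to bytes."""
    return amount * 1000 ** UNITS.index(unit)


def convert(amount, from_unit, to_unit):
    """Convert an amount between any two units listed in UNITS."""
    return to_bytes(amount, from_unit) / to_bytes(1, to_unit)


# 40 zettabytes expressed in gigabytes (the notes say 40 trillion).
zb_in_gb = convert(40, "zettabyte", "gigabyte")

# 100 terabytes expressed in gigabytes (the notes say 100,000).
tb_in_gb = convert(100, "terabyte", "gigabyte")

# Population implied by "5,200 gigabytes for every person on Earth".
implied_population = zb_in_gb / 5200

print(zb_in_gb, tb_in_gb, round(implied_population))
```

The implied population comes out at a little under 7.7 billion, so the per-person figure is consistent with the 40-zettabyte projection.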

Variety: Different Forms of Data

  • As of 2014:
  • There were 420 million wearable wireless health monitors.
  • Over 4 billion hours of video were watched on YouTube per month.
  • Over 30 billion pieces of content were shared on Facebook every month.
  • Over 400 million tweets were sent per day.

Veracity: Trustworthiness of Data

  • Covers the data's origin, authenticity, trustworthiness, completeness, and integrity.
  • 1 in 3 business leaders do not trust the information they use to make decisions.
  • Poor data quality costs the US about $3.1 trillion a year.
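
One of the veracity factors listed above, completeness, is easy to measure. The sketch below assumes hypothetical customer records held as dictionaries and reports the fraction that have a value for every required field. The field names and sample data are invented for illustration, and a real data-quality check would cover more than completeness.

```python
def completeness(records, required_fields):
    """Fraction of records with a non-empty value for every required field."""
    if not records:
        return 0.0
    complete = sum(
        all(rec.get(field) not in (None, "") for field in required_fields)
        for rec in records
    )
    return complete / len(records)


# Hypothetical records: two of the four are missing a required value.
records = [
    {"id": 1, "email": "a@example.com", "amount": 20.0},
    {"id": 2, "email": "", "amount": 15.5},
    {"id": 3, "email": "c@example.com", "amount": None},
    {"id": 4, "email": "d@example.com", "amount": 7.25},
]

score = completeness(records, ["id", "email", "amount"])
print(f"{score:.0%} of records are complete")
```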

Categories of Big Data

  • Black Box Data: Captured by aircraft flight recorders ("black boxes"); includes conversations between crew members and communications from technical staff.
  • Social Media Data: Information and views from social networking sites like Facebook and Twitter.
  • Stock Exchange Data: Details about business transactions and share decisions.
  • Power Grid Data: Information on power consumption within a network.
  • Transport Data: Data from transport sectors, including vehicle model, capacity, distance, and availability.
  • Search Engine Data: Large amounts of data retrieved by search engines from various sources.

Importance of Big Data

  • Finding the root cause of failures, issues, and defects in real-time operations.
  • Generating coupons at the point of sale based on the customer's buying habits.
  • Recalculating entire risk portfolios in just minutes.
  • Detecting fraudulent behavior before it affects the organization.
  • Sectors using Big Data technology: banking, government, education, health care, manufacturing, and retail.

How Businesses Utilize Big Data

  • To understand what customers want and which products are moving quickly
  • To meet end-user expectations for customer service
  • To speed up marketing timelines, reduce costs, and build efficient economies of scale

Big Data Technologies

  • Accurate analysis increases efficiency, reduces costs, and lowers business operation risk.
  • Capitalizing on Big Data requires infrastructure that can manage and process large volumes of structured and unstructured data in real time while ensuring privacy and security.
  • Technologies from vendors like Amazon, IBM, and Microsoft can be used to handle Big Data.

Operational Big Data vs. Analytical Big Data

  • Operational Big Data:
    • Includes systems like MongoDB
    • Provides operational capabilities for interactive and real-time workloads
    • Data is generally captured and stored.
    • Capitalizes on new cloud computing architectures
    • Allows massive computations to be run efficiently and at reasonable cost.
  • Analytical Big Data:
    • Includes systems like Massively Parallel Processing (MPP) databases and MapReduce
    • Offers analytical capabilities for complex analysis.
    • MapReduce provides a new method for analyzing data
    • Can be scaled up from single servers to many machines.
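
The MapReduce idea mentioned above can be sketched on a single machine. This is only an illustration of the map, shuffle, and reduce steps using a word count over a few made-up documents. It is not Hadoop's API, and a real deployment would distribute each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input documents.
documents = ["big data big insights", "data in motion", "Big value"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over as many machines as are available, which is what lets the approach scale out.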

Barriers to Big Data

  • Data capture
  • Storage capacity
  • Searching
  • Sharing
  • Transfer
  • Analysis
  • Presentation

Where does Big Data come from?

  • Email
  • Contracts
  • Credit
  • Weather
  • Population
  • Economic
  • Enterprise "Dark" Data
  • Public
  • Social Media
  • Partner, Employee, Customer, Supplier
  • Commercial
  • Sensor
  • Industry
  • Sentiment
  • Monitoring
  • Transactions
  • Network

Business Analytics/Business Intelligence

  • Business Analytics/Business Intelligence (BI) is a broad category of applications, technologies, and processes.
  • Its purpose is gathering, storing, accessing, and analyzing data.
  • It helps business users make better decisions.

Things are getting more complex

  • Many companies are performing new kinds of analytics (sentiment analysis, etc.)
  • This allows them to better and more quickly understand and respond to what customers are saying about them and their products
  • The cloud and appliances are being used as data stores.
  • Advanced analytics are growing in popularity and importance.
  • Sentiment analysis (opinion mining) uses natural language processing, text analysis, and computational linguistics to identify and extract subjective information.
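
As a rough illustration of sentiment analysis, the sketch below scores text against a tiny, hypothetical word list. Production systems rely on much larger lexicons or trained language models; the word sets, function name, and sample reviews here are assumptions made for the example.

```python
import re

# Hypothetical toy lexicon, far smaller than anything used in practice.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}


def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical customer comments.
reviews = [
    "I love this phone, the camera is great",
    "Slow delivery and a broken screen",
    "It arrived on Tuesday",
]
labels = [sentiment(r) for r in reviews]
print(labels)
```

Word counting like this misses negation and sarcasm ("not good" would score as positive), which is one reason real systems use the natural language processing techniques named above.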
