Big Data Analytics

Questions and Answers

Which of the following is NOT a typical outcome or benefit of applying Big Data Analytics?

  • Creation of new products and services
  • Increased data storage limitations (correct)
  • Cost Reduction
  • Faster, better decision making

Big Data is easily processed and analyzed using traditional database systems.

False (B)

What term describes the characteristic of big data that refers to the speed at which data is generated?

Velocity

The characteristic of Big Data that refers to the different forms of data, such as structured, unstructured, and semi-structured, is known as ______.

Variety

Which of the following is a core characteristic, or 'V', of Big Data?

Veracity (D)

The 'Value' of big data refers solely to the size of the dataset.

False (B)

What is the primary goal of any big data analytics system concerning the data it processes?

To extract value

Ensuring data is cleansed and accurate to remove noise relates to the Big Data characteristic of ______.

Veracity

Which of the following best describes data analytics?

The process of exploring and analyzing datasets to find hidden patterns and correlations (A)

Algorithms are NOT an integral part of data analytics.

False (B)

Name one specific action performed on data as part of data analytics.

Filtering, Categorizing, Processing, Condensing, or Contextualizing

The use of data analytics to determine what will happen is called ______ analytics.

Predictive

Which type of analytics answers the question 'What is the solution?'?

Prescriptive Analytics (D)

Basic Statistics is NOT considered a computational task in massive data analysis.

False (B)

Name one specific area where Big Data applications are prevalent.

Healthcare, Retail, Environment, Logistics, Industry, Agriculture, Internet of Things, Energy Systems, Homes, or Cities

In the financial sector, Big Data is used in areas such as Credit Risk Modeling and ______ Detection.

Fraud

Which of the following is an application of Big Data in healthcare?

Epidemiological Surveillance (C)

Smart Parking is NOT an example of Big Data applications in the Internet of Things.

False (B)

What type of monitoring is aided by Big Data in environmental applications?

Air Pollution Monitoring, Weather Monitoring, Noise Pollution Monitoring, Water Quality Monitoring, or Forest Fire Detection

The use of Big Data to plan and manage the efficient flow and storage of goods is known as ______.

Logistics

Which of the following is a standard step in the Analytics Flow for Big Data?

Data Collection (B)

Analysis Modes DO NOT directly follow Analysis Types in the Analytics Flow for Big Data.

False (B)

After data collection, what is the subsequent step in the Analytics Flow for Big Data?

Data Preparation

The final step in the general Analytics Flow for Big Data is typically ______.

Visualizations

What is the purpose of the Big Data Stack?

To handle and process large volumes of data using various software and technologies (A)

Big Data stack is the activity performed on the infrastructure.

False (B)

Name one type of raw data source used in a Big Data Stack.

Logs, Databases, Social Media, Transactional Data, Sensor Data, Clickstream Data, Surveillance Data, Healthcare Data, or Network Data

In a Big Data Stack, tools and frameworks for collecting and ingesting data from various sources are known as Data ______ Connectors.

Access

Which of the following is a type of NoSQL database often used in a Big Data Stack for data storage?

HBase (D)

Hadoop-MapReduce is related to real-time analytics.

False (B)

Name one framework used for real-time analytics.

Storm or Spark Streaming

A framework for distributed and fault-tolerant real-time computation is Apache ______.

Storm

Which system allows users to query data using SQL-like statements in a Big Data environment?

Spark SQL (A)

MySQL is an example of a non-relational database.

False (B)

Name one Visualization tool used with Big Data.

Lightning, Pygal, or Seaborn

In the context of Big Data Analytics, the data access connectors collect and ingest data into the big data ______ and analytics frameworks.

Storage

Match the following Analytic Types with their descriptions:

Descriptive = What has happened?

According to the weather data analysis case study, what is one common application of analytics?

Analyzing Streaming Sensor Data (B)

The weather data analysis case study includes the use of data preparation.

True (A)

List the first step in the Analytics Flow for the weather data analysis application.

Data Collection

In weather data analysis, ______ and interactive visualizations are necessary to view data.

Dynamic

Flashcards

What is Big Data?

Collections of datasets so large that it is difficult to store, manage, process and analyze the data using traditional databases and data processing tools.

What is Data Analytics?

A broad term that encompasses the processes, technologies, frameworks and algorithms to extract meaningful insights from data.

What is Volume in Big Data?

The characteristic of big data relating to the sheer size of the data.

What is Velocity in Big Data?

The characteristic of big data relating to the speed at which the data is generated.

What is Variety in Big Data?

The characteristic of big data relating to the different forms of data.

What is Value in Big Data?

The characteristic of big data relating to the usefulness of data for the intended purpose.

What is Veracity in Big Data?

The characteristic of big data relating to how accurate the data is.

What is Descriptive Analytics?

A type of analytics that asks what has happened.

What is Diagnostic Analytics?

A type of analytics that asks why something happened.

What is Predictive Analytics?

A type of analytics that asks what will happen.

What is Prescriptive Analytics?

A type of analytics that asks what the solution is.

What are Logs (Big Data Stack)?

Records of events that happen within the application or server environment.

What is Transactional Data (Big Data Stack)?

Data generated by applications such as eCommerce, banking, and financial systems.

What is Social Media Data (Big Data Stack)?

Data from social media platforms, including posts, comments, and shares.

What are Databases (Big Data Stack)?

Structured data residing in relational databases.

What is Sensor Data (Big Data Stack)?

Data generated by Internet of Things (IoT) systems.

What is Clickstream Data (Big Data Stack)?

Data generated by web applications which can be used to analyze browsing patterns of the users.

What is Surveillance Data (Big Data Stack)?

Sensor, image and video data generated by surveillance systems.

What is Healthcare Data (Big Data Stack)?

Data generated by Electronic Health Record (EHR) and other healthcare applications.

What is Network Data (Big Data Stack)?

Data generated by network devices such as routers.

Data Access Connectors (Big Data Stack)

Tools and frameworks for collecting and ingesting data from various sources into the big data storage and analytics frameworks.

What is Publish-Subscribe Messaging?

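A messaging pattern in which senders (publishers) publish messages to topics rather than to specific receivers, and subscribers receive the messages for the topics they have subscribed to.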

What is NoSQL?

Non-relational databases

What is Hadoop?

An open-source software framework designed for distributed storage and processing of large datasets.

Batch Analytics (Big Data Stack)

Includes various frameworks which allow analysis of data in batches.

Real-time Analytics (Big Data Stack)

Includes the Apache Storm and Spark Streaming frameworks.

Interactive Querying (Big Data Stack)

Allows users to query data by writing statements in SQL-like languages.

Hive

An Apache project that provides an SQL-like query language, called Hive Query Language (HiveQL), for querying data residing in HDFS.

Databases, Web & Visualization Frameworks

Serving data for visualization.

Study Notes

  • Big Data Analytics is a field focused on extracting meaningful insights from large and complex datasets.
  • Cost reduction, faster/better decision-making, and new product/service development can be achieved through Big Data Analytics.

Learning outcomes

  • Data Analytics
  • Big Data
  • Big Data Characteristics
  • Types of Analytics

Big Data Analytics Textbook & Reference Books

  • "Big Data Analytics: A Hands-On Approach" by Arshdeep Bahga and Vijay Madisetti
  • "Big Data Fundamentals: Concepts, Drivers & Techniques" by Thomas Erl
  • "Hadoop: The Definitive Guide" by Tom White, O'Reilly Media, Inc., 2012
  • "Learning Spark: Lightning-Fast Big Data Analysis" by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia, O'Reilly Media, Inc., 2015
  • "MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems" by Donald Miner and Adam Shook, O'Reilly Media, Inc., 2012

Big Data

  • Big Data refers to collections of datasets whose volume, velocity, or variety is so large that they are difficult to store, manage, process, and analyze with traditional databases and data processing tools.
  • Exponential growth in structured and unstructured data occurs due to information technology, industrial applications, healthcare, and the Internet of Things.
  • By a widely cited estimate, roughly 2.5 quintillion bytes of data are created every day.

Characteristics of Big Data

  • Volume: Big data is data whose scale is too large for a single machine, so it requires specialized tools and frameworks for storage, processing, and analysis.
  • Social media platforms process billions of messages daily.
  • Industrial and energy systems can generate terabytes of sensor data each day.
  • Cab aggregation apps might process millions of transactions daily.
  • Velocity: Velocity of data refers to the speed at which data is generated.
  • Data generated at high velocity accumulates into a high volume within a short span of time.
  • Some applications, such as trading platforms, require real-time analysis with strict deadlines.
  • Variety: Variety refers to the different forms of data such as structured, semi-structured, and unstructured, including text, image, audio, video, and sensor data.
  • Value: Value of data refers to the usefulness of data for the intended purpose.
  • Veracity: Veracity refers to how accurate the data is.
  • Data needs to be cleaned of noise to extract value, and faulty data must be filtered out.
  • Data-driven applications require meaningful and accurate data to reap the benefits of big data.
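
The veracity point above (cleaning noise and filtering faulty data) can be illustrated with a minimal sketch. The readings, the `clean_readings` helper, and the 10-unit deviation threshold are all hypothetical choices made for this example, not part of the lesson; real pipelines would pick cleaning rules that suit the data.

```python
from statistics import median


def clean_readings(readings, max_deviation=10.0):
    """Drop missing values and readings that sit far from the median.

    max_deviation is an assumed threshold chosen for this toy example.
    """
    present = [r for r in readings if r is not None]
    if not present:
        return []
    mid = median(present)
    return [r for r in present if abs(r - mid) <= max_deviation]


# Hypothetical temperature readings: one missing value and two faulty spikes.
raw = [21.5, 22.0, None, 21.8, 999.0, 22.3, -40.0]
cleaned = clean_readings(raw)
print(cleaned)  # [21.5, 22.0, 21.8, 22.3]
```

The median is used rather than the mean so that the faulty spikes themselves do not distort the reference value.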

Data Analytics

  • Analytics is a broad term encompassing the processes, technologies, frameworks, and algorithms used to extract meaningful insights from data.
  • Data analysis is achieved through filtering, processing, categorizing, condensing, and contextualizing data.
  • Data analytics is the process of exploring and analyzing large data sets to identify hidden patterns, unseen trends, and valuable correlations/insights.

Types of Analytics

  • Descriptive Analytics: Focuses on what has happened by looking at data.
  • Diagnostic Analytics: Focuses on why something happened.
  • Predictive Analytics: Focuses on what will happen in the future.
  • Prescriptive Analytics: Focuses on what the solution is.
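
As a rough illustration of how the four types differ, the sketch below answers each question on the same toy dataset. The sales figures, promotion flags, stock level, and the choice of a straight-line trend for the forecast are assumptions made for this example only.

```python
from statistics import mean

# Hypothetical daily sales and whether a promotion ran that day.
sales = [100, 110, 150, 130, 170]
promo = [False, False, True, False, True]

# Descriptive: what has happened?
average = mean(sales)

# Diagnostic: why did it happen? Compare promotion days with normal days.
promo_avg = mean(s for s, p in zip(sales, promo) if p)
normal_avg = mean(s for s, p in zip(sales, promo) if not p)
uplift = promo_avg - normal_avg

# Predictive: what will happen? Fit a least-squares line and extrapolate one day.
xs = list(range(len(sales)))
x_bar, y_bar = mean(xs), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
forecast = slope * len(sales) + intercept

# Prescriptive: what is the solution? Order enough stock to cover the forecast.
stock_on_hand = 150
units_to_order = max(0, forecast - stock_on_hand)

print(average, round(uplift, 2), forecast, units_to_order)
```

Each step builds on the previous one: the prescriptive decision depends on the forecast, which in turn depends on the historical data summarized by the descriptive step.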

Computational Tasks

  • The National Research Council has characterized computational tasks for massive data analysis, known as the "seven giants".
  • The seven computational tasks are:
  • Basic Statistics
  • Generalized N-Body Problems
  • Linear Algebraic Computations
  • Graph-Theoretic Computations
  • Optimization
  • Integration
  • Alignment Problems
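
To give a feel for the first task, Basic Statistics, the sketch below computes a mean and sample variance in a single pass, which matters when the data is too large to hold in memory. It uses Welford's online algorithm, a standard technique that is swapped in here for illustration and is not named in the lesson.

```python
def running_stats(stream):
    """Return (count, mean, sample variance) using Welford's one-pass update."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Any iterable works, including a generator that reads records lazily.
count, avg, var = running_stats(iter([2, 4, 4, 4, 5, 5, 7, 9]))
print(count, avg, var)
```

Because only three running values are kept, memory use stays constant no matter how many values pass through.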

Domain specific examples of Big Data

  • Homes
  • Cities
  • Environment
  • Energy Systems
  • Retail
  • Logistics
  • Industry
  • Agriculture
  • Internet of Things
  • Healthcare

Industry specific analytics

  • Web:
  • Web Analytics
  • Performance Monitoring
  • Ad Targeting & Analytics
  • Content Recommendation
  • Financial:
  • Credit Risk Modeling
  • Fraud Detection
  • Healthcare:
  • Epidemiological Surveillance
  • Patient Similarity-based Decision Intelligence Application
  • Adverse Drug Events Prediction
  • Detecting Claim Anomalies
  • Real-time health monitoring
  • Internet of Things:
  • Intrusion Detection
  • Smart Parking
  • Smart Roads
  • Structural Health Monitoring
  • Smart Irrigation
  • Environment:
  • Weather Monitoring
  • Air Pollution Monitoring
  • Noise Pollution Monitoring
  • Forest Fire Detection
  • River Floods Detection
  • Water Quality Monitoring
  • Logistics and Transportation:
  • Real-time Fleet Tracking
  • Shipment Monitoring
  • Remote Vehicle Diagnostics
  • Route Generation and Scheduling
  • Hyper-local Delivery
  • Cab/Taxi Aggregators
  • Industry:
  • Machine Diagnosis and Prognosis
  • Risk Analysis of Industrial Operations
  • Production Planning and Control
  • Retail:
  • Inventory Management
  • Customer Recommendations
  • Store Layout Optimization
  • Forecasting Demand

Analytics flow

  • Data Collection.
  • Data Preparation.
  • Analysis Types.
  • Analysis Modes.
  • Visualizations.
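
The flow above can be sketched as a chain of small functions. This is a minimal illustration under assumed data: the station records, the function names, and the text-based bar chart are hypothetical, and the analysis step shown is a descriptive analysis run in batch mode.

```python
from collections import defaultdict
from statistics import mean


def collect():
    """Data Collection: hypothetical raw (station, temperature-as-text) records."""
    return [("A", "21.5"), ("A", "bad"), ("B", "18.0"), ("A", "22.5"), ("B", "19.0")]


def prepare(records):
    """Data Preparation: parse values and drop records that cannot be parsed."""
    cleaned = []
    for station, value in records:
        try:
            cleaned.append((station, float(value)))
        except ValueError:
            continue
    return cleaned


def analyze(records):
    """Analysis: average temperature per station (descriptive, batch mode)."""
    by_station = defaultdict(list)
    for station, temp in records:
        by_station[station].append(temp)
    return {station: mean(temps) for station, temps in by_station.items()}


def visualize(averages):
    """Visualization: a plain-text bar per station."""
    return "\n".join(
        f"{station} {'#' * round(avg)} {avg:.1f}"
        for station, avg in sorted(averages.items())
    )


report = visualize(analyze(prepare(collect())))
print(report)
```

Keeping each stage as a separate function mirrors the way each stage is later mapped to a different tool in the stack.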

Big Data Stack

  • A Big Data stack refers to the collection of software and technologies used to handle and process large volumes of data.
  • Big data stacks are designed to manage the challenges associated with storing, processing, and analyzing massive datasets.
  • Big data analytics is the process of extracting insights from large, complex datasets; the big data stack is the set of technologies and tools that makes this possible.
  • In short, the stack is the infrastructure, and analytics is the activity performed on it.

Elements of the Big Data Stack

  • Raw Data Sources consist of Logs, Transactional Data, Social Media, Databases, Sensor Data, Clickstream Data, Surveillance Data, Healthcare Data and Network Data.
  • Logs are records of events within an application or server environment and are commonly used for performance monitoring.
  • Transactional data is generated by eCommerce, Banking, and Financial applications.
  • Social Media data is generated by social media platforms, while databases hold structured data in relational formats.
  • Sensor Data is generated by Internet of Things (IoT) systems.
  • Clickstream Data is generated by web applications and can be used to analyze users' browsing patterns.
  • Surveillance Data consists of sensor, image and video data generated by surveillance systems.
  • Healthcare Data is generated by Electronic Health Record (EHR) and other healthcare applications.
  • Network Data is generated by network devices such as routers.
  • Data Access Connectors consist of Publish-Subscribe Messaging, Source-Sink Connectors, Database Connectors, Messaging Queues and Custom Connectors.
  • These connectors are tools and frameworks for collecting and ingesting data from various sources into big data storage and analytics frameworks.
  • Data Storage consists of Non-relational (NoSQL) databases and the Hadoop Distributed File System (HDFS).
  • Hadoop is an open-source software framework designed for distributed storage and processing of large datasets, enabling efficient analysis by splitting workloads.
  • Batch Analytics frameworks include Hadoop-MapReduce, Pig, Oozie, Spark, Solr and Machine Learning.
  • These frameworks allow analysis of data in batches.
  • Real-time Analytics tools are Apache Storm and Spark Streaming.
  • Apache Storm is a framework for distributed and fault-tolerant real-time computation.
  • Storm handles data from publish-subscribe messaging frameworks (Kafka or Kinesis), messaging queues (RabbitMQ or ZeroMQ), and custom connectors.
  • Spark Streaming is a component of Spark for analyzing streaming data like sensor data, clickstreams, and web server logs.
  • The streaming data is ingested and analyzed in micro-batches leading to scalable and high throughput stream processing.
  • Interactive Querying systems let users query data with SQL-like statements.
  • Spark SQL and Hive can be used to query structured and semi-structured data using SQL-like queries, and both work with data stored in Apache Hadoop.
  • Amazon Redshift handles queries on datasets of up to a petabyte or more by parallelizing the SQL queries.
  • Google BigQuery queries datasets using SQL-like queries.
  • Serving Databases, Web & Visualization Frameworks consist of:
  • Databases include MySQL, Amazon DynamoDB, Cassandra and MongoDB.
  • Visualization frameworks include Lightning, Pygal and Seaborn.
  • Analytics results are stored in these serving databases for subsequent presentation and visualization tasks.
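
The micro-batch idea described for Spark Streaming above can be sketched in plain Python. This is a simplified illustration and not Spark's API: the `micro_batches` helper and the log events are hypothetical, and batches here are cut by record count, whereas Spark Streaming cuts them by time interval.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size items from a stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical web server log events arriving as a stream.
events = ["GET /", "GET /cart", "GET /", "POST /buy", "GET /", "GET /cart", "GET /"]

totals = Counter()
for batch in micro_batches(events, batch_size=3):
    totals.update(batch)  # analyze each small batch, then fold it into running totals

print(totals.most_common(1))  # [('GET /', 4)]
```

Processing small batches instead of single records trades a little latency for higher throughput, which is the design choice the notes attribute to Spark Streaming.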

Analytics mapping

  • Once the analytics flow for a big data application has been chosen, the next step is to map each stage of that flow to specific tools and frameworks in the big data stack.
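
One way to picture such a mapping is as a simple lookup table. The sketch below is only an illustration: the tool names come from the stack described in these notes, but which tool serves which stage is an assumption made for this example, and the `unmapped_stages` helper is hypothetical.

```python
# Stages of an analytics flow, in order.
ANALYTICS_FLOW = [
    "Data Collection",
    "Data Preparation",
    "Analysis",
    "Visualization",
]

# Assumed stage-to-tool choices; a real application would choose based on its needs.
STACK_MAPPING = {
    "Data Collection": ["Kafka"],
    "Data Preparation": ["Spark"],
    "Analysis": ["Spark", "Spark Streaming", "Spark SQL"],
    "Visualization": ["Cassandra", "Seaborn"],
}


def unmapped_stages(flow, mapping):
    """Return the stages that do not yet have any tool assigned."""
    return [stage for stage in flow if not mapping.get(stage)]


print(unmapped_stages(ANALYTICS_FLOW, STACK_MAPPING))  # []
```

A check like `unmapped_stages` makes it easy to spot a stage of the flow that has been planned but not yet assigned to any part of the stack.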

Analytics flow case study

  • The Weather Data Analysis case study walks through the stages of the analytics flow:
  • Data Collection (collection processes)
  • Data Preparation
  • Analysis
  • Analysis modes: Batch, Interactive and Real-time Analytics
