Questions and Answers
Which of the following is NOT a typical outcome or benefit of applying Big Data Analytics?
- Creation of new products and services
- Increased data storage limitations (correct)
- Cost Reduction
- Faster, better decision making
Big Data is easily processed and analyzed using traditional database systems.
False (B)
What term describes the characteristic of big data that refers to the speed at which data is generated?
Velocity
The characteristic of Big Data that refers to the different forms of data, such as structured, unstructured, and semi-structured, is known as ______.
Which of the following is a core characteristic, or 'V', of Big Data?
The 'Value' of big data refers solely to the size of the dataset.
What is the primary goal of any big data analytics system concerning the data it processes?
Ensuring data is cleansed and accurate to remove noise relates to the Big Data characteristic of ______.
Which of the following best describes data analytics?
Algorithms are NOT an integral part of data analytics.
Name one specific action performed on data as part of data analytics.
The use of data analytics to determine what will happen is called ______ analytics.
Which type of analytics answers the question 'What is the solution?'?
Basic Statistics is NOT considered a computational task in massive data analysis.
Name one specific area where Big Data applications are prevalent.
In the financial sector, Big Data is used in areas such as Credit Risk Modeling and ______ Detection.
Which of the following is an application of Big Data in healthcare?
Smart Parking is NOT an example of Big Data applications in the Internet of Things.
What type of monitoring is aided by Big Data in environmental applications?
The use of Big Data to plan and manage the efficient flow and storage of goods is known as ______.
Which of the following is a standard step in the Analytics Flow for Big Data?
Analysis Modes DO NOT directly follow Analysis Types in the Analytics Flow for Big Data.
After data collection, what is the subsequent step in the Analytics Flow for Big Data?
The final step in the general Analytics Flow for Big Data is typically ______.
What is the purpose of the Big Data Stack?
Big Data stack is the activity performed on the infrastructure.
Name one type of raw data source used in a Big Data Stack.
In a Big Data Stack, tools and frameworks for collecting and ingesting data from various sources are known as Data [Blank] Connectors.
Which of the following is a type of NoSQL database often used in a Big Data Stack for data storage?
Hadoop-MapReduce is related to real-time analytics.
Name one framework used for real-time analytics.
A framework for distributed and fault-tolerant real-time computation is Apache ______.
Which system allows users to query data using SQL-like statements in a Big Data environment?
MySQL is an example of a non-relational database.
Name one Visualization tool used with Big Data.
In the context of Big Data Analytics, the data access connectors collect and ingest data into the big data ______ and analytics frameworks.
Match the following Analytic Types with their descriptions:
According to the weather data analysis case study, what is one common application of analytics?
The weather data analysis case study includes the use of data preparation.
List the first step in the Analytics Flow for the Weather data analysis application
In weather data analysis, ______ and interactive visualizations are necessary to view data.
Flashcards
What is Big Data?
Collections of datasets so large that it is difficult to store, manage, process and analyze the data using traditional databases and data processing tools.
What is Data Analytics?
A broad term that encompasses the processes, technologies, frameworks and algorithms to extract meaningful insights from data.
What is Volume in Big Data?
The characteristic of big data relating to the sheer size of the data.
What is Velocity in Big Data?
What is Variety in Big Data?
What is Value in Big Data?
What is Veracity in Big Data?
What is Descriptive Analytics?
What is Diagnostic Analytics?
What is Predictive Analytics?
What is Prescriptive Analytics?
What are Logs (Big Data Stack)?
What is Transactional Data (Big Data Stack)?
What is Social Media Data (Big Data Stack)?
What are Databases (Big Data Stack)?
What is Sensor Data (Big Data Stack)?
What is Clickstream Data (Big Data Stack)?
What is Surveillance Data (Big Data Stack)?
What is Healthcare Data (Big Data Stack)?
What is Network Data (Big Data Stack)?
Data Access Connectors (big data stack)
What is Publish-Subscribe Messaging?
What is NoSQL?
What is Hadoop?
Batch Analytics (Big Data Stack)
Real-time Analytics (Big Data Stack)
Interactive Querying (Big Data Stack)
Hive
Databases, Web & Visualization Frameworks
Study Notes
- Big Data Analytics is a field focused on extracting meaningful insights from large and complex datasets.
- Cost reduction, faster/better decision-making, and new product/service development can be achieved through Big Data Analytics.
Learning outcomes
- Data Analytics
- Big Data
- Big Data Characteristics
- Types of Analytics
Big Data Analytics Textbook & Reference Books
- "Big Data Analytics: A Hands-On Approach" by Arshdeep Bahga
- "Big Data Fundamentals Concepts, Drivers & Techniques" by Thomas Erl
- "Hadoop: The definitive guide" by Tom White from O'Reilly Media, Inc., 2012
- "Learning Spark: lightning-fast big data analysis" by Karau, Holden, Konwinski, Wendell, and Zaharia from O'Reilly Media, Inc., 2015
- "MapReduce design patterns: building effective algorithms and analytics for Hadoop and other systems" by Miner and Shook from O'Reilly Media, Inc., 2012.
Big Data
- Big Data refers to collections of datasets whose volume, velocity, or variety is so large that they are difficult to store, manage, process, and analyze with traditional databases and data processing tools.
- Exponential growth in structured and unstructured data occurs due to information technology, industrial applications, healthcare, and the Internet of Things.
- By a widely cited IBM estimate, approximately 2.5 quintillion bytes of data are created every day.
Characteristics of Big Data
- Volume: Big data is data whose scale is too large for a single machine, so special tools and frameworks are needed to store, process, and analyze it.
  - Social media platforms process billions of messages daily.
  - Industrial and energy systems can generate terabytes of sensor data each day.
  - Cab aggregation apps might process millions of transactions daily.
- Velocity: The speed at which data is generated.
  - High velocity results in high volume after a short span of time.
  - Some applications, such as trading platforms, require real-time analysis with strict deadlines.
- Variety: The different forms of data, such as structured, semi-structured, and unstructured, including text, image, audio, video, and sensor data.
- Value: The usefulness of the data for its intended purpose.
- Veracity: The accuracy of the data.
  - Data needs to be cleaned of noise to extract value, and faulty data must be filtered out.
  - Data-driven applications require meaningful and accurate data to reap the benefits of big data.
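The veracity point above (filter out faulty data, then reduce noise) can be sketched in a few lines of Python. This is a minimal illustration, not a method from the notes: the sensor readings, the plausible range of -50 to 60, and the function names `filter_faulty_readings` and `smooth` are all assumptions made for the example.

```python
from statistics import median

def filter_faulty_readings(readings, low=-50.0, high=60.0):
    """Drop missing values and readings outside a plausible range (range is assumed)."""
    return [r for r in readings if r is not None and low <= r <= high]

def smooth(readings, window=3):
    """Reduce noise with a sliding-window median; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(readings)):
        start = max(0, i - half)
        end = min(len(readings), i + half + 1)
        smoothed.append(median(readings[start:end]))
    return smoothed

# Made-up temperature readings with a missing value, two faulty values and one spike.
raw = [21.5, 21.7, None, 999.0, 21.6, 35.0, 21.8, -120.0, 21.9]
clean = filter_faulty_readings(raw)
print(clean)          # faulty and missing readings removed
print(smooth(clean))  # the 35.0 spike is damped by the median
```

The range check removes values that cannot be real measurements, while the median filter damps isolated spikes that pass the range check but are still noise.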
Data Analytics
- Analytics is a broad term encompassing the processes, technologies, frameworks, and algorithms used to extract meaningful insights.
- Data analysis is achieved through filtering, processing, categorizing, condensing, and contextualizing data.
- Data analytics is the process of exploring and analyzing large data sets to identify hidden patterns, unseen trends, and valuable correlations/insights.
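As a rough illustration of the actions listed above (filtering, categorizing, condensing, and contextualizing), the following Python sketch applies each one to a handful of made-up transaction records. The records, the 100.0 threshold, and the helper name `size_label` are assumptions for the example only.

```python
from collections import Counter, defaultdict

# Made-up transaction records.
transactions = [
    {"user": "a", "amount": 12.0, "status": "ok"},
    {"user": "b", "amount": 250.0, "status": "ok"},
    {"user": "a", "amount": 40.0, "status": "failed"},
    {"user": "c", "amount": 75.0, "status": "ok"},
    {"user": "b", "amount": 30.0, "status": "ok"},
]

# Filtering: keep only successful transactions.
ok = [t for t in transactions if t["status"] == "ok"]

# Categorizing: label each transaction by size (the threshold is arbitrary).
def size_label(amount, threshold=100.0):
    return "large" if amount >= threshold else "small"

labels = Counter(size_label(t["amount"]) for t in ok)

# Condensing: reduce many records to one total per user.
totals = defaultdict(float)
for t in ok:
    totals[t["user"]] += t["amount"]

# Contextualizing: express each user's total as a share of overall spend.
grand_total = sum(totals.values())
shares = {user: round(total / grand_total, 3) for user, total in totals.items()}

print(dict(labels), dict(totals), shares)
```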
Types of Analytics
- Descriptive Analytics: Focuses on what has happened by looking at data.
- Diagnostic Analytics: Focuses on why something happened.
- Predictive Analytics: Focuses on what will happen in the future.
- Prescriptive Analytics: Focuses on what the solution is.
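The four types can be contrasted on one small dataset. The sketch below is only illustrative: the daily sales figures, the promotion flags, the least-squares trend line used for prediction, and the 10% safety margin in `recommend_stock` are all assumptions, not techniques prescribed by the notes.

```python
from statistics import mean

# Made-up daily unit sales and whether a promotion ran that day.
sales = [100, 104, 98, 130, 110, 112, 140, 118]
promo = [False, False, False, True, False, False, True, False]

# Descriptive: what has happened?
avg_sales = mean(sales)

# Diagnostic: why did it happen? Compare promotion days with ordinary days.
promo_avg = mean(s for s, p in zip(sales, promo) if p)
normal_avg = mean(s for s, p in zip(sales, promo) if not p)

# Predictive: what will happen? Fit a least-squares trend line and extrapolate.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept  # predicted sales for the next day

# Prescriptive: what is the solution? Recommend stock with an assumed safety margin.
def recommend_stock(predicted, safety_margin=0.1):
    return round(predicted * (1 + safety_margin))

print(avg_sales, promo_avg, normal_avg, round(forecast, 1), recommend_stock(forecast))
```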
Computational Tasks
- The National Research Council has characterized seven computational tasks for massive data analysis, known as the "seven giants":
  - Basic Statistics
  - Generalized N-Body Problems
  - Linear Algebraic Computations
  - Graph-Theoretic Computations
  - Optimization
  - Integration
  - Alignment Problems
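To make the first task, Basic Statistics, concrete: when a dataset is too large to hold in memory, the mean and variance can be computed in a single pass. The sketch below uses Welford's online algorithm, chosen here purely as an illustration; the class name `RunningStats` is hypothetical and the notes do not name a specific algorithm.

```python
class RunningStats:
    """One-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; defined as 0.0 here when fewer than two values were seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for a stream too large to store
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

Each value is seen once and then discarded, so memory use stays constant no matter how many values arrive.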
Domain specific examples of Big Data
- Homes
- Cities
- Environment
- Energy Systems
- Retail
- Logistics
- Industry
- Agriculture
- Internet of Things
- Healthcare
Industry specific analytics
- Web:
  - Web Analytics
  - Performance Monitoring
  - Ad Targeting & Analytics
  - Content Recommendation
- Financial:
  - Credit Risk Modeling
  - Fraud Detection
- Healthcare:
  - Epidemiological Surveillance
  - Patient Similarity-based Decision Intelligence Application
  - Adverse Drug Events Prediction
  - Detecting Claim Anomalies
  - Real-time health monitoring
- Internet of Things:
  - Intrusion Detection
  - Smart Parking
  - Smart Roads
  - Structural Health Monitoring
  - Smart Irrigation
- Environment:
  - Weather Monitoring
  - Air Pollution Monitoring
  - Noise Pollution Monitoring
  - Forest Fire Detection
  - River Floods Detection
  - Water Quality Monitoring
- Logistics and Transportation:
  - Real-time Fleet Tracking
  - Shipment Monitoring
  - Remote Vehicle Diagnostics
  - Route Generation and Scheduling
  - Hyper-local Delivery
  - Cab/Taxi Aggregators
- Industry:
  - Machine Diagnosis and Prognosis
  - Risk Analysis of Industrial Operations
  - Production Planning and Control
- Retail:
  - Inventory Management
  - Customer Recommendations
  - Store Layout Optimization
  - Forecasting Demand
Analytics flow
- Data Collection.
- Data Preparation.
- Analysis Types.
- Analysis Modes.
- Visualizations.
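The flow above can be read as a chain of functions, each handing its output to the next. The following Python sketch is a toy version under stated assumptions: the hard-coded records in `collect`, the choice of basic statistics as the analysis type, batch as the analysis mode, and the plain-text output in `visualize` are all stand-ins for real sources and tools.

```python
def collect():
    """Data collection: stand-in for reading from a real source."""
    return ["12.5", "13.1", "", "n/a", "14.0", "12.9"]

def prepare(raw):
    """Data preparation: drop records that cannot be parsed as numbers."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item))
        except ValueError:
            continue
    return cleaned

def analyze(values):
    """Analysis type: basic statistics. Analysis mode: batch, over all values at once."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

def visualize(summary):
    """Visualization: a plain-text rendering in place of a chart."""
    return "\n".join(f"{name:>5}: {value:.2f}" for name, value in summary.items())

summary = analyze(prepare(collect()))
print(visualize(summary))
```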
Big Data Stack
- A Big Data stack is the collection of software, technologies, and tools used to store, manage, process, and analyze massive datasets.
- Big data analytics is the process of extracting insights from large, complex datasets.
- The stack is the infrastructure; analytics is the activity performed on it.
Elements of the Big Data Stack
- Raw Data Sources consist of Logs, Transactional Data, Social Media, Databases, Sensor Data, Clickstream Data, Surveillance Data, Healthcare Data, and Network Data.
  - Logs are records of events within an application or server environment, kept for performance monitoring.
  - Transactional Data is generated by eCommerce, banking, and financial applications.
  - Social Media Data is generated by social media platforms.
  - Databases hold structured data in relational formats.
  - Sensor Data is generated by Internet of Things (IoT) systems.
  - Clickstream Data is generated by web applications and is used to analyze browsing patterns.
  - Healthcare Data is generated by Electronic Health Record (EHR) systems and other healthcare applications.
  - Network Data is generated by network devices such as routers.
- Data Access Connectors consist of Publish-Subscribe Messaging, Source-Sink Connectors, Database Connectors, Messaging Queues, and Custom Connectors.
  - These connectors are tools and frameworks for collecting and ingesting data from various sources into big data storage and analytics frameworks.
- Data Storage consists of non-relational (NoSQL) databases and the Hadoop Distributed File System (HDFS).
  - Hadoop is an open-source software framework for distributed storage and processing of large datasets; it enables efficient analysis by splitting workloads.
- Batch Analytics allows analysis of data in batches. Tools include Hadoop-MapReduce, Pig, Oozie, Spark, Solr, and Machine Learning.
- Real-time Analytics tools are Apache Storm and Spark Streaming.
  - Apache Storm is a framework for distributed and fault-tolerant real-time computation.
  - Storm handles data from publish-subscribe messaging frameworks (Kafka or Kinesis), messaging queues (RabbitMQ or ZeroMQ), and custom connectors.
  - Spark Streaming is a component of Spark for analyzing streaming data such as sensor data, clickstreams, and web server logs.
  - The streaming data is ingested and analyzed in micro-batches, which gives scalable, high-throughput stream processing.
- Interactive Querying systems include Spark SQL, Hive, Amazon Redshift, and Google BigQuery.
  - Spark SQL and Hive can be used to query structured and semi-structured data using SQL-like queries, and both work with Apache Hadoop.
  - Amazon Redshift handles queries on datasets of up to a petabyte or more by parallelizing the SQL queries.
  - Google BigQuery queries datasets using SQL-like queries.
- Serving Databases, Web & Visualization Frameworks:
  - Serving databases include MySQL, Amazon DynamoDB, Cassandra, and MongoDB.
  - Visualization frameworks include Lightning, Pygal, and Seaborn.
  - Analytics results are stored in the serving databases for subsequent presentation and visualization tasks.
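The micro-batch idea mentioned for Spark Streaming can be illustrated without Spark itself. The sketch below is plain Python and deliberately simplified: it groups a stream by record count, whereas Spark Streaming groups records by a time interval. The log lines and the helper names `micro_batches` and `count_status_codes` are hypothetical.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield successive small lists from a stream (count-based for simplicity)."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def count_status_codes(batch):
    """Per-batch analysis: count HTTP status codes in web server log lines."""
    counts = {}
    for line in batch:
        status = line.split()[-1]
        counts[status] = counts.get(status, 0) + 1
    return counts

# Made-up web server log lines standing in for a live stream.
log_stream = [
    "GET /home 200",
    "GET /cart 200",
    "POST /pay 500",
    "GET /home 200",
    "GET /missing 404",
]

results = [count_status_codes(batch) for batch in micro_batches(log_stream, batch_size=2)]
print(results)
```

Each batch is analyzed as soon as it is complete, so results arrive continuously while each unit of work stays small.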
Analytics mapping
- Once the analytics flow for a big data application has been chosen, the next step is to map it to specific tools and frameworks in the big data stack.
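One simple way to record such a mapping is a lookup table from each step of the analytics flow to the chosen tools, with a check that no step is left unmapped. The sketch below is hypothetical: the tool names come from the stack described in these notes, but the particular pairing of steps and tools is an illustrative assumption, and `unmapped_steps` is a helper invented for the example.

```python
ANALYTICS_FLOW = [
    "data collection",
    "data preparation",
    "analysis types",
    "analysis modes",
    "visualizations",
]

# Hypothetical mapping; which tool serves which step is an illustrative choice.
stack_mapping = {
    "data collection": ["Kafka"],
    "data preparation": ["Spark"],
    "analysis types": ["Spark"],
    "analysis modes": [
        "Spark (batch)",
        "Spark Streaming (real-time)",
        "Spark SQL (interactive)",
    ],
    "visualizations": ["MySQL", "Seaborn"],
}

def unmapped_steps(flow, mapping):
    """Return the steps of the flow that have no tool assigned yet."""
    return [step for step in flow if not mapping.get(step)]

print(unmapped_steps(ANALYTICS_FLOW, stack_mapping))
```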
Analytics flow case study
- Weather Data Analysis uses the following stages of the analytics flow:
  - Data Collection
  - Data Preparation
  - Analysis
  - Analysis Modes: Batch, Interactive, and Real-time Analytics