Questions and Answers
What is the name of the course according to the slide?
BIG DATA ANALYTICS (CS 0654)
Who is the instructor of the course?
Dr. Sana Fakhfakh
Which of these are NOT included in the introductory section of the big data analytics course?
What are the 5 V's of Big Data?
The volume of video data at Lahore Safe City Authority Control Room is an example of unstructured data.
Which of the 5 V's of big data refers to the speed of data accumulation?
What is the name of the process used to collect information from a group of people?
What type of learning involves training a model with a set of labelled data?
Unsupervised Learning utilizes data with labelled outputs.
What is the name of the analytical technique that focuses on assigning predefined labels to each object?
What is the name of the analytical technique involving finding a function to predict a continuous output?
Study Notes
Big Data Analytics (CS 0654)
- Master of Data Science course offered by Dr. Sana Fakhfakh at Prince Sattam Bin Abdulaziz University.
- Course focuses on introducing big data analytics.
Introduction to Big Data Analytics
Big Data Generation and Growth
- Data is generated at an explosive rate, with organizations collecting trillions of bytes daily about customers, suppliers, and operations.
- Large data pools are captured, communicated, aggregated, stored, and analyzed by businesses, academia, and governments.
- Social media use fuels multimedia data growth.
- Internet users collectively spent 2.8 million years online in 2018.
- Social media accounts for 33% of total online time.
- In 2019, there were over 2.3 billion active Facebook users, and Twitter users sent nearly half a million tweets per minute.
- By 2020, each person was projected to generate 1.7 megabytes every second, resulting in 40 trillion gigabytes (40 zettabytes) of data.
- 90% of all data was created in the last two years.
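The unit conversion behind the 40-zettabyte figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, which the notes do not state explicitly; the variable names are illustrative.

```python
# Decimal (SI) unit sizes in bytes. This is an assumption: the notes do not
# say whether decimal or binary units are meant.
GIGABYTE = 10**9
ZETTABYTE = 10**21

total_gigabytes = 40 * 10**12             # 40 trillion gigabytes
total_bytes = total_gigabytes * GIGABYTE  # convert to bytes
total_zettabytes = total_bytes // ZETTABYTE

print(total_zettabytes)  # 40
```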
What is Big Data
- Datasets too large for typical database software to capture, store, manage, and analyze.
- Definition varies by industry and available software tools, often ranging from dozens of terabytes to petabytes.
- The size threshold for what counts as big data rises as technology advances.
Importance of Big Data Analytics
- Organizations use data to discover new opportunities, shape smarter business decisions, implement efficient operations, maximize revenue/profits, and retain satisfied customers.
- The three most valued benefits are cost reduction, faster and better decision-making, and new products and services.
Industries Benefiting from Big Data Analytics
- Retail: Advertising, targeted marketing, recommendation systems, customer loyalty, inventory management, demand prediction.
- Banking and Finance: Customer loyalty and churn, fraud detection, risk assessment.
- Brands: 66% of brands use data analytics to plan product and service launches and their timing.
- Logistics and Transportation: Fleet management, maintenance needs, driver risk assessment, real-time tracking.
- Health Care: Efficiency in healthcare operations, predictive analytics, outbreak prediction, immunization strategy.
- Government and Utility Companies: Surveys & census, development planning, health, education, energy supply & demand management.
- Example (health care): A Google AI system can detect breast cancer.
Sources of data: (people, machines, organizations)
- Machine-generated data: Temperature sensors, GPS, satellite imagery, apps, IoT, flight data (sensors, temperature, pressure, accelerometer, turbulence), smart city/transportation video data.
- Human-generated data: Blogs, social media posts, keywords, pictures, emails, ratings, reviews, Facebook data, Twitter data for sentiment analysis.
- Organization-generated data: Highly structured data such as LUMS student data, TCS shipment tracking data, governments' open data, stock records, and bank, e-commerce, and medical records. Example uses: optimizing routes and scheduling, Walmart sales and social media/event analysis, demand estimation, fraud detection.
Aspects of Bigness (The 5 V's of big data):
- Volume: Huge amount of data. Challenges include acquisition, storage, retrieval, and processing time.
- Velocity: High speed of data accumulation. Challenges include making quick decisions, real-time processing vs batch processing.
- Variety: Different data formats. Challenges include handling heterogeneous formats, the need for sophisticated analytics, and interpretation.
- Veracity: Quality of the data. Issues include biases, inconsistencies, incomplete/duplicate records, volatility, trustworthiness, and reliability.
- Value: Data turned into meaningful information for the company, meeting strategic objectives, and amplifying other technological innovations.
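The real-time versus batch distinction mentioned under Velocity can be illustrated with a small sketch. It assumes a hypothetical list of sensor readings and compares a mean computed once over the full dataset with a running mean updated as each value arrives; the function names are illustrative, not from the course material.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch processing: wait for the full dataset, then compute once."""
    return sum(values) / len(values)


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Stream processing: update the answer as each value arrives."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update, no history kept
        yield mean


readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sensor readings
batch_result = batch_mean(readings)
stream_results = list(running_mean(readings))

print(batch_result)    # 22.0
print(stream_results)  # [20.0, 21.0, 21.0, 22.0]
```

The streaming version yields a usable estimate after every reading and never stores past values, which is the property that matters when data accumulates faster than it can be stored and reprocessed.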
Types of Data (table, text, multimedia, stream, sequence, graphs):
- Relational data, Text Data, Multimedia data, Time Series, Data Streams, Graphs and Homogeneous Networks, Graphs and Heterogeneous Networks
The Analytics Process (preprocessing, analytics, visualization)
- Business Objective: Identifying the reason for the analytics effort (e.g., lowering production costs, increasing sales, building a favorable brand image).
- Data Collection: Identifying sources and relevance of data, ensuring sufficient instances and relevant variables, and retrieving data from various sources (RDBMS, .txt files, web services, RSS, tweets, experiments, synthetic data generation, surveys).
- Data Preparation: Making data ready for analytics. This covers exploratory analysis to describe, summarize, and visualize the data, followed by pre-processing to improve quality: cleaning, transforming, standardizing, and normalizing.
- Data Analysis: Applying analytics techniques (supervised/unsupervised learning, graph analytics).
- Report and Deployment: Communicating findings and drawing conclusions so the organization can benefit from them.
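The standardizing and normalizing steps named under Data Preparation can be sketched as follows. This assumes the common convention that "standardize" means z-scoring (zero mean, unit standard deviation) and "normalize" means min-max scaling to [0, 1]; the notes do not define the terms, and the `ages` column is hypothetical.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: subtract the mean and divide by the population std deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical numeric column
normalized = min_max_normalize(ages)
standardized = standardize(ages)

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardized])
```

Both transforms put variables measured on different scales onto a comparable footing, which matters for distance-based methods such as clustering.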
Data Analytics Tasks and Methods
- Descriptive Analytics: Uncover patterns, correlations, and trends that describe the data.
- Predictive Analytics: Predict the value of an attribute based on values of other attributes, including classification (nominal target attributes) and regression (numeric target attributes).
- Common methods: clustering, outlier detection, classification, regression, association analysis, recommendation, community detection, and centrality.
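The difference between regression (numeric target) and classification (nominal target) can be made concrete with a small sketch. It uses ordinary least squares for a one-variable line and a one-nearest-neighbor rule for labels; these particular techniques and all data values are illustrative assumptions, not methods prescribed by the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_neighbor_label(x, examples):
    """Return the label of the labelled example whose feature is closest to x."""
    _, label = min(examples, key=lambda ex: abs(ex[0] - x))
    return label


# Regression: the target is numeric (hypothetical size -> price data).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 5.0 + intercept

# Classification: the target is a predefined nominal label.
labelled = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
predicted_label = nearest_neighbor_label(7.0, labelled)

print(predicted_price)  # 11.0
print(predicted_label)  # large
```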
Machine Learning for Data Analytics
- Supervised Learning: Using labeled data to learn to predict target variables. Tasks: classification, regression.
- Unsupervised Learning: Using statistical properties of data to cluster/discover patterns without specific labels. Tasks: clustering, outlier detection, dimensionality reduction, density modeling.
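To show what learning without labels looks like, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on one-dimensional data. The choice of k-means, the starting centroids, and the data points are assumptions made for illustration; the notes name clustering only in general terms.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data: assign points, then recompute centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# No labels are supplied; the groups emerge from the data alone.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])

print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

In contrast to supervised learning, nothing here tells the algorithm which group a point belongs to; the only inputs are the points and the number of clusters.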