Big Data Analytics Lecture Notes PDF
Document Details
Uploaded by ForemostLimeTree
Beni-Suef University
Dr. Mohamed Moustafa
Summary
These lecture notes give an introductory overview of Big Data Analytics. They cover class rules, course assessment, the factors that characterize big data (volume, velocity, variety, variability, and complexity), the levels and methods of analytics, a glossary of related terms, and common big data tools. They are relevant reading for students of data science, computer science, or related fields.
Full Transcript
10/14/2022
Big Data Analytics
Dr. Mohamed Moustafa
Associate Professor, Faculty of Computers and AI, Beni-Suef University
MIS Consultant, ICTP, Ministry of Higher Education

Class Rules
  You can do anything except make noise (chatting, singing…).
  Feel free to interrupt me if you have questions.
  According to university policy, attendance will be taken.
  Important: you are required to have 80% attendance to sit for the final exam.

Course Assessment
Provisional, depending on the situation:
  Final exam: 50%
  Assignment: 20%, done individually
  Project: 30%, 2-3 members per group; a report and a presentation are required.
  Important: cheating and plagiarism will receive no marks.

A few suggestions…
  Your final grade is based on points, not on an accumulation of grades.
  You start the class with zero points and earn your way to your final grade.
  If you have an issue or problem, communicate: send me an email.
  If you know you’re not going to meet the deadline for a quiz or assignment, email me BEFORE the deadline.

What is Big Data?

Data Deluge

Consequences of the Data Deluge
  Every problem generates data eventually.
  Every company needs analytics eventually.
  Everyone needs analytics eventually.

Big Data
Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away.

Big Data: What Is It?
The point at which the volume, velocity, and variety of data exceed an organization’s storage or computation capacity for accurate and timely decision making.

Factors associated with big data:
  data volume
  data velocity
  data variety
  data variability
  data complexity

Data Volume
Data volumes are increasing due to use of the following:
  social media (Facebook, Twitter, Instagram)
  machines talking to machines
  improvements in the manufacturing process (quality control)
  automated tracking devices
  streaming data feeds

Data Velocity
  business processes that are more automated
  mergers and acquisitions
  more use of social media
  more use of self-service applications
  integration of business applications

Data Variety
  structured data
  unstructured data
  business applications
  unstructured text documents (articles, blogs, and so on)
  emails
  digital images
  video and audio clips
  streaming data
  stock ticker data
  RFID tag data
  sensor data

Data Variability
  The flow of data changes over time (seasonality, peak response, social media trends, and so on).
  Data values change over time. How much history do you keep?
  Data values are different across data sources.
  Data is stored in different formats.
  Data standards change across time. What was “valid” five years ago might not be “valid” today.

Data Complexity
Data comes from a variety of systems in a variety of formats. This can make it difficult to merge, cleanse, and transform data in a uniform manner.

What Is Analytics?
The importance of big data doesn’t revolve around how much data you have, but what you do with it.
Analytics is the scientific process of transforming data into insight for making better decisions, offering new opportunities for a competitive advantage.
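
The Data Complexity slide above notes that data arriving from different systems in different formats is hard to merge, cleanse, and transform uniformly. The following is a minimal sketch of that idea using only the Python standard library. The sample records, field names, and function names are hypothetical and invented for illustration; real sources would need far more validation.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical sample data: the same kind of record arriving from two systems
# in two formats, with different field names and date conventions.
CSV_SOURCE = """customer,amount,date
Alice,10.50,14/10/2022
bob ,7,15/10/2022
"""

JSON_SOURCE = '[{"name": "Carol", "total": "3.25", "day": "2022-10-16"}]'


def from_csv(text):
    """Yield records from the CSV system, converted to a common schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer": row["customer"].strip().title(),  # cleanse: trim, fix case
            "amount": float(row["amount"]),               # transform: text -> number
            "date": datetime.strptime(row["date"], "%d/%m/%Y").date(),
        }


def from_json(text):
    """Yield records from the JSON system, converted to the same schema."""
    for item in json.loads(text):
        yield {
            "customer": item["name"].strip().title(),
            "amount": float(item["total"]),
            "date": datetime.strptime(item["day"], "%Y-%m-%d").date(),
        }


def merge_sources():
    """Merge both sources into one list, ordered by date."""
    records = list(from_csv(CSV_SOURCE)) + list(from_json(JSON_SOURCE))
    return sorted(records, key=lambda record: record["date"])


if __name__ == "__main__":
    for record in merge_sources():
        print(record)
```

Each source gets its own small adapter that maps to one shared schema, so adding a third format would only require a third adapter rather than changes to the merge step.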
Levels of Analytics

Data Science
  domain experience
  advanced analytics
  software engineering
  support the end-to-end analysis of large and diverse data sets
  value communication to stakeholders as actionable results

Analytic Methods
  Prescriptive model: what to do, by providing information about optimal decisions based on the predicted future scenarios
  Predictive model: the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data
  Descriptive model: helps you understand what happened, or diagnostic models that help you understand key relationships and determine why something happened
  Types:
    classification -> predict class membership
    regression -> predict a number
  Techniques: decision trees | linear/logistic regression | neural networks | gradient boosting | random forests | support vector machines

Glossary of Terms
  Statistics: numeric study of data relationships
  Data Mining: … in data, understand what is relevant, assess outcomes, accelerate informed decisions
  Machine Learning: trains a machine how to learn with minimal human intervention
  Data Analysis: find meaningful patterns and knowledge in data
  Predictive Analysis: identify the likelihood of future outcomes based on historical data
  Prescriptive Analysis: providing information about optimal decisions based on the predicted future scenarios
  Artificial Intelligence: machines learn from experience, adjust to new inputs and perform human-like tasks
  Natural Language Processing: enables understanding, interaction, and communication between humans and machines
  Computer Vision: analyzes/interprets a picture or video
  Optimization: delivers the best results given resource constraints
  Deep Learning: trains a machine to perform human-like tasks

Reasons for the Big Data Explosion
  Increasing “data velocity” due to the following:
    streaming data feeds
    point-of-sale (POS) transactional systems
    radio-frequency identification (RFID) tags
    smart metering
  bigger and cheaper data storage capabilities
  social media
  improved and automated business processes
  mergers and acquisitions, leading to the merging of multiple data sources
  more online self-service applications being used

Factors Driving Demand for Big Data Solutions
In addition to rapidly increasing data growth rates, consider these factors:
  availability of data from social media sources
  demand for mobile business intelligence
  increasing requirements around real-time reporting
  desire to mine data from social media sources (sentiment analysis)

Data Science
  Areas: Data Systems, Business Intelligence, Machine Learning, Business Analytics, Business Acumen, Math or Statistics
  Data Scientist: deep in one or two areas
  Data Science Team: all areas covered in depth

Big Data Tools

SAS
  No. 1 market leader in analytics.
  The largest independent vendor in the business intelligence market.
  The industry standard for clinical data analysis.
  Integrated platform for end-to-end solutions.
  SAS provides an integrated set of software products and services and integrated technologies for information management, advanced analytics, and reporting.
  Business solutions across domains and industries.
  Unmatched domain-specific, industry-focused analytics solutions.

R
  R is a language and environment for statistical computing and graphics.
  R provides a wide variety of statistical and graphical techniques, and is highly extensible.

Hadoop
  Hadoop is the most popular big data ecosystem.
  Hadoop is highly scalable: it is designed to accommodate computation ranging from a single server to a cluster of thousands of machines.
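
Hadoop's scalability is commonly associated with the MapReduce programming model, in which input is split into chunks that are mapped independently, grouped by key, and then reduced. The following is a single-process Python sketch of that model, not Hadoop's API: the function names are hypothetical, and on a real cluster each chunk could be processed on a different machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for each word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is mapped independently of the others; that independence is
    # what would let a cluster spread the map work across many machines.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    chunks = ["big data needs analytics", "Big Data tools", "data data data"]
    print(word_count(chunks))
```

Because no map call depends on another chunk, the same logic works whether there is one chunk on one server or thousands of chunks on thousands of machines, which is the scaling range the slide describes.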
Python
  Python is an interpreted, high-level, general-purpose programming language.
  One of the most popular programming languages in recent years.
  Ten areas that use Python most frequently:
    Web Development
    Game Development
    Machine Learning and Artificial Intelligence
    Data Science and Data Visualization
    Desktop GUI
    Web Scraping Applications
    Business Applications
    Audio and Video Applications
    CAD Applications
    Embedded Applications

Tableau
  Tableau is a data visualization tool that is widely used for business intelligence.
  Create interactive graphs and charts in the form of dashboards and worksheets to gain business insights.
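
To connect the Python slide with the Analytic Methods slide earlier in these notes, here is a small sketch contrasting a descriptive summary (what happened) with a predictive regression (predict a number). It uses only the standard library. The sales figures and function names are hypothetical, invented for illustration, and a straight-line fit is just one simple choice of predictive technique.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration only.
months = [1, 2, 3, 4, 5, 6]
sales = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]


def describe(values):
    """Descriptive: summarize what happened in the historical data."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def predict(x, slope, intercept):
    """Predictive (regression): estimate a number for an unseen input."""
    return slope * x + intercept


if __name__ == "__main__":
    print("Descriptive summary:", describe(sales))
    slope, intercept = fit_line(months, sales)
    print("Predicted sales for month 7:", predict(7, slope, intercept))
```

A prescriptive step would go one further and use such predictions to recommend a decision, for example how much stock to order for month 7.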