Introduction to Big Data.pptx
Lakehead University
Introduction to Big Data
by Abed Alkhateeb, 2023

Scale of Data

How to store data?

Data Importance

Where does data come from?

Where does data come from? Health Data

Where does data come from?

What to do with it? Cybersecurity

Programming Language Paradigms
- Functional Paradigm
- Logical Paradigm
- Imperative Paradigm
- Object-Oriented Paradigm
- MapReduce Paradigm (Big Data: you are here!)

What makes data big data?
Data becomes big data when it is too large to handle within your system, is generated too fast, and comes in many different formats or types.

Big Data Characteristics
Big data has three V's (dimensions): high Volume, high Velocity, and high Variety.

Other Vs?
Veracity is a big data characteristic related to the consistency, accuracy, quality, and trustworthiness of data. It refers to:
- Bias
- Noise
- Abnormality in data
- Incomplete data, or the presence of errors, outliers, and missing values

Other Vs?
Homework!! List and define the other Vs!!

What is a Big Data Platform?
A solution that packages all the capabilities needed to deal with the Vs, in order to find a solution or create an application.
It generally consists of:
- Servers
- File system
- Storage
- Databases
- Management utilities
- Data visualization
- Business intelligence

Objectives of Big Data (SAPS)
- Scalability: the ability to handle a growing amount of data
- Availability: the ability to guarantee reliable access to data
- Performance: quick access to data, improving overall productivity
- Security: guarding data and keeping it confidential and private

Challenges for Big Data: Storage / Processing
- Reading/writing to disk is slow: use multiple disks for parallel reads
- Hardware failure: keep multiple copies of the data
- How to merge data from different reads: distributed processing, or Hadoop

Apache Hadoop Ecosystem
Hadoop is an open-source software framework written in Java (2005), designed to answer the question: "How can big data be processed at reasonable cost and in reasonable time?"
- Distributed storage / computer clusters
- Distributed processing of very large data sets
Two main components:
- HDFS (Hadoop Distributed File System): provides distributed storage
- MapReduce: the distributed data processing model

Q&A
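The Veracity slide lists incomplete data, errors, outliers, and missing values. As a minimal sketch, assuming numeric readings held in a plain Python list (the function names `is_missing` and `veracity_report` are hypothetical), the code below counts missing values and flags outliers. The slides do not prescribe an outlier method; this sketch swaps in the modified z-score based on the median absolute deviation (MAD), with the commonly used cut-off of 3.5, because the median is distorted less than the mean by the very outliers it is trying to find.

```python
import math
from statistics import median


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def veracity_report(values, threshold=3.5):
    """Count missing values and flag outliers with the modified z-score.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median.
    """
    present = [v for v in values if not is_missing(v)]
    report = {"total": len(values), "missing": len(values) - len(present), "outliers": []}
    if not present:
        return report
    center = median(present)
    mad = median(abs(v - center) for v in present)
    if mad == 0:
        # At least half the values equal the median, so the scale is
        # undefined; this sketch flags nothing in that case.
        return report
    report["outliers"] = [
        v for v in present if abs(0.6745 * (v - center) / mad) > threshold
    ]
    return report


# Hypothetical sensor readings with two missing entries and one suspicious spike.
readings = [20.1, 19.8, None, 20.4, 85.0, 20.0, float("nan"), 19.9]
print(veracity_report(readings))  # {'total': 8, 'missing': 2, 'outliers': [85.0]}
```

This covers only the measurable part of veracity (missing values and outliers); it does not attempt to detect bias or noise.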
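The storage/processing challenges pair slow disk access with parallel reads, then ask how to merge the results of the different reads. A minimal sketch, assuming each "disk" is simulated by an in-memory list of log records (the `DISKS` data and the function names are hypothetical): every partition is read and counted concurrently, and the partial counts are merged into one total.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three disks, each holding part of a log file.
DISKS = [
    ["error", "ok", "ok"],
    ["ok", "warning", "error"],
    ["ok", "ok"],
]


def read_and_count(partition):
    """Count the records in one partition (a real system would read one disk here)."""
    return Counter(partition)


def parallel_count(partitions, workers=3):
    """Read all partitions concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(read_and_count, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


print(parallel_count(DISKS).most_common())  # [('ok', 5), ('error', 2), ('warning', 1)]
```

Threads suit I/O-bound reads; for CPU-heavy work in CPython, `ProcessPoolExecutor` is the usual swap because of the global interpreter lock. A framework such as Hadoop performs the same split, process, and merge cycle across many machines rather than across threads on one machine.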
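The hardware-failure challenge is answered by keeping multiple copies of the data, which is how HDFS provides reliable distributed storage. A simplified sketch, assuming sizes in MB and a round-robin placement policy (real HDFS placement is decided by the NameNode and is rack-aware; the helper names here are hypothetical): a file is split into fixed-size blocks, each block is copied to `replication` distinct nodes, and the code checks whether every block survives a given set of node failures.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size units is split into."""
    count = (file_size + block_size - 1) // block_size  # integer ceiling division
    sizes = [block_size] * count
    if count and file_size % block_size:
        sizes[-1] = file_size % block_size  # the last block holds the remainder
    return sizes


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def survives_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


blocks = split_into_blocks(300, 128)  # [128, 128, 44]
nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), nodes, replication=3)
print(survives_failure(placement, {"node1", "node2"}))           # True
print(survives_failure(placement, {"node1", "node2", "node3"}))  # False
```

With the commonly used HDFS defaults of 128 MB blocks and a replication factor of 3, a 1 GB (1024 MB) file becomes 8 blocks stored as 24 block replicas, using about 3 GB of raw disk: the price of availability. With four nodes and replication 3, losing any two nodes still leaves a copy of every block, while losing three can make some blocks unavailable.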
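MapReduce appears both on the paradigms slide and as Hadoop's distributed processing model; its map and reduce steps borrow from the functional paradigm. The sketch below simulates the three phases of the classic word-count job (map, shuffle, reduce) in a single Python process. It illustrates the model only: it is not Hadoop's Java API, and the function names are hypothetical. In Hadoop, many map and reduce tasks run in parallel on cluster nodes over HDFS blocks.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map(map_phase, lines))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data is big", "data moves fast"]))
# {'big': 2, 'data': 2, 'is': 1, 'moves': 1, 'fast': 1}
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be spread across machines independently; only the shuffle needs to move data between them.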