Full Transcript

Lakehead University
Introduction to Big Data
by Abed Alkhateeb, 2023

Scale of Data

How to store data?

Data Importance

Where data comes from?

Where data comes from? Health Data

Where data comes from?

What to do with it? Cybersecurity

Programming language paradigms
- Functional Paradigm
- Logical Paradigm
- Imperative Paradigm
- Object Oriented Paradigm
- MapReduce Paradigm (You are here!)

Big Data
- What makes data big data?
- Data is big data if it is too large to handle within your system, it is generated too fast, and it comes in many different formats or types.

Big Data Characteristics
Big Data has three V's (dimensions): High Volume, High Velocity, and High Variety.

Other Vs?
Veracity is a big data characteristic related to consistency, accuracy, quality, and trustworthiness. It refers to:
- Bias
- Noise
- Abnormality in data
- Incomplete data or the presence of errors, outliers, and missing values

Other Vs?
Homework: list and define other Vs!

What is a Big Data Platform?
A solution that packages all the capabilities needed to deal with the Vs, in order to find a solution or create an application.
It generally consists of:
- Servers
- File system
- Storage
- Databases
- Management utilities
- Data visualization
- Business intelligence

Objective of Big Data (SAPS)
- Scalability
- Availability
- Performance
- Security

Objective of Big Data (SAPS)
- Scalability: the ability to handle a growing amount of data
- Availability: the ability to guarantee reliable access to data
- Performance: quick access and improved overall productivity
- Security: guarding data and keeping it confidential and private

Challenges for Big Data
- Storage / Processing
- Reading from and writing to disk is slow
- Use multiple disks for parallel reads
- Hardware failure
- Keep multiple copies of data
- How to merge data from different reads
- Distributed processing or Hadoop

Apache Hadoop Ecosystem
- Hadoop is an open-source software framework written in Java, 2005, designed to answer the question: "How to process big data with reasonable cost and time?"
- Distributed storage / computer clusters
- Distributed processing of very large data sets
- Two main components:
  - HDFS (Hadoop Distributed File System): provides distributed storage
  - MapReduce (Distributed Data Processing Model)

References
Fan, J., Han, F., & Liu, H. (2014). Challenges of big data analysis. National Science Review, 1(2), 293-314.
Chandan Gaur, Big Data Platform, https://www.xenonstack.com/blog/big-data-platform, last accessed Sep 6, 2023.
Ghasemaghaei, M. (2021). Understanding the impact of big data on firm performance: The necessity of conceptually differentiating among big data characteristics. International Journal of Information Management, 57, 102055.

Q&A
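
Supplementary sketch: the slides name MapReduce as Hadoop's distributed data processing model and ask how to merge data from different reads. The following is a minimal, single-machine illustration of that idea in plain Python, not Hadoop's actual API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made up for this example. Each chunk stands in for a block of data read from a different disk or node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: merge the values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map over every chunk, then merge the partial results."""
    pairs = []
    for chunk in chunks:
        # On a real cluster each chunk could be mapped on a different node;
        # here the chunks are simply processed one after another.
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


# Hypothetical input: two chunks standing in for two separate reads.
chunks = ["big data is big", "data is fast"]
counts = word_count(chunks)
print(counts)
```

The map step works on each chunk independently, which is what allows reads to happen in parallel. The shuffle and reduce steps are where the results of the separate reads are merged.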
