Big Data Analysis Lecture Notes PDF

Document Details

Uploaded by LuxuryIntellect7556

Helwan University

Dr. Mona Abbass

Tags

big data analysis, big data, data analysis, technology

Summary

These notes provide an overview of big data analysis, covering topics including course description, content, characteristics, and challenges. The document details the various aspects of big data and its analytical processes, including definitions, examples, and potential hurdles in an educational context.

Full Transcript

Big Data Analysis
Dr. Mona Abbass

Content
❑ Course Description
❑ Introduction to Big Data
▪ What is big data?
▪ Characteristics of Big Data
▪ Big data challenges

Course Description
❑ The aim of the course:
▪ To provide students with theoretical and practical skills related to big data analysis.

Course Content
1. Basic concepts in big data
2. Cloud computing
3. Introduction to big data analytics
4. Introduction to Hadoop technology
5. MapReduce
6. Revision
7. Final exam

Grading (100%):
❑ Final exam 70
❑ Mid-term exam 10
❑ Practice exam 10
❑ Course work 10

Timing:
❑ Lecture 3
❑ Practice 3

What is big data?
❑ The notion of Big data arose from advances in database technologies and from the need for solutions to handle the huge deluge of datasets and the resulting lack of sufficient storage capacity.
❑ The notion of Big data has evolved over the past decades, each of which can be described in terms of computer disk space, from the Megabyte (MB) in the 1970s to the Exabyte (EB), which was introduced in 2011.
❑ Big data is a term for a collection of data sets so large and complex that it often becomes difficult to process using traditional data processing applications.
❑ Big data is large amounts of different types of data produced from various types of sources, such as
▪ People,
▪ Machines or
▪ Sensors.

The Big Data Framework organization categorizes the development of Big data into three main phases:
❑ Phase 1.0: Big data was mainly described in terms of data storage and analytics, and it was an extension of modern database management systems and data warehousing technologies.
❑ Phase 2.0: With the rise of Web 2.0 and the spread of semi-structured and unstructured content, the notion of Big data evolved to embody advanced technical solutions for extracting meaningful information from dissimilar and heterogeneous data formats.
❑ Phase 3.0: With the emergence of smartphones and mobile devices, sensor data, the Internet of Things (IoT) and many more data generators, Big data has entered a new era and opened a new horizon with a new range of opportunities.

Characteristics of Big Data
❑ Big data has been amply characterized by the well-known 3Vs (Volume, Velocity and Variety).
❑ The following are the 10Vs of Big data:
▪ Volume
▪ Velocity
▪ Variety
▪ Veracity
▪ Variability
▪ Validity
▪ Vulnerability
▪ Volatility
▪ Visualization
▪ Value

Volume
❑ Refers to the vast growth in the amount of data.
❑ In fact, more than 2.5 quintillion (2.5 × 10^18) bytes have been created daily since as early as 2013, from every post, share, search, click, stream and many more data producers.

Velocity
❑ Represents the accumulation of data at high speed, in near real time and real time, from dissimilar data sources.

Variety (Format)
❑ Involves collecting data from various sources and in fuzzy and heterogeneous types.
❑ This includes importing data in dissimilar formats, namely
▪ Structured (tables residing in relational databases (RDBMS), etc.),
▪ Semi-structured (email, XML and other markup languages, etc.) and
▪ Unstructured (text, pictures, audio files, video, sensor data, etc.).

Veracity
❑ Refers to the accuracy and correctness of data.
❑ Multiple factors contribute to the veracity of Big data:
▪ Trustworthiness of the data origin;
▪ Reliability and security of the data store;
▪ Data availability;
▪ Correctness and
▪ Consistency.

Variability
❑ Refers to variance in meaning, the number of inconsistencies, the multitude of data dimensions and inconsistent data arrival speeds.

Validity
❑ Refers to whether the "data are shown (or known) to be an accurate indicator of the claim being made".
❑ It differs from veracity in that validity concerns the correctness and accuracy of data with regard to the intended usage.
❑ In other words, data can be trustworthy and thus satisfy the veracity aspect, yet poor interpretation of the data might lead to unintended use. Moreover, the same veracious data can be valid for one application and invalid for a different one.

Vulnerability
❑ Refers to the security of the collected datasets that will be used for later analysis.
❑ It also denotes the flaws in the system that permit malicious activities to be conducted on the collected datasets.

Volatility
❑ Refers to the time up to which data is valid to be stored or used before it becomes obsolete or no longer relevant.
❑ It is a crucial dimension, since the cost of storage and maintenance increases with longer Big data retention.

Visualization
❑ Refers to the ability to present Big data in a visual context, such as diagrams, graphs, maps, etc., for better understanding and interpretation of the data.
❑ It also helps people and organizations discover patterns, correlations, trends, relationships and dependencies.
❑ Big data visualization is a powerful tool for decision makers to access, evaluate and interpret massive data, even in real time, and to act upon it.

Value
❑ Represents the outcome of Big data analysis (i.e. new insights).

Big data V-features

Big data Challenges
❑ Storing and processing issue
❑ Privacy and Security
❑ Data access and sharing
❑ Analytical challenges
❑ Skills requirements
❑ Technical Issues

Storing and processing issue
❑ Data is growing much faster than existing processing systems can handle.
❑ Current storage systems are not capable of storing these data.
❑ There is a need to develop a processing system that caters not only to today's needs but also to future needs.

Privacy and Security
❑ New devices and technologies such as cloud computing provide a gateway to access and store information for analysis.
❑ This integration of IT architectures will pose greater risks to data security and intellectual property.
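
To give a sense of scale for the storing and processing issue above, the 2.5 quintillion bytes per day quoted under Volume can be converted into more familiar units. The following is only a rough sketch: it assumes decimal (SI) units, and the 1 TB drive size is a hypothetical value chosen for illustration, not a figure from the lecture.

```python
# Daily data volume quoted under "Volume": 2.5 quintillion bytes.
# Stored as an exact integer (2.5 * 10^18).
DAILY_BYTES = 25 * 10**17

# Decimal (SI) unit sizes in bytes.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

# Assumption for illustration only: one storage drive holds 1 TB.
DRIVE_BYTES = UNITS["TB"]


def to_unit(num_bytes: int, unit: str) -> float:
    """Convert a byte count to the given SI unit."""
    return num_bytes / UNITS[unit]


def drives_needed(num_bytes: int, drive_bytes: int = DRIVE_BYTES) -> int:
    """Number of drives needed to hold num_bytes (ceiling division)."""
    return -(-num_bytes // drive_bytes)


if __name__ == "__main__":
    print(to_unit(DAILY_BYTES, "EB"), "EB per day")
    print(drives_needed(DAILY_BYTES), "drives of 1 TB per day")
```

Under these assumptions the daily volume is 2.5 EB, which would fill 2,500,000 such drives every day.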
Data access and sharing
❑ Generally, data is used for making accurate decisions.
❑ The data should be available in an accurate, complete and timely manner.

Analytical challenges
❑ Traditional RDBMS are suitable only for structured data.
❑ What if the data volume gets so large that we do not know how to deal with it?
❑ Does all data need to be stored?
❑ Does all data need to be analyzed?
❑ Which data points are important?
❑ How can data be used to best advantage?

Skills requirements
❑ With the increase in the amount of (structured and unstructured) data generated, there is a need for talent.
❑ The demand for people with good analytical skills in big data is increasing.

Technical Issues
❑ Fault Tolerance
❑ Scalability
❑ Quality of Data
❑ Heterogeneous Data

Fault Tolerance
❑ A system's ability to continue operating uninterrupted despite the failure of one or more of its components.
❑ Fault-tolerant systems use backup components that automatically take the place of failed components, ensuring no loss of service.

Scalability
❑ The property of a system to handle a growing amount of work by adding resources to the system.
❑ Vertical Scalability (Scale-Up): the power of the existing resources in the working environment is increased (growth in an upward direction).
❑ Horizontal Scalability (Scale-Out): more resources are added alongside the existing ones (growth in a horizontal row).

Quality of Data

Heterogeneous Data

Questions
1. What are the types (formats) of Big data?
2. Mention some of the Big data features.
3. Mention some of the Big data challenges.

Thanks
Dr. Mona Abbass
E-mail [email protected]
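
As an illustration of the Fault Tolerance and Horizontal Scalability ideas described under Technical Issues, the following toy sketch distributes record keys across a set of nodes, lets new nodes be added (scale-out), and falls back to the next live node when one fails. The `Cluster` class, its methods and the node names are hypothetical and not part of the lecture; real systems such as Hadoop use considerably more elaborate mechanisms.

```python
import zlib


class Cluster:
    """Toy cluster: keys are hash-partitioned across named nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.failed = set()

    def add_node(self, name):
        # Horizontal scaling: add one more machine to the pool.
        self.nodes.append(name)

    def mark_failed(self, name):
        # Record that a node is down so that requests avoid it.
        self.failed.add(name)

    def node_for(self, key):
        # A stable hash gives the same starting node for the same key.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        # Fault tolerance: walk forward until a live node is found.
        for offset in range(len(self.nodes)):
            candidate = self.nodes[(start + offset) % len(self.nodes)]
            if candidate not in self.failed:
                return candidate
        raise RuntimeError("no live nodes available")


if __name__ == "__main__":
    cluster = Cluster(["node-1", "node-2", "node-3"])
    primary = cluster.node_for("sensor-42")
    cluster.mark_failed(primary)
    backup = cluster.node_for("sensor-42")
    print("primary:", primary, "backup after failure:", backup)
    cluster.add_node("node-4")  # scale out
    print("nodes after scale-out:", cluster.nodes)
```

Note that in this simple modulo scheme, adding a node changes which node most keys map to; the sketch only illustrates the concepts and is not a production partitioning strategy.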
