DSC650: Data Technology and Future Emergence Lecture 1
Dr Khairul Anwar Sedek
Summary
This lecture provides an overview of data technology, including its evolution, introduction to big data, and various aspects of data analysis. Key characteristics of big data, such as volume, velocity, and variety, are explored. Different types of data (structured, semi-structured, unstructured) are also discussed.
Full Transcript
DSC650: DATA TECHNOLOGY AND FUTURE EMERGENCE
LECTURE 1: OVERVIEW OF DATA TECHNOLOGY
LECTURER: DR KHAIRUL ANWAR SEDEK

1.1: OVERVIEW OF DATA TECHNOLOGY
- Overview of Data Technology Evolution
- Introduction of Big Data
- Big Data Ecosystem
- Foundation of Big Data Technology
- Career Related

At the end of the lecture, students should be able to:
CLO1: Demonstrate an understanding of the basic concepts and practices of big data technology.

DATA TECHNOLOGY
Data technology (DataTech) is the technology connected to areas such as martech or adtech.
The data technology sector includes solutions for data management, and products or services that are based on data generated by both humans and machines.
Data technology has been used to manage big data sets, build solutions for data management, and integrate data from various sources to discover new business or analytical insights from the collected information.

DATA TECHNOLOGY EVOLUTION

BIG DATA – AN INTRODUCTION
Big Data is a field dedicated to the analysis, processing, and storage of large collections of data that frequently originate from disparate sources.
It involves combining multiple unrelated datasets, processing large amounts of unstructured data, and harvesting hidden information in a time-sensitive manner.

BIG DATA – CHARACTERISTICS (5 V)
- Volume: a huge amount of data. If the volume of data is very large, then it is actually considered 'Big Data'.
- Velocity: refers to the high speed of accumulation of data; a massive and continuous flow of data.
- Variety: refers to the nature of data that is structured, semi-structured and unstructured; heterogeneous data sources.
- Veracity: refers to inconsistencies and uncertainty in data. Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
- Value: data needs to be converted into something valuable in order to extract information.
Source: The Five Vs of Big Data, https://www.geeksforgeeks.org/5-vs-of-big-data/

1.2 BIG DATA – AN INTRODUCTION
The results obtained through the processing of Big Data can lead to a wide range of insights and benefits:
- Operational optimization
- Actionable intelligence
- Identification of new markets
- Accurate predictions
- Fault and fraud detection
- More detailed records
- Improved decision-making
- Scientific discoveries

BIG DATA TERMINOLOGY: DATASETS
Datasets are collections or groups of related data. Each group or dataset member (datum) shares the same set of attributes or properties as others in the same dataset.
Some examples of datasets are:
- Tweets stored in a flat file
- A collection of image files in a directory
- An extract of rows from a database table stored in a CSV formatted file
- Historical weather observations that are stored as XML files

BIG DATA TERMINOLOGY: DATA ANALYSIS
Data analysis is the process of examining data to find facts, relationships, patterns, insights and/or trends. The overall goal of data analysis is to support better decision making.
Example: analysis of ice-cream sales data in order to determine how the number of ice-cream cones sold is related to the daily temperature. The results of such an analysis would support decisions related to how much ice-cream a store should order in relation to weather forecast information.
Carrying out data analysis helps establish patterns and relationships among the data being analyzed.

BIG DATA TERMINOLOGY: DATA ANALYTICS
Data analytics is a broader term that encompasses data analysis.
Data analytics is a discipline that includes the management of the complete data lifecycle, which encompasses collecting, cleansing, organizing, storing, analyzing and governing data.
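The ice-cream example from the data analysis slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observations are made-up numbers, and a simple least-squares line is one of several ways the relationship between temperature and cones sold could be examined. The names `observations`, `fit_line` and `expected_cones` are hypothetical.

```python
from math import sqrt

# Hypothetical daily observations (not real data):
# (temperature in degrees Celsius, ice-cream cones sold).
observations = [(20, 110), (22, 125), (25, 150), (28, 170), (30, 190), (33, 215)]


def fit_line(points):
    """Return (slope, intercept, r) of an ordinary least-squares line fit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(observations)


def expected_cones(forecast_temp):
    """Estimate cones sold for a forecast temperature using the fitted line."""
    return slope * forecast_temp + intercept


print(f"correlation r = {r:.3f}")
print(f"forecast 27 degrees -> about {expected_cones(27):.0f} cones")
```

A store could compare `expected_cones` for tomorrow's forecast against its current stock when deciding how much to order, which is the kind of decision support the slide describes.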
FOUR GENERAL CATEGORIES OF ANALYTICS
- Descriptive analytics
- Diagnostic analytics
- Predictive analytics
- Prescriptive analytics
Value and complexity increase from descriptive to prescriptive analytics.

DATA ANALYTICS: DESCRIPTIVE ANALYTICS
Descriptive analytics is carried out to answer questions about events that have already occurred. This form of analytics contextualizes data to generate information.
Sample questions can include:
- What was the sales volume over the past 12 months?
- What is the number of support calls received as categorized by severity and geographic location?
- What is the monthly commission earned by each sales agent?

DESCRIPTIVE ANALYTICS TOOLS
The operational systems (pictured left) are queried via descriptive analytics tools to generate reports or dashboards (pictured right).

DATA ANALYTICS: DIAGNOSTIC ANALYTICS
Diagnostic analytics aims to determine the cause of a phenomenon that occurred in the past, using questions that focus on the reason behind the event.
The goal of this type of analytics is to determine what information is related to the phenomenon, to enable answering questions that seek to determine why something has occurred.
Such questions include:
- Why were Q2 sales less than Q1 sales?
- Why have there been more support calls originating from the Eastern region than from the Western region?
- Why was there an increase in patient re-admission rates over the past three months?
Diagnostic analytics can result in data that is suitable for performing drill-down and roll-up analysis.

DATA ANALYTICS: PREDICTIVE ANALYTICS
Predictive analytics is carried out in an attempt to determine the outcome of an event that might occur in the future.
Information is enhanced with meaning to generate knowledge that conveys how that information is related.
The models used for predictive analytics have implicit dependencies on the conditions under which the past events occurred. If these underlying conditions change, then the models that make predictions need to be updated.
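The three analytics categories described so far can be contrasted with a small sketch. It assumes two made-up datasets (`sales` and `loans`) and hypothetical helper names. It is not how any particular analytics tool works, only a minimal illustration: a roll-up answers a descriptive question, a drill-down supports a diagnostic one, and a conditional frequency over past events gives a very crude predictive estimate.

```python
from collections import defaultdict

# Hypothetical sales records (not real data): (quarter, region, amount).
sales = [
    ("Q1", "East", 120), ("Q1", "West", 100),
    ("Q2", "East", 125), ("Q2", "West", 60),
]


def roll_up(records):
    """Descriptive view: total sales per quarter."""
    totals = defaultdict(int)
    for quarter, _region, amount in records:
        totals[quarter] += amount
    return dict(totals)


def drill_down(records):
    """Diagnostic view: total sales per (quarter, region)."""
    totals = defaultdict(int)
    for quarter, region, amount in records:
        totals[(quarter, region)] += amount
    return dict(totals)


by_quarter = roll_up(sales)
by_region = drill_down(sales)

# Change from Q1 to Q2 per region, to see where the overall drop comes from.
regions = sorted({region for _quarter, region, _amount in sales})
change = {reg: by_region[("Q2", reg)] - by_region[("Q1", reg)] for reg in regions}

# Hypothetical loan history (not real data): (missed_a_payment, defaulted).
loans = [
    (True, True), (True, False), (True, True), (False, False),
    (False, False), (False, True), (True, True), (False, False),
]


def default_rate_given_missed(history):
    """Predictive estimate: share of past customers who defaulted after missing a payment."""
    outcomes = [defaulted for missed, defaulted in history if missed]
    return sum(outcomes) / len(outcomes)


print(by_quarter)
print(change)
print(default_rate_given_missed(loans))
```

The estimate from `default_rate_given_missed` depends entirely on the past records it is given, which is the dependency on past conditions noted for predictive models: if lending conditions change, the history and the estimate must be refreshed.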
Sample predictive analytics questions include:
- What are the chances that a customer will default on a loan if they have missed a monthly payment?
- What will be the patient survival rate if Drug B is administered instead of Drug A?
- If a customer has purchased Products A and B, what are the chances that they will also purchase Product C?
Predictive analytics tools can provide user-friendly front-end interfaces.

DATA ANALYTICS: PRESCRIPTIVE ANALYTICS
Prescriptive analytics builds upon the results of predictive analytics by prescribing actions that should be taken. The focus is not only on which prescribed option is best to follow, but why.
Prescriptive analytics provides results that can be reasoned about because they embed elements of situational understanding.
It can be used to gain an advantage or mitigate a risk.
Sample questions include:
- Among three drugs, which one provides the best results?
- When is the best time to trade a particular stock?
Prescriptive analytics involves the use of business rules together with internal data (current and historical sales data, customer information, product data and business rules) and/or external data (social media data, weather forecasts and government-produced demographic data).

TYPES OF DATA
- Human-generated data include social media, blog posts, emails, photo sharing and messaging.
- Machine-generated data include web logs, sensor data, telemetry data, smart meter data and appliance usage data.

TYPES OF DATA: STRUCTURED DATA
Structured data conforms to a data model or schema and is often stored in tabular form.
It is used to capture relationships between different entities and is therefore most often stored in a relational database.
It is frequently generated by enterprise applications and information systems like ERP and CRM systems.
Examples of this type of data include banking transactions, invoices, and customer records.

TYPES OF DATA: UNSTRUCTURED DATA
Unstructured data is data that does not conform to a data model or data schema.
It is estimated that unstructured data makes up 80% of the data within any given enterprise.
Unstructured data has a faster growth rate than structured data.
This form of data is either textual or binary and is often conveyed via files that are self-contained and non-relational.
A text file may contain the contents of various tweets or blog postings. Binary files are often media files that contain image, audio or video data.
The notion of being unstructured is in relation to the format of the data contained in the file itself.

TYPES OF DATA: SEMI-STRUCTURED DATA
Semi-structured data has a defined level of structure and consistency, but is not relational in nature. Instead, semi-structured data is hierarchical or graph-based.
This kind of data is commonly stored in files that contain text. XML and JSON files are common forms of semi-structured data.
Due to the textual nature of this data and its conformance to some level of structure, it is more easily processed than unstructured data.
Examples of common sources of semi-structured data include electronic data interchange (EDI) files, spreadsheets, RSS feeds and sensor data.

EXAMPLE: UNSTRUCTURED, SEMI-STRUCTURED, AND STRUCTURED DATA

1.3 BIG DATA ECOSYSTEM

1.4 BIG DATA ARCHITECTURE – TECHNOLOGY FOUNDATION

1.6 BIG DATA CAREER PATH
Source: https://www.whizlabs.com/blog/best-big-data-careers/

BIG DATA USAGES
Source: https://www.edureka.co/blog/10-reasons-why-big-data-analytics-is-the-best-career-move/
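As a closing illustration of the three data types covered under TYPES OF DATA, the sketch below reads one tiny sample of each using only the Python standard library. The sample contents (`csv_text`, `json_text`, `tweet`) are invented for illustration. The point is only that a fixed schema makes structured data directly queryable, hierarchical formats such as JSON need navigation, and free text needs structure imposed by the analysis itself.

```python
import csv
import io
import json

# Structured: rows that conform to a fixed schema (here, named CSV columns).
csv_text = "customer_id,amount\n1,250.00\n2,99.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Semi-structured: hierarchical JSON; records are nested and fields may vary.
json_text = (
    '{"sensor": "s1", "readings": ['
    '{"t": 0, "temp": 21.5}, {"t": 1, "temp": 22.0, "unit": "C"}]}'
)
doc = json.loads(json_text)
temps = [reading["temp"] for reading in doc["readings"]]

# Unstructured: free text with no schema; the analysis must impose structure,
# here by lowercasing, stripping punctuation and splitting into words.
tweet = "Loving the new ice-cream shop downtown, best cones ever!"
words = tweet.lower().replace(",", " ").replace("!", " ").split()
mentions_ice_cream = "ice-cream" in words

print(total_amount)
print(temps)
print(mentions_ice_cream)
```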