Data Science And Visualization VAI301 Syllabus PDF
Document Details
2020
Summary
This document is a syllabus for a data science and visualization course, likely for a vocational or professional education program. It outlines the topics covered, including data types, data visualization methods, and data analysis techniques. It also suggests readings on R programming for visualization and exploration.
Full Transcript
DATA SCIENCE AND VISUALIZATION VAI301 SYLLABUS

Course Number: VAI301
Course Title: Data Science and Visualization
Class: B.Voc AI & ROBOTICS
Status of the Course: MAJOR
Proposed: 2020-21
Credits: 3
Periods (55 mts. each) per week: 3 (L: 3 + T: 0 + P: 0)
Min. Periods/Sem.: 39

UNIT-1
Introduction to data, different types of data, Overview of Data Visualization, Data Analysis, Types of data Analysis, Data Visualization & its Functions, Benefits of data visualization, Applications of data visualization, Data Analysis & its types. Introduction to Data Visualization in R.

UNIT-2
Basics of Data Exploration and Visualization with R, Different methods of data cleaning in R, Introduction to ggplot2 library for visualization, Univariate Graphs, Bivariate Graphs, Multivariate graphs. Customizing Graphs: Legends, Labels, Annotations.

UNIT-3
Sorting by multiple vectors with different order, Number, Date and time, custom list, row wise. Data Validation and dealing with invalid data, Validating numeric data, formula validation, list and date validation. Group and outline data, Data filter, Data consolidation, Data text to column, Custom Views.

UNIT-4
Creating Data Model from Tables, Exploring data with power pivot, Exploring data with power view, view chart and view maps. Visualizing geographic data with ggmap. Grouping and summarizing data set in R. Applying piping, Adding rows with mutate function, Adding group & outline criteria to range.

UNIT-5
Interactive graphs for data Visualization: Leaflet, plotly, rbokeh chart, rcharts, Highcharter, Histograms, line chart, Waterfall chart, Pivot chart, Bubble Charts Box, time dependent graphs. Creating density plot, violin plot, Creating a facet etc.

SUGGESTED READINGS:
1) Eric Pimpler, Data Visualization and Exploration with R, First Edition.
2) Michael Friendly and David Meyer, Discrete Data Analysis with R, First Edition.
3) Thomas Rahlf, Data Visualization with R, First Edition.
LECTURE 1: DATE: 2 JULY 2021

UNIT-1
- DATA & ITS TYPES
- METHODS OF DATA COLLECTION
- CATEGORIES OF DATA VISUALIZATION
- PROCESS OF DATA VISUALIZATION
- FUNCTIONS OF DATA VISUALIZATION
- BENEFITS OF DATA VISUALIZATION
- DATA ANALYSIS AND ITS PROCESS
- INTRODUCTION TO TABLES
- DATA CLEANING WITH TEXT FUNCTIONS, DATE AND TIME VALUES

Introduction
- DATA: Data is a collection of facts, such as numbers, words, measurements, observations or descriptions of things. Data is commonly associated with scientific research. Data is collected by various sources such as organizations and institutes.
- For example:
  Business Data -> Sales data, revenue, profit, stock price etc.
  Government Organisation Data -> Crime rate, unemployment rate, literacy rate.
Data is measured or collected, and then analysed using graphs, images and tools.

TYPES OF DATA
- Qualitative data: This kind of data gives descriptive information about something. Qualitative data can be observed but cannot be computed. It is concerned with data that is observed in terms of smell, taste, feel etc.
- Quantitative data: Quantitative data deals with quantities or numbers. It can be used for computation and statistical analysis. It is concerned with measurements like height, weight, volume, length etc.

Types of Quantitative Data
- Discrete Data: Discrete data can take only certain values (like whole numbers). It is based on counts and takes only a finite (or countable) number of possible values. E.g. rolling of a die.
- Continuous Data: Continuous data can take any value within a range. It has an infinite number of possible values within a selected range, e.g. temperature, or the height and weight of a person.

Some other forms of data
- Primary data: Primary data are data that are collected for the first time and are in original form. E.g. data collected through surveys or experiments.
- Secondary data: Secondary data are data that have been collected by someone else and have already been passed through various statistical processes. E.g. books, newspapers etc.

Methods of Collecting Data
There are three methods of collecting data:
1. Observation Method
   i) Structured and unstructured observation
   ii) Participant and non-participant observation
   iii) Controlled and uncontrolled observation
2. Interview Method
   i) Personal Interview
   ii) Telephonic Interview
3. Questionnaire Method

LECTURE 2: 8 July 2021

Observation Method
In this method, information is collected directly by the investigator through observation. It is, however, an expensive and time-consuming method. Observation involves collecting information without asking questions. This method is more subjective, as it requires the researcher, or observer, to add their judgment to the data.

Structured and Unstructured Observation
Structured observation method: This is a systematic observation method where data is collected as per a pre-defined schedule. Specific variables are used in this method for data collection. It is done in order, following a certain pattern or set of rules.
Unstructured observation method: The unstructured observation method is conducted in a free and open manner without using any pre-determined objectives, schedules or variables. It is unsystematic and unplanned.
In this method, the observer monitors all aspects of the phenomenon that seem relevant to the problem.

Participant and Non-Participant Observation
Participant Observation: When the observer is a member of the group which he is observing, it is a participant observation. The researcher is no longer a distant observer because he has joined the participants and become a part of their group.
Non-Participant Observation: The researcher watches the subjects of his or her study, with their knowledge, but without taking an active part in the situation. This option is used to understand a phenomenon by entering the community or social system involved, while staying separate from the activities being observed.

Controlled and Uncontrolled Observation
Controlled Observation: When observation takes place according to definite pre-arranged plans, involving experimental procedure, it is termed controlled observation. Generally, controlled observation takes place in experiments that are carried out in a laboratory or under controlled conditions.
Uncontrolled Observation: Uncontrolled observation is neither pre-planned nor systematic. This observation is related to day-to-day happenings.

Interview Method
This method of collecting data involves oral or verbal communication in which the interviewer asks the interviewee questions in order to collect information. There are two different types of interview:
i) Personal Interview
ii) Telephonic Interview

Personal Interview and Telephonic Interview
Personal interview: This is a face-to-face interaction between two persons. The interviewer asks the questions to the other person. There is a predetermined set of questions, and analysis of the data becomes easier because information is collected in a systematic manner.
Telephonic Interview: In this method the interviewer communicates with the respondent by telephone. A telephonic interview is generally of short duration and focussed on collecting information.
Questionnaire Method
This method of data collection is quite popular, particularly in the case of enquiries. A questionnaire consists of a set of questions printed in a definite order on a form. The questionnaire is sent by mail to the concerned person with a request to answer the questions.

Introduction to Data Visualization
Data visualization refers to the visual representation of data. A primary goal of data visualization is to communicate information clearly and effectively via statistical graphics, plots, tables, charts etc.

Categories of Data Visualization

Temporal
Data visualizations belong to this category if they satisfy two conditions:
i) They are linear.
ii) They are one-dimensional.
Example: Scatter plots, line graph

Hierarchical
Data visualizations that belong in the hierarchical category are those that order groups within larger groups. Hierarchical visualizations are best suited if you're looking to display clusters of information, especially if they flow from a single origin point.
Example: Tree, Ring chart

Network
Datasets connect deeply with other datasets. Network data visualizations show how they relate to one another within a network.
Example: Matrix chart, node-link diagram

Multidimensional
Multidimensional data visualizations have multiple dimensions. This means that there are always 2 or more variables in the mix to create a 3D data visualization. Because of the many concurrent layers and datasets, these types of visualizations tend to be the most vibrant or eye-catching visuals.
Example: Pie chart, Histogram etc.

LECTURE 3: DATA VISUALIZATION 9 JULY 2021

Process of Data Visualization
Acquire -> Parse -> Filter -> Mine -> Represent -> Refine -> Interact

Acquire: The acquisition step involves obtaining the data from various sources. It can be a complicated task because data collected from different sources is often unstructured and not in a consistent format.
Parse: After acquiring the data, it needs to be parsed, that is, changed into a format that is intended for use. Each piece of data needs to be converted into a specific format.

Filter: This step involves filtering the data, in which we remove information that is not relevant for use. This step generally decreases the size of the dataset.

Mine: Data mining is the process of looking at large sets of information in a different way so that new information can be derived from that which already exists.

Represent: In this step we represent the data in some graphical form, such as a line chart, bar chart, histogram, pie chart etc.

Refine: In this step graphic design methods are used to further clarify the representation of the data by changing attributes such as color or font style.

Interact: Interactive visualization is a field of computer science and programming that is focused on graphic visualizations and improving the way we can access and interact with information. Visualizations grant users the ability to explore, manipulate, and interact with data by employing dynamic charts, changing colors and shapes based on queries or interactions.

Importance of Data Visualization
- Helping decision makers understand how the business data is being interpreted to determine business decisions.
- Leading the target audience to focus on business insights to discover areas that require attention.
- Handling large amounts of data in a pictorial format to provide a summary of unseen patterns in the data, revealing insights and the story behind the data to establish a business goal.
- Visualizing business data to manage growth and converting trends into business strategies by making sense of your information.
- Revealing previously unnoticed key points about the data sources to help decision makers compose data analysis reports.
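The first six steps of the visualization process described above (acquire, parse, filter, mine, represent, refine) can be sketched in base R. This is only a minimal illustration under stated assumptions: the sales records, the column names and the variable names (raw_text, sales, totals) are invented for the example and are not part of the course material.

```r
# Acquire: read raw records (inline CSV text stands in for an external source;
# all values are made up for illustration).
raw_text <- "date,product,units
2021-01-05,socks,12
2021-01-09,jackets,3
2021-02-11,socks,10
2021-02-14,jackets,not recorded
2021-03-02,socks,11"
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# Parse: convert each column to the type intended for use.
raw$date  <- as.Date(raw$date)
raw$units <- suppressWarnings(as.numeric(raw$units))  # unparseable values become NA

# Filter: drop rows that are not usable.
sales <- raw[!is.na(raw$units), ]

# Mine: derive new information, here the total units per product.
totals <- aggregate(units ~ product, data = sales, FUN = sum)

# Represent: draw a bar chart of the summary.
barplot(totals$units, names.arg = totals$product,
        main = "Units sold by product", ylab = "Units")

# Refine: adjust visual attributes such as colour.
barplot(totals$units, names.arg = totals$product,
        col = c("steelblue", "tomato"),
        main = "Units sold by product", ylab = "Units")
```

The final step, interact, usually relies on an interactive library such as the ones listed in UNIT-5 of the syllabus and is left out of this sketch.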
Functions of Data Visualization
Record (store information): We collect information from different sources and store that information in a structured manner.
Analyze (support reasoning about information): Visual analytics involves building different charts with our data to give us various perspectives. This helps us to identify important points which are relevant for further investigation.
Communicate (convey information to others): This function allows us to communicate information in a visual way so that everyone is able to understand the information more effectively.

LECTURE 4: DATA VISUALIZATION 10 July 2021

Benefits of Data Visualization

Constructing ways of absorbing information: Data visualization enables users to receive vast amounts of information regarding operational and business conditions. It allows decision makers to see connections between multi-dimensional data sets and provides new ways to interpret data through the use of heat maps, charts, and other rich graphical representations.

Visualizing relationships and patterns in data: Data visualization techniques give us more insight into our data. They enable users to identify the hidden relationships and patterns within the dataset. Visualization allows business users to recognize relationships between the data, providing greater meaning to it. Exploring these patterns helps users focus on specific areas that require attention in the data, so that they can identify the significance of those areas to drive their business forward.

Acting on emerging trends faster: The volume of data that companies are able to gather about customers and market conditions can provide business leaders with insights into new revenue and business opportunities. Using data visualization enables decision makers to grasp shifts in customer behavior and market conditions across multiple data sets much more efficiently.
Why we visualize data

Visualizations are processed faster by the brain: Our brain is able to process visual information faster than text. A commonly cited figure is that about 70% of our sensory receptors are in our eyes and that we can usually get the sense of a visual scene in about 1/10 of a second.

Visual information is committed to long-term memory more easily than text: Visual information is powerful when it comes to our ability to remember information. We are able to start storing visual memories at a very young age and can remember and recall some of them throughout our entire life.

Visualization can help to simplify complex information: When information is presented in a complex form it is not easy to interpret, but with the help of visualization we can reduce complex information to a simpler form.
[Figure not reproduced in the transcript: monthly sales of socks and jackets.]
From the visualization, it becomes immediately obvious that sales of socks remain constant, with small spikes in December and June. On the other hand, sales of jackets are more seasonal, and reach their low point in July. They then rise and peak in December before decreasing monthly until right before fall.

Introduction to data analysis
Data analysis is defined as a process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to take decisions based upon that analysis. Various data analysis techniques are available to understand, interpret and derive conclusions. Data visualization can be used to examine the data in graphical format.

Lecture 5: Data Visualization 15 July 2021

What is the data analysis process?
1. Define why you need data analysis.
2. Begin collecting data from sources.
3. Clean out unnecessary data.
4. Begin analyzing the data.
5. Interpret the results and apply them.
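The five steps above can be sketched in a few lines of base R. This is a minimal illustration, not a prescribed workflow: the survey data frame, its column names and all of its values are hypothetical.

```r
# 1. Define the question: what is the average income of the respondents?

# 2. Collect: a hypothetical survey table (values invented for illustration).
survey <- data.frame(
  id     = c(1, 2, 2, 3, 4, 5),
  age    = c(25, 31, 31, 47, NA, 38),
  income = c(30000, 42000, 42000, 51000, 39000, 45000)
)

# 3. Clean: remove duplicate rows, then rows with missing values.
clean <- unique(survey)
clean <- clean[complete.cases(clean), ]

# 4. Analyse: summarise the variable of interest.
mean_income <- mean(clean$income)

# 5. Interpret: report the result so that it can be acted on.
cat("Average income of", nrow(clean), "respondents:", mean_income, "\n")
```

Here cleaning drops one duplicated respondent and one respondent with a missing age, so the average is computed over the four remaining rows.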
Steps in Data Analysis Process

Data Requirements and Specifications
The data required for analysis is based on a question or an experiment. Based on the requirements of those directing the analysis, the data necessary as inputs to the analysis is identified (e.g., a population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical.

Data Collection
Data collection is the process of gathering information on the targeted variables identified as data requirements. Data is collected from various sources, ranging from organizational databases to the information in web pages. The data thus obtained may not be structured and may contain irrelevant information. Hence, the collected data needs to be subjected to data processing and data cleaning.

Data Processing
The data that is collected must be processed or organized for analysis. This includes structuring the data as required for the relevant analysis tools. For example, the data might have to be placed into rows and columns in a table within a spreadsheet or statistical application. A data model might have to be created.

Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain errors. Data cleaning is the process of preventing and correcting these errors. There are several types of data cleaning, depending on the type of data. For example, while cleaning financial data, certain totals might be compared against reliable published numbers or defined thresholds. Likewise, quantitative methods can be used to detect outliers, which are subsequently excluded from the analysis.

Data Analysis
Data that is processed, organized and cleaned is ready for analysis. Various data analysis techniques are available to understand, interpret, and derive conclusions based on the requirements.
Data visualization may also be used to examine the data in graphical format, to obtain additional insight regarding the messages within the data.

Communication
The results of the data analysis are to be reported in the format required by the users, to support their decisions and further action. Feedback from the users might result in additional analysis. Data analysts can choose data visualization techniques, such as tables and charts, which help in communicating the message clearly and efficiently to the users. The analysis tools provide facilities to highlight the required information with color codes and formatting in tables and charts.

Importance of data analysis in different fields

Real-time Analytics
This is set to be one of the most disruptive business forces today, so much so that a reported 91% of data scientists are interested in working with real-time data. Immediate insights are naturally bound to be more useful than analyzing a pool of collected data.

Improved Marketing Efficiency
Another reason to learn about analytics is that it is being used extensively to work out what customers are actually interested in buying, especially in the e-commerce space. If you're a decision maker, such insights will help you in defining market blueprints for the company.

Healthcare Ingression
With the advent of new apps that let users keep track of calorie counts, work as pedometers, and even measure heartbeats, a mind-boggling amount of data is being created and stored. Soon you may be able to share such data and actionable insights with your doctor, who will use them as part of his diagnostic inputs.

Cyber Security
Big data comes with threats, making it necessary to leverage data analytics in order to mitigate cyber security risks. Employees, especially in research and development domains, are expected to know about the technological aspects of handling cyber-attacks.
For example, IBM has quickly capitalized on the potential of analytics and cyber security to introduce hundreds of security products.

Types of Data Analysis
There are basically five types of data analysis:
- Data Mining
- Business Intelligence
- Statistical Analysis
- Predictive Analytics
- Text Analytics

Lecture 6: Data Visualization 16 July 2021

Data Mining
Data mining can be defined as the process of extracting data, analyzing it from many dimensions or perspectives, then producing a summary of the information in a useful form that identifies relationships within the data. There are two types of data mining:
- Descriptive data mining: It gives information about existing data.
- Predictive data mining: It makes forecasts based on the data.

Examples of data mining applications
Marketing. Data mining is used to explore increasingly large databases and to improve market segmentation by analysing the relationships between parameters such as customer age, gender and tastes.
Banking. Banks use data mining to better understand market risks. It is commonly applied to credit ratings and to intelligent anti-fraud systems to analyse transactions, card transactions, purchasing patterns and customer financial data.
Medicine. Data mining enables more accurate diagnostics. Having all of the patient's information, such as medical records, physical examinations, and treatment patterns, allows more effective treatments to be prescribed.

Statistical Analysis
Statistical analysis includes the collection, analysis, interpretation, presentation, and modeling of data. It analyses a set of data or a sample of data. There are two categories of this type of analysis: Descriptive Analysis and Inferential Analysis.

Descriptive Analysis
It analyzes complete data or a sample of summarized numerical data.
It shows the mean and deviation for continuous data, and the percentage and frequency for categorical data. Consider a simple example in which you must determine how well a student performed throughout the semester by calculating the average. This average is the sum of the scores in all the subjects in the semester divided by the total number of subjects. This single number describes the general performance of the student across a potentially wide range of subject experiences.

Inferential Analysis
It analyzes a sample drawn from the complete data. In this type of analysis, we can reach different conclusions from the same data by selecting different samples. Inferential statistics takes data from a sample and makes inferences about the larger population from which the sample was drawn. Because the goal of inferential statistics is to draw conclusions from a sample and generalize them to a population, we need to have confidence that our sample accurately reflects the population. This requirement affects our process.

Business Intelligence
Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information. The purpose of Business Intelligence is to support better business decision making. BI combines business analytics, data mining, data visualization, data tools and infrastructure, and best practices to help organizations make more data-driven decisions.

Predictive Analytics
Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to providing a best assessment of what will happen in the future.

Text Analytics
Text analytics is the automated process of translating large volumes of unstructured text into quantitative data to uncover insights, trends, and patterns.
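As a minimal sketch of how unstructured text can be turned into quantitative data, the base R example below counts word frequencies in a few short comments. The comments and the variable names (comments, words, word_counts) are invented for illustration; real text analytics would normally add steps such as removing common stop words.

```r
# Hypothetical customer comments (invented for illustration).
comments <- c(
  "Delivery was late and the box was damaged",
  "Great product, fast delivery",
  "Late delivery again, very disappointed"
)

# Lower-case the text, split on anything that is not a letter,
# and drop any empty strings left over from the split.
words <- unlist(strsplit(tolower(comments), "[^a-z]+"))
words <- words[nchar(words) > 0]

# Count how often each word occurs, most frequent first.
word_counts <- sort(table(words), decreasing = TRUE)
head(word_counts, 3)
```

In this small sample "delivery" is the most frequent word, which hints at the topic customers write about most.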
Combined with data visualization tools, this technique enables companies to understand the story behind the numbers and make better decisions. Text analysis and text analytics often work together to provide a complete understanding of all kinds of text, like emails, social media posts, surveys, customer support tickets etc.

Lecture 7: Data Visualization 17 July 2021

R language introduction
R is a programming language and software environment for statistical analysis, graphics representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.

Features of R
The following are the important features of R:
- R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
- R has an effective data handling and storage facility.
- R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
- R provides a large, coherent and integrated collection of tools for data analysis.
- R provides graphical facilities for data analysis and display, either directly on the computer or printed on paper.

Basic syntax of R
    myString <- "Hello, World!"
    print(myString)
    # Output: "Hello, World!"
    2 + 2
    # Output: 4

Data types in R
Data Type    Example        Verify
Logical      TRUE, FALSE    v