UNIT 1 PDF
SCA
Er. Ranjit Kaur Walia
Summary
This document provides a foundational introduction to data analysis, covering different types of data (nominal, ordinal, interval, ratio), data analysis processes and related concepts. It is aimed at an undergraduate level.
Full Transcript
UNIT 1

Points of Discussion
Data and different types of Data

Understanding Data
Data can be text or numbers written on paper, bits and bytes stored in the memory of electronic devices, or facts held in a person's mind. Data is factual information used as a basis for reasoning, discussion, calculation and decision making.

Er. Ranjit Kaur Walia, Asst Prof., SCA, LPU

Types of Data

Categories of Data

Structured vs Unstructured Data

Q1: Which of the following is an example of nominal data?
A. Temperature in degrees Celsius
B. Rankings in a race
C. Types of fruit (apple, banana, cherry)
D. Number of students in a class
Answer: C. Types of fruit (apple, banana, cherry)

Q2: What characteristic defines nominal data?
A. Data that can be ordered or ranked
B. Data measured on a fixed scale
C. Data that represent categories with no intrinsic order
D. Data that has a true zero point
Answer: C. Data that represent categories with no intrinsic order

Q1: Which of the following is an example of ordinal data?
A. Blood type (A, B, AB, O)
B. Customer satisfaction rating (very unsatisfied, unsatisfied, neutral, satisfied, very satisfied)
C. Temperature in Fahrenheit
D. Age of individuals
Answer: B. Customer satisfaction rating (very unsatisfied, unsatisfied, neutral, satisfied, very satisfied)

Q2: What distinguishes ordinal data from nominal data?
A. Ordinal data is numerical, while nominal data is categorical
B. Ordinal data can be ranked or ordered, while nominal data cannot
C. Ordinal data has a true zero point, while nominal data does not
D. Ordinal data can be added or subtracted, while nominal data cannot
Answer: B. Ordinal data can be ranked or ordered, while nominal data cannot

Q1: Which of the following is an example of interval data?
A. Weight of an object
B. Temperature in Celsius
C. Ranking of movies from best to worst
D. Type of car (sedan, SUV, truck)
Answer: B. Temperature in Celsius

Q1: Which of the following is an example of ratio data?
A. Number of books on a shelf
B. Temperature in Fahrenheit
C. Likert scale (1-5)
D. Eye color (blue, green, brown)
Answer: A. Number of books on a shelf

Q2: What makes ratio data unique compared to other types of data?
A. It cannot be measured
B. It has a true zero point, allowing for the calculation of ratios
C. It represents categories with no inherent order
D. It is based on subjective measurement
Answer: B. It has a true zero point, allowing for the calculation of ratios

Data Analysis
Data analysis is the process of cleaning, transforming, and modeling data to discover useful information for business decision-making. Its purpose is to extract useful information from data and to make decisions based on that analysis.

A simple everyday example: whenever we make a decision, we think about what happened last time or what is likely to happen if we choose a particular option. This is nothing but analyzing our past or our expected future and deciding on that basis, drawing on memories of the past or expectations of the future.

Why Data Analysis?
To grow your business, or even to grow in your life, sometimes all you need to do is analyze. If your business is not growing, you have to look back, acknowledge your mistakes, and make a new plan that does not repeat them.
And even if your business is growing, you have to look forward and find ways to make it grow further. All you need to do is analyze your business data and business processes.

Life cycle of Data analysis
1. Data acquisition
2. Data preparation
3. Data exploration
4. Predictive modelling
5. Model interpretation and deployment

Statistics is at the heart of data analytics. It is the branch of mathematics that helps us spot trends and patterns in large volumes of numerical data. Statistical techniques can be categorized as Descriptive Statistics and Inferential Statistics. Some of the measurement techniques are similar, but the objectives are different, so it is worth understanding the major differences.

What is Descriptive Statistics?
Descriptive statistics describes the characteristics of a data set. It is a simple technique to describe, show and summarize data in a meaningful way. You choose a group you are interested in, record data about the group, and then use summary statistics and graphs to describe the group's properties. There is no uncertainty involved, because you are only describing the people or items that you actually measured. You are not aiming to infer properties of a larger population.

Descriptive statistics involves taking a potentially sizeable number of data points and reducing them to a few meaningful summary values and graphs. This lets you obtain insights and visualize the data rather than poring over sets of raw numbers. With descriptive statistics, you can describe both an entire population and an individual sample.

In descriptive statistics, we describe data with the help of representations such as charts, graphs, tables and Excel files.
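As a small illustration of reducing raw data points to summary values, the sketch below uses Python's standard `statistics` module and `collections.Counter`. The `ages` list and the `describe` helper are hypothetical, made up for this example. The population forms of variance and standard deviation (`pvariance`, `pstdev`) are chosen on the assumption that we are only describing the measured group, not inferring anything about a larger one.

```python
from collections import Counter
import statistics

# Hypothetical data: ages of ten people (illustrative values only).
ages = [23, 25, 25, 29, 31, 31, 31, 35, 40, 52]


def describe(values):
    """Reduce a list of numbers to a few descriptive summary values."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),      # arithmetic average
        "median": statistics.median(values),  # middle value of the sorted data
        "mode": statistics.mode(values),      # most frequent value
        "range": max(values) - min(values),   # highest minus lowest
        # Population formulas: we describe exactly the group we measured.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


summary = describe(ages)
frequency = Counter(ages)  # how many times each age occurs

for name, value in summary.items():
    print(f"{name}: {value}")

print("Frequency table:")
for age in sorted(frequency):
    print(f"  {age}: {frequency[age]}")
```

If the same numbers were treated as a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (the sample versions) would be the more usual choice.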
In descriptive statistics, the data is described and presented in a meaningful way so that it can be easily understood. It is most often performed on small data sets, and the findings can help anticipate future trends. Common measures used to describe a data set are measures of central tendency and measures of variability (dispersion).

What is Inferential Statistics?
Inferential statistics involves drawing conclusions about populations by examining samples. It allows us to make inferences about the entire set, based on information obtained from a subset of its members. These inferences use sample statistics as the basis for drawing broader conclusions.

The accuracy of inferential statistics depends largely on the accuracy of the sample data and on how well the sample represents the larger population. The most effective way to achieve this is to obtain a random sample. Results based on non-random samples often cannot be generalized and are usually discarded. Random sampling, though not always straightforward, is extremely important for carrying out inferential techniques.

Difference Between Descriptive and Inferential Statistics
Descriptive statistics provide a summary of the features or attributes of a dataset, while inferential statistics enable hypothesis testing and an assessment of how well the findings apply to a larger population.

Similarities Between Descriptive and Inferential Statistics
Descriptive and inferential statistics are both used to analyze and understand data. Both employ statistical techniques and tools to reach conclusions about a group. Both also rely on the same fundamental ideas from probability, such as sampling, randomization, and probability distributions.
Last but not least, both use the same kinds of statistical software, including SPSS, SAS, and R.

Types of Descriptive Statistics
There are three major types of descriptive statistics.

1. Frequency Distribution
A frequency distribution shows how often each response or value occurs, for quantitative as well as qualitative data. It gives the count, percentage, or frequency of the different outcomes in a data set and is usually presented as a table or graph. Bar charts, histograms, pie charts, and line charts are commonly used. Each entry in the table or graph shows how many times a value occurs in a specific interval, range, or group. These tables or graphs are a structured way to summarize grouped data, classified into mutually exclusive classes with the frequency of occurrence in each class.

2. Central Tendency
Central tendency summarizes a data set with a single value that reflects the center of the data distribution. It is used to show the average or most commonly indicated response in a data set. Measures of central tendency, also called measures of central location, include the mean, median, and mode. The mean is the arithmetic average (the sum of the values divided by their number), the median is the middle value when the data are arranged in increasing order, and the mode is the most frequent value.

3. Variability or Dispersion
Measures of variability include the range, variance, and standard deviation of the scores in a sample. They describe the width of the distribution and how spread apart the data points are from the center. The range is the difference between the highest and lowest values in the data set. The variance measures the degree of spread and is calculated as the average of the squared deviations from the mean.
The standard deviation is the square root of the variance and indicates how far the observed scores typically lie from the mean. It is useful when you want to show how spread out your data is around the mean.

Descriptive statistics is also used to determine measures of position, which describe how a score ranks in relation to others. These are used to compare scores against a normalized scale, for example by determining percentile ranks and quartile ranks.

Types of Inferential Statistics
Inferential statistics helps to draw conclusions and make predictions based on a data set. It uses several techniques, methods, and types of calculations. Some of the most important are:

1. Regression Analysis
Regression models show the relationship between a set of independent variables and a dependent variable. This statistical method lets you predict the value of the dependent variable for different values of the independent variables. Hypothesis tests are incorporated to determine whether the relationships observed in the sample data actually exist in the population.

2. Hypothesis Tests
Hypothesis testing is used to compare entire populations or assess relationships between variables using samples. Hypotheses or predictions are tested using statistical tests so as to draw valid inferences.

3. Confidence Intervals
The main goal of inferential statistics is to estimate population parameters, which are mostly unknown or unknowable values. A confidence interval uses the variability in a statistic to produce an interval estimate for a parameter. Confidence intervals take uncertainty and sampling error into account to create a range of values within which the actual population value is estimated to fall. Each confidence interval is associated with a confidence level, expressed as a percentage, that indicates how often intervals constructed in this way would contain the true parameter if the study were repeated many times.
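The confidence interval idea above can be sketched in a few lines. This is a minimal illustration, not a definitive method: the `mean_confidence_interval` helper and the `sample` values are hypothetical, and the interval uses a normal (z) approximation from the standard library's `statistics.NormalDist`. For a small sample like this one, a t-distribution would normally give a slightly wider and more appropriate interval.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a population mean.

    Assumption: uses the normal approximation (z critical value),
    which is reasonable for large samples only.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return mean - margin, mean + margin


# Hypothetical sample drawn from a larger population (illustrative values).
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

low, high = mean_confidence_interval(sample, confidence=0.95)
print(f"Sample mean: {statistics.mean(sample):.3f}")
print(f"Approximate 95% interval: ({low:.3f}, {high:.3f})")
```

Raising `confidence` to 0.99 widens the interval: a higher confidence level is paid for with a less precise estimate.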
Example of Descriptive Statistics
Descriptive statistics are used to enumerate and explain a dataset's key characteristics. Examples include the mean, median, mode, range, variance, and standard deviation. For instance, to summarize the ages of a group of individuals, you could use descriptive statistics to determine the average age, the age distribution, and the standard deviation of the ages.

Example of Inferential Statistics
Inferential statistics uses a sample of data to draw conclusions or generalizations about a broader population. Examples include regression analysis, confidence intervals, and hypothesis testing. For instance, to find out whether a new drug is effective, you could use inferential statistics to assess whether there is a significant difference between the outcomes of patients who receive the drug and those who receive a placebo.

Tools of Descriptive Statistics
Measures of central tendency (mean, median, mode), measures of variability (range, variance, standard deviation), frequency distributions, histograms, scatterplots, and box plots are examples of descriptive statistics tools.

Tools of Inferential Statistics
Hypothesis testing, confidence intervals, regression analysis, analysis of variance (ANOVA), and chi-square tests are examples of inferential statistics tools.

Who Needs Data Analytics?
Any business professional who makes decisions needs foundational data analytics knowledge. Access to data is more common than ever. If you formulate strategies and make decisions without considering the data you have access to, you could miss major opportunities or red flags that the data reveals.
Professionals who can benefit from data analytics skills include:
Marketers, who use customer data, industry trends, and performance data from past campaigns to plan marketing strategies
Product managers, who analyze market, industry, and user data to improve their companies' products
Finance professionals, who use historical performance data and industry trends to forecast their companies' financial trajectories
Human resources and diversity, equity, and inclusion professionals, who gain insights into employees' opinions, motivations, and behaviors and pair them with industry trend data to make meaningful changes within their organizations

To get the greatest insight from your data, familiarize yourself with the four key types of data analytics. They can be used individually or in tandem to get the most benefit from your company's data.

Analytics for Decision Making Support

Using Data to Drive Decision-Making
The four types of data analysis should be used in tandem to create a full picture of the story the data tells and to make informed decisions. To understand your company's current situation, use descriptive analytics. To figure out how your company got there, use diagnostic analytics. Predictive analytics is useful for determining the trajectory of a situation: will current trends continue? Finally, prescriptive analytics can help you consider all aspects of current and future scenarios and plan actionable strategies.

Depending on the problem you are trying to solve and your goals, you may opt to use two or three of these analytics types, or use them all in sequence to gain the deepest understanding of the story the data tells. Strengthening your analytics skills can empower you to take advantage of the insights your data offers and advance your organization and career.

Analytical practitioners today have a vast array of analytical capabilities and techniques at their disposal.
These range from the most fundamental techniques, "descriptive analytics", which involve preparing the data for subsequent analysis, through "predictive analytics", which provide advanced models to forecast the future, to the most advanced tier, "prescriptive analytics", which use machine learning algorithms and dynamic rule engines to provide interpretations and recommendations. Given their diverse use cases and applications, it is no surprise that these techniques are now finding their way into customer, workforce, supply-chain, finance and risk strategies at an organizational level.

Data is the new oil, and the best way for companies to access and understand it is to digitize their processes. Digitizing customer interactions can provide troves of information, which companies can feed into strategy, sales, marketing, and product development. Detailed and granular data can enable companies to micro-target their customers and to personalize their products and services. Internal digitization, in turn, generates data that managers can use to improve their operations, including routing and transportation, resource allocation and scheduling, capacity planning and manufacturing.

These trends are also causing many companies to converge their "Business Intelligence" and "Operations Research" units on the common ground of predictive and advanced analytics. Both communities now use statistical and mathematical techniques to attack strategic business problems and systematize decision making. Data analytics, with its far-reaching use cases and diverse applications, is emerging as the keystone of strategic business decision making. The next few sections explore the opportunities that data and analytics bring to businesses today.

Making the most of consumer patterns: In an increasingly customer-oriented era, organizations have amassed a wealth of consumer information and data.
To remain competitive, organizations must use these consumer insights to shape their products, solutions and buying experiences. In addition, pattern data can generate valuable customer insights that can be used to direct marketing expenditure.

Using data to drive performance: While organizations spend considerable time analyzing consumer data and frontline monetization opportunities, it is equally important to focus on improving productivity and performance. Data and analytics can play a huge role in reducing inefficiency and streamlining business operations. For instance, reporting and analytical dashboards can identify correlations in the data and give managers detailed insights for cost valuation, peer benchmarking and pricing segmentation. Similarly, using analytics to measure key performance metrics across areas such as operational excellence, product innovation and workforce planning can produce insights that help solve complex business scenarios.

Managing risk through analytics: Organizations today are exposed to immense risk from structured data, such as databases, and from unstructured data, such as websites, blogs, and social media channels. By leveraging risk analytics, companies can put themselves in a better position to quantify, measure and predict risk.