Unit 1 - Business Statistics PDF
Summary
This document introduces the basics of business statistics. It defines statistics, its types, functions, importance, and limitations. The document also discusses the scopes of statistical studies in business and other fields.
Full Transcript
Introduction of Statistics

Statistics: It is the science of collecting, organizing, analyzing and interpreting data in order to make decisions.

Types of Statistics

There are two types of statistics: descriptive statistics and inferential statistics.
Descriptive Statistics is the branch of statistics that involves the organization, summarization and display of data.
Inferential Statistics is the branch of statistics that involves using a sample to draw conclusions about a population.

Functions or Uses of Statistics

(1) Statistics helps in providing a better understanding and accurate description of nature's phenomena.
(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.
(3) Statistics helps in collecting appropriate quantitative data.
(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and graphic form for an easy and clear comprehension of the data.
(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon through quantitative observations.
(6) Statistics helps in drawing valid inferences about the population parameters from the sample data, along with a measure of their reliability.

Importance of Statistics

Here are some of the key reasons showing the importance of Statistics:
- Making informed decisions
- Predicting outcomes
- Testing hypotheses
- Monitoring progress
- Quality control
- Understanding competitors
- Measuring the health of a nation
- Estimating risk levels in the market
- Helping to predict the future
- Measuring the success rate of various programs
- Understanding demographics
- Improving the quality of a product or service
- Identifying profit centers
- Better marketing
- Finding the root cause of a problem

Limitations of Statistics

1. Misuse of statistics: Statistics can be easily manipulated to support a particular agenda or bias. This can lead to inaccurate conclusions or misinterpretations of data.
2. Inaccurate data: Statistics are only as accurate as the data on which they are based. If the data is flawed or incomplete, the statistics may not provide an accurate representation of the population being studied.
3. Oversimplification of complex issues: Statistics may oversimplify complex issues, leading to a lack of nuance or understanding. This can be particularly problematic when dealing with issues that are deeply intertwined with social, cultural, or political factors.
4. Overreliance on statistics: An overreliance on statistics can lead to a lack of critical thinking and analysis. It is important to consider the context in which the statistics are being used and to critically evaluate their validity and relevance.
5. Limited scope: Statistics can only provide information about the variables that have been measured. This means that important variables may be overlooked or omitted, leading to incomplete or inaccurate conclusions.
6. Ethical concerns: There are ethical concerns associated with the collection and use of data for statistical analysis. It is important to ensure that an individual's privacy is respected and that data is collected and used in an ethical and responsible manner.

Scopes of Statistical Studies

Statistics has a wide range of applications across many different fields, including:

Business and economics: Statistics is used to analyze sales trends, market research data, and financial performance. It is used to develop pricing strategies, forecast demand, and measure the impact of advertising and marketing campaigns.

Medicine and healthcare: Statistics is used in clinical trials to test the safety and efficacy of new treatments. It is also used to analyze epidemiological data, develop public health policies, and evaluate the effectiveness of healthcare interventions.

Social sciences: Statistics is used to study human behavior and social trends. It is used to analyze data on crime rates, education levels, and population demographics.
It is also used to evaluate the effectiveness of social programs and policies.

Engineering and science: Statistics is used to analyze data in engineering and scientific research. It is used to develop and test hypotheses, measure the effectiveness of experiments, and identify patterns and trends in data.

Sports and entertainment: Statistics is used to analyze and predict the outcomes of sporting events. It is used to evaluate player performance and develop strategies for winning. It is also used to analyze audience data in the entertainment industry to develop marketing strategies and measure the success of productions.

Overall, statistics is a powerful tool that has numerous applications in many different fields. Its ability to analyze and interpret data makes it an essential tool for decision-making, research, and problem-solving.

NOTES
UNIT I

MEASURES OF CENTRAL TENDENCY

In general terms, central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that is the best representative for the entire set of data.

By identifying the "average score," central tendency allows researchers to summarize or condense a large set of data into a single value. In addition, it is possible to compare two (or more) sets of data by simply comparing the average score (central tendency) for one set versus the average score for another set.

According to Prof Bowley, “Measures of central tendency (averages) are statistical constants which enable us to comprehend in a single effort the significance of the whole.”

The main objectives of Measure of Central Tendency are
1. To condense data in a single value.
2. To facilitate comparisons between data.

There are different types of averages, each with its own advantages and disadvantages.
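
The two objectives above (condensing data into one value and comparing data sets) can be illustrated with a minimal Python sketch. It assumes Python 3.8 or later and uses only the standard `statistics` module; the two branch data sets and the helper name `summarize_averages` are hypothetical and chosen purely for illustration.

```python
import statistics


def summarize_averages(values):
    """Condense a list of positive numbers into several single-value averages."""
    return {
        "arithmetic_mean": statistics.mean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical monthly sales (in units) for two branches.
branch_a = [40, 50, 50, 60, 100]
branch_b = [55, 60, 60, 65, 70]

summary_a = summarize_averages(branch_a)
summary_b = summarize_averages(branch_b)

# Comparing the two data sets by their averages.
for name in summary_a:
    print(f"{name:16s} A = {summary_a[name]:7.2f}   B = {summary_b[name]:7.2f}")
```

For branch A the arithmetic mean is 60 while the median and mode are both 50, because the single large value (100) pulls the mean upward. This is one reason no single average suits every data set.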

Measures of central tendency
  Mathematical Average: Arithmetic Mean (AM), Geometric Mean (GM), Harmonic Mean (HM)
  Positional Average: Mode, Partition Values (Median, Quartile, Decile, Percentile)

MATHEMATICAL AVERAGES:

ARITHMETIC MEAN (AM)
Ungrouped Data: $$\bar{x} = \frac{\sum x_i}{n}$$
Frequency Distribution: $$\bar{x} = \frac{\sum f_i x_i}{N}, \quad N = \sum f_i$$
Uses:
1. To compare the class averages.
2. Average spending habit of an individual.

GEOMETRIC MEAN (GM)
Ungrouped Data: $$GM = (x_1 x_2 \cdots x_n)^{1/n}$$
Frequency Distribution: $$GM = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \quad N = \sum f_i$$
Uses:
1. Averaging ratios, rates and percentages.
2. Average rate under compound interest, depreciation of machines.
3. In economics and business in the construction of index numbers.

HARMONIC MEAN (HM)
Ungrouped Data: $$HM = \frac{n}{\sum \frac{1}{x_i}}$$
Frequency Distribution: $$HM = \frac{N}{\sum \frac{f_i}{x_i}}, \quad N = \sum f_i$$
Uses:
1. HM gives the largest weight to the smallest item and the smallest weight to the largest item (when there are a few extremely large or small values, HM is preferable).
2. Averages involving time, speed, rate and price.

POSITIONAL AVERAGES:

MODE
Ungrouped Data: Mode is determined by locating the value that occurs the maximum number of times.
Frequency Distribution:
$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
Where,
$L$ = Lower limit of modal class
$f_0$ = Frequency preceding modal class
$f_1$ = Frequency of modal class
$f_2$ = Frequency succeeding modal class
$h$ = Width of modal class

PARTITION VALUES (Median, Quartiles, Deciles, Percentiles)

Individual series:
Median: the $\left(\frac{n+1}{2}\right)$th item when $n$ is odd; the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items when $n$ is even.
Quartiles: $Q_k$ = the $k\left(\frac{n+1}{4}\right)$th item
Deciles: $D_k$ = the $k\left(\frac{n+1}{10}\right)$th item
Percentiles: $P_k$ = the $k\left(\frac{n+1}{100}\right)$th item

Discrete series:
Find the required position; the value whose cumulative frequency (CF) is just greater than that position gives the median, quartile, decile or percentile respectively.

Continuous series:
Median: $$M = L + \frac{\frac{N}{2} - cf}{f} \times h$$
Quartiles: $$Q_k = L + \frac{\frac{kN}{4} - cf}{f} \times h$$
Deciles: $$D_k = L + \frac{\frac{kN}{10} - cf}{f} \times h$$
Percentiles: $$P_k = L + \frac{\frac{kN}{100} - cf}{f} \times h$$
In each formula,
$L$ = Lower limit of the relevant class (median class, kth quartile class, kth decile class or kth percentile class)
$cf$ = Cumulative frequency preceding that class
$f$ = Frequency of that class
$h$ = Width of that class

MEAN OF COMPOSITE GROUP
If two groups contain $n_1$ and $n_2$ observations with means $\bar{x}_1$ and $\bar{x}_2$ respectively, then the mean $\bar{x}$ of the composite group of $N = n_1 + n_2$ observations is given by the relation
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{N}$$

PROPERTIES OF GOOD MEASURE OF CENTRAL TENDENCY:
1. It should be rigidly defined.
2. It should be simple to understand and easy to calculate.
3. It should be based upon all values of the given data.
4. It should be capable of further mathematical treatment.
5. It should have sampling stability.
6. It should not be unduly affected by extreme values.

MEASURES OF DISPERSION

Measures of dispersion are descriptive statistics that describe how similar a set of scores are to each other. A quantity that measures the variability among the data, or how the data are dispersed about the average, is known as a measure of dispersion, scatter, or variation.
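
As a rough illustration of this definition, the sketch below compares two hypothetical data sets that share the same mean but differ in spread. It uses Python's standard `statistics` module; the data, the helper name `spread`, and the choice of the population standard deviation (`pstdev`, which divides by n) are assumptions made for the example.

```python
import statistics


def spread(values):
    """Return the mean together with two simple measures of dispersion."""
    return {
        "mean": statistics.mean(values),
        # Range: highest value minus smallest value.
        "range": max(values) - min(values),
        # Population standard deviation (divides by n, not n - 1).
        "std_dev": statistics.pstdev(values),
    }


# Hypothetical scores: both sets are centred on 50.
tight = [48, 49, 50, 51, 52]   # scores are similar to each other
loose = [10, 30, 50, 70, 90]   # scores are widely scattered

print("tight:", spread(tight))
print("loose:", spread(loose))
```

Both sets have a mean of 50, yet the range is 4 for the first and 80 for the second, so the average alone cannot distinguish them.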

The more similar the scores are to each other, the lower the measure of dispersion will be.
The less similar the scores are to each other, the higher the measure of dispersion will be.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.

Measures of Dispersion
  Absolute Measures: Range, Quartile Deviation, Mean Deviation, Standard Deviation
  Relative Measures: Coefficient of Range, Coefficient of Quartile Deviation, Coefficient of Mean Deviation, Coefficient of Variation

Absolute Measures

RANGE
$$\text{Range} = \text{Highest value} - \text{Smallest value}$$

QUARTILE DEVIATION
$$QD = \frac{Q_3 - Q_1}{2}$$

MEAN DEVIATION
Raw Data: $$MD = \frac{\sum |x_i - A|}{n}$$
Grouped Data: $$MD = \frac{\sum f_i |x_i - A|}{N}$$
A = Mean, Median or Mode

STANDARD DEVIATION
Raw Data: $$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$
Grouped Data: $$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$$

Relative Measures

COEFFICIENT OF RANGE
$$\text{Coeff. of Range} = \frac{H - S}{H + S}$$
where $H$ is the highest value and $S$ is the smallest value.

COEFFICIENT OF QUARTILE DEVIATION
$$\text{Coeff. of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

COEFFICIENT OF MEAN DEVIATION
$$\text{Coeff. of MD} = \frac{MD}{A}$$
A = Mean, Median or Mode

STANDARD DEVIATION OF COMPOSITE GROUP
If two groups contain $n_1$ and $n_2$ observations with means $\bar{x}_1$ and $\bar{x}_2$, and standard deviations $\sigma_1$ and $\sigma_2$ respectively, then the standard deviation $\sigma$ of the composite group is given by
$$\sigma = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{N}}$$
where $N = n_1 + n_2$, $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the mean of the composite group, given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

OBJECT AND PURPOSE OF MEASURING DISPERSION
1. To find the average distance of the items from an average.
2. To know the structure of the series.
3. To gauge the reliability of an average. When the dispersion is small, the average is reliable.
4. To know the limits of the items.
5. To serve as a basis for control of the variability itself.
6. To compare two or more series with regard to their variability.

SKEWNESS:
The skewness of a distribution is defined as the lack of symmetry. In a symmetrical distribution, mean, median, and mode are equal to each other. The presence of extreme observations on the right hand side of a distribution makes it positively skewed. In that case, Mean > Median > Mode.
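
The symmetrical and positively skewed cases can be checked on small data sets. The following is only a sketch using Python's standard `statistics` module; both data sets and the helper name `centre_measures` are hypothetical.

```python
import statistics


def centre_measures(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


# Symmetrical data: values are balanced around 3.
symmetric = [1, 2, 3, 3, 3, 4, 5]

# Positively skewed data: a few extreme values (9 and 21) on the right.
right_skewed = [2, 3, 3, 3, 4, 4, 5, 6, 9, 21]

sym_mean, sym_median, sym_mode = centre_measures(symmetric)
mean, median, mode = centre_measures(right_skewed)

print("symmetric:   ", sym_mean, sym_median, sym_mode)
print("right-skewed:", mean, median, mode)
```

For the symmetrical set all three measures equal 3. For the skewed set the mean is 6, the median is 4 and the mode is 3, which matches the ordering Mean > Median > Mode.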

On the other hand, the presence of extreme observations on the left hand side of a distribution makes it negatively skewed, and the relationship between mean, median and mode is: Mean < Median < Mode.

MEASURES OF SKEWNESS
  Karl Pearson's coefficient of skewness
  Bowley's coefficient of skewness
  Kelly's coefficient of skewness
  Coefficient of skewness based on moments

Empirical Relation: Mode = 3 Median - 2 Mean

If Sk = 0, the frequency distribution is symmetrical about the mean.
If Sk > 0, the frequency distribution is positively skewed.
If Sk < 0, the frequency distribution is negatively skewed.

KURTOSIS:
Kurtosis is another measure of the shape of a distribution. Whereas skewness measures the lack of symmetry of the frequency curve of a distribution, kurtosis is a measure of the relative peakedness of its frequency curve. Various frequency curves can be divided into three categories depending upon the shape of their peak.

Measures of Kurtosis
A measure of kurtosis is given by
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
a coefficient given by Karl Pearson, where $\mu_2$ and $\mu_4$ are the second and fourth central moments.
The value of $\beta_2 = 3$ for a mesokurtic curve. When $\beta_2 > 3$, the curve is more peaked than the mesokurtic curve and is termed leptokurtic. Similarly, when $\beta_2 < 3$, the curve is less peaked than the mesokurtic curve and is called a platykurtic curve.

Question: The measures of central tendency and measures of dispersion are complementary. Comment.

Measures of central tendency are the mean, mode and median. There are also three types of mean: the arithmetic mean, the geometric mean and the harmonic mean. Measures of dispersion tell us more about the kind of spread; for example, the mean deviation or the standard deviation describes how the data are spread.

Complete step-by-step answer:
On one hand, a measure of central tendency indicates the centre of the data distribution, which is the value around which all the data points gather. But still, we do not know how closely the data points gather around that value. It could be very tight, or it could be very loose.
There is no way to tell by looking at the central tendency alone.

On the other hand, a measure of dispersion indicates how 'dispersed' the data points are around the central value. A higher measure of dispersion suggests that data points gather loosely around the central value (highly dispersed), and conversely, a lower measure of dispersion suggests that they gather tightly. But looking at the dispersion measure alone does not tell us where the central value is.

That is why we need both measures of central tendency and measures of dispersion: together they tell us where the centre of the distribution lies and how widely the data are dispersed around it.

Note: Measures of central tendency and measures of dispersion are both important and complementary. We can have two datasets with the same median or mode, but their spread may be different.

Question: “Every average has its own peculiar characteristic; it is difficult to say which average is best”. Comment.

Measures of central tendency are summary statistics that represent the center point or typical value of a dataset. Examples of these measures include the mean, median, and mode. These statistics indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of central tendency as the propensity for data points to cluster around a middle value.

In statistics, the mean, median, and mode are the three most common measures of central tendency. Each one calculates the central point using a different method. Choosing the best measure of central tendency depends on the type of data you have.

Central tendency is one of the most important concepts in statistics. Although it does not provide information regarding the individual values in the dataset, it delivers a comprehensive summary of the whole dataset.

Generally, the central tendency of a dataset can be described using the following measures:

Mean (Average): The mean is the sum of all values in a dataset divided by the total number of values. It is mainly used for comparative study, for example:
a) To compare the performance of companies.
b) To study which college performs best.
c) Real estate agents calculate the mean price of houses in a particular area so they can inform their clients of what they can expect to spend on a house.
In these cases the mean is better suited than the median and the mode.

Median: It refers to the middle value in a distribution. One-half of the items have a smaller value and one-half have a larger value, so the median divides the data set into two halves. The median is useful for distributions containing open-end intervals, since these intervals do not enter its computation. The median is best suited:
a) To find the performance of a cricketer, where his worst and best extreme performances can be ignored to show his consistent performance.
b) To report salaries. The median salary helps employees know the middle point of salaries in their careers. It is also called the 50 per cent income, which means half of the employees earn above this median salary, while half of them earn below it. This gives a sense of healthy competition and allows them to grow.
In these cases the median is preferred over the mean and the mode.

Mode: It is the most frequently occurring value in a dataset.
a) If we need to find the favorite subject of students in a given class, the mode can be used.
b) In some cases, a dataset may contain multiple modes, while some datasets may not have any mode at all.
c) Real estate agents also calculate the mode of the number of bedrooms per house so they can inform their clients how many bedrooms they can expect to have in houses in a particular area.

Remark: The selection of a central tendency measure depends on the properties of a dataset.
For instance, the mode is the only central tendency measure for categorical data, while a median works best with ordinal data.
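
The remark can be sketched in Python with the standard `statistics` module (Python 3.8 or later is assumed for `multimode`). All data below are hypothetical: subject names stand in for categorical data, ratings coded 1 to 5 for ordinal data, and house prices (in thousands) for numerical data with one extreme value.

```python
import statistics

# Categorical data: only the mode is meaningful.
favourite_subjects = ["Maths", "Economics", "Maths", "Accounting", "Maths", "Economics"]
top_subject = statistics.mode(favourite_subjects)

# Ordinal data: ratings from 1 (poor) to 5 (excellent).
# median_low always returns a value that actually occurs in the data.
ratings = [1, 2, 2, 3, 4, 5, 5]
typical_rating = statistics.median_low(ratings)

# Numerical data with one extreme value: the mean is pulled upward,
# while the median stays near the typical house price.
house_prices = [200, 210, 220, 230, 1500]
mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)

# A dataset may have more than one mode; multimode returns all of them.
bedrooms = [2, 2, 3, 3, 4]
all_modes = statistics.multimode(bedrooms)

print(top_subject, typical_rating, mean_price, median_price, all_modes)
```

Here the mean house price is 472 while the median is 220, which shows why the median is often preferred when a few extreme values are present.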