Business Statistics Unit I-V PDF
Document Details
University of Delhi
2023
Dr. Sumita Jain
Summary
This document is study material for a B.Com program/hons program at the University of Delhi. The material covers various statistical concepts like frequency distribution, central tendency, measures of variation and probability. It also includes exercises and self-assessment questions.
Full Transcript
Editorial Board
Sh. K.B. Gupta

Content Writers
Dr. Alok Kumar, Dr. Rakesh Kumar Gupta

Revised by
Dr. Sumita Jain

Content Reviewers
Dr. Neha Nainwal and Dr. Promila Bharadwaj

Academic Coordinator
Mr. Deekshant Awasthi

© Department of Distance and Continuing Education
ISBN: 978-81-19417-08-7
1st Edition: 2023
E-mail: [email protected] [email protected]
Published by: Department of Distance and Continuing Education, Campus of Open Learning/School of Open Learning, University of Delhi, Delhi-110007
Printed by: School of Open Learning, University of Delhi

DISCLAIMER
Unit I-V are edited versions of study material prepared for the courses under Annual & CBCS Mode. Corrections/Modifications/Suggestions proposed by Statutory Body, DU/Stakeholder/s in the Self Learning Material (SLM) will be incorporated in the next edition. However, these corrections/modifications/suggestions will be uploaded on the website https://sol.du.ac.in. Any feedback or suggestions can be sent to the email- [email protected]

Printed at: Taxmann Publications Pvt. Ltd., 21/35, West Punjabi Bagh, New Delhi - 110026 (5000 Copies, 2024)
© Department of Distance & Continuing Education, Campus of Open Learning, School of Open Learning, University of Delhi

B.COM. (PROGRAMME)/B.COM. (HONS.)
BUSINESS STATISTICS

Contents

UNIT 1
Lesson 1 : Preparation of Frequency Distribution and their Graphical Presentation
1.1 Learning Objectives
1.2 What is Frequency Distribution
1.3 Types of Frequency Distribution
1.4 Principles of Frequency Distribution
1.5 Graphs
1.6 Summary
1.7 Self-Assessment Questions
Lesson 2 : Measures of Central Tendency - Mathematical and Positional Averages
2.1 Learning Objectives
2.2 What is Central Tendency?
2.3 Objectives of Central Tendency
2.4 Characteristics
2.5 Types of Averages
2.6 Mean
2.7 Geometric Mean
2.8 Harmonic Mean
2.9 Median
2.10 Other Positional Averages
2.11 Calculation of Missing Frequencies
2.12 Mode
2.13 Summary
2.14 Self-Assessment Questions
Lesson 3 : Measures of Variation – Absolute and Relative
3.1 Learning Objectives
3.2 Need and Importance
3.3 What is Variation?
3.4 Requisites of a Good Measure of Variation
3.5 Types of Variation
3.6 Methods of Computing Variation
3.7 Revisionary Problems
3.8 Summary
3.9 Self-Assessment Questions
Lesson 4 : Skewness and Kurtosis
4.1 Learning Objectives
4.2 Tests of Skewness
4.3 Nature of Skewness
4.4 Characteristics of Skewness
4.5 Methods of Skewness
4.6 Measures of Kurtosis
4.7 Comparison among Variation, Skewness, Kurtosis
4.8 Summary
4.9 Self-Assessment Questions
Lesson 5 : Moments
5.1 Learning Objectives
5.2 Concept of Central Moments
5.3 Sheppard’s Method
5.4 Coefficients of Moments
5.5 Summary
5.6 Self-Assessment Questions

UNIT 2
Lesson 1 : Theory of Probability
1.1 Learning Objectives
1.2 Probability Distribution
1.3 Basic Terminology in Probability
1.4 Methods of Assigning Probability
1.5 Computation of Probability
1.6 Laws of Probability
1.7 Bayes’ Theorem
1.8 Expected Value
1.9 Summary
1.10 Self-Assessment Questions
Lesson 2 : Probability Distributions
2.1 Learning Objectives
2.2 Probability Distribution
2.3 Binomial Distribution
2.4 Poisson Distribution
2.5 Normal Distribution
2.6 Summary
2.7 Self-Assessment Exercise
Lesson 3 : Statistical Decision Theory
3.1 Learning Objectives
3.2 Probability in Decision Making
3.3 Decision Making Process
3.4 Decision Under Uncertainty
3.5 Decision Under Risk
3.6 Expected Value of Perfect Information (EVPI)
3.7 Decision Tree
3.8 Summary
3.9 Self-Assessment Questions

UNIT 3
Lesson 1 : Simple Correlation
1.1 Learning Objectives
1.2 Introduction
1.3 Utility of Correlation
1.4 Difference between Correlation and Causation
1.5 Types of Correlation
1.6 Methods of Studying Correlation
1.7 Summary
1.8 Self-Assessment Questions
Lesson 2 : Regression Analysis
2.1 Learning Objectives
2.2 Introduction
2.3 Difference between Correlation and Regression
2.4 Principle of Least Squares
2.5 Methods of Regression Analysis
2.6 Properties of Regression Coefficients
2.7 Standard Error of an Estimate
2.8 Summary
2.9 Self-Assessment Questions

UNIT 4
Lesson 1 : Index Numbers
1.1 Learning Objectives
1.2 Introduction
1.3 Features of Index Numbers
1.4 Problems of Index Numbers
1.5 Methods of Constructing Index Numbers
1.6 Tests of Adequacy or Consistency
1.7 Chain Base Index
1.8 Splicing
1.9 Consumer Price Index
1.10 Index Number of Industrial Production
1.11 Limitations of Index Numbers
1.12 Construction of BSE Sensex and NSE Nifty
1.13 Summary
1.14 Self-Assessment Questions

UNIT 5
Lesson 1 : Time Series Analysis
1.1 Learning Objectives
1.2 Introduction
1.3 Components of Time Series
1.4 Models of Time Series
1.5 Methods of Measuring Trend
1.6 Second Degree Parabola
1.7 Exponential Trends
1.8 Shifting the Trend Origin
1.9 Conversion of Annual Trend to Monthly Trend
1.10 Measurement of Seasonal Variations
1.11 Summary
1.12 Self-Assessment Questions
Glossary

UNIT-1

LESSON 1
Preparation of Frequency Distribution and their Graphical Presentation

STRUCTURE
1.1 Learning Objectives
1.2 What is Frequency Distribution
1.3 Types of Frequency Distribution
1.4 Principles of Frequency Distribution
1.5 Graphs
1.6 Summary
1.7 Self-Assessment Questions

1.1 Learning Objectives
After reading this lesson, you should be able to:
(i) Explain what a frequency distribution is and describe its types.
(ii) Apply the principles and procedure of preparing a frequency distribution.
(iii) Present a distribution graphically with the help of a histogram, frequency polygon, smoothed frequency curve and ogives.

1.2 What is Frequency Distribution
Collected and classified data are presented in the form of a frequency distribution. A frequency distribution is simply a table in which the data are grouped into classes on the basis of common characteristics and the number of cases falling in each class is recorded. It shows the frequency of occurrence of different values of a single variable. A frequency distribution is constructed to satisfy three objectives:
(i) to facilitate the analysis of data,
(ii) to estimate frequencies of the unknown population distribution from the distribution of sample data, and
(iii) to facilitate the computation of various statistical measures.

1.3 Types of Frequency Distribution
1. Univariate Frequency Distribution.
2. Bivariate Frequency Distribution.
This lesson deals with the univariate frequency distribution. A univariate distribution incorporates different values of one variable only, whereas a bivariate frequency distribution incorporates the values of two variables. The univariate frequency distribution is classified further into three categories:
(i) Series of individual observations,
(ii) Discrete frequency distribution, and
(iii) Continuous frequency distribution.

Series of individual observations: This is a simple listing of the items of each observation. If the marks of 20 students of a class in statistics are given individually, they form a series of individual observations.

Marks obtained in Statistics:
Roll Nos.   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Marks      60  71  80  41  94  33  81  41  78  66  85  35  61  55  98  52  50  91  30  88

Marks in Ascending Order    Marks in Descending Order
30                          98
33                          94
35                          91
41                          88
41                          85
50                          81
52                          80
55                          78
60                          71
61                          66
66                          61
71                          60
78                          55
80                          52
81                          50
85                          41
88                          41
91                          35
94                          33
98                          30

Discrete Frequency Distribution : In a discrete series, the data are presented in such a way that exact measurements of units are indicated. The observations are arranged into groups by the method of tally bars, which counts the number of times each observation occurs in the given data. The first column shows all the values of the variable. In the second column, a vertical bar called a tally bar is put against a value each time it occurs. When a particular value has occurred four times, the fifth occurrence is marked by a cross tally mark (/) across the four tally bars to make a block of 5. The technique of putting cross tally bars at every fifth repetition facilitates the counting of the number of occurrences of the value.
After putting tally bars for all the values in the data, we count the number of times each value is repeated and write it against the corresponding value of the variable in the third column, entitled frequency. This type of representation of the data is called a discrete frequency distribution. We are given marks of 50 students :

70 55 51 42 57 40 26 43 46 41
46 48 33 40 26 40 40 41 43 53
45 60 47 63 53 33 50 40 33 40
26 53 59 33 65 78 39 55 48 15
26 43 59 51 39 15 45 26 60 15

We can construct a discrete frequency distribution from the above given marks.

Marks of 50 Students
Marks    Tally Bars    Frequency
15       |||           3
26       ||||/         5
33       ||||          4
39       ||            2
40       ||||/         5
41       ||            2
42       |             1
43       |||           3
45       ||            2
46       ||            2
47       |             1
48       ||            2
50       |             1
51       ||            2
53       |||           3
55       |||           3
57       |             1
59       ||            2
60       |             1
61       |             1
63       |             1
65       |             1
70       |             1
78       |             1
Total                  50

The presentation of the data in the form of a discrete frequency distribution is better than simply arranging them in an array, but it does not condense the data as needed and is quite difficult to comprehend. Such a distribution is useful only when the values of the variable are repeated; otherwise there will be hardly any condensation.

Continuous Frequency Distribution : If the identity of the units about which a particular information is collected is not relevant, nor is the order in which the observations occur, then the first step of condensation is to classify the data into different classes by dividing the entire range of values of the variable into a suitable number of groups and then recording the number of observations in each group. Thus, if we divide the total range of values of the variable (marks of 50 students), i.e. 78 – 15 = 63, into groups of 10 each, then we shall get 7 groups (63/10 = 6.3, rounded up) and the distribution of marks is displayed by the following frequency distribution :

Marks of 50 students
Marks (x)    Tally Bars           Number of Students (f)
15–25        |||                  3
25–35        ||||/ ||||           9
35–45        ||||/ ||||/ |||      13
45–55        ||||/ ||||/ |||      13
55–65        ||||/ ||||           9
65–75        ||                   2
75–85        |                    1
Total                             50

The various groups into which the values of the variable are classified are known as classes, and the length of the class interval (10) is called the width or magnitude of the class. The two values specifying a class are called the class limits. The presentation of the data into continuous classes with the corresponding frequencies is known as a continuous frequency distribution. There are two methods of classifying the data according to class intervals :
(i) Exclusive method
(ii) Inclusive method

In the exclusive method, the class intervals are fixed in such a manner that the upper limit of one class becomes the lower limit of the following class. Moreover, an item equal to the upper limit of a class is excluded from that class and included in the next class. The following data are classified on this basis.

Income (Rs.)    No. of Persons
200–250         50
250–300         100
300–350         70
350–400         130
400–450         50
450–500         100
Total           500

It is clear from the example that the exclusive method ensures continuity of the data inasmuch as the upper limit of one class is the lower limit of the next class. Therefore, 50 persons have incomes between Rs. 200 and Rs. 249.99, and a person whose income is Rs. 250 is included in the next class of 250–300.

According to the inclusive method, an item equal to the upper limit of a class is included in that class itself. The following table demonstrates this method.

Income (Rs.)    No. of Persons
200–249         50
250–299         100
300–349         70
350–399         130
400–449         50
450–499         100
Total           500

Hence in the class 200–249, we include persons whose income is between Rs. 200 and Rs. 249.

1.4 Principles of Frequency Distribution
Despite the great importance of classification in statistical analysis, no hard and fast rules can be laid down for it. A statistician relies on discretion, sound experience, wisdom and skill to arrive at an appropriate classification of the data. However, the following guidelines must be considered to construct a frequency distribution:

1. Types of Classes : The classes should be clearly defined and should not lead to any ambiguity. They should be exhaustive and mutually exclusive so that any value of the variable corresponds to only one class.

2. Number of Classes : The choice about the number of classes into which a given frequency distribution should be divided depends upon:
(i) The total frequency, which means the total number of observations in the distribution.
(ii) The nature of the data, which means the size or magnitude of the values of the variable.
(iii) The desired accuracy.
(iv) The convenience regarding computation of the various descriptive measures of the frequency distribution such as mean, variance etc.
The number of classes should be neither too small nor too large. If the classes are few, the classification becomes very broad and rough, which might obscure some important features and characteristics of the data. The accuracy of the results decreases as the number of classes becomes smaller. On the other hand, too many classes will result in very few frequencies in each class. This will give an irregular pattern of frequencies in different classes, thus making the frequency distribution irregular.
Moreover, a large number of classes will render the distribution too unwieldy to handle. The computational work for further processing of the data will become quite tedious and time consuming without any proportionate gain in the accuracy of the results. Hence a balance should be maintained between the loss of information in the first case and the irregularity of the frequency distribution in the second case, to arrive at a pleasing compromise giving the optimum number of classes. Normally, the number of classes should not be less than 5 or more than 20. Prof. Sturges has given a formula :

$$k = 1 + 3.322 \log_{10} n$$

where k refers to the number of classes and n is the total frequency or number of observations. The value of k is rounded to the nearest integer :

If n = 100, $k = 1 + 3.322 \log_{10} 100 = 1 + 6.644 = 7.644 \approx 8$
If n = 10,000, $k = 1 + 3.322 \log_{10} 10{,}000 = 1 + 13.288 = 14.288 \approx 14$

However, this rule should be applied only when the number of observations is not very small. Moreover, the number of class intervals should be such that they give a uniform and unimodal distribution, which means that the frequencies in the given classes increase and decrease steadily and there are no sudden jumps. The number of classes should be an integer, preferably 5 or some multiple of 5 such as 10, 15, 20, 25 etc., which are quite convenient for numerical computations.

3. Size of class intervals : Because the size of the class interval is inversely proportional to the number of classes in a given distribution, the choice about the size of the class interval will also depend upon the sound subjective judgment of the statistician.
An approximate value of the magnitude of the class interval, say i, can be calculated with the help of Sturges’ rule :

$$i = \frac{\text{Range}}{1 + 3.322 \log_{10} n}$$

where i stands for the class magnitude or interval, Range is the difference between the largest and smallest values of the distribution, and n refers to the total number of observations. Suppose we are given the following information: n = 400, largest item = 1300 and smallest item = 340. Then

$$i = \frac{1300 - 340}{1 + 3.322 \log_{10} 400} = \frac{960}{1 + 3.322 \times 2.6021} = \frac{960}{9.644} = 99.54 \ (\approx 100)$$

Another rule of thumb for determining the size of the class interval is that the length of the class interval should not be greater than one-fourth of the estimated population standard deviation. If $\hat{\sigma}$ is the estimate of the population standard deviation, then the length of the class interval is given by:

$$i \le \frac{\hat{\sigma}}{4}$$

The size of class intervals should be taken as 5 or a multiple of 5, such as 10, 15 or 20, for easy computation of various statistical measures of the frequency distribution. Class intervals should be so fixed that each class has a convenient mid-point around which all the observations in that class cluster. It means that the entire frequency of the class is assumed to be concentrated at the mid-value of the class. This assumption will be true only if the frequencies of the different classes are uniformly distributed in the respective class intervals. It is always desirable to take class intervals of equal or uniform magnitude throughout the frequency distribution.

4. Class boundaries : If in a grouped frequency distribution there are gaps between the upper limit of any class and the lower limit of the succeeding class (as in the case of the inclusive type of classification), there is a need to convert the data into a continuous distribution by applying a correction factor for continuity to determine new classes of the exclusive type.
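Sturges' rule for the number of classes and the class-width formula above can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative, and rounding k to the nearest integer is an assumption chosen because it reproduces the worked values 8 and 14 given in the text.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(n), rounded to the nearest integer (assumed rounding)."""
    return round(1 + 3.322 * math.log10(n))

def sturges_width(largest, smallest, n):
    """Approximate class width i = Range / (1 + 3.322 * log10(n))."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

# Worked values from the text
print(sturges_classes(100))                      # 8
print(sturges_classes(10_000))                   # 14
print(round(sturges_width(1300, 340, 400), 2))   # 99.54
```

In practice the computed width would then be rounded to a convenient figure (here 100), in line with the guideline of preferring multiples of 5.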
The lower and upper class limits of the new exclusive type classes are called class boundaries. If d is the gap between the upper limit of any class and the lower limit of the succeeding class, the class boundaries for any class are given by :

$$\text{Upper class boundary} = \text{Upper class limit} + \tfrac{1}{2} d$$
$$\text{Lower class boundary} = \text{Lower class limit} - \tfrac{1}{2} d$$

d/2 is called the correction factor. Let us consider the following example to understand:

Marks      Class Boundaries
20 – 24    (20 – 0.5, 24 + 0.5) i.e., 19.5 – 24.5
25 – 29    (25 – 0.5, 29 + 0.5) i.e., 24.5 – 29.5
30 – 34    (30 – 0.5, 34 + 0.5) i.e., 29.5 – 34.5
35 – 39    (35 – 0.5, 39 + 0.5) i.e., 34.5 – 39.5
40 – 44    (40 – 0.5, 44 + 0.5) i.e., 39.5 – 44.5

$$\text{Correction factor} = \frac{d}{2} = \frac{35 - 34}{2} = \frac{1}{2} = 0.5$$

5. Mid-value or class mark : The mid-value or class mark is the value of the variable which lies exactly at the middle of a class. The mid-value of any class is obtained by dividing the sum of the upper and lower class limits by 2.

$$\text{Mid-value of a class} = \tfrac{1}{2}\,[\text{Lower class limit} + \text{Upper class limit}]$$

The class limits should be selected in such a manner that the observations in any class are evenly distributed throughout the class interval, so that the actual average of the observations in any class is very close to the mid-value of the class.

6. Open end classes : The classification is termed open end classification if the lower limit of the first class or the upper limit of the last class or both are not specified, and such classes in which one of the limits is missing are called open end classes. Examples are classes like marks less than 20 or age below 60 years. As far as possible open end classes should be avoided because in such classes the mid-value cannot be accurately obtained.
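The continuity correction of guideline 4 and the mid-value formula of guideline 5 can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes at least two inclusive classes with the same gap d between every pair of consecutive classes.

```python
def class_boundaries(classes):
    """Convert inclusive class limits to class boundaries using the d/2 correction factor."""
    # d = gap between the upper limit of the first class and the lower limit of the second
    d = classes[1][0] - classes[0][1]
    half = d / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def mid_value(lower, upper):
    """Mid-value (class mark) = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

# Inclusive classes from the marks example in the text
marks = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
for (lower, upper), (lb, ub) in zip(marks, class_boundaries(marks)):
    print(f"{lower}-{upper}: boundaries {lb}-{ub}, mid-value {mid_value(lower, upper)}")
```

Note that the mid-value is the same whether it is computed from the class limits or from the class boundaries, since the correction is added at one end and subtracted at the other.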
But if open end classes are inevitable, then it is customary to estimate the class mark or mid-value for the first class with reference to the succeeding class. In other words, we assume that the magnitude of the first class is the same as that of the second class.

Example 1 : Construct a frequency distribution from the following data by the inclusive method, taking 4 as the class interval :

10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 19 23 31

Solution : Because the minimum value of the variable is 10, which is a very convenient figure for the lower limit of the first class, and the magnitude of the class interval is given to be 4, the classes for preparing the frequency distribution by the inclusive method will be 10–13, 14–17, 18–21, 22–25, ..., 38–41.

Frequency Distribution
Class Interval    Tally Bars     Frequency (f)
10 – 13           ||||/          5
14 – 17           ||||/ |||      8
18 – 21           ||||/ |||      8
22 – 25           ||||/ ||       7
26 – 29           ||||/          5
30 – 33           ||||           4
34 – 37           ||             2
38 – 41           |              1

Example 2 : Prepare a statistical table from the following :

Weekly wages (Rs.) of 100 workers of Factory A
 88  23  27  28  86  96  94  93  86  99
 82  24  24  55  88  99  55  86  82  36
 96  39  26  54  87 100  56  84  83  46
102  48  27  26  29 100  59  83  84  48
104  46  30  29  40 101  60  89  46  49
106  33  36  30  40 103  70  90  49  50
104  36  37  40  40 106  72  94  50  60
 24  39  49  46  66 107  76  96  46  67
 26  78  50  44  43  46  79  99  36  68
 29  67  56  99  93  48  80 102  32  51

Solution : The lowest value is 23 and the highest is 107. The difference between the lowest and highest values is 84. If we take a class interval of 10, nine classes would be made. The first class should be taken as 20–30 instead of 23–33, as per the guidelines of classification.

Frequency Distribution of the Wages of 100 Workers
Wages (Rs.)    Tally Bars                 Frequency (f)
20 – 30        ||||/ ||||/ |||            13
30 – 40        ||||/ ||||/ |              11
40 – 50        ||||/ ||||/ ||||/ |||      18
50 – 60        ||||/ ||||/                10
60 – 70        ||||/ |                    6
70 – 80        ||||/                      5
80 – 90        ||||/ ||||/ ||||           14
90 – 100       ||||/ ||||/ ||             12
100 – 110      ||||/ ||||/ |              11
Total                                     100

1.5 Graphs
The guiding principles for the graphic representation of frequency distributions are precisely the same as for the diagrammatic and graphic representation of other types of data. The information contained in a frequency distribution can be shown in graphs which reveal important characteristics and relationships that are not easily discernible on a simple examination of the frequency tables. The most commonly used graphs for charting a frequency distribution for the general understanding of the details of the data are :
1. Histogram
2. Frequency polygon
3. Smoothed frequency curves/Frequency Curves
4. Ogives or cumulative frequency curves.

1.5.1 Histogram
The term ‘histogram’ must not be confused with the term ‘historigram’, which relates to time charts. The histogram is the best way of presenting graphically a simple frequency distribution. Statistically, a histogram is a graph that represents the class frequencies in a frequency distribution by vertical adjacent rectangles. While constructing a histogram, the variable is always taken on the X-axis and the corresponding frequencies on the Y-axis. Each class is then represented by a distance on the scale that is proportional to its class-interval. The distance for each rectangle on the X-axis remains the same if the class-intervals are uniform throughout; if they are different, the width of the rectangles changes proportionately. The Y-axis represents the frequencies of each class, which constitute the height of its rectangle.
We get a series of rectangles, each having a class-interval distance as its width and the frequency distance as its height. The area of the histogram represents the total frequency. The histogram should be clearly distinguished from a bar diagram. A bar diagram is one-dimensional, i.e., only the length of the bar is important and not the width, whereas a histogram is two-dimensional, that is, both the length and the width are important. However, a histogram can be misleading if the distribution has unequal class-intervals and suitable adjustments in frequencies are not made. The technique of constructing a histogram is explained for :
(i) distributions having equal class-intervals, and
(ii) distributions having unequal class-intervals.

When class-intervals are equal, take frequency on the Y-axis and the variable on the X-axis, and construct rectangles. In such a case the heights of the rectangles will be proportional to the frequencies.

Example 3 : Draw a histogram from the following data :

Classes      Frequency
0 – 10       5
10 – 20      11
20 – 30      19
30 – 40      21
40 – 50      16
50 – 60      10
60 – 70      8
70 – 80      6
80 – 90      3
90 – 100     1

Solution:
[Figure: Histogram of the above data]

When class-intervals are unequal, the frequencies must be adjusted before constructing a histogram. We take the class which has the smallest class-interval and adjust the frequencies of the other classes accordingly. If one class-interval is twice as wide as the smallest class-interval, we divide the height of its rectangle by two; if it is three times as wide, we divide it by three, and so on. The heights will then be proportional to the ratios of the frequencies to the widths of the classes.

Example 4 : Represent the following data on a histogram. The average monthly income of 1035 employees in a construction industry is given below:

Monthly Income (Rs.)    No. of Workers
600 – 700               25
700 – 800               100
800 – 900               150
900 – 1000              200
1000 – 1200             240
1200 – 1400             160
1400 – 1500             50
1500 – 1800             90
1800 or more            20

Solution :
[Figure: Histogram showing monthly incomes of workers. X-axis: monthly income; Y-axis: number of workers]

When mid-points are given, we first ascertain the upper and lower limits of each class and then construct the histogram in the same manner.

Example 5 : Draw a histogram of the following distribution :

Life of Electric Lamps in hours    Firm A    Firm B
1010                               10        287
1030                               130       105
1050                               482       26
1070                               360       230
1090                               18        352

Solution : Since we are given the mid-points, we should ascertain the class limits. To calculate the class limits of the various classes, take the difference of two consecutive mid-points and divide it by 2, then subtract the value obtained from each mid-point to get the lower class limit and add it to get the upper class limit.

Life of Electric Lamps    Frequency Firm A    Frequency Firm B
1000–1020                 10                  287
1020–1040                 130                 105
1040–1060                 482                 76
1060–1080                 360                 230
1080–1100                 18                  352

[Figure: Histograms of the above distribution. X-axis: life of lamps; Y-axis: frequency]

1.5.2 Frequency Polygon
This is a graph of a frequency distribution which has more than four sides. It is particularly effective in comparing two or more frequency distributions. There are two ways of constructing a frequency polygon.
(i) We may draw a histogram of the given data and then join by straight lines the mid-points of the upper horizontal side of each rectangle with the adjacent ones.
The figure so formed is the frequency polygon. Both ends of the polygon should be extended to the base line in order to make the area under the frequency polygon equal to the area under the histogram.

[Figure: Frequency polygon drawn over a histogram. X-axis: class mark; Y-axis: number of students (frequency)]

(ii) Another method of constructing a frequency polygon is to take the mid-points of the various class-intervals, plot the frequency corresponding to each point and join all these points by straight lines. The figure obtained by both methods would be the same.

[Figure: Frequency polygon drawn from class marks. X-axis: class mark; Y-axis: number of students (frequency)]

The frequency polygon has an advantage over the histogram. The frequency polygons of several distributions can be drawn on the same axes, which makes comparisons possible, whereas histograms cannot be usefully employed in the same way. To compare histograms we must draw them on separate graphs.

1.5.3 Smoothed Frequency Curve/Frequency Curves
A smoothed frequency curve is popularly known as a frequency curve. A smoothed frequency curve can be drawn through the various points of the polygon. The curve is drawn freehand in such a manner that the area included under the curve is approximately the same as that of the polygon. The object of drawing a smoothed curve is to eliminate as far as possible all accidental variations which exist in the original data. While smoothening, the top of the curve would overtop the highest point of the polygon, particularly when the magnitude of the class interval is large. The curve should look as regular as possible and all sudden turns should be avoided.
The extent of smoothening would depend upon the nature of the data. For drawing a smoothed frequency curve it is necessary to first draw the polygon and then smoothen it. We must keep in mind the following points when smoothening a frequency graph :
(i) Only frequency distributions based on samples should be smoothened.
(ii) Only continuous series should be smoothened.
(iii) The total area under the curve should be equal to the area under the histogram or polygon.
The diagram given below will illustrate the point:

[Figure: Histogram, frequency polygon and frequency curve. X-axis: length of leaves (cm); Y-axis: number of leaves]

1.5.4 Cumulative Frequency Curves or Ogives
We have discussed the charting of simple distributions where each frequency refers to the measurement of the class-interval against which it is placed. Sometimes it becomes necessary to know the number of items whose values are greater or less than a certain amount. We may, for example, be interested in knowing the number of students whose weight is less than 65 lbs. or more than say 15.5 lbs. To get this information, it is necessary to change the form of the frequency distribution from a simple to a cumulative distribution. In a cumulative frequency distribution, the frequency of each class is made to include the frequencies of all the lower or all the upper classes, depending upon the manner in which cumulation is done. The graph of such a distribution is called a cumulative frequency curve or an ogive.
There are two methods of constructing ogives, namely :
(i) Less than method, and
(ii) More than method.
In the less than method, we start with the upper limit of each class and go on adding the frequencies. When these frequencies are plotted we get a rising curve.
In the more than method, we start with the lower limit of each class and subtract the frequency of each class from the total frequency. When these frequencies are plotted, we get a declining curve.

The following example illustrates both types of ogives.

Example 6 : Draw ogives by both the methods from the following data.

Distribution of weight of the students of a college (lbs.)

Weights          No. of Students
90.5–100.5         5
100.5–110.5       34
110.5–120.5      139
120.5–130.5      300
130.5–140.5      367
140.5–150.5      319
150.5–160.5      205
160.5–170.5       76
170.5–180.5       43
180.5–190.5       16
190.5–200.5        3
200.5–210.5        4
210.5–220.5        3
220.5–230.5        1

Solution : First of all we shall find out the cumulative frequencies of the given data by the less than method.

Less than (Weights)    Cumulative frequency
100.5                     5
110.5                    39
120.5                   178
130.5                   478
140.5                   845
150.5                  1164
160.5                  1369
170.5                  1445
180.5                  1488
190.5                  1504
200.5                  1507
210.5                  1511
220.5                  1514
230.5                  1515

Plot these frequencies and weights on a graph paper. The curve formed is called an Ogive.

[Figure: ogive by less than method. X-axis: sizes, 90.5 to 230.5; Y-axis: cumulative frequency, 0 to 1500.]

Now we calculate the cumulative frequencies of the given data by the more than method.

More than (Weights)    Cumulative frequencies
90.5                   1515
100.5                  1510
110.5                  1476
120.5                  1337
130.5                  1037
140.5                   670
150.5                   351
160.5                   146
170.5                    70
180.5                    27
190.5                    11
200.5                     8
210.5                     4
220.5                     1

By plotting these frequencies on a graph paper, we will get a declining curve which will be our cumulative frequency curve or Ogive by More than method.
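
The two cumulative frequency tables of Example 6 can also be reproduced programmatically. The following is only a minimal sketch in Python; the function names less_than_cumulative and more_than_cumulative are illustrative choices and do not come from the text. It adds up the class frequencies for the less than method and subtracts the frequencies of all lower classes from the total for the more than method.

```python
from itertools import accumulate

# Class boundaries and frequencies copied from Example 6.
boundaries = [90.5 + 10 * i for i in range(15)]   # 90.5, 100.5, ..., 230.5
freqs = [5, 34, 139, 300, 367, 319, 205, 76, 43, 16, 3, 4, 3, 1]


def less_than_cumulative(freqs):
    """Running totals; entry k belongs to the upper limit of class k."""
    return list(accumulate(freqs))


def more_than_cumulative(freqs):
    """Total frequency minus the frequencies of all lower classes;
    entry k belongs to the lower limit of class k."""
    total = sum(freqs)
    below = [0] + list(accumulate(freqs))[:-1]
    return [total - b for b in below]


less_than = less_than_cumulative(freqs)
more_than = more_than_cumulative(freqs)

for upper, cf in zip(boundaries[1:], less_than):
    print(f"less than {upper:5.1f}: {cf}")
for lower, cf in zip(boundaries[:-1], more_than):
    print(f"more than {lower:5.1f}: {cf}")
```

Plotting less_than against the upper class limits gives the rising ogive, and plotting more_than against the lower class limits gives the declining ogive.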
[Figure: ogive by more than method. X-axis: sizes, 90.5 to 230.5; Y-axis: cumulative frequency, 0 to 1500.]

Although graphs are a powerful and effective medium for presenting statistical data, they are not under all circumstances and for all purposes complete substitutes for tabular and other forms of presentation. The specialist in this field is one who recognizes not only the advantages but also the limitations of these techniques. He knows when to use and when not to use these methods and, from his experience and expertise, is able to select the most appropriate method for every purpose.

Example 7 : Draw an ogive by the less than method and determine the number of companies getting profits between Rs. 45 crores and Rs. 75 crores:

Profits (Rs. crores)    No. of Companies
10–20                    8
20–30                   12
30–40                   20
40–50                   24
50–60                   15
60–70                   10
70–80                    7
80–90                    3
90–100                   1

Solution :

Profit (Rs. crores)     No. of Companies
Less than 20              8
Less than 30             20
Less than 40             40
Less than 50             64
Less than 60             79
Less than 70             89
Less than 80             96
Less than 90             99
Less than 100           100

[Figure: Ogive by Less Than Method. X-axis: profit (Rs. in crores), 20 to 100; Y-axis: no. of companies, 20 to 100. The readings 51 (at Rs. 45 crores) and 92 (at Rs. 75 crores) are marked, with 92 – 51 = 41.]

It is clear from the graph that the number of companies getting profits less than Rs. 75 crores is 92 and the number of companies getting profits less than Rs. 45 crores is 51. Hence the number of companies getting profits between Rs. 45 crores and Rs. 75 crores is 92 – 51 = 41.

Example 8 : The following distribution is with regard to weight in grams of mangoes of a given variety.
If mangoes of weight less than 443 grams are considered unsuitable for the foreign market, what is the percentage of total yield suitable for it? Assume the given frequency distribution to be typical of the variety:

Weight in gms.    No. of Mangoes
410–419           10
420–429           20
430–439           42
440–449           54
450–459           45
460–469           18
470–479            7

Draw an ogive of 'more than' type for the above data and deduce how many mangoes will weigh more than 443 grams.

Solution : Mangoes weighing more than 443 gms. are suitable for the foreign market. The mangoes weighing more than 443 gms. lie in the last four classes. The number of mangoes weighing between 444 and 449 grams would be:

$\frac{6}{10} \times 54 = \frac{324}{10} = 32.4$

Total number of mangoes weighing more than 443 gms. = 32.4 + 45 + 18 + 7 = 102.4

Percentage of mangoes $= \frac{102.4}{196} \times 100 \approx 52.24$

Therefore, the percentage of the total mangoes suitable for the foreign market is about 52.24.

OGIVE BY MORE THAN METHOD

Weight more than (gms.)    No. of Mangoes
410                        196
420                        186
430                        166
440                        124
450                         70
460                         25
470                          7

[Figure: Ogive by More Than Method. X-axis: weight in grams, 410 to 470; Y-axis: no. of mangoes, 20 to 200.]

From the graph it can be seen that there are 103 mangoes whose weight will be more than 443 gms. and which are suitable for the foreign market.

1.6 Summary

A frequency distribution aims to reduce the size of the given set of data for better comprehension. An array, which is an arrangement of data in ascending or descending order of magnitude, is a useful step in preparing a frequency distribution. To prepare a frequency distribution, we have to decide on the class intervals to be taken. The width of the class intervals depends on the number of classes. The number of classes should be neither very small nor very large. The given values are considered one by one and placed in the appropriate class intervals.
The number of observations in each class is called the class frequency. The class intervals may be overlapping (exclusive), like 10–20, 20–30, etc., or inclusive, like 10–19, 20–29, etc. Inclusive class intervals should be transformed into exclusive classes, depending on the way the given data are recorded. Class mid-points are the points that lie halfway between the two class limits. The frequencies of a distribution can also be cumulated in ascending or descending order. They are known as ACF and DCF respectively. The ACF are 'less than' cumulative frequencies while the DCF are 'more than' cumulative frequencies. Absolute class frequencies may also be expressed as relative frequencies, either as proportions or percentages. A frequency distribution may have class intervals of equal or unequal width. A frequency distribution may be shown graphically by a histogram and a frequency polygon. A histogram consists of bars drawn over class limits, with heights such that the areas of the bars are proportional to the frequencies of the various class intervals. A frequency polygon is a line chart and is drawn by joining the points given by the class mid-points and class frequencies. Cumulative frequencies are shown graphically by means of ogives.

1.7 Self-Assessment Questions

Exercise 1 : True or False Statements

(i) Before constructing a frequency distribution, it is necessary that the data be arranged as an array.
(ii) If the class intervals are given in the exclusive form as 10–20, 20–30, etc., then a value exactly equal to 20 may be included in either of these classes.
(iii) In the case of inclusive class intervals, the class mid-points are determined only after converting them into exclusive form.
(iv) The number and width of class intervals are determined independently of each other.
(v) A frequency distribution must have all class intervals of equal width.
(vi) A distribution can have both ends open.
(vii) A bivariate frequency distribution can be prepared only when both the variables involved are discrete or are continuous.
(viii) Relative frequencies are obtained by dividing the frequencies of various classes by the width of the respective classes.
(ix) Frequency density is another name for relative frequency.
(x) The proportionate frequencies facilitate comparison between distributions better than absolute frequencies.
(xi) It is never possible to calculate absolute frequencies from the proportionate frequencies for a distribution.
(xii) In presenting a distribution graphically, the variable is shown horizontally while the frequencies are shown vertically.
(xiii) It is necessary that the widths of bars representing various class intervals of a frequency distribution be always equal.
(xiv) The areas covered by a histogram and a frequency polygon are equal.
(xv) Strictly speaking, a histogram cannot be drawn for an open-ended distribution.

Ans. (i) F (ii) F (iii) T (iv) F (v) F (vi) T (vii) F (viii) F (ix) F (x) T (xi) F (xii) T (xiii) F (xiv) T (xv) T

Exercise 2 : Questions and Answers

(i) What is a frequency distribution? Explain the process of preparing a univariate frequency distribution.
(ii) Explain the following:
    (a) Grouping error
    (b) Cumulative frequencies
    (c) Relative frequencies
    (d) Frequency density
(iii) What is a bivariate frequency distribution? How is it constructed? Can we prepare a bivariate frequency distribution if one of the variables is discrete and the other is continuous?
(iv) Explain the drawing of histogram when class intervals are equal and when they are not equal.
(v) What are ogives? How are they constructed and what information do they provide?
(vi) From the time cards of a factory, the following information has been obtained about the number of days each one of the 48 workers has reported late for work during the last month:

    3 0 5 0 6 2 1 0 4 6 5 2
    1 1 1 3 4 2 2 5 6 3 0 2
    2 3 2 5 4 2 4 3 5 2 2 2
    4 6 4 0 3 1 1 4 5 2 1 1

Prepare a frequency distribution using this information. Also, indicate percentage frequencies.
(vii) XYZ Company collected data regarding the number of interviews required for each of its 40 sales persons to make their most recent sale. Following are those numbers:

    102  95  90  90 101  60  80 113 102 110
    126  66 121 116 139  72 101  93 114  99
    112 105  97 100  99 115 129 111 119  81
     91  93 119 113 128 110  75  87 107 108

    (a) Construct a frequency distribution with six class intervals.
    (b) Construct a histogram from the data.
(viii) If the class mid-points in a frequency distribution of weights of a group of students are 125, 132, 139, 146, 153, 160, 167, 174 and 181 lbs., find:
    (a) Size of the class interval.
    (b) Class limits, assuming weights have been measured to the nearest pound.
(ix) Convert the following class intervals into exclusive form:

    (a) Diameters (in cm)    (b) Age in years    (c) Height in inches
    0.5–0.9                  5–9                 60–64
    1.0–1.4                  10–14               65–69
    1.5–1.9                  15–24               70–74
    2.0–2.4                  25–39               75–79
    2.5–2.9                  40–59               80–84
    3.0–3.4                  60–79               85–89

(x) The monthly profits earned by 100 companies during the last financial year are given below:

    Monthly Profit (Rs. lakhs)    No. of Companies
    20–30                          4
    30–40                          8
    40–50                         18
    50–60                         30
    60–70                         15
    70–80                         10
    80–90                          8
    90–100                         7

    (a) Draw an ogive by 'less than' method and 'more than' method.
    (b) Obtain the limits of monthly profits of the central 50 per cent of the companies and check these values against the formula calculated values.
(xi) The salary distribution of employees of a company is given below:

    Salary (in '000 Rs.)    No. of Employees
    8–10                    18
    10–12                   32
    12–14                   70
    14–16                   88
    16–18                   64
    18–20                   44
    20–22                   24
    22–24                   10

    (a) Show these data by means of a histogram and frequency polygon on the same graph.
    (b) Draw a more-than ogive and using it estimate (i) the number of employees earning more than Rs. 16,500; and (ii) the number of employees earning less than Rs. 13,000.
(xii) The following table gives the distribution of weekly income of 160 families:

    Weekly Income (Rs.)    No. of Families
    2,000–4,000            20
    4,000–6,000            40
    6,000–8,000            50
    8,000–12,000           32
    12,000–16,000          16
    16,000–20,000           2

Draw a 'less than' ogive and answer the following from it:
    (a) What are the limits within which incomes of the middle 50 per cent of the families lie?
    (b) It is decided that 80 per cent of the families should pay income tax. What is the minimum taxable income?
    (c) What is the minimum income of the richest 30 per cent of the families?

Ans. (x) (b) = 47 and 70 (xi) (b) = 126 and 86 (xii) (a) = 5000 – 9250 (b) = 4600 (c) = 8250

LESSON 2
Measures of Central Tendency - Mathematical and Positional Averages

STRUCTURE
2.1 Learning Objectives
2.2 What is Central Tendency?
2.3 Objectives of Central Tendency
2.4 Characteristics
2.5 Types of Averages
2.6 Mean
2.7 Geometric Mean
2.8 Harmonic Mean
2.9 Median
2.10 Other Positional Averages
2.11 Calculation of Missing Frequencies
2.12 Mode
2.13 Summary
2.14 Self-Assessment Questions

2.1 Learning Objectives

After reading this lesson, you should be able to:
Learn the meaning of central tendency and other averages.
Learn the process of computing arithmetic mean, weighted mean, harmonic mean, geometric mean, median, deciles, quartiles, percentiles and mode under different situations.
Comprehend mathematical properties of the arithmetic average.
Learn specific uses of different averages.

2.2 What is Central Tendency?

One of the important objectives of statistics is to find various numerical values which explain the inherent characteristics of a frequency distribution. The first of such measures are averages. Averages are the measures which condense a huge, unwieldy set of numerical data into single numerical values which represent the entire distribution. The inherent inability of the human mind to grasp a large body of numerical data compels us to look for a few constants that will describe the data. Averages provide us the gist and give a bird's eye view of the huge mass of unwieldy numerical data. Averages are the typical values around which the other items of the distribution congregate. These values lie between the two extreme observations of the distribution and give us an idea about the concentration of the values in the central part of the distribution. They are called the measures of central tendency. Averages are also called measures of location since they enable us to locate the position or place of the distribution in question.
Averages are statistical constants which enable us to comprehend in a single value the significance of the whole. According to Croxton and Cowden, an average value is a single value within the range of the data that is used to represent all the values in that series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value. An average, known as the measure of central tendency, is the most typical representative item of the group to which it belongs and is capable of revealing all important characteristics of that group or distribution.

2.3 Objectives of Central Tendency

The most important object of calculating an average or measuring central tendency is to determine a single figure which may be used to represent a whole series involving magnitudes of the same variable.

The second object is that, since an average represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared with the average performance of a different group.

The third object is that an average helps in computing various other statistical measures such as dispersion, skewness, kurtosis, etc.

2.4 Characteristics

Since an average represents the statistical data and is used for purposes of comparison, it must possess the following properties.

1. It must be rigidly defined and not left to the mere estimation of the observer. If the definition is rigid, the computed value of the average obtained by different persons shall be similar.
2. The average must be based upon all values given in the distribution. If the average is not based on all values it might not be representative of the entire group of data.
3. It should be easily understood. The average should possess simple and obvious properties.
It should not be too abstract for the common people.
4. It should be capable of being calculated with reasonable care and rapidity.
5. It should be stable and unaffected by sampling fluctuations.
6. It should be capable of further algebraic manipulation.

2.5 Types of Averages

Different methods of measuring "Central Tendency" provide us with different kinds of averages. The following are the main types of averages that are commonly used:

(A) Mean
    (i) Arithmetic mean
    (ii) Weighted mean
    (iii) Geometric mean
    (iv) Harmonic mean
(B) Median
(C) Mode

2.6 Mean

2.6.1 Arithmetic Mean

The arithmetic mean of a series is the quotient obtained by dividing the sum of the values by the number of items. In algebraic language, if $X_1, X_2, X_3, \dots, X_n$ are the n values of a variate X, then the Arithmetic Mean ($\bar{X}$) is defined by the following formula:

$\bar{X} = \frac{1}{n}(X_1 + X_2 + X_3 + \dots + X_n) = \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right) = \frac{\sum X}{N}$

Example 1 : The following are the monthly salaries (Rs.) of ten employees in an office. Calculate the mean salary of the employees: 250, 275, 265, 280, 400, 490, 670, 890, 1100, 1250.

Solution :

$\bar{X} = \frac{\sum X}{N}$

$\bar{X} = \frac{250 + 275 + 265 + 280 + 400 + 490 + 670 + 890 + 1100 + 1250}{10} = \frac{5870}{10} = \text{Rs. } 587$

Short-cut Method : The direct method is suitable where the number of items is moderate and the figures are small integers. But if the number of items is large and/or the values of the variate are big, then adding together all the values may be a lengthy process. To overcome this difficulty of computation, a short-cut method may be used. The short-cut method of computation is based on an important characteristic of the arithmetic mean, namely that the algebraic sum of the deviations of a series of individual observations from their mean is always equal to zero.
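
The direct computation of Example 1 and the zero-sum property of deviations stated above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name arithmetic_mean is an assumption made here, and exact fractions are used so that the deviations sum to exactly zero without rounding error.

```python
from fractions import Fraction

# Monthly salaries (Rs.) of the ten employees in Example 1.
salaries = [250, 275, 265, 280, 400, 490, 670, 890, 1100, 1250]


def arithmetic_mean(values):
    """Direct method: sum of the values divided by the number of items."""
    return Fraction(sum(values), len(values))


mean = arithmetic_mean(salaries)

# Deviations of each observation from the mean.
deviations = [x - mean for x in salaries]

print("Mean salary (Rs.):", mean)
print("Sum of deviations from the mean:", sum(deviations))
```

The second printed value is the algebraic sum of the deviations from the mean, which is the property the short-cut method relies on.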
Thus the deviations of the various values of the variate from an assumed mean are computed and their sum is divided by the number of items. The quotient obtained is added to the assumed mean to find the arithmetic mean.

Symbolically, $\bar{X} = A + \frac{\sum dx}{N}$, where A is the assumed mean and dx = (X – A) are the deviations from it.

We can solve the previous example by the short-cut method.

Computation of Arithmetic Mean

Serial Number    Salary (Rupees) X    Deviations from assumed mean, dx = (X – A), A = 400
1.                250                 -150
2.                275                 -125
3.                265                 -135
4.                280                 -120
5.                400                    0
6.                490                  +90
7.                670                 +270
8.                890                 +490
9.               1100                 +700
10.              1250                 +850