Business Statistics PDF - Master in Management Studies, Semester-I

Document Details

University of Mumbai

2022

Tags

business statistics management studies MBA statistics

Summary

This document is the Master in Management Studies (Semester I) course material for Business Statistics, published by the Institute of Distance and Open Learning, University of Mumbai. It covers fundamental statistical concepts and techniques, from data collection and presentation through probability and sampling to hypothesis testing.

Full Transcript

MASTER IN MANAGEMENT STUDIES
SEMESTER - I
BUSINESS STATISTICS
SUBJECT CODE : UMMSI.3

© UNIVERSITY OF MUMBAI

Prof. Suhas Pednekar, Vice-Chancellor, University of Mumbai
Prof. Ravindra D. Kulkarni, Pro Vice-Chancellor, University of Mumbai
Prof. Prakash Mahanwar, Director, IDOL, University of Mumbai

Program Co-ordinator : Ms. Rajashree Pandit, Assistant Professor (Economics), Head, Faculty of Commerce & Management, IDOL, University of Mumbai, Mumbai.
Course Co-ordinator : Ms. Anitha Menon, Assistant Prof., IDOL, University of Mumbai, Mumbai
Course Writers : Dr. C.V. Joshi, Visiting Faculty, Alkesh Dinesh Mody Institute (ADMI), University of Mumbai, Mumbai

January 2022, Print 1
Published by : Director Incharge, Institute of Distance and Open Learning, University of Mumbai, Vidyanagari, Mumbai - 400 098.
DTP Composed, Printed by : Mumbai University Press, Vidyanagari, Santacruz (E), Mumbai; Ipin Enterprises, Tantia Jogani Industrial Estate, Unit No. 2, Ground Floor, Sitaram Mill Compound, J.R. Boricha Marg, Mumbai - 400 011

CONTENTS

Unit No. Title
1. Introduction to Statistics
2. Data Collection and Presentation
3. Measures of Dispersion
4. Permutations and Combinations
5. Probability
6. Random Variable and its Probability Distribution
7. Introduction to Sampling and Reasons for Sampling
8. Sampling Types and Methods - Random Sampling & Non-Random Sampling
9. Testing of Hypothesis - One Sample
10. Testing of Hypothesis - Two Samples (Related and Independent)

Masters in Management Studies Semester - I
Business Statistics Syllabus

Semester : I - Core
Title of the Subject / course : Business Statistics
Course Code :
Credits : 4
Duration in Hrs. :

Learning Objectives
1 To know statistical techniques
2 To understand different statistical tools
3 To understand importance of decision support provided by analysis techniques
4 To appreciate and apply it in business situations using caselets, modeling, cases and projects
5 To understand Managerial applications of Statistics

Prerequisites if any : Basic Mathematics
Connections with Subjects in the current or Future courses : Operations Research, Economics, Research Methodology, Quantitative Techniques, Project Management, Financial Management, Production and Operations Management

Module
1. Content: Revision of Data Representation, Central Tendency, Dispersion, Kurtosis and Skewness. Activity: Problem solving, cases demonstrating typical uses of mean, mode and median, use of Microsoft Excel, available software. Learning outcomes: Learner will be able to apply these basic concepts in business situations and analyse charts and graphs to analyse business situations.
2. Content: Probability - Axioms, Addition and Multiplication rule, Types of probability, Independence of probability events, probability tree, Bayes' Theorem. Activity: Solving problems and caselets, writing short cases. Learning outcomes: Understand the uncertainty in business situations.
3. Content: Concept of Random variable, Probability distribution, Expected value and variance of random variable, conditional expectation, Classical News Paper boys problem (EMV, EVPI). Activity: Problem solving, creating decision tree, cases. Learning outcomes: Understand decision under risk, use of conditional expectation as basis for comparison.
4. Content: Probability distributions - Binomial, Poisson, Normal. Activity: Problem solving, Microsoft Excel, cases. Learning outcomes: Use of distributions in Quality control, Six sigma and process control.
5. Content: Sampling distribution. Activity: Problem solving, Microsoft Excel. Learning outcomes: Importance of Central limit theorem.
6. Content: Estimation - Point estimation, Interval estimation. Activity: Problem solving, Microsoft Excel. Learning outcomes: Understand Confidence interval as way of hypothesis testing.
7. Content: Hypothesis testing - Student's t, Chi square, Z. Activity: Problem solving, Microsoft Excel, cases. Learning outcomes: Use in research.
8. Content: Analysis of variance - one way, two way. Activity: Problem solving, Microsoft Excel, cases. Learning outcomes: Use in research.

Text books
1 Statistics for Management - Richard Levin, David Rubin, Prentice Hall of India
2 Statistics for Managers - Levine, Stephan, Krehbiel, Berenson, Pearson Education
3 Complete Business Statistics - Aczel, Sounderpandian, Tata McGraw Hill

Reference books
1 Statistics for Business and Economics - Newbold, Carlson, Thorne, Pearson Education
2 Statistics for Business and Economics - Anderson, Sweeney, Williams, Cengage Learning
3 Data Analysis and Decision Making - Albright, Winston, Zappe, Thomson

Assessment : Internal 40%, Semester end 60%

1 INTRODUCTION TO STATISTICS

Unit Structure
1.1 Meaning
1.2 Statistical Methods
1.3 Importance of Statistics
1.4 Functions of Statistics
1.5 Limitations of Statistics
1.6 Branches in Statistics
1.7 Characteristics of Statistics
1.8 Basic Definitions in Statistics
1.9 Exercise

1.1 MEANING

The word Statistics describes several concepts of importance to the decision-maker. It is important for a beginner to understand these different concepts.

STATISTICAL METHODS V/S EXPERIMENTAL METHODS

We try to gain knowledge of any phenomenon through direct experiment. Many factors may affect a phenomenon simultaneously. If we want to study the effect of a particular factor, we keep the other factors fixed and study the effect of that one factor alone. This is possible in all exact sciences like Physics, Chemistry etc. The method cannot be used in many sciences where the factors cannot all be isolated and controlled. This difficulty is particularly encountered in the social sciences, where we deal with human beings. No two persons are exactly alike. Besides, the environment also changes and has its effect on every human being, so it is not possible to study one factor while keeping other conditions fixed. Here we use statistical methods.
The results obtained by the use of this science will not be as accurate as those obtained by experimental methods. Even then they are of much use and have a very important role to play in the modern world. Even in the exact sciences some statistical methods are used.

The word Statistics is derived from the Latin word "status", which means a political state. The word was originally applied only to such facts and figures as were required by the state for official purposes. The earliest form of statistical data relates to the census of population and property, though the collection of data for other purposes was not completely ruled out. The word has now acquired a wider meaning.

STATISTICS IN PLURAL

Statistics in the plural refers to any set of data or information. The president of a company may call for "statistics on the sales of the northern region", or an MP may quote the statistics on the price rise in agricultural products. More familiar examples for students are the marks of students in a class or the ages of children in a primary school.

Prof. Secrist defines the word "Statistics" in the first sense as follows: "By Statistics we mean aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."

This definition gives all the characteristics of Statistics:

i. Aggregate of facts: A single isolated figure is not "Statistics". The marks of one student in one subject will not be called Statistics. But if we consider the marks of all the students in the class in a particular subject, they will be called "Statistics".
ii. Affected by multiplicity of causes: There are various causes for the variation in the data. The marks of the students depend upon the intelligence of the students, their capacity and desire to work etc.
iii. Numerically expressed: Unless the characteristics have some numerical measurement they will not be called Statistics. The statement "A student writes very good English" is not Statistics. But if the marks of the whole class in English are given, they will be called "Statistics".
iv. Enumerated or estimated according to reasonable standards of accuracy: However much a person tries, it is not possible to attain perfect accuracy, whether we actually measure or estimate the characteristic. But a certain standard of accuracy should be set according to the problem under consideration. The estimate for the cost of a big project may be correct to the nearest Rs. 1,000, but for household expenses it should be correct to the nearest rupee.
v. Collected in a systematic manner: There should be a method in the manner of collection; only then will the figures be reliable and useful.
vi. Collected for a predetermined purpose: Unless we know the purpose, the data collected may not be sufficient. Besides, some unnecessary information may be collected, which will be a waste of time and money.
vii. Placed in relation to each other: We collect Statistics only when we want to compare characteristics that have some relation with each other. The wages of fathers and the ages of sons should not be collected together. But we can collect the ages and heights of a group of persons, so that we can find the relation between the two.

1.2 STATISTICAL METHODS

The word Statistics used in the second sense means the set of techniques and principles for dealing with data.

1. Suppose you have data about the production, profits and sales of a company for a number of years. Statistics in this sense is concerned with questions such as: (i) What is the best way to present these data for review? (ii) What processing is required to reveal more details about the data? (iii) What ratios should be obtained and reported?
2. A public agency wants to estimate the number of fish in a lake.
Five hundred fish are captured in a net, tagged and returned to the lake. One week later 1,000 fish are captured from the same lake in nets and 40 are found to be tagged. Here Statistics in this second sense deals with questions such as: (i) What is a good estimate of the number of fish in the lake? (ii) What is our confidence in it and how much error can be expected? and (iii) Can we have a method which will make a better estimate? (For instance, if the tagged share of the second catch, 40/1,000 = 4%, is taken to match the tagged share in the lake, the 500 tagged fish suggest roughly 500/0.04 = 12,500 fish in all.)

Statisticians have defined Statistics in this sense in various ways. Bowley says, "Statistics may rightly be called the science of averages." But this definition is too narrow: Statistics has many refined techniques and does much more than just average the data. Kendall defines it as, "The branch of scientific methods that deals with the data obtained by counting or measuring the properties of population of natural phenomena." This definition is rather vague and gives no idea of the functions of Statistics. Seligman defines it as, "The science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of inquiry." Croxton, Cowden and Klein define it as the collection, presentation, analysis and interpretation of numerical data.

The last two definitions can be considered proper, since they explain the utility of Statistics. We will briefly examine the four procedures mentioned in the last definition.

Collection: The data may be collected from various published and unpublished sources, or the investigator can collect his own information. Collecting first-hand information is a very difficult task. The usefulness of the data depends to a very great extent upon the manner in which they are collected. Though theoretical knowledge is necessary for the proper collection of data, much can be learnt through experience and observation.

Presentation: The data collected, to be understood, should be presented in a suitable form.
A bare mass of figures signifies nothing to a person and can lead only to confusion. The data are therefore usually presented in tabular form and represented by diagrams.

Analysis: Presentation of data in tabular form is one elementary step in the analysis of the collected data. If we want to compare two series, a typical value for each series has to be calculated. If we want to study some characteristic of a big group, an exhaustive study is not possible. We take a sample, study it and draw inferences on the basis of the sample. Sometimes forecasting is necessary. The management of a firm may be interested in future sales, and for that it has to analyse the past data. We are going to study some of these methods of analysing data in this book.

Interpretation: This is the final step in an investigation. Based upon the analysis of the data, we draw certain conclusions. While drawing these conclusions, we must consider the nature of the original data. Experts in the particular field of activity must make the final interpretation, because statistical methods, unlike experimental methods, are not exact. For interpreting the analysis of data dealing with psychological problems, a psychologist is the right person. (An economist, though well versed in statistical methods, will not be of any use there.)

STATISTICAL MEASURES

Statistics also has a precise technical meaning. Measures derived from sample data are referred to as statistics. If only one measure is obtained it is called a statistic. A magazine takes a sample of 100 readers, and 15 of them are over 30 years of age. The sample proportion of readers over 30 years of age is 0.15. This sample proportion is a statistic obtained by the survey. The weekly sales of a salesman for 5 weeks are Rs. 2,000, Rs. 2,500, Rs. 15,000, Rs. 3,000 and Rs. 1,800. As a measure of the spread of the values, the difference between the largest and the smallest value (called the range) is calculated.
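As a rough illustration of the two measures just described, the short Python sketch below computes the sample proportion and the range from the figures quoted in the text. The variable names are chosen only for this illustration.

```python
# Sample proportion: 15 of the 100 sampled readers are over 30 years of age.
sample_size = 100
readers_over_30 = 15
sample_proportion = readers_over_30 / sample_size

# Range: difference between the largest and the smallest weekly sales (in Rs.).
weekly_sales = [2000, 2500, 15000, 3000, 1800]
sales_range = max(weekly_sales) - min(weekly_sales)

print(sample_proportion)  # 0.15
print(sales_range)        # 13200
```

With these figures the range comes to Rs. 15,000 - Rs. 1,800 = Rs. 13,200.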
This range is a statistic.

1.3 IMPORTANCE OF STATISTICS

Statistics is not studied for its own sake. It is employed as a tool to study problems in various natural and social sciences. The analysis of data is used ultimately for forecasting, controlling and exploring.

Statistics is important because it makes data comprehensible. Without it, the information collected will hardly be useful. To understand the economic condition of any country we must have its different economic aspects quantitatively expressed and properly presented. If we want to compare any two countries, statistics has to be used. For studying the relationship between two phenomena, we have to take the help of statistics, which explains the correlation between the two.

People in business can study past data and forecast the condition of their business, so that they are ready to handle future situations. Nowadays a businessman has to deal with thousands of employees and cannot have direct control over them. Instead, he can judge them all and control their performance using statistical methods, e.g., he can set up certain standards and see whether the final product conforms to them. He can find the average production per worker and see whether anyone is producing less, i.e., is not working properly. Business must be planned properly, and planning, to be fruitful, must be based on the right analysis of complex statistical data. A broker has to study the pattern in the demand for money by his clients, so that he has the correct amount of reserves ready.

Scientific research also uses statistical methods. When new theories are explored, their validity can be tested only by using statistical methods. Even in business many new methods are introduced. Whether they are really an improvement over the previous ones can be tested using statistical techniques.
We can find many more examples in almost all sciences, such as biology, physics, economics and psychology, showing that statistical methods are used in all of them. The point here is that Statistics is not an abstract subject. It is a practical science and it is very important in the modern world.

1.4 FUNCTIONS OF STATISTICS

1. Statistics presents the data in numerical form: Numbers give an exact idea of any phenomenon. We know that India is overpopulated. But only when we see the census figure, 548 million, do we have a real idea of the population problem. If we want to compare the speed of two workmen working in the same factory with the same type of machine, we have to look at the number of units they turn out every day. Facts are convincing only when they are expressed with the help of numbers.

2. It simplifies complex data: The data collected are complex in nature. Just by looking at the figures no person can know the real nature of the problem under consideration. Statistical methods make the data easy to understand. When we have data about the students using the college library, we can divide the students according to the number of hours spent in the library. We can also see how many are studying and how many are sitting there for general reading.

3. It facilitates comparison: We can compare the wage conditions in two factories by comparing their average wages. We can compare the increase in wages with the corresponding increase in the price level during the same period. Such comparisons are very useful in many social sciences.

4. It studies the relationship between two factors: The relationship between two factors, such as height and weight, food habits and health, or smoking and the occurrence of cancer, can be studied using statistical techniques. Once a relationship between two factors is established, we can estimate one factor given the other.

5. It is useful for forecasting: We are interested in forecasting from past data. A shopkeeper may forecast the demand for goods and stock them when they are easily available at a reasonable price. He can stock only the required amount, so that no goods are wasted. A baker estimates the daily demand for bread and bakes only that amount, so that there are no leftovers.

6. It helps the formulation of policies: By analysing the effect of the policies employed so far with statistical methods, future policies can be formulated. The requirements can be studied and policies determined accordingly. The import policy for food can be determined by studying the population figures, food habits etc.

1.5 LIMITATIONS OF STATISTICS

Though Statistics is a very useful tool for the study of almost all types of data, it has certain limitations.

1. It studies only quantitative data: A very serious drawback is that statistics cannot study qualitative data. Statistical methods can be applied only to data expressed in numerical form. Characteristics like beauty, cruelty, honesty or intelligence cannot be studied directly with the help of statistics. In some cases, however, we can relate the characteristic to a number and study it that way. The intelligence of students can be studied through the marks they obtain in various tests; if we take marks as an indicator of intelligence, we can compare students or arrange them in order. The culture of a society, or the lack of it, can be studied by considering the number of charitable institutions, their sizes and the number of crimes.

2. It cannot be used for an individual: The conclusions drawn from statistical data are true for a group of persons. They do not give us any knowledge about an individual.
Though Statistics can estimate the number of machines in a certain factory that will fail after, say, 5 years, it cannot tell exactly which machines will fail. One in 2,000 patients may die in a particular operation. Statistically this proportion is very small and insignificant. But for the person who dies and his family, the loss is total. Statistics shows no sympathy for such a loss.

3. It gives results only on an average: Statistical methods are not exact. The results obtained are true only on an average in the long run. When we say that the average student studies for 2 hours daily, there may not be a single student studying for exactly 2 hours; moreover, the average will not be 2 hours every day. In the long run, if we consider a number of students, the daily average will be 2 hours.

4. The results can be biased: The data collected may sometimes be biased, which makes the whole investigation useless. Even while applying statistical methods the investigator has to be objective. His personal bias may unconsciously lead him to draw conclusions favourable to one side or the other.

5. Statistics can be misused: It is said that statistics can prove or disprove anything. It depends upon how the data are presented. The workers in a factory may accuse the management of not providing proper working conditions by quoting the number of accidents. But the fact may be that most of the staff are inexperienced and therefore meet with accidents. Besides, the number of accidents alone does not tell us much; many of them may be minor. With the help of the same data the management can argue that the working conditions are very good, by comparing them with working conditions in other factories, which may be worse. People using statistics have to be very careful to see that it is not misused.

Thus, it can be seen that Statistics is a very important tool. But its usefulness depends to a great extent upon the user.
If used properly, by an efficient and unbiased statistician, it will prove to be a wonderful tool.

1.6 BRANCHES IN STATISTICS

Statistics may be divided into two main branches:

1. Descriptive Statistics: Descriptive statistics deals with the collection of data, its presentation in various forms such as tables, graphs and diagrams, and the finding of averages and other measures which describe the data. Examples are industrial statistics, population statistics, trade statistics etc. Businessmen, for instance, make use of descriptive statistics in presenting their annual reports, final accounts and bank statements.

2. Inferential Statistics: Inferential statistics deals with the techniques used for analysing data, making estimates and drawing conclusions from limited information taken on a sample basis, and testing the reliability of the estimates. For example, suppose we want to have an idea of the percentage of illiterates in our country. We take a sample from the population and find the proportion of illiterates in the sample. This sample proportion, with the help of probability, enables us to make some inferences about the population proportion. This study belongs to inferential statistics.

1.7 CHARACTERISTICS OF STATISTICS

1. Statistics are aggregates of facts.
2. Statistics are numerically expressed.
3. Statistics are affected to a marked extent by multiplicity of causes.
4. Statistics are enumerated or estimated according to a reasonable standard of accuracy.
5. Statistics are collected for a predetermined purpose.
6. Statistics are collected in a systematic manner.
7. Statistics must be comparable to each other.

1.8 SOME BASIC DEFINITIONS IN STATISTICS

• Constant: A quantity which can assume only one value is called a constant. It is usually denoted by the first letters of the alphabet: a, b, c. For example, the value of π = 3.14159... (approximately 22/7) and the value of e = 2.71828....
• Variable: A quantity which can vary from one individual or object to another is called a variable. It is usually denoted by the last letters of the alphabet: x, y, z. For example, heights and weights of students, income, temperature, number of children in a family etc.
• Continuous Variable: A variable which can assume each and every value within a given range is called a continuous variable. It can take decimal values. For example, heights and weights of students, the speed of a bus, the age of a shopkeeper, the lifetime of a T.V. etc.
• Continuous Data: Data which can be described by a continuous variable are called continuous data. For example, the weights of 50 students in a class.
• Discrete Variable: A variable which can assume only some specific values within a given range is called a discrete variable. It does not take decimal values; it takes whole numbers. For example, the number of students in a class, the number of flowers on a tree, the number of houses in a street, the number of chairs in a room etc.
• Discrete Data: Data which can be described by a discrete variable are called discrete data. For example, the number of students in a college.
• Quantitative Variable: A characteristic which varies only in magnitude from one individual to another is called a quantitative variable. It can be measured. For example, wages, prices, heights, weights etc.
• Qualitative Variable: A characteristic which varies only in quality from one individual to another is called a qualitative variable. It cannot be measured. For example, beauty, marital status, rich, poor, smell etc.

1.9 EXERCISE

1. Explain the meaning of statistics.
2. Give a definition of statistics and discuss it.
3. Explain the functions of statistics.
4. What are the limitations of statistics?
5. Define the term Statistics and discuss its characteristics.
6. Enumerate, with examples, some basic terms of Statistics.
7. Discuss the different branches of Statistics.
2 DATA COLLECTION AND PRESENTATION

Unit Structure :
2.1 Data
2.1.1 Statistical Data
2.1.2 Collection of Data
2.1.3 Types of Data
2.1.4 Methods of Collecting Data
2.2 Classification of Data
2.2.1 Bases of Classification
2.2.2 Types of Classification
2.3 Tabulation of Data
2.3.1 Types of Tabulation
2.4 Frequency Distribution
2.4.1 Construction of Frequency Distribution
2.4.2 Cumulative Frequency Distribution
2.5 Types of Graphs
2.6 Exercise

2.1 DATA

2.1.1 STATISTICAL DATA

A sequence of observations made on a set of objects included in a sample drawn from a population is known as statistical data.

1. Ungrouped Data: Data which have not been arranged in a systematic order are called raw data or ungrouped data.
2. Grouped Data: Data presented in the form of a frequency distribution are called grouped data.

2.1.2 COLLECTION OF DATA

The first step in any enquiry (investigation) is the collection of data. The data may be collected for the whole population or for a sample only; mostly they are collected on a sample basis. Collection of data is a very difficult job. The enumerator or investigator is the well-trained person who collects the statistical data. The respondents (informants) are the persons from whom the information is collected.

2.1.3 TYPES OF DATA

There are two types (sources) for the collection of data:

1. Primary Data: Primary data are first-hand information collected, compiled and published by an organisation for some purpose. They are the most original data in character and have not undergone any sort of statistical treatment. For example, population census reports are primary data because they are collected, compiled and published by the population census organisation.

2. Secondary Data: Secondary data are second-hand information which has already been collected by someone (an organisation) for some purpose and is available for the present study.
Secondary data are not pure in character and have undergone some treatment at least once. For example, the Economic Survey of England is secondary data because the figures are collected by more than one organisation, such as the Bureau of Statistics, the Board of Revenue, the banks etc.

2.1.4 METHODS OF DATA COLLECTION

A) METHODS OF COLLECTING PRIMARY DATA

Primary data are collected by the following methods:

1. Personal Investigation: The researcher conducts the survey himself or herself and collects the data from it. Data collected in this way are usually accurate and reliable. This method is applicable only to small research projects.
2. Through Investigators: Trained investigators are employed to collect the data. These investigators contact the individuals and fill in questionnaires after asking for the required information. Most organisations employ this method.
3. Collection through questionnaire: The researchers get the data from local representatives or agents, who supply it on the basis of their own experience. This method is quick but gives only a rough estimate.
4. Through Telephone: The researchers get the information by telephone. This method is quick.

B) METHODS OF COLLECTING SECONDARY DATA

Secondary data are collected from the following sources:

• Official: The publications of the Statistical Division, the Ministry of Finance, the Federal Bureaus of Statistics, and the Ministries of Food, Agriculture, Industry, Labour etc.
• Semi-Official: State Bank, Railway Board, Central Cotton Committee, Boards of Economic Enquiry etc.
• Publications of Trade Associations, Chambers of Commerce etc.
• Technical and Trade Journals and Newspapers.
• Research Organisations such as Universities and other Institutions.

C) DIFFERENCE BETWEEN PRIMARY AND SECONDARY DATA

The difference between primary and secondary data is only a change of hand. Primary data are first-hand information collected directly from the source.
They are most original data in character and have not undergone any sort of statistical treatment while the secondary data are obtained from some other sources or agencies. They are not pure in character and have undergone some treatment at least once. For example, suppose we are interested to find the average age of MS students. We collect the age's data by two methods; either by directly collecting from each student himself personally or getting their ages from the University record. The data collected by the direct personal investigator is called primary data and the data obtained from the University record is called Secondary data. D) EDITING OF DATA After collecting the data either from primary or secondary source, the next step is its editing. Editing means the examination of collected data to discover any error before presenting it. It has to be decided before hand what degree of accuracy is wanted and what extent of errors can be tolerated in the inquiry. The editing of secondary data is simpler than that of primary data. 2.2 CLASSIFICATION OF DATA The process of arranging data into homogenous group or classes according to some common characteristics present in the data is called classification. For example, the process of sorting letters in a post office, the letters are classified according to the cities and further arranged according to streets. 2.2.1 BASES OF CLASSIFICATION There are four important bases of classification: 1. Qualitative Base 2. Quantitative Base 3. Geographical Base 4. Chronological or Temporal Base 12 1. Qualitative Base: When the data are classified according to some Data Collection and quality or attributes such as sex, religion, literacy, intelligence etc.. Presentation. 2. Quantitative Base: When the data are classified by quantitative characteristics like heights, weights, ages, income etc.. 3. Geographical Base: When the data are classified by geographical regions or location, like states, provinces, cities, countries etc. 4. 
Chronological or Temporal Base: When the data are classified or arranged by their time of occurrence, such as years, months, weeks, days etc.... For example, Time Series Data. 2.2.2 TYPES OF CLASSIFICATION : 1. One-way classification: If we classify observed data keeping in view single characteristic, this type of classification is known as one-way classification. For example, the population of world may be classified by religion as Muslim, Christian etc. 2. Two-way classification: If we consider two characteristics at a time in order to classify the observed data then we are doing two-way classification. For example, the population of world may be classified by Religion and Sex. 3. Multi-way classification: We may consider more than two characteristics at a time to classify given data or observed data. In this way we deal in multi-way classification. For example, the population of world may be classified by Religion, Sex and Literacy. 2.3 TABULATION OF DATA The process of placing classified data into tabular form is known as tabulation. A table is a symmetric arrangement of statistical data in rows and columns. Rows are horizontal arrangements whereas columns are vertical arrangements. It may be simple, double or complex depending upon the type of classification. 2.3.1 TYPES OF TABULATION 1. Simple Tabulation or One-way tabulation: When the data are tabulated to one characteristic, it is said to be simple tabulation or one-way tabulation. 13 Business Statistics For example, tabulation of data on population of world classified by one characteristic like Religion is example of simple tabulation. 2. Double Tabulation or Two-way tabulation: When the data are tabulated according to two characteristics at a time. It is said to be double tabulation or two-way tabulation. For example, tabulation of data on population of world classified by two characteristics like religion and sex is example of double tabulation. 3. 
Complex Tabulation: When the data are tabulated according to many characteristics, it is said to be complex tabulation. For example, tabulation of data on the population of the world classified by three characteristics, such as religion, sex and literacy, is an example of complex tabulation.

DIFFERENCES BETWEEN CLASSIFICATION AND TABULATION
1. The data are first classified and then presented in tables, so classification and tabulation in fact go together, and classification is the basis for tabulation.
2. Tabulation is a mechanical function of classification, because in tabulation the classified data are placed in rows and columns.
3. Classification is a process of statistical analysis, whereas tabulation is a process of presenting the data in a suitable form.

2.4 FREQUENCY DISTRIBUTION
A frequency distribution is a tabular arrangement of data into classes according to size or magnitude, along with the corresponding class frequencies (the number of values that fall in each class).

Ungrouped Data or Raw Data
Data which have not been arranged in a systematic order are called ungrouped or raw data.

Grouped Data
Data presented in the form of a frequency distribution are called grouped data.

Array
Numerical raw data arranged in ascending or descending order is called an array.

Example
Array the following data in ascending and in descending order: 6, 4, 13, 7, 10, 16, 19.

Solution
Array in ascending order: 4, 6, 7, 10, 13, 16, 19.
Array in descending order: 19, 16, 13, 10, 7, 6, 4.

CLASS LIMITS
The variate values that define the classes or groups are called the class limits. The smaller value of a class is called the lower class limit and the larger value is called the upper class limit. Classes stated by their class limits are also called inclusive classes. For example, in the class 10-19 the smaller value 10 is the lower class limit and the larger value 19 is the upper class limit.
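The array example and the idea of inclusive class limits can be tried out with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and are not part of the text.

```python
# Raw (ungrouped) data from the array example above
raw_data = [6, 4, 13, 7, 10, 16, 19]

# An array is the same data arranged in ascending or descending order
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# Inclusive class 10-19: both class limits belong to the class
lower_limit, upper_limit = 10, 19
in_class = [x for x in ascending if lower_limit <= x <= upper_limit]

print(ascending)      # [4, 6, 7, 10, 13, 16, 19]
print(descending)     # [19, 16, 13, 10, 7, 6, 4]
print(len(in_class))  # 4 values (10, 13, 16, 19) fall in the class 10-19
```

The count of values falling inside a class is exactly the class frequency used in a frequency distribution.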
CLASS BOUNDARIES
The true values which describe the actual class limits of a class are called class boundaries. The smaller true value is called the lower class boundary and the larger true value is called the upper class boundary of the class. It is important to note that the upper class boundary of a class coincides with the lower class boundary of the next class. Classes stated by their class boundaries are also known as exclusive classes. For example:

Weights in Kg    Number of Students
60-65             8
65-70            12
70-75             5
Total            25

A student whose weight is 60 kg or more but less than 65 kg would be included in the 60-65 class. A student whose weight is exactly 65 kg would be included in the next class, 65-70.

A class which has either no lower class limit or no upper class limit in a frequency table is called an open-end class. We do not like to use open-end classes in practice, because they create problems in calculation. For example:

Weights (Pounds)    Number of Persons
Below 110            6
110-120             12
120-130             20
130-140             10
140 and above        2

Class Mark or Mid Point
The class mark or mid point is the mean of the lower and upper class limits (or boundaries) of a class, so it divides the class into two equal parts. It is obtained by dividing the sum of the lower and upper class limits (or class boundaries) of a class by 2. For example, the class mark or mid point of the class 60-69 is (60 + 69)/2 = 64.5.

Size of Class Interval
The difference between the upper and lower class boundaries (not between the class limits) of a class, or the difference between two successive mid points, is called the size of the class interval.

2.4.1 CONSTRUCTION OF FREQUENCY DISTRIBUTION
The following steps are involved in the construction of a frequency distribution.
1. Find the range of the data: The range is the difference between the largest and the smallest values.
2. Decide the approximate number of classes into which the data are to be grouped: There are no hard and fast rules for the number of classes. In most cases we have 5 to 20 classes. H. A.
Sturges has given a formula for determining the approximate number of classes:
K = 1 + 3.322 log N
where K = number of classes and log N = logarithm (base 10) of the total number of observations.
For example, if the total number of observations is 50, the number of classes would be:
K = 1 + 3.322 log N
K = 1 + 3.322 log 50
K = 1 + 3.322 (1.69897)
K = 1 + 5.644
K = 6.644, or 7 classes approximately.
3. Determine the approximate class interval size: The size of the class interval, denoted by h, is obtained by dividing the range of the data by the number of classes:
Class interval size (h) = Range / Number of Classes
In case of a fractional result, the next higher whole number is taken as the size of the class interval.
4. Decide the starting point: The lower class limit or class boundary of the first class should cover the smallest value in the raw data. It is usually a multiple of the class interval; for example, 0, 5, 10, 15, 20, etc. are commonly used.
5. Determine the remaining class limits (boundaries): Once the lower class boundary of the lowest class has been decided, compute its upper class boundary by adding the class interval size to the lower class boundary. The remaining lower and upper class limits are determined by adding the class interval size repeatedly until the largest value of the data is covered by a class.
6. Distribute the data into the respective classes: All the observations are marked into their respective classes using the Tally Bars (Tally Marks) method, which is suitable for tabulating observations into classes. The number of tally bars is counted to get the frequency of each class. The frequencies of all the classes together give the grouped data, or frequency distribution, of the data. The total of the frequency column must be equal to the number of observations.
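The six steps above can be sketched in Python, using the 50 marks from the worked example that follows. This is a hedged illustration rather than a prescribed procedure: rounding K to the nearest whole number and choosing the starting point as the largest multiple of h not exceeding the minimum are assumptions made here, and the variable names are illustrative.

```python
import math

# The 50 marks used in the worked example of this section
marks = [23, 50, 38, 42, 63, 75, 12, 33, 26, 39,
         35, 47, 43, 52, 56, 59, 64, 77, 15, 21,
         51, 54, 72, 68, 36, 65, 52, 60, 27, 34,
         47, 48, 55, 58, 59, 62, 51, 48, 50, 41,
         57, 65, 54, 43, 56, 44, 30, 46, 67, 53]

n = len(marks)

# Step 1: range of the data
data_range = max(marks) - min(marks)

# Step 2: Sturges' formula, rounded to the nearest whole number (assumption)
k = round(1 + 3.322 * math.log10(n))

# Step 3: class interval size, taking the next higher whole number
h = math.ceil(data_range / k)

# Step 4: starting point, a multiple of h that covers the smallest value (assumption)
start = (min(marks) // h) * h

# Steps 5 and 6: build inclusive classes and count the observations in each
table = []
for i in range(k):
    lower = start + i * h
    upper = lower + h - 1  # inclusive upper limit for whole-number marks
    freq = sum(1 for x in marks if lower <= x <= upper)
    table.append((lower, upper, freq))

print(data_range, k, h, start)  # 65 7 10 10
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The frequencies add up to 50, the number of observations, which is the check described in step 6.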
Example: Construction of Frequency Distribution
Construct a frequency distribution with a suitable class interval size for the marks obtained by 50 students of a class, given below:
23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15, 21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48, 50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53.

Solution
Arrange the marks in ascending order:
12, 15, 21, 23, 26, 27, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43, 43, 44, 46, 47, 47, 48, 48, 50, 50, 51, 51, 52, 52, 53, 54, 54, 55, 56, 56, 57, 58, 59, 59, 60, 62, 63, 64, 65, 65, 67, 68, 72, 75, 77.
Minimum value = 12; Maximum value = 77
Range = Maximum value - Minimum value = 77 - 12 = 65
Number of classes = 1 + 3.322 log N = 1 + 3.322 log 50 = 1 + 3.322 (1.69897) = 1 + 5.64 = 6.64, or 7 approximately
Class interval size (h) = Range / No. of classes = 65/7 = 9.3, or 10.

Marks            Tally Marks        Number of     Class Boundary   Class Mark x
(Class Limits)                      Students ƒ    C.B.
10-19            II                  2            9.5-19.5         (10 + 19)/2 = 14.5
20-29            IIII                4            19.5-29.5        (20 + 29)/2 = 24.5
30-39            IIII II             7            29.5-39.5        (30 + 39)/2 = 34.5
40-49            IIII IIII          10            39.5-49.5        (40 + 49)/2 = 44.5
50-59            IIII IIII IIII I   16            49.5-59.5        (50 + 59)/2 = 54.5
60-69            IIII III            8            59.5-69.5        (60 + 69)/2 = 64.5
70-79            III                 3            69.5-79.5        (70 + 79)/2 = 74.5
Total                               50

Note: To find the class boundaries, we take half of the difference between the lower class limit of the 2nd class and the upper class limit of the 1st class: (20 - 19)/2 = 1/2 = 0.5. This value is subtracted from each lower class limit and added to each upper class limit to get the required class boundaries.

Frequency Distribution by Exclusive Method

Class Boundary C.B.   Tally Marks        Frequency ƒ
9.5-19.5              II                  2
19.5-29.5             IIII                4
29.5-39.5             IIII II             7
39.5-49.5             IIII IIII          10
49.5-59.5             IIII IIII IIII I   16
59.5-69.5             IIII III            8
69.5-79.5             III                 3
Total                                    50

2.4.2 CUMULATIVE FREQUENCY DISTRIBUTION
The total frequency of all classes less than the upper class boundary of a given class is called the cumulative frequency of that class. "A table showing the cumulative frequencies is called a cumulative frequency distribution". There are two types of cumulative frequency distribution.

Less than cumulative frequency distribution
It is obtained by successively adding the frequencies of all the previous classes, including the class against which it is written. The cumulation runs from the lowest class to the highest.

More than cumulative frequency distribution
It is obtained by finding the cumulative total of frequencies starting from the highest class and moving to the lowest.

The less than and more than cumulative frequency distributions for the frequency distribution above are given below:

Class     ƒ     C.B.         Less than C.F.                    More than C.F.
Limits                       Marks             C.F.            Marks          C.F.
10-19      2    9.5-19.5     Less than 19.5    2               9.5 or more    48 + 2 = 50
20-29      4    19.5-29.5    Less than 29.5    2 + 4 = 6       19.5 or more   44 + 4 = 48
30-39      7    29.5-39.5    Less than 39.5    6 + 7 = 13      29.5 or more   37 + 7 = 44
40-49     10    39.5-49.5    Less than 49.5    13 + 10 = 23    39.5 or more   27 + 10 = 37
50-59     16    49.5-59.5    Less than 59.5    23 + 16 = 39    49.5 or more   11 + 16 = 27
60-69      8    59.5-69.5    Less than 69.5    39 + 8 = 47     59.5 or more   3 + 8 = 11
70-79      3    69.5-79.5    Less than 79.5    47 + 3 = 50     69.5 or more   3

DIAGRAMS AND GRAPHS OF STATISTICAL DATA
We have discussed the techniques of classification and tabulation that help us in organising the collected data in a meaningful fashion. However, this way of presenting statistical data does not always prove to be interesting to a layman. Too many figures are often confusing and fail to convey the message effectively.
One of the most effective and interesting alternative ways in which statistical data may be presented is through diagrams and graphs. There are several ways in which statistical data may be displayed pictorially, using different types of graphs and diagrams. The commonly used diagrams and graphs discussed in the subsequent paragraphs are listed below.

2.5 TYPES OF DIAGRAMS/CHARTS
1. Simple Bar Chart
2. Multiple Bar Chart or Cluster Chart
3. Stacked Bar Chart or Sub-Divided Bar Chart or Component Bar Chart
   a. Simple Component Bar Chart
   b. Percentage Component Bar Chart
   c. Sub-Divided Rectangular Bar Chart
   d. Pie Chart
4. Histogram
5. Frequency Curve and Polygon
6. Lorenz Curve

1. SIMPLE BAR CHART
A simple bar chart is used to represent data involving only one variable classified on a spatial, quantitative or temporal basis. In a simple bar chart, we make bars of equal width but variable length, i.e. the magnitude of a quantity is represented by the height or length of the bar. The following steps are undertaken in drawing a simple bar diagram:
• Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate place on the paper.
• Take the basis of classification along the horizontal line (X-axis) and the observed variable along the vertical line (Y-axis), or vice versa.
• Mark out bars of equal breadth for each class and leave a gap, equal to or not less than half the bar breadth, between two classes.
• Finally, mark the values of the given variable to prepare the required bars.

Sample problem: Make a bar graph that represents exotic pet ownership in the United States. There are 8,000,000 fish, 1,500,000 rabbits, 1,300,000 turtles, 1,000,000 poultry and 900,000 hamsters.
Step 1: Number the Y-axis with the dependent variable. The dependent variable is the one being tested in an experiment. In this sample question, the study wanted to know how many pets were in U.S. households. So the number of pets is the dependent variable.
The highest number in the study is 8,000,000 and the lowest is 900,000, so it makes sense to label the Y-axis from 0 to 8 (in millions).
Step 2: Draw your bars. The height of each bar should be level with the correct number on the Y-axis. Don't forget to label each bar under the X-axis.
Step 3: Label the X-axis with what the bars represent. For this sample problem, label the X-axis "Pet Types" and then label the Y-axis with what the Y-axis represents: "Number of pets (per 1,000 households)". Finally, give your graph a name. For this sample problem, call the graph "Pet ownership (per 1,000 households)".
Optional: In the above graph, I chose to write the actual numbers on the bars themselves. You don't have to do this, but if you have numbers that don't fall on a line (e.g. 900,000), it can help make the graph clearer for a viewer.
Tips:
1. Line the numbers up on the lines of the graph paper, not the spaces.
2. Make all your bars the same width.

2. MULTIPLE BAR CHART
A multiple bar diagram represents two or more sets of inter-related data, and so facilitates comparison between more than one phenomenon. The technique of the simple bar chart is used to draw this diagram, but the difference is that we use different shades, colours or dots to distinguish between the different phenomena. We draw multiple bar charts when the total of the different phenomena is not meaningful.

Sample Example
Draw a multiple bar chart to represent the imports and exports of Pakistan for the years 1982-1988.

Years      Imports          Exports
           Rs. (billion)    Rs. (billion)
1982-83     68.15           34.44
1983-84     76.71           37.33
1984-85     89.78           37.98
1985-86     90.95           49.59
1986-87     92.43           63.35
1987-88    111.38           78.44

3. a. COMPONENT BAR CHART
A sub-divided or component bar chart is used to represent data in which the total magnitude is divided into different components.
In this diagram, we first make a simple bar for each class, taking the total magnitude in that class, and then divide these simple bars into parts in the ratio of the various components. This type of diagram shows the variation in the different components within each class as well as between different classes. The sub-divided bar diagram is also known as the component bar chart or stacked chart.

Current and Development Expenditure, Pakistan (all figures in Rs. billion)

Years      Current        Development    Total
           Expenditure    Expenditure    Expenditure
1988-89    153            48             201
1989-90    166            56             222
1990-91    196            65             261
1991-92    230            91             321
1992-93    272            76             348
1993-94    294            71             365
1994-95    346            82             428

3. b. PERCENTAGE COMPONENT BAR CHART
A sub-divided bar chart may also be drawn on a percentage basis. To do so, we express each component as a percentage of its respective total. In drawing a percentage bar chart, bars of length equal to 100 are drawn for each class in the first step and are sub-divided in proportion to the percentages of their components in the second step. The diagram so obtained is called a percentage component bar chart or percentage stacked bar chart. This type of chart is useful for comparing the components while holding the totals constant, i.e. ignoring the differences between the totals.

Areas Under Crop Production (1985-90) ('000 hectares)

Year       Wheat    Rice    Others    Total
1985-86    7403     1863    1926      11192
1986-87    7706     2066    1906      11678
1987-88    7308     1963    1612      10883
1988-89    7730     2042    1966      11738
1989-90    7759     2107    1970      11836

Percentage Areas Under Production (%)

Year       Wheat    Rice    Others    Total
1985-86    66.2     16.6    17.2      100
1986-87    66.0     17.7    16.3      100
1987-88    67.2     18.0    14.8      100
1988-89    65.9     17.4    16.7      100
1989-90    65.6     17.8    16.6      100

3. d. PIE CHART
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a circular diagram in which the areas of the sectors of a circle represent the components.
Circles are drawn with radii proportional to the square root of the quantities, because the area of a circle is $\pi r^2$. To construct a pie chart (sector diagram), we draw a circle with radius proportional to the square root of the total. The total angle of the circle is 360°. The angle of each component is calculated by the formula:
Angle of Sector = $\frac{\text{Component Part}}{\text{Total}} \times 360^\circ$
These angles are marked in the circle by means of a protractor to show the different components. The arrangement of the sectors is usually anti-clockwise.

2.6 EXERCISES
1. Draw a histogram of the following data:

Weekly Wages      1-10    11-20    21-30    31-40    41-50
No. of Workers    14      28       36       12       10

2. The following table shows the temperature for five consecutive days in a particular week. Draw a range graph.

Day        M     T     W     Th    F
High °C    40    35    50    60    25
Low °C     25    20    40    55    15

3. The following is the distribution of total household expenditure (in Rs.) of 202 workers in a city.

Expenditure in Rs.    100-150    150-200    200-250    250-300
No. of Workers        25         40         33         28

Expenditure in Rs.    300-350    350-400    400-450    450-500
No. of Workers        30         22         16         8

3 MEASURES OF DISPERSION

Unit Structure
3.1 Introduction to measure of dispersion
3.1.1 Dispersion
3.2 Absolute measure of dispersion
3.3 Relative measure of dispersion
3.4 Range and coefficient of range
3.5 Quartile deviation and its coefficient
3.6 The mean deviation
3.7 Standard deviation
3.7.1 Coefficient of standard deviation
3.8 Coefficient of variation
3.8.1 Uses of coefficient of variation
3.9 The variance
3.10 Skewness and kurtosis
3.11 Exercise

3.1 INTRODUCTION TO MEASURE OF DISPERSION
A modern student of statistics is mainly interested in the study of variability and uncertainty. We live in a changing world. Changes are taking place in every sphere of life. A statistician does not show much interest in things which are constant.
The total area of the earth may not be very important to a research-minded person, but the area under different crops, the area covered by forests, and the area covered by residential and commercial buildings are figures of great importance, because these figures keep changing from time to time and from place to place. A very large number of experts are engaged in the study of changing phenomena. Experts working in different countries of the world keep a watch on the forces which are responsible for bringing about changes in the fields of human interest. Agricultural, industrial and mineral production and their transportation from one part of the world to another are matters of great interest to economists, statisticians and other experts. Changes in human population, in the standard of living, in the literacy rate and in prices attract experts to make detailed studies of them and then to relate these changes to human life. Thus variability or variation is closely connected with human life, and its study is very important for mankind.

3.1.1 DISPERSION
The word dispersion has a technical meaning in Statistics. The average measures the centre of the data. This is one aspect of the observations. Another feature of the observations is how they are spread about the centre. The observations may be close to the centre or they may be spread away from the centre. If the observations are close to the centre (usually the arithmetic mean or median), we say that the dispersion, scatter or variation is small. If the observations are spread away from the centre, we say that the dispersion is large. Suppose we have three groups of students who have obtained the following marks in a test. The arithmetic means of the three groups are also given below:
Group A: 46, 48, 50, 52, 54    $\bar{X}_A = 50$
Group B: 30, 40, 50, 60, 70    $\bar{X}_B = 50$
Group C: 40, 50, 60, 70, 80    $\bar{X}_C = 60$
In groups A and B the arithmetic means are equal, i.e. $\bar{X}_A = \bar{X}_B = 50$.
But in group A the observations are concentrated around the centre. All students of group A have almost the same level of performance. We say that there is consistency in the observations in group A. In group B the mean is 50 but the observations are not close to the centre. One observation is as small as 30 and one observation is as large as 70. Thus, there is greater dispersion in group B. In group C the mean is 60, but the spread of the observations with respect to the centre 60 is the same as the spread of the observations in group B with respect to their own centre, which is 50. Thus in groups B and C the means are different but the dispersion is the same. In groups A and C the means are different and the dispersions are also different. Dispersion is an important feature of the observations and it is measured with the help of the measures of dispersion, scatter or variation. The word variability is also used for this idea of dispersion.

The study of dispersion is very important in statistical data. If in a certain factory there is consistency in the wages of workers, the workers will be satisfied. But if some workers have high wages and some have low wages, there will be unrest among the low-paid workers and they might go on strike and arrange demonstrations. If in a certain country some people are very poor and some are very rich, we say there is economic disparity. It means that the dispersion is large. The idea of dispersion is important in the study of wages of workers, prices of commodities, standard of living of different people, distribution of wealth, distribution of land among farmers and various other fields of life. Some brief definitions of dispersion are:
1. The degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.
2. Dispersion or variation may be defined as a statistic signifying the extent of the scatter of items around a measure of central tendency.
3.
Dispersion or variation is the measurement of the scatter of the size of the items of a series about the average.

For the study of dispersion, we need some measures which show whether the dispersion is small or large. There are two types of measures of dispersion:
a. Absolute Measure of Dispersion
b. Relative Measure of Dispersion

3.2 ABSOLUTE MEASURE OF DISPERSION
These measures give us an idea about the amount of dispersion in a set of observations. They give the answers in the same units as the units of the original observations. When the observations are in kilograms, the absolute measure is also in kilograms. If we have two sets of observations, we cannot always use the absolute measures to compare their dispersion. We shall explain later when the absolute measures can be used for comparison of dispersion in two or more sets of data. The absolute measures which are commonly used are:
1. The Range
2. The Quartile Deviation
3. The Mean Deviation
4. The Standard Deviation and Variance

3.3 RELATIVE MEASURE OF DISPERSION
These measures are calculated for the comparison of dispersion in two or more sets of observations. These measures are free of the units in which the original data are measured. If the original data are in dollars or kilometres, we do not use these units with a relative measure of dispersion. These measures are a sort of ratio and are called coefficients. Each absolute measure of dispersion can be converted into its relative measure. Thus, the relative measures of dispersion are:
1. Coefficient of Range or Coefficient of Dispersion.
2. Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion.
3. Coefficient of Mean Deviation or Mean Deviation of Dispersion.
4. Coefficient of Standard Deviation or Standard Coefficient of Dispersion.
5. Coefficient of Variation (a special case of Standard Coefficient of Dispersion).
3.4 RANGE AND COEFFICIENT OF RANGE

The Range
Range is defined as the difference between the maximum and the minimum observation of the given data. If Xm denotes the maximum observation and Xo denotes the minimum observation, then the range is defined as
Range = Xm - Xo.
In the case of grouped data, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class. It is also calculated by using the difference between the mid points of the highest class and the lowest class. It is the simplest measure of dispersion. It gives a general idea about the total spread of the observations. It does not enjoy any prominent place in statistical theory, but it has its application and utility in quality control methods, which are used to maintain the quality of the products produced in factories. The quality of products is to be kept within a certain range of values.

The range is based on the two extreme observations. It gives no weight to the central values of the data. It is a poor measure of dispersion and does not give a good picture of the overall spread of the observations with respect to the centre of the observations. Let us consider three groups of data which have the same range:
Group A: 30, 40, 40, 40, 40, 40, 50
Group B: 30, 30, 30, 40, 50, 50, 50
Group C: 30, 35, 40, 40, 40, 45, 50
In all three groups the range is 50 - 30 = 20. In group A the observations are concentrated in the centre. In group B the observations are concentrated at the two extremes, and in group C the observations are almost equally distributed over the interval from 30 to 50. The range fails to explain these differences between the three groups of data. This defect in the range cannot be removed even if we calculate the coefficient of range, which is a relative measure of dispersion. If we calculate the range of a sample, we cannot draw any inferences about the range of the population.
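The point that the range ignores everything except the two extreme observations can be checked with a short Python sketch on the three groups above. The function name and the extra count of observations at the central value 40 are illustrative additions, not part of the text.

```python
def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)

groups = {
    "A": [30, 40, 40, 40, 40, 40, 50],
    "B": [30, 30, 30, 40, 50, 50, 50],
    "C": [30, 35, 40, 40, 40, 45, 50],
}

# The range is identical for all three groups
ranges = {name: value_range(values) for name, values in groups.items()}

# Counting observations equal to the central value 40 shows how differently
# the groups are spread, which the range cannot reveal
at_centre = {name: values.count(40) for name, values in groups.items()}

print(ranges)     # {'A': 20, 'B': 20, 'C': 20}
print(at_centre)  # {'A': 5, 'B': 1, 'C': 3}
```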
Coefficient of Range
It is a relative measure of dispersion and is based on the value of the range. It is also called the range coefficient of dispersion. It is defined as:
Coefficient of Range = $\frac{X_m - X_o}{X_m + X_o}$
The range Xm - Xo is standardised by the total Xm + Xo.

Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25 marks and set B contains the marks of the same students in English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of the range and the coefficient of range are calculated as follows:

                       Range           Coefficient of Range
Set A (Mathematics)    20 - 10 = 10    (20 - 10)/(20 + 10) = 0.33
Set B (English)        50 - 30 = 20    (50 - 30)/(50 + 30) = 0.25

In set A the range is 10 and in set B the range is 20. Apparently it seems as if there is greater dispersion in set B. But this is not true. The range of 20 in set B is for large observations and the range of 10 in set A is for small observations. Thus 20 and 10 cannot be compared directly. Their base is not the same. Marks in Mathematics are out of 25 and marks in English are out of 100. Thus, it makes no sense to compare 10 with 20. When we convert these two values into coefficients of range, we see that the coefficient of range for set A is greater than that of set B. Thus, there is greater dispersion or variation in set A. The marks of the students in English are more stable than their marks in Mathematics.

Example
Following are the wages of 8 workers of a factory. Find the range and the coefficient of range.
Wages in ($): 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.

Solution:
Here Largest Value = Xm = 1575 and Smallest Value = Xo = 1380
Range = Xm - Xo = 1575 - 1380 = 195.
Coefficient of Range = $\frac{X_m - X_o}{X_m + X_o} = \frac{1575 - 1380}{1575 + 1380} = \frac{195}{2955} = 0.066$

Example
The following distribution gives the number of houses and the number of persons per house.

Number of Persons    1     2      3      4     5     6     7     8     9    10
Number of Houses     26    113    120    95    60    42    21    14    5    4

Calculate the range and coefficient of range.
Solution:
Here Largest Value = Xm = 10 and Smallest Value = Xo = 1
Range = Xm - Xo = 10 - 1 = 9.
Coefficient of Range = $\frac{X_m - X_o}{X_m + X_o} = \frac{10 - 1}{10 + 1} = \frac{9}{11} = 0.818$

Example
The weights of the students of a University are given below. Calculate the range and coefficient of range.

Weights (Kg)          60-62    63-65    66-68    69-71    72-74
Number of Students    5        18       42       27       8

Solution:

Weights (Kg)    Class Boundaries    Mid Value    No. of Students
60-62           59.5-62.5           61            5
63-65           62.5-65.5           64           18
66-68           65.5-68.5           67           42
69-71           68.5-71.5           70           27
72-74           71.5-74.5           73            8

Method 1
Here Xm = Upper class boundary of the highest class = 74.5; Xo = Lower class boundary of the lowest class = 59.5
Range = Xm - Xo = 74.5 - 59.5 = 15 Kilograms.
Coefficient of Range = $\frac{X_m - X_o}{X_m + X_o} = \frac{74.5 - 59.5}{74.5 + 59.5} = \frac{15}{134} = 0.1119$

Method 2
Here Xm = Mid value of the highest class = 73; Xo = Mid value of the lowest class = 61
Range = Xm - Xo = 73 - 61 = 12 Kilograms.
Coefficient of Range = $\frac{X_m - X_o}{X_m + X_o} = \frac{73 - 61}{73 + 61} = \frac{12}{134} = 0.0896$

3.5 QUARTILE DEVIATION AND ITS COEFFICIENT

• Quartile Deviation
It is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 - Q1 is called the inter-quartile range. The difference Q3 - Q1 divided by 2 is called the semi-inter-quartile range or the quartile deviation. Thus
Quartile Deviation (Q.D) = $\frac{Q_3 - Q_1}{2}$
The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the observations in the tails. If we take different samples from a population and calculate their quartile deviations, their values are quite likely to be sufficiently different. This is called sampling fluctuation. It is not a popular measure of dispersion. The quartile deviation calculated from sample data does not help us to draw any conclusion (inference) about the quartile deviation in the population.

• Coefficient of Quartile Deviation
A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile deviation. It is defined as
Coefficient of Quartile Deviation = $\frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
It is a pure number free of any units of measurement. It can be used for comparing the dispersion in two or more sets of data.

Example
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750 and 1885. Find the quartile deviation and the coefficient of quartile deviation.

Solution
After arranging the observations in ascending order, we get
1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960.
Q1 = Value of $\frac{n+1}{4}$ th item = Value of $\frac{20+1}{4}$ th item = Value of (5.25)th item
   = 5th item + 0.25 (6th item - 5th item) = 1240 + 0.25 (1320 - 1240)
Q1 = 1240 + 20 = 1260
Q3 = Value of $\frac{3(n+1)}{4}$ th item = Value of $\frac{3(20+1)}{4}$ th item = Value of (15.75)th item
   = 15th item + 0.75 (16th item - 15th item) = 1750 + 0.75 (1755 - 1750)
Q3 = 1750 + 3.75 = 1753.75
Quartile Deviation (Q.D) = $\frac{Q_3 - Q_1}{2} = \frac{1753.75 - 1260}{2} = \frac{493.75}{2} = 246.875$
Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1753.75 - 1260}{1753.75 + 1260} = 0.164$

Example
Calculate the quartile deviation and the coefficient of quartile deviation from the data given below:

Maximum Load (Short Tons)    Number of Cables
9.3-9.7                       2
9.8-10.2                      5
10.3-10.7                    12
10.8-11.2                    17
11.3-11.7                    14
11.8-12.2                     6
12.3-12.7                     3
12.8-13.2                     1

Solution
The necessary calculations are given below:

Maximum Load    Number of    Class Boundaries    Cumulative
(Short Tons)    Cables f                         Frequencies
9.3-9.7          2           9.25-9.75           2
9.8-10.2         5           9.75-10.25          2 + 5 = 7
10.3-10.7       12           10.25-10.75         7 + 12 = 19
10.8-11.2       17           10.75-11.25         19 + 17 = 36
11.3-11.7       14           11.25-11.75         36 + 14 = 50
11.8-12.2        6           11.75-12.25         50 + 6 = 56
12.3-12.7        3           12.25-12.75         56 + 3 = 59
12.8-13.2        1           12.75-13.25         59 + 1 = 60

Q1 = Value of $\frac{n}{4}$ th item = Value of $\frac{60}{4}$ th item = 15th item
Q1 lies in the class 10.25-10.75
∴ Q1 = $l + \frac{h}{f}\left[\frac{n}{4} - c\right]$
where l = 10.25 (lower boundary of the quartile class), h = 0.5 (class interval size), f = 12 (frequency of the quartile class), n/4 = 15 and c = 7 (cumulative frequency of the preceding class)
Q1 = $10.25 + \frac{0.5}{12}(15 - 7)$ = 10.25 + 0.33 = 10.58
Q3 = Value of $\frac{3n}{4}$ th item = Value of $\frac{3 \times 60}{4}$ th item = 45th item
Q3 lies in the class 11.25-11.75
∴ Q3 = $l + \frac{h}{f}\left[\frac{3n}{4} - c\right]$
where l = 11.25, h = 0.5, f = 14, 3n/4 = 45 and c = 36
∴ Q3 = $11.25 + \frac{0.5}{14}(45 - 36)$ = 11.25 + 0.32 = 11.57
Quartile Deviation (Q.D) = $\frac{Q_3 - Q_1}{2} = \frac{11.57 - 10.58}{2} = \frac{0.99}{2} = 0.495$
Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{11.57 - 10.58}{11.57 + 10.58} = \frac{0.99}{22.15} = 0.045$

3.6 THE MEAN DEVIATION
The mean deviation or the average deviation is defined as the mean of the absolute deviations of the observations from some suitable average, which may be the arithmetic mean, the median or the mode. The difference (X - average) is called a deviation, and when we ignore the negative sign, this deviation is written as $|X - \text{average}|$ and is read as mod deviation. The mean of these mod or absolute deviations is called the mean deviation or the mean absolute deviation. Thus for sample data in which the suitable average is the mean $\bar{X}$, the mean deviation (M.D) is given by the relation
M.D = $\frac{\sum |X - \bar{X}|}{n}$
For a frequency distribution, the mean deviation is given by
M.D = $\frac{\sum f\,|X - \bar{X}|}{\sum f}$
When the mean deviation is calculated about the median, the formula becomes
M.D (about median) = $\frac{\sum |X - \text{Median}|}{n}$
The mean deviation about the mode is
M.D (about mode) = $\frac{\sum |X - \text{Mode}|}{n}$
For population data the mean deviation about the population mean µ is
M.D = $\frac{\sum |X - \mu|}{N}$
The mean deviation is a better measure of absolute dispersion than the range and the quartile deviation. A drawback of the mean deviation is that we use the absolute deviations $|X - \text{average}|$, which does not seem logical. The reason for using them is that $\sum (X - \bar{X})$ is always equal to zero. Even if we use the median or the mode in place of $\bar{X}$, the summation $\sum (X - \text{median})$ or $\sum (X - \text{mode})$ will be zero or approximately zero, with the result that the mean deviation would always be either zero or close to zero.
Thus, the very definition of the mean deviation is possible only with absolute deviations.

The mean deviation is based on all the observations, a property which is not possessed by the range and the quartile deviation. The formula of the mean deviation gives a mathematical impression that it is a better way of measuring the variation in the data. Any suitable average among the mean, median or mode can be used in its calculation, but the value of the mean deviation is minimum if the deviations are taken from the median. A drawback of the mean deviation is that absolute values are awkward to handle algebraically, so it is seldom used in statistical inference.

Coefficient of the Mean Deviation

A relative measure of dispersion based on the mean deviation is called the coefficient of the mean deviation or the coefficient of dispersion. It is defined as the ratio of the mean deviation to the average used in the calculation of the mean deviation. Thus,

Coefficient of M.D (about mean) = Mean Deviation from Mean / Mean
Coefficient of M.D (about median) = Mean Deviation from Median / Median
Coefficient of M.D (about mode) = Mean Deviation from Mode / Mode

Example

Calculate the mean deviation from (1) the arithmetic mean, (2) the median and (3) the mode in respect of the marks obtained by nine students given below, and show that the mean deviation from the median is minimum.

Marks out of 25: 7, 4, 10, 9, 15, 12, 7, 9, 7

Solution

After arranging the observations in ascending order, we get

Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15

$$\text{Mean} = \frac{\sum X}{n} = \frac{80}{9} = 8.89$$

$$\text{Median} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th item} = \text{value of the } \left(\frac{9+1}{2}\right)\text{th item} = \text{value of the 5th item} = 9$$

Mode = 7 (since 7 is repeated the maximum number of times)

| Marks X | $\lvert X - \bar{X} \rvert$ | $\lvert X - \text{Median} \rvert$ | $\lvert X - \text{Mode} \rvert$ |
|---|---|---|---|
| 4 | 4.89 | 5 | 3 |
| 7 | 1.89 | 2 | 0 |
| 7 | 1.89 | 2 | 0 |
| 7 | 1.89 | 2 | 0 |
| 9 | 0.11 | 0 | 2 |
| 9 | 0.11 | 0 | 2 |
| 10 | 1.11 | 1 | 3 |
| 12 | 3.11 | 3 | 5 |
| 15 | 6.11 | 6 | 8 |
| Total | 21.11 | 21 | 23 |

$$\text{M.D from Mean} = \frac{\sum |X - \bar{X}|}{n} = \frac{21.11}{9} = 2.35$$

$$\text{M.D from Median} = \frac{\sum |X - \text{Median}|}{n} = \frac{21}{9} = 2.33$$

$$\text{M.D from Mode} = \frac{\sum |X - \text{Mode}|}{n} = \frac{23}{9} = 2.56$$

From the above calculations, it is clear that the mean deviation from the median has the least value.
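The worked example above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the standard-library `statistics` module for the three averages; the helper name `mean_deviation` is hypothetical and chosen only for illustration.

```python
import statistics

def mean_deviation(values, center):
    """Mean of the absolute deviations of `values` from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

# Marks out of 25 obtained by the nine students in the example.
marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]

mean = statistics.mean(marks)      # 80 / 9
median = statistics.median(marks)  # 5th item of the sorted data
mode = statistics.mode(marks)      # most frequent value

md_mean = mean_deviation(marks, mean)
md_median = mean_deviation(marks, median)
md_mode = mean_deviation(marks, mode)

print(round(md_mean, 2))    # 2.35
print(round(md_median, 2))  # 2.33
print(round(md_mode, 2))    # 2.56

# Coefficient of M.D = mean deviation / average used in its calculation.
print(round(md_median / median, 3))
```

Passing the average in as a parameter keeps one function for all three cases, which mirrors the way the text defines the mean deviation about "some suitable average".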
Example

Calculate the mean deviation from the mean and its coefficient from the following data:

| Size of items | 3 - 4 | 4 - 5 | 5 - 6 | 6 - 7 | 7 - 8 | 8 - 9 | 9 - 10 |
|---|---|---|---|---|---|---|---|
| Frequency | 3 | 7 | 22 | 60 | 85 | 32 | 8 |

Solution

The necessary calculations are given below:

| Size of Items | X | f | fX | $\lvert X - \bar{X} \rvert$ | $f\,\lvert X - \bar{X} \rvert$ |
|---|---|---|---|---|---|
| 3 - 4 | 3.5 | 3 | 10.5 | 3.59 | 10.77 |
| 4 - 5 | 4.5 | 7 | 31.5 | 2.59 | 18.13 |
| 5 - 6 | 5.5 | 22 | 121.0 | 1.59 | 34.98 |
| 6 - 7 | 6.5 | 60 | 390.0 | 0.59 | 35.40 |
| 7 - 8 | 7.5 | 85 | 637.5 | 0.41 | 34.85 |
| 8 - 9 | 8.5 | 32 | 272.0 | 1.41 | 45.12 |
| 9 - 10 | 9.5 | 8 | 76.0 | 2.41 | 19.28 |
| Total | | 217 | 1538.5 | | 198.53 |

$$\text{Mean} = \frac{\sum fX}{\sum f} = \frac{1538.5}{217} = 7.09$$

$$\text{M.D from Mean} = \frac{\sum f\,|X - \bar{X}|}{\sum f} = \frac{198.53}{217} = 0.915$$

$$\text{Coefficient of M.D (Mean)} = \frac{\text{M.D from Mean}}{\text{Mean}} = \frac{0.915}{7.09} = 0.129$$

3.7 STANDARD DEVIATION

The standard deviation is defined as the positive square root of the mean of the squared deviations taken from the arithmetic mean of the data. For sample data, the standard deviation is denoted by S and is defined as

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

For population data, the standard deviation is denoted by σ (sigma) and is defined as

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

For a frequency distribution, the formulas become

$$S = \sqrt{\frac{\sum f\,(X - \bar{X})^2}{\sum f}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum f\,(X - \mu)^2}{\sum f}}$$

The standard deviation is in the same units as the original observations. If the original observations are in grams, the value of the standard deviation will also be in grams.

The standard deviation plays a dominating role in the study of variation in data. It is a very widely used measure of dispersion and stands like a tower among the measures of dispersion. As far as the important statistical tools are concerned, the first important tool is the mean $\bar{X}$ and the second important tool is the standard deviation S. It is based on all the observations and is subject to mathematical treatment. It is of great importance for the analysis of data and for the various statistical inferences.

Some alternative methods are also available to compute the standard deviation. These alternative methods simplify the computation. In discussing these methods we will confine ourselves to sample data, because a statistician mostly deals with sample data rather than the whole population.
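Before turning to the alternative methods, the defining formulas can be checked directly. The sketch below is only an illustration under stated assumptions: the function name `std_dev` is hypothetical, the divisor is n (or Σf), as in the definitions above, and the data are the marks and the grouped "size of items" table from the earlier mean-deviation examples. Note that Python's `statistics.pstdev` also divides by n, whereas `statistics.stdev` divides by n - 1.

```python
import math
import statistics

def std_dev(values, freqs=None):
    """Standard deviation with divisor n (ungrouped) or sum of f (grouped)."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counted once
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total
    return math.sqrt(variance)

# Ungrouped data: marks from the mean-deviation example.
marks = [4, 7, 7, 7, 9, 9, 10, 12, 15]
s_marks = std_dev(marks)

# Grouped data: class midpoints and frequencies from the size-of-items example.
midpoints = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]
frequencies = [3, 7, 22, 60, 85, 32, 8]
s_grouped = std_dev(midpoints, frequencies)

print(round(s_marks, 2))
print(round(s_grouped, 2))

# Cross-check against the standard library (divisor n).
print(math.isclose(s_marks, statistics.pstdev(marks)))
```

Treating ungrouped data as a frequency distribution in which every frequency is 1 lets one function cover both formulas.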
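Actual Mean Method

In applying this method, we first compute the arithmetic mean of the given data, whether ungrouped or grouped, and then take the deviations from the actual mean. This method has already been defined above. The following formulas are applied:

For Ungrouped Data:

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

For Grouped Data:

$$S = \sqrt{\frac{\sum f\,(X - \bar{X})^2}{\sum f}}$$

This method is also known as the direct method.

Assumed Mean Method

a. We use the following formulas to calculate the standard deviation:

For Ungrouped Data:

$$S = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2}$$

For Grouped Data:

$$S = \sqrt{\frac{\sum fD^2}{\sum f} - \left(\frac{\sum fD}{\sum f}\right)^2}$$

where D = X - A and A is any assumed mean other than zero. This method is also known as the short-cut method.

b. If A is taken to be zero, the above formulas reduce to the following:

For Ungrouped Data:

$$S = \sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$$

For Grouped Data:

$$S = \sqrt{\frac{\sum fX^2}{\sum f} - \left(\frac{\sum fX}{\sum f}\right)^2}$$

c. If the calculation can be simplified by taking some common factor or divisor from the given data, the formulas for computing the standard deviation are:

For Ungrouped Data:

$$S = \sqrt{\frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2} \times c$$

For Grouped Data:

$$S = \sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2} \times c \text{ or } h$$

where $u = \frac{D}{c}$ or $\frac{D}{h}$, that is, $u = \frac{X - A}{c}$ or $\frac{X - A}{h}$; h = class interval and c = common divisor. This method is also called the method of step-deviation.

Examples of Standard Deviation

The following example applies the methods discussed above.

Example

Calculate the standard deviation for the following sample data using all methods: 2, 4, 8, 6, 10 and 12.

Solution

Method 1: Actual Mean Method

| X | $(X - \bar{X})^2$ |
|---|---|
| 2 | $(2 - 7)^2 = 25$ |
| 4 | $(4 - 7)^2 = 9$ |
| 8 | $(8 - 7)^2 = 1$ |
| 6 | $(6 - 7)^2 = 1$ |
| 10 | $(10 - 7)^2 = 9$ |
| 12 | $(12 - 7)^2 = 25$ |
| $\sum X = 42$ | $\sum (X - \bar{X})^2 = 70$ |

$$\bar{X} = \frac{\sum X}{n} = \frac{42}{6} = 7$$

$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{70}{6}} = \sqrt{11.67} = 3.42$$

Method 2: Taking the assumed mean as 6

| X | D = (X - 6) | $D^2$ |
|---|---|---|
| 2 | -4 | 16 |
| 4 | -2 | 4 |
| 8 | 2 | 4 |
| 6 | 0 | 0 |
| 10 | 4 | 16 |
| 12 | 6 | 36 |
| Total | $\sum D = 6$ | $\sum D^2 = 76$ |

$$S = \sqrt{\frac{\sum D^2}{n} - \left(\frac{\sum D}{n}\right)^2} = \sqrt{\frac{76}{6} - \left(\frac{6}{6}\right)^2} = \sqrt{12.67 - 1} = \sqrt{11.67} = 3.42$$

Method 3: Taking the assumed mean as zero

| X | $X^2$ |
|---|---|
| 2 | 4 |
| 4 | 16 |
| 8 | 64 |
| 6 |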
