Statistics for Management Sciences II (STT206)

COURSE GUIDE

NATIONAL OPEN UNIVERSITY OF NIGERIA

Course Code: STT 206
Course Title: STATISTICS FOR MANAGEMENT SCIENCES II
Course Developer/Writer: KADIRI KAYODE I., School of Management Sciences (SMS), National Open University of Nigeria
Programme Leader: Dr. I.D. Idrisu (NOUN)
Course Coordinator: ANTHONY EHIAGWINA
Course Editor:
March, 2014

CONTENTS

Introduction
What You Will Learn In This Course
Course Aims
Course Objectives
Working Through This Course
Course Materials
Study Units
Set Textbooks
Assignment File
Presentation Schedule
Assessment
Tutor-Marked Assignments (TMAs)
Final Examination And Grading
Course Marking Scheme
Course Overview
How To Get The Most From This Course
Tutors And Tutorials
Summary

INTRODUCTION

Business Statistics is a one-semester, 3-credit-unit, second-year-level course. It will be available to all second-year degree students of the School of Management Sciences at the National Open University of Nigeria. It will also be useful for those seeking introductory knowledge of statistics for management sciences.

The course consists of eighteen units that cover the basic concepts and principles of statistics and the decision-making process, forms of data, methods of data estimation, summarizing data, graphical presentation of data, index numbers and measures of dispersion, coefficients of correlation and regression analysis, some elements of hypothesis tests and time series analysis, and distributions of both discrete and continuous random variables.

The course requires you to study the course materials carefully and to supplement them with other resources from statistics textbooks, both prescribed and not prescribed, that treat the contents of the course.
This Course Guide tells you what the course is about, what course materials you will be using and how you can work your way through these materials. It suggests some general guidelines for the amount of time you are likely to spend on each unit of the course in order to complete it successfully. It also gives you some guidance on your tutor-marked assignments. Detailed information on the tutor-marked assignments is found in the separate Assignment File. There are likely to be regular tutorial classes linked to the course, and you are advised to attend these sessions. Details of the times and locations of tutorials will be communicated to you by the National Open University of Nigeria (NOUN).

What You Will Learn In This Course

The overall aim of STT 206, Statistics for Management Sciences II, is to introduce you to the basic concepts and principles of statistics and the decision-making process, forms of data, methods of data estimation, summarizing data, graphical presentation of data, index numbers and measures of dispersion, coefficients of correlation and regression analysis, some elements of hypothesis tests and time series analysis, and distributions of both discrete and continuous random variables.

Course Aims

The course aims to give you an understanding of statistical information and its presentation for decision-making. It exposes you to measures that are computed and used for processing materials for decision-making. It also gives the basic knowledge of some concepts used for making decisions and carefully summarizes some probability distributions. This will be achieved by:

1. Introducing you to the nature and forms of statistical data
2. Showing how statistical data can be collected and presented
3. Showing you how to compute measures of dispersion in a sample or population
4. Showing you how to compute the chi-square value for a contingency table
5. Introducing you to the basic concepts of hypothesis tests
6. Giving you the basic principles for the application of some important forecasting and time series analysis techniques

Course Objectives

To achieve the aims set out above, the course sets overall objectives; in addition, each unit has specific objectives. The unit objectives are included at the beginning of each unit, and you should read them before you start working through the unit. You may want to refer to them during your study of the unit to check on your progress. You should always look at the unit objectives again after completing a unit. In this way you can be sure you have done what was required of you by the unit.

The wider objectives of the course as a whole are set out below. By meeting these objectives, you should have achieved the aims of the course. On successful completion of the course, you should be able to demonstrate an understanding of each of the following:

1. Role of Statistics (Application of Statistics)
2. Measurement of Variables
3. Measurement of Dispersion, Skewness and Kurtosis
4. Decision Analysis and Administration
5. Index Number
6. Statistical Data
7. Sample and Sampling Theory
8. Estimation Theory
9. Correlation Theory and Goodness of Fit
10. Pearson's Correlation Coefficient
11. Spearman's Regression Analysis
12. Ordinary Least Squares Estimation (Regression)
13. Multiple Regression Analysis
14. Hypothesis and t-Tests
15. F-Tests
16. Chi-Square Distribution
17. ANOVA
18. Forecasting and Time Series Analysis

Working Through This Course

To complete this course, you are required to read the study units, the set books and other materials on the course. Each unit contains self-assessment exercises called Student Assessment Exercises (SAE). At some points in the course, you are required to write computer-based TMAs and submit them on the NOUN TMA portal for assessment purposes. At the end of the course there is a final examination. This course should take about 15 weeks to complete.
Below you will find listed the components of the course, what you have to do and how you should allocate your time to each unit in order to complete the course successfully and on time.

Course Materials

The major components of the course are:

(1) Course Guide
(2) Study Units
(3) Textbooks
(4) Presentation Schedule

Study Units

The course is in four modules and eighteen study units, as follows:

Module 1: ROLE AND CONCEPTS OF STATISTICS
Unit 1: Role of Statistics (Application of Statistics)
Unit 2: Measurement of Variables
Unit 3: Measurement of Dispersion, Skewness and Kurtosis
Unit 4: Decision Analysis and Administration

Module 2: INDEX NUMBER AND SAMPLING THEORIES
Unit 1: Index Number
Unit 2: Statistical Data
Unit 3: Sample and Sampling Theory
Unit 4: Estimation Theory

Module 3: CORRELATION AND REGRESSION ANALYSIS
Unit 1: Correlation Theory and Goodness of Fit
Unit 2: Pearson's Correlation Coefficient
Unit 3: Spearman's Regression Analysis
Unit 4: Ordinary Least Squares Estimation (Regression)
Unit 5: Multiple Regression Analysis

Module 4: STATISTICAL TEST
Unit 1: Hypothesis and t-Tests
Unit 2: F-Tests
Unit 3: Chi-Square Distribution
Unit 4: ANOVA
Unit 5: Forecasting and Time Series Analysis

The first four units, which constitute Module 1, concentrate on the roles and concepts of statistics. The next four units, Module 2, concentrate on index numbers and sampling theories. The five units of Module 3 deal with correlation and regression analysis. The last five units, Module 4, teach the principles underlying the application of some important probability distributions and tests of hypothesis.
Each unit consists of one week's direction for study, reading material, other resources and summaries of key issues and ideas. The units direct you to work on exercises related to the required readings. Each unit contains a number of self-tests. In general, these self-tests question you on the material you have just covered or require you to apply it in some way, and thereby help you to assess your progress and to reinforce your understanding of the material. Together with the tutor-marked assignments, these exercises will assist you in achieving the stated learning objectives of the individual units and of the course.

Set Textbooks

It is advisable that you have some of the following books:

ONWE J.O. NOUN Text Book, ENT 321: Quantitative Methods for Business Decisions
OKOJIE, Daniel E. NOUN Statistics for Economist. Eco203
OTOKOTI O.S. Contemporary Statistics
JIDE JONGBO Fundamental Statistics for Business
KEHINDE J.S. Statistics Method & Quantitative Techniques
JUDE I.E, MICAN & EDITH Statistics & Quantitative Methods for Construction & Business Managers

Assessment

There are two aspects to the assessment of the course: first, the tutor-marked assignments (TMAs); second, a computer-based examination. In tackling the assignments, you are expected to apply the information, knowledge and techniques gathered during the course. The assignments must be submitted to your tutor for formal assessment in accordance with the deadlines stated in the Presentation Schedule and the Assignment File on your NOUN portal. The work you submit to your tutor for assessment will count for 30% of your total course mark. At the end of the course, you will need to sit for a final computer-based examination of two hours' duration at a designated centre. This examination will count for 70% of your total course mark.

Tutor-Marked Assignments (TMAs)

There are four tutor-marked assignments in this course. You will submit all the assignments.
You are encouraged to work all the questions thoroughly. Each assignment counts 12.5% toward your total course mark. Assignment questions for the units in this course are contained in the Assignment File. You will be able to complete your assignments from the information and materials contained in your set books, reading and study units. However, it is desirable in all degree-level education to demonstrate that you have read and researched more widely than the required minimum. You should use other references to gain a broad viewpoint and a deeper understanding of the subject.

When you have completed each assignment, send it, together with a TMA form, to your tutor. Make sure that each assignment reaches your tutor on or before the deadline given in the Presentation Schedule. If for any reason you cannot complete your work on time, contact your tutor before the assignment is due to discuss the possibility of an extension. Extensions will not be granted after the due date unless there are exceptional circumstances.

Final Examination and Grading

The final examination will be of three hours' duration and have a value of 70% of the total course grade. The examination will consist of questions which reflect the types of self-testing, practice exercises and tutor-marked problems you have previously encountered. All areas of the course will be assessed.

Use the time between finishing the last unit and sitting the examination to revise the entire course. You might find it useful to review your self-tests, tutor-marked assignments and the comments on them before the examination. The final examination covers information from all parts of the course.

CONTENTS

Module 1: ROLE AND CONCEPTS OF STATISTICS
Unit 1: Role of Statistics (Application of Statistics)
Unit 2: Measurement of Variables
Unit 3: Measurement of Dispersion, Skewness and Kurtosis
Unit 4: Decision Analysis and Administration

Module 2: INDEX NUMBER AND SAMPLING THEORIES
Unit 1: Index Number
Unit 2: Statistical Data
Unit 3: Sample and Sampling Theory
Unit 4: Estimation Theory

Module 3: CORRELATION AND REGRESSION ANALYSIS
Unit 1: Correlation Theory and Goodness of Fit
Unit 2: Pearson's Correlation Coefficient
Unit 3: Spearman's Regression Analysis
Unit 4: Ordinary Least Squares Estimation (Regression)
Unit 5: Multiple Regression Analysis

Module 4: STATISTICAL TEST
Unit 1: Hypothesis and t-Tests
Unit 2: F-Tests
Unit 3: Chi-Square Distribution
Unit 4: ANOVA
Unit 5: Forecasting and Time Series Analysis

MODULE 1: ROLES AND CONCEPTS OF STATISTICS

The general aim of this module is to provide you with a thorough understanding of the roles and concepts of statistics. The main focus is to present the common roles and concepts of statistics as a general background to the course. The role of statistics and the measurement of variables are introduced. The four units that constitute this module are statistically linked. By the end of this module you should be able to list, differentiate and link these common statistical functions, as well as identify and use them to solve related statistical problems. The units to be studied are:

Unit 1: Role of Statistics (Application of Statistics)
Unit 2: Measurement of Variables
Unit 3: Measurement of Dispersion, Skewness and Kurtosis
Unit 4: Decision Analysis and Administration

UNIT 1: ROLE OF STATISTICS (APPLICATION OF STATISTICS)

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definition of Statistics
3.2 Role of Statistics
3.3 Basic Concepts in Statistics
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading

1.0 Introduction

The activities of man and of the various organizations, which will often be referred to here as firms, continue to increase. This increases the need for man and for firms to make decisions on all these activities, and with it the need for both the quality and the quantity of the information required to make those decisions. The management of any firm requires scientific methods to collect and analyze the mass of information it gathers in order to make decisions on a number of issues. Such issues include the sales over a period of time, the production cost and the expected net profit.
In this regard, statistics plays an important role as a management tool for making decisions.

2.0 Objectives

By the end of this unit, you should be able to:

Understand the various definitions of statistics
Describe the uses of statistics
Define the basic concepts in statistics

3.1 Definitions of Statistics

Statistics can be defined as a management tool for making decisions. It is also a scientific approach to the presentation of numerical information in such a way that one gains a maximum understanding of the reality represented by that information. Statistics is also defined as the presentation of facts in numerical form. A more comprehensive definition describes statistics as a scientific method used for collecting, summarizing, classifying, analyzing and presenting information in such a way that we can have a thorough understanding of the reality the information represents.

From all these definitions, you will realize that statistics is concerned with numerical data. Examples of such numerical data are the heights and weights of pupils in a primary school, used when evaluating the nutritional well-being of the pupils, and the accident fatalities on a particular road over a period of time.

Where there are numerical data, there are also non-numerical data, such as the taste of brands of biscuits, the greenness of some vegetables and the texture of some joints of a wholesale cut of meat. Non-numerical data cannot be subjected to statistical analysis unless they are transformed into numerical data. To transform the greenness of vegetables into numerical data, a five-point scale for measuring the colour can be developed, with 1 indicating very dull and 5 indicating very green.

3.2 The Roles of Statistics

You will realize that statistics is useful in all spheres of human life.
A woman with a given amount of money, going to the market to purchase foodstuff for the family, takes decisions on the types of food items to purchase and on the quantity and quality of the items, so as to maximize the satisfaction she will derive from the purchase. For all these decisions, the woman makes use of statistics.

Government uses statistics as a tool for collecting data on economic aggregates such as national income, savings, consumption and gross national product. Government also uses statistics to measure the effects of external factors on its policies and to assess the trends in the economy so that it can plan future policies. Government uses statistics during a census. The various forms sent by the government to individuals and firms on annual income, tax returns, prices, costs, output and wage rates generate a lot of statistical data for the use of the government.

Business uses statistics to monitor the various changes in the national economy for its budget decisions. Business makes use of statistics in production, marketing, administration and personnel management. Statistics is also used extensively to control and analyze stock levels, such as minimum, maximum and reorder levels. It is used by business in market research to determine the acceptability of a product and how much of it will be demanded at various prices by a given population in a geographical area.

Management also uses statistics to make forecasts about the sales and labour cost of a firm. Management uses statistics to establish mathematical relationships between two or more variables for the purpose of predicting one variable in terms of the others.

Statistics is also used extensively in the conduct and analysis of biological, physical, medical and social research.

3.3 Basic Concepts In Statistics

Let us quickly define some of the basic concepts you will continue to come across in this course.
Entity: This may be a person, place or thing on which we make observations. In studying the nutritional well-being of pupils in a primary school, the entity is a pupil in the school.

Variable: This is a characteristic that assumes different values for different entities. The weights of pupils in the primary school constitute a variable.

Random Variable: If we can specify, for a given variable, a mathematical expression called a function, which gives the relative frequency of occurrence of the values that the variable can assume, the function is called a probability function and the variable a random variable.

Quantitative Variable: This is a variable whose values are given as numerical quantities. An example is the hourly patronage of a restaurant.

Qualitative Variable: This is a variable that is not measurable in numerical form or that cannot be counted. Examples are the colours of fruits and the taste of some brands of biscuit.

Discrete Variable: This is a variable that can only assume whole numbers. Examples are the number of Local Government Council Areas in the States of Nigeria and the number of female students in the various programmes in the National Open University. A discrete variable has "interruptions" between the values it can assume. For instance, between 1 and 2 there are infinitely many values, such as 1.1, 1.11, 1.111, 1.1111 and so on, which a discrete variable cannot assume. These gaps are called interruptions.

Continuous Variable: This is a variable that can assume both decimal and non-decimal values. There is always a continuum of values that the continuous variable can assume. The interruptions that characterize the discrete variable are absent in the continuous variable. Weight, for example, can take whole or decimal values such as 20 kilograms and 220.1752 kilograms.

Population: This is the complete set of entities of interest in a study.
In the study of how workers in Nigeria spend their leisure hours, all the workers in Nigeria constitute the population of the study.

Sample: This is the part of the population that is selected for a study. In studying the income distribution of students in the National Open University, the incomes of 1000 students selected for the study, from the population of all the students in the Open University, will constitute the sample of the study.

Random Sample: This is a sample drawn from a population in such a way that the results of its analysis may be used to generalize about the population from which it was drawn.

Exercise 1.1

What is the importance of statistics to human activities? Your answer can be obtained from section 3.2 of this unit.

4.0 Conclusion

In this unit you have learned a number of important issues that relate to the meaning and roles of statistics. The various definitions and examples of concepts given in this unit will assist tremendously in the study of the units to follow.

5.0 Summary

What you have learned in this unit concerns the meaning and roles of statistics, and the various concepts that are important to the study of statistics.

6.0 Tutor-Marked Assignment

What is statistics? Of what importance is statistics?

7.0 References/Further Reading

AJAYI J. NOUN Text Book, BHM 106: Statistics for Management Sciences II
JUDE, MICAN & EDITH N. Statistical & Quantitative Methods for Construction & Business Managers

UNIT 2: MEASUREMENT OF VARIABLES

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definition of Variable
3.2 Measurement of Variables
3.3 Variance of Binomial Distribution
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading

1.0 INTRODUCTION

Variables can be used under the following conditions. Information that is not numeric in nature is called a qualitative variable. For instance, information on colour of the skin, colour of the eye or hair, level of education, si[...] status, and other qualitative categories such as building types are qualitative variables. Qualitative variables can be assigned numerical values. This assignment of numerical values to information is called coding. These qualitative data can also be arranged in order of importance, with values assigned to them in that order. This is called ranking.

2.0 OBJECTIVES

The aim of this unit is to enable the student to understand the meaning of a variable and the instances when it is applicable.

3.0 MAIN CONTENT

3.1 What is a Variable?

A variable is any characteristic of an object or concept that is capable of taking different values or of falling into more than one distinct category. For instance, a building is an object, but the characteristics of a building such as size, type, cost and age are variables. Also, rain is an object, but the amount of rain is a variable. Other variables include height, sex, weight, colour of the skin, hair colour, genotype, blood group, marital status, religious affiliation, level of education attained, place of residence of a person, [...] strength of Dangote cement, tensile strength, number of bags of cement in the store, number of bags of cement used in the site per day, expenditure, income of household per [...], degree of satisfaction, level of intelligence, etc. Therefore, any characteristic of an object that can vary in time and space is called a variable.
Statistical raw data are generated or provided by these variables. That is, the values attached to the variables constitute statistical data. A single value of a variable is called an observation, an item, a score or a case. Quantitative variables can be classified into two major types, viz. discrete and continuous variables.

3.1.2 Discrete variables

Discrete variables are variables whose values are whole numbers or integers. They have no fractional part; they are countable or finite. Examples of discrete variables include housing units, number of students in a class, number of goals scored in a football match, number of cars sold, etc.

3.1.3 Continuous variables

Continuous variables are variables that assume any value within an interval or range. They have the property of infinite divisibility and can assume fractional values. Examples include weight, height, cost, scores, income, breaking strength, etc.

3.2 Measurement of Variables

There are four measurement scales available as instruments for measuring variables. These scales easily identify variables. The scales are nominal, ordinal, interval and ratio.

Nominal scale

This scale groups the objects into distinct categories to facilitate referencing. A label is attached to each distinct category. Examples of nominal scale variables include sex, religious or party affiliation, genotype, blood group, place of residence, etc. We can also label the various categories of a nominal variable with numbers (or codes). When this is done, the number or code is a mere label or identification mark, which does not permit any arithmetic operation. For instance, marital status may be categorized as married, separated, divorced and never married. If we assign 1 to married, 2 to separated, 3 to divorced and 4 to never married, these numbers are mere codes. The numbers do not indicate the order of importance of the various categories, and the sum of 1 and 3 cannot produce category 4. This is the lowest scale of measurement.

Ordinal scale

This scale ranks or orders the mutually exclusive categories of the variable according to the importance attached to each category. This scale has all the properties of the nominal scale plus the additional property of ordering or ranking the categories. Examples are: a teacher rating his students according to their performance as A, B, C, D, E and F, or 1st, 2nd, 3rd; income groups of individuals classified as high, medium and low; and classification of a city according to high, medium and low density of population concentration. The numbers assigned to each variable category only help to order or rank the observations in ascending or descending order. Many statistical operations that are based on ranking or rank ordering are permissible under this scale. Examples of such statistical techniques are Spearman's rank correlation coefficient, the Wilcoxon rank-sum test, the signed rank test, etc. This scale is higher than the nominal scale.

Interval scale

This scale has the combined properties of the nominal and ordinal scales plus the additional property of measuring the distance or interval between two measurements. This scale gives information on how much one category is more or less than the other. Examples are age in years, income, pressure and temperature. This scale has no absolute zero; that is, the selected zero point in this scale is arbitrary. That a student scored zero percent in an examination does not mean that he does not know anything in that course. Interval variables are quantitative and may be discrete or continuous. As such, the arithmetic operations of addition and subtraction are permitted. Many statistical procedures are permissible on this scale, including the mean, the standard deviation, the product moment correlation coefficient and other statistical inferences.

Ratio scale

This scale has all the properties of the nominal, ordinal and interval scales, including the additional property of having an absolute zero point.
This is the highest level of measurement. Examples are measurements of height, weight, volume, price of an item, votes scored in an election, etc. Many statistical procedures are available for ratio scale data.

Note that the scale of measurement of a variable determines the type of statistical tool to be employed.

4.0 CONCLUSION

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation and is widely used.

5.0 SUMMARY

In this unit you have learned the meaning of variables and how variables are measured. In summary, a variable is measured on one of four scales:

1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale

The scale on which a variable is measured determines which statistical tools can meaningfully be applied to it.

6.0 TUTOR-MARKED ASSIGNMENT

1. What is a variable? Distinguish between quantitative and qualitative variables, and between discrete and continuous variables.
2. Write short notes on: (i) Nominal scale (ii) Ordinal scale (iii) Interval scale (iv) Ratio scale

7.0 REFERENCES/FURTHER READING

ONWE J.O. NOUN Text Book, ENT 321: Quantitative Methods for Business Decisions
OTOKOTI O.S. Contemporary Statistics
JUDE, MICAN & EDITH N. Statistical & Quantitative Methods for Construction & Business Managers
KEHINDE J.S. Statistics Method & Quantitative Techniques

UNIT 3: MEASURES OF DISPERSION, SKEWNESS AND KURTOSIS

CONTENTS

1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Measurement of Dispersion
3.2 Measure of Skewness
3.3 Kurtosis
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading

1.0 INTRODUCTION

The second most important characteristic that describes a set of data is the amount of variation, scatter or spread in the data. In this unit, we discuss in detail the various measures of dispersion and skewness. The purpose of these measures is to amplify the imperfect summary of any statistical distribution usually provided by the three commonly used measures of average: the mean, the median and the mode. These averages are inherently unsatisfactory because no single measure of average can tell you everything about a distribution, and the wider the dispersion of the data around the average, the less satisfactory the average becomes. In order to improve your understanding of population averages, you need to know how wide the dispersion is around the average, and whether it is symmetrical (un-skewed) or asymmetrical (skewed). The first set of measures to be discussed here are measures of dispersion, and the second set are measures of skewness.

Fig. 1: Normal Curve

2.0 OBJECTIVE

The main aim of this unit is to ensure that students properly understand the measurement of dispersion and skewness, appreciate its applicability in day-to-day business and scientific life, and are able to use it as appropriate in practical statistical studies.

3.0 MAIN CONTENT

3.1 MEASURES OF DISPERSION

The common measures of dispersion include:

(a) The Range
(b) The Quartile Deviation
(c) Mean Deviation
(d) Variance
(e) Standard Deviation
(f) Coefficient of Variation

The variation or dispersion can be said to measure the degree of uniformity of the observations in a given set of data. The greater the variation, the less uniform the observations in the set of data.

The Range

The range (R) of a given set of ungrouped data can be determined from an ordered array as the difference between the highest observation and the lowest observation in the distribution.

Let Xh = highest observation
XL = lowest observation

Then, R = Xh - XL

Given the arrayed data X = 2, 5, 8, 9, 12, 13, 18, the range will be:

R = 18 - 2 = 16.

The range can be an unsatisfactory measure of dispersion because it is affected by extreme values or items, which can render it unrepresentative of the majority of the data.

The Quartile Deviation

Unlike the range, the quartile deviation is not affected by extreme values or items. Quartiles are the boundaries separating the items in a given distribution or set of data into quarters. There are, therefore, three quartiles: the lower quartile (at the 25 percent mark); the median (at the 50 percent mark); and the upper quartile (at the 75 percent mark).
To locate the quartiles of ungrouped data, you simply use:
0.25(n + 1) for the lower quartile
0.50(n + 1) for the median quartile
0.75(n + 1) for the upper quartile

For grouped data, you simply use:
0.25n for the lower quartile
0.5n for the median quartile
0.75n for the upper quartile

Example

Consider the following output distribution of the employees of a manufacturing company:

Table 3.1: Output of Employees

Units of Output    Number of Employees (f)
21-30              7
31-40              11
41-50              14
51-60              8
61-70              5

Table 3.1 indicates that there are 45 items or observations (i.e. the total number of employees, or the sum of the frequencies, f). Using this information, the positions of the quartiles are as follows:
Lower quartile (Q1) = 0.25n = 0.25(45) = 11.25th item
Median quartile (Q2) = 0.5n = 0.5(45) = 22.5th item
Upper quartile (Q3) = 0.75n = 0.75(45) = 33.75th item

The values of the quartile items are determined as follows.

Lower quartile: Since, according to Table 3.1, there are 7 items in the first group (the group 21-30), the quartile item is the (11.25 - 7) = 4.25th item of the 11 items in the second group. Thus,
$Q_1 = 30 + \dfrac{4.25}{11} \times 10 = 30 + 3.86 = 33.86$, or 34 units approximately.
Therefore, the value of the lower quartile is about 34 units. The values of the median and upper quartiles can be determined by a similar process.

Median quartile: The 22.5th item in the distribution is in the 41-50 group and is the (22.5 - 18) = 4.5th item out of 14 in the group. (Note that the figure 18 is the cumulative frequency of the first and second groups, and the figure 10 appearing in the calculations is the class interval of the distribution.) The value of the median quartile is therefore:
$Q_2 = 40 + \dfrac{4.5}{14} \times 10 = 40 + 3.21 = 43.21$, or 43 units approximately.
Upper quartile (Q3): The cumulative frequency up to the end of the third group (41-50) is 32, so the 33.75th item in the distribution lies in the fourth group (51-60) and is the (33.75 - 32) = 1.75th item out of 8 in that group. The value of the upper quartile is therefore:
$Q_3 = 50 + \dfrac{1.75}{8} \times 10 = 50 + 2.19 = 52.19$, or 52 units approximately.

The quartile deviation, also referred to as the semi-interquartile range, is defined as one-half of the difference between the upper quartile and the lower quartile. Thus,
$\text{Quartile Deviation} = \dfrac{Q_3 - Q_1}{2}$
In this example, therefore, the quartile deviation is:
$\dfrac{52 - 34}{2} = 9$ units
The distribution in Table 3.1 can then be described as having a median value of 43 units and a quartile deviation around the median of 9 units.

The Mean Deviation (MD)

The Mean Deviation is defined by the following relationship:
$MD = \dfrac{\sum |X - \bar{X}|}{n}$
where
$\sum |X - \bar{X}|$ = sum of the absolute values of the deviations from the arithmetic mean
n = number of observations

As an example, consider again the arrayed data X = 2, 5, 8, 9, 12, 13, 18. The mean deviation can be computed as follows:
$\bar{X} = \dfrac{\sum X}{n} = \dfrac{67}{7} = 9.57$

By tabulation,

X      X - mean    |X - mean|
2      -7.57       7.57
5      -4.57       4.57
8      -1.57       1.57
9      -0.57       0.57
12      2.43       2.43
13      3.43       3.43
18      8.43       8.43
                   Sum = 28.57

Thus,
$MD = \dfrac{\sum |X - \bar{X}|}{n} = \dfrac{28.57}{7} = 4.08$

The Variance

The variance of a given set of ungrouped data is defined by:
$S^2 = \dfrac{\sum X^2 - (\sum X)^2 / n}{n - 1}$
where X represents the numerical values of the given set of ungrouped data.

Continuing with our earlier example, where X = 2, 5, 8, 9, 12, 13, 18, and by tabulation:

X      X^2
2      4
5      25
8      64
9      81
12     144
13     169
18     324

$\sum X = 67$; $\sum X^2 = 811$; $\dfrac{(\sum X)^2}{n} = \dfrac{(67)^2}{7} = \dfrac{4489}{7} = 641.29$

Thus,
$S^2 = \dfrac{\sum X^2 - (\sum X)^2 / n}{n - 1} = \dfrac{811 - 641.29}{7 - 1} = \dfrac{169.71}{6} = 28.285$
Thus, the variance of the given set of ungrouped data is 28.285.
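The hand calculations for the quartiles, the mean deviation and the variance can be checked with a short Python sketch. The function names (`grouped_quantile`, `mean_deviation`, `sample_variance`) are illustrative choices, not part of the course text. The quartile function assumes equal class intervals and, as in the worked example, interpolates from the upper limit of the preceding class.

```python
def grouped_quantile(classes, freqs, fraction):
    """Interpolate the value of the (fraction * n)th item of a grouped table.

    `classes` is a list of (lower, upper) class limits with equal intervals.
    Interpolation starts from the upper limit of the preceding class
    (for example 30 for the class 31-40), as in the worked example.
    """
    n = sum(freqs)
    position = fraction * n
    width = classes[1][0] - classes[0][0]   # class interval, 10 in Table 3.1
    gap = classes[1][0] - classes[0][1]     # step between adjacent limits, 1 here
    cumulative = 0
    for (lower, _upper), f in zip(classes, freqs):
        if cumulative + f >= position:
            start = lower - gap
            return start + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def sample_variance(values):
    """Sample variance using the computational formula with divisor n - 1."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


# Table 3.1: output of employees
classes = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]
employees = [7, 11, 14, 8, 5]
q1 = grouped_quantile(classes, employees, 0.25)
q3 = grouped_quantile(classes, employees, 0.75)
quartile_deviation = (q3 - q1) / 2

# Ungrouped data used for the mean deviation and the variance
x = [2, 5, 8, 9, 12, 13, 18]
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
print(round(mean_deviation(x), 4), round(sample_variance(x), 3))
```

Because the quartiles are not rounded to whole units before subtracting, the quartile deviation printed here comes out slightly above the 9 units obtained with the rounded values 52 and 34.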
The Standard Deviation

Simply stated, the standard deviation is the most useful measure of variation. It is defined as the square root of the variance of a given set of data. Thus,
$S = \sqrt{S^2}$
or, for ungrouped data,
$S = \sqrt{\dfrac{\sum X^2 - (\sum X)^2 / n}{n - 1}}$
The standard deviation for the last example is:
$S = \sqrt{S^2} = \sqrt{28.285} = 5.318$

Variance and Standard Deviation for Grouped Data

The variance and standard deviation for grouped data are defined by the following formulae:
$S^2 = \dfrac{\sum f x^2 - (\sum f x)^2 / n}{n - 1}$
$S = \sqrt{S^2} = \sqrt{\dfrac{\sum f x^2 - (\sum f x)^2 / n}{n - 1}}$
where x is the mid-value of each class, f is the class frequency and $n = \sum f$. Their computation is illustrated with the following example.

Example. The following data present the profit ranges of 100 firms in a given industry.

Profits (N millions)    No. of Firms (f)
10-15                   8
16-21                   18
22-27                   20
28-33                   12
34-39                   15
40-45                   17
46-51                   10
                        Sum f = n = 100

We are required to compute the variance and standard deviation of profits within the industry.

Solution. The computational process is as follows:

Profits         Frequency   Mid-value
(N millions)    (f)         (x)        fx       x^2        fx^2
10-15           8           12.5       100      156.25     1250
16-21           18          18.5       333      342.25     6160.5
22-27           20          24.5       490      600.25     12005
28-33           12          30.5       366      930.25     11163
34-39           15          36.5       547.5    1332.25    19983.75
40-45           17          42.5       722.5    1806.25    30706.25
46-51           10          48.5       485      2352.25    23522.50
Total           100                    3044                104791

Summary:
$\sum f x = 3044$; $\sum f x^2 = 104791$; $\dfrac{(\sum f x)^2}{n} = \dfrac{(3044)^2}{100} = 92659.36$

It follows that:
$S^2 = \dfrac{\sum f x^2 - (\sum f x)^2 / n}{n - 1} = \dfrac{104791 - 92659.36}{100 - 1} = \dfrac{12131.64}{99} = 122.54$
$S = \sqrt{S^2} = \sqrt{122.54} = 11.07$
Thus, the required variance and standard deviation are 122.54 and 11.07 respectively.
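The grouped-data computation above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `grouped_variance` is an illustrative choice, and the class mid-values are taken directly from the table.

```python
from math import sqrt


def grouped_variance(midpoints, freqs):
    """Sample variance of grouped data from class mid-values and frequencies."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Profit classes of the 100 firms (mid-values in N millions)
midpoints = [12.5, 18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
firms = [8, 18, 20, 12, 15, 17, 10]

variance = grouped_variance(midpoints, firms)
std_dev = sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 122.54 11.07
```

Working from the mid-values treats every firm in a class as if its profit sat at the centre of the class, which is the same approximation made in the table.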
The Coefficient of Variation

Unlike other measures of variability, the coefficient of variation is a relative measure. It is particularly useful when comparing the variability of two or more sets of data that are expressed in different units of measurement. The coefficient of variation measures the standard deviation relative to the mean and is computed by:
$CV = \dfrac{S}{\bar{X}} \times 100\%$

The coefficient of variation is also useful in comparing two or more sets of data that are measured in the same units but differ to such an extent that a direct comparison of the respective standard deviations is not very helpful. As an example, suppose a potential investor is considering the purchase of shares in one of two companies, A or B, which are listed on the Nigerian Stock Exchange (NSE). If neither company offered dividends to its shareholders and if both companies were rated equally high in terms of potential growth, the potential investor might want to consider the volatility of the two stocks to aid the investment decision.

Now, suppose each share of stock in Company A has averaged N50 over the past months with a standard deviation of N10. In addition, suppose that in this same time period the price per share of Company B's stock averaged N12 with a standard deviation of N4. Observe that in terms of actual standard deviations, the price of Company A's shares seems to be more volatile than that of Company B. However, since the average prices per share of the two stocks are so different, it is more appropriate for the potential investor to consider the variability in price relative to the average price in order to examine the volatility or stability of the two stocks.
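As a quick check of the formula, the coefficient of variation for the two share prices described above can be computed as follows. The helper name `coefficient_of_variation` is an assumption made for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


cv_a = coefficient_of_variation(10, 50)  # Company A: S = N10, mean = N50
cv_b = coefficient_of_variation(4, 12)   # Company B: S = N4, mean = N12
print(f"Company A: {cv_a:.1f}%")  # Company A: 20.0%
print(f"Company B: {cv_b:.1f}%")  # Company B: 33.3%
```

Because the units cancel in the ratio, the two percentages can be compared directly even though the share prices are on very different levels.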
The coefficient of variation of Company A's stock is
$CV_A = \dfrac{S_A}{\bar{X}_A} \times 100\% = \dfrac{N10}{N50} \times 100\% = 20\%$
That of Company B's is
$CV_B = \dfrac{S_B}{\bar{X}_B} \times 100\% = \dfrac{N4}{N12} \times 100\% = 33.3\%$
It follows that, relative to the average, the share price of Company B's stock is much more variable (unstable) than that of Company A.

3.2 MEASURES OF SKEWNESS

Measures of skewness are used to determine the degree of asymmetry of a distribution; a distribution which is not symmetrical is said to be skewed. The measures generally used are Pearson's first coefficient of skewness and Pearson's second coefficient of skewness.

Pearson's No. 1 Coefficient of Skewness: The formula used in calculating Pearson's No. 1 coefficient is:
$Sk = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$
where $\sigma$ is the standard deviation. Notice that the mean, the mode, and the standard deviation are all expressed in the units of the original data. When the difference between the mean and the mode is expressed as a fraction of the standard deviation (the average spread of the data around the mean), the original units cancel out. The result is a coefficient of skewness, a pure number which tells you the extent of the skewness in the distribution.

Example: Consider a set of data on monthly sales of a company's product, the mean of which was found to be N240,000, the mode N135,000, and the standard deviation N85,000. Pearson's No. 1 coefficient of skewness is calculated as follows:
$Sk = \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{240{,}000 - 135{,}000}{85{,}000} = 1.24$

3.3 KURTOSIS

Kurtosis measures the degree of peakedness of a distribution. It is usually taken relative to a normal distribution. There are three types of kurtosis, namely leptokurtic, platykurtic and mesokurtic. The mesokurtic curve is otherwise known as the normal distribution curve, i.e. the curve that is moderately peaked. The figures below show the relative peakedness of distributions of data.
The moment coefficient of kurtosis is used to measure the peakedness of a distribution. For a normal (mesokurtic) distribution, the moment coefficient is $\beta_2 = a_4 = 3$. If the moment coefficient of kurtosis $a_4 > 3$, the distribution is said to be leptokurtic; if $a_4 < 3$ it is platykurtic; and it is called mesokurtic when $a_4 = 3$.

Example: Calculate the first four moments about the mean for the weight distribution of students in the National Open University of Nigeria given below:

Class (X)   51-53   54-56   57-59   60-62   63-65
f           4       17      41      26      7

Solution:

Class    Mid-value (x)   f     u = (x - A)/c   fu     fu^2   fu^3   fu^4
51-53    52              4     -2              -8     16     -32    64
54-56    55              17    -1              -17    17     -17    17
57-59    58              41     0               0     0       0     0
60-62    61              26     1               26    26      26    26
63-65    64              7      2               14    28      56    112
Total                    95                     15    87      33    219

A = assumed mean of x = 58; c = class interval = 53.5 - 50.5 = 3.

The moments about the assumed mean A are:
$m_1' = \dfrac{\sum fu}{\sum f} \times c = \dfrac{15}{95} \times 3 = 0.474$
$m_2' = \dfrac{\sum fu^2}{\sum f} \times c^2 = \dfrac{87}{95} \times 3^2 = 8.242$
$m_3' = \dfrac{\sum fu^3}{\sum f} \times c^3 = \dfrac{33}{95} \times 3^3 = 9.379$
$m_4' = \dfrac{\sum fu^4}{\sum f} \times c^4 = \dfrac{219}{95} \times 3^4 = 186.73$

Thus, the moments about the mean are:
$m_1 = 0$
$m_2 = m_2' - (m_1')^2 = 8.242 - (0.474)^2 = 8.017$
$m_3 = m_3' - 3m_1' m_2' + 2(m_1')^3 = 9.379 - 3(0.474)(8.242) + 2(0.474)^3 = 9.379 - 11.720 + 0.213 = -2.128$
$m_4 = m_4' - 4m_1' m_3' + 6(m_1')^2 m_2' - 3(m_1')^4 = 186.73 - 4(0.474)(9.379) + 6(0.474)^2(8.242) - 3(0.474)^4 = 186.73 - 17.783 + 11.1107 - 0.1514 = 179.91$

The moment coefficient of kurtosis is then
$a_4 = \dfrac{m_4}{s^4} = \dfrac{m_4}{(m_2)^2} = \dfrac{179.91}{(8.017)^2} = 2.799$
Since $a_4 = 2.799 < 3$, the distribution is platykurtic.

[...]

[...] $> \sigma_2^2$, in numerical problems we will take the greater of the two variances as the numerator and adjust the degrees of freedom accordingly. Thus, in $F \sim F(v_1, v_2)$, $v_1$ refers to the degrees of freedom of the larger variance, which must be taken as the numerator when computing F. If $H_0$ is true, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the value of F should be around 1; otherwise, it should be greater than 1. If the value of F is far greater than 1, $H_0$ should be rejected.
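The rule of placing the larger sample variance in the numerator can be sketched in Python as follows. The function names are illustrative, and the two samples are the driving times for car types X and Y used in Example 1 of this unit.

```python
def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_ratio(sample_1, sample_2):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    v1, v2 = sample_variance(sample_1), sample_variance(sample_2)
    df1, df2 = len(sample_1) - 1, len(sample_2) - 1
    if v1 >= v2:
        return v1 / v2, df1, df2
    return v2 / v1, df2, df1


car_x = [20, 16, 26, 27, 23, 22]
car_y = [27, 33, 42, 35, 32, 34, 38]
f_value, df_num, df_den = variance_ratio(car_x, car_y)
print(round(f_value, 2), df_num, df_den)  # 1.37 6 5
```

The function only computes the statistic and its degrees of freedom; the critical value still has to be read from an F table for the chosen level of significance.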
Finally, if we take the larger of $S_1^2$ or $S_2^2$ as the numerator, all the tests based on the F-statistic become right-tailed tests.
- All one-tailed tests for $H_0$ at level of significance α will be right-tailed tests only, with area α in the right tail.
- For two-tailed tests, the critical value is located in the right tail of the F-distribution with area (α/2) in the right tail.

Example 1: The time taken (in minutes) by drivers to drive from Town A to Town B in two different types of cars, X and Y, is given below:

Car Type X: 20 16 26 27 23 22
Car Type Y: 27 33 42 35 32 34 38

Do the data show that the variances of the time distributions of the populations from which the samples are drawn do not differ significantly?

Solution:

X      d = x - 22   d^2     Y      D = y - 35   D^2
20     -2           4       27     -8           64
16     -6           36      33     -2           4
26      4           16      42      7           49
27      5           25      35      0           0
23      1           1       32     -3           9
22      0           0       34     -1           1
                            38      3           9
Total   2           82              -4          136

$S_X^2 = \dfrac{1}{n_1 - 1}\left[\sum d^2 - \dfrac{(\sum d)^2}{n_1}\right] = \dfrac{1}{5}\left[82 - \dfrac{(2)^2}{6}\right] = 16.27$
$S_Y^2 = \dfrac{1}{n_2 - 1}\left[\sum D^2 - \dfrac{(\sum D)^2}{n_2}\right] = \dfrac{1}{6}\left[136 - \dfrac{(-4)^2}{7}\right] = 22.29$

Since $S_Y^2 > S_X^2$, under $H_0$ the test statistic is
$F = \dfrac{S_Y^2}{S_X^2} = \dfrac{22.29}{16.27} = 1.37 \sim F(6, 5)$
Tabulated $F_{0.05}(6, 5) = 4.95$.

Since the calculated F is less than the tabulated F, it is not significant. Hence $H_0$ may be accepted at the 5% level of significance. We may therefore conclude that the variability of the time distribution is the same in the two populations.

4.0 CONCLUSION

In conclusion, the F-test can be used to test the equality of several population variances, the equality of several population means, and the overall significance of a regression model.

5.0 SUMMARY

Students have learnt the theory and application of the F-test.

6.0 TUTOR-MARKED ASSIGNMENT

Can the following two samples be regarded as coming from the same normal population?

Sample   Size   Sample Mean   Sum of squares of deviations from the mean
1        10     12            120
2        12     15            314

7.0 REFERENCE/FURTHER READING

OKOJIE, DANIEL E. NOUN Text Book, ECO 203: Statistics for Economists.
Spiegel, M. R., Stephens, L. J. (2008). Statistics (4th ed.). New York: McGraw Hill.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London, UK: Macmillan.

UNIT 3: CHI-SQUARE TEST

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Application of Chi-Square Distribution
3.2 Chi-Square test of goodness of fit
3.3 Steps for computing χ2 and drawing conclusions
3.4 Chi-Square test for independence of attributes
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading

1.0 INTRODUCTION

The square of a standard normal variable is called a Chi-square variate with 1 degree of freedom (abbreviated as d.f.). Thus, if X is a random variable following a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $(X - \mu)/\sigma$ is a standard normal variate. Therefore, $\left(\dfrac{X - \mu}{\sigma}\right)^2$ is a chi-square variate (denoted by the Greek letter $\chi^2$) with 1 d.f.

If $X_1, X_2, X_3, \ldots, X_v$ are v independent random variables following normal distributions with means $\mu_1, \mu_2, \mu_3, \ldots, \mu_v$ and standard deviations $\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_v$ respectively, then the variate
$\chi^2 = \left(\dfrac{X_1 - \mu_1}{\sigma_1}\right)^2 + \left(\dfrac{X_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\dfrac{X_v - \mu_v}{\sigma_v}\right)^2 = \sum_{i=1}^{v} \left(\dfrac{X_i - \mu_i}{\sigma_i}\right)^2$
which is the sum of the squares of v independent standard normal variates, follows the Chi-square distribution with v d.f.

2.0 OBJECTIVE

The main objective of this unit is to enable students to understand the theory behind, and the application of, chi-square statistics. At the end of this unit, students are expected to be able to apply chi-square analysis to solving day-to-day business and economic problems.

3.0 MAIN CONTENT

3.1 Applications of the χ2-Distribution

The Chi-square distribution has a number of applications, some of which are enumerated below:
(i) Chi-square test of goodness of fit.
(ii) χ2-test for independence of attributes.
(iii) To test if the population has a specified value of variance $\sigma^2$.
(iv) To test the equality of several population proportions.

Observed and Theoretical Frequencies

Suppose that in a particular sample a set of possible events $E_1, E_2, E_3, \ldots, E_k$ are observed to occur with frequencies $O_1, O_2, O_3, \ldots, O_k$, called observed frequencies, and that according to probability rules they are expected to occur with frequencies $e_1, e_2, e_3, \ldots, e_k$, called expected or theoretical frequencies. Often we wish to know whether the observed frequencies differ significantly from the expected frequencies.

Definition of χ2

A measure of the discrepancy between the observed and expected frequencies is supplied by the statistic χ2 given by
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - e_i)^2}{e_i}$

3.2 Chi-Square test of goodness of fit

The chi-square test can be used to determine how well theoretical distributions (such as the normal and binomial distributions) fit empirical distributions (i.e. those obtained from sample data). Suppose we are given a set of observed frequencies obtained in some experiment and we want to test whether the experimental results support a particular hypothesis or theory. Karl Pearson, in 1900, developed a test of the significance of the discrepancy between experimental values and the theoretical values obtained under some theory or hypothesis. This test is known as the χ2-test of goodness of fit and is used to test whether the deviation between observation (experiment) and theory may be attributed to chance (fluctuations of sampling) or whether it is really due to the inadequacy of the theory to fit the observed data.

Under the null hypothesis that there is no significant difference between the observed (experimental) and the theoretical or hypothetical values, i.e. that there is good compatibility between theory and experiment, Karl Pearson proved that the statistic
$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{(O_1 - E_1)^2}{E_1} + \dfrac{(O_2 - E_2)^2}{E_2} + \cdots + \dfrac{(O_n - E_n)^2}{E_n}$
follows the χ2-distribution with v = n - 1 d.f.,
where $O_1, O_2, \ldots, O_n$ are the observed frequencies and $E_1, E_2, \ldots, E_n$ are the corresponding expected or theoretical frequencies obtained under some theory or hypothesis.

3.3 Steps for computing χ2 and drawing conclusions

(i) Compute the expected frequencies $E_1, E_2, \ldots, E_n$ corresponding to the observed frequencies $O_1, O_2, \ldots, O_n$ under some theory or hypothesis.
(ii) Compute the deviation (O - E) for each frequency and then square it to obtain $(O - E)^2$.
(iii) Divide each squared deviation $(O - E)^2$ by the corresponding expected frequency to obtain $(O - E)^2 / E$.
(iv) Add the values obtained in step (iii) to compute $\chi^2 = \sum \dfrac{(O - E)^2}{E}$.
(v) Under the null hypothesis that the theory fits the data well, the statistic follows the χ2-distribution with v = n - 1 d.f.
(vi) Look up the tabulated (critical) value of χ2 for (n - 1) d.f. at a chosen level of significance, usually 5% or 1%, from any Chi-square distribution table. If the calculated value of χ2 obtained in step (iv) is less than the corresponding tabulated value, it is said to be non-significant at the required level of significance. This implies that the discrepancy between the observed values (experiment) and the expected values (theory) may be attributed to chance, i.e. fluctuations of sampling. In other words, the data do not provide any evidence against the null hypothesis [given in step (v)], which may therefore be accepted at the required level of significance, and we may conclude that there is good correspondence (fit) between theory and experiment.
(vii) On the other hand, if the calculated value of χ2 is greater than the tabulated value, it is said to be significant. In other words, the discrepancy between the observed and expected frequencies cannot be attributed to chance and we reject the null hypothesis. Thus, we conclude that the experiment does not support the theory.
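Steps (i) to (vii) can be sketched in Python as follows. The data are hypothetical (60 rolls of a single die, invented purely for illustration) and the function name `chi_square_statistic` is an assumption. The critical value 11.07 is the tabulated χ2 value for 5 d.f. at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes: steps (ii) to (iv)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: faces 1 to 6 observed in 60 rolls of one die
observed = [8, 12, 9, 11, 10, 10]

# Step (i): under H0 (fair die) each face is expected 60 / 6 = 10 times
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1   # step (v): v = n - 1 = 5
critical_value = 11.07                   # step (vi): table value, 5 d.f., 5% level

# Steps (vi) and (vii): compare the calculated and tabulated values
decision = "reject H0" if chi2 > critical_value else "do not reject H0"
print(round(chi2, 2), degrees_of_freedom, decision)  # 1.0 5 do not reject H0
```

The same function applies to any goodness-of-fit problem once the expected frequencies have been worked out from the hypothesised distribution.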
Example 1: A pair of dice is rolled 500 times, with the sums shown in the table below. Take α = 5%.

Sum (x)              2    3    4    5    6    7    8    9    10   11   12
Observed frequency   15   35   49   58   65   76   72   60   35   29   6

It should be noted that the expected frequencies of the sums, if the dice are fair, are determined from the distribution of x shown in the table below:

Sum (x)   2      3      4      5      6      7      8      9      10     11     12
P(x)      1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

To obtain the expected frequencies, P(x) is multiplied by the total number of trials:

Sum (x)   Observed frequency (O)   P(x)    Expected frequency (P(x) × 500)
2         15                       1/36    13.9
3         35                       2/36    27.8
4         49                       3/36    41.7
5         58                       4/36    55.6
6         65                       5/36    69.5
7         76                       6/36    83.4
8         72                       5/36    69.5
9         60                       4/36    55.6
10        35                       3/36    41.7
11        29                       2/36    27.8
12        6                        1/36    13.9

Recall that the contribution of each class is $\chi_i^2 = (O_i - E_i)^2 / E_i$. Therefore,
$\chi_1^2 = (O_1 - E_1)^2 / E_1 = (15 - 13.9)^2 / 13.9 = 0.09$
$\chi_2^2 = (O_2 - E_2)^2 / E_2 = (35 - 27.8)^2 / 27.8 = 1.86$
$\chi_3^2 = (O_3 - E_3)^2 / E_3 = (49 - 41.7)^2 / 41.7 = 1.28$
$\chi_4^2 = (O_4 - E_4)^2 / E_4 = (58 - 55.6)^2 / 55.6 = 0.10$
$\chi_5^2 = (O_5 - E_5)^2 / E_5 = (65 - 69.5)^2 / 69.5 = 0.29$
$\chi_6^2 = (O_6 - E_6)^2 / E_6 = (76 - 83.4)^2 / 83.4 = 0.66$
$\chi_7^2 = (O_7 - E_7)^2 / E_7 = (72 - 69.5)^2 / 69.5 = 0.09$
$\chi_8^2 = (O_8 - E_8)^2 / E_8 = (60 - 55.6)^2 / 55.6 = 0.35$
$\chi_9^2 = (O_9 - E_9)^2 / E_9 = (35 - 41.7)^2 / 41.7 = 1.08$
$\chi_{10}^2 = (O_{10} - E_{10})^2 / E_{10} = (29 - 27.8)^2 / 27.8 = 0.05$
$\chi_{11}^2 = (O_{11} - E_{11})^2 / E_{11} = (6 - 13.9)^2 / 13.9 = 4.49$

To calculate the overall Chi-square value, recall that $\chi^2 = \sum_i \chi_i^2$, i.e. we add the individual values. Therefore,
$\chi^2 = 0.09 + 1.86 + 1.28 + 0.10 + 0.29 + 0.66 + 0.09 + 0.35 + 1.08 + 0.05 + 4.49 = 10.34$

For the critical value, since n = 11, d.f. = 10. Therefore, the table value is 18.3.

Decision: Since the calculated value (10.34) is less than the table (critical) value, the null hypothesis is accepted.

Conclusion: There is no significant difference between observed and expected frequencies.
The slight observed differences occurred due to chance.

Exercise: The following figures show the distribution of digits in numbers chosen at random from a telephone directory:

Digit       0       1       2     3     4       5     6       7     8     9     Total
Frequency   1,026   1,107   997   966   1,075   933   1,107   972   964   853   10,000

Test whether the digits may be taken to occur equally frequently in the directory. The table value of χ2 for 9 d.f. at the 5% level of significance is 16.92.

Hint: Set up the null hypothesis that the digits 0, 1, 2, 3, ..., 9 in the numbers in the telephone directory are uniformly distributed, i.e. all digits occur equally frequently in the directory. Then, under the null hypothesis, the expected frequency for each of the digits 0, 1, 2, 3, ..., 9 is 10,000/10 = 1,000.

3.4 Chi-Square test for independence of attributes

Consider a given population consisting of N items divided into r mutually disjoint (exclusive) and exhaustive classes $A_1, A_2, \ldots, A_r$ with respect to (w.r.t.) the attribute A, so that a randomly selected item belongs to one and only one of the classes $A_1, A_2, \ldots, A_r$. Similarly, suppose that the same population is divided into s mutually disjoint and exhaustive classes $B_1, B_2, \ldots, B_s$ w.r.t. another attribute B, so that an item selected at random possesses one and only one of the attributes $B_1, B_2, \ldots, B_s$. The frequency distribution of the items can then be represented in the following r × s manifold contingency table:

A \ B    B1        B2        ...   Bj        ...   Bs        Total
A1       (A1B1)    (A1B2)    ...   (A1Bj)    ...   (A1Bs)    (A1)
A2       (A2B1)    (A2B2)    ...   (A2Bj)    ...   (A2Bs)    (A2)
:        :         :               :               :         :
Ai       (AiB1)    (AiB2)    ...   (AiBj)    ...   (AiBs)    (Ai)
:        :         :               :               :         :
Ar       (ArB1)    (ArB2)    ...   (ArBj)    ...   (ArBs)    (Ar)
Total    (B1)      (B2)      ...   (Bj)      ...   (Bs)      N

where $(A_i)$ is the frequency of the ith attribute $A_i$, i.e. the number of persons possessing the attribute $A_i$, i = 1, 2, ..., r; $(B_j)$ is the number of persons possessing the attribute $B_j$, j = 1, 2, ..., s; and $(A_iB_j)$ is the number of persons possessing both the attributes $A_i$ and $B_j$ (i = 1, 2, ..., r; j = 1, 2, ..., s).

Under the hypothesis that the two attributes A and B are independent, the expected frequency for $(A_iB_j)$ is given by
$E[(A_iB_j)] = N \cdot P[A_iB_j] = N \cdot P[A_i \cap B_j] = N \cdot P[A_i] \cdot P[B_j]$ [by the compound probability theorem, since the attributes are independent]
$= N \times \dfrac{(A_i)}{N} \times \dfrac{(B_j)}{N} = \dfrac{(A_i)(B_j)}{N}$

If $(A_iB_j)_o$ denotes the expected frequency of $(A_iB_j)$, then
$(A_iB_j)_o = \dfrac{(A_i)(B_j)}{N}$; (i = 1, 2, ..., r; j = 1, 2, ..., s)

Thus, under the null hypothesis of independence of attributes, the expected frequency for each cell of the above table can be obtained using this last equation. The rule can be stated in words as follows:

"Under the hypothesis of independence of attributes, the expected frequency for any cell can be obtained by multiplying the row total and the column total in which the frequency occurs and dividing the product by the total frequency N."

Here, we have a set of r × s observed frequencies $(A_iB_j)$ and the corresponding expected frequencies $(A_iB_j)_o$. Applying the χ2-test of goodness of fit, the statistic
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \dfrac{[(A_iB_j) - (A_iB_j)_o]^2}{(A_iB_j)_o}$
follows the χ2-distribution with (r - 1) × (s - 1) degrees of freedom.

Comparing this calculated value of χ2 with the tabulated value for (r - 1) × (s - 1) d.f. at a chosen level of significance, we reject or retain the null hypothesis of independence of attributes at that level of significance.

Note: For contingency table data, the null hypothesis is always set up that the attributes under consideration are independent. It is only under this hypothesis that the formula
$(A_iB_j)_o = \dfrac{(A_i)(B_j)}{N}$; (i = 1, 2, ..., r; j = 1, 2, ..., s)
can be used for computing expected frequencies.
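The "row total × column total ÷ N" rule and the resulting χ2 statistic can be sketched in Python as follows. The 2 × 2 table is hypothetical, chosen only so that the arithmetic is easy to follow by hand, and the function names are illustrative.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x s table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical 2 x 2 contingency table (rows: attribute A, columns: attribute B)
observed = [[20, 30],
            [30, 20]]

print(expected_frequencies(observed))  # [[25.0, 25.0], [25.0, 25.0]]
chi2, dof = chi_square_independence(observed)
print(chi2, dof)  # 4.0 1
```

For 1 d.f. the tabulated value at the 5% level is 3.84, so in this made-up table the hypothesis of independence would be rejected at that level.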
Example: A movie producer is bringing out a new movie. In order to map out her advertising, she wants to determine whether the movie will appeal most to a particular age group or whether it will appeal equally to all age groups. The producer takes a random sample of persons attending a preview show of the new movie and obtains the results in the table below. Use the Chi-square (χ2) test to arrive at a conclusion (α = 0.05).

                      Age groups (in years)
Persons               Under 20   20-39   40-59   60 & over   Total
Liked the movie       320        80      110     200         710
Disliked the movie    50         15      70      60          195
Indifferent           30         5       20      40          95
Total                 400        100     200     300         1,000

Solution: The two attributes being considered here are the age groups of the people and how well they liked the new movie. Our concern is to determine whether the two attributes are independent or not.

Null hypothesis ($H_0$): Liking for the movie is independent of age group (i.e. the movie appeals in the same way to different age groups).
Alternative hypothesis ($H_a$): Liking for the movie depends on age group (i.e. the movie appeals differently across age groups).

As explained earlier, to calculate the expected value in the cell of row 1, column 1, we divide the product of the row 1 total and the column 1 total by the grand total N, i.e.
$E_{ij} = \dfrac{(A_i)(B_j)}{N}$
Therefore,
$E_{11} = \dfrac{710 \times 400}{1000} = 284$; $E_{12} = \dfrac{710 \times 100}{1000} = 71$; $E_{13} = \dfrac{710 \times 200}{1000} = 142$; $E_{14} = \dfrac{710 \times 300}{1000} = 213$
$E_{21} = \dfrac{195 \times 400}{1000} = 78$; $E_{22} = \dfrac{195 \times 100}{1000} = 19.5$; $E_{23} = \dfrac{195 \times 200}{1000} = 39$; $E_{24} = \dfrac{195 \times 300}{1000} = 58.5$
$E_{31} = \dfrac{95 \times 400}{1000} = 38$; $E_{32} = \dfrac{95 \times 100}{1000} = 9.5$; $E_{33} = \dfrac{95 \times 200}{1000} = 19$; $E_{34} = \dfrac{95 \times 300}{1000} = 28.5$

We can get a table of expected values from the above computations.

Table of expected values

               Under 20   20-39   40-59   60 & above
Like           284        71      142     213
Dislike        78         19.5    39      58.5
Indifferent    38         9.5     19      28.5

$\chi^2 = \sum_{i} \sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where $O_{ij}$ are the observed frequencies and $E_{ij}$ are the expected values.
$\chi^2_{calculated} = 4.56 + 1.14 + 7.21 + 0.79 + 10.05 + 1.04 + 24.64 + 0.04 + 1.68 + 2.13 + 0.05 + 4.64 = 57.97$

Recall that the d.f. is (number of rows minus one) × (number of columns minus one) = (3 - 1)(4 - 1) = 6.
$\chi^2_{0.05}$ with (r - 1)(s - 1) = 6 d.f. = 12.59 (critical value)

Decision: Since the calculated χ2 value is greater than the table (critical) value, we reject the null hypothesis and accept the alternative.

Conclusion: It can be concluded that the movie appealed differently to different age groups (i.e. liking for the movie depends on age).

4.0 CONCLUSION

In conclusion, chi-square analysis has very wide applications, which include the test of independence of attributes, the test of goodness of fit, the test of equality of population proportions, and the test of whether a population has a specified variance, among others. This powerful statistical tool is useful in business and economic decision making.

5.0 SUMMARY

In this unit, we have examined the concept of chi-square and its scope. We also looked at its methodology and applications. It has been emphasized that it is not just an ordinary statistical exercise but a practical tool for solving day-to-day business and economic problems.

6.0 TUTOR-MARKED ASSIGNMENT

1. A sample of students randomly selected from private high schools and a sample of students randomly selected from public high schools were given standardized tests with the following results:

                  Test Scores
                  0-275   276-350   351-425   426-500   Total
Private School    6       14        17        9         46
Public School     30      32        17        3         82
Total             36      46        34        12        128

Test $H_0$: The distribution of test scores is the same for private and public high school students, at α = 0.05.

2. A manufacturing company has just introduced a new product into the market.
In order to assess consumers' acceptance of the product and make efforts towards improving its quality, a survey was carried out among four ethnic groups in Nigeria and the following results were obtained:

                      Ethnic groups
Persons               Igbo   Yoruba   Hausa   Ijaw   Total
Accept the product    48     76       56      70     250
Do not accept         57     44       74      30     205
Total                 105    120      130     100    455

Using the above information, does the acceptability of the product depend on the ethnic group of the respondents? (Take α = 1%)

7.0 REFERENCES/FURTHER READING

OKOJIE, DANIEL E. NOUN Text Book, ECO 203: Statistics for Economists.
Spiegel, M. R., Stephens, L. J. (2008). Statistics (4th ed.). New York: McGraw Hill.
Gupta, S. C. (2011). Fundamentals of Statistics (6th Rev. & Enlarged ed.). Mumbai, India: Himalayan Publishing House.
Swift, L. (1997). Mathematics and Statistics for Business, Management and Finance. London, UK: Macmillan.

UNIT 4: ANALYSIS OF VARIANCE (ANOVA)

CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Assumption for ANOVA test
3.2 The one-way classification
3.3 Bernoulli Distribution
4.0 Conclusion
5.0 Summary
6.0 Assignment
7.0 References/Further Reading

1.0 INTRODUCTION

In day-to-day business management and in the sciences, instances may arise where we need to compare means. If there are only two means, e.g. the average recharge card expenditure of male and female students in a faculty of a university, the typical t-test for the difference of two means is handy for solving this type of problem. However, in real life we are often confronted with situations where we need to compare more than two means at the same time. The typical t-test for the difference of two means cannot handle this type of problem directly; the obvious alternative is to compare two means at a time using the t-test treated earlier.
This process is very time consuming, since as few as 4 sample means would require $^4C_2 = 6$ different tests to compare the 6 possible pairs of sample means. Therefore, there must be a procedure that can compare all means simultaneously. One such procedure is the analysis of variance (ANOVA). For instance, we may be interested in the mean telephone recharge expenditures of various groups of students in the university, such as students in the faculties of Science, Arts, Social Sciences, Medicine and Engineering. We may wish to test whether the average monthly expenditures of students in the five faculties are equal, or whether the samples are drawn from the same normal population. The answer to this problem is provided by the technique of analysis of variance. It should be noted that the basic purpose of the analysis of variance is to test the homogeneity of several means.

The term Analysis of Variance was introduced by Prof. R. A. Fisher in the 1920s to deal with problems in the analysis of agronomical data. Variation is inherent in nature. The total variation in any set of numerical data is due to a number of causes, which may be classified as:
(i) assignable causes, and
(ii) chance causes.
The variation due to assignable causes can be detected and measured, whereas the variation due to chance is beyond human control and cannot be traced separately.

2.0 OBJECTIVE

The main objective of this unit is to teach students the theory and application of Analysis of Variance (ANOVA). It is hoped that, after taking this unit, students will be able to apply ANOVA in solving business and economic problems, especially those that concern the comparison of several means.

3.0 MAIN CONTENT

3.1 Assumptions for ANOVA test

The ANOVA test is based on the test statistic F (or variance ratio). For the validity of the F-test in ANOVA, the following assumptions are made:
(i) The observations are independent.
(ii) The parent populations from which the observations are taken are normal.
(iii) The various treatment and environmental effects are additive in nature.

ANOVA as a tool has different dimensions and complexities. It can be (a) a one-way classification or (b) a two-way classification. Only the one-way ANOVA is dealt with in this course material.

Note
(i) The ANOVA technique enables us to compare several population means simultaneously, and thus saves a great deal of time and money compared with the several experiments required for comparing two population means at a time.
(ii) The origin of the ANOVA technique lies in agricultural experiments, and as such its language is loaded with terms such as treatments, blocks and plots. However, the technique is so versatile that it finds applications in almost all types of design of experiments in diverse fields such as industry, education, psychology, business and economics.
(iii) It should be clearly understood that the ANOVA technique is not designed to test the equality of several population variances. Rather, its objective is to test the equality of several population means, or the homogeneity of several independent sample means.
(iv) In addition to testing the homogeneity of several sample means, the ANOVA technique is now frequently applied in testing the linearity of a fitted regression line or the significance of the correlation ratio.

3.2 The one-way classification

Assume that n sample observations of a random variable X are divided into k classes on the basis of some criterion or factor of classification. Let the ith class consist of $n_i$ observations and let:

$X_{ij}$ = the jth member of the ith class; $j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$

$$n = n_1 + n_2 + \cdots + n_k = \sum_{i=1}^{k} n_i$$

The n sample observations can be expressed as in the table below:

| Class | Sample observations | Total | Mean |
|---|---|---|---|
| 1 | $X_{11}, X_{12}, \ldots, X_{1n_1}$ | $T_1$ | $\bar{X}_1$ |
| 2 | $X_{21}, X_{22}, \ldots, X_{2n_2}$ | $T_2$ | $\bar{X}_2$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| i | $X_{i1}, X_{i2}, \ldots, X_{in_i}$ | $T_i$ | $\bar{X}_i$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| k | $X_{k1}, X_{k2}, \ldots, X_{kn_k}$ | $T_k$ | $\bar{X}_k$ |

Here $T_i = \sum_{j=1}^{n_i} X_{ij}$ is the total of the ith class and $\bar{X}_i = T_i / n_i$ is its mean.

Such a scheme of classification according to a single criterion is called a one-way classification, and its analysis of variance is known as one-way analysis of variance. The total variation in the observations $X_{ij}$ can be split into the following two components:
(i) The variation between the classes, or the variation due to the different bases of classification (commonly known as treatments in the pure sciences, medicine and agriculture). This type of variation is due to assignable causes, which can be detected and controlled by human endeavour.
(ii) The variation within the classes, i.e. the inherent variation of the random variable within the observations of a class. This type of variation is due to chance causes, which are beyond human control.

The main objective of the analysis of variance technique is to examine whether there is a significant difference between the class means in view of the inherent variability within the separate classes.

Steps for testing a hypothesis about more than two means (ANOVA)

Here we adopt the rejection region method. The steps are as follows:

Step 1: Set up the hypotheses:
Null hypothesis $H_0$: $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$, i.e. all means are equal.
Alternative hypothesis $H_1$: At least two means are different.
Step 2: Compute the mean and standard deviation of each of the k classes; the class mean is given by the formula:

$$\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij} = \frac{T_i}{n_i}$$

Also, compute the mean of all the observations in the k classes by the formula:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} = \frac{1}{n} \sum_{i=1}^{k} n_i \bar{X}_i$$

Step 3: Obtain the Between Classes Sum of Squares (BSS) by the formula:

$$\text{BSS} = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$$

Step 4: Obtain the Between Classes Mean Sum of Squares (MBSS):

$$\text{MBSS} = \frac{\text{BSS}}{k-1}$$

Step 5: Obtain the Within Classes Sum of Squares (WSS) by the formula:

$$\text{WSS} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$$

Step 6: Obtain the Within Classes Mean Sum of Squares (MWSS):

$$\text{MWSS} = \frac{\text{WSS}}{n-k}$$

Step 7: Obtain the test statistic F, or Variance Ratio (V.R.):

$$F = \frac{\text{MBSS}}{\text{MWSS}}$$

which follows the F-distribution with $(v_1 = k-1,\ v_2 = n-k)$ d.f. That is, there are two degrees-of-freedom values: the first is the number of classes (treatments) less one, and the second is the number of observations less the number of classes.

Step 8: Find the critical value of the test statistic F for these degrees of freedom and at the desired level of significance in any standard statistical table. If the computed value of F is greater than the critical (tabulated) value, reject $H_0$; otherwise, do not reject $H_0$ (it may be regarded as true).

Step 9: Write the conclusion in simple language.

Example 1: To test the hypothesis that the average number of days a patient is kept in the three local hospitals A, B and C is the same, a random check on the number of days that seven patients stayed in each hospital reveals the following:

Hospital A: 8 5 9 2 7 8 2
Hospital B: 4 3 8 7 7 1 5
Hospital C: 1 4 9 8 7 2 3

Test the hypothesis at the 5 percent level of significance.
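Before working through the example by hand, Steps 2 to 7 can be sketched in Python. This is a minimal illustration and not part of the course text: the function name `one_way_anova` and the small data set `demo` are made up for the purpose, and only the standard formulas for BSS, WSS and F are assumed.

```python
def one_way_anova(classes):
    """Return (BSS, WSS, F) for a list of classes, each a list of numbers.

    Hypothetical helper written for illustration only.
    """
    k = len(classes)                          # number of classes
    n = sum(len(c) for c in classes)          # total number of observations

    # Step 2: class means and the grand mean
    class_means = [sum(c) / len(c) for c in classes]
    grand_mean = sum(sum(c) for c in classes) / n

    # Steps 3 and 4: between-classes sum of squares and its mean square
    bss = sum(len(c) * (m - grand_mean) ** 2
              for c, m in zip(classes, class_means))
    mbss = bss / (k - 1)

    # Steps 5 and 6: within-classes sum of squares and its mean square
    wss = sum((x - m) ** 2
              for c, m in zip(classes, class_means)
              for x in c)
    mwss = wss / (n - k)

    # Step 7: variance ratio with (k - 1, n - k) degrees of freedom
    return bss, wss, mbss / mwss


# Made-up data: three classes of three observations each
demo = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
bss, wss, f_ratio = one_way_anova(demo)
print(f"BSS = {bss:.2f}, WSS = {wss:.2f}, F = {f_ratio:.2f}")
```

For the made-up data the class means are 2, 3 and 6, so BSS = 26, WSS = 6 and F = 13 on (2, 6) degrees of freedom. Step 8 still requires a critical value from an F table.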
Solution: Let $X_{1j}$, $X_{2j}$, $X_{3j}$ denote the number of days the jth patient stays in hospitals A, B and C respectively.

Calculations for the various sums of squares

| j | $X_{1j}$ | $X_{2j}$ | $X_{3j}$ | $(X_{1j} - \bar{X}_1)^2$ | $(X_{2j} - \bar{X}_2)^2$ | $(X_{3j} - \bar{X}_3)^2$ |
|---|---|---|---|---|---|---|
| 1 | 8 | 4 | 1 | 4.5796 | 1 | 14.8996 |
| 2 | 5 | 3 | 4 | 0.7396 | 4 | 0.7396 |
| 3 | 9 | 8 | 9 | 9.8596 | 9 | 17.1396 |
| 4 | 2 | 7 | 8 | 14.8996 | 4 | 9.8596 |
| 5 | 7 | 7 | 7 | 1.2996 | 4 | 4.5796 |
| 6 | 8 | 1 | 2 | 4.5796 | 16 | 8.1796 |
| 7 | 2 | 5 | 3 | 14.8996 | 0 | 3.4596 |
| Total | $T_1 = 41$ | $T_2 = 35$ | $T_3 = 34$ | 50.8572 | 38 | 58.8572 |

$$\bar{X}_1 = \frac{41}{7} = 5.86; \quad \bar{X}_2 = \frac{35}{7} = 5; \quad \bar{X}_3 = \frac{34}{7} = 4.86; \quad \bar{X} = \frac{41 + 35 + 34}{21} = \frac{110}{21} = 5.24$$

Within Samples Sum of Squares: To find the variation within the samples, we compute the sum of the squares of the deviations of the observations in each sample from the mean of that sample (see the table above).

$$\text{Sum of Squares within Samples} = \sum_{i}\sum_{j} (X_{ij} - \bar{X}_i)^2 = 50.8572 + 38 + 58.8572 = 147.7144 \approx 147.71$$

Between Samples Sum of Squares: To obtain the variation between samples, we compute the sum of the squares of the deviations of the various sample means from the overall (grand) mean.

$$(\bar{X}_1 - \bar{X})^2 = (5.86 - 5.24)^2 = 0.3844; \quad (\bar{X}_2 - \bar{X})^2 = (5 - 5.24)^2 = 0.0576; \quad (\bar{X}_3 - \bar{X})^2 = (4.86 - 5.24)^2 = 0.1444$$

Sum of Squares between Samples (hospitals):

$$\sum_{i} n_i (\bar{X}_i - \bar{X})^2 = 7(0.3844) + 7(0.0576) + 7(0.1444) = 2.6908 + 0.4032 + 1.0108 = 4.1048 \approx 4.10$$

Total Sum of Squares: The total variation in the sample data is obtained by calculating the sum of the squares of the deviations of each sample observation from the grand mean, for all the samples, as in the table below:

| j | $X_{1j}$ | $(X_{1j} - \bar{X})^2$ | $X_{2j}$ | $(X_{2j} - \bar{X})^2$ | $X_{3j}$ | $(X_{3j} - \bar{X})^2$ |
|---|---|---|---|---|---|---|
| 1 | 8 | 7.6176 | 4 | 1.5376 | 1 | 17.9776 |
| 2 | 5 | 0.0576 | 3 | 5.0176 | 4 | 1.5376 |
| 3 | 9 | 14.1376 | 8 | 7.6176 | 9 | 14.1376 |
| 4 | 2 | 10.4976 | 7 | 3.0976 | 8 | 7.6176 |
| 5 | 7 | 3.0976 | 7 | 3.0976 | 7 | 3.0976 |
| 6 | 8 | 7.6176 | 1 | 17.9776 | 2 | 10.4976 |
| 7 | 2 | 10.4976 | 5 | 0.0576 | 3 | 5.0176 |
| Total | 41 | 53.5232 | 35 | 38.4032 | 34 | 59.8832 |

$$\text{Total Sum of Squares (TSS)} = \sum_{i}\sum_{j} (X_{ij} - \bar{X})^2 = 53.5232 + 38.4032 + 59.8832 = 151.8096 \approx 151.81$$

Note: Sum of Squares within Samples + Sum of Squares between Samples = 147.71 + 4.10 = 151.81 = Total Sum of Squares.

Ordinarily, there is no need to compute the sum of squares within the samples (i.e. the error sum of squares) directly, since the calculation is quite tedious and time consuming.
In practice, we find the total sum of squares and the between samples sum of squares, which are relatively simple to calculate. The within samples sum of squares is then obtained by subtracting the Between Samples Sum of Squares from the Total Sum of Squares:

$$\text{W.S.S.S} = \text{T.S.S} - \text{B.S.S.S}$$

Therefore, Within Samples (Error) Sum of Squares = 151.8096 − 4.1048 = 147.7048, which agrees with the directly computed 147.71 up to rounding.

Degrees of freedom for:
Between classes (hospitals) Sum of Squares = k − 1 = 3 − 1 = 2
Total Sum of Squares = n − 1 = 21 − 1 = 20
Within classes (or Error) Sum of Squares = n − k = 21 − 3 = 18

ANOVA TABLE

| Source of variation (1) | d.f. (2) | Sum of Squares (S.S.) (3) | Mean Sum of Squares (4) = (3) ÷ (2) | Variance Ratio (F) |
|---|---|---|---|---|
| Between samples (hospitals) | 3 − 1 = 2 | 4.10 | 4.10 ÷ 2 = 2.05 | 2.05 ÷ 8.21 = 0.25 |
| Within samples (error) | 21 − 3 = 18 | 147.71 | 147.71 ÷ 18 = 8.21 | |
| Total | 21 − 1 = 20 | 151.81 | | |

Critical Value: The tabulated (critical) value of F for $(v_1 = 2,\ v_2 = 18)$ d.f. at the 5% level of significance is 3.55.

Since the calculated F = 0.25 is less than the critical value 3.55, it is not significant. Hence we fail to reject $H_0$. Indeed, in cases like this, where the MSS between classes is less than the MSS within classes, F is less than 1, so we need not calculate it and may conclude at once that the means $\bar{X}_1$, $\bar{X}_2$ and $\bar{X}_3$ do not differ significantly. Hence, $H_0$ may be regarded as true.

Conclusion: $H_0$: $\mu_1 = \mu_2 = \mu_3$ may be regarded as true, and we may conclude that there is no significant difference in the average stay at the three hospitals.

Critical Difference: If the classes (called treatments in the pure sciences) show a significant effect, then we would be interested in finding out which pair(s) of treatments differ significantly. Instead of calculating Student's t for different pairs of class (treatment) means, we calculate the Least Significant Difference (LSD) at the given level of significance. This LSD is also known as the Critical Difference (CD).
The LSD between any two class (treatment) means, say $\bar{X}_i$ and $\bar{X}_j$, at level of significance α is given by:

$$\text{LSD}(\bar{X}_i - \bar{X}_j) = [\text{critical value of } t \text{ at level of significance } \alpha \text{ and error d.f.}] \times \text{S.E.}(\bar{X}_i - \bar{X}_j)$$

$$= t_{n-k}(\alpha/2) \times \sqrt{\text{MSSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Note: S.E. means Standard Error, so $\text{S.E.}(\bar{X}_i - \bar{X}_j)$ is the standard error of the difference between the two means being considered, and MSSE is the mean sum of squares due to error.

If the difference between any two class (treatment) means is greater than the LSD or CD, it is said to be significant.

Another method for the computation of the various sums of squares

Step 1: Compute the grand total of all the observations:

$$G = \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}$$

Step 2: Compute the Correction Factor (CF):

$$\text{CF} = \frac{G^2}{n}$$
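The alternative computation and the LSD can be sketched in Python using the data of Example 1. This is a hedged illustration rather than the text's own procedure: the steps after the correction factor are not shown above, so the sketch assumes the usual identities TSS = ΣΣX² − CF and BSS = Σ(Tᵢ²/nᵢ) − CF, with G taken as the grand total and CF = G²/n. The names `shortcut_anova` and `lsd` are hypothetical, and the critical value t = 2.101 (18 d.f., 5% two-tailed) is read from a standard t table.

```python
from math import sqrt

# Days stayed by seven patients in each hospital (Example 1)
hospital_days = {
    "A": [8, 5, 9, 2, 7, 8, 2],
    "B": [4, 3, 8, 7, 7, 1, 5],
    "C": [1, 4, 9, 8, 7, 2, 3],
}


def shortcut_anova(groups):
    """Sums of squares via the correction factor (assumed identities)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_total = sum(sum(g) for g in groups)           # G
    cf = grand_total ** 2 / n                           # CF = G^2 / n
    tss = sum(x * x for g in groups for x in g) - cf    # raw S.S. minus CF
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    wss = tss - bss                                     # error S.S. by subtraction
    msse = wss / (n - k)                                # mean S.S. due to error
    f_ratio = (bss / (k - 1)) / msse
    return {"G": grand_total, "CF": cf, "TSS": tss, "BSS": bss,
            "WSS": wss, "MSSE": msse, "F": f_ratio}


def lsd(msse, n_i, n_j, t_crit):
    """Least significant difference between two class means."""
    return t_crit * sqrt(msse * (1 / n_i + 1 / n_j))


result = shortcut_anova(list(hospital_days.values()))
for name, value in result.items():
    print(f"{name} = {value:.4f}")

# t = 2.101 is an assumed table value for 18 d.f. at the 5% level (two-tailed)
print(f"LSD = {lsd(result['MSSE'], 7, 7, 2.101):.2f}")
```

Because the sketch keeps the means unrounded, its sums of squares differ slightly from the hand calculation, which rounds the means to two decimal places (BSS is about 4.10 and WSS about 147.71 either way), and F is still about 0.25. The LSD is shown only to illustrate the formula: since F is not significant here, no pairwise comparison is called for.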
