
GECO-03/OSOU

GECO-03: BUSINESS STATISTICS

Brief Contents

Block 1: Statistical Data and Descriptive Statistics
Unit 1: Introduction to Statistics
Unit 2: Classification of Data
Unit 3: Measures of Central Tendency: Mathematical Averages
Unit 4: Measures of Central Tendency: Positional Averages

Block 2: Measures of Variation
Unit 5: Measures of Dispersion (Range, Quartile Deviation, Mean Deviation)
Unit 6: Standard Deviation and its Properties
Unit 7: Skewness
Unit 8: Kurtosis

Block 3: Simple Correlation and Regression Analysis
Unit 9: Correlation Analysis: Meaning and Types
Unit 10: Methods of Measurement of Correlation
Unit 11: Regression Analysis: Concept and Principles
Unit 12: Regression Coefficients

Block 4: Index Numbers and Time Series Analysis
Unit 13: Components of Time Series
Unit 14: Methods of Measurement of Trend
Unit 15: Construction of Index Number
Unit 16: Test for Index Number

ODISHA STATE OPEN UNIVERSITY, SAMBALPUR

Programme Name: Bachelor of Commerce (B. Com)
Programme Code: BCO
Course Name: Business Statistics
Course Code: GECO-03
Semester: III
Credit: 6
Block No. 1 to 4
Unit No. 1 to 16
Page No. 4 to 216

This study material has been developed by Odisha State Open University as per the State Model Syllabus for Under Graduate Course in Commerce (Bachelor of Commerce Examinations) under Choice Based Credit System (CBCS).

COURSE WRITERS
Mr. Prahallad Behera, Assistant Professor, Commerce, Odisha State Open University
Mr. Loknath Majhi, Assistant Professor, Commerce, Odisha State Open University

In-House Source: OSOU (Compiled from BBA-02)
Link: 1. https://drive.google.com/file/d/1aAWMUYfTk4YQjiT8Yz42ZiMADpqIYblx/view

MATERIAL PRODUCTION
Registrar
Odisha State Open University, Sambalpur

(cc) OSOU, 2022. Business Statistics is made available under a Creative Commons Attribution-ShareAlike 4.0 licence: http://creativecommons.org/licences/by-sa/4.0

GECO-03: BUSINESS STATISTICS

Contents
BLOCK 1: STATISTICAL DATA AND DESCRIPTIVE STATISTICS

Unit 1: Introduction to Statistics: Introduction, Meaning of Statistics, Types of Statistics, Functions of Statistics, Importance of Statistics, Limitations of Statistics, Distrust of Statistics

Unit 2: Classification of Data: Introduction, Nature of Data, Categories of Data, Classification, Tabulation

Unit 3: Measures of Central Tendency: Mathematical Averages: Introduction, Characteristics of an Ideal Measure, Arithmetic Mean or Simple Mean, Harmonic Mean, Geometric Mean

Unit 4: Measures of Central Tendency: Positional Averages: Introduction, Median, Mode

BLOCK 2: MEASURES OF VARIATION

Unit 5: Measures of Dispersion (Range, Quartile Deviation, Mean Deviation): Introduction, Significance of Measuring Dispersion, Properties of a Good Measure of Dispersion, Absolute and Relative Measures of Dispersion, Methods of Measuring Dispersion, Range, Quartile Deviation, Mean Deviation

Unit 6: Standard Deviation and its Properties: Introduction, Computation of Standard Deviation, Applying the Standard Deviation Method, Variance and Coefficient of Variation

Unit 7: Skewness: Introduction, Coefficient of Skewness, Karl Pearson’s Coefficient of Skewness, β and γ Coefficients

Unit 8: Kurtosis: Introduction, Types of Kurtosis, Calculation of Kurtosis

BLOCK 3: SIMPLE CORRELATION AND REGRESSION

Unit 9: Correlation Analysis: Meaning and Types: Introduction, Meaning of Correlation, Significance of Correlation, Correlation Coefficient and Causation, Various Kinds of Correlation

Unit 10: Methods of Measurement of Correlation: Introduction, Scatter Diagram Method, Importance of Scatter Diagram, Interpretation of Scatter Diagram, Types of Scatter Diagram, Merits of Scatter Diagram, Demerits of Scatter Diagram, Scatter Diagram Procedure, Pearson’s Correlation Method, Properties of R, Merits of Pearson’s Correlation Method, Demerits of Pearson’s Correlation Method, Interpretation of Pearson Correlation Coefficient, Numerical Problems, Significance of Pearson’s Correlation (r), Standard Error of Correlation Coefficient (r), Probable Error of Correlation Coefficient (r), Coefficient of Determination (r²), Rank Correlation Method, Problems in Rank Correlation Coefficient, Interpretation of Rank Correlation Coefficient (R), Merits of Rank Correlation Coefficient, Demerits of Rank Correlation Coefficient, Numerical Problems, Limitations of Correlation

Unit 11: Regression Analysis: Concepts and Principles: Introduction, Meaning of Regression, Significance of Regression, Regression vs. Correlation

Unit 12: Regression Coefficients: Introduction, Regression Lines, Properties of Regression Coefficient, Numerical Problems, Standard Error of Estimate

BLOCK 4: INDEX NUMBER AND TIME SERIES ANALYSIS

Unit 13: Components of Time Series: Introduction, Significance of Time Series Analysis, Components of Time Series, Mathematical Models for Time Series, Adjustments in Time Series Data

Unit 14: Methods of Measurement of Trend: Introduction, Methods Used for Measuring Trends, Graphic Method, Semi-Average Method, Moving Average Method, Least-Square Method, Fitting of Straight-Line Trend, Fitting of Parabolic Trend, Fitting of Logarithmic Trend

Unit 15: Construction of Index Number: Introduction, Significance of Index Number, Classification of Index Number, Calculation of Index Number, Limitations of Index Number

Unit 16: Test for Index Numbers: Methods of Constructing Index Numbers, Construction of General Index Number from Group Indices, Consumer Price Index or Cost of Living Index,
Test of Adequacy of Index Numbers, Base Shifting, Splicing Index Numbers, Deflating Index Numbers

BLOCK 1: STATISTICAL DATA AND DESCRIPTIVE STATISTICS

Unit 1: Introduction to Statistics
Unit 2: Classification of Data
Unit 3: Measures of Central Tendency: Mathematical Averages
Unit 4: Measures of Central Tendency: Positional Averages

UNIT-1: INTRODUCTION TO STATISTICS

Structure:
1.1 Introduction
1.2 Meaning of Statistics
1.3 Types of Statistics
1.4 Functions of Statistics
1.5 Importance of Statistics
1.6 Limitations of Statistics
1.7 Distrust of Statistics
1.8 Let Us Sum Up
1.9 Model Questions

Learning Objectives
After studying this unit, you will be able to:
• explain the meaning of Statistics
• know the types of Statistics
• explain the importance of Statistics
• explain the limitations of Statistics
• know the reasons for distrust of statistics

1.1 INTRODUCTION

Statistics is an indispensable tool for an economist, helping him or her to understand an economic problem. Using its various methods, an effort is made to find the causes behind the problem with the help of qualitative and quantitative facts. Once the causes of the problem are identified, it is easier to formulate policies to tackle it.

But there is more to Statistics. When economic facts are expressed in statistical terms, they become exact, and exact facts are more convincing than vague statements. Statistics also helps in condensing mass data into a few numerical measures (such as the mean, variance etc., about which you will learn later). These numerical measures help to summarise data. Quite often, Statistics is used in finding relationships between different economic factors. Sometimes, the formulation of plans and policies requires knowledge of future trends. One might use statistical tools to predict consumption based on consumption data of past years, or of recent years obtained by surveys.
Thus, statistical methods help formulate appropriate economic policies that solve economic problems.

1.2 MEANING OF STATISTICS

Statistics is fundamentally a branch of applied mathematics that developed from the application of mathematical tools, including calculus and linear algebra, to probability theory. Statistics is used in virtually all scientific disciplines, such as the physical and social sciences, as well as in business, the humanities, government, and manufacturing.

In practice, statistics rests on the idea that we can learn about the properties of large sets of objects or events (a population) by studying the characteristics of a smaller number of similar objects or events (a sample). Because in many cases gathering comprehensive data about an entire population is too costly, difficult, or simply impossible, statistics starts with a sample that can conveniently or affordably be observed.

Two types of statistical methods are used in analyzing data: descriptive statistics and inferential statistics. Statisticians measure and gather data about the individuals or elements of a sample, then analyze these data to generate descriptive statistics. They can then use these observed characteristics of the sample data, which are properly called "statistics," to make inferences or educated guesses about the unmeasured characteristics of the broader population, known as the parameters.

Descriptive Statistics

Descriptive statistics mostly focus on the central tendency, variability, and distribution of sample data. Central tendency refers to an estimate of the characteristics of a typical element of a sample or population, and includes descriptive statistics such as the mean, median, and mode. Variability refers to a set of statistics that show how much difference there is among the elements of a sample or population along the characteristics measured, and includes metrics such as range, variance, and standard deviation.
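The measures of central tendency and variability named above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the five daily sales figures are hypothetical values chosen only for illustration and do not come from this study material.

```python
import statistics

# Hypothetical sample: units sold on five days (illustrative values only)
daily_sales = [12, 15, 15, 18, 20]

# Measures of central tendency
mean_sales = statistics.mean(daily_sales)      # arithmetic mean
median_sales = statistics.median(daily_sales)  # middle value of the sorted data
mode_sales = statistics.mode(daily_sales)      # most frequent value

# Measures of variability
sales_range = max(daily_sales) - min(daily_sales)   # largest minus smallest value
sample_variance = statistics.variance(daily_sales)  # sample variance (divides by n - 1)
sample_sd = statistics.stdev(daily_sales)           # square root of the sample variance

print("Mean:", mean_sales)
print("Median:", median_sales)
print("Mode:", mode_sales)
print("Range:", sales_range)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", round(sample_sd, 3))
```

Here `variance` and `stdev` treat the figures as a sample and divide by n - 1; `pvariance` and `pstdev` from the same module would be the choice if the five figures were the entire population.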
Descriptive statistics can also describe differences between observed characteristics of the elements of a data set. They help us understand the collective properties of the elements of a data sample and form the basis for testing hypotheses and making predictions using inferential statistics.

Inferential Statistics

Inferential statistics are tools that statisticians use to draw conclusions about the characteristics of a population from the characteristics of a sample, and to decide how certain they can be of the reliability of those conclusions. Based on the sample size and distribution, statisticians can calculate the probability that statistics, which measure the central tendency, variability, distribution, and relationships between characteristics within a data sample, provide an accurate picture of the corresponding parameters of the whole population from which the sample is drawn.

Inferential statistics are used to make generalizations about large groups, such as estimating average demand for a product by surveying a sample of consumers' buying habits, or to attempt to predict future events, such as projecting the future return of a security or asset class based on returns in a sample period.

Regression analysis is a widely used technique of statistical inference used to determine the strength and nature of the relationship (i.e., the correlation) between a dependent variable and one or more explanatory (independent) variables. The output of a regression model is often analyzed for statistical significance, which refers to the claim that a result from findings generated by testing or experimentation is not likely to have occurred randomly or by chance but is likely to be attributable to a specific cause elucidated by the data.

Definition of Statistics

A.L. Bowley gave several definitions of Statistics: “Statistics may be called the science of counting”. This definition emphasizes the enumeration aspect only.
In another definition he says, “Statistics may rightly be called the science of averages”. At another place Statistics is defined as “the science of measurement of social organism regarded as a whole in all its manifestations”.

According to Seligman, “Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry”.

Croxton and Cowden defined statistics as “the collection, presentation, analysis and interpretation of numerical data”.

Among all the definitions, the one given by Croxton and Cowden is considered the most appropriate, as it covers all aspects and fields of statistics. These aspects are given below:

a) Collection of Data: Once the nature of the study is decided, it becomes essential to collect information in the form of data about the issues under study. Therefore, the collection of data is the first basic step. Data may be collected from a primary source, a secondary source, or both, depending upon the objectives of the investigation.

b) Classification and Presentation: Once data are collected, the researcher has to arrange them in a format from which conclusions can be drawn. The arrangement of data in groups according to some similarities is known as classification.

c) Tabulation: Tabulation is the process of presenting the classified data in the form of a table. A tabular presentation makes data more intelligible and fit for further statistical analysis. Classified and tabulated data can be presented in diagrams and graphs to facilitate the understanding of various trends as well as the comparison of various situations.

d) Analysis of Data: This is the most important step in any statistical enquiry. Statistical analysis is carried out to process the observed data and transform them in such a manner as to make them suitable for decision making.
e) Interpretation of Data: After analysing the data, the researcher gets information, partly or wholly, about the population. Explaining such information makes it more useful in real life. The quality of interpretation depends largely on the experience and insight of the researcher.

This definition makes it quite clear that, as numerical statements of facts, statistics should possess the following characteristics.

Statistics are aggregates of facts. A single age of 20 or 30 years is not statistics; a series of ages is. Similarly, a single figure relating to production, sales, birth, death etc. would not be statistics, although aggregates of such figures would be statistics because of their comparability and relationship.

Statistics are affected to a marked extent by a multiplicity of causes. A number of causes affect statistics in a particular field of enquiry; for example, production statistics are affected by climate, soil, fertility, availability of raw materials and methods of quick transport.

Statistics are numerically expressed, enumerated or estimated. The subject of statistics is concerned essentially with facts expressed in numerical form, that is, with their quantitative details and not qualitative descriptions. Therefore, facts indicated by terms such as ‘good’ or ‘poor’ are not statistics unless a numerical equivalent is assigned to each expression. Also, figures may be either enumerated or, where actual enumeration is not possible or is very difficult, estimated.

Statistics are enumerated or estimated according to a reasonable standard of accuracy. The personal bias and prejudices of the enumerator should not enter into the counting or estimation of figures, otherwise conclusions from the figures would not be accurate. The figures should be counted or estimated according to reasonable standards of accuracy. Absolute accuracy is neither necessary nor sometimes possible in social sciences.
But whatever standard of accuracy is once adopted should be used throughout the process of collection or estimation.

Statistics should be collected in a systematic manner for a predetermined purpose. The statistical methods to be applied depend on the purpose of the enquiry, since figures are always collected with some purpose. If there is no predetermined purpose, all the efforts in collecting the figures may prove to be wasteful. The purpose of a series of ages of husbands and wives may be to find whether young husbands have young wives and old husbands have old wives.

Statistics should be capable of being placed in relation to each other. The collected figures should be comparable and well-connected in the same department of inquiry. Ages of husbands are to be compared only with the corresponding ages of wives and not with, say, heights of trees.

1.3 TYPES OF STATISTICS

Though various bases have been adopted to classify statistics, the following are the two major ways of classifying statistics: (i) on the basis of function and (ii) on the basis of distribution.

On the Basis of Functions:
As statistics has particular procedures to deal with its subject matter or data, three types of statistics have been described.

a. Descriptive statistics: The branch which deals with descriptions of obtained data is known as descriptive statistics. On the basis of these descriptions, a particular group or population is defined in terms of the corresponding characteristics. Descriptive statistics include classification, tabulation, and measures of central tendency and variability. These measures enable researchers to know the tendency of the data or scores, which makes the phenomena easier to describe.

b. Correlational statistics: In this type of statistics, the obtained data are examined for their inter-correlations. It includes various types of techniques to compute the correlations among data.
Correlational statistics also provide a description of the sample or population for further analysis to explore the significance of differences.

c. Inferential statistics: Inferential statistics deals with drawing conclusions about a large group of individuals (a population) on the basis of observations of a few participants from it, or about events which are yet to occur on the basis of past events. It provides tools to compute the probabilities of the future behaviour of the subjects.

On the Basis of Distribution of Data:
Parametric and nonparametric statistics are the two classifications on the basis of the distribution of data. Both concern a population or a sample. By population we mean the total number of items in a sphere. In general a population may be infinite, but in statistics it is often finite, like the number of students in a college. According to Kerlinger (1968), “the term population and universe mean all the members of any well-defined class of people, events or objects.” In a broad sense, a statistical population may have one of three kinds of properties: (a) containing a finite number of items which is knowable, (b) having a finite number of items which is unknowable, and (c) containing an infinite number of items.

a) Parametric statistics

Parametric statistics assume a normal distribution for the population under study. “Parametric statistics refers to those statistical techniques that have been developed on the assumption that the data are of a certain type. In particular the measure should be an interval scale and the scores should be drawn from a normal distribution”.

There are certain basic assumptions of parametric statistics. The very first characteristic of parametric statistics is that it proceeds only after confirming that the population is normally distributed.
A normally distributed population spreads symmetrically, with nearly all values falling within –3 SD to +3 SD of the mean, and is unimodal, with its mean, median, and mode coinciding. If the samples are from various populations, they are assumed to have equal variances. The samples are independent in their selection. Every item in the population has an equal chance of occurrence, and any item can be selected in the sample. This reflects the randomized nature of the sample, which is also a good tool for avoiding experimenter bias.

In view of the above assumptions, parametric statistics seem to be more reliable and authentic than nonparametric statistics. These statistics are more powerful in establishing the statistical significance of effects and differences among variables. It is more appropriate and reliable to use parametric statistics with large samples, as the results are more accurate. The data analysed under parametric statistics are usually on an interval scale.

However, along with many advantages, some disadvantages have also been noted for parametric statistics. They are bound by the rigid assumption of normal distribution, which narrows the scope of their usage. In the case of small samples, a normal distribution often cannot be attained, and parametric statistics then cannot be used. Further, computation in parametric statistics is lengthy and complex because of large samples and numerical calculations. The t-test, F-test and r-test are some of the major parametric statistics used for data analysis.

b) Nonparametric statistics

Nonparametric statistics are those statistics which are not based on the assumption of a normal distribution of the population. Therefore, they are also known as distribution-free statistics. They are not restricted to interval scale data or normally distributed data. Data that are not continuous are handled with these statistics.
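To illustrate how a distribution-free statistic works with ranks rather than raw scores, the sketch below computes Spearman's rank correlation in plain Python and compares it with Pearson's r on the same data. The data values and the helper names (`average_ranks`, `pearson`, `spearman`) are assumptions made for this illustration and are not part of the study material.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises steadily with x, but x contains one extreme value
x = [10, 20, 30, 40, 1000]
y = [1, 2, 3, 4, 5]

print("Pearson r:", round(pearson(x, y), 2))
print("Spearman rank correlation:", spearman(x, y))
```

Because the rank-based measure depends only on the ordering of the values, the single extreme value in `x` lowers Pearson's r (to about 0.72 here) while the rank correlation stays at 1.0.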
In samples where it is difficult to maintain the assumption of normal distribution, nonparametric statistics are used for analysis. Samples with a small number of items are treated with nonparametric statistics because of the absence of normal distribution. These statistics can be used even for nominal data, along with ordinal data. Some of the usual nonparametric statistics include chi-square, Spearman’s rank difference method of correlation, Kendall’s rank difference method, the Mann-Whitney U test, etc.

1.4 FUNCTIONS OF STATISTICS

Statistics can be defined as a branch of research which is concerned with the development and application of techniques for collecting, organising, presenting, analysing and interpreting data in such a manner that the reliability of conclusions may be evaluated in terms of probability statements.

Statistical methods and processes are useful for business development and, hence, are applied to enormous numerical facts with the understanding that “behind every figure, there’s a story”. Some key functions of statistics are as follows:

a) Condensation
b) Comparison
c) Forecast
d) Testing of Hypotheses
e) Preciseness
f) Expectation

a) Condensation
Statistics can be used to compress a large amount of data into small meaningful information; for example, aggregated sales forecasts, BSE indices, GDP growth rate, etc. It is almost impossible to get a complete idea of the profitability of a company by looking at the records of its income and expenditure. Financial ratios such as return on investment, earnings per share, profit margins, etc., however, can be easily remembered and thus can be used in quick decision making.

b) Comparison
Statistics facilitate comparing different quantities. For example, as of January 22, 2021, the price-to-earnings ratio of ITC was 19.54, whereas HUL was quoting a price-to-earnings ratio of 71, which suggests that HUL was overvalued by comparison.

c) Forecast
Statistics helps forecast by looking at trends of a variable.
It is essential for planning and decision-making. Predictions or forecasts based on intuition can be disastrous for any business.

d) Testing of Hypotheses
Hypotheses are statements about population parameters, based on knowledge from the literature, that a researcher would like to test for validity in the light of new information. Drawing inferences about the population using sample estimates involves an element of risk.

e) Preciseness
Statistics visualises and presents facts precisely in a quantitative form. Facts and information conveyed in quantitative terms are more convincing than qualitative statements. For example, ‘increase in profit margin is less in the year 2020 than in the year 2019’ does not convey a precise and complete piece of information. Statistics, on the other hand, summarises the information more precisely: ‘profit margin is 5% of the turnover in the year 2020 against 7% in the year 2019’.

f) Expectation
Statistics can act as the basic building block for framing clear plans and policies. For example, how much raw material is to be imported in a year, how much capacity is to be expanded, or how much manpower is to be recruited depends on the expected value of the outcome of our decisions taken under different situations.

1.5 IMPORTANCE OF STATISTICS

Data on a wide range of activities such as population, births and deaths were collected by the state for administrative purposes. However, in recent years, the scope has widened considerably to bring into its fold social and economic phenomena. The fact that statistical methods are universally applicable in the modern world is in itself enough to show how important the science of statistics is. Statistical methods are common ways of thinking and hence are used by all types of persons. Examples can be multiplied to show that human behaviour and statistical methods have much in common.
In fact, statistical methods are so closely connected with human actions and behaviour that practically all human activity can be explained by statistical methods. This shows how important and universal statistics is.

a) Statistics in Planning: Statistics is indispensable in planning, be it at the level of business, economics or government. The modern age is termed the ‘age of planning’, and almost all organisations in government, business or management are resorting to planning for efficient working and for formulating policy decisions. To achieve this end, statistical data relating to production, consumption, birth, death, investment and income are of paramount importance. Today efficient planning is a must for almost all countries, particularly the developing economies, for their economic development.

b) Statistics in Mathematics: Statistics is intimately related to and essentially dependent upon mathematics. The modern theory of Statistics has its foundations in the theory of probability, which in turn is a particular branch of the more advanced mathematical theory of Measure and Integration.

c) Statistics in Economics: Statistics and Economics are so intermixed with each other that it seems foolish to separate them. The development of modern statistical methods has led to an extensive use of statistics in Economics. All the important branches of Economics (consumption, production, exchange, distribution, public finance) use statistics for the purpose of comparison, presentation, interpretation, etc. Problems such as the spending of income on and by different sections of the people, the production of national wealth, the adjustment of demand and supply, and the effect of economic policies on the economy indicate the importance of statistics in the field of economics and in its different branches.
d) Statistics in Social Sciences: Every social phenomenon is affected to a marked extent by a multiplicity of factors which bring about variation in observations from time to time, place to place and object to object. Statistical tools of Regression and Correlation Analysis can be used to study and isolate the effect of each of these factors on the given observation. Sampling Techniques and Estimation Theory are very powerful and indispensable tools for conducting any social survey, pertaining to any stratum of society, and then analysing the results and drawing valid inferences. The most important application of statistics in sociology is in the field of Demography, for studying mortality (death rates), fertility (birth rates), marriages, population growth and so on.

e) Statistics in Trade: Business is full of uncertainties and risks. We have to forecast at every step. The future trend of the market can only be anticipated if we make use of statistics. Failure in anticipation will mean failure of business. Changes in demand, supply, habits, fashion etc. can be anticipated with the help of statistics. Statistics is of utmost significance in determining the prices of various products, determining the phases of boom and depression, etc. The use of statistics helps in the smooth running of the business and in reducing uncertainties, and thus contributes towards the success of business.

f) Statistics in Research Work: The job of a research worker is to present the results of his research before the community. The effect of a variable on a particular problem, under differing conditions, can be known by the research worker only if he makes use of statistical methods. Statistics are everywhere basic to research activities. To keep alive his research interests and research activities, the researcher is required to lean upon his knowledge and skills in statistical methods.
1.6 LIMITATIONS OF STATISTICS

Statistics has the following limitations:

Qualitative aspect ignored: Statistical methods do not study phenomena that cannot be expressed in quantitative terms, such as health, riches and intelligence. Such phenomena cannot be part of the study of statistics unless the qualitative data are first converted into quantitative data.

It does not deal with individual items: It is clear from the definition given by Prof. Horace Secrist, “By statistics we mean aggregates of facts…. and placed in relation to each other”, that statistics deals only with aggregates of facts or items and does not recognize any individual item. Thus, individual items such as the death of 6 persons in an accident, or the 85% result of a class of a school in a particular year, do not amount to statistics as they are not placed in a group of similar items. Statistics does not deal with individual items, however important they may be.

It does not depict the entire story of a phenomenon: When a phenomenon happens, it is due to many causes, but not all of these causes can be expressed in terms of data. So, we cannot always reach correct conclusions. The development of a group depends upon many social factors like parents’ economic condition, education, culture, region, administration by government etc., but not all of these factors can be captured in data.

It is liable to be misused: As W.I. King points out, “One of the shortcomings of statistics is that they do not bear on their face the label of their quality.” So, we need to check the data and the procedures by which conclusions are reached. The data may have been collected by inexperienced persons, or by persons who were dishonest or biased. Statistics is a delicate science and can easily be misused by an unscrupulous person. So, data must be used with caution; otherwise, results may prove to be disastrous.

Laws are not exact: Statistical laws are based on probability.
So, these results will not always be as good as those of scientific laws. On the basis of probability or interpolation, we can only estimate the production of paddy in 2008 but cannot claim that the estimate is 100% exact. Here only approximations are made.

Results are true only on average: As discussed above, the results are interpolated, for which time series, regression or probability can be used. These are not absolutely true. If the average marks of two sections of students in statistics are the same, it does not mean that all 50 students in section A have got the same marks as those in section B. There may be much variation between the two. So, we get results that hold only on average.

1.7 DISTRUST OF STATISTICS

Distrust of statistics means lack of confidence in statistical methods and statements. Because statistics suffers from various limitations, it becomes a thing of distrust. According to Yule and Kendall, “Statistical methods are most dangerous tools in the hands of an inexpert.” The following are the points that give rise to the distrust of statistics:

i. Figures may be incomplete, inaccurate and deliberately manipulated: The facts and figures studied in statistics are collected by human beings. The kind of data collected depends on their bent of mind. If the data are faulty, then the statistical findings may not be reliable. So, it is not statistics that proves itself unreliable; it is the mishandling of statistical affairs by the investigator.

ii. Statistics can prove whatever it wants: That statistics can prove whatever it wants is a baseless notion. Statistics do not explain the relationships between items; rather, this depends upon the kind of comparison the investigator chooses to make. If the investigator studies the relationship between the height of a giraffe and the height of a human being, then the result will be useless or baseless. If the investigator studies the relationship between the demand and the price of a commodity, the result will be fruitful.
So, it is not the science of statistics that is faulty; it is the investigator's choice of topic and interpretation that gives wrong conclusions.

iii. Statistics are tissues of falsehood: Some of the statements made about statistics are:

• There are three degrees of lies: lies, damned lies and statistics.
• There are black lies, white lies and multi-chromatic lies; statistics is a rainbow of lies.
• History asserts without evidence while statistics asserts contrary to the evidence.
• Statistics are like a bathing costume which reveals all that is interesting and conceals all that is vital.

These statements suggest that statistics are tissues of falsehood. But this notion is wrong; it is only the inexperience of researchers who draw false conclusions that shows up as distrust of statistics.

1.8 LET’S SUM UP

In the present era, people must have some knowledge of statistics. In the singular sense, statistics means statistical methods, which include the collection, classification, analysis and interpretation of data. In the plural sense, it means quantitative information, called data. Descriptive, correlational and inferential statistics are three types of statistics distinguished on the basis of their functions. Parametric and non-parametric statistics, on the other hand, are types distinguished on the basis of the nature of the distribution. Statistics has applications in almost all branches of knowledge as well as in all spheres of life. In spite of its wide applicability, it has certain limitations too. Sometimes inexperienced people misuse statistics to serve their own motives.

1.9 MODEL QUESTIONS

1. What do you mean by statistics? Define its various types with the help of examples from daily life.
2. “Statistical methods are most dangerous tools in the hands of an inexpert.” Discuss briefly.
3. Write a note on the limitations of statistics.
4.
Define the following concepts:
   a) Descriptive statistics
   b) Inferential statistics
   c) Parametric statistics
   d) Non-parametric statistics
5. Discuss briefly the types of statistics.

UNIT-2: CLASSIFICATION OF DATA

Structure:
2.1 Introduction
2.2 Nature of data
2.3 Categories of data
2.4 Classification
2.5 Tabulation
2.6 Let Us Sum Up
2.7 Model Questions

2.1 INTRODUCTION

Everybody collects, interprets and uses information in day-to-day life, much of it in numerical or statistical form. People receive large quantities of information every day through conversations, television, computers, radio, newspapers, posters, notices and instructions. It is precisely because so much information is available that people need to be able to absorb, select and reject it. In everyday life, and in business and industry, certain statistical information is necessary, and it is important to know where to find it and how to collect it. As consumers, everybody has to compare prices and quality before deciding what goods to buy. As employees of a firm, people want to compare their salaries, working conditions, promotion opportunities and so on. The firms, in turn, want to control costs and expand their profits.

One of the main functions of statistics is to provide information which will help in making decisions. Statistics provides this type of information by giving a description of the present, a profile of the past and an estimate of the future. The following are some of the objectives of collecting statistical information.

1. To describe the methods of collecting primary statistical information.
2. To consider the stages involved in carrying out a survey.
3. To analyse the process involved in observation and interpretation.
4. To define and describe sampling.
5. To analyse the basis of sampling.
6. To describe a variety of sampling methods.
A statistical investigation is comprehensive and requires the systematic collection of data about some group of people or objects, describing and organizing the data, analyzing the data with the help of different statistical methods, summarizing the analysis and using the results for making judgements, decisions and predictions. The validity and accuracy of the final judgement is most crucial and depends heavily on how well the data were collected in the first place. The quality of the data will greatly affect the conclusions, and hence utmost importance must be given to this process and every possible precaution should be taken to ensure accuracy while collecting the data.

2.2 NATURE OF DATA

Different types of data can be collected for different purposes. The data can be collected in connection with time, with geographical location, or with both time and location. The following are the three types of data:

1. Time series data
2. Spatial data
3. Spatio-temporal data

1. Time series data: It is a set of numerical values collected over a period of time. The data may have been collected at regular or irregular intervals of time.

Example 1: The following data show three types of expenditure, in rupees, of a family for the four years 2001, 2002, 2003 and 2004.

Year    Food    Education    Others    Total
2001    3000    2000         3000       8000
2002    3500    3000         4000      10500
2003    4000    3500         5000      12500
2004    5000    5000         6000      16000

2. Spatial data: If the data collected are connected with a place, they are termed spatial data. Examples are the number of runs scored by a batsman in different test matches of a test series played at different places, district-wise rainfall in Tamil Nadu, and prices of silver in four metropolitan cities.

Example 2: The population of the southern states of India in 1991.
State             Population
Tamil Nadu        5,56,38,318
Andhra Pradesh    6,63,04,854
Karnataka         4,48,17,398
Kerala            2,90,11,237
Pondicherry       7,89,416

3. Spatio-temporal data: If the data collected are connected with time as well as place, they are known as spatio-temporal data.

Example 3:

State             Population
                  1981           1991
Tamil Nadu        4,82,97,456    5,56,38,318
Andhra Pradesh    5,34,03,619    6,63,04,854
Karnataka         3,70,43,451    4,48,17,398
Kerala            2,54,03,217    2,90,11,237
Pondicherry       6,04,136       7,89,416

2.3 CATEGORIES OF DATA

Any statistical data can be classified under two categories depending upon the sources utilized. These categories are primary data and secondary data.

PRIMARY DATA: Primary data are data collected by the investigator himself for the purpose of a specific inquiry or study. Such data are original in character and are generated by surveys conducted by individuals, research institutions or other organisations.

Example 4: If a researcher is interested in knowing the impact of the noon-meal scheme for school children, he has to undertake a survey and collect data on the opinions of parents and children by asking relevant questions. Data collected for such a purpose are called primary data.

Primary data can be collected by the following five methods.

• Direct personal interviews.
• Indirect oral interviews.
• Information from correspondents.
• Mailed questionnaire method.
• Schedules sent through enumerators.

1. Direct personal interviews: The people from whom information is collected are known as informants. The investigator personally meets them and asks questions to gather the necessary information. This method suits intensive rather than extensive field surveys, that is, the intensive study of a limited field.

Merits:
i. People willingly supply information because they are approached personally. Hence, the response is higher in this method than in any other method.
ii. The collected information is likely to be uniform and accurate.
The investigator is there to clear the doubts of the informants.
iii. Supplementary information on the informant’s personal aspects can be noted. Information on character and environment may help later in interpreting some of the results.
iv. Answers to questions about which the informant is likely to be sensitive can be gathered by this method.
v. The wording of one or more questions can be altered to suit any informant. Explanations may also be given in other languages. Inconvenience and misinterpretation are thereby avoided.

Limitations:
i. It is very costly and time consuming.
ii. It is very difficult when the number of persons to be interviewed is large and the persons are spread over a wide area.
iii. Personal prejudice and bias are greater under this method.

2. Indirect oral interviews: Under this method the investigator contacts witnesses, neighbors, friends or other third parties who are capable of supplying the necessary information. This method is preferred if the required information concerns addiction, or the cause of a fire, theft or murder, etc. If a fire has broken out at a certain place, the persons living in the neighborhood and witnesses are likely to give information on the cause of the fire. In some cases, the police interrogate third parties who are supposed to have knowledge of a theft or a murder and get some clues. Enquiry committees appointed by governments generally adopt this method to get people’s views and all possible details of the facts relating to the enquiry. This method is suitable whenever direct sources do not exist, cannot be relied upon or would be unwilling to part with the information. The validity of the results depends upon a few factors, such as the nature of the person whose evidence is being recorded, the ability of the interviewer to draw out information from the third parties by means of appropriate questions and cross-examination, and the number of persons interviewed.
For the success of this method, one person or one group alone should not be relied upon.

3. Information from correspondents: The investigator appoints local agents or correspondents in different places and compiles the information sent by them. Information reaches newspapers and some government departments by this method. The advantage of this method is that it is cheap and appropriate for extensive investigations. But it may not ensure accurate results, because the correspondents are likely to be negligent, prejudiced and biased. This method is adopted in cases where information is to be collected periodically from a wide area over a long time.

4. Mailed questionnaire method: Under this method a list of questions is prepared and sent to all the informants by post. The list of questions is technically called a questionnaire. A covering letter accompanying the questionnaire explains the purpose of the investigation and the importance of correct information, and requests the informants to fill in the blank spaces provided and to return the form within a specified time. This method is appropriate in cases where the informants are literate and are spread over a wide area.

Merits:
i. It is relatively cheap.
ii. It is preferable when the informants are spread over a wide area.

Limitations:
i. The greatest limitation is that the informants must be literate and able to understand and reply to the questions.
ii. It is possible that some of the persons who receive the questionnaires do not return them.
iii. It is difficult to verify the correctness of the information furnished by the respondents.

With a view to minimizing non-response and collecting correct information, the questionnaire should be carefully drafted. There is no hard and fast rule, but the following general principles may be helpful in framing the questionnaire. A covering letter and a self-addressed and stamped envelope should accompany the questionnaire.
The covering letter should politely point out the purpose of the survey and the privilege of the respondent, who is one among the few associated with the investigation. It should assure the respondent that the information will be kept confidential and will never be misused. It may promise a copy of the findings, free gifts, concessions, etc.

Characteristics of a good questionnaire:
a) The number of questions should be kept to a minimum.
b) Questions should be in a logical order, moving from easy to more difficult questions.
c) Questions should be short and simple. Technical terms and vague expressions capable of different interpretations should be avoided.
d) Questions fetching YES or NO answers are preferable. There may be some multiple-choice questions, but questions requiring lengthy answers are to be avoided.
e) Personal questions and questions which require memory power and calculations should also be avoided.
f) Questions should enable cross-checking, so that deliberate or unconscious mistakes can be detected to an extent.
g) Questions should be carefully framed so as to cover the entire scope of the survey.
h) The wording of the questions should be proper, without hurting feelings or arousing resentment.
i) As far as possible, confidential information should not be sought.
j) The physical appearance should be attractive, and sufficient space should be provided for answering each question.

5. Schedules sent through enumerators: Under this method enumerators or interviewers take the schedules, meet the informants and fill in their replies. A distinction is often made between a schedule and a questionnaire. A schedule is filled in by the interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the informant, who receives and returns it by post. This method is suitable for extensive surveys.

Merits:
i. It can be adopted even if the informants are illiterate.
ii. Answers to questions of a personal and pecuniary nature can be collected.
iii.
Non-response is minimal, as enumerators go personally and contact the informants.
iv. The information collected is reliable. The enumerators can be properly trained for the task.
v. It is the most popular method.

Limitations:
i. It is the costliest method.
ii. Extensive training has to be given to the enumerators for collecting correct and uniform information.
Interviewing requires experience. Unskilled investigators are likely to fail in their work.

Before the actual survey, a pilot survey is conducted. The questionnaire or schedule is pre-tested in the pilot survey. A few of the people from whom the actual information is needed are asked to reply. If they misunderstand a question, find it difficult to answer or do not like its wording, it is to be altered. Further, it is to be ensured that every question fetches the desired answer.

Merits and Demerits of primary data: The collection of data by personal survey is possible only if the area covered by the investigator is small. Collection of data by sending enumerators is bound to be expensive. Great care should be taken that the enumerator records correctly the information provided by the informants. Collection of primary data by framing a schedule, or by distributing and collecting questionnaires by post, is less expensive and can be completed in a shorter time. If the questions are embarrassing, complicated or probe into the personal affairs of individuals, the schedules may not be filled in with accurate and correct information, and hence this method is unsuitable. Information collected as primary data is more reliable than that obtained from secondary data.

SECONDARY DATA: Secondary data are data which have already been collected and analysed by some earlier agency for its own use, and which are later used by a different agency.
According to W. A. Neiswanger, ‘A primary source is a publication in which the data are published by the same authority which gathered and analysed them. A secondary source is a publication, reporting the data which have been gathered by other authorities and for which others are responsible’.

Sources of Secondary data: In most studies the investigator finds it impracticable to collect first-hand information on all related issues, and as such he makes use of data collected by others. There is a vast amount of published information from which statistical studies may be made, and fresh statistics are constantly being produced. The sources of secondary data can broadly be classified under two heads:
A. Published sources, and
B. Unpublished sources.

A. Published Sources: The various sources of published data are:
a) Reports and official publications of international bodies such as the International Monetary Fund, International Finance Corporation and United Nations Organisation.
b) Reports and publications of Central and State Governments, such as the Report of the Tandon Committee and of the Pay Commission.
c) Semi-official publications of various local bodies such as Municipal Corporations and District Boards.
d) Private publications, such as the publications of:
   i. Trade and professional bodies such as the Federation of Indian Chambers of Commerce and the Institute of Chartered Accountants.
   ii. Financial and economic journals such as ‘Commerce’, ‘Capital’ and ‘Indian Finance’.
   iii. Annual reports of joint stock companies.
   iv. Publications brought out by research agencies, research scholars, etc.

It should be noted that the publications mentioned above vary with regard to the periodicity of publication. Some are published at regular intervals (yearly, monthly, weekly, etc.), whereas others are ad hoc publications, i.e., with no regular periodicity.

Note: A lot of secondary data is available on the internet. We can access it at any time for further studies.

B.
Unpublished Sources: Not all statistical material is published. There are various sources of unpublished data, such as records maintained by various Government and private offices and studies made by research institutions, scholars, etc. Such sources can also be used where necessary.

Precautions in the use of Secondary data: The following are some of the points to be considered in the use of secondary data:
i. How the data have been collected and processed.
ii. The accuracy of the data.
iii. How far the data have been summarized.
iv. How comparable the data are with other tabulations.
v. How to interpret the data, especially when figures collected for one purpose are used for another.

Generally speaking, with secondary data, people have to compromise between what they want and what they are able to find.

Merits and Demerits of Secondary Data: Secondary data are cheap to obtain. Many government publications are relatively cheap, and libraries stock quantities of secondary data produced by the government, by companies and by other organisations. Large quantities of secondary data can be obtained through the internet. Much of the secondary data available has been collected over many years and can therefore be used to plot trends.

Secondary data are of value to:
• The government: they help in making decisions and planning future policy.
• Business and industry: in areas such as marketing and sales, in order to appreciate the general economic and social conditions and to provide information on competitors.
• Research organisations: by providing social, economic and industrial information.

2.4 CLASSIFICATION

The collected data, also known as raw data or ungrouped data, are always in an unorganized form and need to be organised and presented in a meaningful and readily comprehensible form in order to facilitate further statistical analysis. It is, therefore, essential for an investigator to condense a mass of data into a more comprehensible and assimilable form.
The process of grouping data into different classes or sub-classes according to some characteristics is known as classification; tabulation is concerned with the systematic arrangement and presentation of classified data. Thus, classification is the first step in tabulation. For example, letters in the post office are classified according to their destinations, viz. Delhi, Madurai, Bangalore, Mumbai, etc.

Objects of Classification: The following are the main objectives of classifying the data:
a) It condenses the mass of data into an easily assimilable form.
b) It eliminates unnecessary details.
c) It facilitates comparison and highlights the significant aspects of the data.
d) It enables one to get a mental picture of the information and helps in drawing inferences.
e) It helps in the statistical treatment of the information collected.

Types of classification: Statistical data are classified in respect of their characteristics. Broadly, there are four basic types of classification, namely:
• Chronological classification
• Geographical classification
• Qualitative classification
• Quantitative classification

Chronological classification: In chronological classification the collected data are arranged according to the order of time, expressed in years, months, weeks, etc. The data are generally classified in ascending order of time. For example, data related to population, the sales of a firm, or the imports and exports of a country are generally subjected to chronological classification.

Example 5: The estimates of birth rates in India during 1970-76 are

Year          1970    1971    1972    1973    1974    1975    1976
Birth Rate    36.8    36.9    36.6    34.6    34.5    35.2    34.2

Geographical classification: In this type of classification, the data are classified according to geographical region or place. For instance, the production of paddy in different states in India, the production of wheat in different countries, etc.
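The two classifications described so far can be sketched in a few lines of Python. This is only an illustrative sketch: the birth rates are those of Example 5 and the wheat yields are those of Example 6, which follows, while the shuffled input order and the names `birth_rates`, `by_time`, `wheat_yield` and `by_place` are assumptions introduced here.

```python
# Chronological classification: birth-rate estimates from Example 5,
# deliberately listed out of order (the shuffling is an assumption).
birth_rates = [
    (1974, 34.5), (1970, 36.8), (1976, 34.2), (1972, 36.6),
    (1975, 35.2), (1971, 36.9), (1973, 34.6),
]
# Arrange the observations in ascending order of time.
by_time = sorted(birth_rates, key=lambda row: row[0])

# Geographical classification: wheat yields (kg/acre) from Example 6,
# keyed by place rather than by time.
wheat_yield = {
    "India": 862, "America": 1925, "France": 439,
    "China": 893, "Denmark": 225,
}
# Arrange the observations alphabetically by country.
by_place = sorted(wheat_yield.items())

for year, rate in by_time:
    print(year, rate)
for country, kg_per_acre in by_place:
    print(country, kg_per_acre)
```

In both cases the figures themselves are unchanged; only the basis on which they are arranged (time or place) differs.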
Example 6:

Country                     America    China    Denmark    France    India
Yield of wheat (kg/acre)    1925       893      225        439       862

Qualitative classification: In this type of classification data are classified on the basis of some attribute or quality like sex, literacy, religion, employment, etc. Such attributes cannot be measured on a scale. For example, if the population is to be classified in respect of one attribute, say sex, then we can classify it into two classes, namely males and females. Similarly, it can also be classified into ‘employed’ and ‘unemployed’ on the basis of another attribute, ‘employment’. Thus, when the classification is done with respect to one attribute which is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called simple or dichotomous classification. A simple classification may be shown as under:

          Population
      Male          Female

A classification in which two or more attributes are considered and several classes are formed is called a manifold classification. For example, if we classify the population simultaneously with respect to two attributes, e.g. sex and employment, then the population is first classified with respect to ‘sex’ into ‘males’ and ‘females’. Each of these classes may then be further classified into ‘employed’ and ‘unemployed’ on the basis of the attribute ‘employment’, and as such the population is classified into four classes, namely:

Male employed
Male unemployed
Female employed
Female unemployed

The classification may be extended still further by considering other attributes like marital status, etc.
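As a rough sketch, the difference between a simple and a manifold classification can be shown in Python by counting records per class. The eight records below are hypothetical and serve only to show the mechanics; the names `people`, `by_sex` and `classes` are likewise illustrative.

```python
from collections import Counter

# Hypothetical records: (sex, employment status) for eight persons.
people = [
    ("Male", "Employed"), ("Female", "Unemployed"), ("Male", "Unemployed"),
    ("Female", "Employed"), ("Male", "Employed"), ("Female", "Employed"),
    ("Male", "Employed"), ("Female", "Unemployed"),
]

# Simple (dichotomous) classification: one attribute, two classes.
by_sex = Counter(sex for sex, _ in people)

# Manifold classification: two attributes considered together, four classes.
classes = Counter(people)

for (sex, status), count in sorted(classes.items()):
    print(f"{sex} {status.lower()}: {count}")
```

Adding a third attribute such as marital status would simply lengthen each record and increase the number of possible classes.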
This can be explained by the following chart:

                        Population
           Male                          Female
   Employed    Unemployed        Employed    Unemployed

Quantitative classification: Quantitative classification refers to the classification of data according to some characteristic that can be measured, such as height, weight, etc. For example, the students of a college may be classified according to weight as given below.

Weight (in lbs.)    No. of Students
90-100                50
100-110              200
110-120              260
120-130              360
130-140               90
140-150               40
Total               1000

In this type of classification there are two elements, namely (i) the variable, i.e. the weight in the above example, and (ii) the frequency, i.e. the number of students in each class. There are 50 students having weights ranging from 90 to 100 lb, 200 students having weights ranging from 100 to 110 lb, and so on.

2.5 TABULATION

Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in the data which are otherwise not obvious. Classification and tabulation, as a matter of fact, are not two distinct processes; they go together. Before tabulation, data are classified, and they are then displayed under the different columns and rows of a table.

Advantages of Tabulation:
i. Statistical data arranged in a tabular form serve the following objectives:
ii. It simplifies complex data, and the data presented are easily understood.
iii. It facilitates comparison of related facts.
iv. It facilitates computation of various statistical measures like averages, dispersion, correlation, etc.
v. It presents facts in the minimum possible space, and unnecessary repetitions and explanations are avoided.
Moreover, the needed information can be easily located.
vi. Tabulated data are good for reference, and they make it easier to present the information in the form of graphs and diagrams.

Preparing a Table

The making of a compact table is itself an art. The table should contain all the information needed within the smallest possible space. The purpose of the tabulation and the way the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:
a) Table number
b) Title of the table
c) Captions or column headings
d) Stubs or row designations
e) Body of the table
f) Footnotes
g) Sources of data

a) Table Number: A table should be numbered for easy reference and identification. This number, if possible, should be written in the centre at the top of the table. Sometimes it is also written just before the title of the table.

b) Title: A good table should have a clearly worded, brief and unambiguous title explaining the nature of the data contained in the table. It should also state the arrangement of the data and the period covered. The title should be placed centrally at the top of the table, just below the table number (or just after the table number on the same line).

c) Captions or Column Headings: Captions in a table are brief and self-explanatory headings of the vertical columns. Captions may involve headings and sub-headings as well. The unit of the data contained should also be given for each column. Usually, a relatively less important and shorter classification should be tabulated in the columns.

d) Stubs or Row Designations: Stubs are brief and self-explanatory headings of the horizontal rows. Normally, a relatively more important classification is given in the rows. Also, a variable with a large number of classes is usually represented in the rows. For example, rows may stand for classes of scores and columns for data related to the sex of students.
In the process, there will be many rows for the score classes but only two columns, for male and female students. A model structure of a table is given below:

                      Table Number
                   Title of the Table

Stub Heading          Caption Headings            Total
                      Caption Sub-Headings
Stub Sub-Headings     Body
Total

Footnotes:
Source Note:

e) Body: The body of the table contains the numerical information, i.e. the frequency of observations in the different cells. The data are arranged according to the description given by the captions and stubs.

f) Footnotes: Footnotes are given at the foot of the table to explain any fact or information included in the table which needs some explanation. Thus, they are meant for explaining or providing further details about the data that have not been covered in the title, captions and stubs.

g) Sources of data: Lastly, one should also mention the source of information from which the data are taken. This may preferably include the name of the author, the volume, the page and the year of publication. It should also state whether the data contained in the table are of a ‘primary’ or ‘secondary’ nature.

Requirements of a Good Table

A good statistical table is not merely a careless grouping of columns and rows but should be such that it summarizes the total information in an easily accessible form in the minimum possible space. Thus, while preparing a table, one must have a clear idea of the information to be presented, the facts to be compared and the points to be stressed. Though there is no hard and fast rule for forming a table, a few general points should be kept in mind:
a) A table should be formed in keeping with the objects of the statistical enquiry.
b) A table should be carefully prepared so that it is easily understandable.
c) A table should be formed so as to suit the size of the paper. But such an adjustment should not be at the cost of legibility.
d) If the figures in the table are large, they should be suitably rounded or approximated.
The method of approximation and the units of measurement should also be specified.
e) Rows and columns in a table should be numbered, and certain figures to be stressed may be put in a ‘box’ or ‘circle’ or in bold letters.
f) The arrangement of rows and columns should be in a logical and systematic order. This arrangement may be alphabetical, chronological or according to size.
g) The rows and columns are separated by single, double or thick lines to represent the various classes and sub-classes used. The corresponding proportions or percentages should be given in adjoining rows and columns to enable comparison. A vertical expansion of the table is generally more convenient than a horizontal one.
h) The averages or totals of different rows should be given at the right of the table and those of columns at the bottom of the table. Totals for every sub-class should also be mentioned.
i) In case it is not possible to accommodate all the information in a single table, it is better to have two or more related tables.

Types of Tables: Tables can be classified according to their purpose, stage of enquiry, nature of data or number of characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
• Simple or one-way table
• Two-way table
• Manifold table

Simple or one-way Table: A simple or one-way table is the simplest table; it contains data on one characteristic only. A simple table is easy to construct and simple to follow. For example, the blank table given below may be used to show the number of adults in different occupations in a locality.

The number of adults in different occupations in a locality

Occupations      No. of Adults


Total

Two-way Table: A table which contains data on two characteristics is called a two-way table. In such a case, therefore, either the stub or the caption is divided into two co-ordinate parts. In the given table, as an example, the caption may be further divided in respect of ‘sex’.
This subdivision is shown in the two-way table below, which now contains two characteristics, namely occupation and sex.

The number of adults in a locality in respect of occupation and sex

Occupation       No. of Adults              Total
                 Male        Female


Total

Manifold Table: More and more complex tables can be formed by including other characteristics. For example, we may further classify the caption sub-headings in the above table in respect of “marital status”, “religion”, “socio-economic status”, etc. A table which has more than two characteristics of data is considered a manifold table. For instance, the table shown below has three characteristics, namely occupation, sex and marital status.

Occupation       No. of Adults                                  Total
                 Male                   Female
                 M     U     Total      M     U     Total


Total

Footnote: M stands for married and U stands for unmarried.

Manifold tables, though complex, are good in practice, as they enable full information to be incorporated and facilitate analysis of all related facts. Still, as a normal practice, not more than four characteristics should be represented in one table, to avoid confusion. Other related tables may be formed to show the remaining characteristics.

2.6 LET US SUM UP

Statistics is a numerical representation of data. Statistics can be used in insurance companies, businesses, banks, etc. A frequency distribution is a tabular representation of data. Data can be classified on geographical, chronological, qualitative and quantitative bases. A statistical survey is an investigation by an individual or agency in which the relevant information is collected in quantitative terms. The two main stages of a statistical survey are planning the survey and executing the survey. There are various types of statistical survey. The unit in terms of which the investigator counts or measures the variable selected for enumeration, analysis and interpretation is known as the statistical unit. Depending on the sources, the data may be classified as primary data and secondary data.
There are five methods of collecting primary data. Precautions must be taken while using secondary data. Measurable characteristics are called variables and non-measurable characteristics are called attributes. The collection of all items related to any survey is called the population or universe. A part of the items selected from a population is called a sample.

REVIEW QUESTIONS
Q 1: Define primary and secondary data.
Q 2: What are the sources of secondary data?
Q 3: Give the merits and demerits of primary data.
Q 4: State the characteristics of a good questionnaire.
Q 5: Define classification. What are the main objects of classification?
Q 6: Write a detailed note on the types of classification.
Q 7: What are the main parts of an ideal table? Explain.
Q 8: Explain the different types of table.

Unit-3: Measures of Central Tendency: Mathematical Averages

STRUCTURE
3.1 Introduction
3.2 Characteristics of an ideal measure
3.3 Arithmetic Mean or Simple Mean
3.4 Harmonic Mean
3.5 Geometric Mean
3.6 Let Us Sum Up
3.7 Review Questions

LEARNING OBJECTIVES
After studying this unit you will be able to know:
• Ideal measure, i.e., mean.
• Different types of mean and their usage.

3.1 INTRODUCTION
To understand and generalise the characteristics of the events that take place around us every day, we need some measuring tools. Suppose we want to describe the rainfall of a particular year during the monsoon. The rainfall will not remain the same on every day of the monsoon season; it will surely vary from day to day. How, then, can the rainfall of that year be represented by a single value? In a similar situation, suppose you want to compare the performance of the girls in your class with that of the boys in a particular paper, say English. How best can you compare and present the two sets of performances (boys and girls) in the English paper?
All measurable traits such as snowfall, height, density, income, age, levels of educational attainment, etc. will vary. If we seek to interpret them, how would we do so? We may require a single value or number that best represents all the observations. This single value usually lies near the centre or middle of a distribution arranged from the lowest to the highest observation or measurement. The statistical techniques used to find the centre of a distribution are referred to as Measures of Central Tendency. This number denotes the central tendency and is a representative figure for the entire data set, because it is the number, measurement or observation around which the items tend to gather.

3.2 CHARACTERISTICS OF AN IDEAL MEASURE
A measure of central tendency represents the entire data set by a single value, the central value. According to Professor Yule, an ideal measure of the central value around which all the observations lie should have the following characteristics:
• It should be rigidly defined.
• It should be readily comprehensible and easy to calculate.
• It should be based upon all the observations.
• It should be suitable for further mathematical treatment. By this we mean that if we are given the averages and sizes of a number of series, we should be able to calculate the average of the composite series obtained on combining the given series.
• It should be affected as little as possible by fluctuations of sampling.
In addition to the above criteria, we may add the following (which is not due to Prof. Yule):
• It should not be affected much by extreme values.

Measures of Central Tendency are also known as Statistical Averages. There are a number of Measures of Central Tendency, such as the Arithmetic Mean, Geometric Mean, Harmonic Mean, Median and Mode.
Arithmetic Mean, Geometric Mean and Harmonic Mean are usually called mathematical averages, while Mode and Median are called positional averages.

3.3 ARITHMETIC MEAN OR SIMPLE MEAN
The mean is the average of all the values or observations of a data set, derived by adding all the values and dividing by the number of observations. The words ‘average’ and ‘arithmetic mean’ are often used interchangeably. The arithmetic mean can be calculated for grouped data as well as ungrouped data.

Calculation of Arithmetic Mean by Direct Method for Grouped data and Ungrouped data:
Mean = Sum of all values or observations / Number of observations
OR
$\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
OR
$\bar{X} = \dfrac{\sum X_i}{n}$

Formula for Grouped data is as follows:
$\bar{X} = \dfrac{F_1X_1 + F_2X_2 + F_3X_3 + \dots + F_kX_k}{n}$
that is,
$\bar{X} = \dfrac{\sum F_iX_i}{\sum F_i}$
where $n = \sum F_i$ is the total frequency and $k$ is the number of distinct values or classes.

Example 1: Find the arithmetic mean of the marks obtained by 10 students, given below:
Xi: 45 32 37 46 39 36 41 48 36 50

Solution: The problem can be solved by using the arithmetic mean formula for ungrouped data:
$\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$ OR $\bar{X} = \dfrac{\sum X_i}{n}$
where n = 10.
Xi: 45 32 37 46 39 36 41 48 36 50
$\sum X_i = 410$
Therefore,
$\bar{X} = \dfrac{\sum X_i}{n} = \dfrac{410}{10} = 41$

Example 2: Compute the arithmetic mean or simple average of the following grouped data by the direct method:
x            6     7     8     9     10    11    12
Frequency    20    43    57    61    72    45    36

Solution: Here the frequency with which each value of the variable occurs is given, so we apply the formula for the arithmetic mean by the direct method for grouped data:
$\bar{X} = \dfrac{\sum F_iX_i}{\sum F_i}$, where $n = \sum F_i$

x (Xi)            6     7     8     9     10    11    12
Frequency (Fi)    20    43    57    61    72    45    36     ∑Fi = 334
FiXi              120   301   456   549   720   495   432    ∑FiXi = 3073

A.M. $\bar{X} = \dfrac{\sum F_iX_i}{\sum F_i} = \dfrac{3073}{334} \approx 9.20$

The above two examples involve only a small number of observations. Let us now see what to do when a data set has a large number of observations.

Example 3: Suppose the weights, recorded to the nearest gram, of 60 mangoes picked at random from a merchant are as given below:
106, 107, 76, 82, 109, 107, 115, 92, 187, 95, 124, 125, 111, 92, 86, 70, 126, 68, 130, 128, 139, 119, 115, 128, 100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115, 98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194, 148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82.
In this case it is laborious to add up all the observations to find the arithmetic mean, so we reduce the observations by creating groups or class intervals. Let us see how.

Solution: First observe the weights of the 60 mangoes and find the lowest and highest values in the data set. Then recreate the data set with groups or class intervals as follows:

Weight (grams)    Frequency
65-84             9
85-104            10
105-124           17
125-144           10
145-164           5
165-184           4
185-204           5

Next, the mid-point of each class interval is calculated as (lowest value + highest value) / 2, as follows:

Weight (grams)    Midpoint (Xi)         Frequency (Fi)    FiXi
65-84             (65+84)/2 = 74.5      9                 670.5
85-104            94.5                  10                945
105-124           114.5                 17                1946.5
125-144           134.5                 10                1345
145-164           154.5                 5                 772.5
165-184           174.5                 4                 698
185-204           194.5                 5                 972.5
                                        ∑Fi = 60          ∑FiXi = 7350

Now using the formula:
$\bar{X} = \dfrac{\sum F_iX_i}{\sum F_i} = \dfrac{7350}{60} = 122.5$ grams

Calculation of Arithmetic Mean by Short Cut method for Ungrouped and Grouped data
So far we have learnt the direct method of calculating the arithmetic mean for both grouped and ungrouped data. In this section we shall learn the short-cut method, which is applied to simplify the calculation of the arithmetic mean.
In this method an assumed mean A (any assumed number or provisional mean) is chosen, and the deviation D of each value from the assumed mean is calculated. The formula for the arithmetic mean by the short-cut method is then:
$\bar{X} = A + \dfrac{\sum D_i}{n}$
where n = number of observations, A = assumed mean and the deviation is $D_i = X_i - A$. For grouped data each deviation is weighted by its frequency:
$\bar{X} = A + \dfrac{\sum F_iD_i}{\sum F_i}$

Let us find the arithmetic mean of the previous example by using the short-cut method.

Example 4: Consider again the weights, recorded to the nearest gram, of the 60 mangoes of Example 3. As before, the observations are reduced by creating groups or class intervals:

Weight (grams)    Frequency
65-84             9
85-104            10
105-124           17
125-144           10
145-164           5
165-184           4
185-204           5

To apply the short-cut method in this problem, the assumed mean A is taken as 114.5; you may choose any other of the mid-points calculated.
Next, the mid-point of each class interval is calculated as (lowest value + highest value) / 2, and the deviations from the assumed mean are tabulated as follows:

Weight (grams)    Midpoint (Xi)         Frequency (Fi)    Di = (Xi - A), A = 114.5    FiDi
65-84             (65+84)/2 = 74.5      9                 -40                         -360
85-104            94.5                  10                -20                         -200
105-124           114.5                 17                0                           0
125-144           134.5                 10                20                          200
145-164           154.5                 5                 40                          200
165-184           174.5                 4                 60                          240
185-204           194.5                 5                 80                          400
                                        ∑Fi = 60                                      ∑FiDi = 480

Now applying the formula of the short-cut method for the arithmetic mean:
$\bar{X} = A + \dfrac{\sum F_iD_i}{\sum F_i}$
$\bar{X} = 114.5 + \dfrac{480}{60} = 122.5$ grams

Calculation of Arithmetic Mean by Step Deviation method for Ungrouped and Grouped data
The step deviation method is yet another method of calculating the arithmetic mean. It is used for grouped data whose class intervals have an equal or common width, and for ungrouped data whose deviations share a common factor. It further simplifies the calculation when the deviations found in the short-cut method are numerically large: to avoid such large values, each deviation $D_i = X_i - A$ is divided by the fixed class width (for grouped data) or by a common factor (for ungrouped data) to give a value ‘u’.

Step Deviation method for ungrouped data set:
$\bar{X} = A + \dfrac{\sum u}{n} \times c$
where A = assumed mean, c is the common factor, $u = (X_i - A)/c$ and n = number of observations.

Step Deviation method for grouped data set:
$\bar{X} = A + \dfrac{\sum Fu}{\sum F} \times h$
where A = assumed mean, h is the common class width, $u = (X_i - A)/h$ and F is the frequency or occurrence of an observation.

Let us find the arithmetic mean of the previous example by using the step deviation method.
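Before working through the example by hand, the three methods described so far (direct, short-cut and step deviation) can be compared in a short Python sketch. It is only an illustration, not part of the course text: the function names are assumptions made here, and the mid-points and frequencies are those of the mango example. All three functions should return the same mean of 122.5 grams, because the short-cut and step deviation methods only rearrange the arithmetic of the direct method.

```python
# Grouped data from the mango example: class mid-points and frequencies.
midpoints = [74.5, 94.5, 114.5, 134.5, 154.5, 174.5, 194.5]
frequencies = [9, 10, 17, 10, 5, 4, 5]


def mean_direct(x, f):
    """Direct method: sum(F * X) / sum(F)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)


def mean_short_cut(x, f, assumed_mean):
    """Short-cut method: A + sum(F * D) / sum(F), with D = X - A."""
    total_fd = sum(fi * (xi - assumed_mean) for xi, fi in zip(x, f))
    return assumed_mean + total_fd / sum(f)


def mean_step_deviation(x, f, assumed_mean, width):
    """Step deviation method: A + (sum(F * u) / sum(F)) * h, with u = (X - A) / h."""
    total_fu = sum(fi * (xi - assumed_mean) / width for xi, fi in zip(x, f))
    return assumed_mean + total_fu / sum(f) * width


# A = 114.5 and h = 20 are the values chosen in the worked example.
direct = mean_direct(midpoints, frequencies)
short_cut = mean_short_cut(midpoints, frequencies, assumed_mean=114.5)
step_dev = mean_step_deviation(midpoints, frequencies, assumed_mean=114.5, width=20)

print(direct, short_cut, step_dev)
```

Changing `assumed_mean` to any other mid-point leaves the result unchanged, which is why the choice of A is free.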
Example 5: Consider once more the weights, recorded to the nearest gram, of the 60 mangoes of Example 3, grouped into class intervals as before:

Weight (grams)    Frequency
65-84             9
85-104            10
105-124           17
125-144           10
145-164           5
165-184           4
185-204           5

To apply the step deviation method in this problem, the assumed mean A is again taken as 114.5; you may choose any other of the mid-points calculated. Next, the mid-point of each class interval is calculated as (lowest value + highest value) / 2, as follows:

Weight (grams)    Midpoint (Xi)         Frequency (Fi)    Di = (Xi - A), A = 114.5
65-84             (65+84)/2 = 74.5      9                 -40
85-104            94.5                  10                -20
105-124           114.5                 17                0
125-144           134.5                 10                20
145-164           154.5                 5                 40
165-184           174.5                 4                 60
185-204           194.5                 5                 80
                                        ∑Fi = 60

Now we apply the step deviation method by calculating $u = (X_i - A)/h$, where h is the width of the class interval.
After calculating we arrive at the following table:

Weight (grams)    Midpoint (Xi)         Frequency (Fi)    Di = (Xi - A), A = 114.5    u = (Xi - A)/h, h = 20    Fu
65-84             (65+84)/2 = 74.5      9                 -40                         -2                        -18
85-104            94.5                  10                -20                         -1                        -10
105-124           114.5                 17                0                           0                         0
125-144           134.5                 10                20                          1                         10
145-164           154.5                 5                 40                          2                         10
165-184           174.5                 4                 60                          3                         12
185-204           194.5                 5                 80                          4                         20
                                        ∑Fi = 60                                                                ∑Fu = 24

Applying the step deviation formula for grouped data:
$\bar{X} = A + \dfrac{\sum Fu}{\sum F} \times h = 114.5 + \dfrac{24}{60} \times 20 = 122.5$ grams

Example 6: Find the average sales in rupees for the data given, by using the step deviation and assumed mean methods.
Weekly sale in Rs.: 900 750 650 800

Solution: Let us calculate the average sales by using the step deviation method for ungrouped data. Let the assumed mean A be 750 and the common factor c be 10. It is advisable to choose a common factor by which all the deviations are divisible. We calculate u by using the formula $u = (X_i - A)/c$, and arrive at the following table:

Weekly sale in Rs. (X)    D = X - A, where A = 750    u = (X - A)/c, where c = 10
900                       150                         15
750                       0                           0
650                       -100                        -10
800                       50                          5
                                                      ∑u = 10

Now applying the formula of the arithmetic mean by the step deviation method, with n = 4:
$\bar{X} = A + \dfrac{\sum u}{n} \times c = 750 + \dfrac{10}{4} \times 10 = 775$
Hence the average sales is Rs 775.

Properties of Arithmetic Mean

Property 1: The sum total of deviations from the mean value is zero. This property says that if $\bar{X}$ is the average or arithmetic mean, then $\sum (X_i - \bar{X}) = 0$, that is,
$(X_1 - \bar{X}) + (X_2 - \bar{X}) + (X_3 - \bar{X}) + \dots + (X_n - \bar{X}) = 0$.

Property 2: The arithmetic mean of n observations $X_1, X_2, \dots, X_n$ is $\bar{X}$. If each observation is increased by c, then the mean of the new observations is $(\bar{X} + c)$.
Let us prove property 2:
We know that $\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
Hence $n\bar{X} = X_1 + X_2 + X_3 + \dots + X_n$ ............ (Part A)
If we increase each observation by c, then the new mean will be
A.M. $= \dfrac{(X_1 + c) + (X_2 + c) + \dots + (X_n + c)}{n} = \dfrac{(X_1 + X_2 + \dots + X_n) + nc}{n}$
Now from Part A, replacing $X_1 + X_2 + \dots + X_n$ by $n\bar{X}$, we get
A.M. $= \dfrac{n\bar{X} + nc}{n} = \bar{X} + c$, hence proved.

Property 3: The arithmetic mean of n observations $X_1, X_2, \dots, X_n$ is $\bar{X}$.
If each observation is decreased by c, the mean of the new observations is $(\bar{X} - c)$.
Let us prove property 3:
We know that $\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
Hence $n\bar{X} = X_1 + X_2 + X_3 + \dots + X_n$ ............ (Part A)
If we decrease each observation by c, then the new mean will be
A.M. $= \dfrac{(X_1 - c) + (X_2 - c) + \dots + (X_n - c)}{n} = \dfrac{(X_1 + X_2 + \dots + X_n) - nc}{n}$
Now from Part A, replacing $X_1 + X_2 + \dots + X_n$ by $n\bar{X}$, we get
A.M. $= \dfrac{n\bar{X} - nc}{n} = \bar{X} - c$, hence proved.

Property 4: The arithmetic mean of n observations $X_1, X_2, \dots, X_n$ is $\bar{X}$. If each observation is multiplied by a positive number c, the mean of the new observations is $c\bar{X}$.
Let us prove property 4:
We know that $\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
Hence $n\bar{X} = X_1 + X_2 + X_3 + \dots + X_n$ ............ (Part A)
Suppose we multiply each of $X_1, X_2, \dots, X_n$ by a nonzero positive number c. Then the new mean is
A.M. $= \dfrac{cX_1 + cX_2 + cX_3 + \dots + cX_n}{n} = \dfrac{c(X_1 + X_2 + X_3 + \dots + X_n)}{n}$
Now from Part A, replacing $X_1 + X_2 + X_3 + \dots + X_n$ by $n\bar{X}$, we get
A.M. $= \dfrac{c(n\bar{X})}{n} = c\bar{X}$, hence proved.

Property 5: The mean of n observations $X_1, X_2, \dots, X_n$ is $\bar{X}$. If each observation is divided by a positive number c, the mean of the new observations is $\bar{X}/c$.
Let us prove property 5:
We know that $\bar{X} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
Hence $n\bar{X} = X_1 + X_2 + X_3 + \dots + X_n$ ............ (Part A)
Suppose we divide each of $X_1, X_2, \dots, X_n$ by a nonzero positive number c. Then the new mean is
A.M. $= \dfrac{X_1/c + X_2/c + X_3/c + \dots + X_n/c}{n} = \dfrac{X_1 + X_2 + X_3 + \dots + X_n}{cn}$
Now from Part A, replacing $X_1 + X_2 + X_3 + \dots + X_n$ by $n\bar{X}$, we get
A.M. $= \dfrac{n\bar{X}}{cn} = \dfrac{\bar{X}}{c}$. Hence it is proved.

Advantages of Mean
1) The mean is rigidly defined and is thus a good measure of central tendency or central value.
2) It is simple to understand and easy to calculate.
3) All the values of observations in the data set are considered when the mean is calculated.
4) Other mathematical calculations can be done based on the mean.
5) Fluctuations in sampling are least likely to affect the mean.
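The five properties of the arithmetic mean proved above can also be checked numerically. The following Python sketch is an illustration only: it reuses the marks of Example 1, and the constant `c = 5` and the helper name `arithmetic_mean` are assumptions chosen here for the demonstration.

```python
marks = [45, 32, 37, 46, 39, 36, 41, 48, 36, 50]  # data of Example 1
c = 5  # hypothetical positive constant used for Properties 2 to 5


def arithmetic_mean(values):
    """Direct method for ungrouped data: sum of values / number of values."""
    return sum(values) / len(values)


mean = arithmetic_mean(marks)

# Property 1: the deviations from the mean add up to zero.
sum_of_deviations = sum(x - mean for x in marks)

# Properties 2 and 3: adding or subtracting c shifts the mean by c.
mean_increased = arithmetic_mean([x + c for x in marks])
mean_decreased = arithmetic_mean([x - c for x in marks])

# Properties 4 and 5: multiplying or dividing by c scales the mean by c.
mean_multiplied = arithmetic_mean([x * c for x in marks])
mean_divided = arithmetic_mean([x / c for x in marks])

print("mean:", mean)
print("sum of deviations:", sum_of_deviations)
print("increased by c:", mean_increased, "expected:", mean + c)
print("decreased by c:", mean_decreased, "expected:", mean - c)
print("multiplied by c:", mean_multiplied, "expected:", c * mean)
print("divided by c:", mean_divided, "expected:", mean / c)
```

Because computers store decimal fractions approximately, the last comparison may agree only up to a very small rounding error.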
Limitations of Mean
1) Outliers or extreme values can have an impact on the mean or average.
2) When there are open-ended classes, such as ‘20 and above’ or ‘below 10’, the mean cannot be calculated. In these cases the median and mode can be calculated. This is mainly because in such distributions the mid-point cannot be determined to carry out the calculations.
3) If a score in the observations is missing, then the mean cannot be calculated.
4) It is not possible to determine the mean through inspection. Further, it cannot be determined from a graph.
5) It is not suitable for very asymmetrical data, as the mean will not adequately represent the data.

3.4 HARMONIC MEAN
The harmonic mean of a data set is the reciprocal of the average of the reciprocals of the values of the data set. Therefore, if $X_1, X_2, X_3, \dots, X_n$ is a series and H.M. is its harmonic mean, then
$H.M. = \dfrac{n}{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dots + \dfrac{1}{X_n}} = \dfrac{n}{\sum \dfrac{1}{X_i}}$
and for a frequency distribution
$H.M. = \dfrac{\sum f_i}{\sum f_i \times \dfrac{1}{X_i}}$

Example 1: Find the harmonic mean of the scores obtained by batsmen in a test, given below:
Score (X)      25    35    40    15    29
Batsmen (f)    2     1     3     1     3

Solution:
Score (X)    Batsmen (f)    1/X         f × 1/X
25           2              0.04        0.08
35           1              0.028571    0.028571
40           3              0.025       0.075
15           1              0.066667    0.066667
29           3              0.034483    0.103448
             ∑fi = 10                   ∑f × (1/X) = 0.353686

Applying the formula of the harmonic mean, we get:
$H.M. = \dfrac{\sum f_i}{\sum f \times \dfrac{1}{X}} = \dfrac{10}{0.353686} = 28.27$

Example 2: Find the harmonic mean of the following data set:

Weight of weight lifters (kilograms)    Frequency
65-84                                   9
85-104                                  10
105-124                                 17
125-144                                 10
145-164                                 5
165-184                                 4
185-204                                 5

Solution:
Weight of weight lifters (kilograms)    Mid-point (X)    Frequency (f)    1/X      f × 1/X
65-84                                   74.5             9                0.013    0.121
85-104                                  94.5             10               0.011    0.106
105-124                                 114.5            17               0.009    0.148
125-144                                 134.5            10               0.007    0.074
145-164                                 154.5            5                0.006    0.032
165-184                                 174.5            4                0.006    0.023
185-204                                 194.5            5                0.005    0.026
                                                         ∑fi = 60                  ∑f × (1/X) = 0.530

Applying the formula of the harmonic mean, we get:
$H.M. = \dfrac{\sum f_i}{\sum f \times \dfrac{1}{X}} = \dfrac{60}{0.530} \approx 113.2$ kilograms

3.5 GEOMETRIC MEAN
The Geometric Mean (GM) is basically defined as the nth root of the product of n numbers. It is noted that the
