
Measures of Skewness PDF


Summary

This document provides a detailed explanation of measures of skewness in statistics. It covers different types of statistical classifications such as geographical, chronological, and qualitative classifications and also explores the concepts of central tendency and dispersion. It is presented as an instructional guide or textbook.

Full Transcript


Unit I : Measures of Skewness

Introduction : The word 'Statistics' was used in the ancient as well as the medieval period in many countries, which raises the question of where the word comes from. It has been derived from the Latin word 'Status', the Italian word 'Statista', the German word 'Statistik' or the French word 'Statistique', all of which mean a political state. In the past, the word 'Statistics' was associated only with the display of facts and figures on the economic, demographic and political situation prevailing in a country, brought out by the local government. During the reign of Chandragupta, his finance minister Kautilya kept a record of births and deaths, as well as some other records, in his book 'Arthashastra'. During the reign of Akbar, Abu Fazal wrote the book 'Ain-i-Akbari', which includes statistical records on agriculture. The statistician Gottfried Achenwall is known as the 'Father of Statistics', and Sir Ronald Fisher is known as the 'Father of Modern Statistics'.

Definition : Statistics may be defined as the science of collection, classification, presentation, analysis and interpretation of numerical data. – Croxton and Cowden.

Classification : After the collection of data, the next step is its classification. Collected data is initially in raw form; classification helps us understand and compare it, interpret it and draw meaningful conclusions from it. Classification means arranging the raw data into different classes or groups on the basis of some characteristic under study. For example, if the characteristic under study is the sex of the students admitted to a college in a year, the students are classified as boys and girls.

Objectives :
1. It presents the raw data in a compact and simple form.
2. It helps in comparison.
3. It helps in drawing meaningful conclusions.
4. It provides a basis for tabulation and analysis of data.

Types of Classification :

1) Geographical Classification : When data is classified according to the geographical area or location to which it belongs, it is called geographical classification. The classification is made on the basis of locations such as countries, states, cities, regions, zones and areas, and is usually listed in alphabetical order for easy reference. The following table, showing the number of families in various states of India having a BSNL telephone connection, is an example of geographical classification.

State | No. of families (in thousand)
Andhra Pradesh | 170
Gujarat | 147
Kerala | 50
Maharashtra | 150

Measures of Skewness by Dr. Vyankat Dhumal Page 1

2) Chronological Classification : When data is classified with reference to time, it is called chronological classification. The data is arranged over a period of time, i.e. in order of years, months or weeks. Such time series are usually listed in chronological order, normally starting with the earliest period. The following data, giving the yield of wheat for the years 2018 to 2023, is an example of chronological classification.

Year | 2018 | 2019 | 2020 | 2021 | 2022 | 2023
Yield (in million tonnes) | 12.8 | 13.8 | 14.3 | 15.7 | 16.4 | 17.5

3) Qualitative Classification : Sometimes classification is based on characteristics that cannot be measured. These qualitative characteristics are called attributes, for example colour of eyes, colour of hair, intelligence, beauty, sex and religion. Classification based on attributes is called qualitative classification. For example, the bulbs in a box of 100 manufactured by a company are classified as defective and non-defective.
Type of bulbs | Number of bulbs
Defective | 03
Non-defective | 97
Total | 100

There are two types of qualitative classification:

a) Simple classification : Here the data is classified on the basis of only one attribute.

Students → Boys, Girls

The data is classified with respect to only one attribute, i.e. sex.

b) Manifold classification : Here the data is classified on the basis of more than one attribute.

Students → Boys (from village, from city), Girls (from village, from city)

Here the data is classified with respect to two attributes, i.e. sex and place.

4) Quantitative Classification : When classification is based on characteristics that can be measured, it is called quantitative classification. Such characteristics are called variates or variables, for example height, weight, income, sales, profit and production. In this classification the data is in numeric form. For example, the students of a college may be classified according to weight as follows:

Weight (in kg) | No. of students
40-50 | 50
50-60 | 200
60-70 | 260
70-80 | 90
80-90 | 40
Total | 640

Such a distribution is known as an empirical frequency distribution or simple frequency distribution. This type of classification has two elements: (i) the variable, i.e. weight in the above example, and (ii) the frequency, i.e. the number of students in each class. A frequency distribution refers to data classified on the basis of some variable that can be measured, such as prices, wages, age, or the number of units produced or consumed. The term 'variable' refers to the characteristic that varies in amount or magnitude in a frequency distribution.

Generally, there are two types of quantitative variables:

a) Discrete variables : If a variable can take only integer values (whole numbers) such as 0, 1, 2, 3, …, it is called a discrete variable.
For example, the number of children in a family, the number of students in a class, the number of defective items in a production run, or the number of telephone calls received.

Discrete Frequency Distribution
No. of children | No. of families
0 | 10
1 | 40
2 | 80
3 | 100
4 | 250
5 | 150
6 | 50
Total | 680

b) Continuous variables : If a variable can take any value in a given interval, including decimal values, it is called a continuous variable. For example, the height of a student (160 cm, 162.5 cm, 169.3 cm, etc.), the production of rice measured in kg (805.3 kg), or a day's temperature (40.8 degrees Celsius).

Continuous Frequency Distribution
Weight (kg) | No. of persons
50-60 | 10
60-70 | 15
70-80 | 40
80-90 | 45
90-100 | 20
Total | 130

Frequency distribution : In quantitative classification, the data is shown in the form of a frequency distribution: a table in which the data is arranged in different classes or groups of the variable, along with the number of observations (frequency) in each class.

Types of Classes in Frequency Distribution :

a) Exclusive classes : If the upper limit of one class is the same as the lower limit of the next class, the classes are called exclusive classes, for example 0-10, 10-20, 20-30 and so on. Although the upper limit is written, it is not actually included in the class: the first class takes values from 0 up to, but not including, 10 (such as 9.99), while 10 itself falls in the class 10-20. Because the upper limit is excluded, these are called exclusive classes; they are also called continuous classes. They are used for both continuous and discrete variables.

b) Inclusive classes : Here both class limits are included in the same class, for example 10-19, 20-29, 30-39, 40-49 and so on. Because both limits are included, these are called inclusive classes. Such classes are used only for discrete variables.
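The membership rules for exclusive and inclusive classes can be sketched in Python. This is only an illustration: `exclusive_class` and `inclusive_class` are hypothetical helper names, and the class limits are assumed to be listed in increasing order.

```python
from bisect import bisect_right


def exclusive_class(value, limits):
    """Return the exclusive class (lower, upper) that contains value.

    limits is an increasing list such as [0, 10, 20, 30], meaning the
    classes 0-10, 10-20 and 20-30. The upper limit is excluded, so a
    value equal to an upper limit falls in the NEXT class.
    """
    i = bisect_right(limits, value) - 1
    if i < 0 or i >= len(limits) - 1:
        raise ValueError(f"{value} lies outside {limits[0]}-{limits[-1]}")
    return limits[i], limits[i + 1]


def inclusive_class(value, classes):
    """Return the inclusive class (lower, upper) with lower <= value <= upper."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return lower, upper
    # A value in the gap between two inclusive classes (e.g. 19.5 between
    # 10-19 and 20-29) belongs to no class.
    raise ValueError(f"{value} does not fall in any class")


limits = [0, 10, 20, 30]
print(exclusive_class(9.99, limits))              # (0, 10)
print(exclusive_class(10, limits))                # (10, 20)
print(inclusive_class(19, [(10, 19), (20, 29)]))  # (10, 19)
```

The gap case shows why inclusive classes suit discrete data only: a value such as 19.5 belongs to neither 10-19 nor 20-29.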
Terms relating to Frequency Distribution :

1) Class interval or class : When the number of observations is large and they vary over a wide range, they are divided into several groups according to their values. Each group covers an interval called the class interval, so a large set of observations is divided into different class intervals.

2) Class limits : The two extreme values of a class interval are called class limits. The smallest value is the lower class limit and the largest value is the upper class limit. For example, for the class 10-19, 10 is the lower class limit and 19 is the upper class limit. Class limits are defined for inclusive classes.

3) Class boundaries : These are the true class limits of the class intervals; class boundaries are found when inclusive classes are converted to exclusive classes. For example:

Marks obtained | 10-19 | 20-29 | 30-39 | 40-49 | 50-59

Here, the difference between the upper limit of one class and the lower limit of the next class is 1. Take half of this difference:

$$\frac{20 - 19}{2} = \frac{1}{2} = 0.5$$

Subtract this value from each lower limit and add it to each upper limit. The exclusive classes are then:

Marks obtained | 9.5-19.5 | 19.5-29.5 | 29.5-39.5 | 39.5-49.5 | 49.5-59.5

Class boundaries are also called the true limits of the class. Exclusive classes are needed to draw certain graphs, such as the histogram, and to find certain measures such as the mean, median and mode.

4) Class mark or mid-value : The value exactly in the middle of a class is the class mark or mid-value. It is calculated using the formula

$$\text{Class mark} = \frac{\text{Lower class boundary} + \text{Upper class boundary}}{2} \quad \text{or} \quad \text{Class mark} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$

Class limits | Class boundaries | Class mark
20-29 | 19.5-29.5 | 24.5
30-39 | 29.5-39.5 | 34.5
40-49 | 39.5-49.5 | 44.5
50-59 | 49.5-59.5 | 54.5
60-69 | 59.5-69.5 | 64.5

5) Width (or size) of class interval : The difference between the upper and lower class boundaries (not the class limits) is called the width or size of the class interval.

Width of a class interval = Upper class boundary − Lower class boundary

For example, the width of the class 5-15 = 15 − 5 = 10. Classes may or may not be of the same width, but for convenience they are usually taken to be of the same width.

6) Class frequency : The number of observations lying within a class is called the class frequency or the frequency of that class.

7) Frequency density : It is the frequency of a class per unit of width, and indicates the concentration of frequency in a class. It is given by the formula

$$\text{Frequency density} = \frac{\text{Class frequency}}{\text{Width of the class}}$$

8) Open-end classes : In open-end classes, a class limit is missing at the lower end of the first class interval, at the upper end of the last class interval, or at both ends.

Class interval | Frequency
Below 30 | 10
30-40 | 25
40-50 | 15
Above 50 | 5

Here "Below 30" and "Above 50" are open-end classes.

Statistics deals with quantitative data. Data can be defined as a collection of information or facts used for drawing conclusions. Data are individual pieces of factual information recorded and used for analysis; they are the raw information from which statistics are created. Statistics are the results of data analysis, i.e. its interpretation and presentation. In other words, some computation has taken place that provides an understanding of what the data means. Statistics are often presented in the form of a table, chart or graph.

Data is often described as ungrouped or grouped. Ungrouped data is data given as individual data points; grouped data is data given in intervals.
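The class-interval terms defined above (class boundaries, class mark, width and frequency density) reduce to simple arithmetic. The Python sketch below uses hypothetical helper names and assumes at least two inclusive classes with the same gap between each pair; it reproduces the boundaries 9.5-19.5, …, 49.5-59.5 of the marks example and the frequency density of the 50-60 kg class (10 persons) from the continuous frequency distribution given earlier.

```python
def class_boundaries(inclusive_classes):
    """Convert inclusive class limits into exclusive class boundaries.

    Assumes at least two classes and the same gap between every pair of
    consecutive classes. The correction is half that gap, e.g.
    (20 - 19) / 2 = 0.5.
    """
    gap = inclusive_classes[1][0] - inclusive_classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in inclusive_classes]


def class_mark(lower, upper):
    """Mid-value of a class: (lower + upper) / 2."""
    return (lower + upper) / 2


def class_width(lower_boundary, upper_boundary):
    """Width = upper class boundary - lower class boundary."""
    return upper_boundary - lower_boundary


def frequency_density(frequency, width):
    """Frequency per unit of class width."""
    return frequency / width


marks = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
boundaries = class_boundaries(marks)
print(boundaries)
# [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5), (49.5, 59.5)]
print([class_mark(lo, hi) for lo, hi in boundaries])
# [14.5, 24.5, 34.5, 44.5, 54.5]
print(class_width(*boundaries[0]))                   # 10.0
print(frequency_density(10, class_width(50, 60)))    # 1.0
```

The class marks come out the same whether they are computed from the boundaries or from the limits, e.g. (10 + 19)/2 = (9.5 + 19.5)/2 = 14.5.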
Data may be ungrouped (an individual series or a discrete series) or grouped (a continuous series).

Example :
1) Ungrouped data without a frequency distribution, also known as raw data or individual observations: 1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6.

2) Ungrouped data with a frequency distribution, also known as a discrete series:

Number of television sets | 0 | 1 | 2 | 3 | 4 | 5 | Total
Frequency | 2 | 13 | 18 | 0 | 10 | 2 | 45

3) Grouped data, also known as a continuous series:

Marks | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | 80-90 | Total
Frequency | 7 | 9 | 15 | 24 | 17 | 10 | 8 | 90

Measures of Central Tendency / Location : A single value is selected that represents the whole group and conveys a fairly adequate idea about it. In statistics, this value is known as the average. Averages generally lie in the central part of a distribution, so they are also called measures of central tendency. There are five commonly used measures of central tendency or averages:
1. Arithmetic Average / Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean

The three most common measures of central tendency are the mean (the average of the data), the median (the middle value of the data) and the mode (the most commonly occurring value).

1) Mean ($\bar{x}$) : The mean, also called the arithmetic mean, is what everyday language calls the average. It is the most popular and widely used way of representing the entire data by a single value. Its value is obtained by adding together all the items and dividing the total by the number of items. The mean is a calculated average.

2) Median (M) : By definition, the median is the middle value of a distribution. If the data is arranged in increasing or decreasing order (from the smallest value to the largest, or the reverse), the middle-most value is the median, so that 50% of the ranked values lie above it and 50% below it. The median is a positional average.
3) Mode (Z) : The mode, also known as the norm, is the value that appears most frequently in a data set. A set of numbers may have one mode, more than one mode or no mode at all; accordingly it is called unimodal, bimodal, trimodal or multimodal. If a series of observations has more than one mode, the mode is said to be ill-defined. The mode represents the value that is most frequent, typical or predominant.

Empirical Relationship between Mean, Median and Mode : When the mode is ill-defined (for example, when the data is bimodal), its value may be estimated from the following empirical (observed) relationship between the mean, median and mode:

Mode = 3 Median − 2 Mean

Calculation of Mean :
Individual series: $\bar{x} = \dfrac{\sum x}{N}$
Discrete series: $\bar{x} = \dfrac{\sum fx}{\sum f}$
Continuous series: $\bar{x} = \dfrac{\sum fm}{\sum f}$, where $m = \dfrac{UL + LL}{2}$ is the mid-value of each class

Calculation of Median :
Individual series (observations arranged in order):
Odd number of observations: $M = \left(\dfrac{N+1}{2}\right)^{\text{th}}$ observation
Even number of observations: $M = \dfrac{\left(\frac{N}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{N}{2}+1\right)^{\text{th}} \text{ observation}}{2}$
Discrete series: $M$ = size of the $\left(\dfrac{N+1}{2}\right)^{\text{th}}$ observation; the corresponding value of the variable is the median.
Continuous series: $M = l_1 + \dfrac{\frac{N}{2} - \text{c.f.}}{f} \times i$, where
l1 = true lower limit of the median class (the class containing the $\left(\frac{N}{2}\right)^{\text{th}}$ observation),
c.f. = cumulative frequency of the class immediately preceding the median class,
f = frequency of the median class,
i = width of the median class, i.e. i = true upper limit − true lower limit.

Calculation of Mode :
Individual series: the most frequent value
Discrete series: the value with the highest frequency
Continuous series: $Z = l_1 + \dfrac{d_1}{d_1 + d_2} \times i$
When the mode is ill-defined, it may be found from Mode = 3 Median − 2 Mean.
where l1 and i are the true lower limit and the width of the modal class, and
d1 = f1 − f0,
d2 = f1 − f2,
f1 = frequency of the modal class,
f0 = frequency of the class preceding the modal class,
f2 = frequency of the class following the modal class.

Introduction to Measures of Dispersion : Measures of central tendency indicate the general magnitude of the data, but they do not reveal the degree of spread or the extent of variability of individual items in a distribution. An average alone cannot adequately describe a set of observations unless all the observations are the same, so it is also necessary to describe their variability or dispersion. Two or more distributions may have the same central value and still differ widely in their formation. Dispersion (also called scatter, spread or variation) measures the extent to which the items vary from some central value. Since measures of dispersion give an average of the differences of the various items from an average, they are also called averages of the second order.

There are four absolute measures of dispersion:
1. Range
2. Quartile deviation or semi-inter-quartile range
3. Mean deviation
4. Standard deviation

The corresponding relative measure in each case is called the coefficient of that measure, such as the coefficient of standard deviation. Relative measures are used only for comparing two or more series that differ in size (number of items), in their central values, or in their units of measurement.

Example : The runs scored by three batsmen in three one-day matches are given below.

Match | Score by Dhoni | Score by Virat | Score by Harbhajan
I | 100 | 100 | 0
II | 100 | 50 | 295
III | 100 | 150 | 05
Total | 300 | 300 | 300
Average (mean) | 100 | 100 | 100

The arithmetic mean is the same in all three series, so one might conclude that the series are alike in nature. A closer examination, however, reveals that the distributions differ widely from one another.
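To see the batsmen example numerically, the following Python sketch computes the mean, range, standard deviation and coefficient of variation of each series. The helper names are hypothetical; the standard deviation uses divisor N, as in the actual mean and assumed mean formulas for the standard deviation in this unit, and the assumed mean method should agree with the actual mean method for any choice of A.

```python
from math import sqrt

scores = {
    "Dhoni": [100, 100, 100],
    "Virat": [100, 50, 150],
    "Harbhajan": [0, 295, 5],
}


def mean(xs):
    return sum(xs) / len(xs)


def sd_actual_mean(xs):
    """Actual mean method: sigma = sqrt(sum(x^2) / N), where x = X - mean."""
    m = mean(xs)
    return sqrt(sum((v - m) ** 2 for v in xs) / len(xs))


def sd_assumed_mean(xs, a):
    """Assumed mean method: sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), d = X - A."""
    n = len(xs)
    d = [v - a for v in xs]
    variance = sum(v * v for v in d) / n - (sum(d) / n) ** 2
    return sqrt(max(variance, 0.0))  # guard against a tiny negative rounding error


def coefficient_of_variation(xs):
    """C.V. = sigma / mean * 100 (in per cent)."""
    return sd_actual_mean(xs) / mean(xs) * 100


for name, xs in scores.items():
    print(f"{name:10s} mean={mean(xs):.0f} range={max(xs) - min(xs)} "
          f"sd={sd_actual_mean(xs):.2f} cv={coefficient_of_variation(xs):.2f}%")
```

All three means are 100, but the standard deviation is 0 for Dhoni, about 40.82 for Virat and about 137.90 for Harbhajan, which matches the verbal comparison of the three batsmen.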
Dhoni's scores are perfectly represented by the arithmetic mean: none of them deviates from it, so there is no dispersion. For Virat, only one score is perfectly represented by the arithmetic mean; the others vary, but the variation is very small compared with Harbhajan's scores. For Harbhajan, not a single score is represented by the arithmetic mean, and the scores vary widely from one another.

For the next match we would select Dhoni, because his individual scores are very close to his average score, whereas Harbhajan's scores are dispersed or scattered; such data is called varied data. So, in addition to the average, we use measures of dispersion. A measure of variation or dispersion measures the extent to which individual observations differ from some central or average value. In measuring variation, we are interested in the amount or degree of variation, not in its direction.

Absolute Measure and Relative Measure : Measures of dispersion may be either absolute or relative.

Absolute Measure : Absolute measures of dispersion are expressed in the same statistical unit as the original data, such as rupees, kilograms or tonnes. They may be used to compare the variation in two distributions provided the variables are expressed in the same units and are of about the same average size. If the two sets of data are expressed in different units (such as quintals of sugar versus tonnes of sugarcane), or their sizes differ greatly (such as managers' salaries versus workers' salaries), the absolute measures of dispersion are not comparable. In such cases, measures of relative dispersion should be used.

Relative Measure : A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate average.
It is sometimes called a coefficient of dispersion, because 'coefficient' means a pure number that is independent of the unit of measurement. When computing relative dispersion, the average used as the base should be the same one from which the absolute deviations were measured. Relative measures are used for comparing two or more distributions.

Standard Deviation : The standard deviation is the most important and widely used measure of dispersion. The concept was introduced by Karl Pearson in 1893. The standard deviation satisfies most of the properties of a good measure of dispersion. It is the square root of the mean of the squared deviations from the arithmetic mean, and hence is also known as the root mean square deviation. It is denoted by 'S' for a sample and by the Greek letter 'σ' (sigma) for a population. The standard deviation measures absolute dispersion: a large standard deviation means a low degree of uniformity of the observations, and a small standard deviation means a high degree of uniformity. Hence the standard deviation is extremely useful in judging how representative the mean is.

Measures of dispersion (absolute measure and relative measure) :

1. Range
Absolute measure: Range = L − S, where L = largest item and S = smallest item
Relative measure: Coefficient of range = $\dfrac{L - S}{L + S}$

2. Quartile deviation (semi-inter-quartile range)
Absolute measure: Inter-quartile range = $Q_3 - Q_1$; Quartile deviation = $\dfrac{Q_3 - Q_1}{2}$
Relative measure: Coefficient of Q.D. = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

3. Mean deviation
Absolute measure: $\text{M.D.} = \dfrac{\sum |x - \bar{x}|}{N}$ for an individual series; for discrete and continuous series, $\text{M.D.} = \dfrac{\sum f|D|}{N}$, where $D = x - A$ and A is the mean or median about which the deviations are taken
Relative measure: Coefficient of M.D. = $\dfrac{\text{M.D.}}{\text{Mean}}$, or $\dfrac{\text{M.D.}}{\text{Median}}$ when the deviations are taken from the median

4. Standard deviation
Actual mean method: $\sigma = \sqrt{\dfrac{\sum x^2}{N}}$, where $x = X - \bar{X}$
Assumed mean method: $\sigma = \sqrt{\dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2}$, where $d = X - A$
Continuous series: $\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i$, where $d = \dfrac{m - A}{i}$ and i = width of the class
Relative measure: Coefficient of variation, $\text{C.V.} = \dfrac{\sigma}{\bar{x}} \times 100$

Skewness : Skewness refers to a lack of symmetry in the shape of a frequency distribution, and a measure of skewness measures this asymmetry. Measures of skewness indicate both the direction and the extent of skewness. The highest point of a distribution is its mode, which marks the value on the x-axis that occurs most frequently. A distribution is skewed (asymmetrical) if the tail on one side of the mode is fatter or longer than on the other side. The study of skewness is the study of how items are distributed around the central tendency; the distribution of items on either side of the mode helps in deciding the direction of skewness. The concept can be better understood with the help of symmetrical and skewed distributions.

Frequency distributions are classified as follows:
1. Symmetrical : a) bell-shaped / unimodal symmetrical distribution; b) 'U' shaped / bimodal symmetrical distribution.
2. Asymmetrical (non-symmetrical / skewed) : a) positively skewed distribution / skewed to right; b) negatively skewed distribution / skewed to left; c) 'L' shaped positively skewed distribution; d) 'J' shaped negatively skewed distribution.

An example : There are 100 families living in a small town in India. 90 of them have a similar family income, lying around an average of Rs. 1,300. If we drew the distribution of these 90 incomes, the mode would also be around Rs. 1,300, meaning that an income of about Rs. 1,300 occurs most frequently. Now suppose we add 10 more families who are building vacation homes in the town. Their monthly income lies between Rs. 8,000 and Rs. 15,000, which is significantly higher than the town average.
In this case, the distribution of all 100 families turns out to be positively skewed: the mean moves to the right, away from the mode, and now stands at around Rs. 2,200. The distribution is now skewed to the right.

1. Symmetrical Distribution : A frequency distribution is said to be symmetrical if the frequencies are equally distributed on both sides of the central value. A symmetrical distribution may be either bell-shaped or U-shaped. A distribution is symmetrical when its mean, median and mode are identical, i.e. Mean = Median = Mode. In other words, a distribution is symmetrical when the frequencies are symmetrically distributed about the mean, i.e. when values of the variable equidistant from the mean have the same frequency. Consider the following frequency distribution:

x | 1 | 2 | 3 | 4 | 5 | 6 | 7
f | 3 | 5 | 6 | 11 | 6 | 5 | 3

In this case, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{156}{39} = 4$, the median is the 20th observation, i.e. 4, and the mode is 4 (the value with the highest frequency, 11). Hence Mean = Median = Mode = 4.

2. Asymmetrical Distribution : A distribution that is not symmetrical is called asymmetrical or skewed.

a) Positively Skewed Distribution / Skewed to Right : A frequency distribution is said to be positively skewed if the frequencies are concentrated in the lower values and spread out on the higher side of the mode, so that there is a longer tail towards the higher values or right-hand side. In such a case Mean > Median > Mode.

[Fig. 3 : Positively Skewed Distribution]

b) Negatively Skewed Distribution / Skewed to Left : A frequency distribution is said to be negatively skewed if the frequencies are concentrated in the higher values and spread out on the lower side of the mode. As a result, there is a longer tail towards the lower values or left-hand side. In such a case Mean < Median < Mode.

[Fig. 4 : Negatively Skewed Distribution]

c) 'L' Shaped Positively Skewed Distribution : A frequency distribution is said to be an L-shaped positively skewed distribution when the frequencies are highest in the lower values and fall steadily as the values increase.

[Fig. 5 : 'L' Shaped Positively Skewed Distribution]

d) 'J' Shaped Negatively Skewed Distribution : A frequency distribution is said to be a J-shaped negatively skewed distribution when the frequencies are lowest in the lower values and rise steadily as the values increase.

[Fig. 6 : 'J' Shaped Negatively Skewed Distribution]

Definition of Skewness :
1) Skewness refers to the asymmetry or lack of symmetry in the shape of the frequency distribution. – Morris Hamburg.
2) When a series is not symmetrical it is said to be asymmetrical or skewed. – Croxton and Cowden.
3) A distribution is said to be 'skewed' when the mean and the median fall at different points in the distribution and the balance is shifted to one side or the other, to left or right. – Garrett.

Difference between Dispersion and Skewness :

No. | Basis | Dispersion | Skewness
1. | What it shows | The degree of scatter of the items about their central tendency | The degree of imbalance in the distribution of items around the central tendency
2. | Concerned with | The amount of variation | The direction of variation
3. | What it judges | The representativeness of any of the three averages: mean, median and mode | The difference between any two of the three averages: mean, median and mode
4. | Value for a symmetrical distribution | May have any value | Zero
5. | Usefulness | Finding the variability in data | Finding concentration in the higher or lower values

Tests of Skewness : Skewness is present if:
1. Mean ≠ Median ≠ Mode, i.e. the values of the mean, median and mode do not coincide.
2. Q3 − Median ≠ Median − Q1, i.e. the quartiles are not equidistant from the median.
3. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
4. The frequencies on either side of the mode are unequal.
5. The plotted graph has unequal halves, i.e. the graph of the data does not give a normal bell-shaped curve.

Characteristics of a Good Measure of Skewness : A good measure of skewness should have the following characteristics:
1. It should have a value of zero when the distribution is symmetrical.
2. It should be a pure number, i.e. its value should be independent of the units of the series and of the degree of variation in the series.
3. It should have a meaningful scale of measurement, so that the measured value can be easily interpreted.

Interpretation of Skewness : Depending on the value of skewness, the distribution may be symmetrical, positively skewed or negatively skewed.

Value of skewness | Nature of distribution
Sk is positive | Positively skewed distribution
Sk is negative | Negatively skewed distribution
Sk is zero | Symmetrical distribution

Characteristics of Distribution : The following table shows the characteristics of various types of distribution.

Type of distribution | Tail | Mean and mode | Mean and median | Quartiles | Coefficient of skewness
1) a) Symmetrical, bell-shaped | Equal on both sides | Mean = Mode | Mean = Median | (Q3 − M) = (M − Q1) | Zero
1) b) Symmetrical, U-shaped | Equal on both sides | Mean < Mode | Mean = Median | (Q3 − M) = (M − Q1) | Zero
2) a) Moderately positive | Longer tail on right-hand side | Mean > Mode | Mean > Median | (Q3 − M) > (M − Q1) | Positive
2) b) Moderately negative | Longer tail on left-hand side | Mean < Mode | Mean < Median | (Q3 − M) < (M − Q1) | Negative
2) c) Highly positive | L-shaped | Mean > Mode | Mean > Median | (Q3 − M) > (M − Q1) | Positive
2) d) Highly negative | J-shaped | Mean < Mode | Mean < Median | (Q3 − M) < (M − Q1) | Negative

Measures of Skewness : The measures of asymmetry are usually called measures of skewness. They are used to find the extent of skewness as a numerical value, and whether it is positive or negative. These measures can be absolute or relative. Absolute measures of skewness are denoted by Sk.

Absolute Skewness : Based on mean and mode: Sk = Mean − Mode. If the mean is greater than the mode, the skewness is positive; if the mean is less than the mode, the skewness is negative.
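The symmetrical frequency distribution above (x = 1 to 7), together with the first two tests of skewness and the absolute measure Sk = Mean − Mode, can be checked with a short Python sketch. The helper names are hypothetical; the median and quartiles are read off the less-than cumulative frequencies at positions (N + 1)/2, (N + 1)/4 and 3(N + 1)/4, the convention used in the quartile illustrations of this unit.

```python
def discrete_mean(xs, fs):
    """Mean of a discrete series: sum(f * x) / sum(f)."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


def size_of_item(xs, fs, position):
    """Value of the item at a (possibly fractional) position in a discrete
    series: the first x whose less-than cumulative frequency reaches it."""
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if cumulative >= position:
            return x
    raise ValueError("position exceeds N")


def discrete_mode(xs, fs):
    """Value with the highest frequency (assumes a single highest frequency)."""
    return xs[fs.index(max(fs))]


x = [1, 2, 3, 4, 5, 6, 7]
f = [3, 5, 6, 11, 6, 5, 3]
n = sum(f)  # 39

mean = discrete_mean(x, f)
median = size_of_item(x, f, (n + 1) / 2)
mode = discrete_mode(x, f)
q1 = size_of_item(x, f, (n + 1) / 4)
q3 = size_of_item(x, f, 3 * (n + 1) / 4)

print(mean, median, mode)           # 4.0 4 4
print("Sk =", mean - mode)          # Sk = 0.0
print(q3 - median == median - q1)   # True: quartiles equidistant from the median
```

Both tests agree that this distribution has no skewness: the three averages coincide, and Q1 = 3 and Q3 = 5 lie at equal distances from the median 4.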
Absolute measures are expressed in the units of the original data and therefore cannot be used to compare the skewness of two distributions expressed in different units. For comparison, we use relative measures of skewness, known as coefficients of skewness.

Illustration 1 : Calculate absolute skewness based on (i) mean and mode and (ii) quartiles from the following data.

Marks | 5 | 15 | 25 | 35 | 45 | 55
No. of students | 10 | 20 | 30 | 50 | 40 | 30

Solution : Absolute skewness based on mean and mode :

Step 1 : Calculation of mean

Marks (x) | No. of students (f) | fx
5 | 10 | 50
15 | 20 | 300
25 | 30 | 750
35 | 50 | 1750
45 | 40 | 1800
55 | 30 | 1650
Total | Σf = 180 | Σfx = 6300

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{6300}{180} = 35$

Step 2 : Calculation of mode : The maximum frequency is 50, so the mode is the corresponding value, 35.

Step 3 : Absolute skewness : Sk = Mean − Mode = 35 − 35 = 0

Step 4 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.

Relative Measures of Skewness or Coefficient of Skewness : There are four relative measures of skewness:
1. Karl Pearson's coefficient of skewness
2. Bowley's coefficient of skewness
3. Kelly's coefficient of skewness
4. Measure of skewness based on moments

1) Karl Pearson's Coefficient of Skewness : This measure was given by the British statistician Karl Pearson and is also known as Pearson's coefficient of skewness. It is given by the formula:

When the mode is not ill-defined: $Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$

When the mode is ill-defined: $Sk_P = \dfrac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$

In theory there is no numerical limit to this measure, which is a slight drawback. In practical problems, however, the value of SkP is rarely very high and usually lies between −1 and +1.
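The second form of Pearson's coefficient follows from the first when the mode is estimated by the empirical relationship Mode = 3 Median − 2 Mean given earlier in this unit. Substituting that estimate into the numerator gives:

```latex
\text{Mean} - \text{Mode}
  = \text{Mean} - (3\,\text{Median} - 2\,\text{Mean})
  = 3\,\text{Mean} - 3\,\text{Median}
  = 3(\text{Mean} - \text{Median})
\quad\Longrightarrow\quad
Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma} = \frac{3(\text{Mean} - \text{Median})}{\sigma}
```

So whenever the mode is obtained from this relationship, both forms of the coefficient give the same value.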
Illustration 2 : (Individual observations) Calculate Karl Pearson's coefficient of skewness from the following data:

Marks : 5, 15, 25, 35, 45, 55

Solution :

Step 1 : Calculation of mean (assumed mean A = 35)

Marks (x) | d = x − A | d²
5 | −30 | 900
15 | −20 | 400
25 | −10 | 100
35 | 0 | 0
45 | 10 | 100
55 | 20 | 400
Σx = 180 | Σd = −30 | Σd² = 1900

$\bar{x} = \dfrac{\sum x}{N} = \dfrac{180}{6} = 30$

Step 2 : Calculation of median : N = 6 is even, so

$M = \dfrac{\left(\frac{N}{2}\right)^{\text{th}} \text{ obs.} + \left(\frac{N}{2}+1\right)^{\text{th}} \text{ obs.}}{2} = \dfrac{\left(\frac{6}{2}\right)^{\text{th}} \text{ obs.} + \left(\frac{6}{2}+1\right)^{\text{th}} \text{ obs.}}{2} = \dfrac{3^{\text{rd}} \text{ obs.} + 4^{\text{th}} \text{ obs.}}{2} = \dfrac{25 + 35}{2} = \dfrac{60}{2} = 30$

Step 3 : Calculation of mode : No value repeats, so the mode is ill-defined and is found from the empirical relationship:
Mode = 3 Median − 2 Mean = (3 × 30) − (2 × 30) = 90 − 60 = 30

Step 4 : Calculation of standard deviation :

$\sigma = \sqrt{\dfrac{\sum d^2}{N} - \left(\dfrac{\sum d}{N}\right)^2} = \sqrt{\dfrac{1900}{6} - \left(\dfrac{-30}{6}\right)^2} = \sqrt{316.67 - (-5)^2} = \sqrt{316.67 - 25} = \sqrt{291.67} = 17.08$

Step 5 : Calculation of Karl Pearson's coefficient of skewness :

$Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \dfrac{30 - 30}{17.08} = \dfrac{0}{17.08} = 0$

Step 6 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.

Illustration 3 : (Discrete series) Calculate Karl Pearson's coefficient of skewness from the following data:

Marks | 5 | 15 | 25 | 35 | 45 | 55
No. of students | 10 | 20 | 30 | 50 | 40 | 30

Solution :

Step 1 : Calculation of mean (assumed mean A = 35)

Marks (x) | No. of students (f) | fx | d = x − A | d² | fd | fd²
5 | 10 | 50 | −30 | 900 | −300 | 9000
15 | 20 | 300 | −20 | 400 | −400 | 8000
25 | 30 | 750 | −10 | 100 | −300 | 3000
35 | 50 | 1750 | 0 | 0 | 0 | 0
45 | 40 | 1800 | 10 | 100 | 400 | 4000
55 | 30 | 1650 | 20 | 400 | 600 | 12000
Total | Σf = 180 | Σfx = 6300 | | | Σfd = 0 | Σfd² = 36000

$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{6300}{180} = 35$

Step 2 : Calculation of mode : The maximum frequency is 50, so the mode is the corresponding value, 35.

Step 3 : Calculation of standard deviation :

$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = \sqrt{\dfrac{36000}{180} - (0)^2} = \sqrt{200} = 14.14$

Step 4 : Calculation of Karl Pearson's coefficient of skewness :

$Sk_P = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \dfrac{35 - 35}{14.14} = \dfrac{0}{14.14} = 0$

Step 5 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.
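Illustrations 2 and 3 can be reproduced with the Python sketch below. `sd_assumed_mean` and `pearson` are hypothetical helper names; the standard deviation uses the assumed mean method with A = 35, and for individual observations every frequency is taken as 1.

```python
from math import sqrt


def sd_assumed_mean(xs, fs, a):
    """sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), where d = x - A.

    For individual observations pass f = 1 for every item.
    """
    n = sum(fs)
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))
    sum_fd2 = sum(f * (x - a) ** 2 for x, f in zip(xs, fs))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2)


def pearson(mean, mode, sd):
    """Karl Pearson's coefficient of skewness: (Mean - Mode) / S.D."""
    return (mean - mode) / sd


# Illustration 2: individual observations (already in ascending order)
marks = [5, 15, 25, 35, 45, 55]
n = len(marks)
mean2 = sum(marks) / n                               # 180 / 6
median2 = (marks[n // 2 - 1] + marks[n // 2]) / 2    # N even: average of 3rd and 4th items
mode2 = 3 * median2 - 2 * mean2                      # no value repeats: empirical relationship
sd2 = sd_assumed_mean(marks, [1] * n, a=35)
print(round(sd2, 2), pearson(mean2, mode2, sd2))     # 17.08 0.0

# Illustration 3: discrete series
students = [10, 20, 30, 50, 40, 30]
mean3 = sum(x * f for x, f in zip(marks, students)) / sum(students)  # 6300 / 180
mode3 = marks[students.index(max(students))]         # value with the highest frequency
sd3 = sd_assumed_mean(marks, students, a=35)
print(round(sd3, 2), pearson(mean3, mode3, sd3))     # 14.14 0.0
```

The printed standard deviations, 17.08 and 14.14, match Step 4 of Illustration 2 and Step 3 of Illustration 3, and both coefficients are 0.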
Illustration 4 : (Continuous series) Calculate Karl Pearson’s Coefficient of Skewness from the following data :

Marks             0-10   10-20   20-30   30-40   40-50   50-60
No. of students   10     20      30      50      40      30

Solution :

Step 1 : Calculation of Mean (Assumed Mean A = 35, class width i = 10)

Marks   f     m = (UL+LL)/2   fm     d = (m - A)/i   d²   fd    fd²
0-10    10    5               50     -3              9    -30   90
10-20   20    15              300    -2              4    -40   80
20-30   30    25              750    -1              1    -30   30
30-40   50    35              1750   0               0    0     0
40-50   40    45              1800   1               1    40    40
50-60   30    55              1650   2               4    60    120
Total   180                   6300                   19   0     360

$\bar{x} = \frac{\sum fm}{\sum f} = \frac{6300}{180} = 35$

Step 2 : Calculation of Mode : The maximum frequency, 50, corresponds to the class 30-40, so the modal class is 30-40.

$\text{Mode} = l_1 + \frac{d_1}{d_1 + d_2} \times i$

where l1 = lower limit of the modal class = 30, d1 = f1 - f0 = 50 - 30 = 20, d2 = f1 - f2 = 50 - 40 = 10, and i = class width = 10 (f1, f0 and f2 are the frequencies of the modal class, the preceding class and the following class).

$\text{Mode} = 30 + \frac{20}{20 + 10} \times 10 = 30 + \frac{200}{30} = 30 + 6.67 = 36.67$

Step 3 : Calculation of Standard Deviation :

$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i = \sqrt{\frac{360}{180} - \left(\frac{0}{180}\right)^2} \times 10 = \sqrt{2} \times 10 = 1.41 \times 10 = 14.1$

Step 4 : Calculation of Karl Pearson’s Coefficient of Skewness

$SK_p = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{35 - 36.67}{14.1} = \frac{-1.67}{14.1} = -0.118$

Step 5 : Interpretation : The negative value of the coefficient indicates that the distribution is negatively skewed.

2) Bowley’s Coefficient of Skewness
Prof. Bowley’s measure of skewness is based on the quartiles and the median. In a symmetrical distribution, the distance between the first quartile and the median equals the distance between the median and the third quartile, i.e. $Q_3 - Q_2 = Q_2 - Q_1$. In a skewed distribution, however, the quartiles are not equidistant from the median.

Bowley’s Absolute Measure of Skewness (based on quartiles) : Sk = Q3 + Q1 - 2 Median, or Q3 + Q1 - 2Q2

Bowley’s Coefficient of Skewness : $SK_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$

Bowley’s coefficient of skewness is also known as the ‘Quartile coefficient of skewness’.

Properties of Bowley’s Coefficient of Skewness :
1. Bowley’s measure is useful when the distribution has open-end classes or unequal class intervals.
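In such situations, Pearson’s coefficient of skewness cannot be used.
2. The only, and perhaps quite serious, limitation of Bowley’s measure is that it is based on the central 50% of the data and ignores the remaining 50% of the data towards the extremes.
3. Bowley’s measure is based on a continuous frequency distribution with exclusive-type classes, i.e. without any gaps.
4. The value of Bowley’s coefficient of skewness always lies between -1 and +1, because Q1 ≤ Q2 ≤ Q3 keeps the numerator no larger in absolute value than the denominator.
5. It should be clearly understood that the values of the coefficient of skewness obtained by Bowley’s formula and by Pearson’s formula are not comparable, although in each case SK = 0 implies the absence of skewness, i.e. the distribution is symmetrical.

Illustration : Absolute skewness based on Quartiles :

Step 1 : Calculate less than cumulative frequency

Marks (x)   No. of students (f)   Less than c.f.
5           10                    10
15          20                    30
25          30                    60
35          50                    110
45          40                    150
55          30                    180

Step 2 : $Q_1 = \text{Size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{180+1}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{181}{4}\right)^{\text{th}} \text{ item} = \text{Size of } 45.25^{\text{th}} \text{ item}$
The first cumulative frequency that reaches 45.25 is 60, which corresponds to 25. Thus Q1 = 25.

Step 3 : $Q_2 = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{180+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{181}{2}\right)^{\text{th}} \text{ item} = \text{Size of } 90.5^{\text{th}} \text{ item}$
The first cumulative frequency that reaches 90.5 is 110, which corresponds to 35. Thus Q2 = 35.

Step 4 : $Q_3 = \text{Size of } \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{3(180+1)}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{543}{4}\right)^{\text{th}} \text{ item} = \text{Size of } 135.75^{\text{th}} \text{ item}$
The first cumulative frequency that reaches 135.75 is 150, which corresponds to 45. Thus Q3 = 45.

Step 5 : Absolute skewness : Sk = Q3 + Q1 - 2Q2 = 45 + 25 - 2 x 35 = 70 - 70 = 0

Step 6 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.

Illustration : (Relative measurement – Individual Observation) From the following data, calculate Bowley’s Coefficient of Skewness.

2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22

Solution : (n = 11)

Step 1 : $Q_1 = \text{Size of } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item} = \left(\frac{11+1}{4}\right)^{\text{th}} \text{ item} = \left(\frac{12}{4}\right)^{\text{th}} \text{ item} = 3^{\text{rd}} \text{ item}$. Hence, Q1 = 6.

Step 2 : $Q_2 = \text{Size of } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{11+1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{12}{2}\right)^{\text{th}} \text{ item} = 6^{\text{th}} \text{ item}$. Hence, Q2 = 12.

Step 3 : $Q_3 = \text{Size of } \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ item} = \left(\frac{3(11+1)}{4}\right)^{\text{th}} \text{ item} = \left(\frac{36}{4}\right)^{\text{th}} \text{ item} = 9^{\text{th}} \text{ item}$. Hence, Q3 = 18.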
Step 4 : Absolute measure based on Quartiles : Sk = Q3 + Q1 - 2 Median, or Q3 + Q1 - 2Q2

Sk = Q3 + Q1 - 2Q2 = 18 + 6 - 2 x 12 = 24 - 24 = 0

Step 5 : Bowley’s Coefficient of Skewness :

$SK_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{18 + 6 - 2 \times 12}{18 - 6} = \frac{24 - 24}{12} = \frac{0}{12} = 0$

Step 6 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.

Illustration : (Relative measurement – Discrete Series) From the following data, calculate Bowley’s Coefficient of Skewness.

Marks             5    15    25    35    45    55
No. of students   10   20    30    50    40    30

Solution :

Step 1 : Calculate less than cumulative frequency

Marks (x)   No. of students (f)   Less than c.f.
5           10                    10
15          20                    30
25          30                    60
35          50                    110
45          40                    150
55          30                    180

Step 2 : $Q_1 = \text{Size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{181}{4}\right)^{\text{th}} \text{ item} = \text{Size of } 45.25^{\text{th}} \text{ item}$. Thus Q1 = 25.

Step 3 : $Q_2 = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{181}{2}\right)^{\text{th}} \text{ item} = \text{Size of } 90.5^{\text{th}} \text{ item}$. Thus Q2 = 35.

Step 4 : $Q_3 = \text{Size of } \left(\frac{3(N+1)}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{543}{4}\right)^{\text{th}} \text{ item} = \text{Size of } 135.75^{\text{th}} \text{ item}$. Thus Q3 = 45.

Step 5 : Bowley’s Coefficient of Skewness :

$SK_B = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{45 + 25 - 2 \times 35}{45 - 25} = \frac{70 - 70}{20} = \frac{0}{20} = 0$

Step 6 : Interpretation : A zero value of skewness indicates that the distribution is symmetrical.

Illustration : (Relative measurement – Continuous Series) From the following data, calculate Bowley’s Coefficient of Skewness.

Marks             0-10   10-20   20-30   30-40   40-50   50-60
No. of students   10     20      30      50      40      30

Solution :

Step 1 : Calculate less than cumulative frequency

Marks No. of Students
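The continuous-series Karl Pearson calculation (Illustration 4) and the Bowley calculations for the individual and discrete series can be sketched in Python as follows. The quartile positions follow the (N + 1)/4, (N + 1)/2 and 3(N + 1)/4 rule used in the worked examples. For individual observations, the sketch interpolates linearly when a position is fractional; this is an assumption, since the example above only needs whole-number positions. The continuous-series Bowley case, which needs interpolation within the quartile class, is not attempted. All function names are hypothetical.

```python
import math


def pearson_continuous(classes, f):
    """Karl Pearson's coefficient for a continuous series (Illustration 4).

    classes: list of (lower, upper) limits; f: class frequencies.
    Mode = l1 + d1 / (d1 + d2) * i, with d1 = f1 - f0 and d2 = f1 - f2.
    Assumption: a missing neighbouring class (modal class at an end) has f = 0.
    """
    mids = [(lo + up) / 2 for lo, up in classes]
    n = sum(f)
    mean = sum(fi * m for m, fi in zip(mids, f)) / n
    sd = math.sqrt(sum(fi * (m - mean) ** 2 for m, fi in zip(mids, f)) / n)
    k = f.index(max(f))                       # index of the modal class
    f0 = f[k - 1] if k > 0 else 0
    f2 = f[k + 1] if k + 1 < len(f) else 0
    d1, d2 = f[k] - f0, f[k] - f2
    lower, upper = classes[k]
    mode = lower + d1 / (d1 + d2) * (upper - lower)
    return (mean - mode) / sd, mean, mode, sd


def quartile_individual(values, p):
    """Size of the p*(n+1)-th item of the sorted data (p = 0.25, 0.5, 0.75)."""
    xs = sorted(values)
    pos = p * (len(xs) + 1)
    whole = int(pos)
    if whole < 1:
        return xs[0]
    if whole >= len(xs):
        return xs[-1]
    # Assumption: interpolate linearly when the position is fractional.
    return xs[whole - 1] + (pos - whole) * (xs[whole] - xs[whole - 1])


def quartile_discrete(x, f, p):
    """First value whose less-than cumulative frequency reaches p*(N+1)."""
    pos = p * (sum(f) + 1)
    cf = 0
    for xi, fi in zip(x, f):
        cf += fi
        if cf >= pos:
            return xi
    return x[-1]


def bowley(q1, q2, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
marks = [5, 15, 25, 35, 45, 55]
students = [10, 20, 30, 50, 40, 30]

sk4, mean4, mode4, sd4 = pearson_continuous(classes, students)
print(f"Illustration 4: mean={mean4}, mode={mode4:.2f}, sd={sd4:.2f}, SKp={sk4:.3f}")
# Illustration 4: mean=35.0, mode=36.67, sd=14.14, SKp=-0.118

series = list(range(2, 23, 2))                # 2, 4, ..., 22
qi = [quartile_individual(series, p) for p in (0.25, 0.5, 0.75)]
print("Individual:", qi, "SKB =", bowley(*qi))
# Individual: [6.0, 12.0, 18.0] SKB = 0.0

qd = [quartile_discrete(marks, students, p) for p in (0.25, 0.5, 0.75)]
print("Discrete:", qd, "SKB =", bowley(*qd))
# Discrete: [25, 35, 45] SKB = 0.0
```

Working from the unrounded mode and standard deviation gives -0.1179, which agrees with the hand result of -0.118.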
