Normal Distribution PDF
Summary
This document describes the normal distribution, a concept in psychology and quantitative methods. It explores the properties and applications of the normal curve, including deviations from normalcy. The normal distribution is used in many fields to analyze data.
Subject: PSYCHOLOGY
Paper No. and Title: Paper No. 2: QUANTITATIVE METHODS
Module No. and Title: Module No. 7: NORMAL DISTRIBUTION
Module Tag: PSY_P2_M7
PSYCHOLOGY PAPER No.2: QUANTITATIVE METHODS MODULE No.7: NORMAL DISTRIBUTION

TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
3. Properties of normal curve
3.1 Physical Properties
3.2 Statistical properties
4. Deviations from normalcy
5. Assumptions of normalcy
6. Applications of normal distribution
7. Summary

1. Learning Outcomes
After reading this text the reader should be able to:
· Understand the concept of the normal probability curve.
· Explain the nature and properties of the normal probability curve.
· Know the concept of deviations from normalcy in data.
· Understand the assumptions associated with the normal probability curve and its applications in statistics.
· Apply the concept of the normal distribution to research problems.

2. Introduction
If we look around our world, we find that many things are distributed in a fairly predictable fashion. Take the height of the people around us: we usually encounter people of average height (average for the particular group one is observing). It is relatively less common to find people who are much shorter or much taller than average. Similarly, in a large group we are more likely to meet people of average intelligence than geniuses or people who are mentally challenged. The simplest way to understand this pattern is through the elementary concept of probability. The probability of a given event is defined as the expected frequency of occurrence of that event among events of a like sort.
Most empirical data about phenomena measured in the mental and social sciences can be explained by the principles of probability. The probability of occurrence of an event is given by the probability ratio:

Probability ratio of an event = number of desired outcomes ÷ total number of possible outcomes

A probability ratio always falls between 0 and 1, zero meaning that the event cannot occur and one meaning that it is certain to occur. All values in between express intermediate likelihoods of occurrence.

A classic example involves throwing twelve dice a large number of times. On each throw, every die showing 4, 5 or 6 is counted as a success and every die showing 1, 2 or 3 as a failure, and the number of successes is recorded. If the observed frequencies of the possible numbers of successes are plotted as a frequency curve, the resulting graph is a nearly symmetrical curve that rises at the centre and tapers equally on both sides (as seen in figure 1). This is called the normal probability curve. The normal distribution is a mathematical model that describes the frequency of occurrence of such variables with high accuracy. It is widely used in the social sciences, biological statistics, anthropometry and psychological measurement.

Normal Distribution
This module deals with the concept of the normal curve, its properties and the important factors related to it. Quantitative data generally tend to take a symmetrical, bell-shaped form. This tendency may also be stated as a "principle": measurements of many natural phenomena and many mental and social traits, under certain conditions, tend to be distributed symmetrically about their means in proportions that approximate those of the normal probability distribution.
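The twelve-dice example can be reproduced with a short Python sketch. It assumes fair dice and uses the binomial formula for the exact probabilities, a technique the text does not name; the function names `exact_distribution` and `simulate` are hypothetical illustrations. The exact frequencies peak at 6 successes and fall away symmetrically on both sides, which is the shape the normal curve approximates.

```python
from math import comb
import random

N_DICE = 12          # twelve dice per throw, as in the text
P_SUCCESS = 3 / 6    # a face of 4, 5 or 6 counts as a success

def exact_distribution(n=N_DICE, p=P_SUCCESS):
    """Probability of k successes (k = 0..n) in one throw of n dice (binomial formula)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def simulate(throws, n=N_DICE, seed=0):
    """Throw n dice `throws` times and count how often each number of successes occurs."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(throws):
        successes = sum(1 for _ in range(n) if rng.randint(1, 6) >= 4)
        counts[successes] += 1
    return counts

if __name__ == "__main__":
    probs = exact_distribution()
    counts = simulate(4096)
    for k, (p, c) in enumerate(zip(probs, counts)):
        # crude text histogram: one '#' per 10 observed throws
        print(f"{k:2d} successes  p={p:.4f}  observed={c:4d}  {'#' * (c // 10)}")
```

Increasing the number of throws makes the observed counts follow the exact probabilities more closely.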
Theoretically, the normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its centre. From the centre it tapers on both sides, approaching the X-axis asymptotically (that is, it approaches, but never touches, the axis). In theory, the normal distribution ranges from negative infinity to positive infinity. The curve is perfectly symmetrical, with no skewness. Data that are distributed around a central value with no bias towards the right or the left side come close to a normal distribution, as shown in figure 1.

FIG 1: A Normal Distribution

The normal distribution can be completely described by two important descriptive measures:
1) Mean
2) Standard deviation

3. Properties of the Normal Probability Curve (NPC)
3.1 Physical Properties:
· The NPC is a bell-shaped continuous curve. It is sometimes called a "bell curve" because its shape closely resembles a bell.
· The NPC has a single peak: the highest frequency occurs at only one score, so it is a unimodal curve.
· The NPC tapers equally on both sides of its peak; it shows no skewness, or in other words it is a zero-skewed curve.
· The NPC is an open-ended curve, i.e. it does not touch the X-axis on either side. The term used for this property is "asymptotic".
· It is symmetrical about its centre, i.e. about the mean.

3.2 Statistical properties of the NPC
· The total area under the NPC is always one. The area contained between the central point (the mean) and the point one standard deviation away from it is also known. In fact, working in units of standard deviation, we can calculate any area under the curve.
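The claims that the total area is one and that the area between the mean and one standard deviation is a known quantity can be checked numerically. The sketch below assumes the standard normal density $Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$ and integrates it with the trapezoidal rule, a simple numerical method chosen here only for illustration; `normal_pdf` and `area` are hypothetical helper names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at score x (standard normal by default)."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def area(a, b, steps=10_000, mu=0.0, sigma=1.0):
    """Approximate the area under the curve between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

# The tails beyond 8 SD are negligible, so this approximates the total area.
print(f"total area (-8 to +8 SD): {area(-8, 8):.6f}")
print(f"mean to +1 SD:            {area(0, 1):.4f}")
```

The first value is 1 to six decimals, and the second is about 0.3413, the tabled area between the mean and 1 SD. Because `area` accepts `mu` and `sigma`, the same check works on raw scores, e.g. `area(500, 600, mu=500, sigma=100)`.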
· The point about which the curve is symmetrical is the point at which the mean, median and mode all fall. Thus Mean = Median = Mode.
· The normal curve is defined by the function:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$

In which,
Y = height of the curve (the relative frequency) at score X
X = a score of the variable
µ = mean of the distribution
σ = standard deviation of the distribution
π ≈ 3.1416
e ≈ 2.7183

The empirical rule describes how the scores of a normally distributed data set spread around the mean once the mean and standard deviation are specified. The normal curve can be conveniently divided into areas defined in units of standard deviation.
· According to the empirical rule for the normal distribution:
50% of the scores occur above the mean and 50% of the scores occur below the mean.
Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
Approximately 68% of all scores occur within ±1 standard deviation of the mean.
Approximately 95% of all scores occur within ±2 standard deviations of the mean.
Approximately 99.7% (more precisely 99.73%) of all scores occur within ±3 standard deviations of the mean.

One may look at the figures given below to understand the distribution of areas under the NPC. Because the NPC is open-ended and extends infinitely in both directions, the remaining 0.3% of the total area lies beyond 3 standard deviations on either side of the mean. Tables of areas under the NPC are given among the statistical tables in most statistics books.
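The empirical-rule percentages can be computed directly from the cumulative distribution function rather than read from a table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `within_k_sd` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def within_k_sd(k):
    """Proportion of a normal distribution lying between mean - k*SD and mean + k*SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(f"mean to +1 SD: {std_normal.cdf(1) - 0.5:.2%}")
for k in (1, 2, 3):
    inside = within_k_sd(k)
    print(f"within ±{k} SD: {inside:.2%}   beyond: {1 - inside:.2%}")
```

It prints 34.13% for the mean to +1 SD, and 68.27%, 95.45% and 99.73% for ±1, ±2 and ±3 SD, leaving 0.27% in the two tails beyond ±3 SD. The figure of 99.74% found in some texts comes from doubling the rounded table entry 0.4987.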
FIG: Areas under the NPC (68% of values are within 1 standard deviation of the mean, 95% within 2 standard deviations, 99.7% within 3 standard deviations)

· A "Z" score results from converting a raw score into a number indicating how many standard deviation units the raw score lies above or below the mean of the distribution. It uses the mean and standard deviation to describe a particular score. In a NPC the standard score Z is calculated using the formula:

$$Z = \frac{X - \mu}{\sigma}$$

Where,
Z is the "z-score" or standard score
X is the value to be standardized
µ is the mean of the distribution
σ is the standard deviation of the distribution.

In essence, a z score equals the difference between a particular raw score and the mean, divided by the standard deviation. Normal curve area tables show the proportion (or percentage) of the area between the mean and each z score, as well as the area in the tail of the NPC beyond it. Readers are advised to look at these tables in any quantitative methods or statistics book.

4. Deviations from normalcy in data
Not all variables are normally distributed, and empirical data also deviate from normalcy. There are two main ways in which a distribution can deviate from normal: (i) lack of symmetry (called skew) and (ii) pointiness (called kurtosis). The deviations from normalcy can be measured by:
· Skewness: although the NPC is spread symmetrically around its mean, in some distributions the spread is not equal on both sides of the mean; this is skewness. In the NPC the mean, median and mode coincide, so that the left and right sides of the curve are balanced.
If these three statistics do not coincide in a distribution, the symmetry is disturbed and the distribution appears to shift either to the left or to the right (look at the figure). In a perfectly normal distribution the skewness is zero. If the scores are massed at the lower end, the curve spreads out gradually towards the right side; this is called positive skewness. Negative skewness occurs when the scores are clustered towards the higher end of the distribution, so that the curve spreads out more towards the left. The index of skewness is given by the formula:

Skewness (Sk) = 3(Mean − Median) ÷ Standard deviation

More precise measures of skewness are available; interested readers may refer to advanced books on mathematical statistics.
A distribution has a positive skew when relatively few of the scores fall at the high end of the distribution. In positive skewness the curve tapers to the right, with the scores loaded at the left. Check the figures given below.
A distribution has a negative skew when relatively few of the scores fall at the low end of the distribution. Negative skewness is characterized by the curve tapering to the left while the scores are loaded at the right. See the figure given below.
· Kurtosis: Distributions also vary in their kurtosis. Kurtosis refers to the degree to which scores cluster at the ends of the distribution (known as the tails) and how pointy a distribution is (although other factors can also affect how pointy the distribution looks). Thus, kurtosis refers to the degree of flatness or peakedness of a distribution.
The NPC is a mesokurtic curve. Distributions are generally described as platykurtic (relatively flat), leptokurtic (relatively peaked), or somewhere in between, i.e. mesokurtic. A distribution with positive kurtosis has many scores in the tails (a so-called heavy-tailed distribution) and is pointy; this is known as a leptokurtic distribution. In contrast, a distribution with negative kurtosis is relatively thin in the tails (has light tails) and tends to be flatter than normal; this is called a platykurtic distribution. Ideally, we want our data to be normally distributed (i.e., not too skewed, and with neither too many nor too few scores at the extremes). The index of kurtosis is given by the formula:

Kurtosis (Ku) = Q ÷ (P90 − P10)

where Q is the quartile deviation, (Q3 − Q1) ÷ 2, and P90 and P10 are the 90th and 10th percentiles. For the NPC, Ku ≈ 0.263; larger values indicate a platykurtic distribution and smaller values a leptokurtic one.

6. Applications of normal distribution
The NPC has widespread applications in the field of psychology. Let us work through a few numerical examples to understand its applications.

Q1. If scores are normally distributed with a mean of 500 and a standard deviation of 100, what proportion of the scores fall
a) Above the score of 550
b) Below the score of 420
c) Between the scores of 550 and 750
(Source: Statistics in Psychology and Education by Minium, King and Bear, third edition)

Answer.
a) Above 550
We are given that µ = 500 and σ = 100, and in this case X = 550. Putting these values into the formula, we get
$z = \frac{X - \mu}{\sigma} = \frac{550 - 500}{100} = 0.5$
Since we need the proportion of scores above z = 0.5, we look in the table of the normal distribution under the column "area beyond z". Therefore, a proportion of 0.3085 of the scores falls above 550.
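The table lookup in part (a) can be reproduced in code. This is a sketch assuming Python 3.8+ for `statistics.NormalDist`; `z_score` and `area_beyond` are hypothetical helper names, and `area_beyond` mimics the "area beyond z" column of a normal-curve table (upper tail for positive z, lower tail for negative z).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard score: how many SDs the raw score x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def area_beyond(z):
    """Tail area beyond z, as in the 'area beyond z' column of a normal-curve table."""
    std_normal = NormalDist()
    return 1 - std_normal.cdf(z) if z >= 0 else std_normal.cdf(z)

mu, sigma = 500, 100          # Q1: mean 500, SD 100
z = z_score(550, mu, sigma)   # part (a)
print(f"z = {z:.2f}, proportion above 550 = {area_beyond(z):.4f}")
```

It prints z = 0.50 and a proportion of 0.3085. The same helpers cover parts (b) and (c): `area_beyond(z_score(420, 500, 100))` gives about 0.2119, and `area_beyond(0.5) - area_beyond(2.5)` gives about 0.3023.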
b) Below 420
We are given that µ = 500 and σ = 100, and in this case X = 420. Putting these values into the formula, we get
$z = \frac{420 - 500}{100} = -0.8$
Since we need the proportion of scores below z = −0.8, we look in the table of the normal distribution under the column "area beyond z". Therefore, a proportion of 0.2119 of the scores falls below 420.

c) Between 550 and 750
In this case we calculate the z value and the area beyond it for both scores.
Firstly, we calculate the z value for X = 550. We are given that µ = 500 and σ = 100, and in this case X = 550. Putting these values into the formula, we get
$z = \frac{550 - 500}{100} = 0.5$
From the table of areas under the normal curve, the area beyond this score is 0.3085.
Secondly, we calculate the z value for X = 750. We are given that µ = 500 and σ = 100, and in this case X = 750. Putting these values into the formula, we get
$z = \frac{750 - 500}{100} = 2.5$
From the table of areas under the normal curve, the area beyond this score is 0.0062.
Now, to find the area between 550 and 750, we simply subtract the two areas:
Area between 550 and 750 = 0.3085 − 0.0062 = 0.3023
So the required proportion is 0.3023.

Q2. At a state university, all new freshmen are given a mathematical ability exam during the registration month. Scores are normally distributed with a mean of 70 and an S.D. of 10. The university decides to place the top 25 percent into a mathematics program and the bottom 20 percent into a simple mathematical awareness program. What are the cut-off scores for these two groups?
Answer: We are given that µ = 70 and σ = 10.
The top 25 percent corresponds to an area of 0.25 beyond a particular z score. Looking in the table of areas under the normal curve, we find that the corresponding z score is z = 0.67.
Now we put this value into the formula:
$0.67 = \frac{X - 70}{10}$, so X = 70 + 0.67 × 10 = 76.7
Therefore, a score of 76.7 is the cut-off for the upper 25 percent, and students scoring above it should be eligible for the mathematics program.

Now, for the bottom 20 percent: this corresponds to an area of 0.20 beyond a particular z score. Looking in the table of areas under the normal curve, we find that the corresponding z score is z = −0.84 (the score is negative because the bottom 20 percent falls on the left side of the normal curve, the side of negative z scores).
Now we put this value into the formula:
$-0.84 = \frac{X - 70}{10}$, so X = 70 − 0.84 × 10 = 61.6
Therefore, a score of 61.6 is the highest score of the bottom 20 percent, and students scoring below it should be eligible for the simple mathematical awareness program.

One of the most important applications of normality is in error analysis. We assume that random errors occur in the measurement of a variable. These errors make observed scores deviate from the true scores in both the positive and the negative direction, and their frequency of occurrence is close to a normal distribution. Inferential statistics makes ample use of this assumption, and many powerful statistical tests are based on it. However, if the error distribution is not normal but is assumed to be, the statistical analysis can be seriously wrong. There are statistical tests that a researcher can perform to find out whether the normality assumptions are valid. The assumptions of normality hold in most cases, but when they do not, they can produce serious errors. Therefore the researcher should be fully aware of the assumptions of normalcy while conducting research.
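The reverse lookups in Q2, from an area to a score, correspond to the inverse of the cumulative distribution. The sketch below assumes Python 3.8+ and uses `NormalDist.inv_cdf`, which returns the score below which a given proportion of the distribution falls; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)   # Q2: mathematics exam, mean 70, SD 10

# Top 25 percent: the score with 75% of the distribution below it.
top_cutoff = scores.inv_cdf(0.75)
# Bottom 20 percent: the score with 20% of the distribution below it.
bottom_cutoff = scores.inv_cdf(0.20)

def z_of(x):
    """z score of x in this distribution."""
    return (x - scores.mean) / scores.stdev

print(f"top 25% cut-off:    {top_cutoff:.2f}  (z = {z_of(top_cutoff):.4f})")
print(f"bottom 20% cut-off: {bottom_cutoff:.2f}  (z = {z_of(bottom_cutoff):.4f})")
```

The cut-offs come out at about 76.74 and 61.58, with z ≈ 0.6745 and −0.8416. They round to the 76.7 and 61.6 obtained in Q2; the small differences arise because table z values are rounded to two decimals.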
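As a rough descriptive screen before relying on the normality assumption, the indices of skewness and kurtosis given earlier under deviations from normalcy can be computed for a sample. This is a sketch, not one of the formal tests of normality mentioned above; it assumes Python 3.8+ (`statistics.quantiles`, `NormalDist.samples`), and the helper names and the two simulated samples are hypothetical. For the NPC itself, Sk = 0 and Ku = 0.6745 ÷ (2 × 1.2816) ≈ 0.263.

```python
import random
import statistics
from statistics import NormalDist

NORMAL_KU = 0.263  # percentile kurtosis of the normal curve: 0.6745 / (2 * 1.2816)

def pearson_skewness(data):
    """Sk = 3(Mean - Median) / SD; close to 0 for a symmetrical distribution."""
    sd = statistics.stdev(data)
    return 3 * (statistics.mean(data) - statistics.median(data)) / sd

def percentile_kurtosis(data):
    """Ku = Q / (P90 - P10), where Q = (Q3 - Q1) / 2 is the quartile deviation."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    deciles = statistics.quantiles(data, n=10)   # P10, P20, ..., P90
    p10, p90 = deciles[0], deciles[-1]
    return ((q3 - q1) / 2) / (p90 - p10)

# Hypothetical samples: one drawn from a normal curve, one positively skewed.
normal_sample = NormalDist(mu=50, sigma=10).samples(5000, seed=1)
rng = random.Random(1)
skewed_sample = [rng.expovariate(1.0) for _ in range(5000)]

for name, data in (("normal", normal_sample), ("skewed", skewed_sample)):
    print(f"{name:7s} Sk = {pearson_skewness(data):+.3f}   "
          f"Ku = {percentile_kurtosis(data):.3f} (normal: {NORMAL_KU})")
```

For the normal sample Sk is near 0 and Ku near 0.263, while the exponential sample gives a clearly positive Sk, near its theoretical value of about 0.92. Sampling fluctuation means the indices never match the NPC values exactly, so they only indicate how far the data depart from normalcy.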
· The normal distribution has applications in business administration.
· It has applications in operations management.
· It has applications in human resource management, as employee performance is considered to be normally distributed.

7. Summary
We summarize that:
· A normal curve is one in which the data are distributed around a central value with no bias towards the right or the left side.
· The normal distribution can be completely described by two parameters: the MEAN and the STANDARD DEVIATION.
· The normal curve is a unimodal, continuous, bell-shaped curve.
· The area under the normal curve is 1.
· In a normal distribution, MEAN = MEDIAN = MODE.
· In a normal distribution, 68% of values are within 1 standard deviation of the mean.
· 95% of values are within 2 standard deviations of the mean.
· 99.7% of values are within 3 standard deviations of the mean.
· The equation for the normal distribution is $Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$, where µ is the mean and σ is the standard deviation.
· Error analysis is one of the most important applications of the normal distribution.
· The normal distribution has applications in various fields such as business administration, operations management and human resource management.