Summary

This document discusses different types of norms used in psychological testing, focusing on developmental norms and mental age. It explains how these norms are established and used to interpret test results. The text is relevant to the understanding of psychological measurement techniques.

Full Transcript

Psychological Testing and Measurement (PSY-P631) VU
Lesson 07
Types of Norms

As discussed previously, raw test scores need to be turned into a more meaningful form. They are therefore converted into relative measures, or derived scores. Derived scores not only indicate the relative standing of anyone who takes the test; they also provide comparable measures, so that a person’s performance on different tests can be compared. Derived scores are expressed either in terms of the developmental level attained or in terms of the relative position of a person (or his score) within a specified group.

Developmental Norms: Test scores can be expressed in terms of developmental norms. Developmental norms can be defined as the typical patterns, characteristics, and age-specific tasks or skills found at any age or stage of development. They are established with development and maturation in view. The underlying assumption is that people, children and adults alike, are capable of performing at specific levels at different stages of life. When most people can perform certain tasks at a certain age level, that performance is taken as the norm for that age level, and it also defines the mental age of persons at that level. For example, if a 13 year old girl accurately completes the tasks on a test that most 13 year olds can do, her mental age will be stated as 13 (MA = 13). If an adult can perform only the tasks that a six year old can do, his MA will be 6; given his chronological age, his mental development falls far below the norm. The MA of a 9 year old who can perform tasks meant for a 16 year old is interpreted in the same manner, but as a deviation in the positive direction: his mental age is higher than his chronological age. Similarly, we can say that a fifth-grade child has seventh-grade ability in a specific mathematical skill if he can solve most of the problems that a seventh grader can solve.
Developmental norms may also be based on highly qualitative descriptions of behavior in specific functions, such as sensorimotor activities or concept formation. This is one way of comparing a person’s performance with the norms, but the approach is not as easy to apply as it may seem. People often take tests that measure different abilities, and even the subtests of a single test may measure a variety of skills. In such cases a person will not necessarily attain the same MA on all tests or subtests, which makes comparison in terms of developmental norms difficult. Although developmental norms are popular, test scores based on them are not psychometrically sound. They are nevertheless commonly used for descriptive purposes, especially in clinical and research settings.

Mental Age: The term mental age came into wide use after the development of the Binet-Simon scales. Binet himself used the more neutral term “mental level” in his own scale. In Binet’s scale, items were grouped into year levels: items passed by the majority of 8 year old children in the standardization sample were placed in the 8-year level, items passed by the majority of 10 year old children were placed in the 10-year level, and so on. The Stanford-Binet and other similar scales are age scales. As the scale came into frequent use, the problem of “scatter” of scores was observed: many subjects did not perform uniformly across all subtests. Some individuals failed tests below their age level while passing tests above their age level. To overcome this problem the concept of “basal age” was introduced. Basal age is the highest year level at and below which a person passes all items. For all the tests passed at higher year levels, the subject was given partial credits in months.
These credits were then added to the basal age to yield the child’s mental age.

Mental age norms are also used with tests that are not organized into year levels. In these tests, the mean raw scores obtained by children of each age group in the standardization sample serve as the norms for the corresponding ages. The mental age of a child is determined by comparing her raw score with the available age norms. For example, if the raw score of a 12 year old girl equals the 12-year norm, she is said to have a mental age of 12 years.

©copyright Virtual University of Pakistan

One major shortcoming of mental age as an indicator of intellectual ability is that it does not mean the same thing at different stages of life. An MA of 4 in a 5 year old is not equivalent to an MA of 24 in a 25 year old, because the unit of MA tends to shrink as age advances. According to Anastasi and Urbina (2007), a child who has a mental age of 3 at the age of 4 (one year behind) would be about 3 years behind at the age of 12: one year of mental growth from age 3 to 4 is equivalent to three years of mental growth from age 9 to 12. A positive or negative deviation from the norm therefore does not mean the same thing at different ages. A deviation of a given size matters far more at a very young age than at an older age.

Grade Equivalents: Grade equivalents express scores on educational achievement tests in terms of the grade whose average performance they match. These norms are obtained by calculating the mean raw score of the children in each grade of the standardization sample. If the 6th grade children in the standardization sample obtained a mean score of 35 on an arithmetic test, then a raw score of 35 has a grade equivalent of 6, and any student obtaining 35 on the same test is said to have a grade equivalent of 6. Most schools have an academic year spanning ten months.
A whole year is represented by the corresponding grade number. However, testing may take place some months after the grade has started, and in such cases the successive months are expressed as decimals. For example, a grade equivalent of 7.0 refers to the average performance of 7th graders at the beginning of the session, 7.5 represents the average at the middle of the session, and so on.

Grade norms have several limitations. According to Anastasi and Urbina (2007), grade units are unequal, and these inequalities occur irregularly across subject areas. Grade norms apply only to the common subjects taught throughout the grade levels covered by the test, not to subjects taught for only one or two years in high school or college. Even when the same courses are covered by the test, there is no guarantee that they receive identical emphasis, attention, and learning in all grades. A further complication is that a child may progress more rapidly in one subject than in another during the same grade. Grade norms also tend to be incorrectly regarded as performance standards. A 6th grade teacher may, for instance, assume that all students in the class will score at or near the 6th grade norm on achievement tests. In fact, individual differences within any grade can be so large that achievement test scores vary over several grade levels.

Ordinal Scales: Another approach to developmental norms comes from research in child psychology. Psychologists studying development in infants and young children described the behavioral functions typical of successive ages. These behaviors included functions such as sensory discrimination, linguistic communication, and concept formation. These empirical observations proved valuable for the understanding of human development.
An example of this research is the work of Gesell and his associates, whose main emphasis was on the sequential patterning of early behavior development. The Gesell Developmental Schedules were developed to show the approximate developmental level, in months, that a child has attained in each of four major areas of behavior: motor, adaptive, language, and personal-social. The developmental level of a child is determined by comparing his behavior with the behavior typical of eight key ages, ranging from 4 weeks to 36 months. Gesell and his associates claimed that children’s development involves a) an orderly progression of behavior changes and b) uniformities of developmental sequence. For example, a chronological sequence can be observed in visual fixation and in hand and finger movements when infants react to a small object placed in front of them. The way the palm, thumb, and fingers move, and the manner in which they are used, varies from one stage of development to the next.

The notion underlying this approach is that developmental stages follow a fixed sequential order. The scales used to measure them are therefore ordinal scales, which yield information about the stage a child has reached. Their use also rests on the understanding that successful performance at one level implies success at all lower levels.

Jean Piaget’s extensive work in child and developmental psychology received wide attention from the 1960s onward. In his theory of cognitive development he described the stages of cognitive development as falling in a sequence, and regarded the age levels for these stages as arbitrary. His theory covered ages from infancy to the mid-teens.
Rather than broad abilities, he was interested in studying specific concepts, such as object permanence, conservation, and perspective. With object permanence, the child is aware that objects continue to exist when they are out of sight. Conservation is the recognition that an attribute remains constant over changes in perceptual appearance; for example, the quantity of a liquid remains the same when it is poured into a container of a different shape. Perspective is the understanding that an object looks different when it is seen from a distance or from a different angle. Piagetian tasks are used to assess cognitive development. They are designed to reveal the dominant aspect of each developmental stage. In short, ordinal scales gauge the uniform progression of development through successive stages by measuring the attainment of specific functions.

Within-Group Norms: Within-group norms evaluate a person’s performance against the most nearly comparable standardization group; for example, a child’s raw score is compared with the scores of children of the same age or grade. These norms are so popular that nearly all standardized tests now provide within-group norms in some form. Because within-group scores have a clearly defined quantitative meaning, they can be used in most types of statistical analysis.

Percentiles: “A percentile indicates the individual’s relative position in the standardization sample”. Percentile scores are expressed in terms of the percentage of persons in the standardization sample who fall below a given raw score. For example, if 50% of the people scored below 25 on an analytical reasoning test, then a raw score of 25 corresponds to the 50th percentile. Percentiles can also be regarded as ranks in a group of 100, with one difference: in ranking it is customary to count from the top, so the best person in the group receives a rank of one, whereas percentiles are counted from the bottom, so the lower the percentile, the poorer the person’s standing.
The 50th percentile corresponds to the median. If a raw score of 50 falls at the 50th percentile, then a score above 50 is above average and a score below 50 is below average. The 25th and 75th percentiles are known as the first and third quartile points (Q1 and Q3) because they cut off the lowest and highest quarters of the distribution. Percentiles should not be confused with percentage scores: a percentage score is a raw score (the percentage of items answered correctly), whereas a percentile is a derived score expressed in terms of the percentage of persons.

Standard Scores: Standard scores express the individual’s distance from the mean in terms of the standard deviation of the distribution. They can be obtained by either linear or non-linear transformation of the raw scores. Linearly derived standard scores are also known as “z-scores”. To compute a z-score, the mean of the normative sample is subtracted from the raw score, and the difference is divided by the standard deviation of that sample.

Computation of Standard Scores:
z = (X - M) / SD
where X = 100, M = 80 and SD = 10. Putting the given values into the formula:
z = (100 - 80) / 10
z = 2

Any raw score equal to the mean yields a z-score of zero. A negative z-score indicates that a person’s score is below average; a positive z-score indicates that it is above average. Most modern tests use standard scores, and their results are interpreted with reference to standard scores.
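As a minimal sketch of the two derived scores described above, the Python snippet below computes a percentile rank (taken here as the percentage of the standardization sample scoring below a raw score) and a z-score. The function names and the small norm sample are hypothetical and serve only as illustration; the z-score call reuses the values from the worked example (X = 100, M = 80, SD = 10).

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm sample scoring below raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    return 100.0 * below / len(norm_scores)


def z_score(raw_score, mean, sd):
    """Distance of raw_score from the mean, expressed in SD units."""
    return (raw_score - mean) / sd


# Hypothetical norm sample, invented for illustration only.
norm_sample = [10, 15, 20, 22, 25, 28, 30, 35, 40, 45]

print(percentile_rank(28, norm_sample))  # 5 of 10 scores fall below 28 -> 50.0
print(z_score(100, 80, 10))              # (100 - 80) / 10 -> 2.0
print(z_score(80, 80, 10))               # a score equal to the mean -> 0.0
```

Published tests may treat scores tied with the raw score differently when computing percentile ranks; the "strictly below" rule used here is one simple convention, not the only one.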
Normal Standard Scores: These are standard scores expressed in terms of a distribution that has been transformed to fit a normal curve. They are obtained by first finding the percentage of persons in the standardization sample who fall at or above each raw score. This percentage is then located in the normal curve frequency table, and the corresponding standard score is read off. A normalized standard score can also be put into any convenient form. If it is multiplied by 10 and the result is added to or subtracted from 50, it is converted into a T score, a scale first proposed by McCall (1922). On this scale, a score of 50 corresponds to the mean, a score of 60 to 1 SD above the mean, and so on. Normalized standard scores should be used only when the sample is large and representative, and when there is reason to believe that the deviation from normality results from some shortcoming of the test rather than from the characteristics of the sample.

Another variation of this kind of transformation is the stanine scale, developed by the United States Air Force during the Second World War. The name stanine comes from the words “standard nine”, reflecting the fact that scores run from one to nine. It is a single-digit system with a mean of 5 and a standard deviation of approximately 2.

The Deviation IQ: The term IQ (Intelligence Quotient) was introduced with the early intelligence tests. The ratio IQ is obtained by dividing the MA by the chronological age (CA) and multiplying by 100:
IQ = (MA / CA) × 100
If the child’s MA equals the CA, the child’s IQ will be exactly 100. An IQ of 100 represents average, or mean, performance. An IQ below 100 indicates below-average performance (retardation), and an IQ above 100 indicates above-average performance (acceleration). However, the ratio IQ has been shown to have some major technical problems.
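The two conversions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the value passed to the T-score function is assumed to be an already normalized standard score, and MA and CA are assumed to be expressed in the same unit (for example, years).

```python
def t_score(z):
    """Convert a normalized standard score to the T scale (mean 50, SD 10)."""
    return 50 + 10 * z


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = (MA / CA) * 100; both ages must use the same unit."""
    return mental_age / chronological_age * 100


print(t_score(0.0))      # a score at the mean -> 50.0
print(t_score(1.0))      # 1 SD above the mean -> 60.0
print(ratio_iq(13, 13))  # MA equal to CA -> 100.0
print(ratio_iq(9, 12))   # MA of 9 at CA of 12 -> 75.0
```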
The problem with the ratio IQ is that IQs at different age levels are not comparable unless the SD of the IQ distribution remains constant with age. Ratio IQs can also take extreme values: if a child whose chronological age is 3 can already read, and the average child starts reading at the age of 6 (which is then taken as his mental age), his IQ will be scored as 200. For these reasons the ratio IQ has been replaced by the so-called deviation IQ. The deviation IQ is a standard score with a mean of 100 and an SD that approximates the SD of the Stanford-Binet IQ distribution. It compares people of the same age and assumes that the IQs of individuals are normally distributed.

Relativity of Norms: Intertest Comparisons: A person’s IQ should always be reported together with the name of the test on which it was obtained. For example, a statement that one person has an IQ of 110 and another an IQ of 90 cannot be accepted without further information, because the relative standing of the two persons could change if different tests were used. An individual’s relative standing in different functions may also be misrepresented when the test norms are not comparable. Suppose an individual has been given a verbal comprehension test and a spatial aptitude test to determine his or her relative standing in the two fields. If the verbal test was standardized on a random sample of high school students, while the spatial test was standardized on a selected group of students attending elective courses, the examiner might erroneously conclude that the individual is much more able in verbal than in spatial ability, when the reverse may actually be the case. Longitudinal comparisons involve an individual’s test scores obtained over time. For example, if a child’s scores are 110, 115, and 120 in the fourth, fifth, and sixth grades, the differences may simply reflect the use of different tests on the three occasions.
There are three reasons for such variations among the scores that the same individual obtains on different tests.
1. Intelligence tests carrying the same label can differ in content. One test may include only verbal content, another may include numerical content, and so on.
2. The scale units may not be comparable, e.g. the IQ on one test may have an SD of 12 while the IQ on another test has an SD of 18.
3. The composition of the standardization samples used in establishing norms may vary from test to test. The same individual will appear to have performed better when compared with a less able group than when compared with a more able group.

Scales of Measurement: In order to describe test scores in quantitative form, tests have to be designed so that they yield results in numeric form. The results should either be obtained as numbers in the first place or allow conversion into numbers. Psychological measurement involves rules according to which objects are assigned numbers, so that “quality” is expressed in numeric form. For example, an item in a personality test asks “do you like to be in the company of young age mates most of the time?” The answer is given in terms of degrees, e.g. “always”, “often”, “could not say”, “rarely” and “never”, and the subject chooses the one option that best describes her. The response is in qualitative form, and in this form comparison with others is not possible. A number is therefore assigned to each option on a scale ranging from one to five: “never” is allocated 1 and “always” is assigned 5. All the responses of the subject can now be quantified, and these quantities can be subjected to statistical treatment. In tests where every question has a right answer, as in ability tests, the total number of correct responses is counted, and that count is the test score.
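The two scoring rules just described can be sketched in Python as follows. This is an illustrative sketch, not a scoring key from any actual test: the function names, the sample responses, and the answer key are hypothetical, and the values 2 to 4 for the three middle options are an assumption based on their order (the text fixes only “never” = 1 and “always” = 5).

```python
# Numeric values for the response options. The text specifies only
# "never" = 1 and "always" = 5; the middle values are assumed from order.
OPTION_VALUES = {
    "never": 1,
    "rarely": 2,
    "could not say": 3,
    "often": 4,
    "always": 5,
}


def score_rating_items(responses):
    """Sum the numeric values assigned to a subject's chosen options."""
    return sum(OPTION_VALUES[r] for r in responses)


def score_ability_test(answers, answer_key):
    """Count the responses that match the answer key."""
    return sum(1 for given, correct in zip(answers, answer_key) if given == correct)


# Hypothetical responses, invented for illustration only.
print(score_rating_items(["always", "often", "never"]))                # 5 + 4 + 1 -> 10
print(score_ability_test(["b", "c", "a", "d"], ["b", "a", "a", "d"]))  # 3 of 4 correct -> 3
```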
