Analysis and Interpretation of Test Scores

Topic 10 Analysis and Interpretation of Test Scores

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Compute various central tendency measures;
2. Explain the use of standard scores;
3. Compute the z-score and T-score;
4. Describe the characteristics of a normal curve; and
5. Explain the role of norms in standardised tests.

INTRODUCTION
All the data you have collected on the performance of students will have to be analysed. In this topic we will focus on the analysis and interpretation of the data you have collected about the knowledge, skills and attitudes of your students. You can analyse and interpret this information quantitatively and qualitatively. Quantitative analysis relies on various statistical tools, which are the focus of this topic. For example, statistics are used to show the distribution of scores on a Geography test and the average score obtained by students.

Copyright © Open University Malaysia (OUM)

10.1 WHY DO WE USE STATISTICS?
When you administer a Geography test to your class of 40 students at the end of the semester, you will obtain a score for each student, which is a measurement of a sample of the student's ability. The behaviour tested could be the ability to solve problems such as reading maps, using the globe and interpreting graphs. For example, student A gets a score of 64 while student B gets 32. Does this mean that the ability of student A is better than that of student B? Does it mean that the ability of student A is twice the ability of student B? Are the scores 64 and 32 in percentages? These scores or marks are difficult to interpret because they are raw scores. Raw scores can be confusing if there is no reference made to a "unit". It is only logical that you convert the scores to a unit such as percentages. In this example, you get 64 per cent and 32 per cent.
Even the use of percentages may not be meaningful. For example, getting 64 per cent in the test may be considered "good" if the test was difficult. On the other hand, if the test was easy, then 64 per cent may be considered only "average". In other words, to get a more accurate picture of the scores obtained by students in the test, the teacher should:
(a) Find out which student obtained the highest marks in the class and the number of questions correctly answered;
(b) Find out which student obtained the lowest marks in the class and the number of questions correctly answered; and
(c) Find out the number of questions correctly answered by all students in the class.

This illustrates that the marks obtained by students in a test should be carefully examined. It is not sufficient to just report the marks obtained. More information should be given about the marks, and to do this you have to rely on statistics. Some teachers may be afraid of statistics while others may regard it as too time consuming. In fact, many of us often use statistics without being aware of it. For example, when we talk about average rainfall, per capita income, interest rates and percentage increases in our daily lives, we are speaking the language of statistics. What is statistics?

Statistics is a mathematical science pertaining to the analysis, interpretation and presentation of data. It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities. Statistics is widely used by researchers in education and by classroom teachers. In applying statistics in education, one begins with a population to be studied. Examples include all Form Two students in Malaysia (about 450,000 students) or all secondary school teachers in the country.
For practical reasons, rather than compiling data about an entire population, we usually select or draw a subset of the population called a sample. In other words, the 40 Form Two students that you teach are a sample of the population of Form Two students in the country. The data you collect about the students in your class can be subjected to statistical analysis, which serves two related purposes, namely description and inference.

(a) Descriptive Statistics
You use descriptive statistical techniques to describe how your students performed. These techniques summarise data in a useful way, either numerically or graphically. The aim is to present the data collected so that it can be understood by teachers, school administrators, parents, the community and the Ministry of Education. The common descriptive techniques used are the mean (or average) and the standard deviation. Data may also be presented graphically using various kinds of charts and graphs.

(b) Inferential Statistics
You use inferential statistical techniques when you want to infer something about the population based on your sample. You use inferential statistics when you want to find out the differences between groups of students or the relationship between variables, or when you want to make predictions about student performance. For example, you may want to find out whether males did better than females or whether there is a relationship between performance in coursework and the final examination. The inferential statistics often used are the t-test, ANOVA and linear regression.

10.2 DESCRIBING TEST SCORES
Let us assume that you have just given a test on Bahasa Malaysia to a class of 35 Form One students. After marking the scripts, you have a score for each of the students in the class and you want to find out more about how your students performed.
Figure 10.1 shows the distribution of the scores obtained by students in the test.

[Figure 10.1: The distribution of Bahasa Malaysia marks]

The "frequency" column indicates how many students obtained each mark shown, and the "percentage" column shows the corresponding percentage of students. You can describe these scores using two types of measures, namely central tendency and dispersion.

(a) Central Tendency
The term "central tendency" refers to the "middle" value and is measured using the mean, median and mode. It is an indication of the location of the scores. Each of the three measures is calculated differently. Which one to use will depend upon the situation and what you want to show.

(i) Mean
The mean is the most commonly used measure of central tendency. When we talk about an "average", we usually refer to the mean. The mean is simply the sum of all the values (marks) divided by the total number of items (students) in the set. The result is referred to as the arithmetic mean. Using the data from Figure 10.1 and applying the following formula, you can calculate the mean.

$$\bar{x} = \frac{\sum x}{N} = \frac{35 + 41 + 42 + \dots + 75}{35} = 53.22$$

(ii) Median
The median is determined by sorting the scores from the lowest to the highest value and taking the score that is in the middle of the sequence. For the example in Figure 10.1, the median is 52. There are 17 students with scores less than 52 and 17 students whose scores are greater than 52. If there is an even number of students, there will not be a single point at the middle. In this case, you calculate the median by taking the mean of the two middle points, that is, by dividing the sum of the two scores by 2.

(iii) Mode
The mode is the most frequently occurring score in the data set. Which score appears most often in your data set? In Figure 10.1, the mode is 57 because 7 students obtained that score.
However, you can also have more than one mode. A distribution with two modes is called bimodal.

The distribution of the scores may be graphed to show visually the relations among the scores in a group. In such graphs, the horizontal axis (x axis) is the continuum on which the individuals are measured; the vertical axis (y axis) is the frequency (or the number) of individuals earning any given score shown on the x axis. Figure 10.2 is a histogram representing the scores for Bahasa Malaysia obtained by the group of 35 students shown earlier in Figure 10.1.

[Figure 10.2: Graph showing the distribution of Bahasa Malaysia scores]

SELF-CHECK 10.1
What is the difference between mean, median and mode?

(b) Dispersion
Although the mean tells us about the group's average performance, it does not tell us how close to the mean the students scored. For example, did every student score 80 per cent in the test, or were the scores spread out from 0 to 100 per cent? Dispersion describes how widely the scores are spread. Among the measures used to describe spread are the range and the standard deviation.

(i) Range
The range of scores in a test is the distance between the lowest and the highest scores obtained in the test, that is, the distance between the extremes of a distribution.

(ii) Standard Deviation
The standard deviation refers to how much the scores obtained by students deviate or differ from the mean. Figure 10.3 is a set of scores obtained by 10 students in a Science test.

[Figure 10.3: Scores in a Science test obtained by 10 students]

Based on the raw scores, you can calculate the standard deviation using the following formula.
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}} = \sqrt{\frac{153}{9}} = \sqrt{17} = 4.12$$

(a) The first step in computing the standard deviation is to find the mean, which is 390 divided by 10 = 39.
(b) Next, subtract the mean from each score; the results appear in the column labelled $x - \bar{x}$.
(c) Then square each difference; the results appear in the column on the right labelled $(x - \bar{x})^2$. Note that all numbers in this column are positive. The squared differences are then summed (153) and divided by N − 1 = 9.
(d) The standard deviation is 4.12, which is the positive square root of 153 divided by 9.

To better understand what the standard deviation means, refer to Figure 10.4, which shows the spread of scores with the same mean (39) but different standard deviations.
(a) For Class A, with a standard deviation of 4.12, approximately 68% of students scored within 1 standard deviation of the mean, that is, between 34.88 and 43.12.
(b) For Class B, with a standard deviation of 2, approximately 68% of students scored between 37 and 41.
(c) For Class C, with a standard deviation of 1, approximately 68% of students scored between 38 and 40.

[Figure 10.4: Distribution of scores with varying standard deviations]

Note that the smaller the standard deviation, the more the scores tend to "bunch" around the mean, and vice versa. Hence, it is not enough to examine the mean alone, because the standard deviation tells us a lot about the spread of the scores around the mean. Which class do you think performed better? The mean does not tell us, since all three classes have the same mean. Class C performed the most consistently because approximately two-thirds of its students scored between 38 and 40.

SELF-CHECK 10.2
What is the difference between range and standard deviation?
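The measures of central tendency and dispersion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `marks` is hypothetical (the raw scores behind Figures 10.1 and 10.3 are not reproduced in the text), chosen so that its mean is 39 like the Science example. The last line re-checks the module's own arithmetic from the totals it reports (153 and N − 1 = 9).

```python
import math
import statistics

# Hypothetical marks for ten students (NOT the data from Figure 10.3).
marks = [30, 35, 36, 39, 39, 39, 41, 42, 44, 45]

# Central tendency
mean = statistics.mean(marks)          # sum of marks / number of students
median = statistics.median(marks)      # middle of the sorted marks
modes = statistics.multimode(marks)    # every most-frequent mark (may be several)

# Dispersion
score_range = max(marks) - min(marks)  # distance between the extremes
sd = statistics.stdev(marks)           # sample SD: divides by N - 1

# The same standard deviation, following steps (a) to (d) by hand
squared_devs = [(x - mean) ** 2 for x in marks]
sd_by_hand = math.sqrt(sum(squared_devs) / (len(marks) - 1))

# Re-check of the worked example using only the totals given in the text
sd_from_source_totals = math.sqrt(153 / 9)

print(f"mean={mean}, median={median}, modes={modes}")
print(f"range={score_range}, sd={sd:.2f}, by hand={sd_by_hand:.2f}")
print(f"SD from the module's totals: {sd_from_source_totals:.2f}")
```

`statistics.stdev` divides by N − 1, matching the formula above; `statistics.pstdev` would divide by N instead and give a slightly smaller value.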
ACTIVITY 10.1
What is the difference between a standard deviation of 2 and a standard deviation of 5? Share your answer with your coursemates in the myINSPIRE online forum.

Skew
Skew refers to the lack of symmetry of a distribution. A distribution is skewed if one of its tails is longer than the other. Refer to Figure 10.5, which shows the distribution of the scores obtained by 38 students on a History test. There is a negative skew because the distribution has a longer tail in the negative direction. What does it mean? It means that more students were getting high scores in the History test, which may indicate that either the test was too easy or the teaching methods and materials were successful in bringing about the desired learning outcomes.

[Figure 10.5: Negative skew]

Figure 10.6 illustrates the distribution of the scores obtained by 38 students in a Biology test. There is a positive skew because the distribution has a longer tail in the positive direction. What does it mean? It means that more students were getting low scores in the Biology test, which may indicate that the test was too difficult. Alternatively, it could also mean that the questions were not clear or that the teaching methods and materials did not bring about the desired learning outcomes.

[Figure 10.6: Positive skew]

ACTIVITY 10.2
A teacher administered an English test to 10 students in her class. The students earned the following marks: 14, 28, 48, 52, 77, 63, 84, 87, 90 and 98. For this distribution of marks, find the following:
(a) Mean;
(b) Median;
(c) Range; and
(d) Standard deviation.

10.3 STANDARD SCORES
Having given a test, most teachers report the raw scores obtained by students.
For example, Zulinda, a Form Four student, earned the following scores in the end-of-semester examination:
(a) 80 for Science;
(b) 72 for History; and
(c) 40 for English.

With just the raw scores, what can you say about Zulinda's performance on these tests or her standing in the class? Actually, not very much. Without knowing how these raw scores compare to the total distribution of raw scores for each subject, it is difficult to draw any meaningful conclusions regarding her relative performance in each of these tests.

How do you make these raw scores more meaningful?
(a) Assume that the scores of all three tests are approximately normally distributed.
(b) The mean and standard deviation of the three tests are as follows:
    (i) Science: Mean = 90 and Standard deviation = 10
    (ii) History: Mean = 60 and Standard deviation = 12
    (iii) English: Mean = 40 and Standard deviation = 15

Based on this additional information, what statements can you make regarding Zulinda's relative performance in each of the three tests? The following are some conclusions you can make:
(a) Zulinda scored the best in the History test; her raw score of 72 falls one standard deviation above the mean;
(b) Her next best score is English; her raw score of 40 falls exactly at the mean of the distribution of scores; and
(c) Finally, even though her raw score for Science was 80, it falls one standard deviation below the mean.

Raw scores, like Zulinda's scores, can be converted to two types of standard scores, namely the z-score and the T-score.

(a) Z-Score
Converting Zulinda's raw scores into "z-scores", we can say that she achieved a:
(i) Z-score of +1 for History;
(ii) Z-score of 0 for English; and
(iii) Z-score of −1 for Science.

What is a z-score? How do you compute the z-score? A z-score is a type of standard score.
The term standard score is the general name for a raw score that has been converted to another scale using a predetermined mean and a predetermined standard deviation. Z-scores tell how many standard deviations away from the mean the score is located. Z-scores can be positive or negative. A positive z-score indicates that the value is above the mean while a negative z-score indicates that the value is below the mean. A z-score is a raw score that has been transformed or converted to a scale with a predetermined mean of 0 and a predetermined standard deviation of 1. A z-score of −6 means that the score is 6 standard deviations below the mean. The formula for transforming a raw score into a z-score involves subtracting the mean from the raw score and dividing the result by the standard deviation.

$$z = \frac{x - \bar{x}}{SD}$$

Let us use this formula to convert Kumar's mark of 52 obtained in a Geography test. The mean for the test is 70 and the standard deviation is 7.5.

$$z = \frac{x - \bar{x}}{SD} = \frac{52 - 70}{7.5} = \frac{-18}{7.5} = -2.4$$

The z-score computed for the raw score of 52 is −2.4, which means that Kumar's score for the Geography test is located 2.4 standard deviations below the mean.

Example: Using the Z-Score to Make Decisions
A teacher administered two Bahasa Malaysia tests to students in Form Four A, Form Four B and Form Four C. The two top students in Form Four C are Seng Huat and Mei Ling. The teacher was planning to give a prize to the best student in Bahasa Malaysia in Form Four C but was not sure who was the better student.

                      Test 1   Test 2
Seng Huat               30       50
Mei Ling                45       35
Mean                    42       47
Standard Deviation       7        8

The teacher could use each student's mean mark to determine who is better. But both students have the same mean (40). How does the teacher decide? By using the z-score, the teacher can find out how far from the class mean the scores of the two students are and thus who performed better.
Using the formula above, the teacher computes the z-scores shown in the following table:

             Test 1                    Test 2                     Total
Seng Huat    (30 − 42)/7 = −1.71       (50 − 47)/8 = +0.375       −1.34
Mei Ling     (45 − 42)/7 = +0.43       (35 − 47)/8 = −1.50        −1.07

Upon examining the table, the teacher finds that both Seng Huat and Mei Ling have negative z-scores for the total of both tests. However, Mei Ling has a higher total z-score (−1.07) compared to Seng Huat's total z-score (−1.34). In other words, Mei Ling's total score was closer to the mean and therefore the teacher concludes that Mei Ling did better than Seng Huat.

Z-scores are relatively simple to use, but many educators are reluctant to use them, especially when test scores are reported as negative numbers. How would you like to have your Mathematics score reported as −4? For this reason, alternative standard scores such as the T-score are used.

(b) T-Score
The T-score was developed by W. McCall in the 1920s and is one of the many standard scores currently in use. T-scores are widely used in psychology and education, especially when reporting performance in standardised tests. The T-score is a standardised score with a mean of 50 and a standard deviation of 10. The formula for computing the T-score is:

$$T = 10z + 50$$

For example, to convert a z-score of −1.0 to a T-score:

$$T = 10z + 50 = 10(-1.0) + 50 = -10 + 50 = 40$$

When converting z-scores to T-scores, you should be careful not to drop the negative signs. Dropping a negative sign will result in a completely different score.

ACTIVITY 10.3
Convert the following z-scores to T-scores:

z-score    T-score
+1.0
−2.4
+1.8

Why would you use T-scores rather than z-scores when reporting the performance of students in the classroom? Explain your answer to your coursemates in the myINSPIRE online forum.
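The z-score and T-score conversions in this section can be sketched as two small Python functions. The function names `z_score` and `t_score` and the dictionary layout are illustrative choices, not part of the module; the numbers are Kumar's Geography mark and the two Bahasa Malaysia tests from the worked examples above.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies from the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score to a scale with mean 50 and standard deviation 10."""
    return 10 * z + 50


# Kumar's Geography mark: raw 52, class mean 70, standard deviation 7.5
kumar_z = z_score(52, 70, 7.5)

# Class mean and standard deviation for each Bahasa Malaysia test
test_stats = {"Test 1": (42, 7), "Test 2": (47, 8)}
raw_marks = {
    "Seng Huat": {"Test 1": 30, "Test 2": 50},
    "Mei Ling": {"Test 1": 45, "Test 2": 35},
}

# Total z-score per student: add the z-scores from both tests
totals = {
    name: sum(z_score(mark, *test_stats[test]) for test, mark in marks.items())
    for name, marks in raw_marks.items()
}

print(f"Kumar: z = {kumar_z:.1f}")
for name, total in totals.items():
    print(f"{name}: total z = {total:.2f}")
print(f"T-score for z = -1.0: {t_score(-1.0):.0f}")
```

Summing z-scores rather than raw marks puts both tests on a common scale, so a mark on the test with the larger spread does not count for more than it should.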
10.4 THE NORMAL CURVE
The normal curve (also called the "bell curve") is a hypothetical curve that is used to represent many naturally occurring phenomena. In a normal distribution, the mean, median and mode have the same value. Suppose we were to sample a particular characteristic such as the height of Malaysian men and found the average height to be 5 feet 4 inches or 163 cm. There will be some men who are shorter than the average height and an equal number who are taller. By plotting the heights of all Malaysian men according to frequency of occurrence, you can expect to obtain something similar to a normal distribution curve.

Figure 10.7 shows a normal distribution curve for IQ based on the Wechsler Intelligence Scale for Children. In a normal distribution with a mean of 100, about two-thirds of individuals will have an IQ of between 85 and 115. According to the American Association of Mental Retardation (2006), individuals who have an IQ of less than 70 may be classified as mentally retarded or mentally challenged and those who have an IQ of more than 130 may be considered gifted.

[Figure 10.7: A normal distribution curve of IQ based on the Wechsler Intelligence Scale for Children]

Similarly, test scores that measure a particular characteristic such as language proficiency, quantitative ability or scientific literacy of a specific population can be expected to produce a normal curve. The normal curve is divided according to standard deviations (such as −4s, −3s, ..., +3s and +4s), which are shown on the horizontal axis. The area of the curve between standard deviations is indicated as a percentage on the diagram. For example, the area between the mean and standard deviation +1 is 34.13%.
Similarly, the area between the mean and standard deviation −1 is also 34.13%. Hence, the area between standard deviation −1 and standard deviation +1 is 68.26%. This means that in a normal distribution, 68.26% of individuals will score between standard deviations −1 and +1.

In using the normal curve, it is important to make a distinction between standard deviation values and standard deviation scores. A standard deviation value is a constant and is shown on the horizontal axis in Figure 10.7. The standard deviation score, on the other hand, is the score obtained when we apply the standard deviation formula discussed earlier. For example, if we obtained a standard deviation equal to 5, then the score for 1 standard deviation is 5, the score for 2 standard deviations is 10, the score for 3 standard deviations is 15 and so forth. Standard deviation values of −1, −2 and −3 will have corresponding negative scores of −5, −10 and −15.

Note that in Figure 10.7, z-scores are indicated from +1 to +4 and −1 to −4 with the mean as 0. Each interval is equal to one standard deviation. Similarly, T-scores are reported from 10 to 90 (in intervals of 10) with the mean set at 50. Each interval of 10 is equal to one standard deviation.

SUMMARY
• The term "central tendency" refers to the "middle" value and is measured using the mean, median and mode. It is an indication of the location of the scores.
• The mean is simply the sum of all the values (marks) divided by the total number of items (students) in the set.
• The median is determined by sorting the scores from the lowest to the highest value and taking the score that is in the middle of the sequence.
• The mode is the most frequently occurring score in the data set.
• The range of scores in a test is the distance between the lowest score and the highest score obtained in the test.
• Standard deviation refers to how much the scores obtained by students deviate or differ from the mean.
• Skew refers to the lack of symmetry of a distribution. A distribution is skewed if one of its tails is longer than the other.
• A negative skew has a longer tail in the negative direction.
• A positive skew has a longer tail in the positive direction.
• The standard score refers to a raw score that has been converted from one scale to another scale using the mean and standard deviation.
• Z-scores indicate how many standard deviations away from the mean the score is located.
• The T-score is a standardised score with a mean of 50 and a standard deviation of 10.
• The normal curve (also called the "bell curve") is a hypothetical curve that is used to represent many naturally occurring phenomena.

KEY TERMS
Central tendency
Dispersion
Mean
Median
Mode
Negative skew
Normal curve
Positive skew
Range
Standard deviation
Standard score
T-scores
z-scores
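The normal-curve areas quoted in Section 10.4 can be checked with the standard normal cumulative distribution function. The sketch below is an illustration only: the helper names `normal_cdf` and `area_between` are hypothetical, and the IQ calculation assumes a standard deviation of 15, which the text implies by treating 85 to 115 as one standard deviation either side of 100. The exact area between −1 and +1 rounds to 68.27%; the 68.26% in the text comes from doubling the rounded 34.13%.

```python
import math


def normal_cdf(z):
    """Proportion of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(z_low, z_high):
    """Proportion of a normal distribution between two z-scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)


mean_to_plus_one = area_between(0, 1)    # mean up to +1 standard deviation
within_one_sd = area_between(-1, 1)      # -1 to +1 standard deviations

# IQ example, ASSUMING mean 100 and standard deviation 15
iq_mean, iq_sd = 100, 15
iq_85_to_115 = area_between((85 - iq_mean) / iq_sd, (115 - iq_mean) / iq_sd)
iq_below_70 = normal_cdf((70 - iq_mean) / iq_sd)

print(f"mean to +1 SD: {mean_to_plus_one:.2%}")
print(f"-1 SD to +1 SD: {within_one_sd:.2%}")
print(f"IQ 85 to 115: {iq_85_to_115:.2%}")
print(f"IQ below 70: {iq_below_70:.2%}")
```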
