Psychological Testing & Measurement
Summary
This document discusses reliability in psychological testing and measurement, including the concepts of true score variance, observed score variance, and the sources of error in test scores. It covers various factors affecting test reliability, such as difficulty level, test length, and test administration procedures.
Full Transcript
Psychological Testing and Measurement (PSY-P631) VU

Lesson 13: Reliability

Whatever attribute, characteristic, trait, object, or phenomenon one aims to measure, one wants the measurement to be reliable. In the physical sciences, the reliability of measures is not a big issue: an instrument is either reliable or it is not, and an instrument that is not completely reliable is simply not used for measurement. In psychology, on the other hand, we may be using measures whose reliability is affected by a number of variables. Psychologists deal with phenomena that may not remain stable and consistent over time, so the tools or instruments used to measure these phenomena may not give the same results every time we use them. There is always a chance of some degree of error in the measurement or the findings of any investigation; we, or the tools we use, may end up underestimating or overestimating a given phenomenon. It is therefore very important to estimate the chance and amount of error that may be involved in our assessment. This becomes even more significant when the assessment is to be used for serious decisions about someone's future, education, profession, diagnosis of a condition, or other major life decisions. Psychologists involved in test development therefore do two things: they try to make tests as reliable as possible, and they report the reliability of their measures.

There are three basic qualities of a good psychological test: reliability, validity, and standardization. A test that is not reliable is not a trustworthy test; a test that is not valid does not measure what it is supposed to measure; and a test that has not been standardized will not give results that we can confidently generalize to other groups. Norms and standardization were discussed in previous sessions; here we look at reliability, its types, and its applications.

By definition, reliability is "the consistency of the scores obtained by the same persons when they are reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions" (Anastasi & Urbina, 2007). To understand reliability, we have to understand two other concepts: correlation and error in measurement. According to Kaplan & Saccuzzo (2001), reliability is "the extent to which a score or measure is free from measurement error. Theoretically, reliability is the ratio of true score variance to observed score variance."

Classical test score theory implies that everyone could obtain a true score on any test or measure if the measure were free of error. But there is perhaps no measure that can be considered totally error free; there is always some chance of error. This implies that the scores we obtain on different measures are not the test taker's true scores: an observed score is the true score plus error, and it may not be a 100% accurate representation of a person's traits or abilities. Note that the term 'error' here does not indicate that something has gone wrong or that a 'mistake' has been made; in this context, error refers to the amount and extent of variance that may be expected in the results.
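To make this ratio concrete, here is a minimal simulation sketch of the idea, written in Python with NumPy purely for illustration; the library choice, the variable names, and all numbers are assumptions of this example, not part of the lesson. Hypothetical true scores are generated, random error is added to produce observed scores, and reliability is estimated as the ratio of true score variance to observed score variance.

import numpy as np

# A minimal sketch of classical test score theory; all numbers are made up for illustration.
rng = np.random.default_rng(seed=0)

n_people = 1000
true_scores = rng.normal(loc=100, scale=15, size=n_people)   # hypothetical true scores (T)
error = rng.normal(loc=0, scale=5, size=n_people)             # random measurement error (E)
observed_scores = true_scores + error                         # observed score X = T + E

# Theoretical reliability: true score variance divided by observed score variance.
reliability = true_scores.var() / observed_scores.var()
print(f"estimated reliability = {reliability:.2f}")   # roughly 15**2 / (15**2 + 5**2) = 0.90

Shrinking the error spread pushes the ratio toward 1, while enlarging it pulls the ratio down, which is exactly what the definition of reliability captures.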
The observed score is therefore the sum of the true score and error:

X = T + E

Here X refers to the observed score, T is the true score, and E is the error that can be expected in the measurement.

Sources of Error in Test Scores:

Psychologists and educationists try to estimate the amount and degree of error that may be expected in their measurements. That is the main reason why test developers and test administrators emphasize uniform testing procedures, besides controlling other possible sources of error. The following are a few of the variables that may cause error in measurement:

a. Test Related Factors:
- Difficulty level (too difficult or too easy)
- Length of the test (too long, causing fatigue or boredom)
- Domain or content (suitable for some test takers but not all)
- Items that do not represent the domain or content
- Time limit (may cause stress or act as a handicap)

b. Test Administration Process:
- Poor or non-uniform testing conditions (physical setting and environment)
- Faulty, improperly worded, improperly delivered, or non-uniform instructions
- Test administrator's personality (different administrators in different situations having different personality styles)
- Rapport (poor, too much, or too little)

c. Examinee Related Variables:
- Prior learning and experience
- Individual differences (within-group differences: personality styles, stress tolerance, IQ level, emotionality, motivation, knowledge)
- Difference from the normative sample (no group is identical to the normative sample; every group is different from every other group)
- Within-person differences (the same person may change over time through life, academic, and professional experiences, physical condition, health, motivational level, emotional state, etc.)

Test developers, psychologists, and other professionals involved in test construction try their best to make sure that these variables are controlled or kept constant during test administration. But complete consistency in the testing process is not possible, and no test is a hundred percent reliable. However, test developers do report the reliability coefficient of their measures as well as the characteristics of the normative sample from which the coefficient was obtained. This provides test users with a guideline about the types of people or groups to whom the measure may be administered.

Correlation and Reliability:

Reliability concerns the consistency of the scores of the same persons on the same measure under different conditions, or the consistency observed in the scores of different persons or groups with similar characteristics on the same measure. Calculating reliability involves the concept of correlation. The coefficient of correlation is the value yielded by the calculation of correlation and is denoted by the letter 'r'. It indicates the relationship between two variables; in the context of testing, it expresses the correspondence between two sets of scores. The coefficient of correlation tells us two things about a relationship: the magnitude of the relationship and its direction. Magnitude means the size of the correlation, whereas direction indicates whether the correlation is positive or negative. The size of a correlation ranges between -1 and +1, with a zero value in between.
The absolute value of a coefficient of correlation is therefore never greater than one. A coefficient of +1 means a perfect positive correlation, a coefficient of -1 indicates a perfect negative correlation, and a value of zero indicates no correlation. The closer the value of a correlation is to one, the stronger the correlation (e.g., r = 0.9); the closer it is to zero, the weaker the relationship. If the scores in two sets increase and decrease together, the correlation is positive. If, on the contrary, the scores in set-II decrease as the scores in set-I increase, the correlation is negative.

The concept of correlation is best understood by looking at a scatter plot. If we plot the values of two sets of scores in a graph, the appearance of the scatter indicates the relationship between the two sets. If the person who scored lowest in set-I is also the one who scored lowest in set-II, the person with the highest score on the first set is also the highest scorer on set-II, and every other person holds the same position on both sets, then it is a perfect positive correlation. If, on the other hand, the situation is reversed, that is, the lowest scorer on set-I is the top scorer on set-II, the top scorer on set-I is the lowest scorer on set-II, and all other persons' scores follow the same reversed pattern, then it shows a perfect negative correlation.

For example, look at the following sets of scores obtained from a group of 10 students: their IQ, their marks in the annual exams of math and English, and the number of classes they missed in a year.

IQ     Marks in Math   Marks in English   Missed Classes
70          55                20               19
75          60                13               18
80          65                45               17
85          70                 6               16
90          75                67               15
95          80                78               14
100         85                20               13
105         90                12               12
110         95                 -               11
115        100                32               10

When these scores are plotted, we see a perfect positive correlation between IQ and marks in math: each person's math marks rise in exact step with their IQ, so everyone holds the same position in both columns. A scatter of points running from the lower left corner to the upper right corner expresses a positive correlation.

[Scatter plot: Positive correlation between IQ and marks in math (x-axis: marks in mathematics; y-axis: IQ of respondents)]

A perfect negative correlation can be seen in the case of IQ and classes missed in the academic year: each person's IQ moves in the direction opposite to the number of classes they missed. A scatter of points running from the upper left corner to the lower right corner expresses a negative correlation.

[Scatter plot: Negative correlation between IQ and classes missed in a semester (x-axis: missed classes; y-axis: IQ of respondents)]
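These two relationships can also be checked numerically from the table. The short sketch below, again in Python with NumPy (an assumption of this example; the lesson does not prescribe any software), computes the Pearson correlation coefficients for IQ versus marks in math and IQ versus missed classes.

import numpy as np

# Scores copied from the table above (the 10 hypothetical students).
iq             = np.array([70, 75, 80, 85, 90, 95, 100, 105, 110, 115])
marks_math     = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])
missed_classes = np.array([19, 18, 17, 16, 15, 14, 13, 12, 11, 10])

# np.corrcoef returns a 2x2 correlation matrix; element [0, 1] is r for the pair.
r_iq_math   = np.corrcoef(iq, marks_math)[0, 1]
r_iq_missed = np.corrcoef(iq, missed_classes)[0, 1]

print(f"IQ vs marks in math:  r = {r_iq_math:+.2f}")    # +1.00, a perfect positive correlation
print(f"IQ vs missed classes: r = {r_iq_missed:+.2f}")  # -1.00, a perfect negative correlation

The sign of each coefficient gives the direction of the relationship and its absolute value gives the magnitude, mirroring what the two scatter plots show.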
The scores in marks in English and marks in math, by contrast, show a zero correlation: no significant trend can be seen in the scatter.

[Scatter plot: Zero correlation between marks in math and marks in English (x-axis: marks in English; y-axis: marks in mathematics)]

Although more than one way of computing the coefficient of correlation is available, the most commonly used procedure is the Pearson product-moment method. The size of the correlation coefficient generally considered acceptable for the purpose of reliability is around .80 or .90. To see whether an obtained coefficient of correlation is significant, we can use the tables of significance of correlation found at the end of most statistics and research methods textbooks. When the data come from a large number of people, a coefficient below .80 may also be acceptable; however, a coefficient below .60 is usually not acceptable.
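As a complement to printed significance tables, the same check can be carried out in software. The sketch below uses scipy.stats.pearsonr, which returns both the coefficient and its p-value; the test-retest scores and the .80 cut-off in the example are assumptions chosen only to echo the guideline stated above, not data from the lesson.

from scipy.stats import pearsonr

# Hypothetical test-retest scores for the same eight people (made-up numbers, not from the lesson).
first_administration  = [12, 15, 11, 18, 14, 16, 13, 17]
second_administration = [13, 14, 10, 19, 15, 16, 12, 18]

# pearsonr returns the coefficient and its p-value, standing in for a printed significance table.
r, p_value = pearsonr(first_administration, second_administration)

print(f"r = {r:.2f}, p = {p_value:.4f}")   # with these made-up scores, r comes out above .90
if r >= 0.80 and p_value < 0.05:
    print("meets the rough .80 guideline and is statistically significant")
else:
    print("falls short of the usual reliability guideline")

For a small data set like this, the p-value plays the role of the significance table mentioned in the text.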