Unit 2 B RELIABILITY (AK Singh pg 85)

Reliability denotes the consistency of a psychological test over a period of time. In psychological testing, reliability refers to the consistency or stability of a test over time or across different situations. A reliable test produces similar results under consistent conditions. Reliability is expressed by the reliability coefficient, which can be statistically computed. There are various methods of testing the reliability of a test; these are discussed as follows:

1. TEST-RETEST RELIABILITY
This type of reliability examines the stability of test scores over time. In test-retest reliability, a single form of the test is administered twice to the same sample with a reasonable time gap.
Example: If a personality test is given to a group of people today and then again in a month, and the results are almost identical, the test has high test-retest reliability.
In this way, two administrations of the same test yield two independent sets of scores. The two sets, when correlated, give the value of the reliability coefficient. The correlation obtained must obviously be positive, and the lower the correlation, the lower the reliability. A high test-retest reliability coefficient indicates that an examinee who obtains a low score on the first administration tends to score low on the second administration and, conversely, an examinee who scores high on the first administration tends to score high on the second. In computing retest reliability, the investigator is often faced with the problem of determining a reasonable time gap between the two administrations of the test. The most appropriate and convenient time gap between the two administrations is a fortnight, which is considered neither too short nor too long. However, error can occur in test-retest reliability due to issues such as practice effects, maturation, learning and memorisation of items by the participants.
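Because the test-retest coefficient is simply the product-moment correlation between the two sets of scores, it is straightforward to compute. Below is a minimal sketch in Python; the scores are invented for illustration, and scipy's pearsonr is assumed to be an acceptable way of obtaining the coefficient.

    from scipy.stats import pearsonr

    # Hypothetical scores for the same ten examinees on two
    # administrations of the same test, a fortnight apart.
    first_administration = [12, 18, 25, 9, 30, 22, 15, 27, 11, 20]
    second_administration = [14, 17, 26, 10, 28, 21, 16, 25, 12, 22]

    # The test-retest reliability coefficient is the product-moment
    # (Pearson) correlation between the two sets of scores.
    r, _ = pearsonr(first_administration, second_administration)
    print(f"Test-retest reliability coefficient: {r:.2f}")

A coefficient close to 1 would indicate that examinees keep roughly the same rank order across the two administrations.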
2. INTERNAL CONSISTENCY RELIABILITY
Internal consistency reliability indicates the homogeneity of the test. If all the items of the test measure the same function or trait, the test is said to be a homogeneous one and its internal consistency reliability will be high.
Example: In a depression inventory, if all the questions are about various symptoms of depression (like sadness, fatigue, lack of motivation), and people who score high on one question tend to score high on all the others, the test has high internal consistency.

a) Split-half Method
The most common method of estimating internal consistency reliability is the split-half method, in which the test is divided into two equal or nearly equal halves.
Example: If a 40-question anxiety test is divided into two 20-question sets, and people score similarly on both sets, the test shows high split-half reliability.
The common way of splitting the test is the odd-even method. Almost any split can be accepted except splitting the items into the first half and the second half. A division of this sort is not preferred because the nature of the items in the two halves of a power test is different: usually the easier items are placed at the beginning, in the first half of the test, and the comparatively difficult items are placed towards the end, in the second half. The odd-even method, however, can be reasonably applied for the purpose of splitting. In this method, all odd-numbered items (1, 3, 5, 7, 9, etc.) constitute one part of the test and all even-numbered items (2, 4, 6, 8, 10, etc.) constitute the other part. Each examinee thus receives two scores: the number of correct answers on all odd-numbered items constitutes one score, and the number of correct answers on all even-numbered items constitutes another score for the same examinee. In this way, from a single administration of a single form of the test, two sets of scores are obtained. The product-moment (PM) correlation between them gives the reliability of the half-test.

b) Kuder-Richardson Formulas and Coefficient Alpha
Kuder and Richardson (1937) conducted a series of studies to remove some of the difficulties of the split-half method of estimating reliability. Dissatisfied with the split-half method, they devised their own formulas for estimating the internal consistency of a test. Their formulas 20 and 21 have become very popular and well known. K-R20 is the basic formula for computing the reliability coefficient, and K-R21 is a modified form of K-R20. Kuder-Richardson reliability indicates inter-item consistency and is mainly used for psychological tests that have multiple-choice items, fill-in-the-blank items, items seeking short answers and so on. Coefficient alpha, also known as Cronbach's alpha, can be termed an extension of Kuder-Richardson reliability (Veeraraghavan and Shetgovekar, 2016). It can be used for tests whose items have more than two response options and thus can also be used for Likert scales. It, too, focuses on internal consistency.
Example: In a true/false test on anxiety symptoms, if the KR-20 value is high, it means the items on the test are consistently measuring anxiety.
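Both procedures can be sketched in a few lines of Python. The item responses below are invented, and numpy is assumed available. The split-half coefficient is obtained by correlating scores on the odd- and even-numbered items; K-R20 is then applied in its standard form, r = (k / (k - 1)) * (1 - sum(p*q) / total-score variance), where k is the number of items, p the proportion of examinees passing each item and q = 1 - p.

    import numpy as np

    # Hypothetical right/wrong (1/0) responses: 6 examinees x 10 items.
    responses = np.array([
        [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
        [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
        [0, 1, 0, 0, 1, 0, 0, 1, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 0, 1, 1, 0, 1, 1, 1, 1],
    ])

    # Split-half: items 1, 3, 5, ... sit at indices 0, 2, 4, ...
    # Score the odd- and even-numbered items separately, then
    # correlate the two half-test scores.
    odd_scores = responses[:, 0::2].sum(axis=1)
    even_scores = responses[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd_scores, even_scores)[0, 1]
    print(f"Split-half (half-test) reliability: {r_half:.2f}")

    # K-R20: k/(k-1) * (1 - sum(p*q) / total-score variance).
    # Population variance (numpy's default, ddof=0) is used so that
    # it matches the p*q item variances, which are themselves
    # population quantities.
    k = responses.shape[1]
    p = responses.mean(axis=0)
    q = 1 - p
    total_var = responses.sum(axis=1).var()
    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
    print(f"K-R20: {kr20:.2f}")

    # Cronbach's alpha replaces p*q with the individual item
    # variances; for 1/0 items it reduces to exactly K-R20.
    alpha = (k / (k - 1)) * (1 - responses.var(axis=0).sum() / total_var)
    print(f"Coefficient alpha: {alpha:.2f}")

Note that, as the text states, the split-half correlation estimates the reliability of a half-length test, so it is not directly comparable to K-R20, which refers to the full-length test.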
3. ALTERNATE-FORMS RELIABILITY
Alternate-forms reliability is known by various names, such as parallel-forms reliability, equivalent-forms reliability and comparable-forms reliability. It requires that the test be developed in two forms, which should be comparable or equivalent. The two forms/versions of the test are administered to the same sample, either immediately on the same day or with a time interval of usually a fortnight.
Step 1: Give test A to a group of 50 students on a Monday.
Step 2: Give test B to the same group of students that Friday.
Step 3: Correlate the scores from test A and test B.
Example: Perceived Social Scale.
Differences between parallel and split-half reliability: parallel (or equivalent) forms reliability requires two versions of a test, whereas with split-half reliability a researcher conducts only one administration of the test or measure and splits it in half. Alternate-forms reliability = two versions of the same test (given on different occasions). Split-half reliability = splitting one test into two parts (measured at the same time).

4. SCORER RELIABILITY
There are tests, such as tests of creativity and projective tests of personality, which leave a great deal to the judgement of the scorer. Such tests are in as much need of scorer reliability as of the more usual reliability coefficients. Scorer reliability is estimated by having a sample of test papers independently scored by two or more examiners. The sets of scores obtained by the examiners are correlated in the usual way, and the resulting correlation coefficient is known as scorer reliability. This type of reliability is needed especially when subjectively scored tests are employed in research.
Inter-scorer reliability: the consistency between different scorers or evaluators. It measures how much two or more individuals agree on scoring the same test.
Intra-scorer reliability: the consistency of the same scorer over time, i.e. whether the same individual scores consistently when evaluating the same material multiple times.
Example: If two teachers grading essays give very similar scores to the same set of essays, the test has high inter-scorer reliability. If the same teacher grades the essays again after a month and gives similar scores, it shows high intra-scorer reliability.
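Inter-scorer reliability can be computed in the same way as the other coefficients, by correlating two examiners' marks on the same set of papers. Below is a minimal sketch with invented essay marks; Pearson correlation is used because the text describes scorer reliability as a correlation coefficient, though other agreement indices (e.g. Cohen's kappa for categorical judgements) are also common in practice.

    from scipy.stats import pearsonr

    # Hypothetical marks awarded by two independent examiners
    # to the same eight essays.
    scorer_a = [7, 5, 9, 6, 8, 4, 7, 9]
    scorer_b = [8, 5, 9, 5, 7, 4, 6, 9]

    # Scorer (inter-scorer) reliability: correlate the two sets
    # of marks given to the same answer papers.
    r, _ = pearsonr(scorer_a, scorer_b)
    print(f"Inter-scorer reliability: {r:.2f}")

Intra-scorer reliability would be computed the same way, correlating the marks the same examiner gives to the same papers on two scoring occasions.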