Reliability Group 1 PDF

Summary

This document summarizes approaches to understanding reliability in psychological testing, including its main components, methods of estimating it, and the history of the subject.

Full Transcript

PREPARED BY GROUP 1: RELIABILITY. 31 August 2024. Members: Tracy Ritzy Bilbao, Kyle Bernadette Calibo, Franchez Anne Dabuet, Sofia Justine Martin, Denver Ramos.

RELIABILITY
In the language of psychometrics, reliability refers to consistency in measurement. In psychological testing, the word "error" does not imply that a mistake has been made; it implies that there will always be some inaccuracy in our measurements. In other words, tests that are relatively free of measurement error are reliable, and tests that contain substantial measurement error are considered unreliable. A reliability coefficient is an index of reliability: a proportion that indicates the ratio between the true score variance on a test and the total variance.

HISTORY & THEORY OF RELIABILITY
Notable people: Abraham De Moivre introduced the basic notion of sampling error. Karl Pearson developed the product-moment correlation. Charles Spearman worked out most of the basics of contemporary reliability theory in his 1904 article "The Proof and Measurement of Association between Two Things."

BASICS OF TEST SCORE THEORY
Classical Test Score Theory assumes that each person has a true score that would be obtained if there were no errors in measurement. The difference between the true score and the observed score results from measurement error:
X = T + E, where X = observed score, T = true score, and E = error.
The difference between the score we obtain and the score we are really interested in is the error of measurement: X - T = E. (A small simulation of this model appears after the Sources of Error Variance list below.)

Standard Error of Measurement (SEM): estimates how repeated measures of a person on the same instrument tend to be distributed around his or her "true" score. The true score is always unknown because no measure can be constructed that provides a perfect reflection of it.

Domain Sampling Theory: another central concept in CTT. It assumes that the items selected for any one test are just a sample of items from an infinite domain of potential items. As the sample gets larger, it represents the domain more and more accurately; the greater the number of items, the higher the reliability.

Item Response Theory (IRT): a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability. It focuses on the range of item difficulty (equivalently, item easiness) to help assess an individual's ability level.

SOURCES OF ERROR VARIANCE
1. Test construction: item sampling or content sampling, terms that refer to variation among items within a test as well as variation among items between tests.
2. Test administration: this may influence the test taker's attention or motivation.
3. Test scoring and interpretation: individually administered tests still require scoring by trained personnel.
4. Other sources of error: surveys and polls are two tools of assessment commonly used by researchers who study public opinion.
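To make the classical test score model concrete, here is a minimal Python sketch (numpy assumed; the means, standard deviations, and sample size are invented for illustration). It simulates observed scores as X = T + E, recovers the reliability coefficient as the ratio of true-score variance to total variance, and computes the standard error of measurement using the standard formula SEM = SD × sqrt(1 - r), which the slides describe only verbally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classical test theory: each observed score X is a true score T plus error E.
true_scores = rng.normal(loc=50, scale=10, size=10_000)  # T
errors = rng.normal(loc=0, scale=5, size=10_000)         # E
observed = true_scores + errors                          # X = T + E

# Reliability coefficient: true-score variance / total variance.
reliability = true_scores.var() / observed.var()
print(f"reliability ~ {reliability:.3f}")  # ~ 10^2 / (10^2 + 5^2) = 0.80

# Standard error of measurement: SEM = SD_X * sqrt(1 - reliability).
sem = observed.std() * np.sqrt(1 - reliability)
print(f"SEM ~ {sem:.2f}")  # recovers the error SD of ~5
```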
RELIABILITY ESTIMATES

Test-Retest Reliability (1 group, 1 test, 2 administrations)
Used to evaluate the error associated with administering a test at two different times. It is most appropriate for variables that should be stable over time and not appropriate for variables expected to change over time.

Carryover effect: occurs when the first testing session influences the scores from the second session. The shorter the interval, the greater the risk of a carryover effect.
Practice effect: because of practice on the first administration, the test-retest correlation usually overestimates the true reliability.

Test-retest procedure (sample population = Test A + Test A):
1. Administer the psychological test.
2. Get the test results.
3. Wait out the interval (time gap).
4. Re-administer the same test.
5. Get the test results.
6. Correlate the two sets of results.

When the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability: the extent to which a test varies as the result of factors associated with the particular time and occasion on which it was administered (APA PsycNet, n.d.).

Disadvantages of test-retest: possibly better performance on the second administration, checking or knowing the answers, and the practice effect. Remember: the longer the interval between testings, the lower the test-retest correlation tends to be.

PARALLEL-FORMS & ALTERNATE-FORMS RELIABILITY (1 group, 2 tests, 2 administrations)
Compares two equivalent forms of a test that measure the same attribute. The two forms use different items; however, the rules used to select items of a particular difficulty level are the same. The resulting coefficient of equivalence indicates that the two forms measure the same attributes; Pearson's product-moment correlation (Pearson's r) is used to estimate the reliability.
Procedure: administer the first form, administer the alternate form, score both tests, and correlate the scores.
Disadvantages: hard to develop or construct; time consuming.

SPLIT-HALF RELIABILITY (1 group, 1 test, 1 administration)
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. A test is given and divided into halves that are scored separately; the results of one half are then compared with the results of the other. One acceptable way to split a test is to randomly assign items to one or the other half. In practice, the top-bottom method (first half, e.g. items 1-25, versus second half, e.g. items 26-50) or the odd-even system (odd-numbered items versus even-numbered items) can be used.

The Spearman-Brown formula allows a test developer or user to estimate internal-consistency reliability from the correlation of two halves of a test:
r = 2r_half / (1 + r_half)
where r_half is the correlation between the two halves of the test.
Example: suppose you decide to use the split-half method to establish a test's reliability, and the correlation between the two halves is .78. According to the Spearman-Brown formula, the estimated reliability is r = 2(.78) / (1 + .78) = .876.
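The split-half computation above lends itself to a short sketch (Python with numpy assumed; spearman_brown is a hypothetical helper and the response matrix is simulated). It first reproduces the worked example from the text (.78 becomes .876), then applies an odd-even split to fabricated data.

```python
import numpy as np

def spearman_brown(r_half: float) -> float:
    """Spearman-Brown correction: full-test reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Worked example from the text: half-test correlation of .78.
print(round(spearman_brown(0.78), 3))  # 0.876

# Odd-even split of a simulated score matrix: rows = test takers, columns = items.
rng = np.random.default_rng(1)
ability = rng.normal(size=(200, 1))            # shared trait across items
items = ability + rng.normal(size=(200, 50))   # 50 noisy item scores per person
odd_half = items[:, 0::2].sum(axis=1)          # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)         # items 2, 4, 6, ...
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(round(spearman_brown(r_half), 3))        # split-half reliability estimate
```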
KUDER-RICHARDSON FORMULA 20 (KR20)
KR20 (Kuder & Richardson, 1937) was developed by its authors as their own measure of reliability. The formula is applicable to dichotomous (right/wrong) items; KR21 is a simplified variant used when all items have the same level of difficulty. The formula is:
KR20 = [N / (N - 1)] x [1 - (Σpq / S²)]
where:
KR20 = the reliability estimate (r)
N = the number of items on the test
S² = the variance of the total test scores
p = the proportion of people getting each item correct (found separately for each item)
q = the proportion of people getting each item incorrect (for each item, q = 1 - p)
Σpq = the sum of the products p x q over all items on the test

COEFFICIENT ALPHA
Cronbach's alpha (Cronbach, 1951), often called coefficient alpha (α), is the most common measure of internal consistency. It is commonly used for multi-item Likert scales (surveys and questionnaires).
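Both formulas above translate directly into code. The sketch below (Python with numpy assumed; kr20, cronbach_alpha, and the simulated responses are illustrative rather than canonical implementations) computes KR20 from a 0/1 response matrix and coefficient alpha from the same data; for dichotomous items, alpha reduces to KR20 when the same variance convention is used.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR20 for dichotomous (0/1) responses: rows = people, columns = items."""
    n = responses.shape[1]                # N, the number of items
    p = responses.mean(axis=0)            # proportion correct per item
    q = 1 - p                             # proportion incorrect per item
    s2 = responses.sum(axis=1).var()      # S^2, variance of the total scores
    return (n / (n - 1)) * (1 - (p * q).sum() / s2)

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for item scores on any scale (e.g., Likert items)."""
    n = scores.shape[1]
    item_vars = scores.var(axis=0).sum()  # sum of the item variances
    total_var = scores.sum(axis=1).var()  # variance of the total scores
    return (n / (n - 1)) * (1 - item_vars / total_var)

# Simulated data: 100 people answering 20 dichotomous items driven by one trait.
rng = np.random.default_rng(2)
trait = rng.normal(size=(100, 1))
responses = (trait + rng.normal(size=(100, 20)) > 0).astype(int)
print(round(kr20(responses), 3))
print(round(cronbach_alpha(responses), 3))  # identical to KR20 for 0/1 data
```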
USING AND INTERPRETING A COEFFICIENT OF RELIABILITY

Purpose of the reliability coefficient: measures how similar items are within the constructs of a test. It ensures that different items in a test attempting to measure the same idea do so (Dillon et al., 2023).

Nature of the test:
- Homogeneity vs. heterogeneity
- Dynamic vs. static characteristics
- Restriction or inflation of range
- Speed test vs. power test
- Criterion-referenced test

Homogeneity vs. Heterogeneity: a test is said to be homogeneous in items if it is functionally uniform throughout. By contrast, if the test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability.

Dynamic vs. Static Characteristics: a dynamic characteristic is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences, while static characteristics are said to be relatively unchanging.

Restriction or Inflation of Range: if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower. If the variance of either variable is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher.

Speed Test vs. Power Test: when a time limit is long enough to allow test takers to attempt all items, and some items are so difficult that no test taker is able to obtain a perfect score, the test is a power test. By contrast, a speed test generally contains items of a uniform level of difficulty (typically uniformly low) so that, with generous time limits, all test takers should be able to complete all the items correctly.

Criterion-Referenced Test: designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or vocational objective. Unlike norm-referenced tests, criterion-referenced tests tend to contain material that has been mastered in hierarchical fashion.

True Score Model of Measurement and Alternatives to It:
- Classical Test Theory (CTT)
- Domain Sampling Theory
- Generalizability Theory (generalizability study, decision study)
- Item Response Theory (IRT)

Domain Sampling Theory & Generalizability Theory: items in the domain are thought to have the same means and variances as those in the test that samples from the domain, while in generalizability theory test scores vary from testing to testing because of variables in the testing situation.

Classical Test Theory (CTT): the true score model of measurement. The true score is a value that, according to classical test theory, genuinely reflects an individual's ability (or trait) level as measured by a particular test.

Item Response Theory (IRT): refers to a family of theories and methods (and quite a large family at that), with many other names used to distinguish specific approaches. In the context of IRT, discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
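The slides do not commit to a particular IRT model, but one common member of the family, the two-parameter logistic (2PL) model, makes the notion of discrimination concrete. In this hypothetical sketch, a is the discrimination parameter and b the difficulty parameter; a highly discriminating item separates nearby ability levels sharply, while a weakly discriminating one barely does.

```python
import math

def irt_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct response at ability theta.
    a = discrimination, b = difficulty (assumed parameterization)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

for a in (2.0, 0.3):  # high vs. low discrimination, same difficulty
    probs = [round(irt_2pl(theta, a, b=0.0), 2) for theta in (-1.0, 0.0, 1.0)]
    print(f"a={a}: P(correct) at theta = -1, 0, 1 -> {probs}")
# a=2.0 yields roughly [0.12, 0.5, 0.88]; a=0.3 yields roughly [0.43, 0.5, 0.57].
```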
