Psychometrics in RT Assessment

Summary

This document outlines the types of validity and reliability evidence used in recreational therapy (RT) assessment, the statistics used to evaluate each, and examples drawn from healthcare and fitness measurement.

Full Transcript


Psychometrics in RT Assessment (RCTX 3254)

"Psychometrics": the field of study concerned with the theory and technique of educational and psychological measurement, which includes the measurement of knowledge (achievement), abilities, attitudes, and personality traits. The field is primarily concerned with the study of differences between individuals. It involves two major research tasks: (i) the construction of instruments and procedures for measurement, and (ii) the development and refinement of theoretical approaches to measurement.

Validity vs. Reliability

- Reliability: Does the instrument provide similar results when administered a second time? Dependability; STABILITY.
- Validity: "The extent to which an instrument measures what it is supposed to measure" (Berg & Latin, 2004).

Types of Evidence of Validity (Internal vs. External?)

Types of internal validity:
- Logical: face validity, content validity
- Statistical: criterion validity (concurrent, predictive) and construct validity (divergent/convergent)
- Responsiveness

"Face Validity"

- It appears, at "face value," that the instrument measures what it is supposed to measure.
- The weakest form of validity.
- Example: 40-yard dash time as a measure of "speed."

"Content Validity"

- The extent to which the items or questions measure the desired information.
- Best used for questionnaires or written instruments when comparison to another standard is not possible (Berg & Latin, 2004).
- No statistical value. How to measure? An expert panel or jury of authorities.
- Stronger than face validity.

"Concurrent Validity"

- Comparison of scores to an acceptable standard or criterion, the "gold standard" (the most accurate available measure).
- Typically indicated with a Pearson or Spearman correlation.
- Example: underwater weighing (body composition) compared to Body Mass Index.

"Criterion (Predictive) Validity"

- The extent to which a score on a scale or test PREDICTS scores on some other measure.
- Examples: SAT/GRE scores predicting "student success"; the Mini-Mental State Examination (MMSE) score predicting symptoms of dementia.

"Convergent Validity"

- Two similar traits/variables are highly correlated when measured by similar instruments: a high correlation (e.g., .80) that is significantly higher than .00.
- Example: the CES-D and the Beck Depression Inventory given to the same sample.

"Discriminant Validity"

- Two instruments measuring DISSIMILAR constructs have a low correlation between them: a correlation significantly lower than 1.00.
- Example: the CES-D (depression) and Spielberger's State-Trait inventory (anxiety) given to the same sample.

"Responsiveness"

- The ability of the instrument to detect change over time.
- Responsive evaluative measures are able to detect important changes in health during a period of time, even if those changes are small.
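The statistical forms of validity above all come down to correlating two sets of scores. Below is a minimal Python sketch using SciPy; the score values are hypothetical, invented purely for illustration (the slides name the CES-D and Beck instruments but give no data).

```python
# Sketch: statistical validity evidence as a correlation between two
# score sets. All numbers here are hypothetical stand-ins, not actual
# CES-D or Beck data.
from scipy.stats import pearsonr, spearmanr

ces_d = [12, 25, 9, 31, 18, 22, 7, 27, 15, 20]   # hypothetical CES-D totals
beck  = [10, 28, 8, 33, 16, 24, 5, 30, 13, 19]   # hypothetical Beck totals

r, p = pearsonr(ces_d, beck)          # parametric correlation
rho, p_s = spearmanr(ces_d, beck)     # nonparametric alternative

print(f"Pearson r    = {r:.2f} (p = {p:.4f})")
print(f"Spearman rho = {rho:.2f} (p = {p_s:.4f})")

# Convergent validity: expect a high r (around .80) significantly
# above .00. Discriminant validity uses the same computation but
# expects a low correlation, significantly below 1.00.
```

The same correlation machinery serves concurrent validity (e.g., correlating underwater-weighing results with BMI); only the interpretation of the coefficient changes.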
Reliability

"Repeatable", "Trustworthy", "Reliable"

Notes about Reliability

- A measure of reliability refers to the scores or data, NOT to the instrument.
- HIGH RELIABILITY DOES NOT ASSURE VALIDITY.

Types of Reliability Evidence for Assessments

- Stability
- Equivalence
- Internal Consistency
- Objectivity

"Stability"

- Are results on two separate occasions correlated? Also known as the "test-retest" method.
- Stability measures are not accurate for knowledge or paper-and-pencil tests.
- A better indicator for physical fitness (e.g., heart rate and blood pressure) and motor performance.
- The time between administrations should be chosen appropriately. Too long? Too short?
- Statistics: Pearson r, Spearman r, multiple R, standard error of estimate, coefficient of determination.

"Equivalence"

- Scores from two versions of the same test are correlated.
- Used for knowledge tests to determine reliability indices for standardized tests (e.g., ACT or SAT).
- The "parallel or alternate forms method": English vs. Spanish versions, adult vs. child versions, long vs. short forms.
- Statistics: Pearson r or Spearman r.

"Internal Consistency"

- How consistent are scores within a single test?
- How is it evaluated? First half vs. second half, or odd vs. even items.
- Statistics: Pearson r (half-test correlation) and the Spearman-Brown r (whole-test estimate). The Spearman-Brown formula projects the half-test correlation to the full test length: r_whole = 2 * r_half / (1 + r_half).

"Objectivity"

- Consistency of scores across more than one tester; also known as "interrater reliability."
- Best used with behavioral observations or ratings.
- Less than 50% agreement is "poor" or "unacceptable"; behavioral observation should reach 80% agreement or greater.
- Examples in healthcare assessment?
- Statistics: Pearson r, Spearman r, multiple R, coefficient of determination.

What is the inter-rater agreement reliability between these observers?

[Table: ten observation intervals (1-10) with recordings by Observer 1 (PRIMARY) and Observer 2; the observers' entries are not legible in this transcript.]
Key: + = M&M occurrence; 0 = M&M nonoccurrence.
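Two of the statistics above lend themselves to a short worked sketch: the Spearman-Brown whole-test estimate for internal consistency, and the percent-agreement computation the table above asks for. All inputs are hypothetical, since neither a half-test correlation nor the observers' actual recordings appear in the transcript.

```python
# Sketch: two reliability statistics from the slides, computed on
# hypothetical inputs invented for illustration.

# 1) Internal consistency: Spearman-Brown whole-test estimate from a
#    split-half (e.g., odd vs. even items) correlation.
r_half = 0.70                          # hypothetical half-test Pearson r
r_whole = 2 * r_half / (1 + r_half)    # Spearman-Brown prophecy formula
print(f"Spearman-Brown whole-test r = {r_whole:.2f}")   # 0.82

# 2) Objectivity: interval-by-interval inter-rater agreement.
#    '+' = M&M occurrence, '0' = nonoccurrence; these recordings are
#    stand-ins for the table's missing entries.
observer_1 = ['+', '+', '0', '+', '0', '0', '+', '+', '0', '+']  # primary
observer_2 = ['+', '0', '0', '+', '0', '+', '+', '+', '0', '+']

agreements = sum(a == b for a, b in zip(observer_1, observer_2))
pct = agreements / len(observer_1) * 100
print(f"Agreement: {agreements}/{len(observer_1)} intervals = {pct:.0f}%")
# 80% just meets the 80%-or-greater benchmark for behavioral
# observation; below 50% would be rated "poor" or "unacceptable."
```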

Summary

- Validity: Does your instrument measure what is intended?
- Reliability: Does your instrument produce REPEATABLE results?
- Ways to determine whether your assessments are valid and reliable:
  1) Change your current assessment: adopt a valid tool.
  2) Evaluate your current assessment procedures (ATRA SOP).
  3) Test your current assessment.