Week 7 lecture slides.pptx



WRITING AND EVALUATING TEST ITEMS: PART 2
PSY61204 Psychological Tests and Measurements
Dr Michele Anne

Overview
- Interpreting test scores
- Item characteristics
- Norms
- Combining test scores

Interpreting Test Scores

Raw scores
- The total scores obtained on tests are raw scores.
- Raw scores have little meaning on their own. They become meaningful when compared with norms, which show how an individual performed relative to the group or to society.
- For this comparison, raw scores must be converted to derived scores, such as:
  - Percentiles
  - Standard scores
  - T scores
  - Stanines

Percentile
- Percentiles are ranks within the distribution of group scores.
- Each score is assigned a rank based on the group's performance, and an individual's score is converted to its corresponding percentile for comparison.
- A higher percentile rank indicates a higher score.
- The highest percentile rank is 99 and the lowest is 1.
- Limitation: percentiles are on an ordinal scale, so the raw-score gap between adjacent percentiles is not constant (it is narrow near the middle of the distribution and wide at the extremes).

Standard scores
- A standard score (z score) expresses how many standard deviations a raw score lies above or below the mean. This allows comparison across different tests.
- z scores can be positive or negative (above or below the mean).
- $z = \dfrac{X - M}{SD}$, where X is the raw score, M is the mean, and SD is the standard deviation.

T scores
- z scores are converted to T scores (or another standard-score scale) so that results are expressed on a preferred scale with a set mean and standard deviation.
- This makes comparison across tests of the same domain easier and removes decimal values.
- E.g., T scores always have a mean of 50 and an SD of 10 (as on many personality tests). Some educational tests use the same approach with a mean of 500 and an SD of 100.
- To convert a z score to any chosen scale: $T = z(SD_{\text{desired}}) + M_{\text{desired}}$

Stanine
- Stanine ("standard nine") scores convert raw scores to a nine-point scale based on the normal distribution.
- The scale runs from 1 (low) to 9 (high), with a mean of 5 and an SD of 2.
- Scores are assigned according to the percentage of the normal distribution that falls in each band.

[Figure: Stanine]

[Figure: Comparison of derived scores]

Item Characteristics

Item difficulty
- Item difficulty (p) is the proportion of people who answered the item correctly, or gave the keyed response.
- Item difficulties should cover a range, with an average of about p = .50 usually ideal, although the optimum varies with the type of test.
- p is expressed as a proportion or percentage, but can be converted to a z score or T score using a table of normal-curve areas.

Item discrimination
- Item discrimination is the ability of an item to distinguish correctly between people who are higher and lower on the variable being measured.
- What counts as a high or a low score must be defined first.
- This can be based on total test scores (internal consistency) or on an external criterion.
- The underlying approach is either factor analytic (items reflect a dimension) or empirical (items reflect real behaviour).

[Figure: Item discrimination (cont.)]
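The percentile-rank conversion on the Percentile slide can be sketched in Python as below. The lecture gives no formula, so this assumes the common convention of counting half of any tied scores, (below + 0.5 × equal) / N × 100, and clips the result to the 1 to 99 range mentioned on the slide. The norm-group scores are invented for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percentile rank of `score` within a norm group (ties counted as half)."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # scores strictly below
    equal = bisect_right(ordered, score) - below   # scores tied with `score`
    pr = 100 * (below + 0.5 * equal) / len(ordered)
    # Clip to the 1-99 range conventionally used for percentile ranks.
    return min(99, max(1, round(pr)))


# Hypothetical norm-group raw scores.
norm_group = [10, 12, 12, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(18, norm_group))  # 45
print(percentile_rank(30, norm_group))  # 95
```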
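The z-score and T-score formulas from the Standard scores and T scores slides translate directly into code. This is an illustrative sketch: the norm mean of 60 and SD of 8 are invented, and rounding T to a whole number reflects the slide's point that T scores remove decimal values.

```python
def z_score(raw, mean, sd):
    """Number of SDs the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def rescale(z, desired_mean, desired_sd):
    """T = z(desired SD) + desired M, for any target scale."""
    return z * desired_sd + desired_mean


def t_score(raw, mean, sd):
    """Conventional T score (mean 50, SD 10), rounded to a whole number."""
    return round(rescale(z_score(raw, mean, sd), 50, 10))


# Hypothetical norm group: mean 60, SD 8.
z = z_score(72, 60, 8)
print(z)                     # 1.5
print(t_score(72, 60, 8))    # 65
print(rescale(z, 500, 100))  # 650.0
```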
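Stanines can be assigned from percentile ranks using the conventional split of the normal distribution into bands of 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent. The sketch below assumes that split (the slide refers to it only as the percentage across the normal distribution) and keeps a percentile rank that falls exactly on a band boundary in the lower band, which is one of several possible tie conventions.

```python
from bisect import bisect_left

# Cumulative percentage at the top of stanines 1-8
# (bands of 4, 7, 12, 17, 20, 17, 12, 7, 4 percent; stanine 9 takes the rest).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 (low) to 9 (high)."""
    # Count the band tops lying strictly below the rank; a rank equal to a
    # band's top stays in that band.
    return bisect_left(STANINE_UPPER_BOUNDS, percentile_rank) + 1


for pr in (2, 30, 50, 85, 99):
    print(pr, stanine(pr))
```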
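Item difficulty is a simple proportion, and the standard library's `statistics.NormalDist` can stand in for the table of normal-curve areas. The conversion below uses one common convention: it takes the z score with a proportion p of the normal curve above it, so harder items (lower p) receive higher z values. The responses and answer key are hypothetical.

```python
from statistics import NormalDist


def item_difficulty(responses, key):
    """p: proportion of examinees who gave the keyed response."""
    return sum(response == key for response in responses) / len(responses)


def difficulty_z(p):
    """z score with a proportion p of the normal curve above it (0 < p < 1)."""
    return NormalDist().inv_cdf(1 - p)


# Hypothetical responses of ten examinees to one item keyed "A".
responses = ["A", "C", "A", "B", "A", "A", "D", "A", "C", "A"]
p = item_difficulty(responses, "A")
print(p)                          # 0.6
print(round(difficulty_z(p), 2))  # -0.25
```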
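One widely used way to quantify the internal-consistency approach from the Item discrimination slide is the extreme-groups discrimination index D: the item's p in the highest-scoring group minus its p in the lowest-scoring group, often using the top and bottom 27% by total score. The lecture does not name a specific index, so treat this as an illustration. Passing external-criterion scores instead of total scores gives the criterion-based version. The data are invented, and ties in the sorting score are broken by examinee order.

```python
def discrimination_index(item_scores, sort_scores, fraction=0.27):
    """Extreme-groups index D = p(upper group) - p(lower group).

    item_scores: 1 if the examinee answered the item correctly, else 0.
    sort_scores: total test scores (internal criterion) or external criterion scores.
    """
    n = len(sort_scores)
    k = max(1, round(n * fraction))  # size of each extreme group
    # Examinee indices ordered from lowest to highest sorting score.
    order = sorted(range(n), key=lambda i: sort_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for ten examinees.
totals = [5, 8, 12, 15, 18, 20, 22, 25, 28, 30]
item = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(round(discrimination_index(item, totals), 2))  # 0.67
```

D ranges from -1 to +1. Values near zero, or negative, flag items that do not separate high and low scorers.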
Item response theory
- Assumes that performance on a test reflects an unobservable (latent) proficiency variable.
- Four aspects are considered:
  - the person's ability on the variable measured
  - the extent to which the item discriminates between high and low scorers
  - item difficulty
  - the probability that a person with low ability makes a correct response (guessing)

Norms

Selection of norms
- Norms describe the performance of a population and can be obtained by combining the performances of several groups.
- Norms can be based on:
  - random sampling
  - stratified sampling (reflecting the proportions of subgroups in the population)
  - samples of convenience
  - special groups, e.g., by gender, age, education, or geographic location
- Expectancy tables show, for each score range, the percentage of people expected to reach a given criterion, so they can be used to predict what to expect from a certain score.

Criterion-referenced testing
- Scores are compared not with group norms but with a criterion that reflects mastery.
- The criterion can be explicit (a set number of requirements must be fulfilled) or implicit (the examiner's subjective judgement).
- The criterion must be specified first (which combination of skills reflects mastery in that area).

Combining Test Scores

Combining scores
- Scores are combined to obtain meaningful results and to make decisions.
- Item scores can be combined to form total scores.
- Total scores from several tests can be combined to create composite scores for a broader domain.
- This can be done by first converting raw scores to z scores or T scores.
- Differential weighting vs unit weighting
- Subjective judgement of the examiner

Questions?
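The four aspects on the Item response theory slide correspond to the parameters of the three-parameter logistic (3PL) model, a common IRT model, although the lecture does not name a particular model: ability θ (`theta`), discrimination a, difficulty b and guessing c. The sketch below computes the probability of a correct response. Some texts also multiply the exponent by a scaling constant of 1.7, which is omitted here, and the item parameters are hypothetical.

```python
import math


def p_correct(theta, a, b, c):
    """3PL item characteristic curve: P(correct | ability theta).

    a: discrimination (slope), b: difficulty (location),
    c: guessing (lower asymptote, the chance a very low-ability person succeeds).
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# Hypothetical four-option multiple-choice item.
a, b, c = 1.2, 0.5, 0.25
for theta in (-3, -1, 0, 0.5, 1, 3):
    print(f"theta={theta:+.1f}  P(correct)={p_correct(theta, a, b, c):.3f}")
```

At θ = b the probability lies halfway between c and 1, that is (1 + c) / 2.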
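An expectancy table of the kind mentioned on the Selection of norms slide can be built by grouping people into score bands and reporting, for each band, the percentage who later met a criterion. In this sketch the score bands, the pass/fail criterion and the data are all hypothetical.

```python
from bisect import bisect_right


def expectancy_table(scores, met_criterion, band_starts):
    """Percentage meeting the criterion within each score band.

    band_starts: ascending lower bounds of the bands.
    Returns {band lower bound: percentage} for bands that contain anyone.
    """
    counts = {start: [0, 0] for start in band_starts}  # [met, total]
    for score, met in zip(scores, met_criterion):
        idx = bisect_right(band_starts, score) - 1
        if idx < 0:
            raise ValueError(f"score {score} is below the lowest band")
        band = band_starts[idx]
        counts[band][0] += int(met)
        counts[band][1] += 1
    return {start: 100 * met / total
            for start, (met, total) in counts.items() if total}


# Hypothetical aptitude scores and whether each person later passed a course.
scores = [35, 42, 55, 58, 61, 67, 72, 85, 90, 95]
passed = [False, False, True, False, True, True, False, True, True, True]
for start, pct in expectancy_table(scores, passed, [0, 40, 60, 80]).items():
    print(f"band starting at {start}: {pct:.0f}% passed")
```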
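The Combining scores slide describes converting raw scores to a common scale before combining them. This is a minimal sketch, assuming z scores and a simple weighted sum. With unit weighting every test counts equally, while differential weighting lets some tests count more. The test names, norms and weights are hypothetical.

```python
def composite_score(raw_scores, norms, weights=None):
    """Weighted sum of z scores across tests.

    raw_scores: {test: raw score}
    norms: {test: (norm mean, norm SD)}
    weights: {test: weight}; None gives unit weighting (each test weight 1).
    """
    if weights is None:
        weights = {test: 1 for test in raw_scores}
    total = 0.0
    for test, raw in raw_scores.items():
        mean, sd = norms[test]
        total += weights[test] * (raw - mean) / sd  # weight times z score
    return total


# Hypothetical subtests, norms and one examinee's raw scores.
norms = {"verbal": (50, 10), "numerical": (30, 5), "spatial": (100, 15)}
raw = {"verbal": 60, "numerical": 25, "spatial": 130}  # z = 1.0, -1.0, 2.0
print(composite_score(raw, norms))  # 2.0 (unit weighting)
print(composite_score(raw, norms, {"verbal": 3, "numerical": 1, "spatial": 1}))  # 4.0
```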
