PSYC109 – PSYCHOLOGICAL ASSESSMENT MOD 2: LESSONS 2 & 3 MAGBANUA, KEILAH

NORMS & MEANING OF TEST SCORES

❖ Raw Score (RS)
  o normally indicates the number of correct responses obtained on a test
  o is evaluated against the norms: the distribution of scores obtained by the standardization sample, to find out where the individual falls in that distribution
  o for the RS to be more meaningful, it is converted to some relative measure (derived scores)
  o Why convert raw scores to derived scores?
    1. To indicate the individual's relative standing in the normative sample, thus permitting an evaluation of his/her performance in reference to others.
    2. To provide comparable measures that permit a direct comparison of the individual's performance on different tests.

❖ Derived Scores
  o Can be expressed in either of 2 ways:
    1. Developmental level attained
    2. Relative position within the group

DEVELOPMENTAL NORMS

❖ Indicate how far along the normal developmental path the individual has progressed.

A. Mental Age
  o Introduced in the Binet-Simon scales
  o Used in tests where the items are grouped into year levels
    ▪ Basal age – highest age or point at which an individual can answer the items
    ▪ Ceiling age – point at which an individual can no longer answer
    ▪ Mental Age = sum of the basal age and the additional months of credit earned at higher age levels

B. Grade Equivalents
  o Found by computing the mean RS obtained by children in each grade
  o Test children at different times within the school year
  o Successive months over the 10 months of school can be expressed as decimals

C. Ordinal Scales
  o Designed to identify the stage reached by the child in the development of specific behavior functions
  o Developmental tasks that an infant/child can do at a specific time

❖ GESELL DEVELOPMENTAL SCHEDULES
  o Show the approximate developmental level in months that the child has attained in 4 major areas of behavior: motor, adaptive, language, personal-social
  o Description of behavior typical of successive ages in such functions as locomotion, sensory discrimination, linguistic communication, and concept formation

❖ Jean Piaget
  o Studied the development of cognitive processes from infancy to the mid-teens
  o Attainment of one stage is dependent upon completion of the earlier stages in the development of the concept

WITHIN-GROUP NORMS

❖ An individual's performance is compared with that of the standardization group.

A. Percentiles
  o Scores are expressed in terms of the percentage of persons in the standardization sample who fall below a given RS
  o Indicate the individual's relative position in the standardization sample
  o Rank in a group of 100

B. Standard Scores
  o Express the individual's distance from the mean in terms of the SD of the distribution
    ▪ Z Score or Standard Score (SS) – obtained by subtracting a constant from each raw score and dividing the result by another constant
      z: mean = 0, SD = 1
      SS: mean = 500, SD = 100
    ▪ Normalized Standard Scores – expressed in terms of a distribution that has been transformed to fit a normal curve
      T score: mean = 50, SD = 10
      Stanine (standard nine): mean = 5, SD = 2
      Deviation IQ (DIQ): mean = 100, SD = 15
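Because z, T, DIQ, and SS scores all re-express the same distance from the mean, the conversions can be shown in a few lines. The Python sketch below is illustrative only: the norm-group mean and SD are hypothetical values, and true normalized standard scores would additionally require the normal-curve transformation described above, which this linear re-scaling does not perform.

```python
# Minimal sketch: converting a raw score to common standard-score scales.
# The norm-group mean and SD below are hypothetical illustration values.

def to_z(raw, norm_mean, norm_sd):
    """z score: subtract a constant (the mean) and divide by another (the SD)."""
    return (raw - norm_mean) / norm_sd

def rescale(z, new_mean, new_sd):
    """Re-express a z score on a scale with the given mean and SD."""
    return new_mean + z * new_sd

z = to_z(raw=62, norm_mean=50, norm_sd=8)           # hypothetical norm group
print(f"z score      : {z:.2f}")                    # mean 0,   SD 1
print(f"T score      : {rescale(z, 50, 10):.1f}")   # mean 50,  SD 10
print(f"Deviation IQ : {rescale(z, 100, 15):.1f}")  # mean 100, SD 15
print(f"SS           : {rescale(z, 500, 100):.0f}") # mean 500, SD 100

# Stanines (mean 5, SD about 2) are conventionally rounded and clipped to 1..9:
stanine = min(9, max(1, round(rescale(z, 5, 2))))
print(f"Stanine      : {stanine}")
```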
-------------------------------------------------------------

NORM CONSTRUCTION: PERCENTILE RANKS

❖ Preparing the score distribution
  o The following steps are suggested for preparing the distribution of total scores:
    1. Look through the scores to find the highest and the lowest score.
    2. Set up two-score intervals running from the highest down to the lowest score (Ex: 306-307, etc.). Care should be taken that no score groups overlap.
    3. List the score intervals from the highest to the lowest on the score distribution sheet or on a sheet of lined paper.
    4. Tally the number of students earning scores falling in each group.
    5. Sum the tallies in each score interval and record these frequencies.

❖ Computing Percentile Ranks
  o Percentile ranks are computed from cumulative frequencies, obtained by adding frequencies in the score distribution from the bottom up, so that the number opposite each score group equals the sum of the frequency for that group and the frequencies for all the groups below it. The number opposite the top score group should equal the total number of students in the norms group.
  o The percentile rank for any score group is determined by the following steps:
    a. Find one-half the frequency for that score group.
    b. Add the result of (a) to the cumulative frequency for the score group just below the group in question.
    c. Divide the result of (b) by the total number of students in the norms group, taking the answer to the nearest hundredth.
    d. Multiply the answer from (c) by 100.
  o Percentiles are points in a continuous distribution below which given percentages of N lie.
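The four steps above translate directly into code. Here is a minimal Python sketch of the procedure, using a small hypothetical frequency table (score groups listed from lowest to highest) purely for illustration:

```python
# Minimal sketch of the percentile-rank procedure described above.
# `groups` maps each two-score interval to its frequency; the data are hypothetical.
groups = [("300-301", 2), ("302-303", 5), ("304-305", 9), ("306-307", 3)]  # low -> high

n = sum(freq for _, freq in groups)  # total number of students in the norms group

cumulative_below = 0  # cumulative frequency of all groups below the current one
for interval, freq in groups:
    half_freq = freq / 2                        # step (a)
    midpoint_cf = cumulative_below + half_freq  # step (b)
    proportion = round(midpoint_cf / n, 2)      # step (c): nearest hundredth
    percentile_rank = proportion * 100          # step (d)
    print(f"{interval}: PR = {percentile_rank:.0f}")
    cumulative_below += freq
```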
-------------------------------------------------------------

RELIABILITY

❖ Refers to the consistency of scores obtained by the same persons when they are re-examined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions.
❖ Can be expressed in terms of a correlation coefficient.
❖ Correlation coefficient (r)
  o expresses the degree of correspondence or relationship between two sets of scores
  o may be computed in different ways, depending on the nature of the data
  o reliability coefficients usually fall in the .80 to .90 range

CORRELATION FORMULA

❖ Pearson Product Moment Correlation
  o for interval scores
❖ Spearman's Rank Correlation
  o for ordinal data

❖ Interpretation of Correlation Value
    0.00             zero correlation
    ±0.01 to ±0.20   negligible correlation
    ±0.21 to ±0.40   low or slight correlation
    ±0.41 to ±0.70   marked or moderate relationship
    ±0.71 to ±0.90   high relationship
    ±0.91 to ±0.99   very high relationship
    ±1.00            perfect correlation
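The notes name the two coefficients, but the formula images did not survive the transcript. For reference, a minimal Python sketch of both is given below: Pearson r is the standard product-moment formula, and Spearman's coefficient is computed here from ranks via 1 - 6*sum(d^2) / (n(n^2 - 1)), which assumes no tied ranks. The two score lists are hypothetical test and retest results.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation for interval-level scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman rank correlation: 1 - 6*sum(d^2)/(n(n^2-1)), assuming no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d2) / (n * (n ** 2 - 1))

# Hypothetical first and second administrations for five examinees:
test1 = [12, 18, 15, 20, 9]
test2 = [14, 17, 16, 19, 10]
print(f"Pearson r    : {pearson_r(test1, test2):.2f}")
print(f"Spearman rho : {spearman_rho(test1, test2):.2f}")
```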
Error of Measurement

❖ Refers to the range of fluctuation likely to occur in a single score as a result of irrelevant or unknown chance factors.

    X = T + E
    X – observed score
    T – true score
    E – error
    hence, E = X – T

❖ Systematic Error
  o error which is consistent across uses of the measurement tool (i.e., test or scale) and is likely to affect validity, but not reliability
  o examples include an incorrectly worded item, poorly written directions, and inclusion of items unrelated to the content, theory, etc. upon which the measurement tool is based

❖ Random Error
  o exerts a differential effect on the same examinee across different testing sessions
  o this inconsistency affects reliability
  o varies from examinee to examinee; there is no consistency in the source of error
  o Sources of random error:
    1. Individual examinee variations
    2. Administration condition variations such as noise, temperature, lighting, seat comfort
    3. Measurement device bias, e.g., gender, culture, religion, language, ambiguous wording of test items
    4. Participant bias, e.g., guessing, motivation, cheating, & sabotage
    5. Test administrator bias such as nonstandard directions, inconsistent proctoring, scoring errors, inconsistent score or results interpretation

TYPES OF RELIABILITY

❖ Can be identified through one or more testing occasions, one or more test instruments, or both.
❖ Internal: extent to which a measure is consistent within itself.
❖ External: extent to which a measure varies from one use to another.

1. Test-retest
  o involves administering the same test twice to the same person or group after a certain time interval has elapsed
  o measures consistency of scores over time
  o the longer the time interval, the lower the reliability
  o keep the interval short; it should rarely exceed 6 months
  o Procedure:
    a. Administer the test.
    b. Wait awhile (preferably two weeks), then administer it again to the same individuals.
    c. Compute the correlation between the two results.
  o Test is considered reliable if the correlation is high (e.g., above +.90).
  o Example:
    A Math test is given to students on Monday and given again the next Monday, without any Math lessons being taught between these times. The scores of the students from the first administration and their scores from the second administration are correlated. The resulting index is the reliability coefficient.
  o Disadvantages:
    a. When the time interval is short, the respondents may recall their previous responses, and this tends to make the correlation coefficient high (memory effect).
    b. When the time interval is long, factors such as unlearning and forgetting may result in a low correlation of the test.
    c. Environmental conditions such as noise, temperature, lighting, and other factors may affect the correlation coefficient of the test.

2. Alternate-form
  o also called equivalent or parallel reliability
  o uses two different but equivalent forms of the test administered to the same group
  o fluctuation in performance depends on the different form of the test, not on time
  o How to ensure a parallel form of a test:
    a. contains the same number of items
    b. expressed in the same form
    c. covers the same type of content
    d. equal range and level of difficulty

3. Split-half
  o also called internal consistency
  o two scores are obtained separately for each person by dividing the test into equivalent halves
  o use odd & even items
  o the longer the test, the higher the reliability
  o Spearman-Brown formula – used to estimate the reliability of the full-length test from the half-test correlation
  o Procedure:
    a. Administer the test once.
    b. Randomly split items into two groups with half the items in each group (split halves / odd and even items).
    c. Score the split halves separately.
    d. Compute Pearson r on the resulting pairs of scores.
  o Advantages:
    a. Requires only one testing session.
    b. Eliminates the possibility that the variable being measured will change between measurements.
  o Disadvantage:
    a. No guarantee that the two "split halves" are equivalent. If not, then the method underestimates the reliability of the test.

4. Kuder-Richardson
  o also called inter-item consistency
  o based on the consistency of responses to all items in the test
  o used in tests scored as 1 or 0, right or wrong
  o the more homogeneous the domain, the higher the consistency
  o influenced by 2 factors:
    a. content sampling
    b. heterogeneity of the behavior domain sampled
  o Kuder-Richardson Formula 21 (KR21) – requires only three pieces of information: the number of items on the test (K), the mean of the set of test scores (M), and the standard deviation of the set of test scores (SD). Note: KR21 can be used only if it can be assumed that the items are of equal difficulty.

5. Coefficient alpha
  o also called Cronbach's alpha
  o used for items not scored as 1 or 0
  o also used in tests where two or more scoring weights are assigned to answers
  o personality or attitude scales
  o used in calculating the reliability of items in essay tests where more than one answer is possible

6. Scorer reliability
  o also called interscorer or interrater reliability
  o the score depends upon the judgment of the scorer
  o examiner variance or scorer variance
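To make the single-session estimates concrete, here is a minimal Python sketch of split-half reliability with the Spearman-Brown correction, KR21, and Cronbach's alpha, applied to a small hypothetical 0/1 item matrix. The formulas are the standard ones (Spearman-Brown: r = 2r_half / (1 + r_half); KR21: (K/(K-1)) * (1 - M(K - M)/(K * SD^2)); alpha: (k/(k-1)) * (1 - sum of item variances / variance of totals)); population variances are used throughout, and statistics.correlation requires Python 3.10+.

```python
from statistics import correlation, mean, pvariance

# Hypothetical item-response matrix: rows = examinees, columns = items (1 = correct).
scores = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

totals = [sum(row) for row in scores]
k = len(scores[0])

# Split-half: score odd and even items separately, correlate, then apply
# the Spearman-Brown correction: r_full = 2 * r_half / (1 + r_half).
odd = [sum(row[0::2]) for row in scores]
even = [sum(row[1::2]) for row in scores]
r_half = correlation(odd, even)
r_split = 2 * r_half / (1 + r_half)

# KR21 (assumes items of equal difficulty), from K, M, and SD^2 only:
m, var = mean(totals), pvariance(totals)
kr21 = (k / (k - 1)) * (1 - m * (k - m) / (k * var))

# Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance of totals).
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / var)

print(f"Split-half (Spearman-Brown): {r_split:.2f}")
print(f"KR21                       : {kr21:.2f}")
print(f"Cronbach's alpha           : {alpha:.2f}")
```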
RELIABILITY STANDARDS

❖ For instruments where groups are concerned: 0.80 or higher is adequate.
❖ For decisions about individuals: 0.90 is the bare minimum; 0.95 is the desired standard.

TECHNIQUES FOR MEASURING RELIABILITY

                        One test form       Two test forms
  One test session      Split-half          Alternate form (immediate)
                        Kuder-Richardson
                        Coefficient Alpha
  Two test sessions     Test-retest         Alternate form (delayed)

SOURCES OF ERROR VARIANCE

  Type of reliability coefficient         Error variance
  Test-retest                             Time sampling
  Alternate-form (immediate)              Content sampling
  Alternate-form (delayed)                Time and content sampling
  Split-half                              Content sampling
  Kuder-Richardson & Coefficient Alpha    Content sampling & content heterogeneity
  Scorer                                  Interscorer differences

FACTORS AFFECTING RELIABILITY

❖ How closely standardized procedures and conditions for assessment are followed.
❖ How well questions and tasks are phrased.
❖ Variation within the testing situation. Errors in the testing situation (e.g., students misunderstanding or misreading test directions, noise level, distractions, and sickness) can cause test scores to vary.
❖ The anxiety or readiness of the students for the assessment: assessing students when they are tired or after an exciting event is less likely to produce reliable results.
❖ The number of tasks in the test or assessment: more tasks will generally lead to higher reliability.
❖ The spread of scores produced by the assessment: the larger the spread of results, the higher the reliability.
❖ The clearness of marking guides and checking of marking procedures. Scoring errors (e.g., inconsistent scoring) will depress a reliability estimate. Keep scoring simple and consistently applied.
❖ Item quality. Poorly constructed test items introduce ambiguity into the testing situation, thus affecting examinee performance.
❖ The suitability of the questions or tasks for the students being assessed: questions that are too hard or too easy for the students will not increase reliability.
❖ The training of the assessors.
❖ The wording of the rubric: carefully worded rubrics make it easier to decide on achievement levels.
❖ Test length. Generally, the longer a test is, the more reliable it is. If a test is too short, the reliability coefficient is low.
❖ Speed. The rate at which an examinee works will systematically influence performance.
❖ Group homogeneity. In general, the more heterogeneous the group of examinees, the higher the correlation coefficient, or the more reliable the measure will be.
❖ Item difficulty. When there is little variability among test scores, the reliability will be low.
❖ Objectivity. Objectively scored tests, rather than subjectively scored tests, show a higher reliability.
❖ Test-retest interval. The shorter the time interval between two administrations of a test, the less likely that changes will occur and the higher the reliability will be.
❖ Other threats: these include differences in content on test or measurement forms; administration, examinee, and/or scoring errors; guessing; and effects of memory, practice, boredom, etc.
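Several of these factors are quantitative, and the test-length factor in particular can be illustrated with the general form of the Spearman-Brown formula mentioned under split-half reliability: r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened. The sketch below is a demonstration only; the starting reliability of 0.60 is a hypothetical value.

```python
# Illustration of the "test length" factor via the general Spearman-Brown
# prophecy formula. The starting reliability is a hypothetical value.

def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is lengthened n times."""
    return n * r / (1 + (n - 1) * r)

r_original = 0.60  # hypothetical reliability of a short test
for n in (1, 2, 3, 4):
    print(f"{n}x test length -> predicted reliability = {spearman_brown(r_original, n):.2f}")
```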