PSYCH ASSESS LESSON 4.pdf




1st Semester S.Y. 2024-2025
Far Eastern University – Institute of Arts and Sciences
PSY1207 PSYCHOLOGICAL ASSESSMENT
Property of LARA, PIA ALTHEA S.

Chapter No. #: Reliability and Validity

1.0 Reliability

- Reliability refers to the consistency of scores obtained by the same persons when they are reexamined with the same test on different occasions.
- In short: consistency of measurement.

1.1 History and Theory of Reliability

- Psychology vs. physical science: the measurement task in psychology is more difficult.

Spearman's Early Studies

- Charles Spearman (British psychologist) advanced the development of reliability assessment (psychology owes him a great deal).
- Karl Pearson developed the product-moment correlation.
- Abraham De Moivre introduced the basic notion of sampling error.
- Spearman worked out the basics of contemporary reliability theory.
- 1904: Spearman published "The Proof and Measurement of Association between Two Things" in the American Journal of Psychology. The article caught the attention of Edward L. Thorndike.
- Edward Thorndike, a measurement pioneer, was writing the first edition of "An Introduction to the Theory of Mental and Social Measurements" in 1904.
- 1904: many developments on both sides of the Atlantic Ocean led to further refinements in the assessment of reliability.
- 1937: an article by Kuder and Richardson introduced several new reliability coefficients.
- Cronbach and colleagues made a major advancement by developing methods for evaluating many sources of error in behavioral research.

1.2 Basics of Test Score Theory

- Classical test score theory: each person has a true score that would be obtained if there were no error in measurement.
- Conceptualization of error: a score on an ability test is presumed to reflect not only the test taker's true score on the ability being measured, but also error.
- Error: the component of the observed test score that does not have to do with the test taker's ability.
- The difference between the true score and the observed score results from measurement error:
  X (observed score) = T (true score) + E (error)
  X − T = E

Reasons why observed scores are higher or lower than true scores:
1. The test itself
2. The test taker
3. The environment
4. How the test was scored

- Measurement error: factors associated with the process of measuring some variable, other than the variable being measured. (Example: whiz kids)
- Random error: a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. (Example: unanticipated physical events happening within the test taker, such as high or low blood pressure or blood sugar, or emotional concerns.)
- Systematic error: a source of error in measuring a variable that is typically constant, or proportionate to what is presumed to be the true value of the variable being measured. Once a systematic error becomes known, it becomes predictable and fixable.
- The reliability coefficient ranges from 0.00 to 1.00.
- Classical test theory uses the standard deviation of errors as the basic measure of error, also called the standard error of measurement.
  - The SD tells us about the average deviation around the mean.
  - The standard error of measurement tells us, on average, how much a score varies from the true score.
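To make X = T + E concrete, here is a minimal Python sketch (an illustration added to these notes, not part of the original lecture) that simulates observed scores as true scores plus random error, then recovers the reliability coefficient as the ratio of true-score variance to observed-score variance, and the standard error of measurement from it. All parameter values are arbitrary assumptions; in real testing, T is never observed directly.

```python
import numpy as np

rng = np.random.default_rng(0)

n_people = 1000
true_scores = rng.normal(loc=50, scale=10, size=n_people)  # T
errors = rng.normal(loc=0, scale=5, size=n_people)         # E (random error)
observed = true_scores + errors                            # X = T + E

# Reliability in classical test theory: var(T) / var(X)
reliability = true_scores.var() / observed.var()

# Standard error of measurement: SD of X times sqrt(1 - reliability)
sem = observed.std() * np.sqrt(1 - reliability)

print(f"Estimated reliability: {reliability:.2f}")   # about .80 here
print(f"Standard error of measurement: {sem:.2f}")   # about 5, the error SD
```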
- Domain sampling model: another central concept in classical test theory. This model considers the problem created by using a limited number of items to represent a larger and more complicated construct. (Example: spelling ability.)
- The reliability of a test increases as the number of items increases.
- Item response theory (IRT): the most important new development relevant to psychometrics; most of the methods for assessing reliability still depend on classical test theory.

Classical Test Theory vs. Item Response Theory
- Classical test theory requires that the exact same test items be administered to each person.
- Item response theory focuses on the range of item difficulty, which helps assess an individual's ability level.

1.3 Models of Reliability

Time Sampling: The Test-Retest Method
- Test-retest reliability estimates are used to evaluate the error associated with administering a test at two different times.
- Compare the scores of individuals who have been measured twice with the same instrument.
- Of value only when the measured characteristic does not change over time.
  - Children = shorter intervals (developmental progress during the early ages is fast).
- Other concerns:
  - Carryover effects occur when the first testing session influences scores from the second session.
    - Systematic carryover: a consistent influence that affects everyone the same way (example: everyone's score improves by exactly 5 points).
    - Random carryover: the changes are not predictable from earlier scores, or something affects some but not all test takers.
  - Practice effect: one type of carryover effect; some skills improve with practice.

Item Sampling: Parallel Forms Method
- Parallel forms reliability compares two equivalent forms of a test that measure the same attribute.
- The two forms use different items, but the rules used to select items of a particular difficulty level are the same.
- The two forms of the same test should:
  1. Be independently constructed tests designed to meet the same specifications
  2. Contain the same number of items
  3. Have items expressed in the same form
  4. Have items that cover the same type of content
  5. Have items with the same range and level of difficulty
  6. Have the same instructions, time limits, illustrative examples, format, etc.

Split-Half Reliability
- Two scores are obtained for each person by dividing the test into equivalent halves.
- Advantage: only one test administration is needed to compute the reliability coefficient.
- Problem: how to split the test.

VARIETY OF WAYS TO SPLIT THE TEST:
1. First half and second half of the items, when the items are arranged according to order of difficulty
2. Odd-even system, when the items get progressively more difficult
3. Other splits (e.g., items 1 and 2 go into the first score)

Notes on the split-half method:
a. Coefficients computed by this method are called coefficients of internal consistency.
b. The reliability of a test is directly related to its length. When this method is used, we actually cut the length of the original test in half.
c. The reliability coefficient we compute is therefore the equivalent of one for a test half the size of the original test.

Spearman-Brown Prophecy Formula: estimates what the correlation between the two halves would have been if each half had been the length of the whole test.

Kuder-Richardson 20 (KR20) Formula
- For tests whose items are scored as right/wrong or according to some other all-or-none system (dichotomous items).
- Simultaneously considers all possible ways of splitting the items.
- When the items covary, they can be assumed to measure the same general trait, and the reliability of the test will be high.
  - Covariance occurs when the items are correlated with each other.

Coefficient Alpha
- Used where one receives a different numerical score on an item, depending on whether one checks "usually, sometimes, rarely, or never."
- Computed from the variance of all individuals' scores for each item, adding these variances across all items (the covariance matrix).
- An index of the internal consistency of the items (their tendency to correlate positively with one another).
- An indicator of the interrelatedness of the individual items, but not synonymous with the unidimensionality of what the test/scale measures: a scale can contain two or more distinct factors and yet still possess a very strong coefficient alpha.
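The three internal-consistency estimates above have simple closed forms. Below is a minimal Python sketch (an added illustration, not from the original notes) assuming `scores` is a people-by-items NumPy array; the function names are my own.

```python
import numpy as np

def split_half_reliability(scores):
    """Odd-even split, corrected with the Spearman-Brown prophecy formula."""
    odd = scores[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)     # correct half-test r to full length

def kr20(scores):
    """KR20 for dichotomous (0/1) items."""
    k = scores.shape[1]
    p = scores.mean(axis=0)              # proportion passing each item
    q = 1 - p                            # p*q is each item's variance
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

def coefficient_alpha(scores):
    """Cronbach's alpha for items with any numeric scoring."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

For dichotomous 0/1 items, coefficient alpha reduces to KR20 (up to the sample/population variance convention used).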
2.1 Reliability in Behavioral Observation Studies

- Psychologists with behavioral orientations prefer not to use psychological tests; they favor direct observation of behavior.
- Reliability estimates here: interrater, interscorer, interobserver, or interjudge reliability.
  - These consider the consistency among different judges who evaluate the same behavior.
  - Employed when the instruments used are subjectively scored.
- Ways to assess agreement:
  - Percentage of time that the two (or more) observers agree.
  - Kappa statistic, the best method.
- Kappa statistic: the best method for assessing the level of agreement among several observers (a computational sketch appears at the end of this section).
  - Introduced by J. Cohen (1960) as a measure of agreement between two judges who each rate a set of objects using nominal scales.
  - Indicates the actual agreement as a proportion of the potential agreement following correction for chance agreement.
  - Values of kappa may vary between 1 (perfect agreement) and −1 (less agreement than can be expected on the basis of chance alone).
- Behavioral observation is difficult and expensive. It may soon be revolutionized by the advent of new technologies and big-data analysis programs.

SMART SOFTWARE TO IDENTIFY AND RECORD PARTICULAR BEHAVIORS:
1. Cameras
2. Sensors
3. Motion detectors
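Here is a minimal Python sketch (my own illustration, not part of the notes) of the two agreement indices just described: raw percentage agreement, and Cohen's kappa computed as kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The example categories and ratings are hypothetical.

```python
import numpy as np

def percent_agreement(rater_a, rater_b):
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    return (rater_a == rater_b).mean()

def cohens_kappa(rater_a, rater_b):
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    categories = np.union1d(rater_a, rater_b)
    p_o = (rater_a == rater_b).mean()  # observed agreement
    # Chance agreement: product of the raters' marginal proportions, summed
    p_e = sum((rater_a == c).mean() * (rater_b == c).mean() for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Example: two observers coding the same 10 behaviors into nominal categories
a = ["on-task", "off-task", "on-task", "on-task", "off-task",
     "on-task", "off-task", "on-task", "on-task", "on-task"]
b = ["on-task", "off-task", "on-task", "off-task", "off-task",
     "on-task", "on-task", "on-task", "on-task", "on-task"]
print(percent_agreement(a, b))  # 0.80
print(cohens_kappa(a, b))       # about 0.52 after correcting for chance
```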
2.2 How Reliable Is Reliable?

- Tests that are relatively free of measurement error are considered reliable; tests that contain relatively large measurement error are considered unreliable.
- Reliability estimates in the range of .70 to .80 are good enough for most purposes in basic research.
- It has been suggested that reliabilities greater than .95 are not very useful, because they suggest that all of the items are testing essentially the same thing and that the measure could easily be shortened.
- A report from the National Academy of Sciences notes that extremely high reliability might be expected for tests that are highly focused.
- Clinical settings: high reliability is extremely important, because tests are used to make important decisions about someone's future.

2.3 What to Do About Low Reliability?

1. Increase the number of items
   a. The reliability of a test increases as the number of items increases.
   b. When new items are added, the test must be reevaluated.
   c. Applying the Spearman-Brown prophecy formula can estimate how many items will have to be added in order to bring a test to an acceptable level of reliability (see the sketch after this list).
2. Factor and item analysis
   a. The reliability of a test depends on the extent to which all of the items measure one common characteristic.
   b. Factor analysis.
   c. Item analysis (discriminability analysis): when the correlation between performance on a single item and the total test score is low, the item is probably measuring something different from the other items on the test.
3. Correction for attenuation
   a. Potential correlations are attenuated, or diminished, by measurement error.
   b. To use the method, one needs to know only the reliabilities of the two tests and the correlation between them.
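Two of the remedies above have simple closed forms. A minimal Python sketch (an added illustration, not from the notes): the generalized Spearman-Brown prophecy formula solved for the lengthening factor n needed to reach a target reliability, and the classic correction for attenuation, r_true = r_xy / sqrt(r_xx * r_yy). Function names and example values are my own.

```python
import math

def lengthening_factor(current_reliability, target_reliability):
    """Spearman-Brown solved for n: how many times longer the test must be
    to raise its reliability from the current value to the target value."""
    r, rt = current_reliability, target_reliability
    return (rt * (1 - r)) / (r * (1 - rt))

def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Estimated correlation between true scores, given the observed
    correlation and the reliabilities of the two measures."""
    return r_xy / math.sqrt(reliability_x * reliability_y)

# A test with reliability .70 must be about 3.9 times longer to reach .90,
# so a 20-item test would need roughly 77 items:
n = lengthening_factor(0.70, 0.90)
print(n, 20 * n)

# An observed correlation of .35 between two tests with reliabilities .80
# and .70 corresponds to a true-score correlation of about .47:
print(correct_for_attenuation(0.35, 0.80, 0.70))
```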
3.0 Validity

- Validity can be defined as the agreement between a test score or measure and the quality it is believed to measure.
- Validity is the evidence for inferences made about a test score. There are three types of evidence:
  a. Construct-related
  b. Criterion-related
  c. Content-related

3.1 Aspects of Validity

Face Validity
- The mere appearance that a measure has validity.
- The least stringent type of validity (not really a form of validity at all).
- Whether a test looks valid to test users, examiners, and examinees.
- A matter of social acceptability, not a technical form of validity like the rest.
- It is nevertheless crucial that tests possess it; otherwise, those who take the test may be dissatisfied and doubt the value of psychological testing.
- Example (anxiety test item): "My stomach gets upset when I think about taking tests."
- A test could possess extremely strong face validity yet produce totally meaningless scores.

Content Validity
- The only type of validity, besides face validity, that is logical rather than statistical.
- Concerns whether the test covers the behavior domain to be measured, which is built through the choice of appropriate content areas, questions, tasks, and items.
- Has been of greatest concern in the educational setting.
- Experts can review the domain specification and judge whether the test items possess content validity.
- Requires good logic, intuitive skills, and perseverance in evaluating the items.

TWO NEW CONCEPTS RELEVANT TO CONTENT VALIDITY EVIDENCE:
- Construct underrepresentation: failure to capture important components of a construct. (Example: a mathematics test that covers algebra but not geometry.)
- Construct-irrelevant variance: occurs when scores are influenced by factors irrelevant to the construct. (Example: an intelligence test influenced by reading comprehension, test anxiety, or illness.)

Content Validity Ratio
- A method for gauging agreement among raters or judges regarding how essential a particular item is.
- Lawshe (1975) proposed that for each item, raters respond to the question "Is the skill or knowledge measured by this item essential?" by choosing one of:
  - Essential
  - Useful but not essential
  - Not necessary

Criterion-Related Validity
- Indicates the test's effectiveness in predicting or estimating an individual's behavior in a particular situation (the criterion).
- Criterion: any outcome measure against which a test is validated (it can be anything). The criterion must itself be reliable if it is to be a useful index of what the test measures.

TWO SUBTYPES: Concurrent and Predictive
- Predictive validity: scores on a test can predict future behavior or scores on another test taken in the future. (Examples: entrance examinations and employment tests.)
- Concurrent validity: comes from assessments of the simultaneous relationship between the test and the criterion.
- Validity coefficient: the relationship between a test and a criterion, usually expressed as a correlation. Validity coefficients larger than .60 are rare, and coefficients in the range of .30 to .40 are commonly considered adequate.

Construct Validity
- A judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
- Construct: something built by mental synthesis. (Examples: intelligence, love, curiosity, mental health.)
- Construct validity evidence: a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it.
- A researcher investigating a test's construct validity must formulate hypotheses about the expected behavior of high scorers and low scorers.
- These hypotheses give rise to a tentative theory about the nature of the construct the test was designed to measure. (If the results obtained are contrary to the predictions, the test simply does not measure the construct.)

EVIDENCE OF CONSTRUCT VALIDITY:
- The test is homogeneous, measuring a single construct.
- Test scores increase or decrease as a function of age or experimental manipulation.
- Test scores obtained after some event, or the mere passage of time, differ from pretest scores as theoretically predicted.
- Test scores obtained by people from distinct groups differ as predicted.

TWO TYPES OF EVIDENCE:
- Convergent evidence: when a measure correlates well with other tests believed to measure the same construct, convergent evidence for validity is obtained.
  - This evidence shows that measures of the same construct converge, or narrow in, on the same thing.
  - Example: test anxiety vs. general anxiety.
- Discriminant evidence (divergent validation): the evidence needed in test validation to show that the test measures something unique.
  - To demonstrate discriminant evidence for validity, a test should have low correlations with measures of unrelated constructs; this is evidence for what the test does not measure.
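As a closing illustration (mine, not from the notes), convergent and discriminant evidence can be screened by correlating a new scale with an established measure of the same construct and with an unrelated measure; a high and a near-zero correlation, respectively, support validity. The scale names and simulated data below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical scores: a new test-anxiety scale, an established general-anxiety
# scale (same construct family), and an unrelated vocabulary test.
anxiety = rng.normal(size=n)
new_test_anxiety = anxiety + rng.normal(scale=0.5, size=n)
general_anxiety = anxiety + rng.normal(scale=0.7, size=n)
vocabulary = rng.normal(size=n)

convergent_r = np.corrcoef(new_test_anxiety, general_anxiety)[0, 1]
discriminant_r = np.corrcoef(new_test_anxiety, vocabulary)[0, 1]

print(f"Convergent (same construct):   r = {convergent_r:.2f}")   # expect high
print(f"Discriminant (unrelated test): r = {discriminant_r:.2f}") # expect near 0
```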
