Exam 2 Psych Testing PDF
Summary
This document covers different types of validity, including content validity, criterion validity, predictive validity, and concurrent validity. It also details how to establish these types of validity.
Full Transcript
10/27/24, 3:43 PM exam 2 psych testing
Validity: agreement between a test score or measure and the quality it is believed to measure.
Face validity: the extent to which a measure appears to have validity; does not offer evidence to support conclusions drawn from a test and is not a form of validity evidence.
Content validity: determining whether the items on a test are directly related to what they are assessing; logical.
Process of establishing content validity: 1. define the domain of the test; 2. select a panel of qualified experts (NOT the item writers); 3. the panel participates in a process of matching items to the domain; 4. collect/summarize data from the matching process.
Criterion validity: using a current test to infer some performance criterion that is not being directly measured; supported by high correlations between the test score and a well-defined measure.
Process of establishing criterion validity: 1. identify the criterion and the measurement method; 2. identify a representative sample; 3. give the test to the sample and obtain criterion data; 4. determine the strength of the relationship (correlation) between test score and criterion performance.
Predictive validity: how well a test predicts CRITERION performance in the future.
Concurrent validity: assesses the simultaneous relationship between a test and the CRITERION.
Validity coefficient: the relationship (r) between a test and the related criterion; the extent to which the test is valid for making statements about the criterion.
Validity coefficient squared (coefficient of determination, R²): the relationship between variation in the criterion and knowledge of a test score.
What questions does the coefficient of determination (R²) ask: how much variation in Y will we be able to predict on the basis of X? what % of data points will fall on the regression (prediction) line?
Result of cross-validation: the prediction will be worse; there will be more error in the prediction; fewer points will fall on the regression/prediction line, but the test could still have good predictive validity.
Evaluating validity coefficients: check for restricted range on both the predictor and the criterion; review evidence for validity generalization; consider differential prediction.
Construct validity: constructs often measure things that aren't directly observable; requires an operational definition and a description of relationships with other variables.
Process of establishing construct validity: assemble evidence about what a test means; each relationship that is identified helps provide a piece of the puzzle of what the test means.
Evidence for construct validity: discriminant and convergent evidence.
Convergent evidence: expect a high correlation between 2+ tests that assess the same construct, but not perfect or exceptionally high (>0.90); a correlation of 1 (or very close to 1) means the tests are exactly the same.
Discriminant/divergent evidence: 2 tests of unrelated constructs should have low correlations; discriminates between 2 qualities unrelated to each other.
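The validity coefficient (r) and the coefficient of determination (R²) from the cards above can be computed directly. A minimal sketch; the test scores and criterion ratings below are made-up numbers for illustration only:

```python
# Validity coefficient r = correlation between test scores and a
# criterion measure; R^2 = proportion of criterion variance (Y)
# predictable from the test score (X). Data are invented.
from statistics import mean

test_scores = [52, 61, 70, 48, 80, 66, 59, 73]          # predictor X
job_ratings = [3.1, 3.4, 4.0, 2.8, 4.5, 3.9, 3.2, 4.1]  # criterion Y

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(test_scores, job_ratings)
r_squared = r ** 2  # how much variation in Y we can predict from X

print(f"validity coefficient r = {r:.2f}")
print(f"coefficient of determination R^2 = {r_squared:.2f}")
```

Note that R² drops quickly as r drops: a validity coefficient of 0.30 predicts only 9% of the criterion variance.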
Can drop items measuring unrelated concepts; can create subscales.
Item writing guideline #1: define clearly what you wish to measure.
Item writing guideline #2: generate a pool of items; write 3-4 items for every one you will keep; avoids redundant items in the final test.
Item writing guideline #3: avoid items that are exceptionally long, causing confusion or misleading the test taker.
Item writing guideline #4: be aware of reading level (of the scale and the test taker); you usually want a reading level around 6th grade.
Item writing guideline #5: avoid double-barreled items.
Item writing guideline #6: consider using questions that mix positive and negative wording to avoid response sets.
Dichotomous format: 2 choices for each question; requires absolute judgement; can promote memorization without understanding.
Polychotomous format: has more than 2 options, so the probability of guessing correctly is lower; a formula can be used to correct for guessing.
Distractors: poor distractors hurt reliability and validity; rarely do more than 3 or 4 distractors work well.
Issues with multiple-choice questions, avoid: an unfocused stem; a negative stem; irrelevant info in the stem; unequal question lengths; negative options; clues to the correct answer (use of vague terms like may, might, can); keep the correct option and the distractors in the same general category.
Likert format: rate degree of agreement with a statement; often used for attitude and personality scales; can use factor analysis; odd numbers of options have a center, even numbers do not; past 6 options test takers can't discriminate between choices.
Test development step #1: review the literature; what measures already exist, and how can they be improved?
Test development step #2: define the construct; what domain will you be sampling from?
Test development step #3: test planning and layout; find a representative sample of items that represents the domain well.
Test development step #4: designing the test.
Brief, clear instructions; a manual/directions for administrators.
Designing the test, item difficulty: if items are too difficult or too easy, they will not discriminate between individuals and the test is not informative.
Designing the test, item attractiveness: (for personality tests) whether a test taker is likely to answer yes/true/agree; an item should be rephrased if most people would agree with it.
Test development step #5: item tryout; choose a sample of individuals that matches the target population; the initial version has 1.5-2x more items than the final test; there can be multiple versions with different items, combined later.
Test development step #6: item analysis; people with high levels of the characteristic should get high scores, and there should be a range of scores (not all clumped together). Item difficulty/attractiveness: number correct/marked true; difficulty should vary across the test. Item discrimination index: how well an item discriminates between high and low scorers on the test.
Test development step #7: building the scale; choose items with moderate difficulty and high discriminability.
Test development step #8: standardizing the test; the test is used with a large representative sample under the same conditions and demographics as its intended use; if reliability/validity are sufficient, calculate percentiles etc.; if not, go back to item writing/analysis.
Item difficulty: what % of people got the item correct; most tests have difficulties between 0.3 and 0.7; optimal difficulty is approximately 0.625.
Item discriminability: determines whether people who did well on a particular item have also done well on the entire test.
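The correction-for-guessing formula mentioned under the polychotomous format is conventionally "right minus wrong divided by (options - 1)"; the cards don't spell the formula out, so this sketch assumes that standard version, with invented numbers:

```python
# Standard correction for guessing on a multiple-choice test:
# corrected = right - wrong / (k - 1), where k = options per item.
# Omitted items are not counted as wrong.
def corrected_score(num_right, num_wrong, num_options):
    """Estimate the score with chance guessing removed."""
    return num_right - num_wrong / (num_options - 1)

# e.g., 30 right and 10 wrong on a 5-option test:
print(corrected_score(30, 10, 5))  # 30 - 10/4 = 27.5
```

The more options per item, the smaller the penalty per wrong answer, which matches the card's point that guessing is less of a problem in the polychotomous format.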
Items that are too easy or too hard won't discriminate well (too little variability).
Extreme group method: a calculation of the discrimination index; compares those who did well on the whole test to those who did poorly (top 27% vs bottom 27%).
Discrimination index: P(upper group correct) - P(lower group correct); if the number is negative, the item should be removed; the range is +1 to -1; aim for > +0.3.
Factor analysis: a factor is an unobserved (latent) variable; a data reduction technique; finds the fewest number of distinct factors within a data set.
Factor loading: the correlation between an individual item and the newly found factors.
Types of factor analysis: PCA (principal component analysis) and CFA (confirmatory factor analysis).
Relationship between examiner and test taker: a threat to validity; the examiner's behavior affects test scores; stronger rapport and supportive comments by the administrator produce higher scores; even subtle cues can be important; the score is then not actually representative of the person's ability.
Stereotype threat: a threat to validity; anxiety about how one will be evaluated and how one will perform; members of a stereotyped group feel pressure to disconfirm negative stereotypes.
Growth mindset: the belief that traits can be changed (rather than intelligence being a fixed, inherited trait).
Hypothesized causes of stereotype threat: stereotype threat leads to psychological arousal, distrust, and worse performance; depletes working memory; self-handicapping leads to reduced effort and reduced performance.
Possible remedies for stereotype threat: shift demographic questions to the end of the test; tell the test taker the test isn't expected to show differences.
Language of tests: many tests are highly linguistic, which puts non-native English speakers at a disadvantage; translated tests must be professionally translated and validated, but are often incomplete or not validated, introducing error and bias.
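The item difficulty and extreme-group discrimination index cards above reduce to two small calculations. A minimal sketch; the response pattern and scores are made-up data where one item is answered correctly only by high scorers:

```python
# Item difficulty = proportion correct. Discrimination index (extreme
# group method) = P(correct in top 27%) - P(correct in bottom 27%).
def item_difficulty(item_responses):
    """Proportion of test takers answering the item correctly (1 = correct)."""
    return sum(item_responses) / len(item_responses)

def discrimination_index(item_responses, total_scores, pct=0.27):
    """P(upper group correct) - P(lower group correct); range -1 to +1."""
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    n = max(1, round(len(order) * pct))       # size of each extreme group
    bottom, top = order[:n], order[-n:]
    p_upper = sum(item_responses[i] for i in top) / n
    p_lower = sum(item_responses[i] for i in bottom) / n
    return p_upper - p_lower

scores = list(range(1, 11))             # total test scores for 10 people
item = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # correct only for high scorers
print(item_difficulty(item))            # 0.5
print(discrimination_index(item, scores))  # 1.0
```

A negative index would mean low scorers beat high scorers on the item, which is why the cards say such items should be removed.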
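The factor analysis cards above describe a data-reduction idea: find the fewest latent factors behind a set of items. A minimal PCA sketch under made-up data (one simulated latent factor driving four items); real analyses would use a dedicated statistics package:

```python
# PCA on an item correlation matrix: eigenvalues give variance explained
# per component; loadings = eigenvector * sqrt(eigenvalue) are the
# item-component correlations. Data are simulated, not real test data.
import numpy as np

rng = np.random.default_rng(0)
latent = rng.normal(size=100)                 # one underlying factor
items = np.column_stack([latent + rng.normal(scale=0.5, size=100)
                         for _ in range(4)])  # four items loading on it

corr = np.corrcoef(items, rowvar=False)       # 4x4 item correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
explained = eigvals[::-1] / eigvals.sum()     # variance explained, descending

# loadings of each item on the first (largest) component
loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
print(explained[0])   # first component dominates
print(loadings)       # all four items load strongly on it
```

Because all four items share one latent factor, a single component explains most of the variance, which is the "fewest distinct factors" outcome the card describes; CFA, by contrast, tests a prespecified factor structure.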
Tests are valid only if validity holds for those for whom the testing language isn't native.
Training test administrators: often requires a high level of training; standardization is an important part of a test's validity; research finds a minimum of 10 practice administrations is necessary to gain competence.
Expectancy effects: data can be impacted by what we expect to find; can be subtle and is often unintentional; adds bias (error) into the assessment.
Effects of reinforcing responses: test takers (particularly kids) work hard for approval/praise; reactions given to a given answer can influence future answers; reward (praise) improves IQ test scores.
How reinforcing responses hurts testing: it violates standardized administration protocols; it shifts the test away from measuring solely the test taker's own ability.
Situations that might legitimately call for adjustments to standard administration: a learning disability, visual impairment, or the person being upset.
Advantages of computer-assisted test administration: highly standardized; individually tailored sequential administration; precise timing of responses; releases human testers for other tasks; patience; control of bias.
Disadvantages of computer-administered tests: reading comprehension is worse on screen than on paper; test takers can't underline, cross things out, or take notes; paper tests allow flipping between questions; misinterpretation is a risk, since clinical judgement is still needed to give context.
often unstandardized interview nondirective: largely directed by client interviewer asks fewer questions selective interview identify qualifications and capabilities usually more standardized emotional/cognitive diagnostic interview functioning method for gathering info, used to describe interview as a test make predictions or both, reliability and validity are eval, each type has a defined purpose some tests cannot be properly done w/o an interview sometimes the interview is the test what is the role of an itnerview itself: selection for employment, diagnostic interviews human interactions any time we interact we interviews are reciprocal impact e/o we tend to act like those around us if interviewer acts defensive so will client. good interviewers remain in control and set the tone social facilitation not reacting to clients tension/anxiety with more, remain relaxed confident calming effect on client more of attitude than skill. warmth, (body effective interviewing language, tone of voice, genuineness, acceptance, honesty, fairness https://knowt.com/flashcards/a1ef86d0-9920-483a-b4a0-8c89601a4304?isNew=true 8/10 10/27/24, 3:43 PM exam 2 psych testing cold, defensive, uninterested/uninvolved, aloof interviewers are rated poorly when seen as (hard to connect with), bored judgmental/evaluative statements, probing responses to avoid statements (why did you do that-demanding), hostile/angry statements, false reassurance keep interaction flowing smoothly. use open ended questions when possible, respond without interruption. urge/prompt to continue effective responses with “yes” “and” “bc”. verbatim playback, paraphrasing/restatement, empathy/understanding, summarizing/clarification verbatim feedback repeat last responses exactly less structured, lets interviewee lead topic of why use open ended questions for effective interest/importance, requires client to prod responses something spontaneously. 
Summarizing/clarification: pulls together meaning across responses; goes just beyond what the client said to tie things together; ensures understanding of what they're saying.
Paraphrasing/restatement: similar to verbatim playback; captures the meaning without adding anything additional; shows you're listening.
Empathy/understanding: may or may not be a reflection of what was said; infers and states emotions that were implied.
Developing interviewing skills: understand theory and research; get supervised practice; make a conscious effort to apply interviewing principles; requires constant self-evaluation: am I communicating that I understand? what am I communicating nonverbally?
Halo effect: judging a person during the interview based on a first impression (the first few minutes); can be favorable or unfavorable; impairs objectivity, adding error; even in highly structured interviews, impressions made during rapport building can influence the evaluation.
General standoutishness: judging on the basis of one outstanding characteristic; prevents objective evaluation; leads to unwarranted inferences based on that characteristic.
Unstructured interviews: low reliability; no standardization; clients give different answers based on interviewers' interpersonal characteristics, and interviewers ask different questions; yet interviews tend to give fairer outcomes than other tests (better validity with worse reliability).
Structured interviews: can have high levels of reliability; may limit the content obtained during the interview.