Psychology: Scorer Differences in Assessment

Questions and Answers

Which method is used to determine the level of agreement between two or more raters when the assessment is on a categorical scale?

  • Krippendorff's Alpha
  • Generalizability Theory
  • Fleiss Kappa (correct)
  • Cohen's Kappa

What is a Dynamic trait in the context of psychological assessment?

  • A trait that barely changes or remains relatively unchanging
  • A measure that focuses on item difficulty
  • A variable that provides an indication of where a test taker stands with respect to a criterion
  • A characteristic presumed to be fast-changing as a function of situational and cognitive experience (correct)

What does Generalizability Theory suggest about a person's test score?

  • The test score genuinely reflects an individual's ability level
  • A person's test score is completely variable and cannot be predicted
  • Item difficulty is the primary consideration
  • Given the exact same conditions, the exact same test score should be obtained (correct)

According to Item Response Theory, what is the primary focus?

Answer: The probability that a person with X ability will be able to perform at a level of Y

    What is indicated by a high degree of internal consistency in a test designed to measure one factor?

    Answer: The test has a high degree of reliability

    What does Krippendorff's Alpha measure?

    Answer: Agreement among two or more raters, based on observed disagreement corrected for expected disagreement by chance

    In Classical Test Theory, what is the 'True Score'?

    Answer: A genuinely reflective measurement of an individual's ability level on a particular test

    What is the purpose of a Decision Study in psychological assessment?

    Answer: To examine the usefulness of test scores in helping the test user make decisions

    How is the reliability of Speed Tests typically evaluated?

    Answer: Through test-retest, alternate-forms, or split-half reliability across independent testing periods

    What does Domain Sampling Theory estimate in psychological assessments?

    Answer: The extent to which specific sources of variation under defined conditions are contributing to the test scores

    Which type of validity is judged by how well a test score can be used to infer an individual's probable standing on a measure of interest?

    Answer: Criterion Validity

    What does the Halo Effect in rating errors refer to?

    Answer: Tendency to give high scores due to failure to discriminate among distinct aspects

    Which term describes the use of additional predictors to explain the criterion measure beyond what is explained by existing predictors?

    Answer: Incremental Validity

    Which factor analysis type is used for estimating factors and deciding how many to retain?

    Answer: Exploratory Factor Analysis

    Which statistical procedure used in test development helps prevent bias and ensure accurate, impartial measurement?

    Answer: Estimated True Score Transformation

    What is criterion contamination?

    Answer: When the criterion measure includes aspects of performance that are not part of the job

    In the context of construct validity, what is a construct?

    Answer: An unobservable, scientific idea developed to explain behavior

    What does a high Factor Loading signify?

    Answer: High influence of the factor on the test scores

    What is the purpose of co-validation?

    Answer: Validation of more than one test using the same group

    The concept of fairness in psychological assessment refers to:

    Answer: Use of the test in an impartial, just, and equitable manner

    Which of the following best describes item sampling or content sampling in psychological assessment?

    Answer: Variation among items within a test and between tests

    Which type of error is caused by influences such as noise or weather conditions during testing?

    Answer: Random error

    In the context of psychological assessment, what does the True Score Formula aim to represent?

    Answer: The estimated true score accounting for variance

    Which of the following is an appropriate use of test-retest reliability?

    Answer: Evaluating a test that measures an unchanging attribute

    What effect might occur if the interval between test-retest administrations is short?

    Answer: Practice effect

    What is the main difference between parallel forms and alternate forms of a test?

    Answer: Parallel forms have different items but same true score

    Which technique is used to avoid carryover effects in parallel forms of a test?

    Answer: Counterbalancing

    The presence of which factor would NOT likely influence the validity coefficient of a test?

    Answer: Systematic error

    Which of the following statements about systematic error is TRUE?

    Answer: It is a consistent source of error across all measurements

    Which example would most likely demonstrate a low correlation in a test-retest reliability measure?

    Answer: Long interval with significant external changes

    What is the primary purpose of a test blueprint in psychological assessment?

    Answer: To plan the types of information and number of items to be covered

    Which term refers to the failure to capture important elements of a construct within a test?

    Answer: Construct underrepresentation

    The formula $CVR = \frac{N_e - N/2}{N/2}$ is associated with which concept?

    Answer: Content Validity Ratio

    If exactly half of the experts rate a test item as essential, what is the CVR value?

    Answer: 0
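To see why, work the formula with assumed panel sizes (the numbers are illustrative, not from the source): with $N = 10$ experts, $N_e = 5$ rating the item essential gives $CVR = \frac{5 - 10/2}{10/2} = 0$; $N_e = 8$ gives $CVR = \frac{8 - 5}{5} = 0.6$; and $N_e = 2$ gives $CVR = \frac{2 - 5}{5} = -0.6$. Positive values indicate that more than half of the panel judged the item essential.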

    What type of validity involves measuring the relationship between test scores and a criterion at a future time?

    Answer: Predictive validity

    Which validity type is considered 'umbrella validity' and covers all other types of validities?

    Answer: Construct validity

    What type of evidence demonstrates that test scores vary predictably based on group membership?

    Answer: Method of contrasted groups

    Which term describes the degree to which an additional predictor provides unique information about a criterion?

    Answer: Incremental validity

    When a test developer eliminates items that do not show significant correlation with the total test score, they aim to improve which characteristic?

    Answer: Homogeneity

    If two tests measure the same construct and their scores highly correlate, what type of evidence is this?

    Answer: Convergent evidence

    What is the primary focus of a mechanical prediction in psychological assessment?

    Answer: Generating findings and recommendations using computer algorithms and statistical rules

    According to Level 1 interpretation in psychological assessments, which of the following is NOT a characteristic?

    Answer: Concern with intervening processes

    How are psychological traits expected to behave across time according to the assumptions about psychological testing and assessment?

    Answer: They are relatively enduring and remain rather stable across time

    Which of the following best describes 'states' in psychological assessment?

    Answer: Relatively less enduring patterns of thinking, feeling, and behaving in specific situations

    What role do test reviewers play in psychological assessment?

    Answer: They prepare evaluative critiques based on technical and practical aspects of tests

    What is a profile in the context of psychological assessment?

    Answer: A table or graph showing the extent to which a person has demonstrated certain targeted characteristics

    Which of the following is NOT considered part of the 'parties in psychological assessment'?

    Answer: Test Administrators

    What assumption about psychological traits and states is made in psychological testing?

    Answer: Psychological traits permit prediction of future behavior based on past behavior

    Which of the following best describes actuarial assessment?

    Answer: An approach characterized by the use of empirically demonstrated statistical rules

    What is 'extra-test history' in psychological assessment?

    Answer: Observations made during testing that are indirectly related to its specific content

    When should the median be used instead of the mean for describing central tendency?

    Answer: For ratio/interval data distributions that are skewed

    Which statistical measure describes the amount of variation in a distribution?

    Answer: Measure of spread

    If a dataset has widely varied scores, how might this affect the central tendency?

    Answer: It implies a large spread of values, indicating large differences between individual scores.

    Which of the following correlations is used with one ordinal and one interval/ratio variable?

    Answer: Spearman Rho

    Which measure of spread provides a quick but gross summary of the scores?

    Answer: Range

    What does the value of the mode indicate in a distribution?

    Answer: The shape of the distribution and central tendency

    Which test should be used when comparing blood pressure of males and females?

    Answer: T-Test Independent (Unpaired T-test)

    Which concept allows dividing a distribution into four equal parts?

    Answer: Quartile

    Which measure of location is not linearly transformable and is vital for normalized standardized scores?

    Answer: Percentile Rank

    Which of the following tests is appropriate when comparing board reviewers' focus levels during different times of the day?

    Answer: One-Way ANOVA

    What is the primary feature of the Angoff Method?

    Answer: Low interrater reliability

    What reliability coefficient value range is considered 'Good'?

    Answer: 0.80-0.89

    Which method involves placing a 'bookmark' to differentiate between test-takers?

    Answer: Bookmark Method

    In which scenario are you likely to use a relative cut score?

    Answer: Norm-referenced considerations

    Which validity coefficient value range is likely to be useful?

    Answer: 0.21-0.35

    Which method of setting cut scores is based on test-taker performance across all items on the test?

    Answer: IRT-Based Methods

    What kind of method is described as a multi-stage selection process?

    Answer: Multiple Hurdle

    Which of the following best describes the method of predictive yield?

    Answer: Considering likelihood of offer acceptance

    Which cut score setting method involves the use of expert judgments to evaluate examination pass marks?

    Answer: Angoff Method

    What is the utility gain in psychological assessment?

    Answer: Benefit of using a particular test

    What does the Item-Discrimination Index measure?

    Answer: The difference between high and low scorers in answering a question correctly

    What is the purpose of Cross-Validation?

    Answer: To validate the test on a sample other than the original test group

    In Computerized Adaptive Testing, what is 'item branching'?

    Answer: Adapting the order and content of test items based on previous responses

    What is the main characteristic of the Cumulative Scoring Model?

    Answer: Test-taker obtains a measure of the level of the trait being measured

    What is 'validity shrinkage'?

    Answer: The decrease in item validities after cross-validation

    What does the term 'Differential Item Functioning' (DIF) refer to?

    Answer: When an item functions differently across groups of test-takers who have the same trait level

    What is an 'Anchor Protocol' used for?

    Answer: To resolve scoring discrepancies using a highly authoritative model score

    Which of the following accurately describes the Point-Biserial Method?

    Answer: Measuring the correlation between a dichotomous variable and a continuous variable

    What does 'co-validation' entail in psychological assessment?

    Answer: Conducting validation on two or more tests using the same sample of test-takers

    What phenomenon does 'floor effects' describe in performance measurement?

    Answer: A large percentage of respondents scoring near a lower limit

    Study Notes

    Psychological Assessment

    Error: Scorer Differences

    • Evaluates the degree of agreement between two or more scorers on a particular measure
    • Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
    • Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
    • Measures of scorer differences:
      • Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
      • Cohen's Kappa: used for two raters only
      • Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
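A minimal computational sketch of the agreement-corrected-for-chance logic these coefficients share, shown here for Cohen's Kappa with two raters; the rater codes below are made-up example data, not from the source.

```python
# Minimal sketch (assumed example data): Cohen's Kappa for two raters
# assigning categorical codes to the same ten responses.
from collections import Counter

rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]

n = len(rater_a)

# Observed agreement: proportion of cases where the two raters assign the same code.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, from each rater's marginal category proportions.
count_a, count_b = Counter(rater_a), Counter(rater_b)
categories = set(rater_a) | set(rater_b)
p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed={p_observed:.2f}  expected={p_expected:.2f}  kappa={kappa:.2f}")
```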

    Tests Designed

    • Homogenous tests: designed to measure one factor, expected to have a high degree of internal consistency
    • Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
    • Static tests: measure traits, states, or abilities that are relatively unchanging
    • Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient

    Power Tests

    • Designed with a time limit long enough for test-takers to attempt all items
    • Items vary in difficulty, so scores reflect how accurately a test-taker can perform rather than how quickly

    Speed Tests

    • Contain items of uniform difficulty with a time limit
    • Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability

    Criterion-Referenced Tests

    • Designed to provide an indication of where a test-taker stands with respect to a criterion
    • As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability

    Classical Test Theory

    • Assumes that everyone has a "true score" on a test
    • True score reflects an individual's ability level as measured by a particular test
    • Random error affects the observed score

    Domain Sampling Theory

    • Estimates the extent to which specific sources of variation contribute to test scores
    • Considers problems created by using a limited number of items to represent a large construct

    Test Reliability

    • Conceived as an objective measure of how precisely a test score assesses a domain
    • Reliability is a function of the proportion of total variance attributed to true variance
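In the standard classical-test-theory notation (observed score $X$, true score $T$, random error $E$), the two sections above can be summarized as:

$$X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad r_{XX} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$$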

    Generalizability Theory

    • Based on the idea that test scores vary due to variables in the testing situation
    • Universe: the test situation
    • Facets: number of items, amount of review, and purpose of test administration
    • Given the same conditions, the same test score should be obtained (universe score)

    Decision Study

    • Examines the usefulness of test scores in helping test users make decisions

    Systematic Error

    • A consistent source of error across all measurements, often arising from factors inherent in the test itself

    Item Response Theory

    • The probability of a person with a certain ability level performing at a certain level on a test
    • Focuses on item difficulty

    Latent-Trait Theory

    • A system of assumptions about measurement and the extent to which items measure a trait
    • Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
    • If a person answers easy items correctly, the computer will move to more difficult items
    • Item attributes considered: difficulty, discrimination, and whether items are scored dichotomously or polytomously
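A minimal sketch of how IRT expresses "the probability that a person with ability X performs at level Y", using the common two-parameter logistic (2PL) form; the ability and item-parameter values are illustrative assumptions, not from the source.

```python
import math

def p_correct(theta: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Two-parameter logistic (2PL) IRT model: probability that a person with
    ability `theta` answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Illustrative values: an average-ability examinee (theta = 0) on an easy,
# a medium, and a hard item.
for b in (-1.5, 0.0, 1.5):
    print(f"difficulty {b:+.1f} -> P(correct) = {p_correct(0.0, b):.2f}")
```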

    Construct Validity (Umbrella Validity)

    • Covers all types of validity
    • Logical and statistical
    • Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct

    Criterion Validity

    • More statistical than logical
    • Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
    • Criterion: a standard on which a judgment or decision may be made
    • Characteristics: relevant, valid, uncontaminated
    • Types of criterion validity: concurrent, predictive, and incremental validity

    Factor Analysis

    • Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
    • Developed by Charles Spearman
    • Employed as a data reduction method
    • Used to study the interrelationships among a set of variables
    • Main types of factor analysis: exploratory and confirmatory; a factor loading expresses how strongly a variable is associated with an underlying factor
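A minimal sketch of factor analysis as data reduction, assuming NumPy and scikit-learn are available; the two-factor structure and the simulated responses are made up for illustration.

```python
# Minimal sketch (hypothetical data): exploratory factor analysis recovering
# the loadings of 6 simulated items driven by 2 underlying factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

latent = rng.normal(size=(200, 2))                      # 200 respondents, 2 factors
loadings_true = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2],
                          [0.1, 0.8], [0.0, 0.9], [0.2, 0.7]])
items = latent @ loadings_true.T + rng.normal(scale=0.3, size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_.T, 2))                    # estimated loadings, one row per item
```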

    Cross-Validation

    • Validation of a test to a criterion based on a different group from the original group
    • Validity shrinkage: a decrease in validity after cross-validation
    • Co-validation: validation of more than one test from the same group
    • Co-norming: norming more than one test from the same group

    Bias

    • Factors inherent in a test that systematically prevent accurate, impartial measurement
    • Prevention: during test development through procedures such as estimated true score transformation

    Rating

    • Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
    • Rating error: intentional or unintentional misuse of the scale
    • Types of rating error: leniency, severity, central tendency, and halo effect
    • One way to overcome rating errors is to use rankings

    Measures of Central Tendency
    • Mode: Most frequently occurring score in the distribution; useful for nominal scales and discrete variables; gives an indication of the shape of the distribution.

    Measures of Spread or Variability

    • Range: Equal to the difference between the highest and lowest scores; provides a quick but gross description of the spread of scores.
    • Variance: Equal to the average of the squared deviations about the mean; its square root, the standard deviation, expresses the typical distance of scores from the mean.
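A minimal sketch of these spread measures on made-up scores, using Python's standard statistics module.

```python
import statistics

scores = [72, 85, 91, 64, 78, 88, 95, 70]

score_range = max(scores) - min(scores)   # quick but gross summary of spread
variance = statistics.variance(scores)    # sample variance: squared deviations about the mean
std_dev = statistics.stdev(scores)        # standard deviation: square root of the variance

print(score_range, round(variance, 2), round(std_dev, 2))
```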

    Measures of Location

    • Percentile or Percentile Rank: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score; essential in creating normalized standardized scores.
    • Quartile: Dividing points between the four quarters in the distribution; a specific point that refers to an interval.
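A minimal sketch of a percentile rank, computed (under one common definition) as the percentage of scores in an assumed norm group that fall below a given score; the norm-group values are made up.

```python
def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Percentage of the norm group scoring strictly below `score`."""
    below = sum(s < score for s in norm_group)
    return 100.0 * below / len(norm_group)

norm_group = [55, 60, 62, 65, 70, 72, 75, 78, 80, 90]
print(percentile_rank(72, norm_group))  # 50.0 -> half of the norm group scored lower
```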

    Correlation Coefficients

    • Spearman Rho: Used for ordinal + ordinal data (or when one variable is converted to ranks).
    • Point Biserial: Used for one true dichotomous variable + one interval/ratio variable.
    • Biserial: Used for one artificially dichotomized variable + one interval/ratio variable.
    • Phi Coefficient: Used for two true dichotomous variables.
    • Tetrachoric: Used for two artificially dichotomized variables.
    • Rank-Biserial: Used for one dichotomous variable (e.g., two separate groups) + one ordinal variable.
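A minimal sketch computing two of these coefficients on simulated data, assuming SciPy is available; the variable names and values are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

anxiety_rank = rng.integers(1, 11, size=50)              # ordinal ratings, 1-10
exam_score = 100 - 3 * anxiety_rank + rng.normal(0, 5, 50)
rho, p_rho = stats.spearmanr(anxiety_rank, exam_score)   # Spearman rho

passed = (exam_score > 70).astype(int)                    # true dichotomy: pass/fail
r_pb, p_pb = stats.pointbiserialr(passed, exam_score)     # point-biserial correlation

print(round(rho, 2), round(r_pb, 2))
```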

    Inferential Statistics

    • T-Test Independent (Unpaired T-test): Used to compare the means of two separate (independent) groups.
    • T-Test Dependent (Paired T-test): Used for one group measured twice (e.g., pre-test and post-test).
    • One-Way ANOVA: Used to compare the means of three or more independent groups on a single factor.
    • One-Way Repeated Measures ANOVA: Used for one group measured at least three times.
    • Two-Way ANOVA: Used to examine the effects of two independent variables (factors), and their interaction, on a dependent variable.
    • ANCOVA: Used when controlling for an additional variable (covariate) that may influence the relationship between the independent and dependent variables.
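A minimal sketch of the first three comparisons on simulated data, assuming SciPy is available; the group sizes and values are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

males = rng.normal(120, 10, 30)        # e.g., blood pressure of two separate groups
females = rng.normal(115, 10, 30)
t_ind, p_ind = stats.ttest_ind(males, females)    # independent (unpaired) t-test

pre = rng.normal(50, 8, 25)            # one group measured twice
post = pre + rng.normal(3, 4, 25)
t_dep, p_dep = stats.ttest_rel(pre, post)         # dependent (paired) t-test

g1, g2, g3 = rng.normal(70, 5, 20), rng.normal(73, 5, 20), rng.normal(76, 5, 20)
f_stat, p_anova = stats.f_oneway(g1, g2, g3)      # one-way ANOVA for 3+ groups

print(round(p_ind, 3), round(p_dep, 3), round(p_anova, 3))
```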

    Utility Gain and Productivity Gains

    • Utility Gain: Estimate of the benefit of using a particular test.
    • Productivity Gains: Estimated increase in work output.

    Cut Scores

    • Cut Score: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications.
    • Relative Cut Score: Reference point based on norm-referenced considerations, not fixed per se.
    • Fixed Cut Scores: Set with reference to a judgment concerning the minimum level of proficiency required.
    • Multiple Cut Scores: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization.
    • Multiple Hurdle: Multi-stage selection process, with a cut score in place for each predictor.
    • Compensatory Model of Selection: Assumption that high scores on one attribute can compensate for lower scores on another.
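A minimal sketch contrasting a multiple hurdle rule with a compensatory model; the predictors, cut scores, and weights are hypothetical.

```python
def multiple_hurdle(scores: dict, cut_scores: dict) -> bool:
    """Applicant must meet the cut score on every predictor."""
    return all(scores[k] >= cut_scores[k] for k in cut_scores)

def compensatory(scores: dict, weights: dict, composite_cut: float) -> bool:
    """A high score on one predictor can offset a lower score on another."""
    composite = sum(weights[k] * scores[k] for k in weights)
    return composite >= composite_cut

applicant = {"cognitive": 85, "interview": 60}
cuts = {"cognitive": 70, "interview": 65}
weights = {"cognitive": 0.6, "interview": 0.4}

print(multiple_hurdle(applicant, cuts))        # False: fails the interview hurdle
print(compensatory(applicant, weights, 70.0))  # True: 0.6*85 + 0.4*60 = 75
```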

    Setting Cut Scores

    • Angoff Method: Expert judges discuss and evaluate the examination using a well-defined and rational procedure.
    • Known Groups Method: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest.
    • IRT-Based Methods: Cut scores are typically set based on test-taker's performance across all the items on the test.
    • Item-Mapping Method: Arrangement of items in a histogram, with each column containing items deemed to be equivalent value.
    • Bookmark Method: An expert places a "bookmark" between the two pages (items ordered by difficulty) deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not.
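A minimal sketch of a basic Angoff computation, in which each judge estimates, item by item, the probability that a minimally competent test-taker answers correctly and the averaged expected total becomes the cut score; the judgments below are made up.

```python
judge_estimates = [
    [0.6, 0.8, 0.5, 0.9, 0.7],   # judge 1, probabilities for five items
    [0.5, 0.7, 0.6, 0.8, 0.6],   # judge 2
    [0.7, 0.9, 0.5, 0.9, 0.8],   # judge 3
]

expected_totals = [sum(item_probs) for item_probs in judge_estimates]
cut_score = sum(expected_totals) / len(expected_totals)
print(round(cut_score, 2))  # expected items correct (out of 5) required to pass
```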

    Discriminant Analysis

    • Discriminant Analysis: Used to analyze the research data when the criterion or dependent variable is categorical, and the predictor or independent variable is interval in nature.

    Reliability and Validity

    • Reliability: Excellent (0.90 and up), Good (0.80-0.89), Adequate (0.70-0.79), and Limited Applicability (below 0.70).
    • Validity: Very beneficial (above 0.35), Likely to be useful (0.21-0.35), Depends on circumstances (0.11-0.20), and Unlikely to be useful (below 0.11).

    Item Analysis

    • Item-Validity Index: Designed to provide an indication of the degree to which an individual item is measuring what the test as a whole purports to measure.
    • Item-Discrimination Index: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly.
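A minimal sketch computing item difficulty (p) and the item-discrimination index (d = p_upper - p_lower) from a made-up response matrix.

```python
responses = [            # 1 = correct, 0 = incorrect; one row per test-taker
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1],    # higher scorers
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],    # lower scorers
]

totals = [sum(row) for row in responses]
order = sorted(range(len(responses)), key=lambda i: totals[i], reverse=True)
upper, lower = order[:3], order[-3:]             # upper and lower scoring groups

n_items = len(responses[0])
for item in range(n_items):
    p = sum(row[item] for row in responses) / len(responses)         # difficulty
    p_upper = sum(responses[i][item] for i in upper) / len(upper)
    p_lower = sum(responses[i][item] for i in lower) / len(lower)
    print(f"item {item + 1}: p = {p:.2f}, d = {p_upper - p_lower:+.2f}")
```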

    Scoring Models

    • Cumulative Model: Test-taker obtains a measure of the level of the trait, thus high scorers may suggest a high level in the trait being measured.
    • Class Scoring/Category Scoring: Test-taker response earns credit toward placement in a particular class or category with other test-takers whose pattern of responses is similar in some way.

    Test Revision

    • Ipsative Scoring: Compares test-taker's score on one scale within a test to another scale within that same test, where the two scales measure unrelated constructs.
    • Cross-Validation: Revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

    Computerized Adaptive Testing

    • Computerized Adaptive Testing: An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
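A minimal sketch of item branching with a toy item bank: the next item presented is the unused item whose difficulty is closest to a crude running ability estimate, which moves up after a correct answer and down after an incorrect one; all values and the update rule are illustrative assumptions.

```python
item_bank = {"easy": -1.0, "medium_1": 0.0, "medium_2": 0.3, "hard": 1.2}   # difficulties
answers = {"easy": True, "medium_1": True, "medium_2": False, "hard": False}  # assumed responses

ability, administered = 0.0, []
for _ in range(3):
    remaining = {k: b for k, b in item_bank.items() if k not in administered}
    next_item = min(remaining, key=lambda k: abs(remaining[k] - ability))
    administered.append(next_item)
    ability += 0.5 if answers[next_item] else -0.5   # crude ability update
    print(next_item, "->", ability)
```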

    Computer Assisted Psychological Assessment

    • Computer Assisted Psychological Assessment: Standardized test administration is assured for test-takers, and variation is kept to a minimum; test content and length are tailored according to the test-taker's ability.


    Description

    This quiz assesses your understanding of scorer differences in psychological assessment, including measures of agreement between scorers. Learn about Fleiss Kappa and other metrics.
