Psychological Assessment: Scorer Differences
100 Questions

Questions and Answers

Which type of validity involves a judgment about the adequacy of the inferences drawn from test scores about an individual's standing on a variable called a construct?

  • Criterion Validity
  • Predictive Validity
  • Content Validity
  • Construct Validity (correct)

Which term describes the error where a rater's scores tend to cluster in the middle of the rating scale?

  • Central Tendency Error (correct)
  • Severity Error
  • Halo Effect
  • Leniency Error

Which validity examines the relationship between test scores and a criterion measure obtained at the same time?

  • Predictive Validity
  • Concurrent Validity (correct)
  • Construct Validity
  • Incremental Validity

Which of the following methods can be used to enhance the homogeneity of a test containing dichotomous items?

    Eliminating items with low correlation coefficients with total test scores

    Which concept refers to the validation of a test based on a different group from the original group?

    Cross-Validation

    What is the focus of factor analysis?

    Identifying factors or specific variables

    What describes a situation where the criterion measure includes irrelevant aspects of performance?

    Criterion Contamination

    In factor analysis, which method is used to test the degree to which a hypothetical model fits the actual data?

    Confirmatory Factor Analysis

    Which term refers to the extent to which a test is used in an impartial, just, and equitable way?

    Fairness

    What term describes the error where a rater inaccurately gives higher scores due to the inability to differentiate between distinct aspects of behavior?

    Halo Effect

    What measure is used to determine the level of agreement between two or more raters when the method of assessment is categorical?

    Fleiss Kappa

    Which theory posits that a person's test scores vary due to the variables in the testing situations?

    Generalizability Theory

    What is the main focus of Item Response Theory?

    Item difficulty

    Which measure should be assessed by two independent testing periods when dealing with Speed Tests?

    Test-retest reliability

    What does a true score genuinely reflect in Classical Test Theory?

    An individual's ability level as measured by a particular test

    What do tests designed to measure one factor usually exhibit?

    High degree of internal consistency

    Which concept is used for estimating how specific sources of variation contribute to the test scores?

    Domain Sampling Theory

    What information do Criterion-Referenced Tests provide?

    Test taker's performance relative to a specific variable or criterion

    According to Generalizability Theory, under what condition should the exact same test score be obtained?

    When all facets in the universe are the same

    Which theory focuses on the extent to which an item measures a specific trait?

    Latent-Trait Theory

    What type of error is caused by unpredictable fluctuations in measurement conditions?

    Random Error

    Which type of reliability is obtained by correlating pairs of scores from the same individuals on two different administrations of the test?

    Test-Retest Reliability

    Which type of error variance is associated with a test-taker's motivation or attention during test administration?

    Test Administration

    What effect occurs when the interval between test administrations is short, leading to inflated correlation?

    Carryover Effects

    When is it appropriate to use test-retest reliability?

    For tests measuring a stable attribute

    What is the main error type associated with parallel forms reliability?

    Item Sampling

    What does a lower correlation in a test-retest scenario with a longer interval indicate?

    Poor reliability

    Which method helps avoid carryover effects between parallel forms?

    Counterbalancing Technique

    Which type of score consistency would result from computer scoring of objective-type items?

    High reliability

    What does a test blueprint primarily ensure in psychological assessment?

    It ensures that the test is representative of a defined body of content.

    What is the true score formula used to calculate?

    True score

    What is the Content Validity Ratio (CVR) formula developed by Lawshe?

    $CVR = \frac{N_e - N/2}{N/2}$

    What should be done if the Content Validity Index (CVI) is low?

    Remove or modify items with low CVR values.

    What is an indication of Zero Content Validity Ratio (CVR) in a test item?

    Half of the experts rate the item as essential.

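Lawshe's CVR can be checked numerically. The sketch below assumes that N_e is the number of experts who rate the item as essential and N is the total number of experts; the function name and the panel sizes are illustrative and not taken from the quiz.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (N_e - N/2) / (N/2)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts.
cvr_half = content_validity_ratio(5, 10)   # half rate the item essential
cvr_all = content_validity_ratio(10, 10)   # everyone rates it essential
cvr_none = content_validity_ratio(0, 10)   # nobody rates it essential
```

With half of the panel rating the item essential the ratio is zero, which matches the quiz answer above; the ratio ranges from -1 (no expert) to +1 (every expert).
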
    Which type of validity is more logical than statistical?

    Content validity

    What type of evidence involves a validity coefficient showing a high correlation between test scores and an established test?

    Convergent evidence

    Which validity type involves comparing a test score at one time with a criterion measure obtained at the same time?

    Concurrent validity

    What does high Construct Validity indicate about a test?

    It accurately measures the theoretical construct it is intended to measure.

    Which method demonstrates construct validity by showing predictable score differences across groups?

    Method of Contrasted Groups

    Why might a test item be considered poor if high scorers on an academic test tend to get it wrong and low scorers get it right?

    It suggests the item is not accurately measuring the intended construct.

    What is a fixed cut score?

    A cut score derived from expert judgment concerning minimum proficiency

    Which method is used to set cut scores based on the composition of contrasting groups?

    Known Groups Method

    In a compensatory model of selection, what is assumed about high scores on one attribute?

    They can compensate for lower scores on another attribute

    Which method requires expert judges to use a well-defined and rational procedure to determine a pass mark?

    Angoff Method

    If a reliability coefficient is below 0.70, how is it interpreted?

    May have limited applicability

    Which reliability coefficient range is considered 'excellent'?

    0.90 and up

    Which method arranges items in a histogram according to their equivalent value?

    Item-Mapping Method

    What does the utility gain of a particular test estimate?

    The benefit of using the test

    What does a multiple hurdle selection process involve?

    Having a cut score for each predictor at multiple selection stages

    What validity coefficient range is considered 'very beneficial'?

    Above 0.35

    Which measure of central tendency is most commonly used for nominal data?

    Mode

    Which situation is a paired T-test used for?

    Comparing the means of two related groups

    Which statistical test would you use to compare means from more than two groups taken at more than three different times?

    ANOVA Mixed Design

    What does a large spread of values in a distribution indicate?

    Large differences between individual scores

    What measure divides the distribution into four equal parts?

    Quartile

    What correlation coefficient is used for ordinal data?

    Spearman rho

    Variance is equal to which of the following?

    The average of the squared deviations about the mean (its square root is the standard deviation)

    Which test would you use to compare the blood pressure of male and female graduate students?

    Independent T-test

    What type of data is most appropriately analyzed with the median?

    Ordinal

    What type of correlation is used for a true dichotomous variable and interval/ratio data?

    Point-Biserial (the biserial coefficient applies when the dichotomy is artificial)

    Which of the following best describes Level 1 interpretation?

    Minimal interpretation with data treated in a sampling or correlational way

    What is the primary characteristic of actuarial assessment?

    Application of empirically demonstrated statistical rules

    During a psychological assessment, who prepares evaluative critiques based on technical and practical aspects of tests?

    Test Reviewers

    Which term refers to an observable action or the product of an observable action?

    Overt Behavior

    Which party in psychological assessment is responsible for controlling the distribution of tests?

    Test Publishers

    What is a trait in psychological assessment?

    A distinguishable, relatively enduring way in which individuals vary from one another

    Which level of interpretation involves descriptive generalizations and hypothetical constructs?

    Level 2

    Which statement about mechanical prediction is correct?

    It involves computer algorithms combined with statistical rules

    What is the primary focus of Level 3 interpretation?

    Full-scale exploration of personality, psychosocial situation, and developmental history

    What is extra-test history?

    Observations made by the examiner that are indirectly related to the test content

    What does the Item-Validity Index measure?

    The degree to which a test measures what it purports to measure

    What method assesses the correlation between a dichotomous variable and a continuous variable?

    Point-Biserial Method

    In scoring models, what does the Cumulative Model indicate about high scorers?

    They suggest a high level in the trait being measured

    What term describes the principle of revalidating a test on a sample other than the original test sample?

    Cross-validation

    Which phenomenon is described by a large percentage of respondents scoring near the lower limit on a test?

    Floor Effects

    What function does the Routing Test serve in Computerized Adaptive Testing?

    Directs the test-taker to a suitable level of items

    Which aspect is crucial in psychological assessment compared to psychological testing?

    Educated selection of tools

    What is the primary focus of an aptitude test?

    Potential for learning a specific skill

    What is the primary purpose of DIF Analysis?

    To identify items that function differently across groups

    What is meant by 'Scoring Drift'?

    Discrepancy between scoring in the anchor protocol and another protocol

    Which type of psychological assessment involves evaluation without the subject being in physical proximity?

    Remote

    Which model compares test-taker responses to different scales within the same test?

    Ipsative Scoring

    What characterizes a psychological test's scoring process?

    Reflects an evaluation of performance

    What distinguishes an intelligence test from an achievement test?

    Measurement of general potential

    What concept is described as the ability of a computer to tailor the test content based on prior responses?

    Item Branching

    Which type of assessment is described as encouraging therapeutic self-discovery?

    Therapeutic Assessment

    What does the psychometrics field specifically focus on?

    Psychological measurement

    What is typically measured by a typical performance test?

    Usual habits and behaviors

    Which of the following refers to assigning a summary statement of performance, usually numerical in nature?

    Score

    What is an example of a dynamic assessment approach?

    Sequential evaluation with intervention

    When should the Shapiro-Wilk test be used?

    When the sample size is less than 50

    What does a p-value of 0.03 in Levene's Test signify?

    The variances are significantly different

    Which test is more sensitive to departures from normality?

    Bartlett's Test

    What is one of the outcomes of the pilot work in test development?

    Determining how best to measure a construct

    Which type of item format offers more than two alternatives?

    Polychotomous Format

    What is the primary purpose of Computerized Adaptive Testing?

    Tailoring test items based on performance

    Which scale arranges items from weaker to stronger expressions of attitude?

    Guttman Scale

    How does Levene's Test determine if variances are equal?

    By analyzing p-values

    Which process involves brainstorming ideas about the kind of test a developer wants to publish?

    Test Conceptualization

    What does an item pool refer to in test construction?

    A reservoir of potential test items

    Study Notes

    Psychological Assessment

    Error: Scorer Differences

    • Evaluates the degree of agreement between two or more scorers on a particular measure
    • Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
    • Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
    • Measures of scorer differences:
      • Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
      • Cohen's Kappa: used for two raters only
      • Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
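
The difference between raw percentage agreement and a chance-corrected index can be sketched for the two-rater case. This is a minimal illustration of Cohen's kappa; the ratings are invented and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of examinees given the same category by both raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set")
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of ten examinees (not data from the notes).
rater_a = ["pass", "pass", "fail", "fail", "pass",
           "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "pass", "pass",
           "fail", "fail", "pass", "fail", "pass"]
agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

In this invented example the raters agree on 8 of 10 examinees, but kappa is noticeably lower because some of that agreement would be expected by chance alone.
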

    Tests Designed

    • Homogeneous tests: designed to measure one factor, expected to have a high degree of internal consistency
    • Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
    • Static tests: measure traits, states, or abilities that are relatively unchanging
    • Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient

    Power Tests

    • Designed with a time limit long enough to allow test-takers to attempt all items
    • Measures a test-taker's ability to complete a task accurately and efficiently

    Speed Tests

    • Contain items of uniform difficulty with a time limit
    • Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability

    Criterion-Referenced Tests

    • Designed to provide an indication of where a test-taker stands with respect to a criterion
    • As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability

    Classical Test Theory

    • Assumes that everyone has a "true score" on a test
    • True score reflects an individual's ability level as measured by a particular test
    • Random error affects the observed score

    Domain Sampling Theory

    • Estimates the extent to which specific sources of variation contribute to test scores
    • Considers problems created by using a limited number of items to represent a large construct

    Test Reliability

    • Conceived as an objective measure of how precisely a test score assesses a domain
    • Reliability is a function of the proportion of total variance attributed to true variance
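
In conventional classical test theory notation (the symbols are the usual textbook ones and are an assumption here, since the notes give no formula), the statement that reliability is the proportion of total variance attributable to true variance can be written as:

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Here X is the observed score, T the true score, and E random error, with E assumed to be uncorrelated with T. The coefficient approaches 1 as error variance shrinks relative to total variance.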

    Generalizability Theory

    • Based on the idea that test scores vary due to variables in the testing situation
    • Universe: the test situation
    • Facets: number of items, amount of review, and purpose of test administration
    • Given the same conditions, the same test score should be obtained (universe score)

    Decision Study

    • Examines the usefulness of test scores in helping test users make decisions

    Systematic Error

    • Factors inherent in a test that prevent accurate, impartial measurement

    Item Response Theory

    • Models the probability that a person with a given ability level will perform at a certain level on a test item
    • Focuses on item difficulty
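
One common way to express this probability is a logistic item characteristic curve. The sketch below assumes a two-parameter logistic model, which the notes do not name; `theta`, `difficulty`, and `discrimination` are illustrative parameter names.

```python
import math


def p_correct(theta, difficulty, discrimination=1.0):
    """Probability of a correct response under an assumed logistic IRT model.

    theta: the test-taker's ability level
    difficulty: the ability level at which the probability is 0.5
    discrimination: how sharply the probability rises around the difficulty
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Ability equal to item difficulty gives a 50% chance of success.
p_matched = p_correct(theta=0.0, difficulty=0.0)
# The same person has a lower chance on a harder item.
p_easy = p_correct(theta=1.0, difficulty=0.0)
p_hard = p_correct(theta=1.0, difficulty=2.0)
```

Under this assumption, item difficulty is simply the ability level at which a test-taker has an even chance of answering correctly.
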

    Latent-Trait Theory

    • A system of assumptions about measurement and the extent to which items measure a trait
    • Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
    • If a person answers easy items correctly, the computer will move to more difficult items
    • Item attributes: difficulty, discrimination, and dichotomousness

    Construct Validity (Umbrella Validity)

    • Covers all types of validity
    • Logical and statistical
    • Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct

    Criterion Validity

    • More statistical than logical
    • Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
    • Criterion: a standard on which a judgment or decision may be made
    • Characteristics: relevant, valid, uncontaminated
    • Types of criterion validity: concurrent, predictive, and incremental validity

    Factor Analysis

    • Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
    • Developed by Charles Spearman
    • Employed as a data reduction method
    • Used to study the interrelationships among a set of variables
    • Types of factor analysis: exploratory and confirmatory; factor loadings indicate how strongly each variable relates to a factor

    Cross-Validation

    • Validation of a test to a criterion based on a different group from the original group
    • Validity shrinkage: a decrease in validity after cross-validation
    • Co-validation: validation of more than one test from the same group
    • Co-norming: norming more than one test from the same group

    Bias

    • Factors inherent in a test that systematically prevent accurate, impartial measurement
    • Prevention: during test development through procedures such as estimated true score transformation

    Rating

    • Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
    • Rating error: intentional or unintentional misuse of the scale
    • Types of rating error: leniency, severity, central tendency, and halo effect
    • One way to overcome rating errors is to use rankings

    Discriminant Evidence

    • Definition: A validity coefficient showing little relationship between test scores and other variables with which the test should not, in theory, be correlated
    • Importance: Used in Psychological Assessment

    Measures of Central Tendency

    Mode

    • Definition: The most frequently occurring score in a distribution
    • Use: For ordinal data, and for nominal scales and discrete variables
    • Characteristics: Can be used in analyses of qualitative or verbal nature, and gives an indication of the shape of the distribution

    Measures of Spread or Variability

    • Definition: Statistics that describe the amount of variation in a distribution
    • Use: Gives an idea of how well the measure of central tendency represents the data
    • Characteristics: Large spread of values means large differences between individual scores

    Range

    • Definition: The difference between the highest and lowest score
    • Use: Provides a quick but gross description of the spread of scores
    • Characteristics: Can be affected by extreme scores in the distribution

    Variance

    • Definition: The arithmetic mean of the squares of the differences between the scores in a distribution and their mean
    • Use: Describes how widely scores are spread around the mean, in squared units

    Standard Deviation

    • Definition: The square root of the average squared deviations about the mean
    • Use: Equal to the square root of the variance, and measures the distance from the mean
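
The relationship between variance and standard deviation is easy to verify by hand. The sketch below divides by N (the population form); a sample estimate would divide by N - 1 instead. The scores are invented for illustration.

```python
import math


def population_variance(scores):
    """Average of the squared deviations about the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)


def population_sd(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(scores))


# Hypothetical scores with mean 5; squared deviations sum to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(scores)
sd = population_sd(scores)
```

For these eight scores the variance is 32 / 8 = 4 and the standard deviation is 2, so the standard deviation is back in the original score units.
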

    Measures of Location

    Percentile or Percentile Rank

    • Definition: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score
    • Use: Indicates the individual's relative position in the standardization sample
    • Characteristics: Not linearly transformable, converged at the middle, and the outer ends show large intervals

    Quartile

    • Definition: Dividing points between the four quarters in the distribution
    • Use: Specific point in the distribution
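
Percentile ranks and quartiles can be computed directly from a set of scores. This sketch uses Python's `statistics.quantiles` with its default method; other quartile conventions exist and can give slightly different cut points. The sample and the `percentile_rank` helper are illustrative.

```python
import statistics


def percentile_rank(score, sample):
    """Percentage of scores in the sample that fall below the given score."""
    below = sum(1 for s in sample if s < score)
    return 100.0 * below / len(sample)


# Hypothetical standardization sample: the scores 1 through 11.
sample = list(range(1, 12))

# Three dividing points between the four quarters of the distribution.
q1, q2, q3 = statistics.quantiles(sample, n=4)

rank_of_9 = percentile_rank(9, sample)
```

The second quartile equals the median, which is why quartiles are described as specific points in the distribution rather than as ranges.
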

    Correlation

    Spearman Rho

    • Definition: Used for ordinal + ordinal data
    • Use: Measures the correlation between two variables
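
A minimal sketch of Spearman's rho follows: each variable is converted to ranks (tied values share the average rank) and the Pearson correlation of the ranks is taken. The helper names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either variable has no variation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined without variation")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ordinal ratings from two judges, with one tie.
rho = spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

Because only the ordering matters, any strictly increasing relationship yields a rho of 1 even when it is not linear.
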

    Biserial

    • Definition: Used for artificial dichotomous + interval/ratio data
    • Use: Measures the correlation between two variables

    Point Biserial

    • Definition: Used for true dichotomous + interval/ratio data
    • Use: Measures the correlation between two variables

    Phi Coefficient

    • Definition: Used for nominal (true dichotomous) + nominal (true/artificial dichotomous) data
    • Use: Measures the correlation between two variables

    Tetrachoric

    • Definition: Used for artificial dichotomous + artificial dichotomous data
    • Use: Measures the correlation between two variables

    Kendall's Coefficient of Concordance

    • Definition: Used for 3 or more sets of ordinal/rank data
    • Use: Measures the agreement among the rankings

    Rank Biserial

    • Definition: Used for nominal (dichotomous) + ordinal data
    • Use: Measures the correlation between two variables

    Differences

    T-Test Independent

    • Definition: Used for comparing means between two groups
    • Use: Two separate groups, random assignment

    T-Test Dependent (Paired T-test)

    • Definition: Used for comparing means of two related sets of scores
    • Use: One group, two scores (e.g., blood pressure before and after the lecture)

    One-Way ANOVA

    • Definition: Used for comparing means between three or more groups
    • Use: Three or more independent groups, each measured once

    One-Way Repeated Measures

    • Definition: Used for comparing means across three or more measurements
    • Use: One group, measured at least three times

    Two-Way ANOVA

    • Definition: Used for comparing means between groups defined by two independent variables
    • Use: Three or more groups, tested for two variables

    ANCOVA

    • Definition: Used for comparing means between two or more groups, controlling for an additional variable
    • Use: Used when you need to control for an additional variable that may be influencing the relationship between your independent and dependent variable
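
For the simplest repeated case, one group measured twice (such as blood pressure before and after a lecture), the paired t statistic can be sketched as below. The readings are invented, and the function returns only the statistic and its degrees of freedom, not a p-value.

```python
import math


def paired_t(first, second):
    """Paired-samples t statistic and degrees of freedom."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (divides by n - 1).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    std_error = math.sqrt(var_diff / n)
    if std_error == 0:
        raise ValueError("t is undefined when all differences are identical")
    return mean_diff / std_error, n - 1


# Hypothetical systolic readings for five people, before and after.
before = [120, 130, 125, 140, 135]
after = [115, 128, 120, 135, 130]
t_stat, df = paired_t(before, after)
```

The test works on the within-person differences, which is why it applies to related scores rather than to two separate groups.
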

    Utility Gain

    • Definition: Estimate of the benefit of using a particular test
    • Use: In Psychological Assessment

    Productivity Gains

    • Definition: An estimated increase in work output
    • Use: In Psychological Assessment

    Cut Score

    • Definition: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications
    • Use: In Psychological Assessment

    Relative Cut Score

    • Definition: Reference point based on norm-referenced considerations, not fixed per se
    • Use: In Psychological Assessment

    Fixed Cut Scores

    • Definition: Set with reference to a judgment concerning minimum level of proficiency required
    • Use: In Psychological Assessment

    Multiple Cut Scores

    • Definition: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization
    • Use: In Psychological Assessment

    Multiple Hurdle

    • Definition: Multi-stage selection process, a cut score is in place for each predictor
    • Use: In Psychological Assessment

    Compensatory Model of Selection

    • Definition: Assumption that high scores on one attribute can compensate for lower scores
    • Use: In Psychological Assessment
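
The contrast between a multiple hurdle process and a compensatory model can be sketched in a few lines. The predictors, cut scores, weights, and applicant below are all hypothetical.

```python
def passes_multiple_hurdle(scores, cut_scores):
    """Applicant must meet the cut score for every predictor, stage by stage."""
    for predictor, cut in cut_scores.items():
        if scores[predictor] < cut:
            return False  # eliminated at this stage
    return True


def passes_compensatory(scores, weights, composite_cut):
    """A weighted composite lets a high score offset a low one."""
    composite = sum(weights[p] * scores[p] for p in weights)
    return composite >= composite_cut


# Hypothetical applicant: strong interview, weak written test.
applicant = {"written_test": 55, "interview": 90}
cut_scores = {"written_test": 60, "interview": 60}
weights = {"written_test": 0.5, "interview": 0.5}

hurdle_result = passes_multiple_hurdle(applicant, cut_scores)
compensatory_result = passes_compensatory(applicant, weights, composite_cut=70)
```

The same applicant is rejected at the first hurdle but accepted under the compensatory model, because the interview score offsets the written test score in the composite (72.5 against a cut of 70).
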

    Angoff Method

    • Definition: Setting fixed cut scores
    • Use: In Psychological Assessment

    Known Groups Method

    • Definition: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest
    • Use: In Psychological Assessment

    IRT-Based Methods

    • Definition: Cut scores are typically set based on test-taker's performance across all the items on the test
    • Use: In Psychological Assessment

    Item-Mapping Method

    • Definition: Arrangement of items in a histogram, with each column containing items deemed to be equivalent in value
    • Use: In Psychological Assessment

    Bookmark Method

    • Definition: Expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
    • Use: In Psychological Assessment

    Method of Predictive Yield

    • Definition: Takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
    • Use: In Psychological Assessment

    Discriminant Analysis

    • Definition: Used to shed light on the relationship between identified variables and two naturally occurring groups
    • Use: In Psychological Assessment, used to analyze data when the criterion or dependent variable is categorical and the predictor or independent variable is interval in nature

    Reliability and Validity

    Reliability

    • Definition: The consistency of test scores
    • Interpretation:
      • Excellent: 0.90 and up
      • Good: 0.80-0.89
      • Adequate: 0.70-0.79
      • Limited applicability: below 0.70

    Validity

    • Definition: The degree to which a test measures what it claims to measure
    • Interpretation:
      • Very beneficial: above 0.35
      • Likely to be useful: 0.21-0.35
      • Depends on circumstances: 0.11-0.20
      • Unlikely to be useful: below 0.11
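
The interpretation bands above translate into simple lookups. In this sketch the handling of values that fall between the listed bands (for example 0.205) is an assumption, since the notes do not specify it; the function names are illustrative.

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient to the descriptive bands in the notes."""
    if coefficient >= 0.90:
        return "excellent"
    if coefficient >= 0.80:
        return "good"
    if coefficient >= 0.70:
        return "adequate"
    return "limited applicability"


def interpret_validity(coefficient):
    """Map a validity coefficient to the descriptive bands in the notes."""
    if coefficient > 0.35:
        return "very beneficial"
    if coefficient >= 0.21:
        return "likely to be useful"
    if coefficient >= 0.11:
        return "depends on circumstances"
    return "unlikely to be useful"


reliability_label = interpret_reliability(0.84)
validity_label = interpret_validity(0.30)
```

Note that the two scales differ sharply: a coefficient of 0.40 would be a very beneficial validity coefficient but a reliability of limited applicability.
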

    Item Analysis

    Item-Validity Index

    • Definition: Designed to provide an indication of the degree to which a test is measuring what it purports to measure
    • Use: In Psychological Assessment

    Item-Discrimination Index

    • Definition: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly
    • Use: In Psychological Assessment

    Extreme Group Method

    • Definition: Compares people who have done well with those who have done poorly
    • Use: In Psychological Assessment

    Discrimination Index

    • Definition: The difference between the proportion of high scorers and the proportion of low scorers who answer the item correctly
    • Use: In Psychological Assessment
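
The discrimination index can be computed from the two extreme groups. The group sizes and counts below are invented for illustration.

```python
def item_discrimination(upper_correct, upper_n, lower_correct, lower_n):
    """d = proportion correct among high scorers minus proportion among low scorers."""
    if upper_n <= 0 or lower_n <= 0:
        raise ValueError("both groups must contain at least one test-taker")
    return upper_correct / upper_n - lower_correct / lower_n


# Hypothetical item: 18 of 20 high scorers and 6 of 20 low scorers answer correctly.
d_good = item_discrimination(18, 20, 6, 20)

# Hypothetical flawed item: low scorers outperform high scorers.
d_poor = item_discrimination(5, 20, 15, 20)
```

A positive value means the item separates high and low scorers in the expected direction; a negative value flags an item that high scorers tend to miss and low scorers tend to pass, which is a candidate for revision or removal.
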

    Point-Biserial Method

    • Definition: Correlation between a dichotomous variable and continuous variable
    • Use: In Psychological Assessment
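
As a sketch of the point-biserial method, the function below correlates a 0/1 item score with a continuous total score using the usual formula based on the two group means, the population standard deviation of the totals, and the proportions passing and failing. The data are hypothetical.

```python
import math


def point_biserial(item, total):
    """Point-biserial correlation between a 0/1 item and a continuous score."""
    if len(item) != len(total):
        raise ValueError("item and total must have the same length")
    n = len(item)
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    if not passed or not failed:
        raise ValueError("both item outcomes must occur at least once")
    p = len(passed) / n          # proportion answering correctly
    q = 1.0 - p                  # proportion answering incorrectly
    mean_all = sum(total) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in total) / n)
    if sd_all == 0:
        raise ValueError("total scores have no variation")
    mean_gap = sum(passed) / len(passed) - sum(failed) / len(failed)
    return mean_gap / sd_all * math.sqrt(p * q)


# Hypothetical data: item scored 1/0 and total test scores for five people.
item = [1, 1, 1, 0, 0]
total = [9, 8, 7, 4, 2]
r_pb = point_biserial(item, total)
```

The result is numerically identical to a Pearson correlation computed with the item coded as 0 and 1.
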


    Description

    This quiz assesses your knowledge of scorer differences in psychological assessment, including measures of agreement and variability. Topics include evaluating scorer differences, types of measures, and the Fleiss Kappa coefficient.
