
Psychology: Scorer Differences in Assessment


Which method is used to determine the level of agreement between two or more raters when the assessment is on a categorical scale?

Fleiss Kappa

What is a Dynamic trait in the context of psychological assessment?

A characteristic presumed to be fast-changing as a function of situational and cognitive experience

What does Generalizability Theory suggest about a person's test score?

Given the exact same conditions, the exact same test score should be obtained

According to Item Response Theory, what is the primary focus?

The probability that a person with X ability will be able to perform at a level of Y

What is indicated by a high degree of internal consistency in a test designed to measure one factor?

The test has a high degree of reliability

What does Krippendorff's Alpha measure?

Agreement among two or more raters, based on observed disagreement corrected for expected disagreement by chance

In Classical Test Theory, what is the 'True Score'?

A genuinely reflective measurement of an individual's ability level on a particular test

What is the purpose of a Decision Study in psychological assessment?

To examine the usefulness of test scores in helping the test user make decisions

How is the reliability of Speed Tests typically evaluated?

Through test-retest, alternate-forms, or split-half reliability based on two independent testing sessions

What does Domain Sampling Theory estimate in psychological assessments?

The extent to which specific sources of variation under defined conditions are contributing to the test scores

Which type of validity is judged by how well a test score can be used to infer an individual's probable standing on a measure of interest?

Criterion Validity

What does the Halo Effect in rating errors refer to?

Tendency to give high scores due to failure to discriminate among distinct aspects

Which term describes the use of additional predictors to explain the criterion measure beyond what is explained by existing predictors?

Incremental Validity

Which factor analysis type is used for estimating factors and deciding how many to retain?

Exploratory Factor Analysis

Which statistical procedure used in test development helps prevent bias and ensure accurate, impartial measurement?

Estimated True Score Transformation

What is criterion contamination?

When criterion measure includes aspects of performance not part of the job

In the context of construct validity, what is a construct?

Unobservable, scientific idea to explain behavior

What does a high Factor Loading signify?

High influence of the factor on the test scores

What is the purpose of co-validation?

Validation of more than one test using the same group

The concept of fairness in psychological assessment refers to:

Use of the test in an impartial, just, and equitable manner

Which of the following best describes item sampling or content sampling in psychological assessment?

Variation among items within a test and between tests

Which type of error is caused by influences such as noise or weather conditions during testing?

Random error

In the context of psychological assessment, what does the True Score Formula aim to represent?

The estimated true score accounting for variance

Which of the following is an appropriate use of test-retest reliability?

Evaluating a test that measures an unchanging attribute

What effect might occur if the interval between test-retest administrations is short?

Practice effect

What is the main difference between parallel forms and alternate forms of a test?

Parallel forms have equal observed-score means and variances across versions; alternate forms are simply different versions of a test built to be similar, without that strict statistical equivalence

Which technique is used to avoid carryover effects in parallel forms of a test?

Counterbalancing

The presence of which factor would NOT likely influence the validity coefficient of a test?

Systematic error

Which of the following statements about systematic error is TRUE?

Is a consistent source of error across all measurements

Which example would most likely demonstrate a low correlation in a test-retest reliability measure?

Long interval with significant external changes

What is the primary purpose of a test blueprint in psychological assessment?

To plan the types of information and number of items to be covered

Which term refers to the failure to capture important elements of a construct within a test?

Construct underrepresentation

The formula $CVR = \frac{N_e - N/2}{N/2}$ is associated with which concept?

Content Validity Ratio

If exactly half of the experts rate a test item as essential, what is the CVR value?

0
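
A quick numeric check of the CVR formula above, as a minimal Python sketch (the expert counts are hypothetical):

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """CVR = (N_e - N/2) / (N/2), where N_e is the number of experts
    rating the item 'essential' and N is the total number of experts."""
    half = n_experts / 2
    return (n_essential - half) / half

print(content_validity_ratio(5, 10))   # exactly half rate it essential -> 0.0
print(content_validity_ratio(10, 10))  # all rate it essential -> 1.0
```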

What type of validity involves measuring the relationship between test scores and a criterion at a future time?

Predictive validity

Which validity type is considered 'umbrella validity' and covers all other types of validities?

Construct validity

What type of evidence demonstrates that test scores vary predictably based on group membership?

Method of contrasted groups

Which term describes the degree to which an additional predictor provides unique information about a criterion?

Incremental validity

When a test developer eliminates items that do not show significant correlation with the total test score, they aim to improve which characteristic?

Homogeneity

If two tests measure the same construct and their scores highly correlate, what type of evidence is this?

Convergent evidence

What is the primary focus of a mechanical prediction in psychological assessment?

Generating findings and recommendations using computer algorithms and statistical rules

According to Level 1 interpretation in psychological assessments, which of the following is NOT a characteristic?

Concern with intervening processes

How are psychological traits expected to behave across time according to the assumptions about psychological testing and assessment?

They are relatively enduring and remain rather stable across time

Which of the following best describes 'states' in psychological assessment?

Relatively less enduring patterns of thinking, feeling, and behaving in specific situations

What role do test reviewers play in psychological assessment?

They prepare evaluative critiques based on technical and practical aspects of tests

What is a profile in the context of psychological assessment?

A table or graph showing the extent to which a person has demonstrated certain targeted characteristics

Which of the following is NOT considered part of the 'parties in psychological assessment'?

Test Administrators

What assumption about psychological traits and states is made in psychological testing?

Psychological traits permit prediction of future behavior based on past behavior

Which of the following best describes actuarial assessment?

An approach characterized by the use of empirically demonstrated statistical rules

What is 'extra-test behavior' in psychological assessment?

Observations made during testing that are indirectly related to its specific content

When should the median be used instead of the mode for describing central tendency?

For ratio/interval data distributions that are skewed

Which statistical measure describes the amount of variation in a distribution?

Measure of spread

If a dataset has widely varied scores, how might this affect the central tendency?

It implies a large spread of values, indicating large differences between individual scores.

Which of the following correlations is used with one ordinal and one interval/ratio variable?

Spearman Rho

Which measure of spread provides a quick but gross summary of the scores?

Range

What does the value of the mode indicate in a distribution?

The shape of the distribution and central tendency

Which test should be used when comparing blood pressure of males and females?

T-Test Independent (Unpaired T-test)

Which concept allows dividing a distribution into four equal parts?

Quartile

Which measure of location is not linearly transformable and is vital for normalized standardized scores?

Percentile Rank

Which of the following tests is appropriate when comparing board reviewers' focus levels during different times of the day?

One-Way ANOVA

What is a noted problem that can arise with the Angoff Method when expert judges disagree?

Low interrater reliability

What reliability coefficient value range is considered 'Good'?

.80 to .89

Which method involves placing a 'bookmark' to differentiate between test-takers?

Bookmark Method

In which scenario are you likely to use a relative cut score?

Norm-referenced considerations

Which validity coefficient value range is likely to be useful?

.21 to .35

Which method of setting cut scores is based on test-taker performance across all items on the test?

IRT-Based Methods

What kind of method is described as a multi-stage selection process?

Multiple Hurdle

Which of the following best describes the method of predictive yield?

Considering likelihood of offer acceptance

Which cut score setting method involves the use of expert judgments to evaluate examination pass marks?

Angoff Method

What is the utility gain in psychological assessment?

Benefit of using a particular test

What does the Item-Discrimination Index measure?

The difference between high and low scorers in answering a question correctly

What is the purpose of Cross-Validation?

To validate the test on a sample other than the original test group

In Computerized Adaptive Testing, what is 'item branching'?

Adapting the order and content of test items based on previous responses

What is the main characteristic of the Cumulative Scoring Model?

The higher the testtaker's score, the higher the level of the trait being measured

What is 'validity shrinkage'?

The decrease in a test's validity coefficient when it is re-checked on a new sample during cross-validation

What does the term 'Differential Item Functioning' (DIF) refer to?

When an item functions differently for one group of testtakers than for another, even though the two groups have the same level of the underlying trait

What is an 'Anchor Protocol' used for?

To resolve scoring discrepancies using a highly authoritative model score

Which of the following accurately describes the Point-Biserial Method?

Measuring the correlation between a dichotomous variable and a continuous variable

What does 'co-validation' entail in psychological assessment?

Conducting validation on two or more tests using the same sample of testtakers

What phenomenon does 'floor effects' describe in performance measurement?

A large percentage of respondents scoring near a lower limit

Study Notes

Psychological Assessment

Error: Scorer Differences

  • Evaluates the degree of agreement between two or more scorers on a particular measure
  • Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
  • Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
  • Measures of scorer differences:
• Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
    • Cohen's Kappa: used for two raters only (see the sketch after this list)
    • Krippendorff's Alpha: used for two or more raters, correcting observed disagreement for disagreement expected by chance
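
As a minimal sketch of the two-rater case described above, Cohen's kappa can be computed directly from its definition (the rating data below are hypothetical):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning categorical codes:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginals."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = np.mean(r1 == r2)  # observed agreement
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c)
              for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1 - p_e)

# Two raters coding the same ten responses:
rater_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
rater_b = [1, 1, 2, 3, 3, 3, 1, 2, 3, 2]
print(round(cohens_kappa(rater_a, rater_b), 3))  # ~0.70
```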

Types of Tests

  • Homogenous tests: designed to measure one factor, expected to have a high degree of internal consistency
  • Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
  • Static tests: measure traits, states, or abilities that are relatively unchanging
  • Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient

Power Tests

  • Designed with a time limit long enough for test-takers to attempt all items
  • Items vary in difficulty, with some so difficult that no test-taker is expected to obtain a perfect score

Speed Tests

  • Contain items of uniform difficulty with a time limit
  • Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability
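The notes above stress that, for speed tests, the two halves must come from independently timed sessions rather than a single administration. The half-test correlation itself is then stepped up with the Spearman-Brown formula $r_{SB} = \frac{2r}{1 + r}$; a minimal sketch with hypothetical 0/1 item data:

```python
import numpy as np

def split_half_reliability(scores):
    """Correlate odd- and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability."""
    scores = np.asarray(scores)        # rows = test-takers, cols = items
    odd = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

X = np.array([[1, 1, 1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 1, 0, 1, 1],
              [0, 0, 1, 0, 0, 1, 0, 0],
              [1, 1, 1, 1, 1, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 0, 1],
              [1, 1, 0, 1, 1, 1, 1, 1]])
print(round(split_half_reliability(X), 3))
```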

Criterion-Referenced Tests

  • Designed to provide an indication of where a test-taker stands with respect to a criterion
  • As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability

Classical Test Theory

  • Assumes that everyone has a "true score" on a test
  • True score reflects an individual's ability level as measured by a particular test
  • Random error affects the observed score

Domain Sampling Theory

  • Estimates the extent to which specific sources of variation contribute to test scores
  • Considers problems created by using a limited number of items to represent a large construct

Test Reliability

  • Conceived as an objective measure of how precisely a test score assesses a domain
  • Reliability is a function of the proportion of total variance attributed to true variance
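One widely used estimate of this true-variance proportion is Cronbach's coefficient alpha; a minimal sketch with hypothetical Likert responses:

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
    an internal-consistency estimate of reliability."""
    items = np.asarray(items, dtype=float)  # rows = test-takers, cols = items
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

X = [[4, 5, 4, 4],
     [2, 3, 2, 3],
     [5, 5, 4, 5],
     [3, 3, 3, 2],
     [1, 2, 2, 1]]
print(round(cronbach_alpha(X), 3))  # high alpha -> items hang together
```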

Generalizability Theory

  • Based on the idea that test scores vary due to variables in the testing situation
  • Universe: the test situation
  • Facets: number of items, amount of review, and purpose of test administration
  • Given the same conditions, the same test score should be obtained (universe score)

Decision Study

  • Examines the usefulness of test scores in helping test users make decisions

Systematic Error

  • A source of error that is consistent in size and direction across all measurements (in contrast to random error, which varies unpredictably)

Item Response Theory

  • The probability of a person with a certain ability level performing at a certain level on a test
  • Focuses on item difficulty
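The "person with X ability performing at level Y" idea is usually formalized with a logistic item response function. A minimal sketch of the two-parameter logistic (2PL) model, with hypothetical parameter values:

```python
import math

def irt_2pl(theta, a, b):
    """Probability that a person with ability theta answers correctly an
    item with difficulty b and discrimination a."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Ability equal to item difficulty -> 50% chance of success:
print(irt_2pl(theta=0.0, a=1.0, b=0.0))           # 0.5
# Same person, harder item (b = 1.5):
print(round(irt_2pl(theta=0.0, a=1.0, b=1.5), 3))
```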

Latent-Trait Theory

  • A system of assumptions about measurement and the extent to which items measure a trait
  • Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
  • If a person answers easy items correctly, the computer will move to more difficult items
  • Item attributes: difficulty, discrimination, and whether the item is scored dichotomously or polytomously

Construct Validity (Umbrella Validity)

  • Covers all types of validity
  • Logical and statistical
  • Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct

Criterion Validity

  • More statistical than logical
  • Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
  • Criterion: a standard on which a judgment or decision may be made
  • Characteristics: relevant, valid, uncontaminated
  • Types of criterion validity: concurrent, predictive, and incremental validity

Factor Analysis

  • Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
  • Developed by Charles Spearman
  • Employed as a data reduction method
  • Used to study the interrelationships among a set of variables
  • Types of factor analysis: exploratory (estimating factors and deciding how many to retain) and confirmatory (testing a hypothesized factor structure); a factor loading expresses how strongly a factor influences a test score

Cross-Validation

  • Validation of a test to a criterion based on a different group from the original group
  • Validity shrinkage: a decrease in validity after cross-validation
  • Co-validation: validation of more than one test from the same group
  • Co-norming: norming more than one test from the same group
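Validity shrinkage can be illustrated by fitting prediction weights on one sample and checking the validity coefficient on a fresh sample; the simulated data and sample sizes below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n, k=5):
    """Five hypothetical predictors; only the first relates to the criterion."""
    X = rng.normal(size=(n, k))
    y = 0.4 * X[:, 0] + rng.normal(size=n)
    return X, y

X_dev, y_dev = sample(40)   # original (development) sample
X_new, y_new = sample(40)   # cross-validation sample

# Least-squares weights fit on the development sample (with intercept):
A = np.column_stack([X_dev, np.ones(len(y_dev))])
w, *_ = np.linalg.lstsq(A, y_dev, rcond=None)

def validity(X, y):
    pred = np.column_stack([X, np.ones(len(y))]) @ w
    return np.corrcoef(pred, y)[0, 1]

print(f"development r      = {validity(X_dev, y_dev):.2f}")
print(f"cross-validation r = {validity(X_new, y_new):.2f}")  # typically lower
```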

Bias

  • Factors inherent in a test that systematically prevent accurate, impartial measurement
  • Prevention: during test development through procedures such as estimated true score transformation

Rating

  • Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
  • Rating error: intentional or unintentional misuse of the scale
  • Types of rating error: leniency, severity, central tendency, and halo effect
  • One way to overcome rating errors is to use rankings

Measures of Central Tendency

  • Mode: Most frequently occurring score in the distribution; useful for nominal scales and discrete variables; gives an indication of the shape of the distribution.
  • Median: The middle score of an ordered distribution; preferred over the mean for interval/ratio distributions that are skewed.

Measures of Spread or Variability

  • Range: Equal to the difference between the highest and lowest scores; provides a quick but gross description of the spread of scores.
  • Variance: Equal to the average of the squared deviations about the mean. The standard deviation is the square root of the variance and describes the typical distance of scores from the mean; see the sketch after this list.
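
A quick numeric sketch of these spread measures on hypothetical scores:

```python
import numpy as np

scores = np.array([85, 90, 72, 95, 88, 60])  # hypothetical test scores

print("range    =", scores.max() - scores.min())   # quick, gross summary
print("variance =", round(scores.var(ddof=1), 2))  # mean squared deviation
print("std dev  =", round(scores.std(ddof=1), 2))  # square root of the variance
```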

Measures of Location

  • Percentile or Percentile Rank: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score; essential in creating normalized standardized scores.
  • Quartile: One of the three dividing points (Q1, Q2, Q3) that split a distribution into four equal parts; a quartile is a specific point, whereas a quarter refers to an interval.

Correlation Coefficients

  • Spearman Rho: Used for ordinal + ordinal data (or one ordinal and one interval/ratio variable converted to ranks).
  • Biserial: Used for artificial dichotomous + interval/ratio data.
  • Point Biserial: Used for true dichotomous + interval/ratio data (see the sketch after this list).
  • Phi Coefficient: Used for true dichotomous + true dichotomous data.
  • Tetrachoric: Used for artificial dichotomous + artificial dichotomous data.
  • Kendall's Coefficient of Concordance (W): Used for agreement among 3 or more sets of ranks.
  • Rank-Biserial: Used for one dichotomous + one ordinal (ranked) variable.
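
Since the point-biserial coefficient is simply a Pearson r with one variable coded 0/1, it can be computed directly; the item responses and totals below are hypothetical:

```python
import numpy as np

# A true dichotomy (0/1 item response) and interval-level total scores:
item = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
total = np.array([78, 55, 82, 70, 60, 88, 52, 75, 90, 58])

r_pb = np.corrcoef(item, total)[0, 1]  # point-biserial correlation
print(round(r_pb, 3))
```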

Inferential Statistics

  • T-test Independent (Unpaired T-test): Used to compare the means of two separate, independent groups (see the sketch after this list).
  • T-Test Dependent (Paired T-test): Used for one group measured twice (e.g., before and after an intervention).
  • One-Way ANOVA: Used to compare the means of three or more independent groups.
  • One-Way Repeated Measures ANOVA: Used for one group measured three or more times.
  • Two-Way ANOVA: Used to examine the effects of two independent variables, and their interaction, on a dependent variable.
  • ANCOVA: Used when controlling for an additional variable (a covariate) that may influence the relationship between the independent and dependent variables.
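
A minimal sketch of the first three tests using scipy.stats (all samples are simulated, hypothetical data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two independent groups (e.g., blood pressure of males vs. females):
males = rng.normal(120, 10, size=15)
females = rng.normal(115, 10, size=15)
print(stats.ttest_ind(males, females))    # independent (unpaired) t-test

# One group measured twice (e.g., before and after an intervention):
pre = rng.normal(100, 8, size=12)
post = pre + rng.normal(3, 4, size=12)
print(stats.ttest_rel(pre, post))         # dependent (paired) t-test

# Three or more independent groups:
g1, g2, g3 = (rng.normal(m, 5, size=10) for m in (50, 53, 49))
print(stats.f_oneway(g1, g2, g3))         # one-way ANOVA
```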

Utility Gain and Productivity Gains

  • Utility Gain: Estimate of the benefit of using a particular test.
  • Productivity Gains: Estimated increase in work output.

Cut Scores

  • Cut Score: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications.
  • Relative Cut Score: Reference point based on norm-referenced considerations, not fixed per se.
  • Fixed Cut Scores: Set with reference to a judgment concerning the minimum level of proficiency required.
  • Multiple Cut Scores: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization.
  • Multiple Hurdle: Multi-stage selection process, with a cut score in place for each predictor.
  • Compensatory Model of Selection: Assumption that high scores on one attribute can compensate for lower scores on another.
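The contrast between the multiple hurdle and compensatory approaches can be made concrete with a small sketch (the applicant scores and cut scores below are hypothetical):

```python
import numpy as np

# Two predictors per applicant, each scaled 0-100, with a cut score apiece:
applicants = {"A": (85, 55), "B": (70, 72), "C": (95, 40)}
cuts = (65, 60)

for name, scores in applicants.items():
    # Multiple hurdle: every predictor must clear its own cut score.
    hurdle = all(s >= c for s, c in zip(scores, cuts))
    # Compensatory: a high score on one attribute can offset a low score
    # on another (here, an unweighted mean compared with one overall cut).
    compensatory = np.mean(scores) >= np.mean(cuts)
    print(f"{name}: hurdle={hurdle}, compensatory={compensatory}")
```

Applicant A fails the hurdle model (the second predictor falls below its cut) but passes under the compensatory model, where the strong first score offsets the weak second one.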

Setting Cut Scores

  • Angoff Method: Expert judges discuss and evaluate the examination using a well-defined and rational procedure.
  • Known Groups Method: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest.
  • IRT-Based Methods: Cut scores are typically set based on test-taker's performance across all the items on the test.
  • Item-Mapping Method: Arrangement of items in a histogram, with each column containing items deemed to be of equivalent value.
  • Bookmark Method: Expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not.
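
In one common version of the Angoff procedure, each expert estimates, for every item, the probability that a minimally competent test-taker would answer it correctly, and the cut score is the sum of the averaged estimates. A minimal sketch with hypothetical ratings:

```python
import numpy as np

ratings = np.array([    # rows = experts, cols = items
    [0.9, 0.6, 0.7, 0.4, 0.8],
    [0.8, 0.5, 0.7, 0.5, 0.9],
    [0.9, 0.7, 0.6, 0.3, 0.8],
])
# Expected raw score of a borderline (minimally competent) candidate:
cut_score = ratings.mean(axis=0).sum()
print(round(cut_score, 2), "out of", ratings.shape[1], "items")
```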

Discriminant Analysis

  • Discriminant Analysis: Used to analyze the research data when the criterion or dependent variable is categorical, and the predictor or independent variable is interval in nature.

Reliability and Validity

  • Reliability: Excellent (0.90 and up), Good (0.80-0.89), Adequate (0.70-0.79), and Limited Applicability (below 0.70).
  • Validity: Very beneficial (above 0.35), Likely to be useful (0.21-0.35), Depends on circumstances (0.11-0.20), and Unlikely to be useful (below 0.11).

Item Analysis

  • Item-Validity Index: Designed to provide an indication of the degree to which a test is measuring what it purports to measure.
  • Item-Discrimination Index: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly.
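
A minimal sketch of the item-discrimination index $d = p_{upper} - p_{lower}$, using hypothetical responses and the conventional upper/lower 27% split:

```python
import numpy as np

def discrimination_index(correct, totals, fraction=0.27):
    """d = proportion of high scorers answering the item correctly minus
    the proportion of low scorers doing so."""
    correct, totals = np.asarray(correct), np.asarray(totals)
    n = max(1, int(len(totals) * fraction))
    order = np.argsort(totals)
    return correct[order[-n:]].mean() - correct[order[:n]].mean()

item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]             # 0/1 responses to one item
total = [95, 88, 82, 45, 90, 50, 40, 85, 55, 48]  # total test scores
print(discrimination_index(item, total))  # 1.0: perfectly discriminating item
```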

Scoring Models

  • Cumulative Model: Test-taker obtains a measure of the level of the trait, thus high scorers may suggest a high level in the trait being measured.
  • Class Scoring/Category Scoring: Test-taker response earns credit toward placement in a particular class or category with other test-takers whose pattern of responses is similar in some way.

Test Revision

  • Ipsative Scoring: Compares test-taker's score on one scale within a test to another scale within that same test, where the two scales measure unrelated constructs.
  • Cross-Validation: Revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

Computerized Adaptive Testing

  • Computerized Adaptive Testing: An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
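
Item branching can be sketched as a simple rule that raises the difficulty of the next item after a correct response and lowers it after an incorrect one (the item bank and response pattern below are hypothetical; real CAT systems select items from IRT-calibrated banks):

```python
item_bank = {1: "very easy", 2: "easy", 3: "medium", 4: "hard", 5: "very hard"}

def next_level(current: int, answered_correctly: bool) -> int:
    """Move one difficulty step up or down, staying within the bank."""
    step = 1 if answered_correctly else -1
    return min(max(current + step, 1), 5)

level = 3                            # start at medium difficulty
for correct in [True, True, False]:  # simulated response pattern
    level = next_level(level, correct)
    print("next item:", item_bank[level])
```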

Computer Assisted Psychological Assessment

  • Computer Assisted Psychological Assessment: Standardized test administration is assured for test-takers, and variation is kept to a minimum; when paired with adaptive testing, test content and length can also be tailored to the test-taker's ability.

This quiz assesses your understanding of scorer differences in psychological assessment, including measures of agreement between scorers. Learn about Fleiss Kappa and other metrics.
