
Psychology: Scorer Differences in Assessment


Which method is used to determine the level of agreement between two or more raters when the assessment is on a categorical scale?

Fleiss Kappa

What is a Dynamic trait in the context of psychological assessment?

A characteristic presumed to be fast-changing as a function of situational and cognitive experience

What does Generalizability Theory suggest about a person's test score?

Given the exact same conditions, the exact same test score should be obtained

According to Item Response Theory, what is the primary focus?

The probability that a person with X ability will be able to perform at a level of Y

What is indicated by a high degree of internal consistency in a test designed to measure one factor?

The test has a high degree of reliability

What does Krippendorff's Alpha measure?

Agreement among two or more raters, based on observed disagreement corrected for expected disagreement by chance

In Classical Test Theory, what is the 'True Score'?

A genuinely reflective measurement of an individual's ability level on a particular test

What is the purpose of a Decision Study in psychological assessment?

To examine the usefulness of test scores in helping the test user make decisions

How is the reliability of Speed Tests typically evaluated?

Through test-retest, alternate-forms, or split-half reliability based on two independent testing sessions

What does Domain Sampling Theory estimate in psychological assessments?

The extent to which specific sources of variation under defined conditions are contributing to the test scores

Which type of validity is judged by how well a test score can be used to infer an individual's probable standing on a measure of interest?

Criterion Validity

What does the Halo Effect in rating errors refer to?

Tendency to give high scores due to failure to discriminate among distinct aspects

Which term describes the use of additional predictors to explain the criterion measure beyond what is explained by existing predictors?

Incremental Validity

Which factor analysis type is used for estimating factors and deciding how many to retain?

Exploratory Factor Analysis

Which statistical procedure used in test development helps prevent bias and ensure accurate, impartial measurement?

Estimated True Score Transformation

What is criterion contamination?

When criterion measure includes aspects of performance not part of the job

In the context of construct validity, what is a construct?

Unobservable, scientific idea to explain behavior

What does a high Factor Loading signify?

High influence of the factor on the test scores

What is the purpose of co-validation?

Validation of more than one test using the same group

The concept of fairness in psychological assessment refers to:

Use of the test in an impartial, just, and equitable manner

Which of the following best describes item sampling or content sampling in psychological assessment?

Variation among items within a test and between tests

Which type of error is caused by influences such as noise or weather conditions during testing?

Random error

In the context of psychological assessment, what does the True Score Formula aim to represent?

The estimated true score accounting for variance

Which of the following is an appropriate use of test-retest reliability?

Evaluating a test that measures an unchanging attribute

What effect might occur if the interval between test-retest administrations is short?

Practice effect

What is the main difference between parallel forms and alternate forms of a test?

Parallel forms have equal observed-score means and variances across versions; alternate forms are simply different versions of a test built to be similar, without that strict statistical equivalence

Which technique is used to avoid carryover effects in parallel forms of a test?

Counterbalancing

The presence of which factor would NOT likely influence the validity coefficient of a test?

Systematic error

Which of the following statements about systematic error is TRUE?

Is a consistent source of error across all measurements

Which example would most likely demonstrate a low correlation in a test-retest reliability measure?

Long interval with significant external changes

What is the primary purpose of a test blueprint in psychological assessment?

To plan the types of information and number of items to be covered

Which term refers to the failure to capture important elements of a construct within a test?

Construct underrepresentation

The formula $CVR = \frac{N_e - N/2}{N/2}$ is associated with which concept?

Content Validity Ratio

If exactly half of the experts rate a test item as essential, what is the CVR value?

0
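
A quick numeric check of the CVR formula above, as a minimal Python sketch (the expert counts are hypothetical):

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """CVR = (N_e - N/2) / (N/2), where N_e is the number of experts
    rating the item 'essential' and N is the total number of experts."""
    half = n_experts / 2
    return (n_essential - half) / half

print(content_validity_ratio(5, 10))   # exactly half rate it essential -> 0.0
print(content_validity_ratio(10, 10))  # all rate it essential -> 1.0
```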

What type of validity involves measuring the relationship between test scores and a criterion at a future time?

Predictive validity

Which validity type is considered 'umbrella validity' and covers all other types of validities?

Construct validity

What type of evidence demonstrates that test scores vary predictably based on group membership?

Method of contrasted groups

Which term describes the degree to which an additional predictor provides unique information about a criterion?

Incremental validity

When a test developer eliminates items that do not show significant correlation with the total test score, they aim to improve which characteristic?

Homogeneity

If two tests measure the same construct and their scores highly correlate, what type of evidence is this?

Convergent evidence

What is the primary focus of a mechanical prediction in psychological assessment?

Generating findings and recommendations using computer algorithms and statistical rules

According to Level 1 interpretation in psychological assessments, which of the following is NOT a characteristic?

Concern with intervening processes

How are psychological traits expected to behave across time according to the assumptions about psychological testing and assessment?

They are relatively enduring and remain rather stable across time

Which of the following best describes 'states' in psychological assessment?

Relatively less enduring patterns of thinking, feeling, and behaving in specific situations

What role do test reviewers play in psychological assessment?

They prepare evaluative critiques based on technical and practical aspects of tests

What is a profile in the context of psychological assessment?

A table or graph showing the extent to which a person has demonstrated certain targeted characteristics

Which of the following is NOT considered part of the 'parties in psychological assessment'?

Test Administrators

What assumption about psychological traits and states is made in psychological testing?

Psychological traits permit prediction of future behavior based on past behavior

Which of the following best describes actuarial assessment?

An approach characterized by the use of empirically demonstrated statistical rules

What is 'extra-test behavior' in psychological assessment?

Observations made during testing that are indirectly related to its specific content

When should the median be used instead of the mode for describing central tendency?

For ratio/interval data distributions that are skewed

Which statistical measure describes the amount of variation in a distribution?

Measure of spread

If a dataset has widely varied scores, how might this affect the central tendency?

It implies a large spread of values, indicating large differences between individual scores.

Which of the following correlations is used with one ordinal and one interval/ratio variable?

Spearman Rho

Which measure of spread provides a quick but gross summary of the scores?

Range

What does the value of the mode indicate in a distribution?

The shape of the distribution and central tendency

Which test should be used when comparing blood pressure of males and females?

T-Test Independent (Unpaired T-test)

Which concept allows dividing a distribution into four equal parts?

Quartile

Which measure of location is not linearly transformable and is vital for normalized standardized scores?

Percentile Rank

Which of the following tests is appropriate when comparing board reviewers' focus levels during different times of the day?

One-Way ANOVA

What is a noted problem that can arise with the Angoff Method when expert judges disagree?

Low interrater reliability

What reliability coefficient value range is considered 'Good'?

.80 to .89

Which method involves placing a 'bookmark' to differentiate between test-takers?

Bookmark Method

In which scenario are you likely to use a relative cut score?

Norm-referenced considerations

Which validity coefficient value range is likely to be useful?

.21 to .35

Which method of setting cut scores is based on test-taker performance across all items on the test?

IRT-Based Methods

What kind of method is described as a multi-stage selection process?

Multiple Hurdle

Which of the following best describes the method of predictive yield?

Considering likelihood of offer acceptance

Which cut score setting method involves the use of expert judgments to evaluate examination pass marks?

Angoff Method

What is the utility gain in psychological assessment?

Benefit of using a particular test

What does the Item-Discrimination Index measure?

The difference between high and low scorers in answering a question correctly

What is the purpose of Cross-Validation?

To validate the test on a sample other than the original test group

In Computerized Adaptive Testing, what is 'item branching'?

Adapting the order and content of test items based on previous responses

What is the main characteristic of the Cumulative Scoring Model?

The higher the testtaker's score, the higher the level of the trait being measured

What is 'validity shrinkage'?

The decrease in a test's validity coefficient when it is re-checked on a new sample during cross-validation

What does the term 'Differential Item Functioning' (DIF) refer to?

When an item functions differently for one group of testtakers than for another, even though the two groups have the same level of the underlying trait

What is an 'Anchor Protocol' used for?

To resolve scoring discrepancies using a highly authoritative model score

Which of the following accurately describes the Point-Biserial Method?

Measuring the correlation between a dichotomous variable and a continuous variable

What does 'co-validation' entail in psychological assessment?

Conducting validation on two or more tests using the same sample of testtakers

What phenomenon does 'floor effects' describe in performance measurement?

A large percentage of respondents scoring near a lower limit

Study Notes

Psychological Assessment

Error: Scorer Differences

  • Evaluates the degree of agreement between two or more scorers on a particular measure
  • Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
  • Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
  • Measures of scorer differences:
• Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
    • Cohen's Kappa: used for two raters only (see the sketch after this list)
    • Krippendorff's Alpha: used for two or more raters, correcting observed disagreement for disagreement expected by chance
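
As a minimal sketch of the two-rater case described above, Cohen's kappa can be computed directly from its definition (the rating data below are hypothetical):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters assigning categorical codes:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the marginals."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = np.mean(r1 == r2)  # observed agreement
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c)
              for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1 - p_e)

# Two raters coding the same ten responses:
rater_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
rater_b = [1, 1, 2, 3, 3, 3, 1, 2, 3, 2]
print(round(cohens_kappa(rater_a, rater_b), 3))  # ~0.70
```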

Types of Tests

  • Homogenous tests: designed to measure one factor, expected to have a high degree of internal consistency
  • Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
  • Static tests: measure traits, states, or abilities that are relatively unchanging
  • Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient

Power Tests

  • Designed with a time limit long enough for test-takers to attempt all items
  • Items vary in difficulty, with some so difficult that no test-taker is expected to obtain a perfect score

Speed Tests

  • Contain items of uniform difficulty with a time limit
  • Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability
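The notes above stress that, for speed tests, the two halves must come from independently timed sessions rather than a single administration. The half-test correlation itself is then stepped up with the Spearman-Brown formula $r_{SB} = \frac{2r}{1 + r}$; a minimal sketch with hypothetical 0/1 item data:

```python
import numpy as np

def split_half_reliability(scores):
    """Correlate odd- and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability."""
    scores = np.asarray(scores)        # rows = test-takers, cols = items
    odd = scores[:, 0::2].sum(axis=1)
    even = scores[:, 1::2].sum(axis=1)
    r_half = np.corrcoef(odd, even)[0, 1]
    return 2 * r_half / (1 + r_half)

X = np.array([[1, 1, 1, 0, 1, 1, 0, 1],
              [1, 0, 1, 1, 1, 0, 1, 1],
              [0, 0, 1, 0, 0, 1, 0, 0],
              [1, 1, 1, 1, 1, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 0, 1],
              [1, 1, 0, 1, 1, 1, 1, 1]])
print(round(split_half_reliability(X), 3))
```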

Criterion-Referenced Tests

  • Designed to provide an indication of where a test-taker stands with respect to a criterion
  • As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability

Classical Test Theory

  • Assumes that everyone has a "true score" on a test
  • True score reflects an individual's ability level as measured by a particular test
  • Random error affects the observed score

Domain Sampling Theory

  • Estimates the extent to which specific sources of variation contribute to test scores
  • Considers problems created by using a limited number of items to represent a large construct

Test Reliability

  • Conceived as an objective measure of how precisely a test score assesses a domain
  • Reliability is a function of the proportion of total variance attributed to true variance
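One widely used estimate of this true-variance proportion is Cronbach's coefficient alpha; a minimal sketch with hypothetical Likert responses:

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)),
    an internal-consistency estimate of reliability."""
    items = np.asarray(items, dtype=float)  # rows = test-takers, cols = items
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

X = [[4, 5, 4, 4],
     [2, 3, 2, 3],
     [5, 5, 4, 5],
     [3, 3, 3, 2],
     [1, 2, 2, 1]]
print(round(cronbach_alpha(X), 3))  # high alpha -> items hang together
```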

Generalizability Theory

  • Based on the idea that test scores vary due to variables in the testing situation
  • Universe: the test situation
  • Facets: number of items, amount of review, and purpose of test administration
  • Given the same conditions, the same test score should be obtained (universe score)

Decision Study

  • Examines the usefulness of test scores in helping test users make decisions

Systematic Error

  • A source of error that is consistent in size and direction across all measurements (in contrast to random error, which varies unpredictably)

Item Response Theory

  • The probability of a person with a certain ability level performing at a certain level on a test
  • Focuses on item difficulty
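The "person with X ability performing at level Y" idea is usually formalized with a logistic item response function. A minimal sketch of the two-parameter logistic (2PL) model, with hypothetical parameter values:

```python
import math

def irt_2pl(theta, a, b):
    """Probability that a person with ability theta answers correctly an
    item with difficulty b and discrimination a."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Ability equal to item difficulty -> 50% chance of success:
print(irt_2pl(theta=0.0, a=1.0, b=0.0))           # 0.5
# Same person, harder item (b = 1.5):
print(round(irt_2pl(theta=0.0, a=1.0, b=1.5), 3))
```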

Latent-Trait Theory

  • A system of assumptions about measurement and the extent to which items measure a trait
  • Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
  • If a person answers easy items correctly, the computer will move to more difficult items
  • Item attributes: difficulty, discrimination, and whether the item is scored dichotomously or polytomously

Construct Validity (Umbrella Validity)

  • Covers all types of validity
  • Logical and statistical
  • Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct

Criterion Validity

  • More statistical than logical
  • Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
  • Criterion: a standard on which a judgment or decision may be made
  • Characteristics: relevant, valid, uncontaminated
  • Types of criterion validity: concurrent, predictive, and incremental validity

Factor Analysis

  • Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
  • Developed by Charles Spearman
  • Employed as a data reduction method
  • Used to study the interrelationships among a set of variables
  • Types of factor analysis: exploratory (estimating factors and deciding how many to retain) and confirmatory (testing a hypothesized factor structure); a factor loading expresses how strongly a factor influences a test score

Cross-Validation

  • Validation of a test to a criterion based on a different group from the original group
  • Validity shrinkage: a decrease in validity after cross-validation
  • Co-validation: validation of more than one test from the same group
  • Co-norming: norming more than one test from the same group
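Validity shrinkage can be illustrated by fitting prediction weights on one sample and checking the validity coefficient on a fresh sample; the simulated data and sample sizes below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n, k=5):
    """Five hypothetical predictors; only the first relates to the criterion."""
    X = rng.normal(size=(n, k))
    y = 0.4 * X[:, 0] + rng.normal(size=n)
    return X, y

X_dev, y_dev = sample(40)   # original (development) sample
X_new, y_new = sample(40)   # cross-validation sample

# Least-squares weights fit on the development sample (with intercept):
A = np.column_stack([X_dev, np.ones(len(y_dev))])
w, *_ = np.linalg.lstsq(A, y_dev, rcond=None)

def validity(X, y):
    pred = np.column_stack([X, np.ones(len(y))]) @ w
    return np.corrcoef(pred, y)[0, 1]

print(f"development r      = {validity(X_dev, y_dev):.2f}")
print(f"cross-validation r = {validity(X_new, y_new):.2f}")  # typically lower
```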

Bias

  • Factors inherent in a test that systematically prevent accurate, impartial measurement
  • Prevention: during test development through procedures such as estimated true score transformation

Rating

  • Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
  • Rating error: intentional or unintentional misuse of the scale
  • Types of rating error: leniency, severity, central tendency, and halo effect
  • One way to overcome rating errors is to use rankings

Measures of Central Tendency

  • Mode: Most frequently occurring score in the distribution; useful for nominal scales and discrete variables; gives an indication of the shape of the distribution.
  • Median: The middle score of an ordered distribution; preferred over the mean for interval/ratio distributions that are skewed.

Measures of Spread or Variability

  • Range: Equal to the difference between the highest and lowest scores; provides a quick but gross description of the spread of scores.
  • Variance: Equal to the average of the squared deviations about the mean. The standard deviation is the square root of the variance and describes the typical distance of scores from the mean; see the sketch after this list.
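
A quick numeric sketch of these spread measures on hypothetical scores:

```python
import numpy as np

scores = np.array([85, 90, 72, 95, 88, 60])  # hypothetical test scores

print("range    =", scores.max() - scores.min())   # quick, gross summary
print("variance =", round(scores.var(ddof=1), 2))  # mean squared deviation
print("std dev  =", round(scores.std(ddof=1), 2))  # square root of the variance
```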

Measures of Location

  • Percentile or Percentile Rank: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score; essential in creating normalized standardized scores.
  • Quartile: One of the three dividing points (Q1, Q2, Q3) that split a distribution into four equal parts; a quartile is a specific point, whereas a quarter refers to an interval.

Correlation Coefficients

  • Spearman Rho: Used for ordinal + ordinal data (or one ordinal and one interval/ratio variable converted to ranks).
  • Biserial: Used for artificial dichotomous + interval/ratio data.
  • Point Biserial: Used for true dichotomous + interval/ratio data (see the sketch after this list).
  • Phi Coefficient: Used for true dichotomous + true dichotomous data.
  • Tetrachoric: Used for artificial dichotomous + artificial dichotomous data.
  • Kendall's Coefficient of Concordance (W): Used for agreement among 3 or more sets of ranks.
  • Rank-Biserial: Used for one dichotomous + one ordinal (ranked) variable.
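
Since the point-biserial coefficient is simply a Pearson r with one variable coded 0/1, it can be computed directly; the item responses and totals below are hypothetical:

```python
import numpy as np

# A true dichotomy (0/1 item response) and interval-level total scores:
item = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
total = np.array([78, 55, 82, 70, 60, 88, 52, 75, 90, 58])

r_pb = np.corrcoef(item, total)[0, 1]  # point-biserial correlation
print(round(r_pb, 3))
```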

Inferential Statistics

  • T-test Independent (Unpaired T-test): Used to compare the means of two separate, independent groups (see the sketch after this list).
  • T-Test Dependent (Paired T-test): Used for one group measured twice (e.g., before and after an intervention).
  • One-Way ANOVA: Used to compare the means of three or more independent groups.
  • One-Way Repeated Measures ANOVA: Used for one group measured three or more times.
  • Two-Way ANOVA: Used to examine the effects of two independent variables, and their interaction, on a dependent variable.
  • ANCOVA: Used when controlling for an additional variable (a covariate) that may influence the relationship between the independent and dependent variables.
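
A minimal sketch of the first three tests using scipy.stats (all samples are simulated, hypothetical data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two independent groups (e.g., blood pressure of males vs. females):
males = rng.normal(120, 10, size=15)
females = rng.normal(115, 10, size=15)
print(stats.ttest_ind(males, females))    # independent (unpaired) t-test

# One group measured twice (e.g., before and after an intervention):
pre = rng.normal(100, 8, size=12)
post = pre + rng.normal(3, 4, size=12)
print(stats.ttest_rel(pre, post))         # dependent (paired) t-test

# Three or more independent groups:
g1, g2, g3 = (rng.normal(m, 5, size=10) for m in (50, 53, 49))
print(stats.f_oneway(g1, g2, g3))         # one-way ANOVA
```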

Utility Gain and Productivity Gains

  • Utility Gain: Estimate of the benefit of using a particular test.
  • Productivity Gains: Estimated increase in work output.

Cut Scores

  • Cut Score: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications.
  • Relative Cut Score: Reference point based on norm-referenced considerations, not fixed per se.
  • Fixed Cut Scores: Set with reference to a judgment concerning the minimum level of proficiency required.
  • Multiple Cut Scores: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization.
  • Multiple Hurdle: Multi-stage selection process, with a cut score in place for each predictor.
  • Compensatory Model of Selection: Assumption that high scores on one attribute can compensate for lower scores on another.
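The contrast between the multiple hurdle and compensatory approaches can be made concrete with a small sketch (the applicant scores and cut scores below are hypothetical):

```python
import numpy as np

# Two predictors per applicant, each scaled 0-100, with a cut score apiece:
applicants = {"A": (85, 55), "B": (70, 72), "C": (95, 40)}
cuts = (65, 60)

for name, scores in applicants.items():
    # Multiple hurdle: every predictor must clear its own cut score.
    hurdle = all(s >= c for s, c in zip(scores, cuts))
    # Compensatory: a high score on one attribute can offset a low score
    # on another (here, an unweighted mean compared with one overall cut).
    compensatory = np.mean(scores) >= np.mean(cuts)
    print(f"{name}: hurdle={hurdle}, compensatory={compensatory}")
```

Applicant A fails the hurdle model (the second predictor falls below its cut) but passes under the compensatory model, where the strong first score offsets the weak second one.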

Setting Cut Scores

  • Angoff Method: Expert judges discuss and evaluate the examination using a well-defined and rational procedure.
  • Known Groups Method: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest.
  • IRT-Based Methods: Cut scores are typically set based on test-taker's performance across all the items on the test.
  • Item-Mapping Method: Arrangement of items in a histogram, with each column containing items deemed to be of equivalent value.
  • Bookmark Method: Expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not.
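
In one common version of the Angoff procedure, each expert estimates, for every item, the probability that a minimally competent test-taker would answer it correctly, and the cut score is the sum of the averaged estimates. A minimal sketch with hypothetical ratings:

```python
import numpy as np

ratings = np.array([    # rows = experts, cols = items
    [0.9, 0.6, 0.7, 0.4, 0.8],
    [0.8, 0.5, 0.7, 0.5, 0.9],
    [0.9, 0.7, 0.6, 0.3, 0.8],
])
# Expected raw score of a borderline (minimally competent) candidate:
cut_score = ratings.mean(axis=0).sum()
print(round(cut_score, 2), "out of", ratings.shape[1], "items")
```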

Discriminant Analysis

  • Discriminant Analysis: Used to analyze the research data when the criterion or dependent variable is categorical, and the predictor or independent variable is interval in nature.

Reliability and Validity

  • Reliability: Excellent (0.90 and up), Good (0.80-0.89), Adequate (0.70-0.79), and Limited Applicability (below 0.70).
  • Validity: Very beneficial (above 0.35), Likely to be useful (0.21-0.35), Depends on circumstances (0.11-0.20), and Unlikely to be useful (below 0.11).

Item Analysis

  • Item-Validity Index: Designed to provide an indication of the degree to which a test is measuring what it purports to measure.
  • Item-Discrimination Index: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly.
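
A minimal sketch of the item-discrimination index $d = p_{upper} - p_{lower}$, using hypothetical responses and the conventional upper/lower 27% split:

```python
import numpy as np

def discrimination_index(correct, totals, fraction=0.27):
    """d = proportion of high scorers answering the item correctly minus
    the proportion of low scorers doing so."""
    correct, totals = np.asarray(correct), np.asarray(totals)
    n = max(1, int(len(totals) * fraction))
    order = np.argsort(totals)
    return correct[order[-n:]].mean() - correct[order[:n]].mean()

item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]             # 0/1 responses to one item
total = [95, 88, 82, 45, 90, 50, 40, 85, 55, 48]  # total test scores
print(discrimination_index(item, total))  # 1.0: perfectly discriminating item
```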

Scoring Models

  • Cumulative Model: Test-taker obtains a measure of the level of the trait, thus high scorers may suggest a high level in the trait being measured.
  • Class Scoring/Category Scoring: Test-taker response earns credit toward placement in a particular class or category with other test-takers whose pattern of responses is similar in some way.

Test Revision

  • Ipsative Scoring: Compares test-taker's score on one scale within a test to another scale within that same test, where the two scales measure unrelated constructs.
  • Cross-Validation: Revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

Computerized Adaptive Testing

  • Computerized Adaptive Testing: An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
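
Item branching can be sketched as a simple rule that raises the difficulty of the next item after a correct response and lowers it after an incorrect one (the item bank and response pattern below are hypothetical; real CAT systems select items from IRT-calibrated banks):

```python
item_bank = {1: "very easy", 2: "easy", 3: "medium", 4: "hard", 5: "very hard"}

def next_level(current: int, answered_correctly: bool) -> int:
    """Move one difficulty step up or down, staying within the bank."""
    step = 1 if answered_correctly else -1
    return min(max(current + step, 1), 5)

level = 3                            # start at medium difficulty
for correct in [True, True, False]:  # simulated response pattern
    level = next_level(level, correct)
    print("next item:", item_bank[level])
```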

Computer Assisted Psychological Assessment

  • Computer Assisted Psychological Assessment: Standardized test administration is assured for test-takers, and variation is kept to a minimum; when paired with adaptive testing, test content and length can also be tailored to the test-taker's ability.

This quiz assesses your understanding of scorer differences in psychological assessment, including measures of agreement between scorers. Learn about Fleiss Kappa and other metrics.
