Psychology: Scorer Differences in Assessment

Questions and Answers

Which method is used to determine the level of agreement between two or more raters when the assessment is on a categorical scale?

  • Krippendorff's Alpha
  • Generalizability Theory
  • Fleiss Kappa (correct)
  • Cohen's Kappa

What is a Dynamic trait in the context of psychological assessment?

  • A trait that barely changes or remains relatively unchanging
  • A measure that focuses on item difficulty
  • A variable that provides an indication of where a test taker stands with respect to a criterion
  • A characteristic presumed to be fast-changing as a function of situational and cognitive experience (correct)

What does Generalizability Theory suggest about a person's test score?

  • The test score genuinely reflects an individual's ability level
  • A person's test score is completely variable and cannot be predicted
  • Item difficulty is the primary consideration
  • Given the exact same conditions, the exact same test score should be obtained (correct)

According to Item Response Theory, what is the primary focus?

The probability that a person with X ability will be able to perform at a level of Y (D)

What is indicated by a high degree of internal consistency in a test designed to measure one factor?

The test has a high degree of reliability (D)

What does Krippendorff's Alpha measure?

Agreement among two or more raters, based on observed disagreement corrected for expected disagreement by chance (B)

In Classical Test Theory, what is the 'True Score'?

A genuinely reflective measurement of an individual's ability level on a particular test (B)

What is the purpose of a Decision Study in psychological assessment?

To examine the usefulness of test scores in helping the test user make decisions (D)

How is the reliability of Speed Tests typically evaluated?

Through test-retest and alternate-forms or split-half reliability across independent testing periods (D)

What does Domain Sampling Theory estimate in psychological assessments?

The extent to which specific sources of variation under defined conditions are contributing to the test scores (D)

Which type of validity is judged by how well a test score can be used to infer an individual's probable standing on a measure of interest?

Criterion Validity (A)

What does the Halo Effect in rating errors refer to?

Tendency to give high scores due to failure to discriminate among distinct aspects (C)

Which term describes the use of additional predictors to explain the criterion measure beyond what is explained by existing predictors?

Incremental Validity (A)

Which factor analysis type is used for estimating factors and deciding how many to retain?

Exploratory Factor Analysis (A)

Which statistical procedure used in test development helps prevent bias and ensure accurate, impartial measurement?

Estimated True Score Transformation (D)

What is criterion contamination?

When the criterion measure includes aspects of performance not part of the job (C)

In the context of construct validity, what is a construct?

Unobservable, scientific idea to explain behavior (C)

What does a high Factor Loading signify?

High influence of the factor on the test scores (C)

What is the purpose of co-validation?

Validation of more than one test using the same group (B)

The concept of fairness in psychological assessment refers to:

Use of the test in an impartial, just, and equitable manner (A)

Which of the following best describes item sampling or content sampling in psychological assessment?

Variation among items within a test and between tests (C)

Which type of error is caused by influences such as noise or weather conditions during testing?

Random error (D)

In the context of psychological assessment, what does the True Score Formula aim to represent?

The estimated true score accounting for variance (B)

Which of the following is an appropriate use of test-retest reliability?

Evaluating a test that measures an unchanging attribute (A)

What effect might occur if the interval between test-retest administrations is short?

Practice effect (C)

What is the main difference between parallel forms and alternate forms of a test?

Parallel forms have different items but the same true score (A)

Which technique is used to avoid carryover effects in parallel forms of a test?

Counterbalancing (C)

The presence of which factor would NOT likely influence the validity coefficient of a test?

Systematic error (B)

Which of the following statements about systematic error is TRUE?

It is a consistent source of error across all measurements (D)

Which example would most likely demonstrate a low correlation in a test-retest reliability measure?

Long interval with significant external changes (D)

What is the primary purpose of a test blueprint in psychological assessment?

To plan the types of information and number of items to be covered (D)

Which term refers to the failure to capture important elements of a construct within a test?

Construct underrepresentation (D)

The formula $CVR = \frac{N_e - N/2}{N/2}$ is associated with which concept?

Content Validity Ratio (D)

If exactly half of the experts rate a test item as essential, what is the CVR value?

0 (C)
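
The CVR computation behind the two questions above is easy to verify directly; here is a minimal Python sketch (the expert counts are illustrative):

```python
def content_validity_ratio(n_essential: int, n_total: int) -> float:
    """Lawshe's CVR = (N_e - N/2) / (N/2), where N_e is the number of
    experts rating the item 'essential' and N is the total panel size."""
    half = n_total / 2
    return (n_essential - half) / half

print(content_validity_ratio(5, 10))   # 0.0  (exactly half rate it essential)
print(content_validity_ratio(10, 10))  # 1.0  (all experts rate it essential)
```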

What type of validity involves measuring the relationship between test scores and a criterion at a future time?

Predictive validity (D)

Which validity type is considered 'umbrella validity' and covers all other types of validities?

Construct validity (D)

What type of evidence demonstrates that test scores vary predictably based on group membership?

Method of contrasted groups (B)

Which term describes the degree to which an additional predictor provides unique information about a criterion?

Incremental validity (C)

When a test developer eliminates items that do not show significant correlation with the total test score, they aim to improve which characteristic?

Homogeneity (A)

If two tests measure the same construct and their scores highly correlate, what type of evidence is this?

Convergent evidence (C)

What is the primary focus of a mechanical prediction in psychological assessment?

Generating findings and recommendations using computer algorithms and statistical rules (B)

According to Level 1 interpretation in psychological assessments, which of the following is NOT a characteristic?

Concern with intervening processes (C)

How are psychological traits expected to behave across time according to the assumptions about psychological testing and assessment?

They are relatively enduring and remain rather stable across time (D)

Which of the following best describes 'states' in psychological assessment?

Relatively less enduring patterns of thinking, feeling, and behaving in specific situations (B)

What role do test reviewers play in psychological assessment?

They prepare evaluative critiques based on technical and practical aspects of tests (A)

What is a profile in the context of psychological assessment?

A table or graph showing the extent to which a person has demonstrated certain targeted characteristics (C)

Which of the following is NOT considered part of the 'parties in psychological assessment'?

Test Administrators (B)

What assumption about psychological traits and states is made in psychological testing?

Psychological traits permit prediction of future behavior based on past behavior (C)

Which of the following best describes actuarial assessment?

An approach characterized by the use of empirically demonstrated statistical rules (A)

What is 'extra-test history' in psychological assessment?

Observations made during testing that are indirectly related to its specific content (C)

When should the median be used instead of the mode for describing central tendency?

For ratio/interval data distributions that are skewed (C)

Which statistical measure describes the amount of variation in a distribution?

Measure of spread (B)

If a dataset has widely varied scores, how might this affect the central tendency?

It implies a large spread of values, indicating large differences between individual scores. (A)

Which of the following correlations is used with one ordinal and one interval/ratio variable?

Spearman Rho (C)

Which measure of spread provides a quick but gross summary of the scores?

Range (D)

What does the value of the mode indicate in a distribution?

The shape of the distribution and central tendency (B)

Which test should be used when comparing blood pressure of males and females?

T-Test Independent (Unpaired T-test) (D)

Which concept allows dividing a distribution into four equal parts?

Quartile (D)

Which measure of location is not linearly transformable and is vital for normalized standardized scores?

Percentile Rank (C)

Which of the following tests is appropriate when comparing board reviewers' focus levels during different times of the day?

One-Way ANOVA (C)

What is the primary feature of the Angoff Method?

Low interrater reliability (C)

What reliability coefficient value range is considered 'Good'?

0.80-0.89 (C)

Which method involves placing a 'bookmark' to differentiate between test-takers?

Bookmark Method (C)

In which scenario are you likely to use a relative cut score?

Norm-referenced considerations (C)

Which validity coefficient value range is likely to be useful?

0.21-0.35 (A)

Which method of setting cut scores is based on test-taker performance across all items on the test?

IRT-Based Methods (D)

What kind of method is described as a multi-stage selection process?

Multiple Hurdle (A)

Which of the following best describes the method of predictive yield?

Considering likelihood of offer acceptance (B)

Which cut score setting method involves the use of expert judgments to evaluate examination pass marks?

Angoff Method (A)

What is the utility gain in psychological assessment?

Benefit of using a particular test (D)

What does the Item-Discrimination Index measure?

The difference between high and low scorers in answering a question correctly (A)

What is the purpose of Cross-Validation?

To validate the test on a sample other than the original test group (A)

In Computerized Adaptive Testing, what is 'item branching'?

Adapting the order and content of test items based on previous responses (A)

What is the main characteristic of the Cumulative Scoring Model?

Testtaker obtains a measure of the level of the trait being measured (A)

What is 'validity shrinkage'?

The decrease in item validities after cross-validation (B)

What does the term 'Differential Item Functioning' (DIF) refer to?

When an item functions differently in one group of testtakers who have the same trait level (D)

What is an 'Anchor Protocol' used for?

To resolve scoring discrepancies using a highly authoritative model score (B)

Which of the following accurately describes the Point-Biserial Method?

Measuring the correlation between a dichotomous variable and a continuous variable (C)

What does 'co-validation' entail in psychological assessment?

Conducting validation on two or more tests using the same sample of testtakers (A)

What phenomenon does 'floor effects' describe in performance measurement?

A large percentage of respondents scoring near a lower limit (D)

Study Notes

Psychological Assessment

Error: Scorer Differences

  • Evaluates the degree of agreement between two or more scorers on a particular measure
  • Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
  • Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
  • Measures of scorer differences:
    • Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
    • Cohen's Kappa: used for two raters only
    • Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
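
As a minimal sketch of the chance-corrected agreement idea behind these indices, here is Cohen's kappa for two raters in Python (the ratings below are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels:
    kappa = (p_observed - p_chance) / (1 - p_chance)."""
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

# Example: two scorers categorizing ten responses
a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "pass"]
print(round(cohens_kappa(a, b), 2))  # ~0.47
```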

Tests Designed

  • Homogenous tests: designed to measure one factor, expected to have a high degree of internal consistency
  • Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
  • Static tests: measure traits, states, or abilities that are relatively unchanging
  • Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient

Power Tests

  • Designed with a time limit long enough to allow test-takers to attempt all items; items vary in difficulty
  • Measures what a test-taker is able to do (accuracy) rather than how quickly it can be done

Speed Tests

  • Contain items of uniform difficulty with a time limit
  • Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability

Criterion-Referenced Tests

  • Designed to provide an indication of where a test-taker stands with respect to a criterion
  • As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability

Classical Test Theory

  • Assumes that everyone has a "true score" on a test
  • True score reflects an individual's ability level as measured by a particular test
  • Random error affects the observed score
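
In the usual CTT notation (notation assumed here; the source states the idea in words), the observed score is the sum of the true score and random error:

$$X = T + E$$

Averaged over many hypothetical administrations, the error term is assumed to cancel out, so the expected observed score equals the true score.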

Domain Sampling Theory

  • Estimates the extent to which specific sources of variation contribute to test scores
  • Considers problems created by using a limited number of items to represent a large construct

Test Reliability

  • Conceived as an objective measure of how precisely a test score assesses a domain
  • Reliability is a function of the proportion of total variance attributed to true variance
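
The "proportion of total variance attributed to true variance" can be written as the standard variance ratio (again, notation assumed rather than quoted from the source):

$$r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{total}}} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}$$

A test is perfectly reliable only when none of the total variance is error variance.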

Generalizability Theory

  • Based on the idea that test scores vary due to variables in the testing situation
  • Universe: the test situation
  • Facets: number of items, amount of review, and purpose of test administration
  • Given the same conditions, the same test score should be obtained (universe score)

Decision Study

  • Examines the usefulness of test scores in helping test users make decisions

Systematic Error

  • A consistent or proportionate source of error that affects every measurement in the same way; unlike random error, it does not cancel out across measurements

Item Response Theory

  • The probability of a person with a certain ability level performing at a certain level on a test
  • Focuses on item difficulty
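
The source does not name a specific IRT model; one common formulation (assumed here) of "the probability that a person with X ability performs at level Y" is the two-parameter logistic (2PL) function, sketched in Python:

```python
import math

def prob_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Two-parameter logistic (2PL) IRT model: the probability of a correct
    response rises with ability relative to the item's difficulty."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A test taker whose ability matches the item's difficulty has a 50% chance.
print(prob_correct(ability=0.0, difficulty=0.0))   # 0.5
print(prob_correct(ability=1.5, difficulty=0.0))   # ~0.82
```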

Latent-Trait Theory

  • A system of assumptions about measurement and the extent to which items measure a trait
  • Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
  • If a person answers easy items correctly, the computer will move to more difficult items
  • Item characteristics considered include difficulty, discrimination, and whether the item is scored dichotomously or polytomously

Construct Validity (Umbrella Validity)

  • Covers all types of validity
  • Logical and statistical
  • Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct

Criterion Validity

  • More statistical than logical
  • Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
  • Criterion: a standard on which a judgment or decision may be made
  • Characteristics: relevant, valid, uncontaminated
  • Types of criterion validity: concurrent, predictive, and incremental validity

Factor Analysis

  • Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
  • Developed by Charles Spearman
  • Employed as a data reduction method
  • Used to study the interrelationships among a set of variables
  • Types of factor analysis: exploratory (estimating factors and deciding how many to retain) and confirmatory (testing a hypothesized factor structure); a factor loading expresses how strongly a factor influences scores on a measure

Cross-Validation

  • Validation of a test to a criterion based on a different group from the original group
  • Validity shrinkage: a decrease in validity after cross-validation
  • Co-validation: validation of more than one test from the same group
  • Co-norming: norming more than one test from the same group

Bias

  • Factors inherent in a test that systematically prevent accurate, impartial measurement
  • Prevention: during test development through procedures such as estimated true score transformation

Rating

  • Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
  • Rating error: intentional or unintentional misuse of the scale
  • Types of rating error: leniency, severity, central tendency, and halo effect
  • One way to overcome rating errors is to use rankings

Measures of Central Tendency

  • Mode: Most frequently occurring score in the distribution; useful for nominal scales and discrete variables; gives an indication of the shape of the distribution.

Measures of Spread or Variability

  • Range: Equal to the difference between the highest and lowest scores; provides a quick but gross description of the spread of scores.
  • Variance: Equal to the average of the squared deviations about the mean; its square root, the standard deviation, expresses spread in the original score units.
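
A minimal Python sketch of these two measures of spread (the scores are illustrative):

```python
from statistics import mean

def spread(scores):
    """Range, variance, and standard deviation (population formulas).
    Variance is the average squared deviation from the mean; the
    standard deviation is its square root."""
    m = mean(scores)
    variance = sum((x - m) ** 2 for x in scores) / len(scores)
    return max(scores) - min(scores), variance, variance ** 0.5

print(spread([10, 12, 14, 18, 21]))  # (11, 16.0, 4.0)
```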

Measures of Location

  • Percentile or Percentile Rank: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score; essential in creating normalized standardized scores.
  • Quartile: One of the dividing points between the four quarters of the distribution; a quartile is a specific point, whereas a quarter refers to an interval.
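
A small sketch of the percentile-rank idea above, using one simple counting convention (conventions differ in how ties are handled):

```python
def percentile_rank(score: float, sample: list) -> float:
    """Percentage of scores in the standardization sample falling below
    the given score."""
    below = sum(1 for s in sample if s < score)
    return 100 * below / len(sample)

sample = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]
print(percentile_rank(72, sample))  # 50.0
```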

Correlation Coefficients

  • Spearman Rho: Used for ordinal + ordinal data.
  • Biserial: Used for artificial dichotomous + interval/ratio data.
  • Point Biserial: Used for true dichotomous + interval/ratio data (see the example after this list).
  • Phi Coefficient: Used for true dichotomous + true dichotomous data.
  • Tetrachoric: Used for artificial dichotomous + artificial dichotomous data.
  • Rank Biserial: Used for one true dichotomous variable (two separate groups) + ordinal/rank data.
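
As referenced in the point-biserial entry above, a short example using SciPy (the item responses and scores are made up):

```python
from scipy import stats

# One true dichotomous variable (item right/wrong) against a continuous
# variable (total test score).
item_correct = [1, 0, 1, 1, 0, 1, 0, 1]
total_score = [48, 31, 45, 50, 29, 41, 35, 47]
r_pb, p_value = stats.pointbiserialr(item_correct, total_score)
print(round(r_pb, 2), round(p_value, 3))
```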

Inferential Statistics

  • T-Test Independent (Unpaired T-test): Used to compare two separate (independent) groups on one measure (see the sketch after this list).
  • T-Test Dependent (Paired T-test): Used for one group measured twice (e.g., before and after).
  • One-Way ANOVA: Used to compare three or more independent groups on one measure.
  • One-Way Repeated Measures ANOVA: Used for one group measured at least three times.
  • Two-Way ANOVA: Used when two independent variables (factors) and their interaction are examined at once.
  • ANCOVA: Used when controlling for an additional variable (covariate) that may influence the relationship between the independent and dependent variables.
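
As referenced in the list above, a short SciPy sketch contrasting the independent and paired t-tests (all numbers are illustrative only):

```python
from scipy import stats

# Independent (unpaired) t-test: two separate groups, e.g., blood
# pressure of males vs. females.
males = [120, 126, 131, 118, 125]
females = [115, 122, 119, 117, 121]
t_ind, p_ind = stats.ttest_ind(males, females)

# Paired (dependent) t-test: one group measured twice, e.g., before
# and after an intervention.
before = [30, 28, 35, 33, 31]
after = [27, 26, 33, 30, 29]
t_rel, p_rel = stats.ttest_rel(before, after)

print(round(t_ind, 2), round(t_rel, 2))
```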

Utility Gain and Productivity Gains

  • Utility Gain: Estimate of the benefit of using a particular test.
  • Productivity Gains: Estimated increase in work output.

Cut Scores

  • Cut Score: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications.
  • Relative Cut Score: Reference point based on norm-referenced considerations, not fixed per se.
  • Fixed Cut Scores: Set with reference to a judgment concerning the minimum level of proficiency required.
  • Multiple Cut Scores: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization.
  • Multiple Hurdle: Multi-stage selection process, with a cut score in place for each predictor.
  • Compensatory Model of Selection: Assumption that high scores on one attribute can compensate for lower scores on another.
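
A minimal sketch contrasting the last two selection models (cut scores and weights are hypothetical):

```python
def passes_multiple_hurdle(scores, cut_scores):
    """Multiple hurdle: the candidate must meet every predictor's cut score."""
    return all(s >= c for s, c in zip(scores, cut_scores))

def compensatory_composite(scores, weights):
    """Compensatory model: a weighted composite lets a high score on one
    attribute offset a lower score on another."""
    return sum(s * w for s, w in zip(scores, weights))

print(passes_multiple_hurdle([82, 75, 68], [70, 70, 70]))      # False
print(compensatory_composite([82, 75, 68], [0.5, 0.3, 0.2]))   # 77.1
```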

Setting Cut Scores

  • Angoff Method: Expert judges discuss and evaluate the examination using a well-defined and rational procedure.
  • Known Groups Method: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest.
  • IRT-Based Methods: Cut scores are typically set based on test-taker's performance across all the items on the test.
  • Item-Mapping Method: Arrangement of items in a histogram, with each column containing items deemed to be of equivalent value.
  • Bookmark Method: An expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not.
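
A simplified numerical sketch of the Angoff procedure described above (judge ratings are invented; operational use adds discussion and iteration rounds):

```python
def angoff_cut_score(judge_ratings):
    """Angoff-style cut score: each judge estimates, for every item, the
    probability that a minimally competent test taker answers it correctly;
    ratings are averaged across judges and summed over items."""
    n_judges = len(judge_ratings)
    n_items = len(judge_ratings[0])
    item_means = [sum(j[i] for j in judge_ratings) / n_judges for i in range(n_items)]
    return sum(item_means)

# Three judges rating a five-item test
ratings = [
    [0.90, 0.70, 0.60, 0.80, 0.50],
    [0.80, 0.60, 0.70, 0.90, 0.40],
    [0.85, 0.65, 0.60, 0.85, 0.45],
]
print(round(angoff_cut_score(ratings), 2))  # expected passing score out of 5 items
```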

Discriminant Analysis

  • Discriminant Analysis: Used to analyze the research data when the criterion or dependent variable is categorical, and the predictor or independent variable is interval in nature.

Reliability and Validity

  • Reliability: Excellent (0.90 and up), Good (0.80-0.89), Adequate (0.70-0.79), and Limited Applicability (below 0.70).
  • Validity: Very beneficial (above 0.35), Likely to be useful (0.21-0.35), Depends on circumstances (0.11-0.20), and Unlikely to be useful (below 0.11).
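
These descriptive bands translate directly into a small helper, shown here purely as an illustration of the listed ranges:

```python
def interpret_reliability(r: float) -> str:
    """Descriptive label for a reliability coefficient, per the bands above."""
    if r >= 0.90:
        return "Excellent"
    if r >= 0.80:
        return "Good"
    if r >= 0.70:
        return "Adequate"
    return "Limited applicability"

def interpret_validity(r: float) -> str:
    """Descriptive label for a validity coefficient, per the bands above."""
    if r > 0.35:
        return "Very beneficial"
    if r >= 0.21:
        return "Likely to be useful"
    if r >= 0.11:
        return "Depends on circumstances"
    return "Unlikely to be useful"

print(interpret_reliability(0.84), "|", interpret_validity(0.28))  # Good | Likely to be useful
```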

Item Analysis

  • Item-Validity Index: Designed to provide an indication of the degree to which a test is measuring what it purports to measure.
  • Item-Discrimination Index: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly.
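
A worked example of the item-discrimination index d (the counts are hypothetical):

```python
def item_discrimination(upper_correct, upper_n, lower_correct, lower_n):
    """Item-discrimination index d: proportion of high scorers minus
    proportion of low scorers answering the item correctly."""
    return upper_correct / upper_n - lower_correct / lower_n

# 24 of 30 high scorers vs. 9 of 30 low scorers answered the item correctly.
print(item_discrimination(24, 30, 9, 30))  # 0.5
```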

Scoring Models

  • Cumulative Model: Test-taker obtains a measure of the level of the trait, thus high scorers may suggest a high level in the trait being measured.
  • Class Scoring/Category Scoring: Test-taker response earns credit toward placement in a particular class or category with other test-takers whose pattern of responses is similar in some way.

Test Revision

  • Ipsative Scoring: Compares test-taker's score on one scale within a test to another scale within that same test, where the two scales measure unrelated constructs.
  • Cross-Validation: Revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

Computerized Adaptive Testing

  • Computerized Adaptive Testing: An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
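
A toy illustration of the item-branching idea described above (real CAT systems select items from an IRT-calibrated bank rather than stepping difficulty by a fixed amount):

```python
def next_item_difficulty(current_difficulty: float, answered_correctly: bool,
                         step: float = 0.5) -> float:
    """Simplified branching rule: a correct answer routes the test taker to a
    harder item, an incorrect answer to an easier one."""
    return current_difficulty + step if answered_correctly else current_difficulty - step

difficulty = 0.0
for correct in [True, True, False, True]:
    difficulty = next_item_difficulty(difficulty, correct)
print(difficulty)  # 1.0
```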

Computer Assisted Psychological Assessment

  • Computer Assisted Psychological Assessment: Standardized test administration is assured for test-takers, and variation is kept to a minimum; test content and length are tailored according to the test-taker's ability.
