Questions and Answers
Which type of validity involves a judgment about the adequacy of the inferences drawn from test scores about an individual's standing on a variable called a construct?
- Criterion Validity
- Predictive Validity
- Content Validity
- Construct Validity (correct)
Which term describes the error where a rater's scores tend to cluster in the middle of the rating scale?
- Central Tendency Error (correct)
- Severity Error
- Halo Effect
- Leniency Error
Which validity examines the relationship between test scores and a criterion measure obtained at the same time?
- Predictive Validity
- Concurrent Validity (correct)
- Construct Validity
- Incremental Validity
Which of the following methods can be used to enhance the homogeneity of a test containing dichotomous items?
Which concept refers to the validation of a test based on a different group from the original group?
What is the focus of factor analysis?
What describes a situation where the criterion measure includes irrelevant aspects of performance?
In factor analysis, which method is used to test the degree to which a hypothetical model fits the actual data?
Which term refers to the extent to which a test is used in an impartial, just, and equitable way?
What term describes the error where a rater inaccurately gives higher scores due to the inability to differentiate between distinct aspects of behavior?
What measure is used to determine the level of agreement between two or more raters when the method of assessment is categorical?
Which theory posits that a person's test scores vary due to the variables in the testing situations?
What is the main focus of Item Response Theory?
Which measure should be assessed by two independent testing periods when dealing with Speed Tests?
What does a true score genuinely reflect in Classical Test Theory?
What do tests designed to measure one factor usually exhibit?
Which concept is used for estimating how specific sources of variation contribute to the test scores?
What information do Criterion-Referenced Tests provide?
According to Generalizability Theory, under what condition should the exact same test score be obtained?
Which theory focuses on the extent to which an item measures a specific trait?
What type of error is caused by unpredictable fluctuations in measurement conditions?
Which type of reliability is obtained by correlating pairs of scores from the same individuals on two different administrations of the test?
Which type of error variance is associated with a test-taker's motivation or attention during test administration?
What effect occurs when the interval between test administrations is short, leading to inflated correlation?
When is it appropriate to use test-retest reliability?
What is the main error type associated with parallel forms reliability?
What does a lower correlation in a test-retest scenario with a longer interval indicate?
Which method helps avoid carryover effects between parallel forms?
Which type of score consistency would result from computer scoring of objective-type items?
What does a test blueprint primarily ensure in psychological assessment?
What is the true score formula used to calculate?
What is the Content Validity Ratio (CVR) formula developed by Lawshe?
What should be done if the Content Validity Index (CVI) is low?
What is an indication of Zero Content Validity Ratio (CVR) in a test item?
Which type of validity is more logical than statistical?
What type of evidence involves a validity coefficient showing a high correlation between test scores and an established test?
Which validity type involves comparing a test score at one time with a criterion measure obtained at the same time?
What does high Construct Validity indicate about a test?
Which method demonstrates construct validity by showing predictable score differences across groups?
Why might a test item be considered poor if high scorers on an academic test tend to get it wrong and low scorers get it right?
What is a fixed cut score?
Which method is used to set cut scores based on the composition of contrasting groups?
In a compensatory model of selection, what is assumed about high scores on one attribute?
Which method requires expert judges to use a well-defined and rational procedure to determine a pass mark?
If a reliability coefficient is below 0.70, how is it interpreted?
Which reliability coefficient range is considered 'excellent'?
Which method arranges items in a histogram according to their equivalent value?
What does the utility gain of a particular test estimate?
What does a multiple hurdle selection process involve?
What validity coefficient range is considered 'very beneficial'?
Which measure of central tendency is most commonly used for nominal data?
Which situation is a paired T-test used for?
Which statistical test would you use to compare means from more than two groups taken at more than three different times?
What does a large spread of values in a distribution indicate?
What measure divides the distribution into four equal parts?
What correlation coefficient is used for ordinal data?
Variance is equal to which of the following?
Which test would you use to compare the blood pressure of male and female graduate students?
What type of data is most appropriately analyzed with the median?
What type of correlation is used for a true dichotomous variable and interval/ratio data?
Which of the following best describes Level 1 interpretation?
What is the primary characteristic of actuarial assessment?
During a psychological assessment, who prepares evaluative critiques based on technical and practical aspects of tests?
Which term refers to an observable action or the product of an observable action?
Which party in psychological assessment is responsible for controlling the distribution of tests?
What is a trait in psychological assessment?
Which level of interpretation involves descriptive generalizations and hypothetical constructs?
Which statement about mechanical prediction is correct?
What is the primary focus of Level 3 interpretation?
What is extra-test history?
What does the Item-Validity Index measure?
What method assesses the correlation between a dichotomous variable and a continuous variable?
In scoring models, what does the Cumulative Model indicate about high scorers?
What term describes the principle of revalidating a test on a sample other than the original test sample?
Which phenomenon is described by a large percentage of respondents scoring near the lower limit on a test?
What function does the Routing Test serve in Computerized Adaptive Testing?
Which aspect is crucial in psychological assessment compared to psychological testing?
What is the primary focus of an aptitude test?
What is the primary purpose of DIF Analysis?
What is meant by 'Scoring Drift'?
Which type of psychological assessment involves evaluation without the subject being in physical proximity?
Which model compares test-taker responses to different scales within the same test?
What characterizes a psychological test's scoring process?
What distinguishes an intelligence test from an achievement test?
What concept is described as the ability of a computer to tailor the test content based on prior responses?
Which type of assessment is described as encouraging therapeutic self-discovery?
What does the psychometrics field specifically focus on?
What is typically measured by a typical performance test?
Which of the following refers to assigning a summary statement of performance, usually numerical in nature?
What is an example of a dynamic assessment approach?
When should the Shapiro-Wilk test be used?
What does a p-value of 0.03 in Levene's Test signify?
Which test is more sensitive to departures from normality?
What is one of the outcomes of the pilot work in test development?
Which type of item format offers more than two alternatives?
What is the primary purpose of Computerized Adaptive Testing?
Which scale arranges items from weaker to stronger expressions of attitude?
How does Levene's Test determine if variances are equal?
Which process involves brainstorming ideas about the kind of test a developer wants to publish?
What does an item pool refer to in test construction?
Study Notes
Psychological Assessment
Error: Scorer Differences
- Evaluates the degree of agreement between two or more scorers on a particular measure
- Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
- Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
- Measures of scorer differences:
- Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
- Cohen's Kappa: used for two raters only
- Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
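As a quick illustration of these agreement measures, here is a minimal Python sketch. It assumes scikit-learn and statsmodels are installed, and the ratings themselves are invented for demonstration, not taken from these notes.
```python
# Illustrative sketch: Cohen's kappa for two raters, Fleiss' kappa for three or
# more raters assigning the same examinees to categories.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Categorical ratings (0, 1, 2) that two raters assigned to eight examinees.
rater_a = [2, 0, 2, 2, 1, 0, 2, 1]
rater_b = [2, 0, 2, 1, 1, 0, 2, 2]
print("Cohen's kappa:", cohen_kappa_score(rater_a, rater_b))

# Fleiss' kappa expects a subjects-by-categories table of rating counts.
three_raters = np.array([rater_a, rater_b, rater_b]).T  # rater_b reused as a stand-in third rater
table, _ = aggregate_raters(three_raters)
print("Fleiss' kappa:", fleiss_kappa(table))
```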
Tests Designed
- Homogenous tests: designed to measure one factor, expected to have a high degree of internal consistency
- Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
- Static tests: measure traits, states, or abilities that are relatively unchanging
- Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient
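The restriction-of-range effect is easy to see in a short simulation. This sketch assumes NumPy; the scores and the selection rule are invented for illustration.
```python
# Illustrative sketch: restricting the range of one variable lowers the
# observed correlation coefficient.
import numpy as np

rng = np.random.default_rng(0)
predictor = rng.normal(100, 15, 5000)
criterion = 0.6 * predictor + rng.normal(0, 12, 5000)

full_r = np.corrcoef(predictor, criterion)[0, 1]

# Keep only the top quarter of predictor scores (e.g., only selected applicants).
kept = predictor > np.percentile(predictor, 75)
restricted_r = np.corrcoef(predictor[kept], criterion[kept])[0, 1]

print(f"full-range r = {full_r:.2f}, restricted-range r = {restricted_r:.2f}")
```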
Power Tests
- Allow a long enough time limit for test-takers to attempt every item, but include some items so difficult that no test-taker is expected to obtain a perfect score
- Performance therefore reflects ability (accuracy) rather than speed of responding
Speed Tests
- Contain items of uniform (typically low) difficulty administered under a strict time limit
- Reliability should be based on performance from two independent testing periods, using test-retest, alternate-forms, or split-half reliability (with each half separately timed)
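For the split-half option, the correlation between the two separately timed halves is stepped up to a full-test estimate with the Spearman-Brown formula. A minimal sketch with invented half-scores, assuming NumPy:
```python
# Illustrative sketch: split-half reliability corrected with Spearman-Brown.
import numpy as np

half_a = np.array([12, 15, 9, 20, 17, 11, 14, 18])   # scores on the first timed half
half_b = np.array([11, 16, 10, 19, 18, 10, 13, 17])  # scores on the second timed half

r_half = np.corrcoef(half_a, half_b)[0, 1]
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown estimate for the full-length test
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```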
Criterion-Referenced Tests
- Designed to provide an indication of where a test-taker stands with respect to a criterion
- As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability
Classical Test Theory
- Assumes that everyone has a "true score" on a test
- True score reflects an individual's ability level as measured by a particular test
- Random error affects the observed score
Domain Sampling Theory
- Estimates the extent to which specific sources of variation contribute to test scores
- Considers problems created by using a limited number of items to represent a large construct
Test Reliability
- Conceived as an objective measure of how precisely a test score assesses a domain
- Reliability is a function of the proportion of total variance attributed to true variance
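That definition can be demonstrated with a small simulation of the classical model X = T + E. The variances below are invented; NumPy is assumed.
```python
# Illustrative sketch: reliability as true-score variance divided by total
# (observed-score) variance, under the classical model X = T + E.
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(50, 10, 10_000)   # T, SD = 10 -> variance = 100
error = rng.normal(0, 5, 10_000)           # E, SD = 5  -> variance = 25
observed = true_scores + error             # X

reliability = true_scores.var() / observed.var()
print(f"reliability = {reliability:.2f}")  # close to 100 / (100 + 25) = 0.80
```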
Generalizability Theory
- Based on the idea that test scores vary due to variables in the testing situation
- Universe: the test situation
- Facets: number of items, amount of review, and purpose of test administration
- Given the same conditions, the same test score should be obtained (universe score)
Decision Study
- Examines the usefulness of test scores in helping test users make decisions
Systematic Error
- Factors inherent in a test that prevent accurate, impartial measurement
Item Response Theory
- Models the probability that a person with a certain ability level will perform at a given level on a test (e.g., answer an item correctly)
- Focuses on item difficulty
Latent-Trait Theory
- A system of assumptions about measurement and the extent to which items measure a trait
- Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
- If a person answers easy items correctly, the computer will move to more difficult items
- Item attributes: difficulty, discrimination, and whether the item is dichotomous (scored right/wrong) or polytomous
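One common formalization of the item-ability relationship (not spelled out in these notes) is the two-parameter logistic item response function, where b is the item's difficulty and a its discrimination. The parameter values in this sketch are invented.
```python
# Illustrative sketch: two-parameter logistic (2PL) item response function.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability that a person with ability theta answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}  P(correct) = {p_correct(theta, a=1.2, b=0.5):.2f}")
```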
Construct Validity (Umbrella Validity)
- Covers all types of validity
- Logical and statistical
- Judgment about the appropriateness of inferences drawn from test scores regarding an individual's standing on a variable called a construct
Criterion Validity
- More statistical than logical
- Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
- Criterion: a standard on which a judgment or decision may be made
- Characteristics: relevant, valid, uncontaminated
- Types of criterion validity: concurrent, predictive, and incremental validity
Factor Analysis
- Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
- Developed by Charles Spearman
- Employed as a data reduction method
- Used to study the interrelationships among a set of variables
- Types of factor analysis: exploratory and confirmatory (a factor loading expresses how strongly a given variable relates to a factor)
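A minimal exploratory-factor-analysis sketch, assuming scikit-learn is available; the six-item data set is simulated from two latent factors purely for illustration.
```python
# Illustrative sketch: exploratory factor analysis as data reduction.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 2))                        # two underlying factors
loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],  # items 1-3 load on factor 1
                     [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]]) # items 4-6 load on factor 2
items = latent @ loadings.T + rng.normal(scale=0.3, size=(300, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
print(np.round(fa.components_, 2))  # estimated loadings of each item on the two factors
```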
Cross-Validation
- Validation of a test to a criterion based on a different group from the original group
- Validity shrinkage: a decrease in validity after cross-validation
- Co-validation: validation of more than one test from the same group
- Co-norming: norming more than one test from the same group
Bias
- Factors inherent in a test that systematically prevent accurate, impartial measurement
- Prevention: during test development through procedures such as estimated true score transformation
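One widely cited form of estimated true score transformation is Kelley's formula, which regresses an observed score toward the group mean in proportion to the test's reliability. The numbers in this sketch are invented.
```python
# Illustrative sketch: Kelley's estimated true score.
def estimated_true_score(observed: float, group_mean: float, reliability: float) -> float:
    # The less reliable the test, the more the estimate is pulled toward the group mean.
    return group_mean + reliability * (observed - group_mean)

print(estimated_true_score(observed=120, group_mean=100, reliability=0.90))  # 118.0
```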
Rating
- Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
- Rating error: intentional or unintentional misuse of the scale
- Types of rating error: leniency, severity, central tendency, and halo effect
- One way to overcome rating errors is to use rankings
Discriminant Evidence
- Definition: A validity coefficient showing little relationship between test scores and other variables with which the scores should not theoretically be correlated
- Importance: Used in Psychological Assessment
Measures of Central Tendency
Mode
- Definition: The most frequently occurring score in a distribution
- Use: The measure of choice for nominal scales and discrete variables
- Characteristics: Can be used in analyses of qualitative or verbal nature, and gives an indication of the shape of the distribution
Measures of Spread or Variability
- Definition: Statistics that describe the amount of variation in a distribution
- Use: Gives an idea of how well the measure of central tendency represents the data
- Characteristics: Large spread of values means large differences between individual scores
Range
- Definition: The difference between the highest and lowest score
- Use: Provides a quick but gross description of the spread of scores
- Characteristics: Can be affected by extreme scores in the distribution
Variance
- Definition: The arithmetic mean of the squared deviations of the scores about their mean
- Use: Describes how widely scores are dispersed around the mean; the standard deviation is the square root of the variance and expresses that spread in the original score units
- Characteristics: Because the deviations are squared, the variance is expressed in squared units and is sensitive to extreme scores
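The relationship between variance and standard deviation in one short NumPy sketch (the scores are invented):
```python
# Illustrative sketch: variance = mean squared deviation about the mean;
# standard deviation = square root of the variance.
import numpy as np

scores = np.array([10, 12, 15, 18, 20])
deviations = scores - scores.mean()
variance = (deviations ** 2).mean()  # same as np.var(scores)
std_dev = variance ** 0.5            # same as np.std(scores)
print(variance, std_dev)
```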
Measures of Location
Percentile or Percentile Rank
- Definition: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score
- Use: Indicates the individual's relative position in the standardization sample
- Characteristics: Not linearly transformable; equal percentile differences correspond to small raw-score intervals near the middle of the distribution and large intervals at the outer ends
Quartile
- Definition: Dividing points between the four quarters in the distribution
- Use: Q1, Q2 (the median), and Q3 mark the specific points that separate the four quarters of the distribution
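Percentile ranks and quartiles can be computed directly; a sketch assuming NumPy and SciPy, with an invented score distribution:
```python
# Illustrative sketch: percentile rank of one score, and the quartile points.
import numpy as np
from scipy import stats

scores = np.array([55, 60, 62, 65, 70, 72, 75, 80, 85, 90])

# Percentile rank of a score of 72 within the sample.
print("percentile rank of 72:", stats.percentileofscore(scores, 72))

# Q1, Q2 (median), and Q3 divide the distribution into four equal parts.
print("quartiles:", np.percentile(scores, [25, 50, 75]))
```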
Correlation
Spearman Rho
- Definition: Used for ordinal + ordinal data
- Use: Measures the correlation between two variables
Biserial
- Definition: Used for an artificially dichotomized variable + interval/ratio data
- Use: Measures the correlation between two variables
Point Biserial
- Definition: Used for a true dichotomous variable + interval/ratio data
- Use: Measures the correlation between two variables
Phi Coefficient
- Definition: Used for true dichotomous + true dichotomous data (a 2 x 2 table)
- Use: Measures the correlation between two variables
Tetrachoric
- Definition: Used for artificial dichotomous + artificial dichotomous data (both assumed to reflect underlying continuous variables)
- Use: Measures the correlation between two variables
Kendall's Rank Biserial Differences
- Definition: Used when one variable is a dichotomy defining two separate groups and the other is ordinal (rank) data
- Use: Measures the correlation between two variables
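A sketch of how three of these coefficients are computed in SciPy, using invented data; the phi coefficient is derived here from a 2 x 2 chi-square, which recovers its magnitude.
```python
# Illustrative sketch: Spearman rho, point-biserial, and phi.
import numpy as np
from scipy import stats

# Spearman rho: two sets of ranks (ordinal + ordinal).
rho, _ = stats.spearmanr([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print("Spearman rho:", round(rho, 2))

# Point-biserial: true dichotomy (passed or not) + interval/ratio variable.
passed = np.array([0, 0, 1, 1, 1, 0, 1, 0])
hours = np.array([2, 3, 8, 7, 9, 4, 6, 1])
r_pb, _ = stats.pointbiserialr(passed, hours)
print("point-biserial:", round(r_pb, 2))

# Phi: two true dichotomies, via a 2 x 2 contingency table (magnitude only).
male = np.array([0, 1, 1, 1, 0, 0, 1, 0])
table = np.array([[np.sum((passed == i) & (male == j)) for j in (0, 1)] for i in (0, 1)])
chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
print("phi:", round((chi2 / len(passed)) ** 0.5, 2))
```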
T-Test (Independent)
- Definition: Used for comparing the means of two separate groups
- Use: Two independent groups, one score each (e.g., blood pressure of male vs. female students)
T-Test Dependent (Paired T-Test)
- Definition: Used for comparing two means from the same group
- Use: One group, two scores (e.g., blood pressure before and after a lecture)
One-Way ANOVA
- Definition: Used for comparing means across three or more independent groups
- Use: Three or more separate groups, each measured once on the dependent variable
One-Way Repeated Measures ANOVA
- Definition: Used for comparing means from the same group measured on several occasions
- Use: One group, measured at least three times
Two-Way ANOVA
- Definition: Used for comparing means when there are two independent variables (factors), including their interaction
- Use: Used when a second factor may be influencing the relationship between your independent and dependent variable
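Matching the design to the test in SciPy, with invented data for each situation:
```python
# Illustrative sketch: independent t-test, paired t-test, and one-way ANOVA.
import numpy as np
from scipy import stats

# Two separate groups (e.g., male vs. female students) -> independent t-test.
males = np.array([120, 125, 118, 130, 122])
females = np.array([115, 119, 121, 117, 116])
print("independent t:", stats.ttest_ind(males, females))

# One group measured twice (before/after) -> paired (dependent) t-test.
before = np.array([130, 128, 135, 126, 132])
after = np.array([124, 126, 130, 123, 128])
print("paired t:", stats.ttest_rel(before, after))

# Three or more independent groups -> one-way ANOVA.
print("one-way ANOVA:", stats.f_oneway([5, 6, 7, 5, 6], [8, 9, 7, 8, 9], [4, 5, 4, 6, 5]))
```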
Utility Gain
- Definition: Estimate of the benefit of using a particular test
- Use: In Psychological Assessment
Productivity Gains
- Definition: An estimated increase in work output
- Use: In Psychological Assessment
Cut Score
- Definition: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications
- Use: In Psychological Assessment
Relative Cut Score
- Definition: Reference point based on norm-referenced considerations, not fixed per se
- Use: In Psychological Assessment
Fixed Cut Scores
- Definition: Set with reference to a judgment concerning the minimum level of proficiency required
- Use: In Psychological Assessment
Multiple Cut Scores
- Definition: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization
- Use: In Psychological Assessment
Multiple Hurdle
- Definition: Multi-stage selection process in which a cut score is in place for each predictor
- Use: In Psychological Assessment
Compensatory Model of Selection
- Definition: Assumption that high scores on one attribute can compensate for low scores on another attribute
- Use: In Psychological Assessment
Angoff Method
- Definition: A method of setting fixed cut scores in which expert judges estimate, item by item, the probability that a minimally competent test-taker would respond correctly; the averaged judgments determine the cut score
- Use: In Psychological Assessment
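The arithmetic behind the Angoff method is simply averaging the judges' item-level probability estimates and summing them across items. The judgments in this sketch are invented; NumPy is assumed.
```python
# Illustrative sketch: Angoff-style cut score from expert judgments.
import numpy as np

# Rows = judges, columns = items; each value is the judged probability that a
# minimally competent test-taker answers the item correctly.
judgments = np.array([
    [0.7, 0.5, 0.9, 0.6, 0.8],
    [0.6, 0.6, 0.8, 0.5, 0.9],
    [0.8, 0.4, 0.9, 0.6, 0.7],
])

cut_score = judgments.mean(axis=0).sum()  # expected raw score of a minimally competent examinee
print(f"cut score: {cut_score:.1f} of {judgments.shape[1]} items")
```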
Known Groups Method
- Definition: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest
- Use: In Psychological Assessment
IRT-Based Methods
- Definition: Cut scores are typically set based on test-takers' performance across all the items on the test
- Use: In Psychological Assessment
Item-Mapping Method
- Definition: Arrangement of items in a histogram, with each column containing items deemed to be equivalent in value
- Use: In Psychological Assessment
Bookmark Method
- Definition: An expert places a "bookmark" between the two pages of an item booklet (ordered by difficulty) that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
- Use: In Psychological Assessment
Method of Predictive Yield
- Definition: Takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
- Use: In Psychological Assessment
Discriminant Analysis
- Definition: Used to shed light on the relationship between identified variables and two naturally occurring groups
- Use: In Psychological Assessment, used to analyze data when the criterion or dependent variable is categorical and the predictor or independent variable is interval in nature
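A minimal discriminant-analysis sketch, assuming scikit-learn; the two groups and their predictor scores are simulated purely for illustration.
```python
# Illustrative sketch: linear discriminant analysis with a categorical criterion
# (group membership) and interval-level predictors.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
hired = rng.normal(loc=[70, 60], scale=8, size=(50, 2))      # two predictor scores per person
not_hired = rng.normal(loc=[55, 50], scale=8, size=(50, 2))
X = np.vstack([hired, not_hired])
y = np.array([1] * 50 + [0] * 50)                            # 1 = hired, 0 = not hired

lda = LinearDiscriminantAnalysis().fit(X, y)
print("accuracy on the same sample:", lda.score(X, y))
print("predicted group for a new applicant:", lda.predict([[68, 58]]))
```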
Reliability and Validity
Reliability
- Definition: The consistency of test scores
- Interpretation:
- Excellent: 0.90 and up
- Good: 0.80-0.89
- Adequate: 0.70-0.79
- Limited applicability: below 0.70
Validity
- Definition: The degree to which a test measures what it claims to measure
- Interpretation:
- Very beneficial: above 0.35
- Likely to be useful: 0.21-0.35
- Depends on circumstances: 0.11-0.20
- Unlikely to be useful: below 0.11
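The two interpretive tables above translate directly into a small helper; the thresholds are the ones listed here, not universal standards.
```python
# Illustrative sketch: mapping reliability and validity coefficients to the
# interpretive labels listed above.
def interpret_reliability(r: float) -> str:
    if r >= 0.90:
        return "excellent"
    if r >= 0.80:
        return "good"
    if r >= 0.70:
        return "adequate"
    return "limited applicability"

def interpret_validity(v: float) -> str:
    if v > 0.35:
        return "very beneficial"
    if v >= 0.21:
        return "likely to be useful"
    if v >= 0.11:
        return "depends on circumstances"
    return "unlikely to be useful"

print(interpret_reliability(0.84), "|", interpret_validity(0.28))  # good | likely to be useful
```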
Item Analysis
Item-Validity Index
- Definition: Designed to provide an indication of the degree to which an item is measuring what the test purports to measure
- Use: In Psychological Assessment
Item-Discrimination Index
- Definition: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly
- Use: In Psychological Assessment
Extreme Group Method
- Definition: Compares people who have done well with those who have done poorly
- Use: In Psychological Assessment
Discrimination Index
- Definition: The difference between the proportion of the high-scoring group and the proportion of the low-scoring group answering the item correctly
- Use: In Psychological Assessment
Point-Biserial Method
- Definition: Correlation between a dichotomous variable and continuous variable
- Use: In Psychological Assessment
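A short sketch tying these item-analysis statistics together for one dichotomously scored item; SciPy is assumed and the responses are invented.
```python
# Illustrative sketch: discrimination index via the extreme-group method, and
# the point-biserial item-total correlation, for one item scored 0/1.
import numpy as np
from scipy import stats

item = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 0])             # 1 = answered correctly
total = np.array([48, 45, 30, 50, 28, 25, 47, 44, 33, 27])  # total test scores

order = np.argsort(total)
low, high = order[:3], order[-3:]          # bottom and top scorers (extreme groups)
d = item[high].mean() - item[low].mean()   # proportion correct (high) minus proportion correct (low)
print("discrimination index d:", round(d, 2))

r_pb, _ = stats.pointbiserialr(item, total)  # dichotomous item vs. continuous total score
print("point-biserial:", round(r_pb, 2))
```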
Description
This quiz assesses your knowledge of scorer differences in psychological assessment, including measures of agreement and variability. Topics include evaluating scorer differences, types of measures, and the Fleiss Kappa coefficient.