Questions and Answers
Which type of validity involves a judgment about the adequacy of the inferences drawn from test scores about an individual's standing on a variable called a construct?
Which term describes the error where a rater's scores tend to cluster in the middle of the rating scale?
Which validity examines the relationship between test scores and a criterion measure obtained at the same time?
Which of the following methods can be used to enhance the homogeneity of a test containing dichotomous items?
Which concept refers to the validation of a test based on a different group from the original group?
What is the focus of factor analysis?
What describes a situation where the criterion measure includes irrelevant aspects of performance?
In factor analysis, which method is used to test the degree to which a hypothetical model fits the actual data?
Which term refers to the extent to which a test is used in an impartial, just, and equitable way?
What term describes the error where a rater inaccurately gives higher scores due to the inability to differentiate between distinct aspects of behavior?
What measure is used to determine the level of agreement between two or more raters when the method of assessment is categorical?
Which theory posits that a person's test scores vary due to the variables in the testing situations?
What is the main focus of Item Response Theory?
Which measure should be assessed by two independent testing periods when dealing with Speed Tests?
What does a true score genuinely reflect in Classical Test Theory?
What do tests designed to measure one factor usually exhibit?
Which concept is used for estimating how specific sources of variation contribute to the test scores?
What information do Criterion-Referenced Tests provide?
According to Generalizability Theory, under what condition should the exact same test score be obtained?
Which theory focuses on the extent to which an item measures a specific trait?
What type of error is caused by unpredictable fluctuations in measurement conditions?
Which type of reliability is obtained by correlating pairs of scores from the same individuals on two different administrations of the test?
Which type of error variance is associated with a test-taker's motivation or attention during test administration?
What effect occurs when the interval between test administrations is short, leading to inflated correlation?
When is it appropriate to use test-retest reliability?
What is the main error type associated with parallel forms reliability?
What does a lower correlation in a test-retest scenario with a longer interval indicate?
Which method helps avoid carryover effects between parallel forms?
Which type of score consistency would result from computer scoring of objective-type items?
What does a test blueprint primarily ensure in psychological assessment?
What is the true score formula used to calculate?
What is the Content Validity Ratio (CVR) formula developed by Lawshe?
What should be done if the Content Validity Index (CVI) is low?
What is an indication of Zero Content Validity Ratio (CVR) in a test item?
Which type of validity is more logical than statistical?
What type of evidence involves a validity coefficient showing a high correlation between test scores and an established test?
Which validity type involves comparing a test score at one time with a criterion measure obtained at the same time?
What does high Construct Validity indicate about a test?
Which method demonstrates construct validity by showing predictable score differences across groups?
Why might a test item be considered poor if high scorers on an academic test tend to get it wrong and low scorers get it right?
What is a fixed cut score?
Which method is used to set cut scores based on the composition of contrasting groups?
In a compensatory model of selection, what is assumed about high scores on one attribute?
Which method requires expert judges to use a well-defined and rational procedure to determine a pass mark?
If a reliability coefficient is below 0.70, how is it interpreted?
Which reliability coefficient range is considered 'excellent'?
Which method arranges items in a histogram according to their equivalent value?
What does the utility gain of a particular test estimate?
What does a multiple hurdle selection process involve?
What validity coefficient range is considered 'very beneficial'?
Which measure of central tendency is most commonly used for nominal data?
Which situation is a paired T-test used for?
Which statistical test would you use to compare means from more than two groups taken at more than three different times?
What does a large spread of values in a distribution indicate?
What measure divides the distribution into four equal parts?
What correlation coefficient is used for ordinal data?
Variance is equal to which of the following?
Which test would you use to compare the blood pressure of male and female graduate students?
What type of data is most appropriately analyzed with the median?
What type of correlation is used for a true dichotomous variable and interval/ratio data?
Which of the following best describes Level 1 interpretation?
What is the primary characteristic of actuarial assessment?
During a psychological assessment, who prepares evaluative critiques based on technical and practical aspects of tests?
Which term refers to an observable action or the product of an observable action?
Which party in psychological assessment is responsible for controlling the distribution of tests?
What is a trait in psychological assessment?
Which level of interpretation involves descriptive generalizations and hypothetical constructs?
Which statement about mechanical prediction is correct?
What is the primary focus of Level 3 interpretation?
What is extra-test history?
What does the Item-Validity Index measure?
What method assesses the correlation between a dichotomous variable and a continuous variable?
In scoring models, what does the Cumulative Model indicate about high scorers?
What term describes the principle of revalidating a test on a sample other than the original test sample?
Which phenomenon is described by a large percentage of respondents scoring near the lower limit on a test?
What function does the Routing Test serve in Computerized Adaptive Testing?
Which aspect is crucial in psychological assessment compared to psychological testing?
What is the primary focus of an aptitude test?
What is the primary purpose of DIF Analysis?
What is meant by 'Scoring Drift'?
Which type of psychological assessment involves evaluation without the subject being in physical proximity?
Which model compares test-taker responses to different scales within the same test?
What characterizes a psychological test's scoring process?
What distinguishes an intelligence test from an achievement test?
What concept is described as the ability of a computer to tailor the test content based on prior responses?
Which type of assessment is described as encouraging therapeutic self-discovery?
What does the psychometrics field specifically focus on?
What is typically measured by a typical performance test?
Which of the following refers to assigning a summary statement of performance, usually numerical in nature?
What is an example of a dynamic assessment approach?
When should the Shapiro-Wilk test be used?
What does a p-value of 0.03 in Levene's Test signify?
Which test is more sensitive to departures from normality?
What is one of the outcomes of the pilot work in test development?
Which type of item format offers more than two alternatives?
What is the primary purpose of Computerized Adaptive Testing?
Which scale arranges items from weaker to stronger expressions of attitude?
How does Levene's Test determine if variances are equal?
Which process involves brainstorming ideas about the kind of test a developer wants to publish?
What does an item pool refer to in test construction?
Study Notes
Psychological Assessment
Error: Scorer Differences
- Evaluates the degree of agreement between two or more scorers on a particular measure
- Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
- Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
- Measures of scorer differences:
- Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
- Cohen's Kappa: used for two raters only
- Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
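The difference between simple percentage agreement and a chance-corrected index can be sketched for the two-rater case. The block below is a minimal illustration of Cohen's Kappa in plain Python; the function name `cohens_kappa` and the pass/fail ratings are hypothetical and chosen only for the example.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of examinees given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical ratings of ten examinees ("P" = pass, "F" = fail).
examiner_1 = ["P", "P", "F", "P", "F", "P", "P", "F", "P", "F"]
examiner_2 = ["P", "P", "F", "F", "F", "P", "P", "F", "P", "P"]

percent_agreement = sum(a == b for a, b in zip(examiner_1, examiner_2)) / len(examiner_1)
kappa = cohens_kappa(examiner_1, examiner_2)
```

With these made-up ratings the raters agree on 8 of 10 examinees, yet kappa is noticeably lower than 0.80 because part of that agreement would be expected by chance alone.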
Tests Designed
- Homogeneous tests: designed to measure one factor, expected to have a high degree of internal consistency
- Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
- Static tests: measure traits, states, or abilities that are relatively unchanging
- Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient
Power Tests
- The time limit is long enough to allow test-takers to attempt all items
- Some items are so difficult that no test-taker is likely to obtain a perfect score, so scores reflect the level of difficulty a test-taker can handle rather than speed
Speed Tests
- Contain items of uniform difficulty administered under a time limit
- Reliability should be based on performance from two independent testing periods, using test-retest, alternate-forms, or split-half reliability from two separately timed half tests
Criterion-Referenced Tests
- Designed to provide an indication of where a test-taker stands with respect to a criterion
- As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability
Classical Test Theory
- Assumes that everyone has a "true score" on a test
- True score reflects an individual's ability level as measured by a particular test
- Random error affects the observed score
Domain Sampling Theory
- Estimates the extent to which specific sources of variation contribute to test scores
- Considers problems created by using a limited number of items to represent a large construct
Test Reliability
- Conceived as an objective measure of how precisely a test score assesses a domain
- Reliability is a function of the proportion of total variance attributed to true variance
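The idea that an observed score combines a true score with random error, and that reliability is the share of total variance that is true variance, can be sketched numerically. This is a minimal illustration only; the function name and the variance values are hypothetical.

```python
def reliability_from_variances(true_variance, error_variance):
    """Reliability as the proportion of total variance that is true variance."""
    total_variance = true_variance + error_variance  # observed = true + error
    if total_variance <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total_variance

# Hypothetical test: true-score variance of 80 and error variance of 20.
example_reliability = reliability_from_variances(80, 20)
```

As error variance grows relative to true variance, the coefficient falls toward zero; with no error variance it equals 1.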
Generalizability Theory
- Based on the idea that test scores vary due to variables in the testing situation
- Universe: the test situation
- Facets: number of items, amount of training the test scorers have had, and purpose of test administration
- Given the same conditions, the same test score should be obtained (universe score)
Decision Study
- Examines the usefulness of test scores in helping test users make decisions
Systematic Error
- Factors inherent in a test that prevent accurate, impartial measurement
Item Response Theory
- Models the probability that a person with a given ability level will perform at a certain level on a test item
- Focuses on item difficulty
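One common way to express this probability is a logistic item characteristic curve. The sketch below assumes the two-parameter logistic (2PL) model, which these notes do not name; the function and parameter names are illustrative only.

```python
import math

def prob_correct(ability, difficulty, discrimination=1.0):
    """2PL item characteristic curve (assumed model, not stated in the notes).

    Returns the probability that a test-taker at the given ability level
    answers the item correctly.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# When ability equals item difficulty, the probability is exactly one half.
p_matched = prob_correct(ability=0.0, difficulty=0.0)
# The same test-taker has a lower chance on a harder item.
p_harder = prob_correct(ability=0.0, difficulty=1.0)
```

Raising `discrimination` makes the curve steeper around the difficulty point, so the item separates test-takers just below and just above that ability level more sharply.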
Latent-Trait Theory
- A system of assumptions about measurement and the extent to which items measure a trait
- Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
- If a person answers easy items correctly, the computer will move to more difficult items
- Item attributes: difficulty, discrimination, and dichotomousness
Construct Validity (Umbrella Validity)
- Covers all types of validity
- Logical and statistical
- Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct
Criterion Validity
- More statistical than logical
- Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
- Criterion: a standard on which a judgment or decision may be made
- Characteristics: relevant, valid, uncontaminated
- Types of criterion validity: concurrent, predictive, and incremental validity
Factor Analysis
- Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
- Developed by Charles Spearman
- Employed as a data reduction method
- Used to study the interrelationships among a set of variables
- Types of factor analysis: exploratory and confirmatory; a factor loading conveys the extent to which a factor determines the test score or scores
Cross-Validation
- Validation of a test to a criterion based on a different group from the original group
- Validity shrinkage: a decrease in validity after cross-validation
- Co-validation: validation of more than one test from the same group
- Co-norming: norming more than one test from the same group
Bias
- Factors inherent in a test that systematically prevent accurate, impartial measurement
- Prevention: during test development through procedures such as estimated true score transformation
Rating
- Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
- Rating error: intentional or unintentional misuse of the scale
- Types of rating error: leniency, severity, central tendency, and halo effect
- One way to overcome rating errors is to use rankings
Discriminant Evidence
- Definition: A validity coefficient showing little relationship between test scores and other variables with which the test scores should not, in theory, be correlated
- Importance: Used in Psychological Assessment
Measures of Central Tendency
Mode
- Definition: The most frequently occurring score in a distribution
- Use: For nominal scales and discrete variables
- Characteristics: Can be used in analyses of qualitative or verbal nature, and gives an indication of the shape of the distribution
Measures of Spread or Variability
- Definition: Statistics that describe the amount of variation in a distribution
- Use: Gives an idea of how well the measure of central tendency represents the data
- Characteristics: Large spread of values means large differences between individual scores
Range
- Definition: The difference between the highest and lowest score
- Use: Provides a quick but gross description of the spread of scores
- Characteristics: Can be affected by extreme scores in the distribution
Variance
- Definition: The arithmetic mean of the squares of the differences between the scores in a distribution and their mean
- Use: Describes how far scores spread around the mean, in squared units
- Characteristics: Its square root is the standard deviation, which expresses the typical distance of scores from the mean in the original units
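A short worked example may help keep the two statistics apart: the variance is the mean of the squared deviations from the mean, and the standard deviation is the square root of the variance. The scores below are hypothetical, and the sketch divides by N (treating the scores as the whole distribution).

```python
import math

def variance(scores):
    """Mean of the squared differences between each score and the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, in the same units as the scores."""
    return math.sqrt(variance(scores))

# Hypothetical distribution of eight scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
score_variance = variance(scores)        # squared deviations sum to 32, so 32 / 8
score_sd = standard_deviation(scores)
```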
Measures of Location
Percentile or Percentile Rank
- Definition: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score
- Use: Indicates the individual's relative position in the standardization sample
- Characteristics: Not a linear transformation of raw scores; in a normal distribution, percentile points bunch together near the middle and spread apart toward the outer ends
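The definition above (percentage of persons in the standardization sample who fall below a given score) translates directly into a small calculation. The function name and the sample scores are hypothetical.

```python
def percentile_rank(score, standardization_sample):
    """Percentage of persons in the sample who scored below the given score."""
    below = sum(1 for s in standardization_sample if s < score)
    return 100.0 * below / len(standardization_sample)

# Hypothetical standardization sample of ten scores.
sample = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank_of_80 = percentile_rank(80, sample)  # five of the ten scores fall below 80
```

Some texts count half of the tied scores as falling below; this sketch follows the strict "fall below" wording used in these notes.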
Quartile
- Definition: Dividing points between the four quarters in the distribution
- Use: Refers to a specific point in the distribution (a quarter, by contrast, refers to an interval)
Correlation
Spearman Rho
- Definition: Used for ordinal + ordinal data
- Use: Measures the correlation between two variables
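For two ordinal variables, Spearman Rho can be obtained by ranking each variable and computing a Pearson correlation on the ranks. The sketch below takes that route in plain Python; the helper names and the two sets of judge rankings are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman Rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical rankings of five examinees by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_1, judge_2)
```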
Biserial
- Definition: Used for artificial dichotomous + interval/ratio data
- Use: Measures the correlation between two variables
Point Biserial
- Definition: Used for true dichotomous + interval/ratio data
- Use: Measures the correlation between two variables
Phi Coefficient
- Definition: Used for nominal (true dichotomous) + nominal (true/artificial dichotomous) data
- Use: Measures the correlation between two variables
Tetrachoric
- Definition: Used for artificial dichotomous + artificial dichotomous data
- Use: Measures the correlation between two variables
Kendall's
- Definition: Used for 3 or more sets of ordinal/rank data
- Use: Measures the agreement among the sets of rankings
Rank Biserial
- Definition: Used for nominal (dichotomous) + ordinal data
- Use: Measures the correlation between two variables
Tests of Differences
T-Test Independent
- Definition: Used for comparing the means of two separate groups
- Use: Two separate groups, random assignment (e.g., blood pressure of male and female graduate students)
T-Test Dependent (Paired T-test)
- Definition: Used for comparing two means obtained from the same group
- Use: One group, two scores (e.g., blood pressure before and after the lecture)
One-Way ANOVA
- Definition: Used for comparing means between three or more groups
- Use: Three or more separate groups, each tested once
One-Way Repeated Measures
- Definition: Used for comparing three or more means obtained from the same group
- Use: One group, measured at least three times
Two-Way ANOVA
- Definition: Used for comparing means across groups defined by two independent variables
- Use: Used when two independent variables may each influence the dependent variable, separately or in interaction
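The distinction between the two t-tests shows up in how the test statistic is built: the independent test compares the means of two separate groups using a pooled variance, while the paired test works on the difference scores from a single group measured twice. The sketch below computes only the t statistics (no p-values) and assumes equal population variances for the independent case; all names and data are hypothetical.

```python
import math
from statistics import mean, stdev, variance  # stdev/variance use the n - 1 denominator

def independent_t(group_a, group_b):
    """t statistic for two separate groups, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))

def paired_t(first_scores, second_scores):
    """t statistic for one group measured twice, based on the difference scores."""
    diffs = [second - first for first, second in zip(first_scores, second_scores)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical systolic blood pressure readings.
male_students = [118, 124, 130, 121]
female_students = [112, 119, 125, 116]
t_between_groups = independent_t(male_students, female_students)

before_lecture = [120, 130, 125, 140]
after_lecture = [118, 126, 124, 134]
t_within_group = paired_t(before_lecture, after_lecture)
```

In practice the statistic is compared against a t distribution with the appropriate degrees of freedom (n_a + n_b - 2 for the independent case, n - 1 for the paired case).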
Utility Gain
- Definition: Estimate of the benefit of using a particular test
- Use: In Psychological Assessment
Productivity Gains
- Definition: An estimated increase in work output
- Use: In Psychological Assessment
Cut Score
- Definition: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications
- Use: In Psychological Assessment
Relative Cut Score
- Definition: Reference point based on norm-referenced considerations, not fixed per se
- Use: In Psychological Assessment
Fixed Cut Scores
- Definition: Set with reference to a judgment concerning minimum level of proficiency required
- Use: In Psychological Assessment
Multiple Cut Scores
- Definition: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization
- Use: In Psychological Assessment
Multiple Hurdle
- Definition: Multi-stage selection process, a cut score is in place for each predictor
- Use: In Psychological Assessment
Compensatory Model of Selection
- Definition: Assumption that high scores on one attribute can compensate for lower scores
- Use: In Psychological Assessment
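The contrast between a multiple hurdle process and a compensatory model can be sketched with one applicant and two predictors. The predictor names, cut scores, and weights below are hypothetical and only illustrate the logic of each model.

```python
def passes_multiple_hurdle(scores, cut_scores):
    """Applicant must meet or exceed the cut score on every predictor."""
    return all(scores[name] >= cut for name, cut in cut_scores.items())

def compensatory_score(scores, weights):
    """Weighted composite, so a high score on one predictor can offset a low one."""
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical applicant: strong interview, weak aptitude test (both on a 0-10 scale).
applicant = {"interview": 9, "aptitude": 4}
cut_scores = {"interview": 5, "aptitude": 5}
weights = {"interview": 0.5, "aptitude": 0.5}

cleared_all_hurdles = passes_multiple_hurdle(applicant, cut_scores)
composite = compensatory_score(applicant, weights)
selected_by_composite = composite >= 5  # hypothetical cut on the composite
```

Here the applicant fails the aptitude hurdle and would be screened out under a multiple hurdle process, yet clears the composite cut under the compensatory model because the high interview score makes up the difference.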
Angoff Method
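- Definition: A method for setting fixed cut scores in which expert judges estimate how test-takers with the minimal required competence would respond to each item, and the judgments are averaged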
- Use: In Psychological Assessment
Known Groups Method
- Definition: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest
- Use: In Psychological Assessment
IRT-Based Methods
- Definition: Cut scores are typically set based on test-taker's performance across all the items on the test
- Use: In Psychological Assessment
Item-Mapping Method
- Definition: Arrangement of items in a histogram, with each column containing items deemed to be equivalent in value
- Use: In Psychological Assessment
Bookmark Method
- Definition: An expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not
- Use: In Psychological Assessment
Method of Predictive Yield
- Definition: Takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores
- Use: In Psychological Assessment
Discriminant Analysis
- Definition: Used to shed light on the relationship between identified variables and two naturally occurring groups
- Use: In Psychological Assessment, used to analyze data when the criterion or dependent variable is categorical and the predictor or independent variable is interval in nature
Reliability and Validity
Reliability
- Definition: The consistency of test scores
- Interpretation:
- Excellent: 0.90 and up
- Good: 0.80-0.89
- Adequate: 0.70-0.79
- Limited applicability: below 0.70
Validity
- Definition: The degree to which a test measures what it claims to measure
- Interpretation:
- Very beneficial: above 0.35
- Likely to be useful: 0.21-0.35
- Depends on circumstances: 0.11-0.20
- Unlikely to be useful: below 0.11
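The two interpretation tables above can be written as simple lookup functions. This is only a convenience sketch of the bands listed in these notes; the function names are hypothetical, and values that fall between listed bands (for example a validity coefficient between 0.20 and 0.21) are assigned to the lower band as an assumption.

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient to the descriptive bands in these notes."""
    if coefficient >= 0.90:
        return "excellent"
    if coefficient >= 0.80:
        return "good"
    if coefficient >= 0.70:
        return "adequate"
    return "limited applicability"

def interpret_validity(coefficient):
    """Map a validity coefficient to the descriptive bands in these notes."""
    if coefficient > 0.35:
        return "very beneficial"
    if coefficient >= 0.21:
        return "likely to be useful"
    if coefficient >= 0.11:
        return "depends on circumstances"
    return "unlikely to be useful"

# Hypothetical coefficients for a single test.
reliability_label = interpret_reliability(0.85)
validity_label = interpret_validity(0.40)
```

Note that validity coefficients are judged on a much lower scale than reliability coefficients: 0.40 counts as very beneficial for validity, while 0.40 as a reliability coefficient would fall in the lowest band.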
Item Analysis
Item-Validity Index
- Definition: Designed to provide an indication of the degree to which a test is measuring what it purports to measure
- Use: In Psychological Assessment
Item-Discrimination Index
- Definition: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly
- Use: In Psychological Assessment
Extreme Group Method
- Definition: Compares people who have done well with those who have done poorly
- Use: In Psychological Assessment
Discrimination Index
- Definition: The difference between the proportion of high scorers and the proportion of low scorers answering an item correctly
- Use: In Psychological Assessment
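The extreme group comparison reduces to a small calculation per item. The sketch below assumes the upper and lower groups are the same size; the function name and the counts are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Proportion correct in the upper group minus proportion correct in the lower group.

    Assumes both extreme groups contain `group_size` test-takers.
    """
    return (upper_correct - lower_correct) / group_size

# Hypothetical item: 27 of 30 high scorers and 12 of 30 low scorers answered correctly.
d_good_item = discrimination_index(27, 12, 30)

# Hypothetical poor item: low scorers outperform high scorers, giving a negative index.
d_poor_item = discrimination_index(5, 20, 30)
```

A negative value is a red flag: it means high scorers on the test as a whole tend to get the item wrong while low scorers get it right.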
Point-Biserial Method
- Definition: Correlation between a dichotomous variable and continuous variable
- Use: In Psychological Assessment
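As a rough illustration, the point-biserial correlation between a right/wrong item (scored 1 or 0) and a continuous total score can be computed from the two group means, the overall standard deviation, and the proportion passing the item. The function name and data are hypothetical.

```python
import math

def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and a continuous total score."""
    n = len(item_scores)
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    if not passed or not failed:
        raise ValueError("the item needs both correct and incorrect responses")
    overall_mean = sum(total_scores) / n
    # Standard deviation with the N denominator, matching the formula below.
    sd = math.sqrt(sum((t - overall_mean) ** 2 for t in total_scores) / n)
    if sd == 0:
        raise ValueError("total scores must vary")
    p = len(passed) / n  # proportion answering the item correctly
    mean_passed = sum(passed) / len(passed)
    mean_failed = sum(failed) / len(failed)
    return (mean_passed - mean_failed) / sd * math.sqrt(p * (1 - p))

# Hypothetical data: five test-takers, one item, and their total test scores.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 4, 2]
r_pb = point_biserial(item, totals)
```

The result is numerically identical to a Pearson correlation computed with the item coded 0/1, which is why the point-biserial is often described as a special case of Pearson r.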
Description
This quiz assesses your knowledge of scorer differences in psychological assessment, including measures of agreement and variability. Topics include evaluating scorer differences, types of measures, and the Fleiss Kappa coefficient.