Questions and Answers
Which method is used to determine the level of agreement between two or more raters when the assessment is on a categorical scale?
What is a Dynamic trait in the context of psychological assessment?
What does Generalizability Theory suggest about a person's test score?
According to Item Response Theory, what is the primary focus?
What is indicated by a high degree of internal consistency in a test designed to measure one factor?
What does Krippendorff's Alpha measure?
In Classical Test Theory, what is the 'True Score'?
What is the purpose of a Decision Study in psychological assessment?
How is the reliability of Speed Tests typically evaluated?
What does Domain Sampling Theory estimate in psychological assessments?
Which type of validity is judged by how well a test score can be used to infer an individual's probable standing on a measure of interest?
What does the Halo Effect in rating errors refer to?
Which term describes the use of additional predictors to explain the criterion measure beyond what is explained by existing predictors?
Which factor analysis type is used for estimating factors and deciding how many to retain?
Which statistical procedure used in test development helps prevent bias and ensure accurate, impartial measurement?
What is criterion contamination?
In the context of construct validity, what is a construct?
What does a high Factor Loading signify?
What is the purpose of co-validation?
The concept of fairness in psychological assessment refers to:
Which of the following best describes item sampling or content sampling in psychological assessment?
Which type of error is caused by influences such as noise or weather conditions during testing?
In the context of psychological assessment, what does the True Score Formula aim to represent?
Which of the following is an appropriate use of test-retest reliability?
What effect might occur if the interval between test-retest administrations is short?
What is the main difference between parallel forms and alternate forms of a test?
Which technique is used to avoid carryover effects in parallel forms of a test?
The presence of which factor would NOT likely influence the validity coefficient of a test?
Which of the following statements about systematic error is TRUE?
Which example would most likely demonstrate a low correlation in a test-retest reliability measure?
What is the primary purpose of a test blueprint in psychological assessment?
Which term refers to the failure to capture important elements of a construct within a test?
The formula $CVR = \frac{N_e - N/2}{N/2}$ is associated with which concept?
If exactly half of the experts rate a test item as essential, what is the CVR value?
What type of validity involves measuring the relationship between test scores and a criterion at a future time?
Which validity type is considered 'umbrella validity' and covers all other types of validities?
What type of evidence demonstrates that test scores vary predictably based on group membership?
Which term describes the degree to which an additional predictor provides unique information about a criterion?
When a test developer eliminates items that do not show significant correlation with the total test score, they aim to improve which characteristic?
If two tests measure the same construct and their scores highly correlate, what type of evidence is this?
What is the primary focus of a mechanical prediction in psychological assessment?
According to Level 1 interpretation in psychological assessments, which of the following is NOT a characteristic?
How are psychological traits expected to behave across time according to the assumptions about psychological testing and assessment?
Which of the following best describes 'states' in psychological assessment?
What role do test reviewers play in psychological assessment?
What is a profile in the context of psychological assessment?
Which of the following is NOT considered part of the 'parties in psychological assessment'?
What assumption about psychological traits and states is made in psychological testing?
Which of the following best describes actuarial assessment?
What is 'extra-test history' in psychological assessment?
When should the median be used instead of the mode for describing central tendency?
Which statistical measure describes the amount of variation in a distribution?
If a dataset has widely varied scores, how might this affect the central tendency?
Which of the following correlations is used with one ordinal and one interval/ratio variable?
Which measure of spread provides a quick but gross summary of the scores?
What does the value of the mode indicate in a distribution?
Which test should be used when comparing blood pressure of males and females?
Which concept allows dividing a distribution into four equal parts?
Which measure of location is not linearly transformable and is vital for normalized standardized scores?
Which of the following tests is appropriate when comparing board reviewers' focus levels during different times of the day?
What is the primary feature of the Angoff Method?
What reliability coefficient value range is considered 'Good'?
Which method involves placing a 'bookmark' to differentiate between test-takers?
In which scenario are you likely to use a relative cut score?
Which validity coefficient value range is likely to be useful?
Which method of setting cut scores is based on test-taker performance across all items on the test?
What kind of method is described as a multi-stage selection process?
Which of the following best describes the method of predictive yield?
Which cut score setting method involves the use of expert judgments to evaluate examination pass marks?
What is the utility gain in psychological assessment?
What does the Item-Discrimination Index measure?
What is the purpose of Cross-Validation?
In Computerized Adaptive Testing, what is 'item branching'?
What is the main characteristic of the Cumulative Scoring Model?
What is 'validity shrinkage'?
What does the term 'Differential Item Functioning' (DIF) refer to?
What is an 'Anchor Protocol' used for?
Which of the following accurately describes the Point-Biserial Method?
What does 'co-validation' entail in psychological assessment?
What phenomenon does 'floor effects' describe in performance measurement?
Study Notes
Psychological Assessment
Error: Scorer Differences
- Evaluates the degree of agreement between two or more scorers on a particular measure
- Calculated by determining the percentage of times two individuals assign the same scores to the performance of examinees
- Variations: having two examiners test the same client using the same test and determining the closeness of their scores or ratings
- Measures of scorer differences:
- Fleiss Kappa: determines the level of agreement between two or more raters on a categorical scale
- Cohen's Kappa: used for two raters only
- Krippendorff's Alpha: used for two or more raters, correcting for chance agreement
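As a rough illustration of chance-corrected agreement, the sketch below computes Cohen's Kappa for two raters. The function name and the ratings are hypothetical and are not taken from the notes.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical label")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical ratings of ten responses as "pass" or "fail".
a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]
print(round(cohens_kappa(a, b), 3))  # 0.583
```

Here the raters agree on 8 of 10 cases (0.80), but 0.52 agreement would be expected by chance, so kappa is noticeably lower than the raw percentage.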
Tests Designed
- Homogeneous tests: designed to measure one factor, expected to have a high degree of internal consistency
- Dynamic tests: measure traits, states, or abilities that are fast-changing as a function of situational and cognitive experience
- Static tests: measure traits, states, or abilities that are relatively unchanging
- Restriction of range or variance: when the variance of either variable in a correlational analysis is restricted, resulting in a lower correlation coefficient
Power Tests
- Have a time limit long enough to allow test-takers to attempt all items
- Measures a test-taker's ability to complete a task accurately and efficiently
Speed Tests
- Contain items of uniform difficulty with a time limit
- Reliability should be based on performance from two independent testing periods using test-retest, alternate-forms, or split-half reliability
Criterion-Referenced Tests
- Designed to provide an indication of where a test-taker stands with respect to a criterion
- As individual differences decrease, traditional measures of reliability also decrease, regardless of individual performance stability
Classical Test Theory
- Assumes that everyone has a "true score" on a test
- True score reflects an individual's ability level as measured by a particular test
- Random error affects the observed score
Domain Sampling Theory
- Estimates the extent to which specific sources of variation contribute to test scores
- Considers problems created by using a limited number of items to represent a large construct
Test Reliability
- Conceived as an objective measure of how precisely a test score assesses a domain
- Reliability is a function of the proportion of total variance attributed to true variance
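In the usual classical test theory notation, the idea that reliability is the proportion of total variance attributable to true variance can be written as follows. The symbols are assumptions here and do not appear in the notes: $X$ is the observed score, $T$ the true score, $E$ random error, and $r_{xx}$ the reliability coefficient. The variance decomposition assumes that true scores and errors are uncorrelated.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}
```

For example, if the total variance is 100 and the error variance is 20, the true variance is 80 and $r_{xx} = 0.80$.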
Generalizability Theory
- Based on the idea that test scores vary due to variables in the testing situation
- Universe: the test situation
- Facets: number of items, amount of review, and purpose of test administration
- Given the same conditions, the same test score should be obtained (universe score)
Decision Study
- Examines the usefulness of test scores in helping test users make decisions
Systematic Error
- Factors inherent in a test that prevent accurate, impartial measurement
Item Response Theory
- The probability of a person with a certain ability level performing at a certain level on a test
- Focuses on item difficulty
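One common way to express this probability is a logistic item response function. The sketch below uses the two-parameter logistic form as an assumption, since the notes do not name a specific model. The function and parameter names are hypothetical.

```python
import math

def probability_correct(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic item response function (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A person whose ability equals the item's difficulty has a 50% chance of success.
print(probability_correct(ability=0.0, difficulty=0.0))            # 0.5
# The same person facing a harder item is less likely to answer correctly.
print(round(probability_correct(ability=0.0, difficulty=1.0), 3))  # 0.269
```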
Latent-Trait Theory
- A system of assumptions about measurement and the extent to which items measure a trait
- Computers are used to focus on the range of item difficulty that helps assess an individual's ability level
- If a person answers easy items correctly, the computer will move to more difficult items
- Item attributes: difficulty, discrimination, and dichotomousness
Construct Validity (Umbrella Validity)
- Covers all types of validity
- Logical and statistical
- Judgment about the appropriateness of inferences drawn from test scores regarding individual standing on a variable called a construct
Criterion Validity
- More statistical than logical
- Judgment about the adequacy of test scores in inferring an individual's standing on a criterion measure
- Criterion: a standard on which a judgment or decision may be made
- Characteristics: relevant, valid, uncontaminated
- Types of criterion validity: concurrent, predictive, and incremental validity
Factor Analysis
- Designed to identify factors or variables that are typically attributes, characteristics, or dimensions on which people may differ
- Developed by Charles Spearman
- Employed as a data reduction method
- Used to study the interrelationships among a set of variables
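- Types of factor analysis: exploratory (estimating or extracting factors and deciding how many to retain) and confirmatory (testing how well a hypothesized factor structure fits the data)
- Factor loading: conveys the extent to which a factor determines the scores on a test or item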
Cross-Validation
- Validation of a test to a criterion based on a different group from the original group
- Validity shrinkage: a decrease in validity after cross-validation
- Co-validation: validation of more than one test from the same group
- Co-norming: norming more than one test from the same group
Bias
- Factors inherent in a test that systematically prevent accurate, impartial measurement
- Prevention: during test development through procedures such as estimated true score transformation
Rating
- Numerical or verbal judgment that places a person or attribute along a continuum identified by a scale
- Rating error: intentional or unintentional misuse of the scale
- Types of rating error: leniency, severity, central tendency, and halo effect
- One way to overcome rating errors is to use rankings
Measures of Central Tendency
- Mode: Most frequently occurring score in the distribution; useful for nominal scales and discrete variables; gives an indication of the shape of the distribution.
Measures of Spread or Variability
- Range: Equal to the difference between the highest and lowest scores; provides a quick but gross description of the spread of scores.
- Variance: Equal to the average of the squared deviations about the mean; its square root is the standard deviation, which expresses the typical distance of scores from the mean.
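A small worked example may help keep these measures apart. The scores are hypothetical, and the variance below uses the population form (dividing by N), which is an assumption since the notes do not say which form is intended.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical test scores

score_range = max(scores) - min(scores)
mean = sum(scores) / len(scores)
# Variance: the average of the squared deviations about the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
# Standard deviation: the square root of the variance.
standard_deviation = math.sqrt(variance)

print(score_range, mean, variance, standard_deviation)  # 7 5.0 4.0 2.0
```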
Measures of Location
- Percentile or Percentile Rank: Expressed in terms of the percentage of persons in the standardization sample who fall below a given score; essential in creating normalized standardized scores.
- Quartile: One of the three dividing points between the four quarters of the distribution; a quartile refers to a specific point, whereas a quarter refers to an interval.
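As a minimal sketch of these two measures of location, the code below computes a percentile rank as the percentage of scores falling below a given score and finds the three quartile points with the standard library. The sample and the helper name are hypothetical, and `statistics.quantiles` uses its default interpolation method, which is only one of several conventions.

```python
import statistics

def percentile_rank(score, sample):
    """Percentage of scores in the sample that fall below the given score."""
    below = sum(1 for s in sample if s < score)
    return 100.0 * below / len(sample)

sample = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]  # hypothetical standardization sample
print(percentile_rank(75, sample))  # 60.0

# Three cut points that divide the distribution into four quarters.
q1, q2, q3 = statistics.quantiles(sample, n=4)
print(q1, q2, q3)
```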
Correlation Coefficients
- Spearman Rho: Used for ordinal + ordinal data.
- Biserial: Used for artificial dichotomous + interval/ratio data.
- Point Biserial: Used for true dichotomous + interval/ratio data.
- Phi Coefficient: Used for nominal (true dichotomous) + nominal (true or artificial dichotomous) data.
- Tetrachoric: Used for artificial dichotomous + artificial dichotomous data.
- Kendall's: Used for 3 or more sets of ordinal/rank data.
- Rank Biserial: Used for nominal + ordinal data.
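To illustrate the ordinal + ordinal case, here is a minimal sketch of Spearman Rho using the difference-in-ranks formula. It assumes there are no tied values, and the function names and rankings are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two cases")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this simple formula does not handle ties")
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of five examinees by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
print(spearman_rho(judge_1, judge_2))  # 0.8
```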
Inferential Statistics
- T-test Independent (Unpaired T-test): Used for differences between two separate groups with random assignment.
- T-Test Dependent (Paired T-test): Used for one group with two scores.
- One-Way ANOVA: Used for 3 or more groups.
- One-Way Repeated Measures ANOVA: Used for one group measured at least three times.
- Two-Way ANOVA: Used for 3 or more groups tested for 2 variables.
- ANCOVA: Used when controlling for an additional variable that may influence the relationship between the independent and dependent variables.
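As a sketch of the dependent (paired) case, where the same group is measured twice, the t statistic is the mean of the difference scores divided by its standard error. The function name and data below are hypothetical, and the example stops at the statistic and degrees of freedom rather than looking up a p-value.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for a dependent (paired) t-test."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("t is undefined when every difference is identical")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five people tested before and after an intervention.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t, df = paired_t(before, after)
print(round(t, 2), df)  # 6.53 4
```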
Utility Gain and Productivity Gains
- Utility Gain: Estimate of the benefit of using a particular test.
- Productivity Gains: Estimated increase in work output.
Cut Scores
- Cut Score: Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications.
- Relative Cut Score: Reference point based on norm-referenced considerations, not fixed per se.
- Fixed Cut Scores: Set with reference to a judgment concerning the minimum level of proficiency required.
- Multiple Cut Scores: Refers to the use of two or more cut scores with reference to one predictor for the purpose of categorization.
- Multiple Hurdle: Multi-stage selection process, with a cut score in place for each predictor.
- Compensatory Model of Selection: Assumption that high scores on one attribute can compensate for lower scores on another.
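The contrast between a multiple hurdle and a compensatory model can be sketched as follows. The predictor names, cut scores, and weights are hypothetical and serve only to show how the two decision rules differ.

```python
def passes_multiple_hurdle(scores, cut_scores):
    """Applicant must meet the cut score on every predictor, taken in order."""
    for predictor, cut in cut_scores.items():
        if scores[predictor] < cut:
            return False  # eliminated at this stage
    return True

def passes_compensatory(scores, weights, overall_cut):
    """A weighted composite lets a high score offset a low one."""
    composite = sum(weights[p] * scores[p] for p in weights)
    return composite >= overall_cut

applicant = {"written_exam": 58, "interview": 95}
cuts = {"written_exam": 60, "interview": 70}
weights = {"written_exam": 0.5, "interview": 0.5}

print(passes_multiple_hurdle(applicant, cuts))                  # False
print(passes_compensatory(applicant, weights, overall_cut=70))  # True
```

The same applicant fails the first hurdle but passes under the compensatory rule, because the strong interview score offsets the weak written exam.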
Setting Cut Scores
- Angoff Method: Expert judges discuss and evaluate the examination using a well-defined and rational procedure.
- Known Groups Method: Collection of data on the predictor of interest from groups known to possess and not possess a trait of interest.
- IRT-Based Methods: Cut scores are typically set based on test-taker's performance across all the items on the test.
- Item-Mapping Method: Arrangement of items in a histogram, with each column containing items deemed to be equivalent value.
- Bookmark Method: Expert places a "bookmark" between the two pages that are deemed to separate test-takers who have acquired the minimal knowledge, skills, and/or abilities from those who have not.
Discriminant Analysis
- Discriminant Analysis: Used to analyze the research data when the criterion or dependent variable is categorical, and the predictor or independent variable is interval in nature.
Reliability and Validity
- Reliability: Excellent (0.90 and up), Good (0.80-0.89), Adequate (0.70-0.79), and Limited Applicability (below 0.70).
- Validity: Very beneficial (above 0.35), Likely to be useful (0.21-0.35), Depends on circumstances (0.11-0.20), and Unlikely to be useful (below 0.11).
Item Analysis
- Item-Validity Index: Designed to provide an indication of the degree to which a test is measuring what it purports to measure.
- Item-Discrimination Index: Measures the difference between the proportion of high scorers answering a question correctly and the proportion of low scorers answering it correctly.
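The item-discrimination index described above can be sketched in a few lines. Responses are coded 1 for correct and 0 for incorrect; the group data and function name are hypothetical.

```python
def item_discrimination_index(upper_group, lower_group):
    """d = proportion correct among high scorers minus proportion correct among low scorers."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Hypothetical responses to one item (1 = correct, 0 = incorrect).
upper = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]  # 9 of 10 high scorers correct
lower = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # 4 of 10 low scorers correct
print(round(item_discrimination_index(upper, lower), 2))  # 0.5
```

A value near 1 means the item separates high and low scorers well, a value near 0 means it does not, and a negative value means low scorers outperform high scorers on the item.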
Scoring Models
- Cumulative Model: Test-taker obtains a measure of the level of the trait, so high scores suggest a high level of the trait being measured.
- Class Scoring/Category Scoring: Test-taker response earns credit toward placement in a particular class or category with other test-takers whose pattern of responses is similar in some way.
- Ipsative Scoring: Compares test-taker's score on one scale within a test to another scale within that same test, where the two scales measure unrelated constructs.
Test Revision
- Cross-Validation: Revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.
Computerized Adaptive Testing
- Computerized Adaptive Testing: An interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
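A deliberately simplified sketch of this process is shown below: after each response the next item is one step harder or one step easier. Operational adaptive tests typically re-estimate ability with an IRT model after each item, so the fixed one-step rule, the function name, and the item levels here are all assumptions made for illustration.

```python
def administer_adaptive_test(levels, answers_correctly, num_items=5):
    """Present items one at a time, stepping difficulty up or down after each response."""
    index = len(levels) // 2  # start with a mid-difficulty item
    administered = []
    for _ in range(num_items):
        level = levels[index]
        correct = answers_correctly(level)
        administered.append((level, correct))
        # A correct answer leads to a harder item, an error to an easier one.
        step = 1 if correct else -1
        index = min(max(index + step, 0), len(levels) - 1)
    return administered

levels = [1, 2, 3, 4, 5, 6, 7]  # hypothetical difficulty levels, easy to hard
# Hypothetical test-taker who can answer items up to difficulty 5.
history = administer_adaptive_test(levels, lambda level: level <= 5)
print(history)
# [(4, True), (5, True), (6, False), (5, True), (6, False)]
```

The sequence settles around the boundary between levels 5 and 6, which is where this test-taker's ability lies.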
Computer Assisted Psychological Assessment
- Computer Assisted Psychological Assessment: Standardized test administration is assured for test-takers, and variation is kept to a minimum; test content and length are tailored according to the test-taker's ability.
Description
This quiz assesses your understanding of scorer differences in psychological assessment, including measures of agreement between scorers. Learn about Fleiss Kappa and other metrics.