Questions and Answers
What does the 'Known Groups Method' involve in determining cutoff scores?
- Setting fixed cutoff scores and requiring expert judges to discuss
- Setting cutoff scores based on test-taker performance across all items
- Collecting data on the predictor of interest from groups known to possess, and known not to possess, a trait of interest (correct)
- Using two or more cut scores with reference to one predictor for categorization
Which reliability coefficient value denotes 'excellent' reliability?
- 0.60 to 0.69
- 0.70 to 0.79
- 0.90 and up (correct)
- 0.80 to 0.89
In IRT-Based Methods, how are cut scores typically set?
- Based on expert judges discussing issues
- Using a multi-stage selection process
- Setting a reference point based on norm-related considerations
- Based on test-taker's performance across all items on the test (correct)
What is the purpose of Discriminant Analysis as mentioned in the document?
Which method requires expert judges to discuss the issues involved in determining a pass mark?
What level of difficulty corresponds to an item difficulty range of 0.0 to 0.19?
Which p-value range signifies strong evidence against the null hypothesis?
What Cronbach's alpha value corresponds to 'Good' internal consistency?
Which item discrimination range is considered 'Fair'?
Which interrater reliability coefficient range is regarded as 'Substantial' according to Landis & Koch (1977)?
What measure of central tendency is used when there is an unknown or undetermined score?
Which interrater reliability coefficient range is classified as 'Fair to Good' according to Fleiss (1981)?
What type of evidence against the null hypothesis does a p-value greater than 0.10 provide?
Which measure gives an indication of the shape of the distribution as well as a measure of central tendency?
Which measure of spread is calculated as the difference between the highest and lowest scores?
What does the variance measure in a distribution?
How is the semi-quartile range calculated?
Which of the following is true about percentiles?
Which measure divides a distribution into four equal parts?
What is Pearson's R used to measure?
Which measure is NOT commonly used for nominal scales or discrete variables?
Which test should be used to assess normality for a sample size of 60?
Which statistical test is appropriate for comparing more than two groups on an ordinal scale?
What is the major sensitivity difference between Bartlett's Test and Levene's Test?
What does a P-value greater than 0.05 in Levene's Test indicate?
Which of the following is an example of a true dichotomy?
During which stage of test development is the construct determined?
What does computerized adaptive testing primarily depend on?
Which test measures dependent means on an ordinal scale?
What characterizes the primary role of an item pool in test construction?
Which stage involves item revising, formatting, and setting scoring rules?
Which method is used to compare individuals who have performed well against those who have not on a test?
What is defined by the proportion of test takers who answered an item correctly in personality testing?
Which index measures the internal consistency of a test?
What is the optimal average item difficulty proportion for a test?
Which of the following indicates a 'Very Good Item' based on the Point-Biserial Method?
Which statistical procedure is used to evaluate test items?
What type of items should be avoided during item writing?
How are items arranged in an Omnibus Spiral Format?
Which index measures the degree to which a test measures what it purports to measure?
Phantom factors may emerge as a risk of using what in psychological assessment?
Which statistical test would you use for measuring correlation between two variables when both are measured on a nominal scale?
Which test is appropriate for examining the difference between the means of multiple dependent variables across two or more independent groups?
To predict the unknown value of variable X using the known value of variable Y, which test should be used?
Which non-parametric test is equivalent to a paired t-test?
Which test would you use to control for an additional variable that may be influencing the relationship between your independent and dependent variable?
Which test should be used to analyze the focus level of a group of reviewers measured in the morning, afternoon, and night sessions of review?
Which test involves artificial dichotomous variables for both the independent and dependent variables?
Which statistical test should be used to test the difference between groups when you have nominal data involving two groups with two or more categories?
Which test would you use to measure the difference in blood pressure of a group before and after a lecture?
Which test should be used when comparing blood pressure measurements of young adults, middle-aged adults, and old adults during breakfast, lunch, and dinner?
What is the primary purpose of cross-validation in test revision?
Which type of scoring discrepancy occurs when there is a difference between scoring in an anchor protocol and another protocol?
What does DIF Analysis aim to detect during test development?
Which aspect of computerized adaptive testing reduces the likelihood of testtakers having low or high extreme scores?
In the context of inferential statistics, what is the main purpose?
What is the role of an anchor protocol in test scoring?
Which term describes the inevitable decrease in item validities after cross-validation?
What is indicated by the term 'Equal Intervals' in measurement scales?
What is the primary function of an item-mapping method in test construction?
Ipsative scoring is used to compare what aspects within a test?
Which measure of central tendency is most useful when analyzing a skewed distribution?
What distinguishes a ratio scale from an interval scale?
In the context of psychological assessment, what does 'error' primarily refer to?
Which level of measurement is appropriate for categorizing observations without any quantitative distinctions?
Which central tendency measure is appropriate for nominal data?
What is the goal of measures of central tendency in a distribution?
Which post-hoc test is used to determine the minimum difference between treatment means necessary for significance in ANOVA?
Why might the range be an unreliable measure of variability?
What statistical measure provides a quick but gross description of the spread of scores?
Which level of measurement allows for rank ordering on some characteristic but does not have equal intervals?
In a positively skewed distribution, where do most of the scores fall?
Which type of distribution has its mean equal to its median and mode?
What kurtosis describes a distribution with a relatively flat peak?
What type of standard score scale is set at a mean of 50 with a standard deviation of 10?
Which condition describes a distribution with high kurtosis?
What does a Z-score indicate in a distribution?
Which of the following distributions would most likely represent an easy exam?
When a distribution has the mean < median < mode, what is typically observed?
Which type of item format requires test takers to supply or create the correct answer?
Which scale of measurement is characterized by having true zero points?
What distinguishes a good distractor in a multiple-choice item?
What is the significant characteristic of the ratio scale of measurement?
Which item type involves respondents ranking objects based on a criterion?
What type of measurement involves categorization without quantitative distinctions?
What is a characteristic of completion items in constructed-response formats?
Which comparative scale of measurement involves allocating a constant sum of units among a set of items?
Which of the following best describes an ineffective distractor?
Which characteristic describes the interval scale of measurement?
Study Notes
BLEPP
Source
- Cohen & Swerdlik (2018)
- Kaplan & Saccuzzo (2018)
- Groth-Marnat & Wright (2016)
- Psych Pearls
Item Writing Guidelines
- Define what to measure
- Generate item pool
- Avoid long items
- Keep reading difficulty appropriate
- Avoid double-barreled items
- Consider positive and negative worded items
Item Difficulty
- Defined by the number of people who get a particular item correct
- Item-Difficulty Index: proportion of total test-takers who answered the item correctly
- Optimal average item difficulty is approximately 50% with items ranging from 30% to 80%
Item Difficulty Ranges
- 0.0-0.19: Very difficult
- 0.20-0.39: Difficult
- 0.40-0.60: Average/moderately difficult
- 0.61-0.79: Easy
- 0.80-1.0: Very easy
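The item-difficulty index and the ranges above can be illustrated with a minimal sketch. The function names and the response data are hypothetical and only serve the example. Values that fall in the small gap between 0.60 and 0.61 are treated as "Easy" here, which is an assumption.

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly.

    `responses` holds 1 for a correct answer and 0 for an incorrect one.
    """
    return sum(responses) / len(responses)


def difficulty_label(p):
    """Map a difficulty index to the ranges listed in these notes."""
    if p < 0.20:
        return "Very difficult"
    if p < 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Average/moderately difficult"
    if p < 0.80:
        return "Easy"
    return "Very easy"


# Hypothetical responses of 10 test-takers to a single item
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
p = item_difficulty(responses)
print(p, difficulty_label(p))
```

Note that a higher index means an easier item, because more test-takers answered it correctly.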
Item Reliability Index
- Provides an indication of the internal consistency of a test
- The higher the item-reliability index, the greater the test's internal consistency
Item-Validity Index
- Designed to provide an indication of the degree to which a test measures what it purports to measure
- The higher the item-validity index, the greater the test's criterion-related validity
Item-Discrimination Index
- Measure of item discrimination
- Difference between proportion of high scorers answering an item correctly and proportion of low scorers answering the item correctly
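As a rough sketch, the index can be computed from the number of correct answers in the high-scoring and low-scoring groups. The function name, the equal group sizes, and the counts are assumptions made for the example.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = proportion correct among high scorers minus proportion correct among low scorers.

    Assumes both groups contain `group_size` test-takers.
    """
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: 16 of 20 high scorers and 6 of 20 low scorers answered correctly
d = discrimination_index(upper_correct=16, lower_correct=6, group_size=20)
print(d)
```

A positive value means high scorers got the item right more often than low scorers. A negative value flags an item that low scorers answer correctly more often, which usually calls for revision.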
Extreme Group Method
- Compares people who have done well with those who have done poorly
Point-Biserial Method
- Correlation between a dichotomous variable and a continuous variable
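A point-biserial correlation between a dichotomous item score (0 or 1) and a continuous total score can be sketched as follows. The function name and the data are hypothetical. The sketch uses the population standard deviation of the total scores, which makes the result equal to Pearson's r computed on the same data.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item_scores, total_scores):
    """Correlation between a 0/1 item score and a continuous total score."""
    passed = [t for i, t in zip(item_scores, total_scores) if i == 1]
    failed = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(passed) / len(item_scores)  # proportion answering the item correctly
    q = 1 - p                           # proportion answering incorrectly
    return (mean(passed) - mean(failed)) / pstdev(total_scores) * sqrt(p * q)


# Hypothetical data: item result and total test score for six test-takers
item_scores = [1, 1, 1, 0, 0, 0]
total_scores = [9, 8, 7, 4, 3, 2]
r_pb = point_biserial(item_scores, total_scores)
print(round(r_pb, 2))
```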
Item-Characteristic Curve
- Graphic representation of item difficulty and discrimination
Guessing
- A problem that has eluded any universally accepted solution
- Item analyses taken under speed conditions yield misleading or uninterpretable results
Effective Distractors
- Effective distractors are chosen more often by low scorers than by high scorers, which enhances the consistency of test results
- A distractor chosen equally often by high- and low-performing groups does not discriminate between them
Ineffective Distractors
- May hurt the reliability of the test because they are time-consuming to read and can limit the number of good items
Types of Items
- Matching Item
- Binary Choice
- Constructed-Response Format
- Completion Item
- Short-Answer
- Essay
Primary Scales of Measurement
- Nominal: involves classification or categorization (appropriate measure of central tendency: mode)
- Ordinal: involves rank ordering (appropriate measure of central tendency: median)
- Interval: contains equal intervals, has no absolute zero point
- Ratio: has a true zero point, easiest to manipulate
Comparative Scales of Measurement
- Paired Comparison: produces ordinal data by presenting pairs of two stimuli
- Rank Order: respondents are presented with several items simultaneously and asked to rank them in order of priority
- Constant Sum: respondents are asked to allocate a constant sum of units among a set of stimulus objects
- Q-Sort Technique: sort objects based on similarity with respect to some criterion
Non-Comparative Scales of Measurement
- Continuous Rating: rate objects by placing a mark at the appropriate position on a continuous line
- Itemized Rating: having numbers or brief descriptions associated with each category
- Likert Scale: indicate attitudes by responding to a series of statements that range from very positive to very negative
Ipsative Scoring
- Compares a test-taker's score on one scale within a test to another scale within that same test (two unrelated constructs)
Test Revision
- Characterize each item according to its strength and weaknesses
- Large item pool is advantageous in test revision as some items are removed and replaced by items from the pool
- Administer the revised test under standardized conditions to a second appropriate sample of examinees
- Cross-validation involves revalidating a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion
- Validity shrinkage is the decrease in item validities that inevitably occurs after cross-validation
Computerized Adaptive Testing
- An interactive, computer-administered test-taking process where items presented to the test-taker are based on their performance on previous items
- Reduces floor and ceiling effects
- Floor effects occur when there is a lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit
- Ceiling effects occur when there is an upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit
- Item branching is the ability of the computer to tailor the content and order of presentation of items based on responses to previous items
- Routing test is a subtest used to direct or route the test-taker to a suitable level of items
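Item branching can be illustrated with a deliberately simplified sketch: move to a harder item after a correct answer and to an easier one after an incorrect answer. The five difficulty levels, the one-step rule, and the function names are assumptions for illustration only. Operational adaptive tests typically select items with IRT-based ability estimates rather than a fixed step.

```python
def next_level(current_level, was_correct, lowest=1, highest=5):
    """Step up one difficulty level after a correct answer, down one after an incorrect one."""
    step = 1 if was_correct else -1
    return max(lowest, min(highest, current_level + step))


def levels_presented(start_level, answers):
    """Return the difficulty level of each item shown, given the sequence of answers."""
    levels = []
    level = start_level
    for was_correct in answers:
        levels.append(level)
        level = next_level(level, was_correct)
    return levels


# Hypothetical session: the routing test placed the test-taker at level 3
print(levels_presented(3, [True, True, False, True]))
```

Because the items follow the test-taker's performance, few people end up answering only items that are far too easy or far too hard, which is how adaptive testing limits floor and ceiling effects.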
Statistics
- Measurement is the act of assigning numbers or symbols to characteristics of things according to rules
- Descriptive statistics provide a concise description of a collection of quantitative information
- Inferential statistics make inferences from observations of a small group of people (sample) to a larger group of individuals (population)
- Magnitude refers to the property of "moreness"
- Equal intervals refer to the difference between two points at any place on the scale having the same meaning as the difference between two other points that differ by the same amount
Symmetrical Distribution
- The right side of the graph is a mirror image of the left side
- Has only one mode and it is in the center of the distribution
- Mean = median = mode
Skewness
- Refers to the nature and extent to which symmetry is absent
- Positive skewness occurs when relatively few scores fall at the high end of the distribution
- Mean > median > mode
- Negative skewness occurs when relatively few scores fall at the low end of the distribution
- Mean < median < mode
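The relationship between skew and the three averages can be checked numerically. In a positively skewed set the few high scores pull the mean above the median and mode. In a negatively skewed set the few low scores pull it below them. The sketch below uses hypothetical scores, and comparing the mean with the median is only a rule of thumb, not a formal skewness statistic.

```python
from statistics import mean, median, mode


def skew_direction(scores):
    """Rule of thumb: the mean is pulled toward the tail of the distribution."""
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "symmetrical"


# Hypothetical score sets
hard_exam = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]           # few high scores
easy_exam = [15, 14, 14, 14, 13, 13, 12, 11, 7, 1]    # few low scores

for scores in (hard_exam, easy_exam):
    print(mean(scores), median(scores), mode(scores), skew_direction(scores))
```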
Kurtosis
- Refers to the steepness of a distribution in its center
- Platykurtic distributions are relatively flat
- Leptokurtic distributions are relatively peaked
- Mesokurtic distributions are somewhere in the middle
Standard Scores
- Raw scores that have been converted from one scale to another scale
- A z-score results from converting a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution
- T-scores are a scale with a mean set at 50 and a standard deviation set at 10
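The two conversions can be sketched in a few lines. The function names and the example distribution (mean 100, standard deviation 15) are assumptions chosen for the illustration.

```python
def z_score(raw, mean, sd):
    """Number of standard deviation units the raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score to a scale with a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z


# Hypothetical raw score of 115 on a test with mean 100 and standard deviation 15
z = z_score(115, mean=100, sd=15)
print(z, t_score(z))
```

A raw score exactly at the mean gives z = 0 and T = 50, and a raw score below the mean gives a negative z and a T-score under 50.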
Error
- Refers to the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
- Degree to which the test score/measurement may be wrong, considering other factors
Scales of Measurement
- Nominal scales involve classification or categorization based on one or more distinguishing characteristics
- Ordinal scales involve rank ordering on some characteristic
- Interval scales contain equal intervals but have no absolute zero point
- Ratio scales have a true zero point
Distribution
- Defined as a set of test scores arrayed for recording or study
- Raw scores are a straightforward, unmodified accounting of performance that is usually numerical
- Frequency distribution lists all scores alongside the number of times each score occurred
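A simple frequency distribution can be tallied with the standard library, as in this minimal sketch. The raw scores are hypothetical.

```python
from collections import Counter

# Hypothetical raw scores from six test-takers
scores = [85, 90, 85, 70, 90, 85]

# Count how many times each score occurred
frequency = Counter(scores)

# List scores from highest to lowest alongside their frequencies
for score in sorted(frequency, reverse=True):
    print(score, frequency[score])
```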
Post-Hoc Tests
- Used after a significant ANOVA to determine which mean differences are significant
- Tukey's HSD test allows the computation of a single value that determines the minimum difference between treatment means that is necessary for significance
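The single value is commonly computed as HSD = q × sqrt(MS within / n), where q comes from a studentized range table for the number of treatments and the within-groups degrees of freedom, and n is the number of scores per treatment. The sketch below assumes equal group sizes. The q value and all other numbers are hypothetical placeholders rather than table look-ups.

```python
from math import sqrt


def tukey_hsd(q, ms_within, n_per_group):
    """Minimum difference between two treatment means needed for significance."""
    return q * sqrt(ms_within / n_per_group)


def is_significant(mean_a, mean_b, hsd):
    """A pair of means differs significantly when its difference is at least HSD."""
    return abs(mean_a - mean_b) >= hsd


# Hypothetical values: q = 3.5 (placeholder), MS within = 4.0, 16 scores per treatment
hsd = tukey_hsd(q=3.5, ms_within=4.0, n_per_group=16)
print(hsd)
print(is_significant(12.0, 9.5, hsd))
print(is_significant(12.0, 11.0, hsd))
```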
Measures of Central Tendency
- Statistics that indicate the average or midmost score between the extreme scores in a distribution
- Goal is to identify the score that is most typical or representative of the entire group
- Mean is the average of all the raw scores
- Median is the middle score of the distribution
- Mode is the most frequently occurring score in the distribution
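The three measures respond differently to extreme scores and to the level of measurement. The sketch below uses hypothetical data to show that one outlier shifts the mean far more than the median, and that the mode still works on nominal labels.

```python
from statistics import mean, median, mode

# Hypothetical scores, then the same scores with one extreme value added
scores = [2, 3, 3, 5, 7]
with_outlier = scores + [100]

print(mean(scores), median(scores), mode(scores))
print(mean(with_outlier), median(with_outlier), mode(with_outlier))

# The mode is the only one of the three that applies to nominal categories
labels = ["A", "B", "A", "C"]
print(mode(labels))
```

This is why the median is usually preferred for skewed distributions and the mode for nominal data.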
Variability
- An indication of how scores in a distribution are scattered or dispersed
- Measures of variability describe the amount of variation in a distribution
- Range is equal to the difference between the highest and the lowest score
- Quartile is a dividing point between the four quarters in the distribution
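The range and the quartiles can be computed with the standard library, as sketched below on hypothetical scores. The semi-interquartile range is taken here as half the distance between the third and first quartiles. Textbooks and software use slightly different quartile conventions, so the exact cut points depend on the method. This sketch relies on the default method of `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical scores, already sorted for readability
scores = [2, 4, 6, 8, 10, 12, 14, 16]

# Range: difference between the highest and the lowest score
score_range = max(scores) - min(scores)

# Quartiles: the three points dividing the distribution into four quarters
q1, q2, q3 = quantiles(scores, n=4)

# Semi-interquartile range: half the spread of the middle 50% of scores
semi_iqr = (q3 - q1) / 2

print(score_range, q1, q2, q3, semi_iqr)
```

Because the range depends only on the two most extreme scores, a single outlier can change it drastically, while the quartile-based measure is much less affected.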