Psychological Assessment (11-20, Aly)

Questions and Answers

What does the 'Known Groups Method' involve in determining cutoff scores?

Setting fixed cutoff scores and requiring expert judges to discuss
Setting cutoff scores based on test-taker performance across all items
Collection of data on the predictor of interest from groups known to possess, and not to possess, a trait of interest (correct)
Using two or more cut scores with reference to one predictor for categorization

Which reliability coefficient value denotes 'excellent' reliability?

0.60 to 0.69
0.70 to 0.79
0.90 and up (correct)
0.80 to 0.89

In IRT-Based Methods, how are cut scores typically set?

Based on expert judges discussing issues
Using a multi-stage selection process
Setting a reference point based on norm-related considerations
Based on test-taker's performance across all items on the test (correct)

What is the purpose of Discriminant Analysis as mentioned in the document?

To shed light on the relationship between identified variables and two naturally occurring groups

Which method requires expert judges to discuss the issues involved in determining a pass mark?

Angoff Method

What level of difficulty corresponds to an item difficulty range of 0.0 to 0.19?

Very difficult

Which p-value range signifies strong evidence against the null hypothesis?

0.01

What Cronbach's alpha value corresponds to 'Good' internal consistency?

0.9 > α ≥ 0.8

Which item discrimination range is considered 'Fair'?

.20-.29

Which interrater reliability coefficient range is regarded as 'Substantial' according to Landis & Koch (1977)?

0.61 to 0.80

What measure of central tendency is used when there is an unknown or undetermined score?

Median

Which interrater reliability coefficient range is classified as 'Fair to Good' according to Fleiss (1981)?

0.40 to 0.75

What type of evidence against the null hypothesis does a p-value greater than 0.10 provide?

Weak or no evidence

Which measure gives an indication of the shape of the distribution as well as a measure of central tendency?

Mode

Which measure of spread is calculated as the difference between the highest and lowest scores?

Range

What does the variance measure in a distribution?

Average squared deviations around the mean

How is the semi-quartile range calculated?

Interquartile range divided by 2

Which of the following is true about percentiles?

They express the percentage of persons below a given score.

Which measure divides a distribution into four equal parts?

Quartile

What is Pearson's R used to measure?

Correlation between two variables

Which measure is NOT commonly used for nominal scales or discrete variables?

Standard Deviation

Which test should be used to assess normality for a sample size of 60?

Kolmogorov-Smirnov

Which statistical test is appropriate for comparing more than two groups on an ordinal scale?

Kruskal Wallis H Test

What is the major sensitivity difference between Bartlett's Test and Levene's Test?

Bartlett's Test is much more sensitive than Levene's Test

What does a P-value greater than 0.05 in Levene's Test indicate?

Variances are equal

Which of the following is an example of a true dichotomy?

Pass/Fail

During which stage of test development is the construct determined?

Test Conceptualization

What does computerized adaptive testing primarily depend on?

Previous item responses

Which test compares two dependent samples measured on an ordinal scale?

Wilcoxon Signed Rank Test

What characterizes the primary role of an item pool in test construction?

To serve as a reservoir for potential test items

Which stage involves item revising, formatting, and setting scoring rules?

Test Construction

Which method is used to compare individuals who have performed well against those who have not on a test?

Extreme Group Method

In personality testing, what is the term for the proportion of test takers who endorsed an item?

Item-Endorsement Index

Which index measures the internal consistency of a test?

Item-Reliability Index

What is the optimal average item difficulty proportion for a test?

0.50

Which of the following indicates a 'Very Good Item' based on the Point-Biserial Method?

0.45

Which statistical procedure is used to evaluate test items?

Item Analysis

What type of items should be avoided during item writing?

Double-barreled items

How are items arranged in an Omnibus Spiral Format?

In order of increasing difficulty

Which index measures the degree to which a test measures what it purports to measure?

Item-Validity Index

Phantom factors may emerge as a risk of using what in psychological assessment?

Few subjects

Which statistical test would you use for measuring correlation between two variables when both are measured on a nominal scale?

Phi Coefficient

Which test is appropriate for examining the difference between the means of multiple dependent variables across two or more independent groups?

MANOVA

To predict the unknown value of variable X using the known value of variable Y, which test should be used?

Linear Regression of X on Y

Which non-parametric test is equivalent to a paired t-test?

Wilcoxon Signed Rank Test

Which test would you use to control for an additional variable that may be influencing the relationship between your independent and dependent variable?

ANCOVA

Which test should be used to analyze the focus level of a group of reviewers measured in the morning, afternoon, and night sessions of review?

One-Way Repeated Measures ANOVA

Which test involves artificial dichotomous variables for both the independent and dependent variables?

Tetrachoric

Which statistical test should be used to test the difference between groups when you have nominal data involving two groups with two or more categories?

Test of Independence

Which test would you use to measure the difference in blood pressure of a group before and after a lecture?

T-Test Dependent

Which test should be used when comparing blood pressure measurements of young adults, middle-aged adults, and old adults during breakfast, lunch, and dinner?

ANOVA Mixed Design

What is the primary purpose of cross-validation in test revision?

To validate the test on a new sample of testtakers

Which type of scoring discrepancy occurs when there is a difference between scoring in an anchor protocol and another protocol?

Scoring Drift

What does DIF Analysis aim to detect during test development?

Differential Item Functioning

Which aspect of computerized adaptive testing reduces the likelihood of testtakers having low or high extreme scores?

Floor and Ceiling Effects

In the context of inferential statistics, what is the main purpose?

To infer characteristics of a population based on a sample

What is the role of an anchor protocol in test scoring?

To resolve discrepancies in scoring

Which term describes the inevitable decrease in item validities after cross-validation?

Validity Shrinkage

What is indicated by the term 'Equal Intervals' in measurement scales?

A difference between two points on the scale is the same everywhere on the scale

What is the primary function of an item-mapping method in test construction?

To set cut scores based on item performance

Ipsative scoring is used to compare what aspects within a test?

A testtaker's score on one scale with another scale within the same test

Which measure of central tendency is most useful when analyzing a skewed distribution?

Median

What distinguishes a ratio scale from an interval scale?

Possession of an absolute zero point

In the context of psychological assessment, what does 'error' primarily refer to?

Factors affecting the test score beyond the measured attribute

Which level of measurement is appropriate for categorizing observations without any quantitative distinctions?

Nominal

Which central tendency measure is appropriate for nominal data?

Mode

What is the goal of measures of central tendency in a distribution?

Identify the most typical or representative score

Which post-hoc test is used to determine the minimum difference between treatment means necessary for significance in ANOVA?

Tukey’s HSD test

Why might the range be an unreliable measure of variability?

It is affected by extreme scores

What statistical measure provides a quick but gross description of the spread of scores?

Range

Which level of measurement allows for rank ordering on some characteristic but does not have equal intervals?

Ordinal

In a positively skewed distribution, where do most of the scores fall?

At the low end of the distribution

Which type of distribution has its mean equal to its median and mode?

Symmetrical distribution

What kurtosis describes a distribution with a relatively flat peak?

Platykurtic

What type of standard score scale is set at a mean of 50 with a standard deviation of 10?

T-scores

Which condition describes a distribution with high kurtosis?

A high peak with fatter tails

What does a Z-score indicate in a distribution?

The number of standard deviation units a raw score is above or below the mean

Which of the following distributions would most likely represent an easy exam?

Negatively skewed

When a distribution has the mean < median < mode, what is typically observed?

A negatively skewed distribution

Which type of item format requires test takers to supply or create the correct answer?

Constructed-Response

Which scale of measurement is characterized by having true zero points?

Ratio

What distinguishes a good distractor in a multiple-choice item?

It is chosen frequently by low scorers

What is the significant characteristic of the ratio scale of measurement?

It contains equal intervals and an absolute zero point

Which item type involves respondents ranking objects based on a criterion?

Rank Order

What type of measurement involves categorization without quantitative distinctions?

Nominal

What is a characteristic of completion items in constructed-response formats?

The test taker must provide a word or phrase to complete a sentence

Which comparative scale of measurement involves allocating a constant sum of units among a set of items?

Constant Sum

Which of the following best describes an ineffective distractor?

Time-consuming to read without being frequently chosen

Which characteristic describes the interval scale of measurement?

Equal intervals and no true zero point

Study Notes

BLEPP

Source

Cohen & Swerdlik (2018)
Kaplan & Saccuzzo (2018)
Groth-Marnat & Wright (2016)
Psych Pearls

Item Writing Guidelines

Define what to measure
Generate item pool
Avoid long items
Keep reading difficulty appropriate
Avoid double-barreled items
Consider positive and negative worded items

Item Difficulty

Defined by the number of people who get a particular item correct
Item-Difficulty Index: proportion of total test-takers who answered the item correctly
Optimal average item difficulty is approximately 50% with items ranging from 30% to 80%

Item Difficulty Ranges

0.0-0.19: Very difficult
0.20-0.39: Difficult
0.40-0.60: Average/moderately difficult
0.61-0.79: Easy
0.80-1.0: Very easy
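As a minimal sketch of the item-difficulty index and the ranges listed above, assuming each response is scored 1 (correct) or 0 (incorrect). The function names `item_difficulty` and `difficulty_label` and the sample responses are hypothetical, chosen only for illustration:

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly (p)."""
    return sum(responses) / len(responses)


def difficulty_label(p):
    """Map a difficulty index onto the ranges listed in these notes."""
    if p < 0.20:
        return "Very difficult"
    if p < 0.40:
        return "Difficult"
    if p <= 0.60:
        return "Average/moderately difficult"
    if p < 0.80:
        return "Easy"
    return "Very easy"


# Hypothetical scored responses of 10 test-takers to a single item
responses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p = item_difficulty(responses)
print(p, difficulty_label(p))  # 0.6 Average/moderately difficult
```

Note that a higher index means an easier item, since more test-takers answered it correctly.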

Item Reliability Index

Provides an indication of the internal consistency of a test
The higher the item-reliability index, the greater the test's internal consistency

Item-Validity Index

Designed to provide an indication of the degree to which a test measures what it purports to measure
The higher the item-validity index, the greater the test's criterion-related validity

Item-Discrimination Index

Measure of item discrimination
Difference between proportion of high scorers answering an item correctly and proportion of low scorers answering the item correctly
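The difference described above can be sketched as follows, assuming responses are scored 1 or 0 and that the high-scoring and low-scoring groups have already been formed. The function name `discrimination_index` and the two small groups are hypothetical:

```python
def discrimination_index(upper_group, lower_group):
    """d = proportion correct among high scorers minus proportion correct among low scorers."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower


# Hypothetical responses to one item from each group
upper = [1, 1, 1, 0, 1]  # 4 of 5 high scorers answered correctly
lower = [0, 1, 0, 0, 0]  # 1 of 5 low scorers answered correctly
d = discrimination_index(upper, lower)
print(round(d, 2))  # 0.6
```

The index can range from -1 to +1; a negative value would mean low scorers outperformed high scorers on the item.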

Extreme Group Method

Compares people who have done well with those who have done poorly
Point-Biserial Method: correlation between a dichotomous variable and a continuous variable
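A minimal sketch of the point-biserial correlation, assuming the dichotomous variable is an item scored 1 or 0 and the continuous variable is the total test score. It uses the textbook formula with the population standard deviation; the function name `point_biserial` and the data are hypothetical:

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(item, total):
    """Correlation between a dichotomous item (0/1) and a continuous score."""
    right = [t for i, t in zip(item, total) if i == 1]
    wrong = [t for i, t in zip(item, total) if i == 0]
    p = len(right) / len(item)  # proportion answering the item correctly
    q = 1 - p                   # proportion answering incorrectly
    return (mean(right) - mean(wrong)) / pstdev(total) * sqrt(p * q)


# Hypothetical data: item responses and total scores for 8 test-takers
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [38, 35, 33, 30, 29, 25, 22, 20]
r_pb = point_biserial(item, total)
print(round(r_pb, 2))
```

This value is the same as the Pearson correlation computed on the 0/1 item and the total score, so a larger positive value means the item separates high and low scorers more sharply.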

Item-Characteristic Curve

Graphic representation of item difficulty and discrimination

Guessing

A problem that has eluded any universally accepted solution
Item analyses taken under speed conditions yield misleading or uninterpretable results

Effective Distractors

An effective distractor is chosen more often by low scorers than by high scorers, which enhances the consistency of test results
A distractor chosen equally often by high and low performing groups does not help separate them

Ineffective Distractors

May hurt the reliability of the test because they are time-consuming to read and can limit the number of good items

Types of Items

Matching Item
Binary Choice
Constructed-Response Format
- Completion Item
- Short-Answer
- Essay

Primary Scales of Measurement

Nominal: involve classification or categorization
- Mode
Ordinal: rank ordering
- Median
Interval: contains equal intervals, has no absolute zero point
Ratio: has true zero point, easiest to manipulate

Comparative Scales of Measurement

Paired Comparison: produces ordinal data by presenting pairs of two stimuli
Rank Order: respondents are presented with several items simultaneously and asked to rank them in order of priority
Constant Sum: respondents are asked to allocate a constant sum of units among a set of stimulus objects
Q-Sort Technique: sort objects based on similarity with respect to some criterion

Non-Comparative Scales of Measurement

Continuous Rating: rate objects by placing a mark at the appropriate position on a continuous line
Itemized Rating: rate objects on a scale having numbers or brief descriptions associated with each category
Likert Scale: indicate attitudes by responding to a series of statements that range from very positive to very negative

Ipsative Scoring

Compares a test-taker's score on one scale within a test to their score on another scale within that same test (e.g., two unrelated constructs)

Test Revision

Characterize each item according to its strength and weaknesses
Large item pool is advantageous in test revision as some items are removed and replaced by items from the pool
Administer the revised test under standardized conditions to a second appropriate sample of examinees
Cross-validation involves revalidating a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion
Validity shrinkage is the decrease in item validities that inevitably occurs after cross-validation

Computerized Adaptive Testing

An interactive, computer-administered test-taking process where items presented to the test-taker are based on their performance on previous items
Reduces floor and ceiling effects
Floor effects occur when there is a lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit
Ceiling effects occur when there is an upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit
Item branching is the ability of the computer to tailor the content and order of presentation of items based on responses to previous items
Routing test is a subtest used to direct or route the test-taker to a suitable level of items
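As a deliberately simplified sketch of item branching: move one difficulty level up after a correct response and one level down after an incorrect one. The function `next_level`, the five difficulty levels, and the response pattern are all hypothetical. Operational adaptive tests typically select items from a statistical estimate of ability rather than a fixed step rule, so this only illustrates the idea of tailoring items to previous responses:

```python
def next_level(level, correct, lowest=1, highest=5):
    """Step one difficulty level up after a correct answer, down after an incorrect one."""
    step = 1 if correct else -1
    # Clamp so the level never leaves the available range
    return max(lowest, min(highest, level + step))


level = 3          # start at a middle difficulty level
path = [level]
for correct in [True, True, False, True]:  # hypothetical response pattern
    level = next_level(level, correct)
    path.append(level)

print(path)  # [3, 4, 5, 4, 5]
```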

Statistics

Measurement is the act of assigning numbers or symbols to characteristics of things according to rules
Descriptive statistics provide a concise description of a collection of quantitative information
Inferential statistics make inferences from observations of a small group of people (sample) to a larger group of individuals (population)
Magnitude refers to the property of "moreness"
Equal intervals refer to the difference between two points at any place on the scale having the same meaning as the difference between two other points that differ by the same amount

Symmetrical Distribution

The right side of the graph is a mirror image of the left side
Has only one mode and it is in the center of the distribution
Mean = median = mode

Skewness

Refers to the nature and extent to which symmetry is absent
Positive skewness occurs when relatively few scores fall at the high end of the distribution
Mean > median > mode
Negative skewness occurs when relatively few scores fall at the low end of the distribution
Mean < median < mode
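A quick numerical check of how the mean, median, and mode usually line up in skewed data: the mean is pulled toward the long tail, so it tends to be the largest of the three under positive skew and the smallest under negative skew. The two score sets below are hypothetical, and this ordering is a rule of thumb rather than a guarantee for every data set:

```python
from statistics import mean, median, mode

# Hypothetical scores with a long tail toward the high end (positive skew)
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Hypothetical scores with a long tail toward the low end (negative skew)
negative = [1, 7, 11, 12, 13, 13, 14, 14, 14, 15]

print(mean(positive), median(positive), mode(positive))  # 4.6 3.0 2
print(mean(negative), median(negative), mode(negative))  # 11.4 13.0 14
```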

Kurtosis

Refers to the steepness of a distribution in its center
Platykurtic distributions are relatively flat
Leptokurtic distributions are relatively peaked
Mesokurtic distributions are somewhere in the middle
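One common way to put a number on these shapes is excess kurtosis, the fourth standardized moment minus 3. That formula is an assumption added here, since the notes do not give one. Under it, negative values suggest a platykurtic shape, positive values a leptokurtic shape, and values near zero a mesokurtic shape. The function name and both score sets are hypothetical:

```python
from statistics import mean


def excess_kurtosis(scores):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    m = mean(scores)
    m2 = mean([(x - m) ** 2 for x in scores])  # second central moment (variance)
    m4 = mean([(x - m) ** 4 for x in scores])  # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # scores spread evenly
peaked = [5, 5, 5, 5, 5, 5, 5, 1, 9]  # scores piled in the center with two extremes

print(round(excess_kurtosis(flat), 2))    # -1.23
print(round(excess_kurtosis(peaked), 2))  # 1.5
```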

Standard Scores

Raw scores that have been converted from one scale to another scale
Z-scores result from converting a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution
T-scores are a scale with a mean set at 50 and a standard deviation set at 10
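The two conversions can be sketched as z = (raw score - mean) / standard deviation and T = 50 + 10z. This sketch assumes the population standard deviation of the group being described; the function names `z_scores` and `t_scores` and the raw scores are hypothetical:

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Standard deviation units each raw score lies above or below the mean."""
    m, sd = mean(raw), pstdev(raw)
    return [(x - m) / sd for x in raw]


def t_scores(raw):
    """Rescale z-scores to a mean of 50 and a standard deviation of 10."""
    return [50 + 10 * z for z in z_scores(raw)]


raw = [10, 20, 30, 40, 50]  # hypothetical raw scores
for x, z, t in zip(raw, z_scores(raw), t_scores(raw)):
    print(x, round(z, 2), round(t, 1))
```

A raw score equal to the mean (30 here) gets z = 0 and T = 50, and T-scores avoid the negative values that z-scores produce below the mean.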

Error

Refers to the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement
Degree to which the test score/measurement may be wrong, considering other factors

Scales of Measurement

Nominal scales involve classification or categorization based on one or more distinguishing characteristics
Ordinal scales involve rank ordering on some characteristic
Interval scales contain equal intervals but have no absolute zero point
Ratio scales have a true zero point

Distribution

Defined as a set of test scores arrayed for recording or study
Raw scores are a straightforward, unmodified accounting of performance that is usually numerical
Frequency distribution lists all scores alongside the number of times each score occurred
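A frequency distribution can be sketched with a simple tally, assuming the raw scores are available as a list. The scores below are hypothetical:

```python
from collections import Counter

scores = [85, 90, 85, 70, 90, 85]  # hypothetical raw scores
freq = Counter(scores)             # maps each score to the number of times it occurred

# List scores from highest to lowest alongside their frequencies
for score in sorted(freq, reverse=True):
    print(score, freq[score])
```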

Post-Hoc Tests

Used in ANOVA to determine which mean differences are significantly different
Tukey's HSD test allows the computation of a single value that determines the minimum difference between treatment means that is necessary for significance

Measures of Central Tendency

Statistics that indicate the average or midmost score between the extreme scores in a distribution
Goal is to identify the most typical or representative score of the entire group
Mean is the average of all the raw scores
Median is the middle score of the distribution
Mode is the most frequently occurring score in the distribution
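The three measures can be compared on one small data set. The scores are hypothetical and include one extreme value to show how it pulls the mean while leaving the median and mode unchanged:

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 5, 5, 45]  # hypothetical scores with one extreme value

print(mean(scores))    # 9
print(median(scores))  # 4.5
print(mode(scores))    # 5
```

Here the mean of 9 is larger than seven of the eight scores, which is why the median is often preferred for skewed distributions.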

Variability

An indication of how scores in a distribution are scattered or dispersed
Measures of variability describe the amount of variation in a distribution
Range is equal to the difference between the highest and the lowest score
Quartile is a dividing point between the four quarters in the distribution
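A minimal sketch of the range, the quartiles, and the semi-interquartile range (interquartile range divided by 2). Quartile conventions differ between textbooks and software; this uses the default method of Python's `statistics.quantiles`, which is an assumption rather than the convention of the cited sources. The scores are hypothetical:

```python
from statistics import quantiles

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # hypothetical scores

score_range = max(scores) - min(scores)  # highest minus lowest score
q1, q2, q3 = quantiles(scores, n=4)      # the three dividing points between quarters
iqr = q3 - q1                            # interquartile range
semi_iqr = iqr / 2                       # semi-interquartile range

print(score_range)   # 13
print(q1, q2, q3)    # 4.0 7.0 10.0
print(iqr, semi_iqr) # 6.0 3.0
```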
