Questions and Answers
Which of the following is NOT one of the five stages in the process of developing a test?
- Test Conceptualization
- Item Analysis
- Test Construction
- Test Standardization (correct)
Test conceptualization always begins with a review of existing literature on similar tests.
False (B)
What is the primary purpose of a pilot study in test development?
to evaluate whether items should be included in the final form of the instrument
The process of setting rules for assigning numbers in measurement is known as ______.
Match each scale type with its description:
What is a key disadvantage of using a true-false item format in a test?
In cumulative scoring, a lower score indicates a higher level of the trait or ability being measured.
Before administering the final draft of a test, the next step is the ______.
What is ipsative scoring, and in what type of testing is it typically used?
A good test item always decreases the likelihood of discriminating between testtakers.
For optimal effectiveness, a test tryout should ideally have:
Define the item-difficulty index (p) and explain what its value represents.
The item-reliability index indicates the ______ of a test.
Match the following item analysis indices with their primary focus:
What does a negative d-value in the item-discrimination index indicate?
Qualitative item analysis relies primarily on statistical procedures.
What is the main goal of the 'think aloud' test administration?
Expert panels provide ______ analyses of test items.
According to the APA, when should existing tests be revised?
A test should never be revised simply because the stimulus materials look dated.
Flashcards
Test Development
The creation of a good test through thoughtful application of established principles of test construction.
Test Conceptualization
The initial phase in test development, involving thoughts and self-talk about the need for a test to measure a specific construct.
Scaling
A process of setting rules for assigning numbers in measurement to traits or characteristics.
Rating Scale
Summative Rating Scale
Likert Scale
Method of Paired Comparisons
Comparative scaling
Item Format
Selected-response format
Constructed-response format
Item Branching
Cumulative Scoring
Test Construction
Test Revision
Test Tryout
Item Analysis
Item-Reliability Index
Item-Discrimination Index
Think Aloud Test Administration
Study Notes
- This module discusses test development stages and their importance for valid, reliable results
- Developing good tests requires thoughtful application of established test construction principles
Test Development Stages
- Test conceptualization is the first stage
- Test construction is the second stage
- Test tryout is the third stage
- Item analysis is the fourth stage
- Test revision is the fifth and final stage
Test Conceptualization
- Test creation may begin as thoughts, or "self-talk," about the need for a new test
- A test developer may conclude that there ought to be a test designed to measure a particular construct in a particular way
- A review of the literature on existing tests may stimulate the development of a new test
- Emerging social phenomena or patterns of behavior (e.g., celibacy, new diseases) may also stimulate a new test
- Emerging occupations (e.g., high-definition electronics) may require new assessment tools
- Preliminary questions in test development cover the construct to be measured, the test's purpose, its items, administration, scoring, intended testtakers, format, and structure
- Pilot work (pilot studies) evaluates whether test items should be included in the final form of the instrument
- Structured interviews may be used to gather feedback from research subjects and others
Test Construction
- This stage begins after completion of pilot work
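- Scaling is the process of setting rules for assigning numbers in measurement; through it a measuring device is designed and calibrated
- Scale values are the numbers assigned to different amounts of the trait or attribute being measured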
- Louis Leon Thurstone pioneered sound scaling methods and applied psychophysical ones to psychological variables like attitudes
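- Age-based scales interpret testtaker performance as a function of age
- Grade-based scales interpret testtaker performance as a function of grade
- Stanine scales transform raw scores into scores ranging from 1 to 9
- Scales may be unidimensional (a single dimension underlies responses) or multidimensional (more than one dimension does)
- The best scaling method is the one that fits how the test developer conceives of the construct being measured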
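- Rating scales are groupings of words, statements, or symbols on which testtakers indicate the strength of a trait, attitude, or emotion; they can record various kinds of judgments
- Summative rating scales obtain the final score by summing the testtaker's ratings across all items
- Likert scales, commonly used to assess attitudes, typically offer five (sometimes seven) response alternatives on an agree-disagree continuum
- The method of paired comparisons presents pairs of stimuli, and testtakers select one stimulus from each pair (e.g., the statement they agree with more)
- Comparative scaling requires judging each stimulus against every other stimulus, e.g., sorting cards from most to least justifiable (a ranking task)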
- Categorical scaling places stimuli into alternative, quantitatively different categories
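- Guttman scales contain items ranging sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured; respondents who agree with the stronger statements should also agree with the milder ones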
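Three of the scaling ideas above (stanines, summative scoring, and the Guttman pattern) can be illustrated with a small Python sketch. It is not taken from any particular test: the stanine cutoffs use the conventional 4-7-12-17-20-17-12-7-4 percent bands, the reverse-keyed option in the summative scorer is an assumption added for illustration, and all function names are hypothetical.

```python
from bisect import bisect_right

# Conventional stanine bands hold 4, 7, 12, 17, 20, 17, 12, 7, 4 percent of scores,
# so the cumulative-percentage cutoffs between stanines 1|2, 2|3, ..., 8|9 are:
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank: float) -> int:
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1


def summative_score(ratings, reverse_keyed=frozenset(), points=5):
    """Summative (e.g., Likert) scoring: add the ratings across items.

    ratings: one rating per item on a 1..points scale.
    reverse_keyed: indices of items worded in the opposite direction
    (an assumption for illustration); they are flipped before summing.
    """
    total = 0
    for i, r in enumerate(ratings):
        if not 1 <= r <= points:
            raise ValueError(f"rating {r} is outside 1..{points}")
        total += (points + 1 - r) if i in reverse_keyed else r
    return total


def fits_guttman(endorsements):
    """Check for a perfect Guttman pattern.

    endorsements[i] is True if item i was endorsed; items are ordered from
    the weakest to the strongest expression. Endorsing a stronger item implies
    endorsing every weaker one, so no endorsement may follow a non-endorsement.
    """
    rejected_weaker = False
    for endorsed in endorsements:
        if endorsed and rejected_weaker:
            return False
        if not endorsed:
            rejected_weaker = True
    return True


print(stanine(50))                                       # 5
print(summative_score([5, 4, 2, 5], reverse_keyed={2}))  # 18
print(fits_guttman([True, True, False, False]))          # True
print(fits_guttman([True, False, True, False]))          # False
```

Real Guttman scales are judged by how closely observed patterns approach this ideal rather than by requiring perfection, so the strict check above is only the simplest version of the idea.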
Writing Test Items
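- Item format refers to the form, plan, structure, arrangement, and layout of individual test items
- The selected-response format requires testtakers to select a response from a set of alternatives
- Multiple-choice items consist of a stem, a correct alternative (option), and several incorrect alternatives (distractors)
- Matching items present premises and responses; testtakers determine which response best matches each premise
- True-false items are binary-choice items that ask testtakers to indicate whether a statement is a fact (true) or not (false)
- A key disadvantage of true-false items is that testtakers have a 50% chance of answering each item correctly by guessing
- The constructed-response format requires testtakers to supply or create the correct answer
- Completion (short-answer) items require filling in a word or phrase
- Essay items require writing a composition that demonstrates recall, understanding, analysis, and/or interpretation
- Computerized adaptive testing (CAT) is a computer-administered process in which the items presented depend partly on the testtaker's performance on previous items
- Item branching is the ability to tailor the content and order of items based on the testtaker's responses to previous items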
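Two points above lend themselves to a quick calculation: the guessing problem with true-false items and the idea of item branching. The sketch below is a simplified illustration, not an implementation of any real CAT system. Operational adaptive tests typically select items using item response theory, whereas the hypothetical `administer_branching` function simply moves one difficulty level up after a correct answer and one level down after an incorrect one.

```python
from math import comb


def p_pass_by_guessing(n_items, k_needed, n_options=2):
    """Probability that blind guessing yields at least k_needed correct answers.

    With n_options = 2 (true-false), each guess is correct with probability 0.5,
    so the number correct follows a binomial distribution.
    """
    p = 1 / n_options
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k_needed, n_items + 1)
    )


def administer_branching(n_levels, answers_correctly, n_items=5):
    """Toy item branching over a hypothetical bank with n_levels difficulty levels.

    Starts at the middle level; a correct response branches to a harder item,
    an incorrect one to an easier item. Returns the levels administered.
    """
    level = n_levels // 2
    path = []
    for _ in range(n_items):
        path.append(level)
        if answers_correctly(level):
            level = min(level + 1, n_levels - 1)
        else:
            level = max(level - 1, 0)
    return path


# A pure guesser on 10 true-false items gets 7 or more right about 17% of the time.
print(round(p_pass_by_guessing(10, 7), 4))  # 0.1719
# Simulated testtaker who answers correctly only below difficulty level 6.
print(administer_branching(10, lambda lvl: lvl < 6))  # [5, 6, 5, 6, 5]
```

The branching path settles around the boundary of what the simulated testtaker can answer, which is the intuition behind tailoring items to performance.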
Scoring Items
- Cumulative scoring is the most common model where higher scores mean higher trait levels
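- Class (category) scoring places testtakers in a class or category with other testtakers whose response patterns are similar in some way
- Diagnostic systems work the same way, placing individuals in diagnostic categories
- Ipsative scoring compares a testtaker's score on one scale within a test with their scores on other scales of the same test, supporting intra-individual conclusions rather than comparisons with other testtakers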
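The three scoring models can be contrasted in a few lines of Python. This is only an illustrative sketch: the class-scoring threshold stands in for a hypothetical diagnostic rule, and the scale names in the ipsative example are invented. Expressing each scale relative to the person's own mean is one simple way to show an intra-individual comparison; it is not the scoring procedure of any particular published test.

```python
def cumulative_score(item_scores):
    """Cumulative model: add the credited responses; higher = more of the trait."""
    return sum(item_scores)


def class_score(criteria_met, threshold):
    """Class/category model with a hypothetical rule: a testtaker is placed in
    the category when at least `threshold` criteria are met."""
    return "in category" if sum(criteria_met) >= threshold else "not in category"


def ipsative_profile(scale_scores):
    """Ipsative model: compare each scale with the same testtaker's other scales.

    Returns each scale's deviation from the person's own mean, plus the
    scales ranked from relatively strongest to relatively weakest.
    """
    own_mean = sum(scale_scores.values()) / len(scale_scores)
    deviations = {name: score - own_mean for name, score in scale_scores.items()}
    ranking = sorted(deviations, key=deviations.get, reverse=True)
    return deviations, ranking


print(cumulative_score([1, 0, 1, 1]))                   # 3
print(class_score([True, True, False, True, True], 4))  # in category
devs, ranking = ipsative_profile({"achievement": 18, "affiliation": 12, "autonomy": 15})
print(ranking)  # ['achievement', 'autonomy', 'affiliation']
```

The ipsative result says nothing about how this person compares with other testtakers; it only shows which scales are relatively high or low for this one person.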
Test Tryout
- Tryout occurs after deciding on a scoring model and preparing the test draft
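- Tests should be tried out on people similar in critical respects to the people for whom the test was designed
- The tryout should use no fewer than five, and preferably as many as ten, subjects per item; the more subjects, the weaker the role of chance in the subsequent data analysis
- Tryout conditions should be as similar as possible to the conditions under which the standardized test will be administered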
- Test developers need to minimize extraneous factors to ensure accurate results
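The subjects-per-item guideline is simple arithmetic, and the reason larger samples help can be seen in the standard error of a proportion such as an item's difficulty. The sketch below assumes a 30-item test purely as an example; using the standard error of a proportion to show the shrinking role of chance is an illustration chosen here, not a procedure prescribed by these notes.

```python
from math import sqrt


def tryout_sample_range(n_items, minimum_per_item=5, preferred_per_item=10):
    """Return (minimum, preferred) tryout sample sizes for a test of n_items."""
    return n_items * minimum_per_item, n_items * preferred_per_item


def standard_error_of_p(p, n):
    """Sampling error of an observed proportion p (e.g., an item's difficulty)
    estimated from n testtakers; it shrinks as n grows."""
    return sqrt(p * (1 - p) / n)


low, high = tryout_sample_range(30)
print(low, high)  # 150 300
for n in (low, high):
    print(n, round(standard_error_of_p(0.5, n), 3))
```

For an item answered correctly by half the sample, the standard error is roughly .041 with 150 testtakers and .029 with 300, so doubling the sample noticeably tightens the estimate.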
Item Analysis
- A good test item is reliable and valid and helps discriminate between testtakers
- Item analysis selects the best items after the test has been administered to a group of testtakers
- The item-difficulty index (p) is the proportion of testtakers who answered an item correctly; it ranges from 0 to 1, and the higher p is, the easier the item
- In personality testing, the counterpart of the item-difficulty index is the item-endorsement index
- The item-endorsement index is the proportion of testtakers who agreed with (endorsed) the item
- The item-reliability index indicates a test's internal consistency; the higher the index, the greater the internal consistency
- Factor analysis helps determine whether items appear to measure the same thing; items that do not load on the intended factor can be revised or eliminated
- The item-validity index indicates the degree to which a test measures what it purports to measure; it is based on the correlation between the item score and a criterion score
- The item-discrimination index (d) indicates how well an item separates high scorers from low scorers
- Good achievement test items are answered correctly by high scorers and incorrectly by low scorers
- d is the difference between the proportion of high scorers and the proportion of low scorers who answer the item correctly
- A negative d-value means low scorers answered the item correctly more often than high scorers, a red flag calling for revision or elimination of the item
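The quantitative indices above can be computed from a small scored dataset. In this sketch, items are scored 1 (correct) or 0 (incorrect). The item-reliability and item-validity indices follow the common textbook definition of the item-score standard deviation multiplied by the item's correlation with the total score or with a criterion, respectively. The discrimination index compares upper and lower groups of 27% of testtakers, a widely used convention assumed here. The scores and criterion values are made up for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation using population (n-denominator) moments."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined here; treat as no relationship
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def item_difficulty(item):
    """p: proportion of testtakers answering the item correctly (0 to 1)."""
    return sum(item) / len(item)


def item_reliability_index(item, totals):
    """Item-score SD times the item-total correlation."""
    return pstdev(item) * pearson_r(item, totals)


def item_validity_index(item, criterion):
    """Item-score SD times the item-criterion correlation."""
    return pstdev(item) * pearson_r(item, criterion)


def item_discrimination(item, totals, fraction=0.27):
    """d = (U - L) / n, where U and L are the numbers answering correctly in
    the upper and lower scoring groups and n is the size of each group."""
    n = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    lower, upper = order[:n], order[-n:]
    U = sum(item[i] for i in upper)
    L = sum(item[i] for i in lower)
    return (U - L) / n


# Hypothetical tryout data for 10 testtakers.
totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]
criterion = [9, 8, 9, 7, 5, 6, 4, 4, 3, 2]
good_item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
bad_item = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]

print(item_difficulty(good_item))              # 0.5
print(item_discrimination(good_item, totals))  # 1.0
print(item_discrimination(bad_item, totals))   # -1.0 (red flag: revise or drop)
print(round(item_reliability_index(good_item, totals), 3))
print(round(item_validity_index(good_item, criterion), 3))
```

The `bad_item` row shows the pattern behind a negative d-value: the lowest scorers got it right and the highest scorers got it wrong, so its reliability index is negative as well.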
Qualitative Item Analysis
- Analysis uses nonstatistical procedures to explore how test items work
- It compares items to each other and the test as a whole
- Methods explore issues through interviews and discussions with testtakers
- "Think aloud" administration sheds light on testtaker thought processes
- Testtakers, tested individually, verbalize their thoughts about each item as they occur
- Expert panels provide qualitative analyses, including sensitivity reviews that assess fairness and flag offensive content
Test Revision
- Revision involves rewording, deleting, or creating items
- Revision also refers to modifying pre-existing tests
- Each item's strengths and weaknesses are characterized, and revision balances them while considering the test's reliability and validity
- The purpose of the test shapes the blueprint for the revision, e.g., how heavily item discrimination is weighted
- Per APA standards, existing tests should be kept current so that they remain useful
- Tests should be revised when significant changes in the domain or new conditions of test use and interpretation make them inappropriate for their intended use
- Tests need revision when their stimulus materials look dated
- Revision is also needed when the vocabulary is dated
- Some words or expressions may become inappropriate or offensive as the culture changes
- Test norms need updating when they are outdated, e.g., because the testtaker population has shifted demographically