Questions and Answers
Which of the following is the first stage in the process of developing a test?
- Test Conceptualization (correct)
- Test Revision
- Test Construction
- Item Analysis
Test tryout occurs before test construction in the test development process.
False
What is the term for the preliminary research surrounding the creation of a test?
Pilot Work
The process of setting rules for assigning numbers in measurement is known as ______.
Match each type of scale with its description:
Which type of scaling involves summing the ratings across all items to obtain a final score?
In a method of paired comparisons, test takers are presented with only single stimuli to evaluate.
What does comparative scaling entail?
Sorting cards into piles representing "never justified," "sometimes justified," and "always justified" is an example of ______ scaling.
Which of the following is a key feature of a Guttman scale?
In test construction, the number of items written initially should always equal the number of items intended for the final version of the test.
What are the three elements of a multiple-choice item?
An item format that requires test-takers to create or supply the correct answer is known as a ______-response format.
What is the term for a multiple-choice item that contains only two possible responses?
Item branching is a technique used in paper-based tests to individualize the testing experience.
What is the term for a relatively large and easily accessible collection of test questions for computer administration?
The term ______ refers to an interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker's performance on previous items.
In computerized adaptive testing (CAT), what is a general rule regarding difficulty?
The floor effect occurs when a test is too difficult, leading most test takers to score very high.
In testing, what general situation does the term "ceiling effect" describe?
In a cumulatively scored test, the rule typically is that the higher the score on the test, the ______ the test taker is on the ability, trait, or other characteristic that the test purports to measure.
What is a typical objective in ipsative scoring?
During test tryout, it is important to administer the test in conditions that are dissimilar to the final test administration.
Name four indices that are gathered during item analysis.
The item-difficulty index is obtained by calculating the ______ of the total number of test-takers who answered the item correctly.
An item with a difficulty index of 0.75 means that the item is considered:
The item-discrimination index measures the reliability of each question.
What is the purpose of the item-discrimination index?
The item-validity index provides an indication of the degree to which a test is measuring what it ______ to measure.
What is the possible impact of guessing on examinees' scores and on test scores in general?
A biased test item is fair if it favors one particular group when differences in group ability are controlled.
In qualitative item analysis, what general type of procedure is used?
A sensitivity review during test development examines items for fairness and the presence of offensive language, ______, or situations.
Which of the following situations is a reason to revise a test?
Cross-validation involves revalidating a test on the same sample as the original validation.
What is co-validation?
Match each term related to test development with its correct definition:
During the test tryout phase, it is important that the test administration atmosphere, instructions, and ______ are similar to the final test.
Which type of scaling would involve asking test-takers to rank items from most to least justifiable?
Qualitative item analysis relies only on statistical data to identify areas for improvement.
What is the function of a sensitivity review conducted by expert panels?
Match each stage of the test development process with its activities:
Flashcards
Test Conceptualization
The initial phase involving brainstorming, literature review, and identifying emerging social phenomena that could be stimuli for a new test.
Pilot Work
Preliminary research to refine test items and procedures before full-scale test construction.
Scaling
The process of setting rules for assigning numbers in measurement.
Age-Based Scale
A scale within which performance is evaluated with reference to the test taker's age.
Grade-Based Scale
A scale within which performance is evaluated with reference to the test taker's grade level.
Unidimensional Scale
A scale that measures a single trait or dimension.
Multidimensional Scale
A scale that measures multiple traits or dimensions.
Comparative Scaling
Judging a stimulus in comparison with every other stimulus on the scale.
Categorical Scaling
Placing stimuli into two or more alternative categories that differ quantitatively.
Guttman Scale
A scale on which items are ordered so that agreement with a stronger item implies agreement with all weaker items.
Item Bank
A relatively large and easily accessible collection of test questions, typically for computer administration.
Item Format
Variables such as the form, plan, structure, arrangement, and layout of individual test items.
Constructed-Response Format
An item format requiring test takers to supply or create the correct answer.
Selected-Response Format
An item format requiring test takers to select a response from a set of alternatives.
Binary Choice Item
A multiple-choice item with only two possible responses (e.g., true/false).
Completion Item
An item requiring the examinee to provide a word or phrase that completes a sentence, as in fill-in-the-blank items.
Item Branching
Adapting the sequence or difficulty of items presented on the basis of responses to previous items.
Computerized Adaptive Testing (CAT)
An interactive, computer-administered process in which the items presented are based in part on performance on previous items.
Ceiling Effect
The diminished ability to distinguish among test takers when a test is so easy that most score very high.
Floor Effect
The diminished ability to distinguish among test takers when a test is so difficult that most score very low.
Ipsative Scoring
Comparing a test taker's scores on different scales within the same test rather than against other test takers.
Test Tryout
Administering the test, under conditions mirroring the final administration, to a sample of the people for whom it is designed.
Item-Difficulty Index
The proportion of test takers who answered the item correctly.
Item-Discrimination Index
A measure of how well an item distinguishes high scorers from low scorers on the test as a whole.
Item Reliability Index
An indication of an item's contribution to the internal consistency of the test.
Item-Validity Index
An indication of the degree to which an item helps the test measure what it purports to measure.
Item Characteristic Curve
A graphic representation of item difficulty and discrimination.
Qualitative Item Analysis
Nonstatistical procedures, such as interviews and group discussions, used to explore test takers' viewpoints.
Sensitivity Review
An examination of test items for fairness and for offensive language, stereotypes, or situations.
Test Revision
Rewriting, removing, or replacing items to strengthen a test.
Cross-Validation
Revalidating a test on a sample of test takers other than the original validation sample.
Co-Validation
Validating two or more tests that measure the same thing using the same sample of test takers.
Biased Test Item
An item that favors one group of examinees when differences in group ability are controlled.
Study Notes
- A test is developed in five stages: test conceptualization, test construction, test tryout, item analysis, and test revision.
Test Conceptualization
- Test conceptualization often begins with self-talk, such as "There ought to be a test that measures…".
- Review available literature on existing tests.
- Emerging patterns of social behavior, such as FOMO or JOMO, can inspire new test development.
Preliminary Questions
- What is the test designed to measure?
- What is the objective of the test?
- Is there a need for this test?
- Who will use this test?
- Who will take this test?
- What content will the test cover?
- How will the test be administered?
- What is the ideal format of the test?
- What special training will be required of administrators or interpreters?
- What types of responses will be required of test takers?
- Who benefits from the administration of this test?
- Is there potential for harm due to test administration?
- How will test results be interpreted meaningfully?
Pilot Work
- Pilot work is the preliminary research surrounding the creation of a test.
- Test items are evaluated to determine if they should be in the final version.
- The test developer runs a trial of the test.
- Test construction begins after pilot work.
Test Construction
- Scaling methods are used to assign numbers in measurement.
- L. L. Thurstone developed scaling methods.
Types of Scales
- Age-based scales evaluate performance with reference to the test taker's age.
- Grade-based scales evaluate performance with reference to the test taker's grade level.
- Unidimensional scales measure a single trait or dimension.
- Multidimensional scales measure multiple traits or dimensions.
- Scales may also be comparative (stimuli judged against one another) or categorical (stimuli sorted into categories).
Scaling Methods
- Rating scales use words, statements, or symbols on which the strength of a trait, attitude, or emotion is indicated.
- Summative scales derive a final score by summing the ratings across all test items.
- Likert scales are a type of summative scale.
- Likert scales present test takers with five or seven alternative responses, typically on an agree-disagree continuum; a scoring sketch follows below.
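A minimal sketch of summative (Likert) scoring in Python; the four items, the 1-5 response scale, and the reverse-keyed item are hypothetical illustrations, not details from the source.

```python
# Summative (Likert) scoring: sum the ratings across all items.
# Hypothetical responses on a 1-5 agree-disagree continuum.
responses = {"item1": 4, "item2": 2, "item3": 5, "item4": 1}

# Assumption: item4 is worded in the reverse direction, so it is
# reverse-keyed before summing (on a 1-5 scale, 1 <-> 5, 2 <-> 4).
REVERSE_KEYED = {"item4"}
SCALE_MAX = 5

total = sum(
    (SCALE_MAX + 1 - rating) if item in REVERSE_KEYED else rating
    for item, rating in responses.items()
)
print(total)  # 4 + 2 + 5 + 5 = 16, the final summative score
```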
Method of Paired Comparisons
- Test takers compare pairs of stimuli and select based on a rule, like agreeing more with a statement or finding a stimulus more appealing.
- Example: deciding which behavior is more justified, such as cheating on taxes versus accepting a bribe (tallying such choices is sketched below).
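A minimal sketch of tallying paired-comparison judgments in Python; the stimuli and the judging rule are hypothetical stand-ins for a real test taker's selections.

```python
from collections import Counter
from itertools import combinations

stimuli = ["cheating on taxes", "accepting a bribe", "jaywalking"]

def judge(pair):
    # Hypothetical rule standing in for a test taker's choice of the
    # more justified behavior in each pair.
    return min(pair, key=stimuli.index)

# Present every possible pair and count how often each stimulus is chosen.
wins = Counter(judge(pair) for pair in combinations(stimuli, 2))
print(wins.most_common())  # higher counts = judged more justified
```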
Comparative Scaling
- Comparative scaling means judging a stimulus in comparison with every other stimulus on the scale.
- Test takers rank cards from most to least justifiable.
- Comparative scaling can involve ranking, for example, 30 items from 1 to 30 on paper.
Categorical Scaling
- Stimuli are placed into two or more alternative categories differing quantitatively.
- Cards can be sorted into piles depending on whether behavior is never, sometimes, or always justified.
Guttman Scale
- Items are ordered so that agreement with a particular item implies agreement with all lower-ranked (weaker) items, as illustrated below.
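A minimal sketch of this cumulative property in Python: with items ordered from weakest to strongest, a consistent response vector must be a run of endorsements followed by a run of rejections. The response vectors are hypothetical.

```python
def is_guttman_pattern(responses):
    """responses: 1/0 endorsements ordered from weakest to strongest item."""
    # A cumulative pattern never endorses a stronger item after rejecting
    # a weaker one, i.e., the vector is already sorted in descending order.
    return sorted(responses, reverse=True) == list(responses)

print(is_guttman_pattern([1, 1, 1, 0, 0]))  # True: consistent cumulative pattern
print(is_guttman_pattern([1, 0, 1, 0, 0]))  # False: endorses a stronger item after rejecting a weaker one
```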
Writing Items
- Consider the range of content the items should cover.
- Decide which item formats will be used.
- Determine how many items will be written in total and for each content area.
Item Pool
- The item pool is the reservoir of candidate items from which the final version of the test will be drawn.
- The pool should be about twice the intended test length; a 30-item test should start from an item pool of roughly 60 items.
- Test developers write items based on personal experience or subject matter knowledge.
Item Format
- Item format refers to variables such as the form, plan, structure, arrangement, and layout of individual test items.
- Selected-response format items require test takers to select from alternative responses.
- Constructed-response format items mean test-takers must supply or create the answer.
Types of Selected-Response Item Formats
- An item in a multiple-choice format includes a stem (problem statement), a correct option, and distractors or foils.
- In matching items, test takers match premises in one column with responses in another column.
- A binary-choice item is a multiple-choice item with only two options (e.g., true/false).
- Completion items require the examinee to supply a word or phrase that completes a sentence, as in fill-in-the-blank and short-answer questions.
Writing Items for Computer Administration
- Item banks store items, allowing individualized testing via item branching.
- Instructors create item banks of questions for examinations.
Writing Items and Computerized Adaptive Testing
- Computerized Adaptive Testing (CAT) is interactive, using test takers' performance on previous items to adaptively give subsequent items.
- More difficult items are presented when the test taker answers two consecutive items correctly.
- The test stops if the test taker answers five consecutive items incorrectly; a branching sketch follows below.
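A minimal sketch in Python of the two branching rules just described; the item-pool structure, the answer model, and the max_items cap are assumptions added so the sketch terminates.

```python
import random

def run_cat(items_by_level, ask, max_items=20):
    """items_by_level: item lists ordered from easiest to hardest.
    ask(item) -> True if the test taker answers correctly."""
    level = streak_right = streak_wrong = 0
    for _ in range(max_items):            # assumed cap on test length
        if streak_wrong == 5:             # stop after five consecutive misses
            break
        pool = items_by_level[min(level, len(items_by_level) - 1)]
        if ask(random.choice(pool)):
            streak_right, streak_wrong = streak_right + 1, 0
            if streak_right == 2:         # two in a row correct: branch to harder items
                level, streak_right = level + 1, 0
        else:
            streak_right, streak_wrong = 0, streak_wrong + 1
    return level                          # crude stand-in for an ability estimate
```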
Floor Effect
- Occurs when a test is too difficult and most test takers fail the items.
- This makes it difficult to distinguish among test takers.
- Computer-based adaptive tests branch to easier items to distinguish among low scorers.
Ceiling Effect
- Occurs when a test is too easy and test takers pass all items.
- This prevents distinguishing among high scorers.
- Adaptive tests respond by presenting more difficult items.
- CAT can also measure personality, for example presenting items that measure depression in more depth.
Scoring Items
- The cumulative model suggests that higher test scores indicate higher ability or trait levels.
- Category scoring means test taker responses place them in a class/category, like depressed or non-depressed.
- This approach is used in diagnostic systems.
- Ipsative scoring compares a test taker's scores on different scales within the same test, as in the sketch below.
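A minimal sketch of ipsative interpretation in Python: scale scores are ranked within one test taker rather than compared across people. The scale names and scores are hypothetical.

```python
# One test taker's scores on three scales of the same test (hypothetical).
scores = {"achievement": 42, "affiliation": 35, "autonomy": 51}

# Ipsative interpretation: order the scales within this person, highest first.
ipsative_order = sorted(scores, key=scores.get, reverse=True)
print(ipsative_order)  # ['autonomy', 'achievement', 'affiliation']
```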
Test Tryout
- The test developer tries out the test.
- Administer the test to people for whom it is designed.
- A rule of thumb is no fewer than five subjects per item; the more subjects, the weaker the role of chance.
- Time, instructions, and atmosphere should mirror the final test.
Item Analysis
- Item analysis indices include the item-difficulty index, item-reliability index, item-validity index, item-discrimination index, and item-characteristic curves (ICCs).
Item-Difficulty Index
- It is calculated as the proportion of the total number of test takers who answered the item correctly.
- For example, if only 50 of 100 test takers got item 2 right, its difficulty index is 0.5.
- The lower the index, the more difficult the item.
Range of Item-Difficulty Index
- It ranges from 0.0 to 1.0, with higher values indicating easier items.
- An item with a difficulty index around .75 is considered easy; around .30, difficult (see the worked sketch below).
- A giveaway item at the beginning of an achievement test can spur motivation, foster a positive attitude, and lower test anxiety.
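A worked sketch of the item-difficulty index in Python; the response matrix is hypothetical (rows are test takers, columns are items, 1 = correct).

```python
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

n_takers = len(responses)
for j in range(len(responses[0])):
    # p = proportion of test takers who answered item j correctly
    p = sum(row[j] for row in responses) / n_takers
    print(f"item {j + 1}: p = {p:.2f}")
# item 1: p = 0.75 (easy by the cutoff above); item 3: p = 0.25 (difficult)
```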
Item-Discrimination Index
- Indicates how well an item distinguishes between examinees who know the material and those who do not.
- It assesses the relationship between examinees' performance on an item and their overall test score.
- On highly discriminating items, examinees who answer correctly tend to perform well on the test overall, as the computation below illustrates.
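A minimal sketch of one common formulation, d = (U − L) / n, where U and L are the numbers of correct answers to the item in the upper- and lower-scoring groups and n is the size of one group; the 27% split and the counts are illustrative assumptions.

```python
U = 24  # correct answers to the item in the top-scoring group (assumed)
L = 9   # correct answers to the item in the bottom-scoring group (assumed)
n = 27  # test takers per group, e.g., the top and bottom 27% of 100

d = (U - L) / n
print(f"d = {d:.2f}")  # positive d: high scorers outperform low scorers on this item
```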
Item Reliability Index
- It provides an indication of the internal consistency of a test.
- It checks that each item measures the same thing the rest of the test is designed to measure.
- Items measuring something different are candidates for elimination.
Item-Validity Index
- It indicates the degree to which a test measures what it is expected to measure.
- A higher item-validity index means greater criterion-related validity; both indices are computed in the sketch below.
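A minimal sketch, assuming the textbook-style definitions in which each index is the item's score standard deviation multiplied by the item's correlation with the total test score (reliability index) or with an external criterion (validity index); the data are hypothetical.

```python
from statistics import pstdev

def pearson(xs, ys):
    # Pearson correlation from population statistics.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return cov / (pstdev(xs) * pstdev(ys))

item = [1, 0, 1, 1, 0, 1]                   # item scores for six test takers
total = [55, 40, 60, 58, 42, 50]            # their total test scores
criterion = [3.2, 2.1, 3.8, 3.5, 2.4, 2.9]  # an external criterion measure

s = pstdev(item)
print("item-reliability index:", round(s * pearson(item, total), 3))
print("item-validity index:  ", round(s * pearson(item, criterion), 3))
```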
Item Characteristic Curve (ICC)
- It graphically represents an item's difficulty and discrimination; a logistic example is sketched below.
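A minimal sketch using a standard two-parameter logistic model, P(θ) = 1 / (1 + e^(−a(θ − b))), where b is the item's difficulty and a its discrimination; the model choice and parameter values are illustrative, not from the source.

```python
import math

def icc(theta, a=1.5, b=0.0):
    # Probability of a correct response at ability level theta.
    return 1 / (1 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):
    print(f"ability {theta:+d}: P(correct) = {icc(theta):.2f}")
# A steeper curve (larger a) means a more discriminating item;
# shifting b to the right means a more difficult item.
```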
Other Considerations in Item Analysis
- When guessing, test takers may first rule out distractor alternatives.
- Test instructions should be explicit about how omitted responses are handled.
- Some instructions tell test takers to answer only when certain and not to guess.
- Other instructions tell test takers to attempt every item and to guess when in doubt.
- Risk-taking behavior therefore impacts test scores.
- There is no set solution to the problem of guessing.
- Proper instructions for test administrators should exist, including whether guessing is allowed.
- There should be specifics for scoring and interpreting omitted items.
Item Fairness
- A biased test item favors one group of examinees when group ability differences are controlled.
- Biased items can be identified by comparing item-characteristic curves across groups.
- If the curves for two groups (e.g., men and women) differ in shape even though the groups do not differ in total test score, the item is functioning unfairly.
Qualitative Item Analysis
- Nonstatistical procedures used to gather test takers' viewpoints.
- Interviews and group discussions are used to obtain this feedback.
Types of Questions
- Test length.
- Test language.
- Answering such questions may also give test takers a way to vent frustrations about the test.
- "Think aloud" test administration is a research tool in which test takers verbalize their thoughts during administration.
- This verbalizing sheds light on how test takers interpret and respond to items.
Qualitative Item Analysis: Expert Panels
- A sensitivity review examines test items for fairness and for offensive language, stereotypes, or situations during test development.
Test Revision
- Some items are removed because of weaknesses.
- Unreliable or biased items are removed.
- Replacement items are selected from the item pool.
- The test is then revised and readied for another tryout.
Test Revision in the Life Cycle of an Existing Test
- Stimulus materials have become outdated and current test takers cannot relate to them.
- The verbal content of items and instructions contains outdated vocabulary that must be changed.
- Changes in popular culture make some material potentially offensive.
- Reliability and validity can be improved through revision.
- The underlying theory has been improved, requiring changes to test items.
Cross-Validation and Co-Validation
- Cross-validation revalidates a test on a new sample of test takers.
- Co-validation means validating two or more tests that measure the same thing using the same sample of test takers.