Questions and Answers
Which of the following is the initial step in the test development process?
- Test Construction
- Item Analysis
- Test Revision
- Test Conceptualization (correct)
Reviewing existing literature and tests related to the construct is not a helpful activity during test conceptualization.
False
According to the content, what is the term for a preliminary investigation surrounding the creation of a test?
Pilot Work
The process of setting rules for assigning numbers in measurement is known as __________.
Which type of scale is used when the age of the test taker affects performance?
Unidimensional scales measure multiple aspects or traits.
What type of scaling involves judgments of a stimulus in comparison with every other stimulus on the scale?
In a Guttman scale, items are arranged in such a way that agreement with an item implies agreement with items of __________ rank-order.
What should test developers consider when writing items to ensure comprehensive coverage?
An item pool is a small set of questions intended for the final version of a test.
What is the term that collectively refers to variables such as the form, plan, structure, arrangement, and layout of individual test items?
A multiple-choice item has three elements: a stem, a correct option, and several incorrect options known as __________.
What is a multiple-choice item with only two possible answers called?
Item branching involves writing items for computer administration.
What is a relatively comprehensive collection of test questions called in the context of computerized testing?
The interactive testing process in which items presented to the test-taker are based on earlier performance is called __________.
In adaptive testing, when should more difficult items be presented to the person?
In adaptive testing, it's recommended to continue administering items if a person gets three consecutive items wrong.
What effect occurs when a test is too difficult and many students score at the very bottom?
The __________ effect occurs when a test is too easy, and most test-takers achieve very high scores.
Under what circumstances should further, more difficult items be presented to test-takers?
According to the cumulative model of scoring items, the lower the score on the test, the higher the test taker is on the ability.
What type of scoring places test takers in a particular class or category with others whose response patterns are similar?
In __________ scoring, a test taker's score on one scale within a test is compared to another scale within that same test.
In Test Tryout, who does the test developer administer the test to?
Test Tryout requires the environment and atmosphere to be dissimilar to the final test.
What are the different indexes used in item analysis?
A high value in an item-difficulty index indicates that a greater proportion of examinees responded to the item __________.
Between what values does the item-difficulty index range?
Inserting less difficult questions at the start of an achievement test can cause test takers to become more anxious.
What is the Item-Discrimination Index a measure of?
The __________ reliability index shows the internal consistency of a test.
Which index shows whether a test measures what it's supposed to?
Omitting instructions about guessing should be done to ensure the test doesn't take too long.
In computerized testing, what should test takers do if they are not sure about the item at hand?
A __________ test item is an item that favors a specific group in relation to another when group differences are controlled.
Which process improves a test and gets feedback from test takers?
In think-aloud test administration, people should not say aloud what they are doing.
What review ensures that the test is fair and there is no offensive or unneeded language?
During test revision, some items have many __________, making them prime candidates for deletion or revision.
Match the following stages of test development with their descriptions:
Flashcards
Test Conceptualization
The initial phase of test development, involving defining the test's purpose and target audience.
Test Construction
The stage where the test is actually constructed, including writing items and setting rules for scoring.
Test Tryout
Administering the test to a representative sample to evaluate its effectiveness.
Scaling
Age-Based Scale
Grade-Based Scale
Unidimensional Scale
Multidimensional Scale
Paired Comparisons
Comparative Scaling
Categorical Scaling
Guttman Scale
Item Pool
Item Format
Selected-Response Format
Constructed-Response Format
Computerized Adaptive Testing (CAT)
Floor Effect
Ceiling Effect
Cumulative Scoring
Category Scoring
Ipsative Scoring
Item Analysis
Item-Difficulty Index
Item-Discrimination Index
Item-Reliability Index
Item-Validity Index
"Think Aloud" Test Administration
Sensitivity Review
Test Revision
Cross-Validation
Co-Validation
Study Notes
- Test development occurs in five stages: test conceptualization, test construction, test tryout, item analysis, and test revision.
Test Conceptualization
- Includes self-talk, reviewing the available literature on existing tests, and looking at emerging social phenomena or patterns of behavior.
- An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.
Preliminary Questions
- What the test is designed to measure, what the objective of the test is, and whether there is a need for the test.
- Who will use the test, who will take the test, and what content the test will cover.
- How the test will be administered and what the ideal format is.
- What special training is required for test users, what types of responses are required of test takers, and who benefits from an administration of this test.
- Whether there is any potential for harm as a result of administering this test, and how meaning will be attributed to test scores.
Pilot Work
- Defined as the preliminary research surrounding the creation of a test.
- The test developer runs a trial.
- Test items can be studied to evaluate if they should be included in the final instrument.
- Once pilot work has been completed, the process of test construction begins.
Test Construction: Scaling
- Scaling is the process of setting rules for assigning numbers in measurement.
- L. L. Thurstone is credited for being at the forefront of efforts to develop methodologically sound scaling methods.
Types of Scales
- If age of test taker is important for performance it is an age-based scale, if grade is important, it is a grade-based scale.
- Scales can be unidimensional as opposed to multidimensional.
- Unidimensional scales measure only one aspect or trait, while multidimensional scales measure several different aspects.
- Scales can be comparative as opposed to categorical, using comparative judgments such as average or low, or categories such as extrovert versus introvert.
Scaling Methods
- Rating scales are groupings of words, statements or symbols to make judgments of the strength of a particular trait, attitude, or emotion.
- The final test score on a summative scale is obtained by summing the ratings across all the items.
- The Likert scale is a type of summative scaling where each item presents the test taker with five or seven alternative responses, usually on an agree-disagree continuum.
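As a concrete illustration of summative (Likert) scoring, here is a minimal Python sketch that sums the ratings across all items. The item names and ratings are invented, and the reverse-keying step is a common Likert practice that the notes above do not mention:

```python
# Minimal sketch of summative (Likert) scoring: the final score is the
# sum of the ratings across all items. Item names and data are invented.

# Responses on a 5-point agree-disagree continuum (1 = strongly disagree,
# 5 = strongly agree) for a hypothetical 4-item attitude scale.
responses = {"item1": 4, "item2": 5, "item3": 3, "item4": 4}

# Reverse-keyed items are flipped before summing (common Likert practice,
# assumed here for illustration).
reverse_keyed = {"item3"}
max_rating = 5

total = sum(
    (max_rating + 1 - r) if item in reverse_keyed else r
    for item, r in responses.items()
)
print(total)  # summative scale score
```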
Paired Comparisons
- Present test takers with pairs of stimuli like photos, objects or statements to compare.
- Test takers must select one of the stimuli according to a rule, such as which statement they agree more with.
Comparative Scaling
- Entails judgments of a stimulus in comparison with every other stimulus on the scale.
- Tasks might include sorting cards from most justifiable to least, or ranking items from 1 to 30.
Categorical Scaling
- Places stimuli into one of two or more alternative categories that differ quantitatively along a continuum.
- Test takers could be asked to sort cards into categories like never, sometimes, or always justified.
Guttman Scale
- Items arranged so agreeing with a particular item implies agreement with lower-ranked items.
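The defining Guttman property (agreeing with an item implies agreement with every lower-ranked item) can be checked mechanically. A minimal sketch with invented response patterns:

```python
# Minimal sketch of the Guttman property: agreeing with an item implies
# agreement with all lower-ranked items. Responses are invented.

def is_guttman_consistent(responses):
    """responses: list of 0/1 agreements ordered from lowest- to
    highest-ranked item. The pattern is consistent if no agreement (1)
    appears after the first disagreement (0)."""
    seen_disagreement = False
    for r in responses:
        if r == 0:
            seen_disagreement = True
        elif seen_disagreement:  # a 1 after a 0 breaks the cumulative order
            return False
    return True

print(is_guttman_consistent([1, 1, 1, 0, 0]))  # True: perfect cumulative pattern
print(is_guttman_consistent([1, 0, 1, 0, 0]))  # False: violates the ordering
```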
Item Writing
- Considers the range of content items should cover, type of formats to use, and total number of items to write for each content area.
- An item pool is the reservoir or well from which items will or will not be drawn for the final test.
- For a 30-item test, the item pool should contain about 60 items, since approximately half will be eliminated.
Item Format
- Item format refers to variables like the item's form, plan, structure, arrangement, and layout.
- Selected-response format items require selecting a response from a set of alternative responses.
- Constructed-response format items require supplying or creating the correct answer, not just selecting it.
Types of Selected-Response Item Formats
- A multiple-choice format includes a stem, a correct alternative or option, and several incorrect alternatives called distractors or foils.
- A matching format presents two columns, premises on the left and responses on the right.
- A binary choice item is a multiple-choice item with only two possible responses.
- A completion item requires providing a word or phrase to complete a sentence; this format includes fill-in-the-blank, short-answer, and essay-type questions.
Writing Items for Computer Administration
- Two advantages of digital media are storing items in an item bank and individualizing testing through item branching.
- An item bank is a large and easily accessible collection of test questions.
- Computerized adaptive testing (CAT) is interactive wherein items presented are based in part on the test-taker's performance on previous items.
- More difficult items should not be given until the person gets two consecutive items right.
- The test should stop if the person gets five consecutive items wrong (see the sketch below).
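A minimal sketch of the branching rules just described: step up in difficulty after two consecutive correct answers, stop after five consecutive wrong ones. The item bank contents and the `ask()` stand-in are hypothetical, and a real CAT engine would also estimate ability from the responses:

```python
# Minimal sketch of CAT item branching under the two rules above.
import random

item_bank = {  # items grouped by difficulty level (invented)
    1: ["easy_q1", "easy_q2", "easy_q3"],
    2: ["medium_q1", "medium_q2", "medium_q3"],
    3: ["hard_q1", "hard_q2", "hard_q3"],
}

def ask(item):
    """Stand-in for presenting an item; returns True if answered correctly."""
    return random.random() < 0.6

def run_adaptive_test(max_items=15):
    level, right_streak, wrong_streak, administered = 1, 0, 0, 0
    while administered < max_items:
        item = random.choice(item_bank[level])
        administered += 1
        if ask(item):
            right_streak, wrong_streak = right_streak + 1, 0
            if right_streak >= 2 and level < max(item_bank):
                level += 1          # two consecutive right: harder items
                right_streak = 0
        else:
            wrong_streak, right_streak = wrong_streak + 1, 0
            if wrong_streak >= 5:   # five consecutive wrong: stop the test
                break
    return level

print(run_adaptive_test())
```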
Floor Effect
- This occurs when test takers score very low such as a group of students failing all items.
- Occurs if you administer a test containing items appropriate for the average 9th-grade student to students who are younger.
- In computerized adaptive testing, a pool of easier items can be presented to distinguish among the low scorers.
Ceiling Effect
- Occurs when a set of students passes all items, and is addressed by providing more difficult questions if available in the item pool.
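Floor and ceiling effects can be spotted by checking how much of the score distribution piles up at the minimum or maximum. A minimal sketch; the 20% flag threshold is an arbitrary choice for illustration, not a standard cutoff:

```python
# Minimal sketch of flagging floor and ceiling effects from scores.

def check_floor_ceiling(scores, min_score, max_score, threshold=0.20):
    at_floor = sum(s == min_score for s in scores) / len(scores)
    at_ceiling = sum(s == max_score for s in scores) / len(scores)
    if at_floor >= threshold:
        return "floor effect: test may be too difficult"
    if at_ceiling >= threshold:
        return "ceiling effect: test may be too easy"
    return "no obvious floor or ceiling effect"

print(check_floor_ceiling([0, 0, 0, 1, 2, 3], min_score=0, max_score=10))
```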
Measurement of Personality
- Computerized adaptive testing can be used, starting with items measuring the test-taker's responses to issues like depression, then providing in-depth tests based on those responses.
Scoring Items: Cumulative Model
- The rule in a cumulatively scored test is that the higher the score on the test, the higher the test taker is on the ability, trait, or other characteristic the test measures (e.g., high, moderate, or low intelligence).
Category Scoring
- Class or category scoring earns credit toward placement in a category with similar responders.
- A diagnostic test that uses category scoring may require a certain number of symptoms to qualify for a specific diagnosis, such as depressed versus non-depressed.
Ipsative Scoring
- Compares a test taker's score on a scale within a test to another scale within that same test.
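A minimal sketch of the ipsative idea: each scale score is interpreted relative to the test taker's own other scale scores, not relative to other people. The scale names and numbers are invented:

```python
# Minimal sketch of ipsative scoring: rank the scales within one
# person's own profile. Scale names and scores are invented.

scale_scores = {"aggressiveness": 18, "sociability": 25, "dominance": 12}

profile = sorted(scale_scores, key=scale_scores.get, reverse=True)
print(profile)  # e.g., ['sociability', 'aggressiveness', 'dominance']
```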
Test Tryout
- The test developer tries out the test by administering it to people similar to those for whom the test is designed.
- For example, a test intended to select corporate executives should be tried out on corporate executives.
- At minimum, administer the test to 5 to 10 people per test item to reduce the chance factor.
- The time, instructions, and atmosphere should be similar to those of the final test.
Item Analysis
- Includes the item-difficulty index, item-reliability index, item-validity index, item-discrimination index, and the item characteristic curve (ICC).
The Item-Difficulty Index
- Obtained by calculating the proportion of test-takers who answered the item correctly.
- Ranges between 0.0 and 1.0, with a higher value indicating a greater proportion of examinees responded correctly, indicating an easier item.
- An item having a difficulty index of .75 is easy, while .30 is difficult.
- A "giveaway" item may be included near the beginning of an achievement test to spur motivation, a positive test-taking attitude, and to lessen test-related anxiety.
The Item-Discrimination Index
- Measures how well an item distinguishes between those who are knowledgeable and those who are not.
- Examines the relationship between performance on a given item and the overall test score, so that examinees who responded correctly to the item also did well on the test as a whole.
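One common way to compute a discrimination index d (the notes do not spell out a formula) is the upper/lower-group method, often using the top and bottom 27% of scorers: d = (U - L) / n, where U and L are the numbers of correct responses to the item in the upper and lower groups and n is the size of each group. A minimal sketch with invented data:

```python
# Minimal sketch of the item-discrimination index d via the classic
# upper/lower-group method. Data are invented.

def discrimination_index(item_correct, total_scores, group_frac=0.27):
    """item_correct: 1/0 per examinee for one item;
    total_scores: each examinee's total test score."""
    n = max(1, int(len(total_scores) * group_frac))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = order[:n], order[-n:]
    u = sum(item_correct[i] for i in upper)
    l = sum(item_correct[i] for i in lower)
    return (u - l) / n  # ranges from -1 to +1; higher discriminates better

item = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
totals = [9, 8, 8, 3, 2, 7, 4, 9, 6, 1]
print(discrimination_index(item, totals))
```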
Item Reliability Index
- Provides an indication of the internal consistency of a test.
- The ideal item measures aspects which the test is designed to measure, and items that do not are eliminated.
The Item-Validity Index
- A statistic that indicates the degree to which an item measures what the test purports to measure.
- The higher the item-validity index, the greater the test's criterion-related validity.
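In the standard textbook formulation, both of the indexes above are computed as the item's standard deviation multiplied by a correlation: the item-total correlation for the reliability index, the item-criterion correlation for the validity index. A minimal sketch assuming that formulation, with invented data:

```python
# Minimal sketch, assuming the common textbook formulation:
#   item-reliability index = s_item * r(item, total test score)
#   item-validity index    = s_item * r(item, criterion score)
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

item = [1, 0, 1, 1, 0, 1, 0, 1]               # invented 1/0 item responses
total = [9, 4, 8, 7, 3, 9, 5, 6]              # invented total test scores
criterion = [80, 55, 75, 70, 50, 85, 60, 65]  # invented criterion scores

s_item = statistics.pstdev(item)
print("item-reliability index:", s_item * pearson_r(item, total))
print("item-validity index:", s_item * pearson_r(item, criterion))
```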
Item Characteristic Curve
- Provides a graphic representation of item discrimination and difficulty.
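One common way to draw an ICC (the notes do not commit to a particular model) is the two-parameter logistic model from item response theory, where a discrimination parameter sets the slope and a difficulty parameter sets the location. A minimal sketch:

```python
# Minimal sketch of an item characteristic curve under the 2PL model,
# one common choice; assumed here for illustration.
import math

def icc(theta, a, b):
    """Probability of a correct response at ability level theta.
    a = discrimination (slope), b = difficulty (location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Tabulate the curve for a moderately discriminating item of average difficulty.
for theta in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"theta={theta:+d}  P(correct)={icc(theta, a=1.2, b=0.0):.2f}")
```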
Other Considerations in Item Analysis: Guessing
- Ability to rule out one or more of the distractor alternatives represents informed guessing.
- Test takers should be instructed to answer only when certain, or to complete all items and guess when in doubt.
- There is no single solution to date for the problem of guessing and its effect on test scores.
- Instructions to test administrators may state that guessing is not allowed.
- Provide specific instructions for scoring and interpreting omitted items.
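One classic partial remedy, consistent with the note above that no single solution exists, is the correction for guessing: subtract a fraction of the wrong answers from the number right. A minimal sketch:

```python
# Minimal sketch of the classic correction for guessing:
#   corrected score = R - W / (k - 1)
# where R = number right, W = number wrong (omits excluded), and
# k = number of options per multiple-choice item.

def corrected_score(num_right, num_wrong, options_per_item):
    return num_right - num_wrong / (options_per_item - 1)

# 40 right, 12 wrong, 8 omitted on a 4-option multiple-choice test:
print(corrected_score(40, 12, 4))  # 36.0
```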
Item Fairness
- A biased test item favors one group over another when group ability differences are held constant.
- These items are identified by comparing item characteristic curves across groups.
- Items are not fair if total test scores between groups are the same but the ICC shapes are different.
Qualitative Item Analysis
- Uses a variety of non-statistical, verbal procedures to gauge the viewpoint of test takers so that improvements can be made based on their feedback.
- Interviews and group discussions are used to gather information.
Qualitative Item Analysis Questions
- Test length, gauging feelings about length of time to complete and number of items.
- Test language, and determining if any aspects of the test were difficult to understand.
- Feedback gathered this way may be colored by a test taker's desire to lash out at the instructor or institution after a bad testing experience.
Think Aloud Test Administration
- "Think aloud” test administration is a qualitative research tool designed to shed light on the test-taker's thought processes during the test.
- Individuals verbalize their thoughts as they occur during one-to-one test administration.
- Yields valuable insights into how items on achievement or personality tests may be misinterpreted.
Expert Panels and Sensitivity Reviews
- A sensitivity review is a study of test items, typically conducted during the test development process.
- Evaluates items for fairness to all prospective test-takers and for offensive language, stereotypes, or situations.
Test Revision
- A stage in new test development: items with many weaknesses become prime candidates for deletion or revision, and replacement items may be drawn from the item pool.
- Test revision can occur if stimulus materials look dated and test takers cannot relate to them.
- The verbal content of items and instructions may contain dated vocabulary that needs updating.
- Cultural shifts can give words new meanings that may now offend.
- Revision could improve reliability and validity, while new theory can change test items.
Cross-Validation and Co-Validation
- Cross-validation refers to the revalidation of a test on a sample of test-takers other than those on whom test performance was originally found to be a valid predictor of some criterion.
- Co-validation is the test validation process conducted on two or more tests measuring the same aspect using the same sample of test-takers.