Test Development: Conceptualization & Preliminary Qs


Questions and Answers

Which of the following is the first stage in the process of developing a test?

  • Test Conceptualization (correct)
  • Test Revision
  • Test Construction
  • Item Analysis

Test tryout occurs before test construction in the test development process.

False

What is the term for the preliminary research surrounding the creation of a test?

Pilot Work

The process of setting rules for assigning numbers in measurement is known as ______.

Scaling

Match each type of scale with its description:

Age-based scale = Performance is related to the age of the test taker.
Grade-based scale = Performance is related to the grade of the test taker.
Unidimensional scale = Measures only one aspect or trait.
Multidimensional scale = Measures several different aspects.

Which type of scaling involves summing the ratings across all items to obtain a final score?

Summative Scaling

In a method of paired comparisons, test takers are presented with only single stimuli to evaluate.

False

What does comparative scaling entail?

Comparative scaling is the judgment of a stimulus in comparison with every other stimulus on the scale.

Sorting cards into piles representing "never justified," "sometimes justified," and "always justified" is an example of ______ scaling.

Categorical

Which of the following is a key feature of a Guttman scale?

Agreement with an item implies agreement with lower-order items.

In test construction, the number of items written initially should always equal the number of items intended for the final version of the test.

False

What are the three elements of a multiple-choice item?

A stem, a correct alternative/option, and distractors

An item format that requires test-takers to create or supply the correct answer is known as a ______-response format.

constructed

What is the term for a multiple-choice item that contains only two possible responses?

Binary Choice Item

Item branching is a technique used in paper-based tests to individualize the testing experience.

False

What is the term for a relatively large and easily accessible collection of test questions for computer administration?

Item bank

The term ______ refers to an interactive, computer-administered test-taking process wherein items presented to the test-taker are based in part on the test-taker’s performance on previous items.

Computerized adaptive testing

In computerized adaptive testing (CAT), what is a general rule regarding difficulty?

Give more difficult items after two consecutive correct responses.

The floor effect occurs when a test is too difficult, leading most test takers to score very high.

False

In testing, what general situation does the term "ceiling effect" describe?

The ceiling effect describes the situation where most test takers score high or pass all items.

In a cumulatively scored test, the rule typically is that the higher the score on the test, the ______ the testtaker is on the ability, trait, or other characteristic that the test purports to measure.

higher

What is a typical objective in ipsative scoring?

Comparing a test taker's score on one scale within a test to another scale within that same test

During test tryout, it is important to administer the test in conditions that are dissimilar to the final test administration.

False

Name four indices that are gathered during item analysis.

Item difficulty index, Item reliability index, Item validity index, and Item discrimination index

The item-difficulty index is obtained by calculating the ______ of the total number of test-takers who answered the item correctly.

proportion

An item with a difficulty index of 0.75 means that the item is considered:

Easy

The item-discrimination index measures the reliability of each question.

False

What is the purpose of the item-discrimination index?

To measure how well an item distinguishes between examinees who are knowledgeable and those who are not.

The item-validity index provides an indication of the degree to which a test is measuring what it ______ to measure.

purports

What is the possible impact of guessing on examinees' scores and on test scores in general?

Risk-taking behavior impacts responses and test scores.

A biased test item is fair if it favors one particular group when differences in group ability are controlled.

False

In qualitative item analysis, what general type of procedure is used?

Verbal procedures

A sensitivity review during test development examines items for fairness and the presence of offensive language, ______, or situations.

stereotypes

Which of the following situations is a reason to revise a test?

The stimulus materials look dated and current test-takers cannot relate to them.

Cross-validation involves revalidating a test on the same sample as the original validation.

False

What is co-validation?

A test validation process conducted on two or more tests that measure the same aspect, using the same sample of test-takers.

Match each term related to test development with its correct definition:

Item Pool = The reservoir from which items will be drawn for the final version of the test.
Selected-Response Format = A format where the test-taker selects a response from a set of alternatives.
Item Branching = A technique to individualize tests through digital media.
Item Fairness = An item that is unbiased and measures as it should in relation to group ability and other factors.

During the test tryout phase, it is important that the test administration atmosphere, instructions, and ______ are similar to the final test.

time

Which type of scaling would involve asking test-takers to rank items from most to least justifiable?

Comparative Scaling

Qualitative item analysis only relies on statistical data to identify areas for improvement.

False

What is the function of a sensitivity review conducted by expert panels?

To minimize bias and ensure the test is fair overall.

Match each stage of the test development process with its activities:

Test Conceptualization = Reviewing the available literature on existing tests.
Test construction = Piloting the test to a sample of test-takers.
Item analysis = Calculating the item difficulty index.
Test revision = Revising test items based on item analysis results.

Flashcards

Test Conceptualization

The initial phase involving brainstorming, literature review, and identifying emerging social phenomena that could be stimuli for a new test.

Pilot Work

Preliminary research to refine test items and procedures before full-scale test construction.

Scaling

The process of setting rules for assigning numbers in measurement.

Age Based Scale

A scale where test-takers are evaluated relative to their age.

Grade Based Scale

A scale where test-takers are evaluated relative to their grade level.

Unidimensional Scale

Assesses only one characteristic or trait.

Multidimensional Scale

Assesses several different characteristics or aspects.

Comparative Scaling

Judges stimuli in relation to every other stimulus on the scale.

Categorical Scaling

Stimuli are placed into predefined categories based on quantitative differences.

Guttman Scale

A scale where items are ordered so agreement implies agreement with lower-order items.

Item Bank

A collection of test questions available for use

Item Format

The format of individual test items

Constructed-Response Format

Requires test-takers to supply or create the correct answer.

Selected-Response Format

Select a response from a set of possible responses.

Binary Choice Item

A selected-response item with only two possible answers.

Completion Item

A type of question that requires one or more words to complete a sentence.

Item Branching

Adjusting test difficulty based on test-taker performance.

Computerized Adaptive Testing (CAT)

An interactive, computer-administered test where the items presented are based on the test-taker's previous responses.

Ceiling Effect

Occurs when a test is too easy; most test takers score very high, making it difficult to distinguish among them.

Floor Effect

Occurs with a test that is too hard; most scores are very low.

Ipsative Scoring

Comparing a testtaker's score on one scale within a test relative to their score on another scale within that test.

Test Tryout

Trying out the test on a sample of test-takers.

Item-Difficulty Index

The proportion of test-takers who answered an item correctly.

Item-Discrimination Index

A measure of how well an item differentiates between knowledgeable and less knowledgeable examinees.

Item Reliability Index

Indicates the internal consistency of a test.

Item-Validity Index

Indicates the degree to which a test measures what it is supposed to measure.

Item Characteristic Curve

A graphical representation of item discrimination and difficulty.

Qualitative Item Analysis

Qualitative procedures to gather feedback from test-takers to improve a test.

Sensitivity Review

A study of test items for fairness and potentially offensive content.

Test Revision

Revising items based on data collected.

Cross-Validation

Revalidation of a test on a different sample of test-takers.

Co-Validation

Test validation process conducted on two or more tests using the same sample of test-takers.

Biased Test Item

An item that favors one group of examinees over another when group ability is controlled.

Study Notes

  • Development of a test happens in five stages: test conceptualization, test construction, test tryout, item analysis, and test revision.

Test Conceptualization

  • Self-talk helps you conceptualize a test.
  • Review available literature on existing tests.
  • Emerging social behavior patterns, such as FOMO or JOMO, can inspire new test development.

Preliminary questions

  • What is the test designed to measure?
  • What is the objective of the test?
  • Is there a need for this test?
  • Who will use this test?
  • Who will take this test?
  • What content will the test cover?
  • How will the test be administered?
  • What is the ideal format of the test?
  • What special training will be required of administrators or interpreters?
  • What types of responses will be required of test takers?
  • Who benefits from the administration of this test?
  • Is there potential for harm due to test administration?
  • How will test results be interpreted meaningfully?

Pilot Work

  • Preliminary research surrounding the creation of a test.
  • Test items are evaluated to determine if they should be in the final version.
  • The test developer runs a trial of the test.
  • Test construction begins after pilot work.

Test Construction

  • Scaling methods are used to assign numbers in measurement.
  • L. L. Thurstone developed scaling methods.

Types of Scales

  • Age-based scales relate performance to the age of the test taker.
  • Grade-based scales relate performance to the grade level of the test taker.
  • Unidimensional scales measure only one trait.
  • Multidimensional scales measure several different aspects.
  • Comparative scales can be contrasted with categorical scales.

Scaling Methods

  • Rating scales include words, statements or symbols showing strength of a trait, attitude or emotion.
  • Summative scales involve summing the ratings across all test items to obtain a final score (a scoring sketch follows this list).
  • Likert scales are a type of summative scale.
  • Likert scales present test takers with five or seven alternative responses, typically on an agree-disagree continuum.
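
A minimal scoring sketch in Python; the item ratings below are hypothetical and only illustrate how summative (Likert-type) scaling adds ratings across items to yield the final score.

```python
# Minimal sketch of summative (Likert-type) scoring; the ratings are hypothetical.
# Each item is assumed to be rated on a 1-5 agree-disagree continuum.
ratings = [4, 5, 3, 2, 5, 4, 4]          # one test taker's responses to 7 Likert items

total_score = sum(ratings)               # summative scaling: final score = sum of ratings
print(f"Summative score: {total_score} (possible range {len(ratings)}-{5 * len(ratings)})")
```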

Method of Paired Comparisons

  • Test takers compare pairs of stimuli and select based on a rule, like agreeing more with a statement or finding a stimulus more appealing.
  • Example: Deciding which behavior is more justified, such as cheating on taxes versus accepting a bribe.
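
One way the resulting judgments might be tallied; this is a sketch with hypothetical data, not a procedure from the lesson. Each stimulus earns a point every time it is selected over the stimulus it was paired with.

```python
# Sketch of tallying method-of-paired-comparisons judgments (hypothetical data):
# a stimulus earns one point each time it is selected over its paired alternative.
from collections import Counter

# Each tuple: (pair presented, stimulus the test taker judged "more justified").
judgments = [
    (("cheating on taxes", "accepting a bribe"), "cheating on taxes"),
    (("accepting a bribe", "running a red light"), "running a red light"),
    (("cheating on taxes", "running a red light"), "cheating on taxes"),
]

scores = Counter(choice for _pair, choice in judgments)
for stimulus, points in scores.most_common():
    print(stimulus, points)
```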

Comparative Scaling

  • Comparative scaling means judging a stimulus in comparison with every other stimulus on the scale.
  • Test takers rank cards from most to least justifiable.
  • Comparative scaling can involve, for example, ranking 30 items from 1 to 30 on paper.

Categorical Scaling

  • Stimuli are placed into two or more alternative categories differing quantitatively.
  • Cards can be sorted into piles depending on whether behavior is never, sometimes, or always justified.

Guttman Scale

  • Items are organized so that agreeing with a particular item implies agreement with lower-ranked items.
  • In other words, agreement with any item assumes agreement with all lower-order items (a consistency check is sketched below).
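
A minimal sketch of what this cumulative property means in practice, assuming items are listed from lowest- to highest-order and scored 1 for agreement:

```python
# Checks whether a response pattern is consistent with a Guttman (cumulative) scale.
# Items are assumed ordered from lowest- to highest-order; a consistent pattern is a
# run of agreements (1s) followed only by disagreements (0s), e.g. 1,1,1,0,0.
def is_guttman_consistent(responses):
    seen_disagreement = False
    for agreed in responses:
        if agreed and seen_disagreement:
            return False      # agreed with a higher-order item after rejecting a lower one
        if not agreed:
            seen_disagreement = True
    return True

print(is_guttman_consistent([1, 1, 1, 0, 0]))   # True: cumulative pattern
print(is_guttman_consistent([1, 0, 1, 0, 0]))   # False: violates the Guttman property
```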

Writing Items

  • Consider what range of content items cover.
  • Decide the different item formats used.
  • Determine how many items will be written in total for each content area.

Item Pool

  • Item pool is the reservoir of items for a test's final version.
  • A 30-item test should have an item pool of 60 items.
  • Test developers write items based on personal experience or subject matter knowledge.

Item Format

  • Item format refers to variables such as the form, plan, structure, and layout of individual test items.
  • Selected-response format items require test takers to select from alternative responses.
  • Constructed-response format items mean test-takers must supply or create the answer.

Types of Selected-Response Item Formats

  • An item in a multiple-choice format includes a stem (problem statement), a correct option, and distractors or foils.
  • In matching items, test takers match premises and responses presented in two columns.
  • Binary-choice items are multiple-choice items with only two options.
  • Completion items require examinees to complete a sentence, as in fill-in-the-blank or short-answer questions.

Writing Items for Computer Administration

  • Item banks store items, allowing individualized testing via item branching.
  • Instructors create item banks of questions for examinations.

Writing Items and Computerized Adaptive Testing

  • Computerized Adaptive Testing (CAT) is interactive, using test takers' performance on previous items to adaptively give subsequent items.
  • More difficult items are given when the person gets two consecutive items correct.
  • Stop a test if the person gets five consecutive items wrong.
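
A minimal sketch of those two branching rules; the item bank structure, difficulty levels, and simulated answers below are hypothetical and only illustrate the logic.

```python
# Sketch of the branching rules above: move to harder items after two consecutive
# correct responses; stop the test after five consecutive incorrect responses.
def run_cat(item_bank, answer_item):
    """item_bank: dict mapping difficulty level -> list of items (hypothetical structure).
    answer_item(item) -> True if the simulated test taker answers the item correctly."""
    level = 0                                # start at the easiest difficulty level
    streak_correct = streak_wrong = 0
    administered = []
    while level in item_bank and item_bank[level]:
        item = item_bank[level].pop(0)
        administered.append(item)
        if answer_item(item):
            streak_correct, streak_wrong = streak_correct + 1, 0
            if streak_correct == 2:          # two consecutive correct -> harder items
                level, streak_correct = level + 1, 0
        else:
            streak_wrong, streak_correct = streak_wrong + 1, 0
            if streak_wrong == 5:            # five consecutive wrong -> stop the test
                break
    return administered

bank = {0: ["e1", "e2", "e3"], 1: ["m1", "m2"], 2: ["h1", "h2"]}
print(run_cat(bank, answer_item=lambda item: not item.startswith("h")))
```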

Floor Effect

  • Occurs when a test is too difficult, causing students to fail all items.
  • This makes it difficult to distinguish among test takers.
  • Computer-based adaptive tests can branch to easier items to better distinguish performance among low-scoring test takers.

Ceiling Effect

  • Occurs when a test is too easy and students pass all items.
  • This prevents distinguishing among them.
  • Adaptive tests respond by presenting more difficult items.
  • CAT can also be used in personality measurement, for example by presenting items that measure depression in greater depth.

Scoring Items

  • The cumulative model suggests that higher test scores indicate higher ability or trait levels.
  • Category scoring means test taker responses place them in a class/category, like depressed or non-depressed.
  • This approach is used in diagnostic systems.
  • Ipsative scoring compares a test taker's scores on different scales within the same test (see the sketch below).
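
A brief sketch contrasting the cumulative and ipsative models; all scores are hypothetical.

```python
# Cumulative model: the higher the total score, the more of the measured ability or trait.
item_scores = [1, 0, 1, 1, 1, 0, 1]              # hypothetical 0/1 item credit
print("Cumulative score:", sum(item_scores))

# Ipsative scoring: compare a test taker's score on one scale within the test
# with that same test taker's score on another scale within the same test.
scale_scores = {"achievement": 32, "affiliation": 21, "autonomy": 27}   # hypothetical scales
strongest = max(scale_scores, key=scale_scores.get)
print("Relative to their own profile, this test taker is highest on:", strongest)
```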

Test Tryout

  • The test developer tries the test out.
  • Administer the test to people for whom it is designed.
  • A common rule of thumb is a minimum of 5 test takers per item; the more the better, since larger samples reduce chance factors.
  • Time, instructions, and atmosphere should mirror the final test.

Item Analysis

  • Item analysis indices include the difficulty index, reliability index, validity index, discrimination index, and ICC.

Item-Difficulty Index

  • It is calculated as the proportion of test takers who answered the item correctly (a short calculation is sketched below).
  • For example, if only 50 of 100 test takers got item 2 right, its difficulty index is 0.5.
  • The lower the index, the more difficult the item.
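
The calculation referenced above, written as a one-line function; the 50-of-100 figures come from the example, and everything else is illustrative.

```python
# Item-difficulty index = proportion of test takers who answered the item correctly.
def item_difficulty(num_correct, num_test_takers):
    return num_correct / num_test_takers

print(item_difficulty(50, 100))   # 0.5 for item 2 in the example; lower values = harder items
```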

Range of Item-Difficulty Index

  • It ranges from 0.0 to 1.0, with higher values indicating easier items.
  • An item with a difficulty index of .75 is considered easy.
  • An item with a difficulty index of .30 is considered difficult.
  • A giveaway item at the beginning of an achievement test can spur motivation, foster a positive attitude, and lower test anxiety.

Item-Discrimination Index

  • How well an item distinguishes between knowledgeable and unknowledgeable examinees.
  • It assesses the relationship between examinee performance on an item and their overall test score.
  • For highly discriminating items, examinees who answer them correctly tend to perform well on the test overall (a calculation is sketched below).
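
A minimal sketch using the extreme-groups formula d = (U − L) / n. This formula is a common textbook definition and is not stated in the notes above: U and L are the numbers of correct answers to the item in the upper- and lower-scoring groups, and n is the number of test takers in each group.

```python
# One common (extreme-groups) item-discrimination formula, assumed here:
# d = (U - L) / n, where U and L are the numbers of correct answers to the item in the
# upper- and lower-scoring groups, and n is the number of test takers in each group.
def item_discrimination(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# Hypothetical: 24 of 27 high scorers answered the item correctly, but only 9 of 27 low scorers.
print(round(item_discrimination(24, 9, 27), 2))   # 0.56 -> the item discriminates well
```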

Item Reliability Index

  • It provides an indication of the internal consistency of a test.
  • It helps ensure that all items are measuring what the test as a whole is designed to measure.
  • Items that measure something different are eliminated.

Item-Validity Index

  • It provides the degree to which a test measures what it is expected to measure.
  • A higher item-validity index means greater criterion-related validity.
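
A sketch of how these two indices are often computed. The formulas below (item-score standard deviation multiplied by the item-total or item-criterion correlation) are common textbook definitions rather than something stated in the notes above, and the data are hypothetical.

```python
# Commonly used definitions (assumed here, not stated in the notes above):
#   item-reliability index = item-score SD * correlation(item score, total test score)
#   item-validity index    = item-score SD * correlation(item score, criterion score)
import numpy as np

item = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])                    # hypothetical item scores
total = np.array([28, 15, 30, 26, 18, 27, 33, 20, 25, 31])         # hypothetical total scores
criterion = np.array([3.4, 2.1, 3.8, 3.0, 2.5, 3.2, 3.9, 2.2, 2.9, 3.6])  # hypothetical criterion

sd = item.std()                                        # item-score standard deviation
reliability_index = sd * np.corrcoef(item, total)[0, 1]
validity_index = sd * np.corrcoef(item, criterion)[0, 1]
print(round(reliability_index, 3), round(validity_index, 3))
```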

Item Characteristic Curve (ICC)

  • It graphically represents item discrimination and difficulty.
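
For illustration only: one common parametric form of an ICC is the two-parameter logistic model from item response theory; this model and the parameter values below are not part of the lesson. The location parameter b reflects item difficulty and the slope parameter a reflects discrimination.

```python
# Two-parameter logistic (2PL) ICC: P(theta) = 1 / (1 + exp(-a * (theta - b))).
# b shifts the curve along the ability axis (difficulty); a controls its steepness (discrimination).
import math

def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-2, -1, 0, 1, 2):                         # hypothetical ability levels
    print(theta, round(icc(theta, a=1.5, b=0.0), 2))    # steeper slope = better discrimination
```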

Other Considerations in Item Analysis

  • When guessing, test takers may be able to rule out some distractor alternatives.
  • Instructions should be explicit about how omitted responses will be handled.
  • Some instructions tell test takers to answer only when certain and not to guess.
  • Other instructions tell test takers to complete all items and to guess when in doubt.
  • Risk-taking behavior therefore impacts responses and test scores.
  • There is no set solution to the problem of guessing.
  • Any solution should include proper instructions for test administrators, state whether test takers are allowed to guess, and give specifics for scoring and interpreting omitted items.

Item Fairness

  • A biased test item favors one group of examinees when group ability differences are controlled.
  • Such items can be identified by comparing item characteristic curves (ICCs) across groups (a comparison is sketched below).
  • A difference in curve shape between groups (e.g., between men and women) when there is no difference in total test scores signals an unfair item.
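
A rough sketch of that comparison using empirical curves, with hypothetical records: for each total-score band, compute the proportion of each group that answered the item correctly; a large gap at matched score levels flags the item for bias review.

```python
# Comparing empirical item characteristic curves for two groups (hypothetical data):
# for each total-score band, the proportion of each group answering the item correctly.
from collections import defaultdict

# Each record: (group, total-score band, answered this item correctly?)
records = [
    ("A", "low", 0), ("A", "low", 1), ("A", "mid", 1), ("A", "mid", 1), ("A", "high", 1),
    ("B", "low", 0), ("B", "low", 0), ("B", "mid", 0), ("B", "mid", 1), ("B", "high", 1),
]

tallies = defaultdict(lambda: [0, 0])            # (group, band) -> [correct, total]
for group, band, correct in records:
    tallies[(group, band)][0] += correct
    tallies[(group, band)][1] += 1

for (group, band), (num_correct, n) in sorted(tallies.items()):
    print(group, band, round(num_correct / n, 2))   # gaps at the same band suggest possible bias
```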

Qualitative Item Analysis

  • Used to gather test takers' viewpoints.
  • Interviews and group discussions get this feedback.

Types of Questions

  • Questions may address issues such as test length and the clarity of the test language.
  • Such questionnaires may also give disgruntled test takers a means to lash out at the test.
  • "Think aloud" test administration is a research tool in which test takers verbalize their thoughts as they respond to items.

Qualitative Item Analysis: Expert Panels

  • A sensitivity review examines test items for fairness and for offensive language, stereotypes, or situations during test development.

Test Revision

  • Some items are removed for weaknesses.
  • Unreliable or biased items are removed.
  • Items are selected from an item pool.
  • The test will be revised.

Test Revision in the Life Cycle of an Existing Test

  • Stimulus materials look dated and current test takers cannot relate to them.
  • The verbal content of items or instructions contains outdated vocabulary that needs to be changed.
  • Changes in popular culture must be considered, since some material may have become potentially offensive.
  • Reliability or validity can be improved through revision.
  • The underlying theory has improved, which will change the test items.

Cross-Validation and Co-Validation

  • Cross-validation revalidates a test on a new sample of test takers.
  • Co-validation means validating two or more tests measuring the same thing using the same test takers.
