Test Development Overview

Questions and Answers

What is the acceptable range for the item difficulty index?

  • 0.20 - 0.40
  • 0.30 - 0.70 (correct)
  • 0.10 - 0.30
  • 0.50 - 0.90

What does a high item discrimination index indicate?

  • Items are equally difficult for all test-takers
  • Items are invalid for testing purposes
  • Items are too easy for all test-takers
  • Items effectively differentiate between high and low ability test-takers (correct)

What should be done if the standardized factor loading is below the threshold?

  • Keep the item as it is
  • Increase the number of test-takers
  • Revise the item to improve its validity
  • Remove the item from the test (correct)

What does an item-total/item-rest correlation indicate?

The relationship between a specific item and the total score excluding that item.

What is the meaning of a negative item discrimination index?

Test-takers with higher scores performed worse on the item.

    Study Notes

    Test Development Overview

    • Test development is an umbrella term encompassing all aspects of creating a test.
    • Test development involves conceptualization, construction, tryout, item analysis, and revision.

    Test Conceptualization

    • The process covers the early-stage thinking that shapes a test's design.
    • An emerging phenomenon or behavior pattern can inspire the test.
    • Pilot studies are essential to evaluate items for inclusion in the final test.

    Norm-Referenced Tests

    • These tests compare test takers' performance to a norm group (similar age/grade).
    • Scores indicate how someone performed relative to others.

    Criterion-Referenced Tests

    • These tests assess a test taker's performance against specific, predefined criteria.
    • Cut scores often determine whether a test taker has met the required standard.
    • Examples include licensing exams for professions and civil service examinations.

    Test Construction

    • This involves building and evaluating a test designed to measure a specified psychological function.
    • It combines writing test items, formatting, setting rules, and overall test design.
    • Scaling involves assigning numbers to reflect the attributes or traits being measured.
    • Scaling methods exist for various measurement types, e.g., rankings of experts, equal-appearing intervals, absolute scaling, Likert scales, and Guttman scales (a minimal Likert example appears after this list).
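
To make the idea of scaling concrete, here is a minimal sketch of Likert-type summated scaling in Python: each ordered response option is assigned a number, and item scores are summed into a total. The option labels, numeric values, and function names are illustrative assumptions, not part of the source material.

```python
# A minimal sketch of Likert-type scaling: each ordered response
# option is assigned a number, and item scores are summed into a
# total score. Labels and values here are illustrative.

LIKERT_VALUES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def likert_total(responses):
    """Sum the numeric values assigned to a respondent's answers."""
    return sum(LIKERT_VALUES[r] for r in responses)

print(likert_total(["agree", "neutral", "strongly agree"]))  # 12
```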

    Test Construction (Continued)

    • Item format includes variables such as form, structure, and the arrangement of items.
    • Test construction utilizes selected-response (multiple-choice, matching, true/false) and constructed-response (essay, short answer) formats.
    • Item pool is the source from which test questions are drawn.
    • Item banks are large collections of test questions.
    • Item branching dynamically adjusts the test based on test taker responses.
    • Computer adaptive tests (CATs) tailor item content and order to a test taker's previous answers (a simplified sketch follows this list).
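
The following toy sketch illustrates item branching in the spirit of a CAT: after each response, a running ability estimate moves up or down, and the next item administered is the unused one whose difficulty is closest to that estimate. Real CATs use item response theory for ability estimation; everything here (function names, the fixed step size) is a simplifying assumption.

```python
# A simplified sketch of item branching in a computer adaptive test:
# after each response the running ability estimate moves up or down,
# and the next item is the unused one whose difficulty is closest to
# that estimate. Real CATs use IRT-based estimation; this is only a
# toy illustration, and all names here are hypothetical.

def next_item(item_difficulties, used, ability):
    """Pick the unused item whose difficulty best matches ability."""
    candidates = [i for i in range(len(item_difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(item_difficulties[i] - ability))

def run_cat(item_difficulties, answer_fn, n_items=5, step=0.5):
    ability, used = 0.0, set()
    for _ in range(n_items):
        item = next_item(item_difficulties, used, ability)
        used.add(item)
        correct = answer_fn(item)          # administer the item
        ability += step if correct else -step
    return ability

# Example: simulate a test taker who answers items easier than 0.3 correctly.
bank = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
print(run_cat(bank, answer_fn=lambda i: bank[i] < 0.3))
```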

    Test Scoring Models

    • Cumulative scoring counts correct answers (or sums item scores) to reflect a construct.
    • Class/category scoring classifies individuals based on their response patterns.
    • Ipsative scoring compares a test taker's performance on one scale with their own performance on other scales within the same test (a sketch of the first two models follows this list).
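
As a rough illustration of the first two scoring models, the sketch below sums answers matching a key (cumulative scoring) and then classifies the total against a cut score (class/category scoring). The scoring key and cut score are hypothetical.

```python
# An illustrative sketch of two scoring models. Cumulative scoring
# sums correct answers; class/category scoring assigns a label based
# on the resulting total. The cut score used here is hypothetical.

def cumulative_score(responses, key):
    """Number of answers matching the scoring key."""
    return sum(r == k for r, k in zip(responses, key))

def category_score(total, cut_score=7):
    """Classify a test taker relative to a cut score."""
    return "meets standard" if total >= cut_score else "below standard"

total = cumulative_score(list("ABCDA"), list("ABCDB"))
print(total, category_score(total, cut_score=4))  # 4 meets standard
```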

    Writing Test Items

    • Clearly define the measured concept.
    • Create a diverse question pool.
    • Avoid excessively long items.
    • Maintain an appropriate difficulty level for the intended test takers.
    • Mix positively and negatively worded items, reverse-scoring the negative ones before totals are computed (a sketch follows this list).
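
One practical consequence of mixing item wording is that negatively worded items must be reverse-scored before totals are computed. A minimal sketch, assuming a 1-to-5 response scale and hypothetical item positions:

```python
# When positively and negatively worded items are mixed, negatively
# worded items are typically reverse-scored before totals are
# computed. A minimal sketch for a 5-point scale; the item indices
# and scale width are illustrative assumptions.

NEGATIVE_ITEMS = {2, 4}          # hypothetical indices of reversed items
SCALE_MAX = 5

def rescore(responses):
    """Reverse-score negatively worded items on a 1..SCALE_MAX scale."""
    return [SCALE_MAX + 1 - r if i in NEGATIVE_ITEMS else r
            for i, r in enumerate(responses)]

print(rescore([5, 4, 2, 3, 1]))  # [5, 4, 4, 3, 5]
```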

    Test Tryout

    • The test is administered to a sample representative of the intended target population.
    • A minimum of 20 participants per item is preferred.
    • A good test discriminates well between test takers of different ability levels.

    Item Analysis

    • Item analysis examines items from both reliability and validity angles.

    • Item Reliability: Measures an item's contribution to internal consistency; a strong correlation between the item and the total test score is desired.

    • Item Validity: Determines whether an item measures what it is intended to measure; the standardized factor loading is commonly used, and an indicator loading of at least .50 is often required, with items below the threshold removed.

    • Factor Analysis: Constructs are not directly observable, so test takers' answers to the items serve as the indicators from which constructs are measured.

    • Item Difficulty: The proportion of respondents answering an item correctly; acceptable indices typically range from 0.30 to 0.70 (a computational sketch of difficulty and the item-rest correlation follows this list).
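
A computational sketch of two of these statistics, assuming a small illustrative matrix of dichotomously (0/1) scored responses with rows as test takers and columns as items: the item difficulty index is the proportion answering each item correctly, and the item-rest correlation relates each item to the total score excluding that item.

```python
# A minimal sketch of two item-analysis statistics on a 0/1 scored
# response matrix (rows = test takers, columns = items): the item
# difficulty index (proportion correct) and the item-rest correlation
# (item vs. total score excluding that item). Data are illustrative.

import numpy as np

scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])

difficulty = scores.mean(axis=0)            # proportion answering correctly
print("difficulty:", difficulty)            # target range ~0.30-0.70

totals = scores.sum(axis=1)
for j in range(scores.shape[1]):
    rest = totals - scores[:, j]            # total excluding item j
    r = np.corrcoef(scores[:, j], rest)[0, 1]
    print(f"item {j}: item-rest r = {r:.2f}")
```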

    Item Discrimination

    • Measures how effectively an item differentiates high scorers from low scorers.
    • A high item discrimination index means high scorers tend to answer the item correctly while low scorers tend to miss it; a negative index means higher scorers performed worse on the item.
    • A value above .30 is usually preferred (a computational sketch follows this list).
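
A common way to compute the discrimination index, sketched below on illustrative data, is the upper-lower method: split test takers by total score (27% tails are a frequent convention), then take the difference in proportion correct between the two groups. A negative result reproduces the situation described above, where higher scorers did worse on the item.

```python
# A sketch of the upper-lower discrimination index: split test takers
# into high and low groups by total score (27% tails are a common
# convention) and take the difference in proportion correct on one
# item. Values above ~.30 are usually preferred; a negative value
# means high scorers did worse on the item. Data are illustrative.

import numpy as np

def discrimination_index(item, totals, tail=0.27):
    """D = p(correct | upper group) - p(correct | lower group)."""
    n = max(1, int(round(len(totals) * tail)))
    order = np.argsort(totals)
    lower, upper = order[:n], order[-n:]
    return item[upper].mean() - item[lower].mean()

totals = np.array([9, 4, 7, 2, 8, 3, 6, 5])   # total test scores
item = np.array([1, 0, 1, 1, 1, 0, 1, 0])     # 0/1 scores on one item
print(round(discrimination_index(item, totals), 2))  # 0.5
```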

    Test Revision

    • A stage in test development in which an existing test is modified or a new edition is created.
    • This process applies both to newly developed tests and to established ones.
    • Cross-validation (also called rotation estimation or out-of-sample testing) revalidates a test on a sample of test takers other than those on whom the test was originally validated (a sketch follows this list).
    • Co-validation validates two or more tests using the same sample of test takers.
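
To show what out-of-sample checking can look like in practice, here is a minimal sketch, assuming randomly generated 0/1 response data: item statistics computed on a derivation sample are compared against a fresh holdout sample, since statistics typically shrink or shift when reapplied to a new group.

```python
# A minimal sketch of cross-validation in the out-of-sample sense:
# item statistics estimated on one sample are checked on a fresh
# holdout sample, since results typically shrink or shift when
# reapplied to a new group. The split and data are illustrative.

import numpy as np

rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(200, 10))     # 0/1 scored responses

half = len(scores) // 2
derivation, holdout = scores[:half], scores[half:]

p_dev = derivation.mean(axis=0)                 # difficulty on sample 1
p_new = holdout.mean(axis=0)                    # difficulty on sample 2
print("max shift in item difficulty:", np.abs(p_dev - p_new).max())
```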



    Description

    Explore the intricacies of test development, including the crucial steps of conceptualization, construction, and evaluation. Understand the differences between norm-referenced and criterion-referenced tests, as well as the significance of pilot studies in refining test items. This quiz aims to enhance your knowledge of psychological assessment methods.
