Questions and Answers
What is the primary purpose of the literature review in the test development process?
In scaling methods, which of the following is NOT a recognized scale?
What does item analysis primarily determine during the test development process?
What is the purpose of operational definitions in test development?
Which formula is used to compute the optimal level of item difficulty for a four-option multiple-choice item?
What is a significant factor to consider in item writing for a test?
What does the discrimination index in item analysis indicate?
Which part of the test development process involves revising items based on feedback?
What determines content validity in a test?
Which type of validity is assessed when a test score is compared to an outcome measured in the future?
What is face validity primarily concerned with?
What is the ideal range of difficulty levels for items selected for a test aimed at an extreme group?
Which method is used to test the internal consistency of individual items in a test?
In which category of validity is the degree to which a test relates to a theoretical construct evaluated?
What does a higher point biserial correlation signify for an item in a test?
Which formula represents content validity?
What is the main purpose of computing the item-reliability index for each item?
Which aspect of reliability assesses the consistency of scores when a test is re-administered?
What is the primary focus of criterion-related validity?
What does the item characteristic curve (ICC) illustrate?
Which scenario indicates the usefulness of an item based on its predictive validity?
Which of the following is NOT a method to evaluate reliability?
How can a developer identify ineffective test items?
What does a good item characteristic curve (ICC) typically have?
What is a primary drawback of the split-half reliability method?
How is coefficient alpha computed?
What does interscorer reliability primarily assess?
What is the primary purpose of establishing norms in testing?
Which of the following is an advantage of coefficient alpha?
What is the Kuder-Richardson formula commonly referred to as?
The primary error variance in item sampling arises due to what factor?
What is the potential disadvantage of using test-retest reliability?
What does an item-discrimination index measure in a test item?
In the formula for item-discrimination index, what does the variable 'N' represent?
What is the primary purpose of item analysis in testing?
What is cross-validation in the context of test evaluation?
What do we mean by 'validity' in test measurement?
What does validity shrinkage refer to within cross-validation research?
Why is feedback from examinees important in test development?
What must a newly developed test instrument fulfill?
Study Notes
Basic Psychometric Concepts
- Test Construction: Involves defining, scaling, item writing, item analysis, revising, and publishing the test.
- Item Analysis: Evaluates items for difficulty, reliability, and validity to retain, revise, or discard items.
- Reliability: Refers to the consistency of test scores across different occasions or forms. There are various types: internal consistency, test-retest, and inter-scorer reliability.
- Validity: Indicates how well a test measures what it claims to measure, categorized into content, criterion-related, and construct validity.
- Norms: Standardized scores derived from a reference sample that indicate an individual's performance in relation to a target population.
Process of Test Development
- Define the Test: Conduct a literature review for operational definitions, focusing on measurement methods and application contexts.
- Scaling and Item Writing: Develop a comprehensive table of contents, ensuring items represent relevant domains with clarity and simplicity.
- Item Analysis: Identify effective items based on difficulty, reliability, and discrimination index, using metrics like item difficulty (Pi).
- Revising the Test: Refine items based on item analysis results, and gather feedback for further improvement through cross-validation.
- Publish Test: Create detailed technical and user manuals outlining test administration and interpretation.
Literature Review and Definition
- Operational definitions provide clear meanings for constructs and ensure consistency in measurement and application.
- Interviews and focus groups help establish a common understanding of constructs and generate preliminary items.
Item Scaling Methods
- Common scaling methods include nominal, ordinal, interval, and ratio scales tailored to the measured traits.
Item Analysis Details
- Item Difficulty (Pi): Calculated as the proportion of examinees answering the item correctly, ranging from 0 to 1. Optimal difficulty = (g + 1.0)/2, where g is the chance success level; for a four-option multiple-choice item, g = 0.25, giving an optimal Pi of about 0.625.
- Item Reliability: Evaluated through point biserial correlation to assess internal consistency among test items.
- Item Validity Index: Assesses concurrent and predictive validity through point-biserial correlation with criterion scores.
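The difficulty and point-biserial statistics listed above can be sketched in a few lines of Python. This is a minimal illustration with made-up response data; the function names are our own, not from the study notes:

```python
import math

def item_difficulty(responses):
    """Item difficulty (Pi): proportion of examinees answering correctly."""
    return sum(responses) / len(responses)

def optimal_difficulty(n_options):
    """Optimal Pi for a multiple-choice item: (g + 1.0) / 2,
    where g is the chance success level (1 / number of options)."""
    g = 1.0 / n_options
    return (g + 1.0) / 2

def point_biserial(item_scores, total_scores):
    """Correlation between a dichotomous item (0/1) and total test score."""
    n = len(item_scores)
    p = sum(item_scores) / n          # proportion passing the item
    q = 1 - p
    mean_total = sum(total_scores) / n
    sd_total = math.sqrt(sum((x - mean_total) ** 2 for x in total_scores) / n)
    mean_pass = (sum(t for i, t in zip(item_scores, total_scores) if i == 1)
                 / sum(item_scores))  # mean total score of those who passed
    return (mean_pass - mean_total) / sd_total * math.sqrt(p / q)

# Illustrative data: 8 examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 1, 0, 0, 1]
totals = [9, 8, 8, 4, 7, 5, 3, 9]
print(item_difficulty(item))        # 0.625
print(optimal_difficulty(4))        # 0.625 for a four-option item
print(round(point_biserial(item, totals), 2))
```

A higher point-biserial indicates the item separates high scorers from low scorers, which is why items with low or negative values are candidates for revision or removal.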
Item Characteristics Curve (ICC)
- The ICC graphically displays the relationship between the probability of a correct response and the examinee's trait level, reflecting how well an item discriminates among test-takers.
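The study notes do not specify a functional form for the ICC, but one common choice in item response theory is the three-parameter logistic model. A hedged sketch (the parameter values below are arbitrary, chosen only to show the curve's shape):

```python
import math

def icc(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic ICC: probability of a correct response
    given trait level theta, discrimination a, difficulty b, and
    guessing parameter c (the lower asymptote)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# A steeper slope (larger a) means the item discriminates more sharply
# around its difficulty level b; c sets the floor from guessing.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, a=1.5, b=0.0, c=0.25), 2))
```

A "good" ICC in this sense rises steeply near the target trait level, so small trait differences produce large changes in the probability of a correct answer.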
Revising the Test
- Use data on the least productive items to improve the test through item revision, then confirm via cross-validation that the test's predictive power holds up in new samples (guarding against validity shrinkage).
Validity Types
- Content Validity: Judged by the ability of items to represent the construct adequately.
- Criterion-related Validity: Validates effectiveness in predicting outcomes via concurrent and predictive methods.
- Construct Validity: Ensures items align with theoretical constructs, accurately measuring intangible qualities.
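The quiz asks which formula represents content validity; the study notes do not name one, but a widely used index is Lawshe's content validity ratio (CVR), computed per item from expert panel ratings. A minimal sketch (the function name and example panel sizes are our own):

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2), where n_e is the number of
    panelists rating the item 'essential' and N is the panel size.
    Ranges from -1 (no one rates it essential) to +1 (everyone does)."""
    half = n_panelists / 2
    return (n_essential - half) / half

print(content_validity_ratio(8, 10))  # 0.6: 8 of 10 experts rate the item essential
print(content_validity_ratio(5, 10))  # 0.0: exactly half the panel
```

Items with low or negative CVR values fail to represent the construct adequately in the judges' view and are candidates for removal.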
Reliability Overview
- Reliability measures the consistency of scores across occasions, test forms, items, and scorers; error variance arises from sources such as item sampling (which particular items happen to appear on a given form).
Types of Internal Consistency Reliability
- Split-half Reliability: Involves correlating scores on two halves of a test; a drawback is that the raw half-test correlation underestimates full-length reliability (so it is stepped up with the Spearman-Brown formula) and the result depends on how the test is split.
- Coefficient Alpha (Cronbach's Alpha): Provides a mean estimate of all possible split-half coefficients for internal consistency.
- Kuder-Richardson (KR-20): A reliability estimate for tests with dichotomous (right/wrong) items; mathematically a special case of coefficient alpha.
- Interscorer Reliability: Correlates scores from different raters to verify scoring consistency.
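The internal-consistency estimates above can be sketched as follows. The data are illustrative (5 examinees × 4 dichotomous items); note that for dichotomous items, coefficient alpha as computed here coincides with KR-20:

```python
import math

def pearson(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half(half1_scores, half2_scores):
    """Half-test correlation stepped up to full length via Spearman-Brown."""
    r_half = pearson(half1_scores, half2_scores)
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(item_matrix):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance).
    item_matrix: one row per examinee, one column per item."""
    k = len(item_matrix[0])
    def var(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)
    item_vars = [var([row[j] for row in item_matrix]) for j in range(k)]
    total_var = var([sum(row) for row in item_matrix])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

scores = [[1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(cronbach_alpha(scores), 2))  # 0.8 for this toy data
```

Because alpha is the mean of all possible split-half coefficients, it avoids the split-half method's dependence on any one particular division of the items.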
Norms and Standardization
- Norm groups are representative samples useful for establishing score distributions.
- Norms are used to derive scores indicating an individual's performance relative to peers, presented in forms like percentile ranks or standard scores.
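Deriving norm-referenced scores from a reference sample can be sketched as below. The norm sample is hypothetical, and the mean-100/SD-15 convention is just one common standard-score metric (the deviation-IQ scale), used here for illustration:

```python
import math

def percentile_rank(score, norm_sample):
    """Percentage of the norm group scoring at or below the given score."""
    at_or_below = sum(1 for s in norm_sample if s <= score)
    return 100.0 * at_or_below / len(norm_sample)

def standard_score(score, norm_sample, mean=100, sd=15):
    """Convert a raw score to a standard score (default: mean 100, SD 15)
    using the norm sample's own mean and standard deviation."""
    n = len(norm_sample)
    m = sum(norm_sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in norm_sample) / n)
    z = (score - m) / s
    return mean + sd * z

norms = [42, 47, 50, 53, 55, 58, 60, 63, 67, 70]  # hypothetical norm group
print(percentile_rank(60, norms))   # 70.0: 7 of 10 norm scores are <= 60
print(round(standard_score(60, norms), 1))
```

Both transformations express the same idea from the notes: an individual's raw score only becomes interpretable relative to the distribution of a representative norm group.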
Description
Explore the foundational aspects of psychometrics in Unit 2, focusing on test construction. This quiz covers essential topics including test reliability and validity, item analysis, and the complete process of test development. Understand the critical steps involved from defining the test to publishing it with detailed manuals.