Chapter 4: Test Items and Administration

Questions and Answers

Which of the following is NOT a suggested guideline for writing effective test items?

  • Define clearly what you wish to measure to maintain focus.
  • Be aware of the reading level of those taking the scale and the reading level of the items.
  • Generate a small pool of items to ensure each one is highly refined. (correct)
  • Consider using questions that mix positive and negative wording.

Which of the following is an advantage of using the dichotomous format in psychological testing?

  • It discourages memorization and rote learning.
  • It is simple and often requires absolute judgment. (correct)
  • It allows for nuanced responses, capturing the complexity of human traits.
  • It accurately reflects the true complexity of most situations.

In a multiple-choice test, what are the incorrect answer options commonly referred to as?

  • Keys
  • Stems
  • Anchors
  • Distractors (correct)

What is a primary risk associated with using an 'unfocused stem' in a test question?

  • It requires test-takers to overly rely on the answer options to understand the question. (correct)

What is the MOST important consideration when deciding whether to guess on a multiple-choice question?

  • Whether there is a penalty for incorrect answers. (correct)

Which type of measurement is the Likert format particularly well-suited for?

  • Measuring attitudes and opinions on various topics. (correct)

What is a potential drawback of using a 10-point category scale to rate the abilities of a group?

  • People's ratings can be affected by a number of factors that threaten the validity of their responses. (correct)

What is the primary purpose of Q-sorts?

  • To allow an individual to describe themselves or another person by sorting adjectives into piles of increasing similarity. (correct)

Item analysis is used to evaluate test items. What does item difficulty specifically assess?

  • The percentage of test-takers who answered the item correctly. (correct)

In item analysis, what does a difficulty value (DV) of 0.80 indicate?

  • The item is too easy and should be revised or discarded. (correct)

What does it mean if an item has a negative discrimination index?

  • The item is answered correctly more often by low-scoring individuals than high-scoring individuals. (correct)

What is the theoretical range for the index of discrimination?

  • -1.0 to 1.0 (correct)

According to the criteria for selection and rejection of test items, which items should be selected?

  • Items with a positive discrimination index (correct)

What does an item characteristic curve graph?

  • Total test score on the x-axis and the proportion of respondents who got an item right on the y-axis. (correct)

What is a key advantage of item response theory over classical test theory?

  • IRT considers the difficulty of each item together with the ability of the test taker. (correct)

What is the purpose of linking uncommon measures in testing?

  • To create appropriate comparisons between tests that do not use the same items. (correct)

What initial step is required when creating items for criterion-referenced tests?

  • Specifying the objectives for the assessment and what the learning program attempts to achieve. (correct)

What is a primary limitation of item analysis?

  • It may emphasize ranking students over identifying weaknesses or gaps in knowledge. (correct)

How can the relationship between the examiner and test taker affect test scores?

  • Examiners' behavior, as well as their relationship to the test taker, can affect test scores. (correct)

What did research by Feldman and Sullivan (1960) find regarding rapport and test scores on the WISC?

  • Stronger rapport predicted higher scores on the WISC. (correct)

What consideration is related to the race of the tester when administering psychological tests?

  • The tester's race can influence test outcomes, though the impact is not definitively understood. (correct)

What is stereotype threat?

  • The risk of confirming negative stereotypes about one's group, leading to decreased performance. (correct)

According to the information, what BEST describes how stereotype threat can negatively impact performance?

  • By depleting working memory and causing physiological arousal. (correct)

In the context of psychological testing, why is the language of the test taker important?

  • Because language proficiency can impact test performance and the understanding of questions. (correct)

What does telling the test taker the test is nondiagnostic do?

  • It is one intervention studied for reducing stereotype threat. (correct)

What do tests require?

  • Standardized administration. (correct)

Data can be impacted by what an experimenter expects to find. What is this called?

  • Rosenthal effects (correct)

If your teacher looks at your answers during a test and smiles and nods, what impact do you think this has?

  • Giving a reinforcing response may violate standardized administration protocols. (correct)

Which of the following is an advantage of Computer-Assisted Test Administration?

  • High standardization. (correct)

What are some problems one can find that influence test performance?

  • Educational setting versus psychiatric setting. (correct)

Motivation and anxiety can influence test performance. Which of the following is a component of test anxiety?

  • Worry. (correct)

Which of the following is the MOST accurate description of the 'extreme group method' of assessing discriminability?

  • It compares those who did well on the test to those who did poorly. (correct)

To increase the accuracy of tests and also reduce the volume of responses, what can be applied?

  • Item response theory. (correct)

Which of the following is a problem with using a dichotomous format?

  • Promotes memorization without understanding. (correct)

Which of the following is NOT a noted problem with using a negative stem?

  • The stem should include the information necessary to answer the question. (correct)

When can expectancies occur?

  • When specific purposes exist. (correct)

Which of the following is NOT considered one of the advantages of computer-assisted test administration?

  • Telephone surveys. (correct)

Which of the following is a component of the state of the subject influencing test performance?

  • Emotionality. (correct)

Statistical methods to create appropriate comparisons can be…

  • Effective. (correct)

Flashcards

Dichotomous Format

A format with two answer choices for each question.

Polytomous Format

A format with more than two options, like multiple-choice questions.

Distractors

Incorrect options in a multiple-choice question.

Likert Format

A rating scale that provides a continuum of responses for measuring attitudes.

Category Format

A rating scale with a greater number of choices than a Likert format.

Adjective Checklist

A list of adjectives where an individual selects terms that describe themselves or others.

Q-Sorts

A list of adjectives sorted into piles of increasing similarity to a target person.

Item Analysis

A general term for methods to evaluate test items.

Item Difficulty

The percentage of test-takers who answer an item correctly.

Discriminability

Determines if those who did well on the test also did well on a particular item.

Extreme Group Method

Compares performance of the upper and lower groups on an item.

Point Biserial Method

Correlation between item score and overall test score; measures item discrimination.

Item Characteristic Curve

A graph showing total test score on the x-axis and proportion of correct responses on the y-axis.

Item Response Theory (IRT)

Examines test quality by analyzing the likelihood of getting an item right or wrong.

Criterion-Referenced Test

A test where performance is compared to a well-defined learning criterion.

Examiner Relationship

The examiner's behavior and relationship with the test taker affects test scores.

Stereotype Threat

The risk of confirming negative stereotypes about one's group.

Expectancy Effects

Data influenced by what an experimenter expects to find.

Reinforcing Responses

Teacher's reactions influence future test performance.

Computer-Assisted Testing

Administering tests using computers.

Subject Variables

The subject's state influencing test performance.

Study Notes

Chapter 4: Writing and Evaluating Test Items, and Test Administration

  • This chapter covers writing and evaluating test items, and administering tests.
  • After completing the chapter, the reader should be able to describe the two main item formats and understand whether or not to guess on multiple-choice questions.

Item Writing - Guidelines and Diversity

  • Six guidelines for writing test items (DeVellis, 2016) include defining clearly what you wish to measure, generating a large pool of items, and avoiding exceptionally long items.
  • In addition, it is important to be aware of participants' reading level, avoid items that convey two or more ideas, and consider mixing positively and negatively worded questions.
  • It is also important to be aware of diversity issues.

Item Formats

  • The Dichotomous Format offers two choices for each question, such as yes/no or true/false.
  • Examples of the Dichotomous Format can be found on educational tests, as well as personality tests.
  • Advantages of the Dichotomous Format are its simplicity and the fact that it often requires absolute judgment.
  • Disadvantages of the Dichotomous Format include that many situations are not truly dichotomous, and that it can promote memorization rather than understanding.
  • The Polytomous Format is similar to the Dichotomous Format but has more than two options.
  • The most common example of the Polytomous Format is the multiple-choice test, where there is one right answer and several wrong answers.
  • Incorrect answers in the Polytomous Format are referred to as distractors.

Problems in items include

  • Unfocused stems that do not include the information necessary to answer the question.
  • Negative stems that include negative terms like "not" or "except".
  • Window dressing: stems that contain information irrelevant to the question or concept being tested.
  • Unequal option length: the correct answer and the distractors differ noticeably in length.
  • Negative options: answer options should avoid negatives such as "not".
  • Clues: wording that gives away the answer; avoid vague terms like "may", "can", and "might".
  • Heterogeneous options: the correct option and the distractors are not in the same general category.

Guessing

  • On a test with a limited number of items and options, some questions can be answered correctly without knowledge.
  • There is a formula that corrects scores for guessing (a minimal sketch follows this list).
  • Whether guessing is advantageous depends on whether incorrect answers are penalized or simply earn no credit.
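
The notes mention a correction formula without spelling it out; the standard correction for guessing subtracts the expected number of lucky guesses. A minimal sketch in Python, assuming every wrong answer was a blind guess among the item's options (the example numbers are made up):

```python
def corrected_score(num_right, num_wrong, num_options):
    """Correction for guessing: corrected = R - W / (k - 1).

    Assumes each wrong answer was a blind guess among k options, so the
    expected number of lucky correct guesses is subtracted back out.
    """
    return num_right - num_wrong / (num_options - 1)

# Example: 40 right and 12 wrong on 4-option items -> 40 - 12/3 = 36
print(corrected_score(40, 12, 4))
```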

Item scaling

  • Likert Format is pronounced "Lick-ert," not “Like-ert."
  • The Likert Format offers a continuum of responses for measuring attitudes on topics.
  • Likert scales lend themselves to factor analysis, so related items can be grouped together and identified (a minimal scoring sketch follows this list).
  • The category format is similar to the Likert format but offers a greater number of choices (for example, a 10-point scale).
  • Visual analogue scales are also used.
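
To illustrate how Likert responses are typically scored before summing or factor analysis, here is a minimal sketch; the five items, the 1-5 coding, and the reverse-keying of one item are hypothetical:

```python
# Hypothetical 5-item attitude scale, responses coded 1 (strongly disagree) to 5 (strongly agree)
responses = [4, 5, 2, 4, 3]
reverse_keyed = [False, False, True, False, False]  # item 3 is negatively worded

# Reverse-score negatively worded items so a higher score always means a more favorable attitude
scored = [(6 - r) if rev else r for r, rev in zip(responses, reverse_keyed)]
total = sum(scored)
print(scored, total)
```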

Checklists and Q-Sorts

  • Adjective checklists are lists of terms from which an individual selects those most characteristic of themselves.
  • Q-Sorts are lists of adjectives sorted into nine piles of increasing similarity to a target person.
  • Checklists have fallen out of favor and forced choice and Likert formats are more popular.
  • An important piece of advice: avoid "all of the above" and "none of the above" options.

Item Analysis and Difficulty

  • Item analysis refers to a general set of methods used to evaluate test items.
  • Item difficulty is an important evaluation component: it asks what percentage of test takers chose the correct answer.
  • The number of answer options is one factor that determines a reasonable difficulty level.
  • A difficulty of .30 to .70 is optimal for differentiating between individuals.
  • Difficulty value (DV) = number of students with the correct answer / total number of students (a minimal computation follows this list).
  • A low difficulty value means an item is hard, while a high difficulty value means an item is easy.
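
A minimal computation of the difficulty value, assuming responses are coded 1 for correct and 0 for incorrect (the data are made up):

```python
def difficulty_value(item_responses):
    """DV = number of correct answers / total number of test takers."""
    return sum(item_responses) / len(item_responses)

# 8 of 10 test takers answered correctly -> DV = 0.80 (an easy item);
# values between roughly .30 and .70 differentiate test takers best.
print(difficulty_value([1, 1, 1, 1, 1, 1, 1, 1, 0, 0]))
```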

Discriminability

  • Determines if people who have done well on a particular item have also done well on the entire test.
  • Types of discrimination methods include the Extreme Group and Point Biserial methods.
  • Items with a positive discrimination index should be selected; items with a zero or negative discrimination index should be rejected.
  • At the end of the item analysis report, test items are listed according to their degrees of difficulty and discrimination.
  • The discrimination index is a measurement that helps determine how well a test item separates high scorers from low scorers.
  • Discrimination index = (number of students with the correct answer in the upper group / total number of students in the upper group) - (number of students with the correct answer in the lower group / total number of students in the lower group); a worked example follows this list.
  • Theoretically, the index of discrimination ranges from -1.0 to 1.0.
  • A discrimination index can be zero (no discrimination), positive, or negative.
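
A worked example of the extreme group formula above; the upper- and lower-group counts are made up:

```python
def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    """Extreme group method: D = p(correct | upper group) - p(correct | lower group)."""
    return upper_correct / upper_total - lower_correct / lower_total

# 18 of 20 high scorers vs. 6 of 20 low scorers got the item right -> D = 0.60
print(discrimination_index(18, 20, 6, 20))   # positive: select the item

# 5 of 20 high scorers vs. 12 of 20 low scorers -> D = -0.35
print(discrimination_index(5, 20, 12, 20))   # negative: reject or revise the item
```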

Item Characteristic Curves and Item Response Theory

  • The item characteristic curve graphs the total test score on the x-axis against the proportion of respondents who answered the item correctly on the y-axis.
  • Scores are often grouped into class intervals (categories) rather than plotted as individual data points.
  • The shape of the curve can indicate whether an item is weak or strong.
  • Item response theory (IRT) assesses test quality by modeling the probability of getting each item right or wrong in relation to the test taker's ability.
  • An advantage of IRT over classical test theory: it considers each item's difficulty together with the test taker's ability, which can increase accuracy while reducing the number of items administered (a minimal curve sketch follows this list).
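
The notes do not name a specific IRT model; a common choice is the two-parameter logistic, in which the probability of a correct response depends on the test taker's ability and the item's difficulty and discrimination. The parameter values below are illustrative only:

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve:
    P(correct) = 1 / (1 + exp(-a * (theta - b))),
    where theta is ability, a is item discrimination, and b is item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An easy, highly discriminating item (a=1.5, b=-1.0) versus
# a hard, weakly discriminating item (a=0.5, b=1.0)
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc_2pl(theta, 1.5, -1.0), 2), round(icc_2pl(theta, 0.5, 1.0), 2))
```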

Other Information

  • Linking uncommon measures involves determining how to make appropriate comparisons between tests that do not use the same items.
  • An example is the SAT, which uses different items across forms but reports scores on the same scale.
  • Statistical methods are used to create these comparisons (a minimal linking sketch follows this list).
  • Criterion-referenced tests compare performance against a well-defined learning criterion, and constructing them follows a few steps.
  • The first step is to specify the objectives for the assessment and what the learning program attempts to achieve.
  • A limitation of item analysis: it tells us about the quality of a test, but it may emphasize ranking students over identifying weaknesses or gaps in knowledge.
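
One simple statistical linking approach (not specified in the notes) is mean-sigma linear equating, which places a score from one form on the scale of another by matching the two forms' means and standard deviations; the numbers below are hypothetical:

```python
def linear_equate(score_x, mean_x, sd_x, mean_y, sd_y):
    """Mean-sigma linear equating: map a form-X score onto the form-Y scale
    by matching the two forms' means and standard deviations."""
    return mean_y + sd_y * (score_x - mean_x) / sd_x

# A 55 on form X (mean 50, sd 10) corresponds to about 60.5 on form Y (mean 56, sd 9)
print(linear_equate(55, 50, 10, 56, 9))
```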

Test Administration

  • By the end of this part of the chapter, the reader should be able to discuss how the relationship between the examiner and the test taker can affect test scores.
  • In addition, explain how expectancy effects might affect a test score, and outline the advantages of computerized testing.

The Relationship Between Examiner and Test Taker

  • Researchers need to understand how the behavior of administrators, including subtle cues, can affect test takers' answers.
  • Stronger rapport can be associated with higher scores in some cases (e.g., Feldman and Sullivan, 1960, found this for the WISC).

Tester Race and Stereotypes

  • Research is inconclusive on the extent to which test takers are affected by an examiner of the same or a different race.
  • Tests should be administered according to procedures that standardize administration.
  • Stereotype threat refers to anxiety related to the pressure to disconfirm negative stereotypes about one's group.
  • Stereotype threat can impair test performance, in part by depleting working memory and causing physiological arousal.
  • One intervention that has been studied is telling test takers that the test is nondiagnostic.

Other Factors

  • Language and cultural differences can also put some people at a disadvantage when testing.
  • Standardized administration is important for test validity.
  • It should also be noted that standardized test administration provides no set standard for how examiners give demonstrations.

Expectancy Effects

  • "Rosenthal Effects," data is impacted by what an administrator expects or believes
  • Expectancy effects can be unintentional/unconscious
  • There are varied opinions on how reactions given to test subjects can effect future answers.

Computer Administrations

  • Advantages of computer-based test administration include high standardization, tailored (adaptive) sequential administration, precise timing of responses, and freeing human testers for other tasks (a minimal adaptive-selection sketch follows this list).
  • Internet-based testing has also been explored recently.
  • It is also important to consider subject variables when administering tests.
  • Their impact depends on the setting (for example, educational versus psychiatric) and on whether illness, personal issues, or personal background affect performance.
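
To illustrate what "tailored sequential administration" can look like, here is a minimal adaptive-selection sketch under a Rasch-style (1PL) assumption; the item bank, the difficulties, and the crude fixed-step ability update are all hypothetical:

```python
# Hypothetical item bank: item id -> difficulty on a Rasch/1PL scale
item_bank = {"q1": -1.5, "q2": -0.5, "q3": 0.5, "q4": 1.5}

theta = 0.0  # provisional ability estimate
for response in (True, True, False):  # stand-in for the test taker's actual answers
    # Administer the remaining item whose difficulty is closest to the current ability estimate
    item, b = min(item_bank.items(), key=lambda kv: abs(kv[1] - theta))
    del item_bank[item]
    # Crude fixed-step update for illustration; real adaptive tests use
    # maximum-likelihood or Bayesian ability estimation
    theta += 0.5 if response else -0.5
    print(item, "answered correctly:", response, "-> ability estimate", theta)
```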
