Validity in Assessment Instruments PDF
Summary
This document discusses the concept of validity in measurement and assessment, specifically focusing on content and criterion validity. It outlines methods for establishing validity and includes illustrative examples. The document is likely intended for educational purposes, exploring the principles of effective assessment.
**Topic 4: VALIDITY**

This topic describes the validation process and how it is carried out when evaluating how assessment instruments function. By studying this topic, students will learn the process involved in establishing the validity of assessment instruments, so that the instruments serve the purpose for which they are designed.

Objectives: At the end of the topic, students must have:

- Grasped the meaning and purpose of validity in measurement and assessment.
- Identified the different techniques of establishing validity.
- Applied validity techniques in developing their assessment tools.

Validity
========

***Example***: Does an IQ test actually tell us how intelligent a person is? Or is it actually measuring something else? If an IQ test doesn't actually measure intelligence, then it is not valid. A tape measure may be a reliable measurement tool -- but if we used it to measure the circumference of people's heads as a way of measuring intelligence, it wouldn't be very valid!

1. Content validity
-------------------

Content validity refers to the content & format of the instrument. How appropriate is the content? How comprehensive? How adequately does the sample of items or questions represent the content to be assessed? Is the format appropriate? Samples should be both large enough & taken from appropriate target groups.

Content validity does not yield numerical indices, but a logical judgment as to whether the test covers what it is supposed to cover. For example, the question "1 + 1 = \_\_\_" may be a valid basic addition question, but does it represent all of intelligence? To develop a valid test of intelligence, there must be questions not only on math, but also on verbal reasoning, analytical ability, & every other aspect of the construct we call intelligence.

**Elements/aspects of content-related evidence of validity**:

**Procedure:**

1. 
Write the definition of what you want to measure & then give this definition, along with the instrument & a description of the intended sample, to one or more judges.
2. The judges look at the definition, read over the items or questions in the instrument, & place a check mark in front of each question or item that they feel does not measure one or more aspects of the definition (i.e., objectives).
3. They also place a check mark in front of each aspect not assessed by any of the items.
4. The judges evaluate the appropriateness of the instrument format. The test developer then rewrites any item or question so checked and resubmits it to the judges, &/or writes new items for criteria not adequately covered.
5. This continues until the judges approve all the items or questions in the instrument & also indicate that they feel the total number of items is an adequate representation of the total domain of content covered by the variable being measured.

***Questions to be answered by content validity:***

- Does the test cover a representative sample of the specified skills and knowledge?
- Is test performance free from the influence of irrelevant variables?

**Techniques in Content Validation**

*1. Test Specifications*
------------------------

*2. Consultation with subject-matter experts*
---------------------------------------------

*4. Face Validity*
------------------

2. Criterion validity
---------------------

A **criterion** is a second test or other assessment procedure presumed to measure the same variable. For example, if an instrument has been designed to measure academic ability, student scores on the instrument might be compared with their grade-point averages (the external criterion). If the instrument does indeed measure academic ability, then students who score high on the test would also be expected to have high grade-point averages. Criterion-related validity is established empirically by means of a **validity coefficient**.

1. Academic achievement -- measures of scholastic aptitude
2. 
Performance in specialized training
3. Job performance -- how an individual actually performs on the job
4. Contrasted groups -- based on survival within a particular group vs. elimination from it
5. Psychiatric diagnosis -- based on prolonged observation and detailed case history
6. Ratings
7. Correlations between a new test and previously available tests

**Types of criterion validity**:

a\. ***Concurrent validity***

- The degree to which the scores on a test are related to the scores on another, already established, test administered ***at or nearly the same time***, or to some other valid criterion available at the same time.
- It allows you to show that your test is valid by comparing it with an already valid, reliable test that is considered a "standard" (criterion).

A new test of adult intelligence, for example, would have concurrent validity if it had a high positive correlation with the Wechsler Adult Intelligence Scale, since the Wechsler is an accepted measure of the construct we call intelligence. Another example is comparing a short aptitude test to a much longer version. If scores on the short version correlate highly with scores on the longer version, then the shorter version could be used, saving time & money.

**Procedure**

- Administer the new test.
- Administer the established test.
- Compute Pearson r.
- A high Pearson correlation coefficient (r) indicates that the test is valid.

A **correlation coefficient**, symbolized by the letter ***r***, indicates the degree of relationship that exists between the scores individuals obtain on two instruments. A **positive relationship** is indicated when a **high score** on one of the instruments is accompanied by a **high score** on the other, or when a **low score** on one is accompanied by a **low score** on the other. A **negative relationship** is indicated when a **high score** on one instrument is accompanied by a **low score** on the other, & vice versa. An *r* of .00 indicates that no relationship exists.

b\. 
[***[Predictive validity]***](https://explorable.com/predictive-validity) is a measure of how well a test predicts abilities or future performance; it involves testing a group of subjects for a certain construct & then comparing the results with those obtained at some point in the future.

- The test has predictive validity if it predicts (foretells) subsequent behavior; the test result predicts a ***later outcome***.
- Researchers allow a time interval to elapse between administration of the instrument & obtaining the criterion scores.
- In order for a test to be a valid screening device for some future behavior, it must have predictive validity.

***Example:*** The CAT is used by college screening committees as one way to predict college grades/academic performance. We determine predictive validity by computing a correlation coefficient between CAT scores, for example, & college grades. If they are directly related, then we can make a prediction regarding college grades based on CAT scores. We can show that students who score high on the CAT tend to receive high grades in college.

**Procedure**

- Administer the test to a group of individuals.
- Collect data on performance related to the variable measured by the test.
- Compute Pearson r.
- A high Pearson r indicates good predictive validity.
- A test has predictive validity if its results make predictions that agree with observation.

3. Construct validity
---------------------

This refers to the extent to which the test measures a theoretical construct or trait.

- It testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed. Does the test produce a result that is in accord with an established theory? How well does a test or experiment measure up to its claims?
- It refers to the association of the test with an underlying theory. 
It implies not only that the test is measuring what it is supposed to, but also that it is capable of detecting what *should* exist, theoretically.
- It is the extent/degree to which the test measures a theoretical trait or an intended hypothetical construct.
- E.g., if we hypothesize that anxiety increases when subjects are under the threat of an electric shock, then the threat of an electric shock should increase anxiety scores.

***Example:*** A test of arithmetic computation skills would yield improved scores after intensive coaching in arithmetic. A test of mechanical aptitude should yield higher scores for mechanics than for writers. A test designed to measure depression

**Face validity**

- The items on the test appear (on the "face of it") to measure what the test is supposed to measure.
- It is a conclusion drawn after the measure is constructed; a weak form of validity.
- The easiest way to discover whether a test is valid is to examine it & decide whether it **looks** as though it is.

***Example:*** If a test that was supposed to measure intelligence contained a large memory test section, then it would not have FACE VALIDITY (it looks wrong). Clearly it LOOKS more like a memory test than an intelligence test (assuming, of course, that intelligence is more than just having a good memory).

FACTORS THAT INFLUENCE VALIDITY
-------------------------------

- Inadequate sample
- Items that do not function as intended
- Improper arrangement/unclear directions
- Too few items for interpretation
- Improper test administration

VALIDITY CHECKLIST
------------------

1. **Convergent validity** - the test correlates with other measures of similar constructs; the extent to which a test is found to be related to other tests designed to measure the same conceptual variable.
2. **Discriminant validity** - also referred to as divergent validity; the extent to which a measure does not correlate with measures of unrelated or distinct concepts.
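
Both checklist items, like the concurrent and predictive validity procedures earlier in this topic, rest on the same computation: a Pearson r between two sets of scores from the same individuals. The following is a minimal Python sketch of that step, not a prescribed method. The function name `pearson_r` and all score values are hypothetical, invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of each person's deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each test
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all scores on one test are identical")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for the same eight individuals (invented data).
new_test = [12, 15, 9, 20, 17, 11, 14, 18]          # the instrument being validated
established_test = [48, 55, 40, 70, 62, 45, 52, 66]  # the criterion measure

validity_coefficient = pearson_r(new_test, established_test)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

For concurrent validity the two lists would come from tests given at about the same time; for predictive validity the second list would be criterion scores collected after a time interval. For a convergent check a high r with a similar measure is the expected result, while for a discriminant check an r near .00 with an unrelated measure is the expected result.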