Validity - 10th Week
Document Details
Dr. Merve ŞAHİN KÜRŞAD
Summary
This document discusses validity in assessment, focusing on different types of validity and their applications. It elaborates on how to evaluate assessments based on validity and provides examples of how various validity considerations apply to measuring student understanding.
Full Transcript
VALIDITY
Dr. Merve ŞAHİN KÜRŞAD

When constructing or selecting assessments, the most important questions are (a) to what extent will the interpretation of the scores be appropriate, meaningful, and useful for the intended application of the results? and (b) what are the consequences of the particular uses and interpretations that are made of the results? Regardless of the type of assessment used or how the results are to be used, all assessments should possess certain characteristics. The most essential of these are validity, reliability, and usability.

Validity is the adequacy and appropriateness of the interpretations and uses of assessment results. A lack of fairness leads to a negative evaluation of the validity of the interpretation or use of assessment results. For example, if an assessment is to be used to describe student achievement, we would like to be able to interpret the scores as a relevant and representative sample of the achievement domain being measured. If the results are to be used as a measure of students' understanding of mathematical concepts, we would like our interpretations to rest on evidence that the scores actually reflect mathematical understanding and are not distorted by irrelevant factors, such as the reading demands of the tasks.

If you weigh yourself on a scale, the scale should give you an accurate measurement of your weight. If the scale says you weigh 70 kg when you actually weigh 60 kg, the validity of the result is low. Similarly, if an assessment is intended to measure achievement and ability in a particular subject area but instead measures concepts that are completely unrelated (for example, asking "When was Atatürk born?" on a mathematics test), the assessment results have low validity.

The Nature of Validity
1. Validity refers to the appropriateness of the interpretation and use made of the results of an assessment procedure for a given group of individuals, not to the procedure itself. We sometimes speak of the "validity of a test" for the sake of convenience, but it is more correct to speak of the validity of the interpretation and use to be made of the results.
2. Validity is a matter of degree; it does not exist on an all-or-none basis. Consequently, we should avoid thinking of assessment results as simply valid or invalid. Validity is best considered in terms of categories that specify degree, such as high validity, moderate validity, and low validity.
3. Validity is always specific to some particular use or interpretation for a specific population of test takers. No assessment is valid for all purposes. For example, the results of a mathematics test may have a high degree of validity for indicating computational skill, a low degree of validity for indicating mathematical reasoning, a moderate degree of validity for predicting success in future mathematics courses, and essentially no validity for predicting success in art or music. When indicating computational skill, the same mathematics test may also have a high degree of validity for third- and fourth-grade students but a low degree of validity for second- or fifth-grade students. Thus, when appraising or describing validity, it is necessary to consider the specific interpretation or use to be made of the results. Assessment results are never just valid; they have a different degree of validity for each particular interpretation to be made. (What is at issue is not the validity of the test but the validity of the interpretation.)
4. Validity is a unitary concept.
The conceptual nature of validity has typically been described for the testing profession in a set of standards prepared by a joint committee drawn from three professional organizations that are especially concerned with educational and psychological testing and assessment. In the two most recent revisions of the Standards for Educational and Psychological Testing by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME) (1999), the traditional view that there are several different types of validity has been discarded. Instead, validity is viewed as a unitary concept based on various kinds of evidence.

Ways of Getting Validity Evidence
1. Content Validity
2. Construct Validity
3. Criterion Validity
4. Face Validity

1. Content Validity
Content validity concerns how well the sample of assessment tasks represents the domain of tasks to be measured and how well it emphasizes the most important content. To evaluate it, compare the assessment tasks to the specifications describing the task domain under consideration. Content considerations are of special importance when we wish to describe how an individual performs on the domain of tasks that the assessment is supposed to represent.

We may, for example, expect students to be able to spell the 200 words on a given list. Because a 200-word spelling test is too time consuming, we may select a sample of 20 words to represent the total domain of 200 spelling words. If Margaret correctly spells 80% of these 20 words, we would like to be able to say that she can probably spell approximately 80% of the 200 words. Thus, we would like to be able to generalize from the student's performance on the sample of words in the test to the performance the student would be expected to demonstrate on the domain of spelling words that the test represents. The validity of the interpretation, in which a test score implies that the student can probably spell a given percentage of words in the whole domain, depends on considerations that go beyond the question of content. The goal of content validation is to determine the extent to which a set of assessment tasks provides a relevant and representative sample of the domain of tasks about which interpretations of assessment results are made.
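As a rough illustration of the sampling logic in the spelling example, the following sketch (a hypothetical simulation, not from the source) draws repeated 20-word tests from a 200-word domain in which the student actually knows 160 words, and shows how far a single sample-based estimate can stray from the true 80%:

```python
import random

DOMAIN_SIZE = 200   # the full spelling list
KNOWN = 160         # words the student can actually spell (80%)
SAMPLE_SIZE = 20    # words sampled for one test

# 1 = can spell the word, 0 = cannot
domain = [1] * KNOWN + [0] * (DOMAIN_SIZE - KNOWN)

random.seed(42)
estimates = []
for _ in range(1000):
    test = random.sample(domain, SAMPLE_SIZE)    # one 20-word test
    estimates.append(sum(test) / SAMPLE_SIZE)    # observed proportion correct

print(f"true domain proportion:        {KNOWN / DOMAIN_SIZE:.2f}")
print(f"mean estimate over 1000 tests: {sum(estimates) / len(estimates):.3f}")
print(f"lowest / highest estimate:     {min(estimates):.2f} / {max(estimates):.2f}")
```

On average, the 20-word sample recovers the true 80%, but any single test can miss by a fair margin, which is why the representativeness of the sample matters so much for the content-validity argument.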
Content Validity in the Assessment of Classroom Achievement
- Determine which intended learning outcomes/objectives are to be achieved (what has been taught).
- Specify the domain of instructionally relevant tasks to be measured (what is to be measured).
- Specify the relative importance of the learning objectives to be measured (what should be emphasized in the assessment).
- Provide a representative set of assessment tasks from the achievement domain (a representative sample of relevant tasks).

Examples of low and high content validity are shown in a video retrieved from https://www.youtube.com/watch?v=FOnPrhGygg0.

1. Content Validity - Example
Students in Professor Jones' Geography 215 class are assigned to read Chapters 1, 2, 3, and 4 for the first exam. All the chapters are similar in length and amount of material. In lecture, Professor Jones gives three lectures on the topics in each chapter, and students are told to study all chapters and lecture notes for their first exam. On the first exam, however, 90% of the exam questions are based on the material in Chapters 3 and 4, and only 10% of the questions are based on the material in Chapters 1 and 2. Professor Jones' exam has low content validity.

1. Content Validity - Example
Vocabulary Assessment: An English language teacher develops a vocabulary assessment for high school students. The assessment includes items that require students to demonstrate their knowledge of a variety of words from different word families, parts of speech, and semantic categories. The assessment is designed to have high content validity by encompassing a representative range of vocabulary items.

Content issues are typically considered during the development of an assessment. Ensuring content validity is primarily a matter of preparing detailed specifications and then constructing an assessment that meets those specifications. Although there are many ways of specifying what an assessment should measure, one widely used procedure in constructing achievement tests is a two-way chart called a table of specifications. The learning outcomes of a course or curriculum may be broadly defined to include both subject-matter content and instructional objectives; the former is concerned with the topics to be learned and the latter with the types of performance students are expected to demonstrate (e.g., knows, comprehends, applies, analyzes, synthesizes, evaluates).

Table of Specifications (Test Blueprint)
Each cell gives the percentage of test questions devoted to a content area at a given cognitive level; the row totals give the percentage of questions per content area, and the column totals give the percentage per cognitive level.

| Content Area               | Knows Concepts | Comprehends Concepts | Applies Concepts | Total |
|----------------------------|----------------|----------------------|------------------|-------|
| Air pressure               | 4              | 8                    | 4                | 16    |
| Air temperature            | 4              | 4                    | 4                | 12    |
| Humidity and precipitation | 8              | 12                   | 4                | 24    |
| Wind                       | 4              | 4                    | 4                | 12    |
| Clouds                     | 8              | 12                   | 0                | 20    |
| Fronts                     | 4              | 8                    | 4                | 16    |
| Total                      | 32             | 48                   | 20               | 100   |
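To show how a blueprint like this drives test construction, here is a minimal sketch (hypothetical; the 50-item test length is an assumption, not from the source) that converts the percentages above into item counts:

```python
# Blueprint percentages: content area -> (knows, comprehends, applies)
blueprint = {
    "Air pressure":               (4, 8, 4),
    "Air temperature":            (4, 4, 4),
    "Humidity and precipitation": (8, 12, 4),
    "Wind":                       (4, 4, 4),
    "Clouds":                     (8, 12, 0),
    "Fronts":                     (4, 8, 4),
}

TEST_LENGTH = 50  # assumed total number of items

total_pct = sum(sum(cells) for cells in blueprint.values())
assert total_pct == 100, "blueprint percentages must sum to 100"

for area, cells in blueprint.items():
    # Rounding can leave the grand total slightly off TEST_LENGTH in
    # general; with these particular percentages it comes out exact.
    knows, comprehends, applies = (round(p / 100 * TEST_LENGTH) for p in cells)
    print(f"{area:27s} knows={knows:2d} comprehends={comprehends:2d} applies={applies:2d}")
```

Writing items against such a plan is what keeps the final test a representative sample of the domain rather than a collection of whatever questions were easiest to write.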
According to the results of such an assessment, rather than only stating that a student correctly solved 75% of the tasks on a particular mathematics test, we might want to infer that the student possesses a certain degree of mathematical reasoning ability. Evidence obtained from an analysis of content domain definition and content coverage (sampling adequacy) is also of concern when selecting published achievement tests. When test publishers prepare achievement tests for use in the schools, they pay special attention to content. Their test specifications, however, are based on what is commonly taught in many different schools. Thus, a published test may or may not fit a particular school situation. To determine whether it does, it is necessary to go beyond the title of the test and examine what the test actually measures.

2. Construct Validity
Although we are considering them second, most measurement specialists would give highest priority to construct considerations in evaluating the validity of an interpretation or use of an assessment. We began with content considerations because the review of the content domain and sampling helps us determine how well test or assessment scores represent a given domain of tasks, and it is especially useful in both the preparation and the evaluation of all types of assessment of achievement.

Construct validity concerns how well performance on the assessment can be interpreted as a meaningful measure of some characteristic or quality. Whenever we wish to interpret assessment results in terms of some individual characteristic (e.g., reading comprehension, mathematics problem-solving ability), we are concerned with a construct. A construct is an individual characteristic that we assume exists in order to explain some aspect of behavior. Mathematical reasoning is a construct, and so are reading comprehension, understanding of the principles of electricity, intelligence, creativity, and such personality characteristics as sociability, honesty, and anxiety. These are called constructs because they are theoretical constructions that are used to explain performance on an assessment.

When we interpret assessment results as a measure of a particular construct, we are implying that there is such a construct, that it differs from other constructs, and that the results provide a measure of the construct that is little influenced by extraneous factors. Verifying such implications is the task of construct validation. Construct validation may be defined as the process of determining the extent to which performance on an assessment can be interpreted in terms of one or more constructs. Whenever an assessment is to be interpreted as a measure of a particular construct, the various types of evidence useful for construct validation should be considered during its development or selection.

Two questions are central to any construct validation:
1. Does the assessment adequately represent the intended construct?
2. Is performance influenced by factors that are irrelevant to the construct?

Construct-irrelevant factors can lead to unfairness in the use and interpretation of assessment results. An assessment intended to measure understanding of mathematical concepts, for example, could lead to unfair inferences about the level of understanding of English-language learners because of the heavy reading demands of assessment tasks that are presented only in English.

In considering how construct-irrelevant factors may undermine validity, it is useful to think about ancillary skills that may have an impact on performance on the assessment. On an assessment in mathematics or science, for example, reading ability is one obvious skill that may be ancillary to the intent of the assessment. Thus, it would be important to review the reading demands of the tasks to ensure that students' performance reflects their understanding of science principles or mathematical concepts and is not limited instead by reading difficulties.

A wide range of construct-irrelevant factors may undermine validity. In addition to the influence of ancillary skills (e.g., reading on a science test), the test can interact with student characteristics such as test-wiseness, motivation, or anxiety. As a consequence, students who have low test-wiseness (or low motivation, or high anxiety) might be expected to score lower than their true ability would indicate on a science test.
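As a toy illustration of such construct-irrelevant variance (a hypothetical simulation, not from the source), suppose two groups of students understand science equally well, but one group reads less fluently and the test carries a heavy reading load:

```python
import random

random.seed(1)

def observed_score(science_ability, reading_skill, reading_load):
    """Observed score = science ability minus a penalty that grows with
    the test's reading load and the student's reading weakness."""
    penalty = reading_load * max(0.0, 1.0 - reading_skill)
    return science_ability - penalty

# Same science ability distribution in both groups; only reading differs
# (reading_skill 0.9 = fluent reader, 0.4 = struggling reader).
group_a = [(random.gauss(70, 5), 0.9) for _ in range(1000)]
group_b = [(random.gauss(70, 5), 0.4) for _ in range(1000)]

for load in (0, 30):  # a reading-light test vs a reading-heavy test
    mean_a = sum(observed_score(s, r, load) for s, r in group_a) / len(group_a)
    mean_b = sum(observed_score(s, r, load) for s, r in group_b) / len(group_b)
    print(f"reading load {load:2d}: group A mean {mean_a:5.1f}, group B mean {mean_b:5.1f}")
```

The gap that appears only on the reading-heavy test reflects reading skill, not science understanding, which is precisely the kind of distortion construct validation is meant to catch.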
Instruction can also introduce construct-irrelevant factors when comparing two groups of students who have not had an equal opportunity to learn the material. For example, if one group of students has received instruction that emphasizes higher-order skills and another group has received instruction that emphasizes lower-level skills, then comparisons of the students' abilities on a test that measures higher-order skills would not lead to reasonable interpretations.

Methods Used in Construct Validation
1. Defining the domain or tasks to be measured. The specifications should be so well defined that the meaning of the construct is clear and it is possible to judge the extent to which the assessment provides a relevant and representative measure of the task domain. If a single construct is being measured, the tasks should evoke similar types of responses and be highly interrelated (also a content consideration).
2. Analyzing the response process required by the assessment tasks. The response process called forth by the assessment tasks can be determined both by examining the test tasks themselves and by administering the tasks to individual students and having them "think aloud" as they perform the tasks.
3. Comparing the scores of known groups. In some cases, it is possible to predict that scores will differ from one group to another. These may be age groups, trained and untrained groups, adjusted and maladjusted groups, and the like. For example, level of achievement generally increases with age (at least during childhood and adolescence). It is also reasonable to expect that performance on an assessment will differ between groups that have received different amounts of instruction in the subject matter of the assessment, and that scores on adjustment inventories will discriminate between groups of adjusted and maladjusted individuals. A test measuring schizophrenic behaviors, for instance, should yield clearly different score distributions for people diagnosed with schizophrenia and for people without the diagnosis.
4. Comparing scores before and after a particular learning experience or experimental treatment. We would like our assessments to be sensitive to some types of experiences and insensitive to others. Certainly, we would like assessments of student achievement in a given subject-matter area to show improvement during the course of instruction. On the other hand, we would not like them to be influenced by such factors as student anxiety. Thus, both a demonstration of increases in performance following instruction and a demonstration that performance was affected little by a treatment designed to reduce student anxiety would lend support to the construct validity of the assessment.
5. Correlating the scores with other measures. The scores of any particular assessment can be expected to correlate substantially with the scores of other measures of the same or a similar construct. By the same token, lower correlations would be expected with measures of a different ability or trait. A scholastic aptitude test, for example, should correlate highly with other scholastic aptitude tests but only weakly with a musical aptitude test, as illustrated in the sketch below.
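A minimal sketch of method 5 on hypothetical data: two forms of a scholastic aptitude test (same construct) should correlate highly with each other and only weakly with a musical aptitude test (different construct):

```python
import random

random.seed(7)

# Simulate 200 students with a scholastic ability and an unrelated musical ability.
n = 200
scholastic = [random.gauss(100, 15) for _ in range(n)]
musical = [random.gauss(100, 15) for _ in range(n)]

# Two scholastic aptitude forms measure the same construct (plus noise);
# the musical aptitude test measures the other construct.
form_a = [s + random.gauss(0, 5) for s in scholastic]
form_b = [s + random.gauss(0, 5) for s in scholastic]
music = [m + random.gauss(0, 5) for m in musical]

def pearson(x, y):
    """Pearson correlation coefficient r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"form A vs form B (same construct):      r = {pearson(form_a, form_b):.2f}")  # high
print(f"form A vs music (different construct):  r = {pearson(form_a, music):.2f}")   # near 0
```

In the standard terminology, the high correlation between the two forms is convergent evidence and the near-zero correlation with the musical test is discriminant evidence; construct validation looks for both.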
3. Criterion Validity
Although few teachers will conduct formal studies relating assessment results to other measures, it is important to understand the use of assessment-criterion relationships in evaluating validity. When test scores are to be used to predict future performance or to estimate current performance on some valued measure other than the test itself (called a criterion), we are especially concerned with evaluating the relationship between the test and the criterion. For example, reading readiness test scores might be used to predict students' future achievement in reading, or a test of dictionary skills might be used to estimate students' current skill in the actual use of a dictionary.

In the first example, we are interested in prediction and thus in the relationship between the two measures over an extended period of time. This procedure for obtaining evidence of validity calls for a predictive validation study: test scores are used to predict future performance, and the focus is the relationship between two measures separated in time.

In the second example, we are attempting to estimate present status, and thus we are interested in the relationship between two measures obtained concurrently. A high relationship in this case would show that the test of dictionary skills is a good indicator of actual skill in using a dictionary. This procedure calls for a concurrent validation study: the aim is to estimate present status, and the focus is the relationship between two measures obtained at about the same time.

Predictive validation study: test performance (scholastic aptitude test scores, September 17) is related to criterion performance (achievement test scores, December 10).
Concurrent validation study: test performance (scholastic aptitude test scores, September 17) is related to criterion performance (achievement test scores, September 17).

The major difference between the two resides in the time between the two measures. The second measure of performance (the criterion) may be obtained at some future date, in which case we are predicting future performance, or it may be obtained concurrently, in which case we are estimating present performance.

Predictive validity - Examples
College Readiness Assessment: A high school implements a college readiness assessment to determine students' preparedness for post-secondary education. The assessment measures critical thinking, problem-solving, and communication skills. The assessment demonstrates predictive validity if students who perform well on the assessment are more likely to succeed in college than those who score lower.

Early Literacy Assessment: A primary school implements an early literacy assessment to identify students at risk of reading difficulties. The assessment measures various skills related to reading, such as letter recognition, phonological awareness, and vocabulary. The assessment demonstrates predictive validity if students identified as "at risk" on the assessment later show lower reading proficiency than their peers.
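A predictive validation study of the kind diagrammed above boils down to one correlation between the test and the later criterion. A minimal sketch with hypothetical scores (statistics.correlation requires Python 3.10 or newer):

```python
from statistics import correlation

# Hypothetical data for ten students:
# scholastic aptitude test scores from September 17 ...
aptitude_sept = [112, 98, 105, 120, 89, 101, 117, 95, 108, 124]
# ... and achievement test (criterion) scores from December 10.
achievement_dec = [78, 64, 70, 85, 60, 69, 80, 66, 73, 88]

# The validity coefficient: how well the earlier test predicts the later criterion.
r = correlation(aptitude_sept, achievement_dec)
print(f"predictive validity coefficient: r = {r:.2f}")
```

A concurrent study correlates two measures taken at the same time in just the same way; only the dates of the two measures change.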
Predicting Future Performance
Suppose that Mr. Tanaka, a junior high school teacher, wants to determine how well the scores from a scholastic aptitude test will predict success in his seventh-grade mathematics class. Because the scholastic aptitude test is administered to all students when they enter junior high school, these scores are readily available to Mr. Tanaka. His biggest problem is deciding on a criterion of successful achievement in mathematics. For lack of a better criterion, he decides to use a comprehensive departmental examination that is administered to all seventh-grade math sections at the end of the school year. It is now possible for Mr. Tanaka to determine how well the scholastic aptitude test scores predict success in his mathematics class by comparing the students' scholastic aptitude test scores with their scores on the departmental examination. Do students with high scholastic aptitude test scores also tend to have high scores on the departmental examination? Do students with low scholastic aptitude test scores also tend to have low scores on the departmental examination? If so, Mr. Tanaka can conclude that the scholastic aptitude test scores are reasonably accurate in predicting achievement in his mathematics class.

3. Criterion Validity
Estimating Present Performance (Concurrent Validity)
We obtain both measures at approximately the same time and correlate the results. Concurrent validation is typically used when we want to replace a time-consuming method of obtaining information (such as elaborate observation and rating) with a quicker one (such as an objective test). Suppose that you are a biology teacher. You perform elaborate observations and rate the performance of your students on a laboratory experiment, but doing so takes a great deal of time. You would therefore like to use an objective test instead. To determine how adequately the test measures the students' performance, you should correlate the test results with the scores obtained from observation and rating. If you find a high correlation, you can conclude that the scores obtained from the test are valid (see the sketch following the examples below).

Concurrent validity - Examples
Math Achievement Test: A researcher develops a new math achievement test and administers it to a group of students. At the same time, the students also take an established math achievement test that is considered a gold standard. The scores on the new test demonstrate concurrent validity if they correlate highly with the scores on the established test, indicating that both tests measure the same math construct.

Language Proficiency Assessment: A language institute introduces a new language proficiency test for assessing students' fluency in a specific language. The institute administers the new test alongside an existing, widely recognized language proficiency test. The scores on the new test demonstrate concurrent validity if they correlate strongly with the scores on the established test, indicating that both tests measure similar levels of language proficiency.

3. Criterion Validity - Example
A university professor creates a new test to measure applicants' English writing ability. To assess how well the test really measures students' writing ability, she finds an existing test that is considered a valid measurement of English writing ability and compares the results when the same group of students takes both tests. If the outcomes are very similar, the new test has high criterion validity.

3. Criterion Validity - Example
College Entrance Exam: A college admissions department uses a standardized test, such as the SAT or ACT, as a criterion for evaluating applicants' readiness for college-level coursework. The scores on these exams are expected to have criterion validity because they should correlate with academic success in college.
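Returning to the biology-teacher scenario, here is a sketch of the concurrent check on hypothetical data. Because the teacher's ratings are ordinal, a rank correlation is a reasonable choice; this version assumes scipy is available:

```python
from scipy.stats import spearmanr

# Hypothetical concurrent measures for ten students, collected the same week:
lab_rating = [7, 4, 9, 6, 8, 3, 5, 9, 6, 7]                # observation ratings (1-10)
objective_test = [34, 22, 45, 30, 41, 18, 27, 44, 31, 36]  # objective test scores

rho, p_value = spearmanr(lab_rating, objective_test)
print(f"concurrent validity (rank correlation): rho = {rho:.2f} (p = {p_value:.3f})")
# A high rho suggests the quick objective test can stand in for the
# time-consuming observation-and-rating procedure.
```
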
Several factors influence the size of correlation coefficients, including validity coefficients, and knowing these factors can help with the interpretation of a particular correlation coefficient.

4. Face Validity
Face validity refers to the degree to which an assessment or test subjectively appears to measure the variable or construct that it is supposed to measure; in other words, face validity is when an assessment or test appears to do what it claims to do.

Merve is a researcher who has just developed a new assessment that is meant to measure mathematical ability in college students. She selects a sample of 300 college students from three local universities and has them take the test. After the students complete the test, Merve asks all 300 participants to complete a follow-up questionnaire in which she asks what they think the purpose of the test is, what construct they believe is being measured, and whether they feel the assessment was an adequate measure of their mathematical ability. After analyzing the follow-up results, she finds that most of the participants agree that her assessment accurately measures their mathematical ability.

4. Face Validity - Example
Course Syllabus: A college professor creates a detailed syllabus for a biology course that outlines the topics to be covered, the learning objectives, and the assessment methods. Students are likely to perceive the syllabus as having face validity because it provides a clear roadmap of what they can expect to learn and how their understanding will be evaluated.

4. Face Validity - Example
Personality Assessment: A psychologist develops a personality assessment tool that measures traits such as extraversion, conscientiousness, and emotional stability. The items on the assessment ask individuals to rate themselves on various behaviors and preferences associated with these traits. Participants who take this assessment may perceive it as having face validity because the questions seem relevant to the traits being measured.

Factors Influencing Validity - Factors in the Test or Assessment Itself
1. Unclear directions: Directions that do not clearly indicate to the student how to respond to the tasks and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult (construct-irrelevant variance): Vocabulary and sentence structure that are too complicated for the students taking the assessment result in the assessment measuring reading comprehension and aspects of intelligence, which distorts the meaning of the assessment results.
3. Ambiguity: Ambiguous statements in assessment tasks contribute to misinterpretations and confusion. Ambiguity sometimes confuses better students more than it does poorer students.
4. Inadequate time limits (construct-irrelevant variance): Time limits that do not provide students with enough time to consider the tasks and provide thoughtful responses can reduce the validity of interpretations of results. Rather than measuring what a student knows about a topic or is able to do given adequate time, the assessment may become a measure of the speed with which the student can respond.
For some content, speed may be important, but most assessments of achievement should minimize the effects of speed on student performance.
5. Overemphasis of easy-to-assess aspects of the domain at the expense of important but difficult-to-assess aspects (construct underrepresentation): It is easy to develop test questions that assess factual recall and generally harder to develop ones that tap conceptual understanding or higher-order thinking processes, such as the evaluation of competing positions or arguments. Hence, it is important to guard against underrepresentation of tasks getting at the important but more difficult-to-assess aspects of achievement.
6. Test items inappropriate for the outcomes being measured: Attempting to measure understanding, thinking skills, and other complex types of achievement with test forms that are appropriate only for measuring factual knowledge will invalidate the results.
7. Poorly constructed test items: Test items that unintentionally provide clues to the answer tend to measure the students' alertness in detecting clues as well as their mastery of the skills or knowledge the test is intended to measure.
8. Test too short: A test is only a sample of the many questions that might be asked. If a test is too short to provide a representative sample of the performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items: Test items are typically arranged in order of difficulty, with the easiest items first. Placing difficult items early in the test may cause students to spend too much time on them and prevent students from reaching items they could easily answer. Improper arrangement may also influence validity by having a detrimental effect on student motivation. This influence is likely to be strongest with young students.
10. Identifiable pattern of answers: Placing correct answers in some systematic pattern (e.g., T, T, F, F or A, B, C, D, A, B, C, D) enables students to guess the answers to some items more easily, which lowers validity.

Factors Influencing Validity - Factors in Administration and Scoring
- Insufficient time
- Unfair aid to individual students who ask for help
- Cheating
- Unreliable scoring
In the case of published tests, failure to follow the standard directions and time limits, giving students unauthorized assistance, and errors in scoring similarly contribute to lower validity.

Factors Influencing Validity - Factors in Student Responses
Emotional, physical, and motivational factors in students can also undermine validity. Some students may be bothered by emotional disturbances that interfere with their performance, others may be frightened by the assessment situation and so be unable to respond normally, and still others may not be motivated to put forth their best effort.