Assessment in Learning 1 PDF

Summary

This document provides an overview of large-scale student assessment: its nature, purpose, and use, as well as the challenges of implementing it. Keywords include "large-scale student assessment" and "assessment in learning".

Full Transcript


PED 06 - Assessment in Learning 1
Large-Scale Student Assessment
1. Understanding Large-Scale Student Assessment
2. Development of the Large-Scale Student Assessment Test
K to 12 Grading System

CHAPTER 12
Understanding Large-Scale Student Assessment

This chapter introduces large-scale student assessment, which by its name suggests a broader scope of student assessment: measuring student progress at the local, regional or national level and establishing comparability across schools.

Section Intended Learning Outcomes
* Expound on the requirements of large-scale student assessment for its use to improve student performance.
* Describe the nature, purpose and use of large-scale student assessment.

Overview
The practice of large-scale student assessment has long been in place in many educational systems that continuously seek ways and means to improve student achievement through assessment. It is a move initiated by educational managers and administrators to push schools and teachers to double their efforts in raising the quality of instruction in their locale, to solicit stakeholders' support, and to receive substantive government incentives. This initial chapter provides a conceptual understanding of large-scale student assessment (LSA) in terms of its nature, purpose and use. Through the experiences of countries employing LSA, it lays out some of the issues and challenges regarding its implementation.

A look at the history of the large-scale student assessment movement in different countries shows that it is initiated by a felt need to raise the quality of education by improving student achievement in government-controlled public schools. The issuance of compelling national policies encourages system-wide assessment of schools to determine whether students are meeting the standards set for their achievement and to provide stakeholders with information for improving student achievement (McGehee & Griffith, 2001). It was made to align with standards-based reforms in education.

Currently, LSA is being utilized by educational systems as a means to assess and monitor the growth of students in all schools within a defined geographical area, and to openly report such results to the public. This has further awakened policy makers to use LSA to drive educators, schools, teachers and students to become accountable for improving student achievement. Accountability then has become a major concern for LSA in examining the degree to which educational standards are being reached by the system (Popham, 2001, as cited by De Pascale, 2002).

NATURE OF LARGE-SCALE STUDENT ASSESSMENT (LSA)
Large-scale assessment is generally the administration of tests to huge numbers of examinees at the same time (Montana Office of Public Instruction, 2001). The test data are later on processed, analyzed and reported by an organizational infrastructure. The cycle is repeated over and over for the purpose of monitoring student progress.

Large-scale assessment has been referred to in the literature using different terms:
* System-wide Assessment suggests the participation of all schools which are part of a system.
* Standardized Testing suggests the consistent manner and conditions in the administration, scoring and interpretation of assessment tools.
* Certification Testing suggests a specific purpose for assessment.
* Annual Assessment suggests the frequency of scheduled school assessment.

LEARNING KEY POINT
Large-scale student assessment is a program designed to improve student performance. It requires system-wide administration of standardized tests to schools.
LSA AND CBA
The underlying assessment concepts behind LSA are closely related to those of classroom-based assessment (CBA).

a. LSA and CBA can both be designed and used for norm-referenced assessment and criterion-referenced assessment. As previously discussed in earlier sections, one purpose of CBA is ranking and comparing students in class. On the basis of their test scores, students' overall standing in class is known and reported as part of their grades. LSA likewise uses this norm-referenced approach in ranking schools within administrative districts, divisions, or regions in a country using some descriptive statistics. Both CBA and LSA are similarly interested in knowing to what extent students have mastered the expected body of knowledge and competencies being assessed. This criterion-referenced approach informs the instructional decisions of teachers and students.

b. Content of LSA and CBA tools are both aligned to the curriculum and specific learning outcomes. With assessment of learning as a primary purpose of student assessment, the main source of test content is the intended learning outcomes within the context of subject matter areas in the prescribed curriculum of the schools. LSA intends to gather information across target schools of regions in a country; its main concern is to monitor the status of student achievement in specific areas such as Reading, Writing and Mathematics. Similarly, CBA aims to do the same, but only within a class, using the intended learning outcomes for an instructional period.

c. Involvement of teachers in the preparation of the assessment tools. Although LSA involves a greater number of teachers in the development process, both levels of assessment rely heavily on experienced teachers in selecting the instructional outcomes to be tested. The teachers, more than any other stakeholders, need to agree on the pertinent learning outcomes to be assessed. Planning the test content takes a formal convention of teachers across various parts of the country together with other curriculum experts. In CBA, a classroom teacher may seek the collegial advice of his/her peers in determining the content of a summative test that may be administered to one class or to classes of the same grade level in the school.

d. Selected-response formats, like multiple-choice and binary-choice items, are commonly used. These two test types lend themselves efficiently to objective scoring and to reviewing and revising items through item analysis; the use of machine scoring for multiple-choice items facilitates fast interpretation and release of results. Many countries employ independent testing centers or agencies to carry out the tasks of test development, administration, scoring and interpretation of LSA paraphernalia. They have computer facilities to handle complex statistical analyses of the large databases collected. This is not a concern of CBA.

e. Assessment is more focused on cognitive instructional outcomes. Both LSA and CBA, without fully intending to be so, often measure those learning outcomes that lend themselves to testing formats, i.e. selected-response and constructed-response types. Performance-based formats, which can be utilized to assess non-cognitive behaviors, are less frequently used for this purpose. This is actually a criticism received by standardized testing as applied in LSA. Bill Ayers (1993), an educational theorist, was quoted saying:

"Standardized tests can't measure initiative, creativity, imagination, conceptual thinking, curiosity, effort, irony, judgment, commitment, nuance, good will, ethical reflection, or a host of other valuable dispositions and attributes. What they can measure and count are isolated skills, specific facts and function, content knowledge, the least interesting and least significant aspects of learning."
There is some truth in this allegation or observation, especially considering that the great majority of standardized tests designed for LSA in different countries measure mostly competencies in Language, Reading, Science and Mathematics. However, with the right intent and resources, assessing psychomotor and affective behaviors for LSA is no longer far from happening. CBA now delves into these types of non-test assessment. In fact, current assessment reforms applied in classrooms give emphasis to authentic, performance-based, and portfolio assessments for learning outcomes focused on problem solving, critical thinking and creative thinking.

Use of Large-Scale Student Assessment
LSA for certification and admission is practiced in different configurations by educational institutions all over the world.

1. Some LSA programs make use of high-stakes testing which defines a student's entry or non-entry to a desired advanced level.

a. Grade-level standardized tests of achievement administered at major transition points, which students have to pass before proceeding to the next higher level. This is done in countries influenced by the British system of education.

b. Examinations given by secondary schools before graduation to certify a student's readiness for the college level. Failure could mean not receiving the secondary diploma, or requiring the student to take extra units of course work as a remedial intervention before they are allowed to go on.
c. Senior students taking a national achievement test at a scheduled time, with their obtained scores spelling admission or non-admission to tertiary institutions of their choice. Only those reaching the prescribed standard of the institution can get admitted.

d. Admission to state colleges and universities in Asia requires passing a standardized entrance examination, and the obtained score determines the course or campus the student can be enrolled in. Passing an additional examination after admission can also mean an opportunity for advanced or accelerated work. The University of the Philippines College Admission Test (UPCAT) and the Philippine Normal University Admission Test (PNUAT) are illustrations of entrance examinations.

e. Admission to graduate programs in most universities in the U.S. requires passing a qualifying examination, e.g. the Graduate Record Examination (GRE).

f. Certification of second-language proficiency is required by many higher education institutions from non-English-speaking applicants. It is obtained using the Test of English as a Foreign Language (TOEFL) for American institutions and the International English Language Testing System (IELTS) for British schools. There are centers in major cities that administer these standardized tests.

2. Another purpose of LSA is monitoring and judging the progress of student performance for evaluative purposes and research.

a. Many countries use a national examination as their barometer of how their students are performing along desired competencies in different subject areas at different levels across time. Tests are regularly administered to randomly sampled schools through the National Assessment of Educational Progress (NAEP) in the U.S. (Grissmer et al., 2000). Some countries do census testing and administer an achievement test yearly to target grades of all schools, like the National School Achievement Test (NSAT) in the Philippines (http://nrtc.deped.gov).

b. LSA is also triggered by policies initiating standards-based reforms in education to improve student performance. The "No Child Left Behind" policy in the U.S. is backed by large-scale assessment to rationalize fiscal support for schools doing well and to motivate teachers to improve their teaching-learning methodologies.

c. At the international scene, there are standardized examinations developed to monitor changes in educational achievement in basic school subjects over time. During the last decade or so, international assessments for Science and Mathematics (Trends in International Mathematics and Science Study), Reading (Progress in International Reading Literacy Study) and Civics (International Civics and Citizenship Education Study) have been coordinated with participating countries interested in knowing the status of their students in comparison with the rest of the world. This is led by the International Association for the Evaluation of Educational Achievement (IEA) in partnership with international testing, research, data processing and statistical centers. Numerous researches in education have been conducted using their large databases. The examinations also intend to model to the different countries the ideal learning outcomes to be measured in these subject areas and the manner in which they should be tested. Two of the international agencies which are into expanding the science of large-scale assessments through research are the International Association for the Evaluation of Educational Achievement (IEA) and the Educational Testing Service (ETS).

3. Standardized tests have also been developed for screening and diagnostic purposes. There are those which have been reported as being able to identify delays in a young child's development. An example of this kind of standardized tool is the Ages and Stages Questionnaire (ASQ), which screens young children on five developmental areas: Communication, Gross Motor, Fine Motor, Problem Solving, and Personal-Social.
VALUE-ADDED OF LSA
Over and above the expressed mission of LSA to improve student performance, there are other value-added benefits that can be strategized.

1. LSA can promote an evaluation of the effectiveness of educational programs implemented in schools (Crundwell, 2005). One of the processes not given attention by heads of schools is the evaluation of school-specific educational programs. Many do not have the opportunity to conduct an evaluative study due to lack of technical expertise in this field, staff time and other logistics. Or, the school does not feel the need to do it at all! The LSA program, other than the assessment proper, includes systematic gathering of information on educational initiatives being carried out in schools, later treating them as school variables that could explain between-school variance in student scores.

2. LSA can provide information to the general public on how schools and students are performing and how they compare with other schools. This information is quite important to disseminate so schools can easily gain support from the government, community and parents. It may not be true in all countries, but awareness of a school's standing amongst other schools is a selling point for schools performing well.

3. Mining of LSA processes and products through relevant research in education. Either for a graduate degree requirement or just for academic exploration, establishing the correlates of school or student performance through research is quite significant for systems improvement.

4. Well-documented information on schools as a by-product of LSA informs relevant policy for legislation or institutionalization. Having sufficient basis for policy legislation on the transformation of non-performing schools can easily be accepted for implementation when backed by empirical research.

5. Curricular reforms can be put in place with LSA information on what learning outcomes students across schools can achieve. Reforms can be a revisit of curriculum standards, or improvements of instructional and assessment methodologies, materials and resources.

6. Staff development of school heads and teachers. With data and information released to schools, school heads and staff need re-training and updating on complex data interpretation. Misuse of assessment results can be triggered by inadequate technical understanding of what is being communicated.

CONDITIONS FOR EFFECTIVE LSA IMPLEMENTATION
Attitudes towards LSA have been mixed from different corners. While some see the benefits of the program, others see it as not having a fit to the purpose for which it has been conceived. Ungerleider (2003; 2006, pp. 873-874) has given the conditions for LSA implementation to succeed in improving student performance:

* establish broad agreement about what school outcomes are essential for all students;
* ensure that these areas are clearly articulated in the curriculum and are supported with appropriate instructional material;
* hold students, parents, and teachers accountable for those outcomes;
* assess student progress in the areas of importance at different times over students' school careers;
* prepare teachers and encourage them to use teaching strategies that increase learning outcomes for all students;
* encourage mixed-ability grouping and discourage grouping, tracking or streaming students by socio-economic background or in ways that increase differentiation among students of different ethno-cultural backgrounds;
* assess schools on the basis of student growth in learning outcomes, taking into account students' individual socio-economic backgrounds, the socio-economic context of the school community, as well as school policies and practices known to influence the achievement of the valued outcomes;
* examine rates of student progress as well as gradients in student progress associated with such background factors as socio-economic standing, gender, and ethnicity;
* ensure that teachers and administrators are well prepared for their responsibilities;
* counter misuse of the results of large-scale assessments in the media and elsewhere; and
* provide teachers with adequate time to individually and collectively interpret data for the purpose of improving instruction.

Higher education has likewise been motivated in using large-scale assessment to monitor students' performance.
The greatest challenge has been the development of assessment tools that will address general learning outcomes at the tertiary level, and their operationalization to be able to handle various disciplines (Banta, 2006). According to him, there are milestones that must be accomplished for LSA to get implemented in higher education:

* Agreement by the faculty on the general learning outcomes to be assessed.
* Operationalization of these outcomes so they can be translated to rubrics applicable to student works coming from different fields.
* Systems to be designed by information technology experts for analyzing millions of students' artefacts against rubrics, aggregated for easy reporting. This way the students' capabilities to perform the desired learning outcomes can easily be communicated.
* Requirement for students to place artefacts in their portfolios over their entire academic program to assess development.
* Motivation of students to take portfolio development seriously. Without this, there will be no data to be collected and interpreted for LSA implementation.

References:
Guzman, E. & Adamos, J. (2015). Assessment of Learning. Adriana Publishing.
Balagtas, M. et al. (2020). Assessment in Learning 1. Rex Bookstore.

CHAPTER 13
Development of Large-Scale Student Assessment Test

Objective:
* Trace the development process of a large-scale assessment test.

Overview
Large-scale student assessment (LSA) has indeed come a long way in being used for different purposes, with improvement of student performance never failing to rank first in importance. The preceding chapter concentrated on providing a functional understanding of the nature of large-scale assessment, its purposes and uses, as well as how it can be made more effective in its use to improve student achievement.

The present chapter divulges the process of developing a large-scale assessment test following the traditions embraced by well-known international testing agencies that have expanded the science of LSA through the years (https://www.ets.org/, http://www.iea.nl).

Understanding the development process for LSA will be approached using the process for classroom-based tests (cf. Section 3) as a jumping board. Hopefully, by the time you reach the end of the book, you will be able to adequately describe the universal process in developing LSA tests. As a future teacher and perhaps a school administrator later on, this relevant knowledge of the test development process could entice you to get involved in some assessment project, either as a test developer or as a consumer of LSA results.

REVIEW OF CLASSROOM TEST DEVELOPMENT PROCESS
Within the context of summative assessment in the classroom, the suggested phases of work in developing the test generally consist of the following procedural steps to ensure basic validity requirements:

Planning the test, which specifies:
* Purpose of the test
* Test blueprint: test format, number of items
* Learning outcomes

Item construction, which is performed by the classroom teacher following a table of specifications.

Review and revision for item improvement:
* Judgmental approach, before and after administration of the test
  - by the teacher/peers, to ensure the accuracy and alignment of test content to learning outcomes
  - by the students, to ensure comprehensibility of items and test instructions
* Empirical approaches, after administration of the test
  - obtain item statistics in the form of quality indices (a small sketch of two such indices follows below)
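To make these quality indices concrete, here is a minimal sketch in Python, assuming a small invented matrix of scored responses. It computes the two indices most often reported after administration: item difficulty (the proportion of examinees answering correctly) and item discrimination (how much better high scorers do on an item than low scorers). The data and the top/bottom-third grouping are illustrative assumptions, not prescriptions from this chapter; a 27% upper/lower split is another common convention.

```python
# A minimal sketch of post-administration item statistics (illustrative data).
# Rows are students; columns are items scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
]

n_students = len(responses)
n_items = len(responses[0])
totals = [sum(row) for row in responses]

# Difficulty index: proportion of students answering each item correctly.
difficulty = [sum(row[i] for row in responses) / n_students for i in range(n_items)]

# Discrimination index: difficulty in the upper group minus the lower group,
# here taking the top and bottom thirds of students ranked by total score.
ranked = sorted(range(n_students), key=lambda s: totals[s], reverse=True)
k = max(1, n_students // 3)
upper, lower = ranked[:k], ranked[-k:]

def group_difficulty(item, group):
    return sum(responses[s][item] for s in group) / len(group)

discrimination = [
    group_difficulty(i, upper) - group_difficulty(i, lower) for i in range(n_items)
]

for i in range(n_items):
    print(f"Item {i + 1}: difficulty={difficulty[i]:.2f}, "
          f"discrimination={discrimination[i]:.2f}")
```

Items with extreme difficulty values, or with low or even negative discrimination, are the usual candidates for the revision or removal described above.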
A teacher of whatever level defines to himself/herself, even in the simplest way, why s/he is going to prepare a test, what s/he will assess and how s/he will test it! This is the design or planning phase from which every assessment tool, for whatever purpose it will be used, will have to be drawn. The item construction which follows gives flesh to the test. To less informed or less conscientious teachers, test construction is item construction, full stop! To them, what comes before and after item construction, which is reviewing the items, is of little consequence. Hopefully, the present course on assessment will bring about changes in the way you view test development as a process, in order for testing results to get maximized. Score-based inferences on student performance can only be appropriately made if the test from which they are derived has been constructed properly.

DEVELOPMENT PROCESS FOR LARGE-SCALE TESTS
Changing the context from classroom to system-wide testing, there are other significant considerations that must be in place in addition to what is required by a teacher-made test. With an understanding of the nature of large-scale student assessment, more questions must be addressed in the development process concerning the purpose of the test, coverage, length of the test, review of items for quality and fairness, and such technical merits as validity and reliability, among others.

Watch the video presentation showing how "ETS creates fair, meaningful tests and test questions" to guide your discussion later (http://www.ets.org/understanding_testing/test_development/).

What do you see as common steps between developing classroom tests and large-scale tests?
- They both need a test framework specifying the purpose of the test, what is to be measured, to whom the test will be administered, what test format to use, the length of the test, etc.
- They both need to prepare a test blueprint or table of specifications that specifies the content, the knowledge and skills to be covered, and the number of items to be prepared for each learning outcome (a toy sketch of such a blueprint follows below).
- There is a need to review the items to ensure that the items measure the intended outcomes, the non-ambiguity of the problem, the plausibility of the distracters, and the correctness of the keyed option.
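By way of illustration only, the sketch below encodes a hypothetical table of specifications as a simple Python structure; the outcomes, content areas, formats, and item counts are invented and stand in for whatever the curriculum actually prescribes.

```python
# A hypothetical table of specifications for a 40-item test (illustrative only).
# Each row ties a learning outcome to its content area, cognitive level,
# item format, and the number of items allotted to it.
blueprint = [
    {"outcome": "Identify parts of speech", "content": "Language",
     "level": "Remembering", "format": "multiple-choice", "items": 10},
    {"outcome": "Interpret a short passage", "content": "Reading",
     "level": "Understanding", "format": "multiple-choice", "items": 15},
    {"outcome": "Solve two-step word problems", "content": "Mathematics",
     "level": "Applying", "format": "constructed-response", "items": 15},
]

total = sum(row["items"] for row in blueprint)
print(f"Total items planned: {total}")  # 40
```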
The LSA process, however, spends much more time and effort in carrying out multiple checks and balances. The various types of review to be undertaken, i.e. content, fairness, editorial, stakeholder, and statistical reviews, also suggest the involvement of several committees or kinds of expertise: curriculum experts, teachers, item developers, testing experts, language specialists, sociologists, psychometricians, statisticians and large-database specialists. These are reflected in two steps which apparently are not done with classroom tests: pilot testing with sample groups whose characteristics are similar to the target population, and the statistical review that establishes the psychometric integrity of the items and the test as a whole, in terms of gathering empirical evidence for the validity of its score interpretation and its reliability in terms of consistency of scores obtained across versions of the test.

KEY STEPS IN LARGE-SCALE TEST DEVELOPMENT
The test development process is basically influenced by the Standards for Educational and Psychological Testing developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1985). While they are regarded as criteria for evaluating tests (cf. Section 2), they serve as the foundation for the process. Given these standards, ETS has developed its stringent guidelines contained in the 2014 ETS Standards for Quality and Fairness (PDF) (https://www.ets.org/s/about/pdf/standards.pdf) for its specific standards on "Validity," "Fairness," "Reliability," "Scoring," and "Reporting Test Results," in addition to "Test Design and Development." These have defined the key steps in the development of large-scale tests (see Table 13.1) (http://www.ets.org/understanding_testing/test_development/).

Table 13.1. Key Steps in Large-Scale Test Development and the Fundamental Questions to be Addressed

Step 1: Defining Objectives
* Who will take the test and for what purpose?
* What skills and/or areas of knowledge should be tested?
* How should test takers be able to use their knowledge?
* What kinds of questions should be included? How many of each kind?
* How long should the test be?
* How difficult should the test be?

Step 2: Item Development Committees
Who will be responsible for:
* defining test objectives and specifications;
* helping ensure test questions are unbiased;
* determining test format (e.g., multiple-choice, essay, constructed-response, etc.);
* considering supplemental test materials;
* writing test questions; and
* reviewing test questions, or test items, already written?

Step 3: Writing and Reviewing Questions
Item developers and reviewers must see to it that each item:
* has only one correct answer among the options provided in the test; and
* conforms to the style rules used throughout the test.
There are scoring guides for open-ended responses (e.g. short written answers, essays and oral responses).

Step 4: Pretest
The items are pretested on a sample group similar to the population to be tested. Results should determine:
* the difficulty of each question;
* if questions are ambiguous or misleading;
* if questions should be revised or eliminated; and
* if incorrect alternative answers should be revised or replaced.

Step 5: Detecting and Removing Unfair Questions
After pretesting, test reviewers re-examine the items.
* Are there any test questions which have language, symbols or words and phrases inappropriate or offensive to any subgroup of the population?
* Are there questions on which one group consistently performs better than other groups?
* What items further need revision or removal before the final version is made?

Step 6: Assembling the Test
After the test is assembled, item reviewers prepare a list of correct answers, which is compared with existing answer keys.
* Are the intended answers indeed the correct answers?

Step 7: Making Sure that the Test Questions are Functioning Properly
After test administration, statisticians perform an analysis of results to find out if the test is working as intended.
* Is the test valid? Are the score interpretations supported by empirical evidence?
* Is the test reliable? Can the performance on one version of the test predict performance on any other version of the test?
* What corrective actions need to be done when problems are detected before final scoring is done?
ESTABLISHING VALIDITY OF TESTS
Validity is regarded as the basic requirement of every test. It refers to the degree to which a test measures what it is intended to measure. Can the test perform its intended function? This is the business of validity, and the one adopted by the classical model for regarding validity. There are three conventional types of validity according to this model: content validity, criterion-related validity and construct validity (Anastasi and Urbina, 1997).

Content validity refers to how well the test covers a representative sample of the behavior domain to be measured. For educational tests, it is established by a systematic examination of the course syllabi, which becomes the basis for the selection of the outcomes to be included in the preparation of the test specifications. Alignment of the items to these intended behaviors serves as evidence of content validity.

Criterion-related validity is of two kinds, concurrent and predictive validity, both of which depend on the strength of correlation between the test scores and another external measure, or criterion, of the behavior intended to be measured. The two kinds differ in how they assure what the test measures. A significant correlation between the test scores and another related test administered within almost the same time frame indicates concurrent validity, i.e. that the former similarly measures what the other, external test is measuring. Predictive validity is similar in procedure except that the data for the external measure are gathered after a considerable period of time after the new test has been administered. The new test performs as the predictor variable while the external measure is the criterion variable. For instance, predictive validity for an entrance test is evident if its scores can predict students' grade point average after the first semester, based on the magnitude of the correlation between these variables, as the sketch below illustrates.
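As a worked sketch of that predictive-validity check, the snippet below computes the Pearson Product Moment Correlation between hypothetical entrance-test scores and first-semester grade point averages. The data are invented, and the pearson_r helper is our own name for the computation; it reappears in the reliability sketches further down.

```python
import math

def pearson_r(x, y):
    """Pearson Product Moment Correlation between two paired score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )

# Invented data: entrance-test scores (predictor) and first-semester GPA (criterion).
entrance = [85, 78, 92, 70, 88, 75, 95, 80]
gpa = [3.2, 2.8, 3.6, 2.5, 3.4, 2.9, 3.8, 3.0]

print(f"Predictive validity coefficient r = {pearson_r(entrance, gpa):.2f}")
```

A coefficient near +1.0 would support the claim that the entrance test forecasts early college performance; a coefficient near zero would undermine it.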
Construct validity involves empirical examination of the psychological construct hypothetically assumed to be measured by the test. It is established by doing a factor analysis of the test items to bring out what defines the overall construct. It determines whether the test measures a unitary construct or a multi-dimensional construct, as shown by the resultant factors. These "validities" have for a while been what educational and psychological tests are required to establish.

While validity is spoken of with reference to what the test purports to measure, the concept as applied to large-scale testing has shifted to "the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of the tests" (Messick, 1995, p. 741, as cited by Crundwell, 2005). This modern model of validity focuses significantly on validating score interpretations through established evidences to support them. According to Messick (1995), there are five categories of evidence supporting a score interpretation, which have brought about other forms of validity:

1. Evidence based on test content
2. Evidence based on response processes
3. Evidence based on internal structure
4. Evidence based on relations to other variables
5. Evidence based on consequences of testing

Pertinent to large-scale tests, validity refers to the meaning placed on test results, "how test results are used and how they impact individuals, persons and society as a whole" (College Board Inspiring Minds, 2011). Thus, working on the validity of a large-scale test calls for explicitly validating its use for a specific purpose. If a test developer claims that his test is a readiness examination for entering college, then what may be required for validity are research evidences displaying the capability of the test to perform this qualifying function. For instance, comparing the grades of a sample group that passed the readiness test and a group that did not take the test at all may be a potential case for validating the test. If more of the test takers were able to enter college than those who did not take the test at all, it is evidence of what the test user claims the test can do. Probability of success in getting accepted into a college program can be inferred from the test.

The type of validity that looks into the social impact of a test result on an individual, a group or a school is referred to as consequential validity (Crocker and Algina, 1986; Messick, 1995). Some test users, for instance, claim that the effectiveness of their school programs could be determined using the students' test scores. This calls for consequential validation of whether the scores of the students could indeed be used as indicators of school effectiveness. What evidence can be presented to support this interpretation, or is this rather a threat to validity? Whatever large-scale assessment tests allege to perform needs appropriate supporting evidence to establish validity.

ESTIMATING RELIABILITY OF LARGE-SCALE TESTS
Another challenge for large-scale tests is their reliability: to show that they can produce stable and consistent results. This refers to consistency of measurement over time, across versions of the test, across the test items, and across scoring judges for the same group of individuals (Phelan and Wren, 2006).

Reliability is related to the concept of error of measurement, which indicates the degree of fluctuation likely to occur in an individual score as a result of irrelevant, chance factors, which Anastasi and Urbina (1997) call error variance. This occurs when the differences between scores are not attributable to the construct being measured but are simply due to chance, something which cannot be controlled. Thus an instrument with a large error variance is likely not to be valid, since the scores are not indicative of the persons' true scores but are likely just a product of chance. As it is said, "all valid tests are reliable but not all reliable tests are valid!" A person can claim he's 30 in age every time, any time, whoever asks him, but in truth, he's 50! Perfect reliability but zero validity!

There are several ways of estimating the reliability of a test, and they are grouped according to the number of times the test is administered to the same group of students. With two test sessions, there are test-retest reliability, where the same test is given twice with a time interval not exceeding six months, and alternate-form reliability, where two comparable versions of the test are administered to the same individuals. Administration of the two forms can be done immediately, one after the other, or delayed with an interval not exceeding six months. The latter is also widely known as parallel-form reliability, since the forms emerge from the same table of specifications. The nature and strength of the relationship or correspondence between the two sets of scores is then established using the coefficient of correlation (Anastasi, 1976). This value ranges from -1.0 to +1.0; the closer it gets to ±1.0, the more consistent are the scores obtained from the two test trials. To obtain the reliability coefficient in these two types, the Pearson Product Moment Correlation is used to get the coefficient of correlation (r) with this well-known formula:

r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ n \sum x^2 - (\sum x)^2 \right]\left[ n \sum y^2 - (\sum y)^2 \right]}}

With only a single administration, split-half reliability is workable. This divides the test into two halves using the odd-even split: all the odd-numbered items make up Form A while the even-numbered items compose Form B. The coefficient of correlation between the two half tests is obtained using the Pearson Product Moment Correlation, with the Spearman-Brown formula applied to estimate the reliability of the full test. The Spearman-Brown formula is given here:

r_{tt} = \frac{2 r_{hh}}{1 + r_{hh}}

with r_{tt} as the reliability coefficient for the total test, and r_{hh} as the coefficient of correlation between the two half tests. A small sketch of this procedure follows.
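Here is a minimal sketch of the odd-even split-half procedure with the Spearman-Brown correction, reusing the pearson_r helper defined in the predictive-validity sketch above; the response matrix is again invented for illustration.

```python
# Split-half reliability with the Spearman-Brown correction (illustrative data).
# Rows are students; columns are items scored 1 (correct) or 0 (incorrect).
responses = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]

# Form A: odd-numbered items (list indices 0, 2, ...); Form B: even-numbered items.
form_a = [sum(row[0::2]) for row in responses]
form_b = [sum(row[1::2]) for row in responses]

r_hh = pearson_r(form_a, form_b)   # correlation between the two half tests
r_tt = (2 * r_hh) / (1 + r_hh)     # Spearman-Brown estimate for the full test

print(f"Half-test correlation r_hh = {r_hh:.2f}")
print(f"Estimated full-test reliability r_tt = {r_tt:.2f}")
```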
Inter-rater reliability assesses the degree to which different judges or raters agree in their assessment decisions. This is quite useful to avoid doubts about the scoring procedure of tests with non-objective items. The sets of scores obtained on the test from two raters can likewise be subjected to Pearson r to get the reliability coefficient.

The other type of reliability looks at the internal consistency of responses to all items. With the assumption that all items in the test are measures of the same construct, there will be inter-item consistency in the responses of the test takers. The procedure requires knowing how the individuals perform (i.e. pass/fail) on each item. Kuder-Richardson Formula 20 (K-R 20) is then applied to estimate the reliability coefficient:

r_{tt} = \frac{n}{n-1}\left[\frac{\sigma^2 - \sum pq}{\sigma^2}\right]

with r_{tt} as the reliability coefficient of the whole test, n as the number of items in the test, \sigma^2 as the variance of the whole-test scores, and pq as the cross product of the proportion of students who pass and the proportion who fail each item. A small sketch of this computation follows.
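The following sketch computes K-R 20 on a small invented response matrix. It assumes the population form of the variance (dividing by the number of students); texts differ on this convention, so treat it as one reasonable choice rather than the only one.

```python
# Kuder-Richardson Formula 20 on a small response matrix (illustrative data).
# Rows are students; columns are items scored 1 (pass) or 0 (fail).
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]

n_students = len(responses)
n_items = len(responses[0])
totals = [sum(row) for row in responses]

# Variance of the whole-test scores (population form: divide by n_students).
mean = sum(totals) / n_students
variance = sum((t - mean) ** 2 for t in totals) / n_students

# Sum over items of p*q, where p = proportion passing and q = 1 - p.
sum_pq = 0.0
for i in range(n_items):
    p = sum(row[i] for row in responses) / n_students
    sum_pq += p * (1 - p)

kr20 = (n_items / (n_items - 1)) * ((variance - sum_pq) / variance)
print(f"K-R 20 reliability estimate = {kr20:.2f}")  # 0.55 for this toy data
```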

Establishing the validity and estimating the reliability of tests are given attention in this last chapter to emphasize their significance in the development process of large-scale tests. Test documentation must include how reliability was estimated, and this may not be limited to only one type; the more evidence there is of the test's reliability, the more convincing the test becomes of its fidelity to measurement consistency. In terms of validity, supporting evidences for the possible score interpretations and recommended actions should be effectively reported. These two technical merits speak well of the test's usability for the recommended usage. With large-scale student assessment now growing in acceptance all over, it is important that the integrity of the development process be upheld.

References:
Guzman, E. & Adamos, J. (2015). Assessment of Learning. Adriana Publishing.
Balagtas, M. et al. (2020). Assessment in Learning 1. Rex Bookstore.