Principles of Test Construction and Administration
Summary
This document provides an overview of the principles of test construction and administration, focusing on their role in nursing education. It details different types of tests, such as aptitude and achievement tests, and explores the crucial concepts of validity and reliability in assessment.
PRINCIPLES OF TEST CONSTRUCTION AND ITS ADMINISTRATION
MEASUREMENT AND EVALUATION IN NURSING EDUCATION

OUTLINE
- Types of tests
- Principles of test construction
- Qualities of a good test
- Reliability
- Validity
- Practicability and economy
- Administration of tests
- Scoring of tests
- Item analysis

INTRODUCTION
A test is a particular type of assessment that typically consists of a set of questions administered during a fixed period of time under reasonably comparable conditions for all students. A test can also be described as an instrument or systematic procedure for measuring a sample of behaviour by posing a set of questions in a uniform manner. It is a form of assessment.

Measurement is the assigning of numbers to the results of a test or other type of assessment according to a specific rule (e.g., counting correct answers or awarding points for particular aspects of an essay). Measurement is also the process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic.

Types Of Tests
Tests may be classified into two broad categories on the basis of the nature of the measurement: measures of maximum performance and measures of typical performance.

Measures of maximum performance are procedures used to determine a person's ability. They are concerned with how well an individual performs when motivated to obtain as high a score as possible, and the result indicates what individuals can do when they put forth their best effort. Examples are aptitude tests, achievement tests and intelligence tests.

Measures of typical performance, on the other hand, are designed to reflect a person's typical behaviour. They fall into the general area of personality appraisal, such as interests, attitudes and various aspects of personal-social adjustment. Because testing instruments cannot adequately be used to measure these attributes, self-report and observational techniques such as interviews, questionnaires, anecdotal records and ratings are sometimes used. These techniques are used in relevant combinations to provide the desired results, on which accurate judgments concerning a learner's progress and change can be made.

Measures of Maximum Performance
Aptitude test (separate ability): Aptitude refers to natural talent or ability, especially in a specified area. Aptitude tests therefore measure specialised abilities and the potential to learn or perform new tasks that may be relevant to later learning or performance in a specific area; hence they are future-oriented. An example is the Common Entrance Examination into vocational schools and even secondary schools.

Achievement test: Achievement tests are designed to measure the effects of a specific programme of instruction or training, which the learners attain usually by their own effort. Generally, they represent a terminal evaluation of the learner's status on completion of a course of study or training. That is, they are used to determine how much the learner has learned from specified content via systematic and controlled instruction. End-of-term examinations and classroom tests are mostly achievement tests.

Achievement tests may be classified in the following ways:
- By mode of response: oral test, written test, practical test.
- By purpose of testing: placement test, formative test, diagnostic test, summative test.
- By desired speed of response: power test, speed test.
- By degree of rigour employed in preparation and scope of applicability: teacher-made tests, standardized tests.
- By mode of interpreting results: norm-referenced testing, criterion-referenced testing, self-referenced testing.
- By format of test items: objective test items, essay test items.
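The introduction above defines measurement as assigning numbers to test results according to a specific rule, such as counting correct answers. A minimal sketch of that rule in code (the answer key and the student responses are invented purely for illustration):

```python
# Measurement as "assigning numbers to test results by a specific rule":
# here the rule is simply one point per correct answer.
# The answer key and responses below are hypothetical examples.

answer_key = {"Q1": "B", "Q2": "D", "Q3": "A", "Q4": "C"}
responses  = {"Q1": "B", "Q2": "A", "Q3": "A", "Q4": "C"}

score = sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)
print(f"Raw score: {score} / {len(answer_key)}")  # prints: Raw score: 3 / 4
```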
Intelligence test (general mental ability test): Intelligence is the ability to reason and learn from experience. It is thought to depend both on inherited ability (nature) and on the surroundings in which a person is brought up (nurture). The first intelligence tests were devised by Alfred Binet in 1905 to give an intelligence quotient (IQ). An intelligence test provides an indication of an individual's general mental capacity and usually includes a wide variety of tasks so as to sample several aspects of cognitive function. Some people believe that intelligence can be expressed only in speech and writing and therefore cannot be tested.

Further Explanation on Types of Tests
Diagnostic tests: used at the onset of a lesson or research to evaluate the strengths and weaknesses of a group of people on a particular subject matter. This enables those administering the test to determine the areas on which to focus.
Formative tests: used to monitor the progress of a student during a period of instruction. The aim is to provide continuous feedback so that the educator can adjust his or her methods to best suit the student.
Summative tests: used to assess knowledge of subjects or courses taught over a period of time, in order to judge the student's overall progress and the effectiveness of teaching materials.
Norm-referenced test: a test whose scores are interpreted by comparing a student's performance with that of a class or other norm group.
Criterion-referenced test: a test set against predetermined standards to assess the knowledge students are required to have at each stage of learning.

Principles of Test Construction
Validity: A test should be constructed in such a way that it measures what it is supposed to measure, fulfilling the purpose for which the test was created. The validity of a test is determined by the relationship between the test and the criterion of efficiency.
Reliability: This refers to the consistency with which the test serves as a measuring instrument. In other words, a test is considered reliable when it gives similar results when administered on different occasions. Reliability is usually determined by the following methods:
- Administering the same test to the same group on two separate occasions and correlating the resulting series of scores (test-retest)
- Giving two or more different but equivalent forms of the same test and correlating the resulting scores
- Split-half (odd-even) method: the test is divided into two halves and the consistency of the scores on the two halves is assessed
Standardization: This is the administration of the test to a representative sample of people to establish a meaningful basis for comparison in future testing. Standardized tests produce a pattern of scores distributed on a bell-shaped curve, called the normal curve.
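Standardization, as described above, gives a raw score meaning by comparing it with the scores of a representative norm sample, whose scores typically follow the bell-shaped (normal) curve. A minimal sketch of that comparison, assuming invented norm-sample scores and expressing a new raw score as a z-score and an approximate percentile under the normal curve:

```python
import statistics
from math import erf, sqrt

# Hypothetical raw scores from a representative norm sample.
norm_sample = [42, 55, 61, 48, 70, 52, 58, 64, 47, 53]

mean = statistics.mean(norm_sample)
sd = statistics.pstdev(norm_sample)  # standard deviation of the norm group

def z_score(raw):
    """Express a raw score in standard-deviation units above or below the norm mean."""
    return (raw - mean) / sd

def approx_percentile(raw):
    """Approximate percentile rank, assuming the norm-group scores follow a normal curve."""
    return 0.5 * (1 + erf(z_score(raw) / sqrt(2))) * 100

raw = 63
print(f"z = {z_score(raw):.2f}, approximate percentile = {approx_percentile(raw):.0f}")
```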
Qualities Of A Good Test
Measures all instructional objectives:
- Objectives that are communicated and imparted to the students.
- Designed as an operational control to guide the learning sequences and experiences.
- Harmonious with the teacher's instructional objectives.
Covers all learning tasks:
- Measures a representative part of the learning tasks.
Appropriate testing strategies or items:
- Items which appraise the specific learning outcomes.
- Measurements or tests based on the domains of learning.
The test should be valid and reliable:
- A test is reliable when it produces dependable, consistent and accurate scores.
- A test is valid when it measures what it purports to measure.
- Tests which are written clearly and unambiguously are more reliable.
- Tests with more items are generally more reliable than tests with fewer items.
- Tests which are well planned, cover wide objectives and are well executed are more valid.
The test can be used to improve learning:
- A test is not only an assessment but also a learning experience.
- Going over the test items may help teachers to reteach missed items.
- Discussion and clarification of the right choices provide further learning.
- Further guidance and modification of teaching measures are enabled through the revision of the test.
Norm-referenced and criterion-referenced tests:
- Norm-referenced: higher and more abstract levels of the cognitive domain.
- Criterion-referenced: lower and more concrete levels of learning.

When constructing or selecting an assessment, the most important questions are:
1. To what extent will the interpretation of the scores be appropriate, meaningful and useful for the intended application of the results?
2. What are the consequences of the particular uses and interpretations that are made of the results?

Assessments take a wide variety of forms, ranging from the familiar multiple-choice or other fixed-response test to extended observation of performance. But regardless of the type of assessment used or how the results are to be used, all assessments should possess certain characteristics. The most essential of these are validity, reliability and usability.

VALIDITY
Validity refers to the adequacy and appropriateness of the interpretations made from assessments, with regard to a particular use. For example, if an assessment is to be used to describe student achievement, we should be able to interpret the scores as a relevant and representative sample of the achievement domain to be measured. If the results are to be used to predict students' success in some future activity, we should like our interpretation to be based on as good an estimate of future success as possible. On the other hand, if the results are to be used as a measure of students' reading comprehension, we should like our interpretations to be based on evidence that the scores actually reflect reading comprehension and are not distorted by irrelevant factors. Basically, validity is always concerned with the specific use of assessment results and the soundness of our proposed interpretations of those results.

NATURE OF VALIDITY
When judging the quality of a test, the term validity in relation to testing and assessment is of importance, and there are a number of cautions to bear in mind:
1. Validity refers to the appropriateness of the interpretation of the results of an assessment procedure for a given group of individuals, not to the procedure itself.
2. Validity is a matter of degree; it does not exist on an all-or-none basis.
3. Validity is always specific to some particular use or interpretation (no assessment is valid for all purposes).
4. Validity is a unitary concept; the conceptual nature of validity has typically been described for the testing profession in a set of standards prepared by a joint committee made up of members from three professional organizations that are especially concerned with educational and psychological testing and assessment.
5. Validity involves an overall evaluative judgement; it requires an evaluation of the degree to which interpretations and uses of assessment results are justified by supporting evidence and in terms of the consequences of those interpretations and uses.

MAJOR CONSIDERATIONS IN ASSESSMENT VALIDATION
1. Content validation: how well the sample of assessment tasks represents the domain of tasks to be measured.
2. Test-criterion relationship: how well performance on the assessment predicts future performance, or estimates current performance, on some valued measure other than the test itself (called the criterion).
3. Construct: how well performance on the assessment can be interpreted as a meaningful measure of some characteristic or quality.
4. Consequences: how well the use of assessment results accomplishes intended purposes and avoids unintended effects.

RELIABILITY
NATURE OF RELIABILITY
Reliability refers to the consistency of measurement, that is, how consistent test scores or measurement results are from one measurement to another. The reliability of a test is its ability to measure what it is supposed to measure consistently. A highly valid test tends to give consistent results whenever it is used; however, a test may be reliable but not valid. For instance, a reliable physics test administered to a law student may consistently result in failure grades any time the test is administered. The test is reliable, but not valid for a student who is not a physics student.

A test score gained by a student in a subject area may change from one administration to another, from time to time, and from place to place. Similarly, if another test is prepared based on the same syllabus, course work or lesson notes, the candidates' scores may be different. This means that the scores are not consistent. Based on these inconsistencies, there are four different estimates of test reliability:
1. Scoring (rating) reliability coefficient
2. Coefficient of stability
3. Coefficient of internal consistency
4. Coefficient of equivalence

Scoring reliability coefficient: the correlation coefficient between scorers, known as the inter-scorer (inter-rater) reliability coefficient. This type of reliability is useful when the number of candidates to be observed or scored is large and the expert needs more hands to assist in coping with the number of candidates.

Coefficient of stability: a measure of how stable an instrument is over a period of time, represented by test-retest reliability. To determine this reliability coefficient, the instrument is first administered to the candidates and their scores are recorded; after a period of 6-8 weeks, the test is re-administered to the same students, and the two sets of scores are correlated using the Pearson product-moment correlation.

Coefficient of internal consistency: an important parameter in the construction of measurement instruments. It is a measure of the degree to which the test items are homogeneous, i.e. measuring the same things, talents or skills. It can be determined by any of three methods: split-half, the Kuder-Richardson formula method, and factor analysis.

The split-half method involves:
a. one test form,
b. one group of candidates,
c. one test administration.
The procedure is as follows:
1. The total test is administered to a group of students.
2. The test is divided into two comparable halves.
3. Each person's score on each of the two halves is computed.
4. The two sets of scores are correlated using the Pearson product-moment correlation formula, and a further correction formula, the Spearman-Brown prophecy formula, is applied to adjust the reliability coefficient.
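A minimal sketch of the split-half procedure just outlined, using invented item scores (1 = correct, 0 = incorrect) and applying the Spearman-Brown correction to the half-test correlation:

```python
# Split-half reliability with the Spearman-Brown correction (steps 1-4 above).
# Each row holds one hypothetical student's item scores (1 = correct, 0 = incorrect).
students = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
]

def pearson(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Step 2: divide the test into two comparable halves (odd- vs even-numbered items).
odd_half = [sum(s[0::2]) for s in students]
even_half = [sum(s[1::2]) for s in students]

# Steps 3-4: correlate the half-test scores, then adjust for full test length
# with the Spearman-Brown prophecy formula: r_full = 2*r_half / (1 + r_half).
r_half = pearson(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected reliability = {r_full:.2f}")
```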
Kuder-Richardson method
Kuder and Richardson (1939) developed two formulas to estimate the internal consistency of a test instrument. (Assignment: what are the two formulas?) The use of the formulas faces two problems:
a. they are not appropriate for some achievement tests that are not homogeneous;
b. they are also not appropriate for highly speeded tests (e.g. aptitude tests).

Internal consistency by factor analysis
Factor analysis is a generic term associated with a number of multivariate statistical methods that model sets of manifest or observed variables in terms of linear functions of latent, unobserved variables (Mulaik, 1982). In other words, in factor analysis the dependent variables are manifest or observed variables that depend linearly on a set of latent, unobserved independent variables. Factor-analytic methods are distinguished by whether their aim is exploratory or confirmatory: exploratory methods seek to discover the important latent variables of a new domain of variables, while confirmatory methods seek to confirm hypotheses regarding the structural composition of manifest variables.

Coefficient of equivalence
This is the degree to which different forms of the test (equivalent or alternate forms) give the same or consistent results during administration. It is obtained by:
1. administering two different forms of the same test to the same set of students,
2. one form after the other, without any time lapse between the two.

FACTORS INFLUENCING RELIABILITY MEASURES
1. Number of assessment tasks
2. Spread of the scores
3. Objectivity
4. Method of estimating reliability
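The first factor, the number of assessment tasks, can be made explicit with the general form of the Spearman-Brown formula, quoted here as a standard result for illustration (it is not stated in the notes above). If a test is lengthened to $k$ times its original number of comparable items, the predicted reliability is

$$ r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}, $$

where $r_{11}$ is the reliability of the original test. Setting $k = 2$ gives the split-half correction used earlier, $r_{\text{full}} = 2\, r_{\text{half}} / (1 + r_{\text{half}})$.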
USABILITY
In selecting assessment procedures, practical considerations cannot be neglected. Assessments are usually administered and interpreted by teachers or experts with only a minimum of training in measurement, and the time available for assessment is almost always limited, because assessment is in constant competition with other important activities for time in the school schedule. These and other factors pertinent to the usability of assessment procedures must be taken into account when selecting them:
1. Ease of administration
2. Time required for administration
3. Ease of interpretation and application
4. Availability of equivalent or comparable forms
5. Cost of testing

Economy and Practicality
The practicality of a test refers to the factors taken into consideration when determining whether to use it. There are five practical considerations to weigh when deciding to use a test:
Ease of administration: A test must be easy for the tester to administer. Instructions must be simple and clear, subsets of questions must not be too long, and the time required must be appropriate.
Time required: The safest procedure is to allot as much time as the test requires to provide reliable results; allowing too little time can make the test unreliable.
Ease of interpretation and application: If the test is misinterpreted, it will prove harmful to the testees, and if it is wrongly applied, it will lose its usefulness.
Availability of equivalent forms: Equivalent forms help to verify test scores.
Cost of testing: A test should be economical in terms of preparation, administration and scoring.

Administration of Tests
Test administration is concerned with the physical and psychological environment in which the testees take the test. Ensuring adequate conditions for the test enables the production of valid and reliable results. There are three stages in administering a test: before the test, during the test, and after the test.

Before the test, the responsibilities of the tester are:
- Ensuring all items are grouped in a similar format to avoid confusion
- Ensuring items are well spaced and easy to read
- Ensuring diagrams, illustrations and charts are placed accordingly
- Ensuring the method by which answers will be recorded has been determined
- Ensuring test items are arranged appropriately, from easy to hard
- Ensuring the test is properly proofread
- Ensuring enough copies of the test items are available
- Ensuring the confidentiality of the test is maintained
- Ensuring the environment is adequate to carry out the test in the time given
- Informing students of any necessary information before they take the test
- Ensuring all students have the appropriate requirements for the test
- Ensuring the atmosphere is calm and settled before commencing

During the test, the responsibilities of the tester, to prevent the test from becoming invalid and unreliable, are:
- Maintaining a good seat space between testees to prevent potential cheating
- Ensuring all extra papers used for calculations are returned after the test
- Monitoring students' movements closely during the period of the test
- Using two different sets of test items to prevent cheating

After the test, the main goal is to maintain orderliness while ensuring all test items are collected by the tester. After this is done, the test administrator is to:
- Ensure the number of test items is complete by counting them
- Ensure all unused items are properly kept and documented
- If the administrator is the teacher, develop an adequate marking scheme
- Evaluate the overall performance of the group and assess students individually

GUIDES TO THE CONSTRUCTION OF QUESTIONNAIRES
It is important to note that every item in a questionnaire constitutes a hypothesis, part of a hypothesis or a research question. Therefore, the inclusion of every item should be based on the understanding that the answer is significant to the central problem. Other guidelines include:
A. Make the first set of questions arouse the interest of the respondents.
B. Graduate from simple to complex items.
C. Arrange the questionnaire in such a way that the respondent moves from one frame of reference to another smoothly, without jumping back and forth between them. In this connection, items should be grouped under scale headings, with each heading consisting of homogeneous items.

Factors that affect response rate:
i. The clarity and attractiveness of the questionnaire
ii. The length of the questionnaire
iii. The introduction letter and guarantee of confidentiality
iv. The interest of the research objectives to the respondent
v. The inducements offered to reply

Characteristics of a good questionnaire include: consistency, usability, clarity, quantifiability and legibility.