PSYC 5123 Educational Psychology Lecture Notes PDF

Summary

These lecture notes cover educational psychology topics such as assessment, testing, types of tests, and types of scores, including how to use and interpret grade-equivalent (GE) scores. The focus is on how to evaluate and measure student learning.

Full Transcript


PSYC 5123 Educational Psychology, 2023-2024 Term 2
Lecture 9 (March 14): Assessment & Testing

Lecture map: ASSESSMENT
- Tests: types of tests (classroom, standardized; criterion- vs. norm-referenced) and test properties (reliability, validity; objective/subjective/performance-based)
- Measurement: levels of measurement (nominal, ordinal, interval, ratio) and types of scores (raw/base score, standard scores, grade equivalents, percentile rank)
- Evaluation: purposes (formative, summative, diagnostic)
- Interpreting and summarizing data: distributions, central tendency, dispersion, association

Defining the Terms
- Assessment: "The process of obtaining information that is used to make educational decisions about students, to give feedback to the student about his/her progress, strengths, and weaknesses, to judge instructional effectiveness and curricular adequacy, and to inform policy" (American Federation of Teachers, National Council on Measurement in Education, & National Education Association, 1990, p. 1).
- Tests: a sampling of human behavior or attributes.
- Measurement: the process of converting information to numerical representations.
- Evaluation: specific judgments or decisions based on assessments.

Purposes of Evaluation
- Formative: ungraded; used before or during instruction to plan future instruction.
- Diagnostic: ungraded; used to pinpoint a student's strengths and weaknesses.
- Summative: graded; follows instruction to determine whether goals have been met; used to reach a terminal decision about the student or the instruction (e.g., to assess achievement).

Test Properties: Reliability
Reliability is the degree to which a measure yields consistent results.
Types of reliability:
- Test-retest (stability): consistency of results across multiple administrations.
- Alternate form (equivalence): comparable scores for the same individuals taking different forms of the same test.
- Inter-rater/observer: level of agreement between the observations of independent observers.
Dimensions of reliability for classroom assessments:
1. Can I make a dependable statement about what students have learned in relation to my goals and objectives?
2. Do my test items or performance directions give clear, unambiguous expectations for students?
3. Am I consistent in making decisions across students and across similar types of work?

Test Properties: Validity
Validity is the degree to which a test measures what it claims to measure.
Types of validity:
- Content: extent to which the content represents a balanced and adequate sampling of the outcomes about which inferences are to be made.
- Predictive: accuracy with which a test is indicative of performance on a future measure.
- Concurrent: extent to which tests that measure the same content or construct are in agreement.
Dimensions of validity for classroom assessments:
1. Do the concepts and processes required in the assessment really reflect the content of the subject area or discipline?
2. Do items actually draw out the targeted concepts and skills, and do students perform similarly on different assessments of the same knowledge and skills?
3. Do the assessments fit the instructional methods and work equally well across groups and settings?
4. What are the relationships between the assessment and other measures of background variables?
5. Are there negative consequences of assessments and grading practices that could be prevented if the assessment had been more valid?

Discussion
- Why are reliability and validity important in classroom assessments?
- Have you ever questioned the reliability of an assessment?
- Have you ever questioned the validity of an assessment?
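In practice, the reliability types above are reported as simple statistics: a correlation between two administrations for test-retest reliability, or an agreement rate between raters for inter-rater reliability. The Python sketch below is a rough illustration only; the scores and rater data are invented and are not from the lecture.

```python
# Illustrative only: invented scores for ten students.
from statistics import correlation  # Python 3.10+

# Test-retest reliability: correlate scores from two administrations of the same test.
first_try  = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]
second_try = [13, 14, 10, 19, 15, 10, 16, 12, 11, 18]
test_retest_r = correlation(first_try, second_try)

# Inter-rater reliability (simplest form): percent exact agreement between
# two independent raters scoring the same essays on a 1-4 scale.
rater_a = [3, 2, 4, 1, 3, 2, 4, 3]
rater_b = [3, 2, 3, 1, 3, 2, 4, 2]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"Test-retest r = {test_retest_r:.2f}")      # closer to 1.0 = more stable results
print(f"Inter-rater agreement = {agreement:.0%}")  # closer to 100% = more consistent scoring
```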
Types of Tests
Classroom tests:
- Used to assess students in individual classrooms.
- Often created or modified by classroom teachers.
- May be norm- or criterion-referenced.
Standardized tests:
- Used to evaluate students across a range of schools, countries, experiences, etc.
- Typically given to large groups of students.
- Administered under uniform conditions and scored according to uniform procedures.
- May be norm- or criterion-referenced.

Norm-Referenced vs. Criterion-Referenced Tests
Norm-referenced tests: any test in which a score acquires additional meaning by comparing it to the scores of people in an identified norm group.
- Norm grading: assessment of students' achievement in relation to one another.
- Norming group: a group whose average score serves as a standard for evaluating any student's score on the test.
Criterion-referenced (standards-based) tests: any test in which scores are compared to a set performance standard.
- Criterion grading: assessment of each student's mastery of course objectives.

Norm-referenced assessments
- Strengths: can measure a wide range of objectives; provide a description of overall achievement; the difficulty or ease of the test is less important; appropriate when only the top few candidates can be accepted.
- Weaknesses: do not inform on students' readiness for more advanced material; encourage competition and comparisons; force failure and success regardless of actual performance; who the norming group is greatly affects an individual's reported scores.
Criterion/standards-based assessments
- Strengths: measure mastery of specific objectives; scores have meaning in terms of what students know or can do; comparisons are made to pre-set standards; can be more useful ("Child can do X" rather than "Child is better than others").
- Weaknesses: cutoffs and standards may be arbitrary; cannot make comparisons to a larger group; not everything can be broken down into separate measurable objectives.

Format Type
- Objective tests/assessments: scoring does not require interpretation (e.g., multiple choice).
- Subjective tests/assessments: scoring requires interpretation (e.g., essay).
- Performance-based/authentic assessments: measurement of important abilities using procedures that simulate the application of those abilities; procedures that test skills and abilities as they would be applied in a real-life setting.

Types of Scores
- Base/raw score: a person's observed score on a test.
- Percentile rank: the percentage of scores falling at or below a certain point in the score distribution.
- Standard scores: a general term for scores that have been "transformed" for reasons of convenience, comparability, ease of interpretation, etc.
- Grade-equivalent (GE) scores: a measure of grade level based on comparison with norming samples from each grade.
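To make the percentile-rank and standard-score definitions concrete, here is a small Python sketch that converts one raw score into a percentile rank and a z-type standard score. It is illustrative only; the class scores and the raw score of 41 are invented, not from the lecture.

```python
# Illustrative only: invented raw scores for a class of 20 students.
from statistics import mean, pstdev

scores = [34, 41, 38, 29, 45, 37, 40, 33, 36, 42,
          31, 39, 44, 35, 30, 43, 37, 32, 41, 38]
raw = 41  # one student's observed (raw/base) score

# Percentile rank: percentage of scores falling at or below this raw score.
percentile_rank = 100 * sum(s <= raw for s in scores) / len(scores)

# Standard (z) score: distance from the group mean in standard-deviation units.
z = (raw - mean(scores)) / pstdev(scores)

print(f"Raw score:       {raw}")
print(f"Percentile rank: {percentile_rank:.0f}")  # % of class scores at or below the raw score
print(f"Standard (z):    {z:.2f}")                # 0 = class average, +1 = one SD above the mean
```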
Using and Interpreting GE Scores
- Interpretation: if a student scores at the average of all 5th graders tested in the first month of the school year on the 5th-grade test, the student obtains a GE of 5.1.
- Example: a student at the end of 4th grade receives a GE of 8.7 on the 4th-grade math sub-test. What does it mean? This student is not necessarily ready for 8th-grade work; the student obtained the same score that an average student in the 7th month of 8th grade would have obtained, had that 8th-grade student taken the 4th-grade test.

Alternative Assessments
Performance-based assessments emphasize the application of knowledge and assess conceptual knowledge.
- Advantages: assess complex ideas; require complex mental processes; procedural orientation; greater depth of exploration; student engagement.
- Disadvantages: limited scope; production demands; time and energy; providing needed scaffolding; scoring difficulties.

Scoring with Rubrics
What are they?
- Criteria indicative of competent performance, with scores set for the level of performance on each criterion.
- A tool that lists what counts and articulates gradations of quality (a small scoring sketch appears at the end of these notes).
Why use them?
- Improve teaching and student performance.
- Allow students to self-monitor; students become more thoughtful judges of work.
- Expectations are clear, as is how to meet them.
- Make it easier to spot and solve problems in student learning.
- Reduce time spent evaluating work.
- Can stretch to reflect the abilities of students.
- Easy to use and explain.
How to create a rubric:
1. Look at models: show examples of good and not-so-good work.
2. List criteria: list the criteria for what counts as quality work.
3. Practice on models: use the rubric to evaluate the models.
4. Use self- and peer-assessment: revise work based on feedback and evaluation.
5. Use the same rubric for self-assessment and teacher assessment.
Tips: avoid unclear language; avoid negative language.

Portfolios
What are they?
- A collection of work that represents certain skills, knowledge, or abilities.
- A showcase of accomplishments: best work, or a demonstration of learning and growth.
Advantages:
- Self-assessment: self-determination and self-evaluation; students justify the inclusion of different pieces.
- Individualization: a unique representation of skills and abilities.
- Accumulated evaluations: work samples collected over days, weeks, months, or years.
Disadvantages:
- Sampling problems: what should be included? Does the student know what is most appropriate?
- Scoring consistency: score twice?
- Time demands.

(The closing slide repeats the lecture map of assessment, measurement, test, and evaluation topics from the start of the lecture.)
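As mentioned in the rubric section above, rubric scoring amounts to assigning each criterion a level and combining the levels into a total. The sketch below is a rough illustration under invented assumptions (the four criteria, the 4-level scale, and the student's levels are hypothetical, not from the lecture).

```python
# Illustrative only: a hypothetical 4-level essay rubric (1 = beginning, 4 = exemplary).
criteria = ["thesis", "evidence", "organization", "mechanics"]

def score_with_rubric(levels: dict[str, int]) -> tuple[int, int]:
    """Return (points earned, points possible) for one student's rubric levels."""
    earned = sum(levels[c] for c in criteria)
    possible = 4 * len(criteria)  # each criterion is scored out of 4 levels
    return earned, possible

# One student's assigned level on each criterion.
student_levels = {"thesis": 3, "evidence": 2, "organization": 4, "mechanics": 3}
earned, possible = score_with_rubric(student_levels)
print(f"Rubric score: {earned}/{possible}")  # 12/16 for the levels above
```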
