Materials Evaluation PDF

Summary

This document provides an overview of materials evaluation, discussing its procedure, including the criteria for measuring the value of learning materials. It also touches on subjectivity, appealing to learners, teachers, and administrators, and the impact on short-term and long-term learning.

Full Transcript

Chapter 1 Materials evaluation Brian Tomlinson What is materials evaluation? Materials evaluation is a procedure that involves measuring the value (or potential value) of a set of learning materials. It involves making judgements about the effect of the materials on the people using them and it tr...

Chapter 1 Materials evaluation Brian Tomlinson What is materials evaluation? Materials evaluation is a procedure that involves measuring the value (or potential value) of a set of learning materials. It involves making judgements about the effect of the materials on the people using them and it tries to measure some or all of the following: the appeal of the materials to the learners; the credibility of the materials to learners, teachers and administrators; the validity of the materials (i.e. Is what they teach worth teaching?); the reliability of the materials (i.e. Would they be likely to have the same effect with different groups of target learners?); the ability of the materials to engage the learners and the teachers; the ability of the materials to motivate the learners; the value of the materials in terms of short-term learning (important, for example, for performance on tests and examinations); the value of the materials in terms of long-term learning (of both language and the ability to use it effectively to achieve communication); the value of the materials in contributing to the learners’ development of cultural awareness; the educational value of the materials in contributing to the development of such life-long skills as criticality and creativity; the learners’ perceptions of the value of the materials; the teachers’ perceptions of the value of the materials; the assistance given to the teachers in terms of preparation, delivery and assessment; the flexibility of the materials (e.g. the extent to which it is easy for a teacher to adapt the materials to suit a particular context); the contribution made by the materials to teacher development; the match with administrative requirements (e.g. standardization across classes, coverage of a sylla- bus, preparation for an examination). eltshop.ir 26 Developing Materials for Language Teaching It is obvious from a consideration of the effects above that no two evaluations can be the same, as the needs, wants, objectives, backgrounds and preferred styles of the participants will differ from context to context. This is obviously true of an evaluation of the value of a coursebook for use with sixteen-year- olds preparing for a Ministry of Education Examination in South Africa compared to an evaluation of the same coursebook for use with teenagers and young adults being prepared for the Cambridge First Certificate at a language school in Oxford. It is also true for the evaluation of a set of materials prepared for Foundation Level learners in a university in January compared with a set of materials for the same type of learners prepared in the same university in July. The main point is that it is not the materials which are being evaluated but their effect (or likely effect) on the people who come into contact with them (including, of course, the evaluators). An evaluation is not the same as an analysis. It can include an analysis or follow on from one, but the objectives and procedures are different. An evaluation focuses on the users of the materials and makes judgements about the effects of the materials on the users. No matter how structured, criterion referenced and rigorous an evaluation is, it will be essentially subjective. On the other hand, an analysis focuses on the materials and it aims to provide an objective analysis of them. It ‘asks questions about what the materials contain, what they aim to achieve and what they ask learners to do’ (Tomlinson, 1999, p. 10). 
So, for example, ‘Does it provide a transcript of the listening texts?’ is an analysis question which can be answered by either ‘Yes’ or ‘No’. ‘What does it ask the learners to do immediately after reading a text?’ is also an analysis question and can be answered factually. Another example of an analysis question would be, ‘To what extent are different cultures represented in the materials?’ As a result of answering many such questions, a description of the materials can be made which specifies what the materials do and do not contain. On the other hand, ‘Are the listening texts likely to engage the learner?’ is an evaluation question and can be answered on a cline between ‘Very unlikely’ and ‘Very likely’. It can also be given a numerical value (e.g. 2 for ‘Unlikely’) and after many such questions have been asked about the materials, subtotal scores and total scores can be calculated and indications can be derived of the potential value of the materials and of subsections of them. For example, a coursebook which scores a total of 75 per cent or more is likely to be generally effective but, if it scores a subtotal of only 55 per cent for listening, it is unlikely to be effective for a group of learners whose priority is to develop their listening skills. See Littlejohn (2011) for an example and discussion of materials analysis and Tomlinson et al. (2001), Masuhara et al. (2008), Tomlinson and Masuhara (2013, 2018) and Tomlinson (2019) for examples of materials evaluation. A detailed analysis of a set of materials can be very useful for deciding, for example, if anything important has been missed out of a draft manuscript, for deciding how closely it matches the require- ments of a particular course and as a database for a subsequent evaluation of the materials. Ideally analysis is objective but analysts are often influenced by their own ideology and their questions are biased accordingly. For example, in the question ‘Does it provide a lot of guided practice?’, the phrase ‘a lot of’ implies it should do and this could interfere with an objective analysis of the materials. Analysts also often have a hidden agenda when designing their instruments of analysis. For example, an analyst might ask the question ‘Are the dialogues authentic?’ in order to provide data to support an argument that intermediate coursebooks do not help to prepare learners for the realities of conversation. Or an analyst might ask the question, ‘Is the learners’ first language made use of in the materials?’, either Materials Evaluation 27 because they think it should be represented or because they think it should not be. This is legitimate if the analysis questions are descriptive and the subsequent data provided is open to evaluative interpre- tation. For example, I conducted an analysis of ten lower-level coursebooks (Tomlinson, 1999, p. 10) to provide data to support my argument that such books were too restricted in their emphasis on language form, on language practice rather than use and on low-level decoding skills. My data revealed that nine out of the ten books were forms and practice focused and that in these books there were five times more activities involving the use of low-level skills (e.g. pronouncing a word) than there were involving the use of high-level skills (e.g. making inferences). I was then able to use my data to argue the need for lower-level coursebooks to be more holistic and meaning-focused and to be more help to the learners in their development of high-level skills. 
But a different analyst could have used the same instruments and the same data to argue that lower-level coursebooks were helping learners to develop from a confident base of low-level skills. Many publications on materials evaluation mix analysis and evaluation and make it very difficult to use their suggested criteria because, for example, in a numerical evaluation most analysis ques- tions would result in 1 or 5 on a 5-point scale and would thus be weighted disproportionately when combined with evaluation questions, which tend to yield 2, 3 or 4. For example Mariani (1983, pp. 28–9) includes in a section on ‘Evaluate your coursebook’ such analysis questions as, ‘Are there any teacher’s notes …’ and ‘Are there any tape recordings?’ alongside such evaluation questions as, ‘Are the various stages in a teaching unit adequately developed?’ And Cunningsworth (1984, pp. 74–9) includes both analysis and evaluation questions in his ‘Checklist of Evaluation Criteria’. Cunningsworth does recognize the problem of mixing these different types of questions by saying that ‘Some of the points can be checked off either in polar terms (i.e. yes or no) or where we are talking about more or less of something, on a gradation from 1 to 5’ (1984, p. 74). My preference for separating analysis from evaluation is shared by Littlejohn (2011), who presents a general framework for analysing materials (pp. 182–98), which he suggests could be used prior to evaluation and action in a model which is sequenced as follows: Analysis of the target situation of use. Materials analysis. Match and evaluation (determining the appropriacy of the materials to the target situation of use). Action. McDonough, Shaw and Masuhara (2013) propose a similar model to Littlejohn’s but one which is less demanding of teacher time and expertise by only having two stages. The first stage is ‘an external eval- uation that offers a brief overview of the materials from the outside (cover, introduction, list of contents)’ (p. 53) and the second stage consists of a criterion-referenced ‘internal evaluation’. Materials evaluation is not only useful as a source of information for the monitoring of materials in development as well as for the selection and adaptation of materials already developed but also as a catalyst for teacher development. For a discussion of how materials evaluation can be used to help teachers to increase their knowledge and awareness of second language acquisition, see Chapter 1 (Tomlinson) in the web supplement to this volume. 28 Developing Materials for Language Teaching Principles in materials evaluation Many evaluations are impressionistic, or at best are aided by an ad hoc and very subjective list of criteria. In my view it is very important that evaluations (even the most informal ones) are driven by a set of principles of language learning and that these principles are articulated by the evaluator(s) prior to the evaluation. In this way greater validity and reliability can be achieved and fewer mistakes are likely to be made when selecting and using materials. In developing a set of principles it is useful to consider the following. The evaluator’s theory of learning and teaching All teachers develop theories of learning and teaching which they apply in their classrooms (even though they are often unaware of doing so). Many researchers (e.g. Schon, 1983; Farrell, 2018) argue that it is useful for teachers to try to achieve an articulation of their theories by reflecting on their practice. 
For example Edge and Wharton (1998, p. 297) argue that reflective practice can not only lead to ‘perceived improvements in practice but, more importantly, to deeper understandings of the area investigated’. In a similar way I am going to argue that the starting point of any evaluation should be reflection on the evaluator’s practice leading to articulation of the evaluator’s theories of learning and teaching. In this way evaluators can make overt their predispositions and can then both make use of them in constructing criteria for evaluation and be careful not to let them weight the evaluation too much towards their own bias. At the same time evaluators can learn a lot about themselves and about the learning and teaching process. Here are some of my theories, which I have articulated as a result of reflection on my research and reading and on my own and other teachers’ practice: Language learners succeed best if learning is a positive, relaxed and enjoyable experience. Language teachers tend to teach most successfully if they enjoy their role and if they can gain some enjoyment themselves from the materials they are using. Materials should be learner-centred rather than teacher-centred (i.e. they should be for language learning rather than language teaching). Learning materials lose credibility for learners if they suspect that the teacher does not value them. Each learner is different from all the others in a class in terms of his or her personality, motivation, attitude, aptitude, prior experience, interests, needs, wants and preferred learning styles. Each learner varies from day to day in terms of motivation, attitude, mood, perceived needs and wants, enthusiasm and energy. There are superficial cultural differences between learners from different countries (and these differ- ences need to be respected and catered for) but there are also strong universal determiners of successful language teaching and learning. Successful language learning in a classroom (especially in large classes) depends on the generation and maintenance of high levels of energy. The teacher is responsible for the initial generation of energy in a lesson; good materials can then maintain and even increase that energy. eltshop.ir Materials Evaluation 29 Learners only learn what they really need or want to learn. Learners often say that what they want is focused language practice but they often seem to gain more enjoyment and learning from activities which stimulate them to use the target language to say something they really want to say. Learners think, say and learn more if they are given an experience or text to respond to than if they are just asked for their views, opinions and interests in a vacuum. The most important thing that learning materials have to do is to help the learner to connect the learn- ing experience in the classroom to their own life outside the course. The more novel (or better still bizarre) the learning experience is the more impact it is likely to make and the more likely it is to contribute to long-term acquisition. The most important result that learning materials can achieve is to engage the emotions of learners. Laughter, joy, excitement, sorrow and anger can promote learning. Neutrality, numbness and nullity cannot. I could go on for pages more articulating theories which I did not really know I believed in so strongly. 
These theories are valid for me in that they have come from seven years of classroom language learning and of over fifty years of teaching a language in nine different countries. They will be of considerable help when it comes to me constructing my own criteria for materials evaluation. However, what is valid for me from my own experience will not be valid for other evaluators and users of materials from their experience and I must be careful not to assume that my criteria will be the correct criteria. For example, from a quick glance at the extracts from my theories above it is obvious that I favour a holistic rather than a discrete approach to language learning, that I think flexibility and choice are very important and that I value materials which offer affective engagement to both the learner and the teacher. I must be careful not to insist that all learning materials match my requirements. Learning theory Research into learning is controversial as there are so many variables involved and local circumstances often make generalization precarious. However, it is important that the materials evaluator considers the findings of learning research and decides which of its findings are convincing and applicable. The conclusions which convince me are that: Deep processing of intake is required if effective and durable learning is to take place (Craik and Lockhart, 1972). Such processing is semantic in that the focus of the learner is on the meaning of the intake and in particular on its relevance to the learner and to the context of the input. Affective engagement is also essential for effective and durable learning. Having positive attitudes towards the learning experience and developing self-esteem while learning are important determiners of successful learning. And so is emotional involvement. Emotions must be ‘considered an essential part of learning’ (Williams and Burden, 1997, p. 28) as they ‘are the very centre of human mental life … [they] link what is important for us to the world of people, things and happenings’ (Oatley and Jenkins, 30 Developing Materials for Language Teaching 1996, p. 122). See Damasio and Carvalho (2013), Farrell (2018) and Tomlinson and Masuhara (2021) for detailed arguments for affective engagement being a pre-requisite for effective learning. Making mental connections is a crucial aspect of the learning process. In order for learning to be successful, connections need to be made between the new and the familiar, between what is being learned and the learner’s life and between the learning experience and its potential value both now and in the future. See Kern (2008) for discussion of the value of helping learners to make connections with their lives and examples of how the teacher can achieve this when using language learning materials; and see Shing and Brod (2016) for information about how connecting new to prior learning can facilitate durable learning. Experiential learning is essential (though not sufficient) for effective and durable learning. It provides opportunities for the brain to connect new knowledge to previous experience and knowledge in contextualized and meaningful ways and to connect learning to its utilization in life. It does so by stimulating apprehension as a useful precursor of comprehension (Kolb, 1984; Kelly, 1997; Kolb and Kolb, 2009; Tomlinson and Masuhara, 2000, 2018, 2021). Learners will only learn if they need and want to learn and if they are willing to invest time and energy in the process. 
In other words, both instrumental and integrative motivation are vital contributors to learning success (Dörnyei and Ushioda, 2009; Dörnyei and Ryan, 2015), ideally on a constant basis but more realistically on an occasional basis with learners being motivated (or not) by their materials and the ways that their teachers make use of them. It cannot be assumed that learners are motivated and learners cannot be blamed for not being so. In my view materials should be developed on the assumption that their users will not be sufficiently motivated to achieve communicative competence and will need to be motivated by meaningful and engaging materials in order to succeed (Dörnyei, Henry and Muir, 2015; Tomlinson and Masuhara, 2021). Multidimensional processing of intake is essential for successful learning and involves the learner creating a representation of the intake through such mental processes as sensory imaging (especially visualization), affective association and the use of the inner voice (Masuhara, 1998, 2005; Tomlinson, 2000a, 2000b, 2001a, 2003, 2011b, 2020a; de Guerro, 2005, 2018; Wiley, 2006; Tomlinson and Avila, 2007). As Berman (1999, p. 2) says, ‘we learn best when we see things as part of a recognised pattern, when our imaginations are aroused, when we make natural associations between one idea and another, and when the information appeals to our senses.’ One of the best ways of achieving multidimensional representation in learning seems to be a whole person approach which helps the learner to respond to the learning experience with emotions, attitudes, opinions and ideas (Jacobs and Schumann, 1992; Schumann, 1997, 1999; Arnold, 1999). Materials which address the learner in an informal, personal voice are more likely to facilitate learning than those which use a distant, formal voice (Beck et al., 1995; Tomlinson, 2001b). Features which seem to contribute to a successful personal voice include such aspects of orality as: Informal discourse features (e.g. contracted forms, ellipsis, informal lexis) The active rather than the passive voice Materials Evaluation 31 Concreteness (e.g. examples, anecdotes) Inclusiveness (e.g. not signalling intellectual, linguistic or cultural superiority over the learners) Sharing experiences and opinions Sometimes including casual redundancies rather than always being concise (Tomlinson, 2001b). As a materials evaluator I would convert the assertions above into criteria for the assessment of learning material. For example, I would construct such criteria as: To what extent are the materials likely to relate to the wants of the learners? To what extent are the materials likely to help the learners to achieve connections with their own lives? To what extent are the materials likely to stimulate emotional engagement? To what extent are the materials likely to promote visual imaging? Second language acquisition research (SLA) SLA research is so far inconclusive and has stimulated many disagreements and debates (e.g. about the value of the explicit teaching of discrete language points comparted to the implicit acquisition of language in meaning-focused activities). However, there is now a sufficient consensus of opinion on certain facilitating features of language learning for them to be useful in helping to articulate principles to be used as a basis of materials evaluation. In Tomlinson (2011a, pp. 
6–23) I discussed the principles of second language acquisition which I think SLA researchers would agree are relevant to the development of materials for the teaching of languages. Some of these principles are summarized below with up-to-date references and additional comments: Materials should achieve impact (through novelty, variety, surprise, bizarreness, attractive presenta- tion and appealing content). Materials should help learners to feel at ease (e.g. through the use of white space to prevent clutter, through the use of texts and illustrations which they can relate to their own culture, through a support- ive approach which is not always testing them and through the use of a personal voice). Materials should help the learners to develop confidence (e.g. through ‘pushing’ learners slightly beyond their existing proficiency and by involving them in tasks which are challenging but achievable). What is being taught should be perceived by learners as relevant, meaningful and useful (Stevick, 1976; Krashen, 1982; Wenden and Rubin, 1987, Tomlinson and Masuhara, 2018, 2021). Materials should require and facilitate learner self-investment (e.g. through giving learners respon- sibility for making decisions and through encouraging them to make discoveries about the language ­for themselves (Rutherford and Sharwood-Smith, 1988; Tomlinson, 1994, 2007, 2018; Bolitho et al., 2003). Learners must be ready to acquire the points being taught in terms of both developmental readi- ness and psychological readiness too (Meisel et al., 1981; Pienemann, 1985, 2005, Tomlinson and eltshop.ir 32 Developing Materials for Language Teaching Masuhara, 2021). Some language features can only be acquired once other features have been acquired (e.g. the present continuous tense after the simple present of the verb ‘to be’ has been acquired) whereas many features can be acquired if the materials and/or the teacher create the need and motivation to acquire them. Materials should expose the learners to language in authentic use (ideally to a rich and varied input which includes unplanned, semi-planned and planned discourse and which stimulates mental responses). See Mishan (2005), Rilling and Dantas-Whitney (2009), Tomlinson (2012), Maley and Tomlinson (2017) and Tomlinson and Masuhara (2018, 2021) for a discussion of the value of incorpo- rating authentic texts and tasks in materials and for reference to researchers who support authenticity and to some who argue against it. The learners’ attention should be drawn to linguistic features of the input so that they are alerted to subsequent instances of the same feature in future input (Seliger, 1979; White, 1990; Schmidt, 1992; Ortega, 2009, 2021, Ellis, 2015, Long, 2015, Tomlinson and Masuhara, 2018, 2021). Ideally this attention should be learner initiated and should be activated by wanting or needing to learn about features which have been encountered (or needed) in meaning-focused activities and have proved to be particularly salient, problematic or valuable. Materials should provide the learners with opportunities to use the target language to achieve purposeful communication. This gives the learners opportunities to check the validity of their existing hypotheses (Swain, 1985, 2005), both to strengthen language and strategies already acquired and to acquire new language and strategies as a result of being pushed. 
It also gives them opportunities to receive situational feedback and to experience ‘new’ input from their interactants (Canale and Swain, 1980; Swain, 1985, 2005; Schütze, 2017). See Tomlinson and Masuhara (2018, 2021) for detailed discussions of the value of communicative use of a language whilst learning it and for recommenda- tions for such experiential approaches as Task-Based Language Teaching, Text-Driven Approaches, Project Approaches, Problem-Based Approaches, Content and Language Integrated Learning and the Action-oriented Approach as ways of providing learners with potentially engaging ways of using and extending their existing linguistic and strategic repertoires in order to achieve effective communication. Materials should take into account that the positive effects of instruction are usually delayed, and therefore should not expect effective production immediately to follow initial presentation but should rather ensure spaced recycling and frequent and ample exposure to the instructed features in communicative use. Most materials cater predominantly for students with a preference for studial learning but ideally they should take into account that learners differ in preferred learning styles not only from each other but from one learning context to another (Oxford and Anderson, 1995; Oxford, 2002; Anderson, 2005), and should therefore ensure that they also cater for learners who are predominantly visual, auditory, kinaesthetic, studial, experiential, analytic, global, dependent or independent. Ways of doing this include ensuring that all learners experience all these styles and offering choices of activities which vary in their predominant learning style. Materials should take into account that learners differ in affective attitudes (Wenden and Rubin, 1987), and therefore materials should offer variety and choice. Materials Evaluation 33 Materials should maximize learning potential by encouraging intellectual, aesthetic and emotional involvement which stimulates both right and left brain activities through a variety of non-trivial ­activities requiring a range of different types of processing (Schütze, 2017; Tomlinson and Masuhara, 2021). Materials should provide opportunities for outcome feedback (i.e. feedback on the effectiveness of the learner in achieving communication objectives rather than just feedback on the accuracy of the output). There should be opportunities for such feedback from teachers, from peers and from the learner. See Chapters 7 and 9 of Tomlinson and Masuhara (2021) for discussions of the value of teacher, peer and self-monitoring. In addition to the requirements listed above I would like to add that materials should: help the learner to develop cultural awareness and sensitivity (Tomlinson, 2000b; Byram and Masuhara, 2013; Chapter 25 (Mishan) in this volume); reflect the reality of language use; help learners to learn in ways similar to the circumstances in which they will have to use the language; help to create readiness to learn (e.g. by helping learners to draw their attention to the gap between their use of a feature of communication and the use of that feature by proficient users of the language, or by involving the learners in a task in which they need to learn something new in order to be successful); achieve affective and cognitive engagement (Tomlinson, 2010, 2016a; Tomlinson and Masuhara, 2018, 2021). Richards (2001, p. 
264) suggests a rather different and briefer list of the ‘qualities each unit in the materials should reflect’: Gives learners something they can take away from the lesson. Teaches something learners feel they can use. Gives learners a sense of achievement. Practises learning items in an interesting and novel way. Provides a pleasurable learning experience. Provides opportunities for individual practice. Provides opportunities for personalization. Provides opportunities for self-assessment of learning. Other principled requirements for materials are listed by Harwood (2014), Tomlinson (2013, 2016a, 2016b) and Tomlinson and Masuhara (2018, 2021). The important thing is for materials evaluators to decide for themselves which findings of SLA research they will use to develop principles for their evaluation. Ultimately what matters is that an eval- uation is principled, that the evaluator’s principles are made overt and that they are referred to when determining and carrying out the procedures of the evaluation. Otherwise the evaluation is likely to be ad hoc with the result that significant and expensive mistakes could be made. This is especially true 34 Developing Materials for Language Teaching when using an evaluation to select the coursebooks to be used by a class, an institution or a nation. A textbook selected mainly because of its attractive appearance could turn out to be very boring for the learners to use; a review which overemphasizes an irritating aspect of the materials (e.g. a particular character in a video course) can give a distorted impression of the value of the materials; a course selected for national use by a ministry of education because it is the cheapest or because it is written by famous writers and published by a prestigious publisher could turn out to be a very expensive disaster. Types of materials evaluation There are many different types of materials evaluation. It is possible to apply the basic principles of materials evaluation to all types of evaluation but it is not possible to make generalizations about procedures which apply to all types. Evaluations differ, for example, in purpose, in personnel, in formality and in timing. You might do an evaluation in order to help a publisher to make decisions about publication, to help yourself in developing materials for publication, to select a textbook, to write a review for a journal, as part of a research project or as an interim monitoring stage when evaluating materials you are producing. As an evaluator you might be a materials developer, a learner, a teacher, an editor, a researcher, a Director of Studies or an Inspector of English. You might be doing a mental evaluation in a bookshop, filling in a short questionnaire in class or doing a rigorous, empirical analysis of data elicited from a large sample of users of the materials. You might be doing your evaluation before the materials are used, while they are being used or after they have been used. In order to conduct an effective evaluation you need to apply your principles of evaluation to the contextual circumstances of your evaluation in order to determine the most reliable and effective procedures. For example, one of my doctorate students from Anaheim University has just conducted a principled evaluation of the pragmatic components of all the coursebooks used by English Major students and by English Translation students at a university in Iraq. 
His purpose was to make use of this evaluation in his wider investigation of how these students performed the speech acts of request, apology and complaint. His approach was to specify contextually appropriate principles for the development of pragmatic competence, then to turn these principles into evaluation questions and then to use these questions to conduct an evaluation of the coursebooks. Pre-use evaluation Pre-use evaluation involves making predictions about the potential value of materials for their users. It can be context-free, as in a review of materials for a journal, context-influenced as in a review of draft materials for a publisher with target users in mind or context-dependent, as when a teacher selects a coursebook for use with her particular class. Often pre-use evaluation is impressionistic and consists of a teacher flicking through a book to gain a quick impression of its potential value (publishers are well aware of this procedure and sometimes place attractive illustrations in the top right-hand corner of the right-hand page in order to influence the flicker in a positive way). Even a review for a publisher eltshop.ir Materials Evaluation 35 or journal and an evaluation for a ministry of education is often ‘fundamentally a subjective, rule of thumb activity’ (Sheldon, 1988, p. 245) and often mistakes are made. Making an evaluation criterion- referenced can reduce (but not remove) subjectivity and can certainly help to make an evaluation more principled, rigorous, systematic and reliable. This is especially true if more than two evaluators conduct the evaluation independently and then average their conclusions. For example, in the review of eight adult EFL courses conducted by Tomlinson et al. (2001), the four evaluators devised 133 criteria together and then used them independently and in isolation to evaluate the eight courses before pooling their data and averaging their scores. Even then, though, the reviewers admitted that ‘the same review, conducted by a different team of reviewers, would almost certainly have produced a different set of results’ (p. 82). Making use of a checklist of criteria has become popular in materials evaluations and certain check- lists from the literature have been frequently made use of in evaluations (e.g. Cunningsworth (1984, 1995), Skierso (1991), Brown (1997), Gearing (1999)). The problem though is that no one set of criteria is applicable to all situations and, as Byrd (2001) says, it is important that there is a fit between the materi- als and the curriculum, students and teachers. Matthews (1985), Cunningsworth (1995) and Tomlinson (2012, 2016a, 2016b) have also stressed the importance of relating evaluation criteria to what is known about the context of learning but Mukundan and Ahour (2010) in their review of forty-eight evaluation checklists were critical of most checklists for being too context bound to be generalizable. Mukundan and Ahour (2010) proposed that a framework for generating flexible criteria would be more useful than detailed and inflexible checklists (a proposition also made by Ellis (2011) and stressed and demon- strated by Tomlinson (2003b)). Other researchers who have proposed and exemplified frameworks for generating evaluation criteria include: McDonough et al. (2013), who focus on developing criteria evaluating the suitability of materials in relation to usability, generalizability, adaptability and flexibility. 
McGrath (2002), who suggests a procedure involving materials analysis followed by first glance eval- uation, user feedback and evaluation using context-specific checklists. Mukundan (2006), who describes the use of a composite framework combining checklists, reflective journals and computer software to evaluate ELT textbooks in Malaysia. Riazi (2003), who suggests surveying the teaching/learning situation, conducting a neutral analysis and the carrying out of a belief-driven evaluation. Rubdy (2003), who suggests a dynamic model of evaluation in which the categories of psychological validity, pedagogical validity and process and content validity interact. Tomlinson and Masuhara (2004, p. 7), who proposed the following criteria for evaluating criteria: a Is each question an evaluation question? b Does each question only ask one question? c Is each question answerable? d Is each question free of dogma? e Is each question reliable in the sense that other evaluators would interpret it in the same way? 36 Developing Materials for Language Teaching Tomlinson (2012) reports these criteria and gives examples from the many checklists in the literature of evaluation criteria which their use exposes as inadequate in terms of specificity, clarity, answerability, validity and generalizability. Other proposals for generating evaluation criteria and procedures include: Mukundan and Ahour (2010), who argue for teacher-friendly evaluation procedures which are useful to them and do not place unrealistic demands on their time or expertise; McDonough, Shaw and Masuhara (2013), who propose a two stage process in which ‘an external evaluation of the materials from the outside (cover, introduction, table of contents) … is followed by a closer and more detailed internal evaluation’ (p. 53); Nimehchisalem and Mukundan (2014), who provide an ELT Textbook Evaluation Checklist compiled after an evaluation of forty-eight checklists in Mukundan and Ahour (2010) and revised after numer- ous evaluations of the checklist in action; Richards (2014), who insists on always relating evaluation to the context of learning; Mishan and Timmis (2015), who insist on principled and systematic evaluation; Tomlinson (2016a), who proposes using five SLA prerequisites (rich exposure, affective engagement, cognitive engagement, meaning-focused attention to form and opportunities for communicative use) as the basis for evaluation questions and provides an example of an evaluation template which makes use of such questions; Tomlinson and Masuhara (2018), who provide a comprehensive and critical review of the litera- ture on evaluating materials and recommend and exemplify a principled procedure which involves establishing an evaluation team, brainstorming beliefs about what facilitates language acquisition, categorizing the commonly held beliefs, converting the beliefs into universal criteria, developing a set of local criteria, developing a set of medium specific criteria and combining the three sets of criteria in a table ready for use; Tomlinson (2019, 2021), who favours ‘developing and using evaluation criteria based on what is known about what best facilitates language acquisition and on what you know about the target students and what they need, want and are likely to benefit from’ (Tomlinson, 2019, p. 14). However, he also suggests other ways of evaluating materials, including ‘doing all the activities in two or three sample units as though you were a student and then answering evaluation questions about the mate- rials’ (p. 
13) and ‘getting sample students to do all the activities in a typical unit and then answering evaluation questions about the unit’ (p. 13). Whilst-use evaluation This involves measuring the value of materials while using them or while observing them being used. It can be more objective and reliable than pre-use evaluation as it makes use of observation, measurement and data collection rather than prediction. However, it is limited to measuring what is observable (e.g. ‘Are the instructions clear to the learners?’) and cannot claim to measure what is happening in the learners’ Materials Evaluation 37 brains (although it could predict the likelihood of mental activity based upon observation of learner behaviour). It can measure short-term learning through observing learner performance on activities but, without frequent spaced observations and close control over learner experience in between, it cannot measure durable and effective acquisition because of the inevitable delayed effect of instruction. It is therefore very useful but dangerous too, as teachers and observers can be misled by whether the activities seem to ‘work’ or not, often concluding that an activity is ‘working’ if the learners are getting most of the answers right or are enjoying themselves. Exactly what can be measured in a whilst-use evaluation is controversial but I would include the following: Clarity of instructions (by observing whether the learners can actually follow them without having to seek clarification). Clarity of layout (by noting any observable confusion or hesitation which could be attributed to ­cluttered or unclear layout). Comprehensibility of texts (by observing learner responses before, as and after they read or listen). Credibility of tasks (by noting learner reactions to being asked to do an activity and by listening to any pair or group comments on the task they are engaged in). Achievability of tasks (by noting how effectively the learners are able to complete a task and then rating it on a scale from ‘Too easy’ to ‘An achievable challenge’ to ‘Too demanding’). Achievement of performance objectives (by setting or ascertaining performance objectives and then checking the degree of achievement during performance). Potential for localization (by noting any deliberate or incidental localization made by the teacher or the learners prior to or during the performance of an activity). Practicality of the materials (by observing whether the learners can actually do what they are being asked to do – e.g. matching words to pictures, recording information in a table, identifying back- ground noises in a recording, reading video subtitles before a different scene appears). Flexibility of the materials (by observing to what extent teachers and learners can use them in differ- ent ways both at different times and synchronously). Appeal of the materials (by noting learner facial and spoken reactions when encountering the materials). Motivating power of the materials (by noting how immediate and sustained the learners’ focused activity is during the move from following instructions to completion). Effectiveness in facilitating short-term learning (by measuring how successful the learners are in doing what they have been asked to do following instruction from the teacher and/or the materials). Affective engagement (by noting indications of the strength and duration of emotional responses such as laughter, excitement, pleasure, sympathy, empathy, anger, sadness, fear, etc.). 
Cognitive engagement (by noting indications of the strength and duration of cognitive responses such as agreeing, disagreeing, questioning, challenging, pondering, hypothesizing, discovering, creating, etc.). eltshop.ir 38 Developing Materials for Language Teaching Many of the above can be estimated during an open-ended, impressionistic observation of materials in use but greater reliability can be achieved by focusing on one criterion at a time and by using pre-prepared instruments of measurement. For example, oral participation in an activity can be measured by recording the incidence and duration of each student’s oral contribution and potential for localization can be estimated by noting the times the teacher or a student refers to the location of learning while using the materials. Motivation can be estimated by noting such features as student eye focus, proximity to the materials, time on task and facial animation. Whilst-use evaluation receives very little attention in the literature, but Jolly and Bolitho (2011) describe interesting case studies of how student comments and feedback during lessons provided useful evaluation of materials, which led to improvements being made in the materials during and after the lessons. Also Tomlinson and Masuhara (2010, 2018) report materials development projects in which whilst-use evaluation was made use of. Post-use evaluation Post-use evaluation is probably the most valuable (but least conducted) type of evaluation as it can measure the actual effects of the materials on the users. It can measure the short-term effect as regards motivation, impact, achievability, instant learning, etc., and it can measure the long-term effect as regards durable acquisition of communicative competence. It can answer such important questions as: What do the learners know which they did not know before starting to use the materials? What do the learners still not know despite using the materials? What can the learners do in the L2 which they could not do before starting to use the materials? What can the learners still not do in the L2 despite using the materials? To what extent have the materials prepared the learners for their examinations? To what extent have the materials prepared the learners for their post-course use of the target language? To what extent have the materials encouraged the learners to look for English outside the classroom? To what extent have the materials encouraged the learners to make discoveries for themselves about the use of the L2? What effect have the materials had on the confidence of the learners? What effect have the materials had on the motivation of the learners? To what extent have the materials helped the learners to become more independent learners? To what extent did the teachers find the materials easy to use? To what extent did the materials help the teachers to cover the syllabus? To what extent do the teachers think the materials have proved beneficial to the learners? To what extent do the learners think the materials have proved beneficial? To what extent did the administrators find the materials helped them to standardize the teaching in their institution? Materials Evaluation 39 In other words, post-use evaluation can measure the actual outcomes of the use of the materials and thus provide the data on which reliable decisions about the use, adaptation or replacement of the materials can be made. 
We need to be wary though of jumping to conclusions about the effectiveness of materials after measuring the effects of the materials in one lesson or even one unit. Successful completion of a set of simple exercises is not an indication of acquisition just as inability to use a ‘new’ structure effectively in communication after one lesson is not an indication of failure. Acquisition of language and development of skills take time and require multiple engaged and meaningful experiences of the language feature or skill before effectiveness can be achieved. Ways of ‘measuring’ the post-use effects of materials include: tests of what has been ‘taught’ by the materials; tests of the students’ communicative competence in the L2; tests of what the students can do in the L2; institutional and high-stakes examinations; interviews with students and with teachers; questionnaires for learners and teachers to respond to; self-evaluations by the learners of their increased competence; criterion-referenced evaluations of the materials by the users; post-course learner diaries of their ability to use the L2 in ‘real life’; post-course ‘shadowing’ of the learners on their academic courses or in their work place; post-course reports on the learners by employers, subject tutors, etc. The main problem, of course, is that it takes time and expertise to measure post-use effects reliably (especially as, to be really revealing, there should be measurement of pre-use attitudes and abilities in order to provide data for post-use comparison). But publishers and ministries do have the time and could engage the expertise, and teachers can be helped to design, administer and analyse post- use instruments of measurement. Then we will have much more useful information, not only about the effects of particular courses of materials but about the relative effectiveness of different types of materials. Even then, though, we will need to be cautious, as it will be very difficult to separate such variables as teacher effectiveness, parental support, language exposure outside the classroom, intrinsic motivation, etc. For a description of the process of post-use evaluation of piloted materials see Donovan (1998), for descriptions of how publishers use focus groups for post-use evaluation of materials see Amrani (2011), and for suggestions of how teachers could do post-use micro-evaluations of materials see Ellis (1998, 2011). For reports of projects which have conducted post-use evaluation of materials in many different countries see Tomlinson and Masuhara (2010, 2018), Harwood (2014) and Garton and Graves (2014), and for an empirical post-use evaluation of the effects of humanistic materials on intermediate level learners of English in Vietnam, see Phuam (2021). 40 Developing Materials for Language Teaching Standard approaches to materials evaluation My experience of materials evaluation in many countries has been rather worrying. I have sat on National Curriculum committees which have decided which books should be used in schools purely on the basis of the collective impressions of their members. I have written reviews of manuscripts for publishers without any criteria being specified or asked for. I have had my own books considered by Ministry of Education officials for adoption without any reference to a coherent set of criteria. I have read countless published reviews (and even written a few myself) which consist of the reviewers’ ad hoc responses to the materials as they read them. 
I have conducted major materials evaluations for publishers and software companies without being given or asked for any criteria. I wonder how many mistakes I have contributed to. On the other hand, I was encouraged by a major British publisher to develop a comprehensive set of principled criteria prior to conducting an evaluation for them and I led a team of evaluators in developing a set of 133 criteria prior to evaluating eight adult EFL courses for ELT Journal (Tomlinson et al., 2001) and a smaller number of principled criteria for another ELT Journal review of coursebooks (Tomlinson and Masuhara (2013). Most of the literature on materials development has so far focused on materials evaluation, and useful advice on conducting evaluations can be found in Brown (1997); Byrd (1995); Candlin and Breen (1980); Cunningsworth (1984, 1995); Donovan (1998); Daoud and Celce-Murcia (1979); Ellis (1995, 1998); Grant (1987); Garton and Graves (2014); Harwood (2014); Hidalgo et al. (1995); Jolly and Bolitho (1998); Littlejohn (2011); Mariani (1983); Masuhara et al. (2008); McDonough et al. (2013); McGrath (2002, 2013, 2016); Mishan and Timmis (2015); Mukundan (2006); Mukundan and Ahour (2010); Nimehchisalem and Mukundan (2014); Pham (2021); Richards (2001); Roxburgh (1997); Sheldon (1987, 1988); Skierso (1991); Tomlinson (1999, 2016a, 2019, 2021); Tomlinson et al. (2001); Tomlinson and Masuhara (2004, 2013, 2018); and Williams (1983). Many of the checklists and lists of criteria suggested in these publi- cations provide a useful starting point for anybody conducting an evaluation but some of them are impressionistic and biased (e.g. Brown (1997) awards points for the inclusion of tests in a coursebook and Daoud and Celce-Murcia (1979, p. 305) include such dogmatic criteria as, ‘Are the vocabulary items controlled to ensure systematic gradation from simple to complex items?’). Some of the lists lack cover- age, systematicity and/or a principled base, and some give the impression that they could be used in any materials evaluation despite the fact that ‘there can be no one model framework for the evaluation of materials; the framework used must be determined by the reasons, objectives and circumstances of the evaluation’ (Tomlinson, 1999, p. 11). Most of the lists in the publications above are to some extent subjective as they are lists for pre-use evaluation and this involves selection and prediction. For exam- ple, Tomlinson et al. (2001, p. 81) say, We have been very thorough and systematic in our evaluation procedures, and have attempted to be as fair, rigorous, and objective as possible. However, we must start this report on our evaluation by acknowledging that, to some extent, our results are still inevitably subjective. This is because any pre-use evaluation is subjective, both in its selection of criteria and in the judgements made by the evaluators. eltshop.ir Materials Evaluation 41 A useful exercise for anybody writing or evaluating language teaching materials would be to evaluate the checklists and criteria lists from a sample of the publications above against the following criteria: Is the list based on a coherent set of principles of language learning? Are all the criteria actually evaluation criteria or are they criteria for analysis? Are the criteria sufficient to help the evaluator to reach useful conclusions? Are the criteria organized systematically (e.g. into categories and subcategories which facilitate discrete as well as global verdicts and decisions)? 
Are the criteria sufficiently neutral to allow evaluators with different ideologies to make use of them? Is the list sufficiently flexible to allow it to be made use of by different evaluators in different circumstances? It would also be useful to evaluate each criterion in any check list you intend to use (and preferably in your own set of criteria) against the following questions based on Tomlinson (2019, p. 22): Is the criterion evaluative? Is the criterion answerable? Is the criterion specific? Does the criterion contain only one question? Is the criterion non-dogmatic? Is the criterion valid? Is the criterion reliable? Any criterion which does not meet all of these criteria will need to be revised or possibly even abandoned. Many of the publications on materials evaluation listed above have been discredited because they focus on pre-use evaluation and this is considered to lack academic rigour because of the impossibility of collecting empirical data. For example, publications on materials evaluation receive no credit on The Research Excellence Framework (REF), which is used to evaluate the impact of the research output of British higher education institutions and to decide which institutions should be rewarded with funding. Also the University of Liverpool does not allow students on their MA in Applied Linguistics and MA in TESOL programmes to focus their dissertations on materials evaluation. I recently accepted an excellent report of an evaluation study for one of my edited publications but the publishers refused to include it because it lacked empirical data. It seems to me that applied linguistic research is often only valued if it is useful (or impressive) to academic researchers regardless of whether or not it is potentially useful to learners and practitioners (see Maley (2016) for a powerful indictment of the pedagogic irrelevance of many applied linguistic studies). In my view this insistence on empirical data is unfortunate because a systematic, rigorous and principled evaluation can be of great value in monitoring and revising materials in development, in reviewing materials, in selecting materials and especially in adapting materials. When undertaking an evaluation for one of these purposes, much more useful to a materials evaluator than 42 Developing Materials for Language Teaching statistical data purporting to prove the superiority of explicit teaching or of lists of evaluation criteria (which might not fit the contextual factors of a particular evaluation or be acceptable to all the evaluators) would be a suggested procedure for developing criteria to match the specific circumstances of a particular evaluation. I would like to conclude this chapter by suggesting such a procedure below. Developing criteria for materials evaluation My experience both personally and of students, teachers, materials developers, materials reviewers and researchers is that it is extremely useful to develop a set of formal criteria for use on a particular evaluation and then to use that set as a flexible basis for developing subsequent context-specific sets. Initially this is demanding and time-consuming, but it not only helps the evaluators to clarify their principles of language learning and teaching but it also ensures that future evaluations (both formal and informal) are systematic, rigorous and, above all, principled (as many participants on my in-service materials development courses have attested (Tomlinson, 2014). 
One way of developing such a set of criteria is as follows: 1 Brainstorm a list of universal criteria Universal criteria are those which would apply to any language learning materials anywhere for any learners. So, for example, they would apply equally to a video course for ten-year-olds in Argentina and an English for academic purposes textbook for undergraduates in Thailand. They derive from principles of language acquisition, the results of classroom research and observation and the informed intuitions of the evaluator(s), and they should, in my view, provide the fundamental basis for any pre-use materials evaluation. Brainstorming a random list of such criteria (ideally with other colleagues) is a very useful way of beginning an evaluation, and the most useful way I have found of doing it is to phrase the criteria as specific questions about the likely effects of the materials rather than to list them as general headings. Examples of universal criteria would be: To what extent do the materials provide useful opportunities for the learners to think for themselves? To what extent are the target learners likely to be able to follow the instructions? To what extent are the materials likely to cater for different preferred learning styles? To what extent are the materials likely to achieve affective engagement? Whilst conducting an evaluation I have found it useful to phrase the questions so that they invite the grading of the likely effect rather than require absolute yes/no answers. Here are the universal criteria used in Tomlinson and Masuhara (2013) to evaluate six current global coursebooks. To what extent is the course likely to: provide extensive exposure to English in use? engage the learners affectively? Materials Evaluation 43 engage the learners cognitively? provide an achievable challenge? help learners to personalize their learning? help the learners to make discoveries about how English is typically used? provide opportunities to use the target language for communication? help the learners to develop cultural awareness? help the learners to make use of the English environment outside the classroom? cater for the needs of all the learners? provide the flexibility needed for effective localization? help the learners to continue to learn English after the course? help learners to use English as a lingua franca? help learners to become effective communicators in English? achieve its stated objectives? 2 Subdivide some of the criteria If the evaluation is going to be used as a basis for revision or adaptation of the materials, or if it is going to be a formal evaluation and is going to inform important decisions such as acceptance or selection, it is useful to subdivide some of the criteria into more specific questions. For example: Are the instructions: succinct? sufficient? self-standing? standardized? separated? sequenced? staged? Such a subdivision can help to pinpoint specific aspects of the materials which could gain from revision or adaptation. Incidentally it is amazing that so many criteria for evaluating instructions begin with the letter ‘s’. At a materials evaluation workshop I ran in Botswana, for example, the teachers came up with twenty-­ seven such criteria beginning with the letter ‘s’ and teachers all over the world have achieved similar numbers. 
3 Monitor and revise the list of universal criteria

Writing effective evaluation criteria is not an easy task, and inevitably some first-draft criteria are likely to be more revealing than others. It is very important, therefore, that even very experienced evaluators monitor their draft criteria and revise them according to the following questions:

Is each question an evaluation question?

If a question is an analysis question (e.g. ‘Does each unit include a test?’), then you can only give the answer a 1 or a 5 on the 5-point scale which is recommended later in this suggested procedure. However, if it is an evaluation question (e.g. ‘To what extent are the tests likely to provide useful learning experiences?’), then it can be graded at any point on the scale. Analysis questions can reveal what the materials include, what they ask the learners to do and what their objectives are. However, they do not provide information about the likely effect of the materials.

Does each question only ask one question?

Many criteria in published lists ask two or more questions and therefore cannot be used in any numerical grading of the materials. For example, Grant (1987) includes the following question, which could be answered ‘Yes; No’ or ‘No; Yes’: ‘1 Is it attractive? Given the average age of your students, would they enjoy using it?’ (p. 122). This question could be usefully rewritten as:

To what extent:
1 is the book likely to be attractive to your students?
2 is the book likely to be suitable for the age of your students?
3 are your students likely to enjoy using the book?

Other examples of multiple questions are:

‘Do illustrations create a favourable atmosphere for practice in reading and spelling by depicting realism and action?’ (Daoud and Celce-Murcia, 1979, p. 304)

‘Does the book provide attractive, interesting (and perhaps exciting) language work, as well as a steady and systematic development of the language system?’ (Mariani, 1983, p. 29)

Is each question answerable?

This might seem an obvious question, but in many published lists of criteria some questions are so large and so vague that they cannot usefully be answered. Or sometimes they cannot be answered without reference to other criteria, or they require the evaluator to possess expert knowledge. For example:

‘Is it culturally acceptable?’ (Grant, 1987, p. 122)

‘Does it achieve an acceptable balance between knowledge about the language and practice in using the language?’ (Ibid.)

‘Does the writer use current everyday language, and sentence structures that follow normal word order?’ (Daoud and Celce-Murcia, 1979, p. 304)

Is each question free of dogma?

The questions should reflect the evaluators’ principles of language learning but should not impose a rigid methodology as a requirement of the materials. If they do, the materials could be dismissed without a proper appreciation of their potential value. For example, the following criteria make assumptions about the pedagogical procedures of coursebooks which not all coursebooks actually follow and not all teachers would want them to follow:

‘Are the various stages in a teaching unit (what you would probably call presentation, practice and production) adequately developed?’ (Mariani, 1983, p. 29)

‘Do the sentences gradually increase in complexity to suit the growing reading ability of the students?’ (Daoud and Celce-Murcia, 1979, p. 304)
Is each question reliable, in the sense that other evaluators would interpret it in the same way?

Some terms and concepts which are commonly used in applied linguistics are amenable to differing interpretations and are best avoided or glossed when attempting to measure the effects or likely effects of materials. For example, each of the following questions could be interpreted in a number of ways:

Are the materials sufficiently authentic?
Is there an acceptable balance of skills?
Do the activities work?
Is each unit coherent?

There are a number of ways in which each question could be rewritten to make it more reliable and useful. For example:

To what extent:
are the materials likely to help the learners to use the language in situations they could find themselves in after the course?
is the proportion of the materials devoted to the development of reading skills likely to be suitable for your learners?
are the communicative tasks likely to be useful in providing learning opportunities for the learners?
are the activities in each unit linked to each other in ways which are likely to help the learners?

Is each question valid, in the sense that it is generally agreed that it relates to a characteristic that is likely to have a beneficial effect?

For example, it would be generally agreed that ‘To what extent are the tasks likely to provide the learners with opportunities for purposeful communication in the target language?’ is a valid criterion, as most researchers and teachers would agree that such opportunities are likely to facilitate eventual acquisition of the target language. However, it is unlikely to be agreed that ‘Is each example easy to memorise?’ is a valid criterion, as many researchers and teachers would not agree that memorization of examples can facilitate language acquisition.

4 Categorize the list

It is very useful to rearrange the random list of universal criteria into categories. This can facilitate focus and enable generalizations to be made. An extra advantage is that you often think of other criteria related to the category as you are doing the categorization exercise. Possible categories for universal criteria could be:

Learning Principles
Cultural Perspectives
Topic Content
Teaching Points
Learning Points
Texts
Activities
Methodology
Instructions
Illustrations
Design and Layout

5 Develop media-specific criteria

These are criteria which ask questions of particular relevance to the medium of delivery used by the materials being evaluated (e.g. criteria for books, for audio cassettes, for videos, for digital materials). Examples of such criteria would be:

Are the learners likely to be able to distinguish between the voices they hear?
Are the learners likely to be able to understand the gestures used by actors in the video?
Are the learners likely to be able to read the subtitles before the scene moves on?
Are the learners likely to be able to connect the different scenes to each other?
Are the learners likely to be able to navigate between the instructions, the activities and the feedback?

Obviously these criteria could also be usefully sub-categorized (e.g. under Illustrations, Layout, Audibility, Clarity, Mobility, Interactivity).

6 Develop content-specific criteria

These are criteria which relate to the topics and/or teaching/learning points of the materials being evaluated.
‘Thus there would be a set of topic related criteria which would be relevant to the evaluation of a business English textbook but not to a general English coursebook; and there would be a set of criteria relevant to a reading skills book which would not be relevant to the evaluation of a grammar practice book and vice versa’ (Tomlinson, 1999, p. 11). Examples of content-specific criteria could be:

To what extent are the examples of business texts (e.g. letters, invoices) likely to provide learners with experience of real-life business practice?
To what extent are the reading texts likely to provide exposure to genres which are likely to feature in the learners’ post-course reading experience?
To what extent are the learners likely to develop problem-solving skills useful in their engineering careers?

7 Develop age-specific criteria

These are criteria which relate to the age of the target learners. Thus there would be criteria which are only suitable for five-year-olds, for ten-year-olds, for teenagers, for young adults and for mature adults. These criteria would relate to cognitive and affective development, to previous experience, to interests and to wants and needs. Examples of age-specific criteria would be:

Are the activities likely to match the attention span of the learners?
Is the content likely to provide an achievable challenge in relation to the maturity level of the learners?

Such criteria and content-specific criteria could be considered as local criteria and categorized as such.

8 Develop local criteria

These are criteria which relate to the actual or potential environments of learning and of use. They are questions which are not concerned with establishing the value of the materials per se but rather with measuring the value of the materials for particular learners in particular circumstances with particular wants and likely needs. It is this set of criteria which is unique to the specific evaluation being undertaken and which is ultimately responsible for many of the decisions made in relation to the adoption, revision or adaptation of the materials. Typical features of the environment which would determine this set of criteria are:

the type(s) of institution(s);
the resources of the institution(s);
class size;
the previous target language experience of the learners;
the previous life experience of the learners;
the motivation of the learners;
the needs and wants of the learners;
the background, needs, wants and beliefs of the teachers;
the language policies in operation;
the syllabus;
the objectives of the courses;
the intensity and extent of the teaching time available;
the target examinations;
the amount of exposure to the target language outside the classroom.

Examples of local criteria would be:

To what extent are the stories likely to interest fifteen-year-old boys in Turkey?
To what extent are the reading activities likely to prepare the students for the reading questions in the Primary School Leaving Examination (PSLE) in Singapore?
To what extent are the topics likely to be acceptable to parents of students in Iran?
To what extent are the locations of the texts likely to be meaningful to teenagers in Peru?

9 Trial the criteria

It is important to trial the criteria (even prior to a small, fairly informal evaluation) to ensure that the criteria are sufficient, answerable, valid, reliable and useful.
The trialling could consist of the evaluator (or ideally a group of evaluators) using the criteria on a sample of equivalent materials. Revisions and deletions can then be made to the criteria before the actual evaluation begins.

10 Conduct the evaluation

From experience I have found the most effective way of conducting an evaluation is to:

try to make sure that there is more than one evaluator, to minimize evaluator bias;
discuss each criterion to make sure there is equivalence of interpretation;
focus in a large evaluation on a typical unit for each level (and then check its typicality by reference to other units);
initially answer the criteria independently and in isolation from the other evaluator(s);
give a score for each criterion (perhaps with some sets of significant criteria weighted more heavily than others);
write down brief comments and examples to justify each score, and write comments at the end of each category highlighting the weaknesses which need addressing;
at the end of the evaluation aggregate each evaluator’s scores for each criterion, category of criteria and set of criteria, and then average the scores;
record the comments shared by the evaluators;
write a joint report.

See Tomlinson et al. (2001) for a report of a large-scale evaluation in which four evaluators from different cultures independently evaluated eight adult EFL courses using the same 133 criteria (weighted 0–20 for Publisher’s Claims, 0–10 for Flexibility and 0–5 for the other categories of criteria). See also Masuhara et al. (2008), Tomlinson (2008) and Tomlinson and Masuhara (2013) for other examples of evaluations.
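In a formal evaluation the aggregation described above is usually done in a spreadsheet, but the arithmetic itself is simple: each criterion’s scores are averaged across the evaluators, and the averaged scores are then totalled by category against the agreed weightings. The short sketch below is offered only as an illustration of that calculation; the criteria, categories, maximum scores and evaluator scores in it are invented for the example and are not taken from Tomlinson et al. (2001) or from any other evaluation cited in this chapter.

```python
"""A minimal sketch of the score aggregation described in step 10.

All criteria, categories, weightings and scores below are hypothetical
examples, not data from any published evaluation.
"""
from collections import defaultdict
from statistics import mean

# Hypothetical criteria: (criterion, category, maximum score for that category).
# Tomlinson et al. (2001), for example, weighted Publisher's Claims 0-20,
# Flexibility 0-10 and the other categories 0-5.
CRITERIA = [
    ("affective engagement", "Learning Principles", 5),
    ("achievable challenge", "Learning Principles", 5),
    ("ease of localization", "Flexibility", 10),
    ("claims matched by content", "Publisher's Claims", 20),
]

# Hypothetical scores given independently by two evaluators.
SCORES = {
    "Evaluator A": {"affective engagement": 4, "achievable challenge": 3,
                    "ease of localization": 6, "claims matched by content": 12},
    "Evaluator B": {"affective engagement": 3, "achievable challenge": 4,
                    "ease of localization": 8, "claims matched by content": 10},
}


def aggregate(criteria, scores_by_evaluator):
    """Average each criterion across evaluators, then total the averages by category."""
    per_criterion = {}
    per_category = defaultdict(float)
    for criterion, category, max_score in criteria:
        average = mean(scores[criterion] for scores in scores_by_evaluator.values())
        per_criterion[criterion] = (average, max_score)
        per_category[category] += average
    return per_criterion, dict(per_category)


if __name__ == "__main__":
    by_criterion, by_category = aggregate(CRITERIA, SCORES)
    for criterion, (average, max_score) in by_criterion.items():
        print(f"{criterion}: {average:.1f} out of {max_score}")
    for category, total in by_category.items():
        print(f"{category} (total of averages): {total:.1f}")
```

Averaging per criterion before totalling per category means that one evaluator’s outlying score on a single criterion cannot dominate the comparison between categories, which is part of the point of asking evaluators to score independently before their results are combined.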
What is recommended above is a very rigorous, systematic but time-consuming approach to materials evaluation which I think is necessary for major evaluations from which important decisions are going to be made. However, for more informal evaluations (or when very little time is available) I would recommend the following procedure:

1 Brainstorm beliefs
2 Decide on shared beliefs
3 Convert the shared beliefs into universal criteria
4 Write a profile of the target learning context for the materials
5 Develop local criteria from the profile
6 Evaluate and revise the universal and the local criteria
7 Conduct the evaluation

Whether the evaluation is formal or informal, I would always develop and use my universal criteria first. If the materials do not satisfy the universal criteria, there is no point in assessing them against local criteria, as they are unlikely to facilitate language acquisition and the development of communicative competence. Also, if prominence is given to local criteria over universal criteria, the materials might help the teacher to cover the curriculum and the learners to pass their examinations but not help the learners to develop the ability to communicate effectively.

The examples provided in the stages of the recommended formal and informal procedures above are worded for use in pre-use evaluations. They could easily be reworded for use in whilst- or post-use evaluations. For example:

1 Pre-Use Evaluation: To what extent are the learners likely to be affectively engaged by the text?
2 Whilst-Use Evaluation: To what extent do the learners seem to be affectively engaged by the text?
3 Post-Use Evaluation: To what extent were the learners affectively engaged by the text?

In 1 the evaluators would predict the likelihood of affective engagement by linking the content of the text to what they know about the learners. In 2 they would note the effects that the text seems to be having on the learners. In 3 they could base their answer on responses to learner questionnaires, interviews and focus groups. Notice how much more useful the post-use evaluation criteria are, as they could be used to inform questionnaires and interviews eliciting actual data rather than informed predictions and guesses. What a pity, therefore, that so few criterion-referenced post-use evaluations are carried out by publishers, and by materials developers and teachers, during materials development, selection and adaptation procedures. Such evaluations are time-consuming, potentially expensive and difficult to design and control. But how useful they can be.

Fortunately there have been a number of small-scale post-use evaluations of the effects of materials, mainly from post-graduate students. For example, a recent empirical post-use evaluation of the effects of humanistic materials on intermediate-level learners of English in Vietnam was conducted by Phuam (2021), who compared the effects on communicative competence of textbook units used as scripts with a control group and the same units humanized with the treatment group. He humanized the units by personalizing and localizing them: for example, he replaced references to Hollywood films in one unit with references to Vietnamese films popular with the students, and replaced illustrations of a ‘Western’ family with photos brought to class by the students of their own families. The results revealed far greater engagement by the treatment group and a considerable improvement in communicative ability compared to the control group. In another study, Nolan (2019) compared the effect of coursebook units and of equivalent text-driven units on learner classroom interaction and discourse and found that the text-driven group achieved richer, more confident and more effective interaction than the control group (Margaret Nolan_University of Liverpool_Dissertation.pdf (teachingenglish.org.uk)). For reports of research projects which have conducted post-use evaluations of the effects of materials in many different countries see, for example, Tomlinson and Masuhara (2010, 2018), Tomlinson (2013, 2016c), Harwood (2014) and Garton and Graves (2014).

Conclusion

As I said above, materials evaluation is initially a time-consuming and difficult undertaking. Approaching it in the principled, systematic and rigorous ways suggested above can not only help to make and record vital discoveries about the materials being evaluated but can also help the evaluators to learn a lot about materials, about learning and teaching, and about themselves. This is certainly what has happened to my students on MA courses in Anaheim, Ankara, Leeds, Liverpool, Luton, Norwich and Singapore and to the teachers on materials evaluation workshops I have conducted all over the world. See Tomlinson (2014) for comments by teachers who have enjoyed and gained considerably from participating in materials development courses. Doing evaluations formally and rigorously can also eventually contribute to the development of an ability to conduct principled informal evaluations quickly and effectively when the occasion demands (e.g.
when asked for an opinion of a new book; when deciding which materials to buy in a bookshop; when editing other people’s materials). I have found evaluation demanding but rewarding. Certainly, I have learned a lot every time I have evaluated materials, whether it be the worldwide evaluation of a coursebook I once undertook for a British publisher, the evaluation of computer software I once undertook for an American company, the evaluation of materials I have done for reviews in ELT Journal, or just looking through new materials in a bookshop every time I visit my daughter in Cambridge. I hope, above all else, that I have learned to be more open-minded and that I have learned what is needed to develop a course of materials which can help its target learners to develop communicative competence in a second or foreign language. Unfortunately for me, this is unlikely to be a course of materials which achieves great commercial success, as its many differences from the stereotypical coursebook will not allow it to achieve the face validity which such success requires. See Tomlinson (2020b) and Mishan (2021) for descriptions and explanations of the mismatches between the principled materials advocated by most materials development researchers and the published coursebooks which sell well. However, see also Hughes (2019) for a counterclaim that global coursebooks are principled and do match SLA theory. My great hope is that one day the mismatches between theory and practice will diminish and learners, teachers and publishers around the world will gain.

Readers’ tasks

1
1 Pick a unit at random from any coursebook. Then imagine a target group of learners at the specified level of the coursebook and write a brief profile of the group.
2 Specify ten universal criteria.
3 Use your learner profile to specify five local criteria.
4 Use your universal and local criteria to evaluate the likely effectiveness of the unit.
5 Make recommendations for modifying the unit so as to make it more likely to facilitate language acquisition for the learners.

2
1 For the same unit you evaluated in Task 1 above, devise a post-use evaluation in which you measure:
the extent to which the learners are affectively engaged by the reading texts;
the extent to which the learners are cognitively engaged by the reading texts;
the delayed effect of the reading texts on the learners’ language acquisition.
2 If possible, carry out the post-use evaluation you have designed in 1 above.
3 If you manage to carry out the post-use evaluation, suggest ways in which the texts could be modified, supplemented or replaced in order to increase the likelihood of the unit making a beneficial contribution to the learners’ language acquisition.

For information about affective and cognitive engagement see: Tomlinson, B. and Masuhara, H. (2021), SLA Applied: Connecting Theory and Practice. Cambridge: Cambridge University Press.

Further reading

Tomlinson, B. (2021), Evaluating, Adapting and Developing Materials for Learners of English as an International Language. Shanghai: Shanghai International Press. A concise introduction for trainee and practising teachers to the theory and practice of evaluating, adapting and developing materials, which includes a chapter specifically on the evaluation of language learning materials.

Tomlinson, B. and Masuhara, H.
(2018), ‘Materials evaluation’ (Chapter 3), in The Complete Guide to the Theory and Practice of Developing Materials for Language Learning. Hoboken, NJ: Wiley Blackwell, pp. 52–81. A chapter which discusses the issues, reports the literature and recommends procedures.

References

Amrani, F. (2011), ‘The process of evaluation: A publisher’s view’, in B. Tomlinson (ed.), Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 267–95.
Anderson, N. J. (2005), ‘L2 learning strategies’, in E. Hinkel (ed.), Handbook of Research in Second Language Learning. Mahwah, NJ: Erlbaum, pp. 757–72.
Arnold, J. (ed.) (1999), Affect in Language Learning. Cambridge: Cambridge University Press.
Beck, I. L., McKeown, M. G. and Worthy, J. (1995), ‘Giving a text voice can improve students’ understanding’, Reading Research Quarterly, 30 (2), 220–38.
Berman, M. (1999), ‘The teacher and the wounded healer’, IATEFL Issues, 152, 2–5.
Bolitho, R., Carter, R., Hughes, R., Ivanic, R., Masuhara, H. and Tomlinson, B. (2003), ‘Ten questions about language awareness’, ELT Journal, 57 (2), 251–9.
Brown, J. B. (1997), ‘Textbook evaluation form’, The Language Teacher, 21 (10), 15–21.
Byram, M. and Masuhara, H. (2013), ‘Intercultural competence’, in B. Tomlinson (ed.), Applied Linguistics and Materials Development. London: Bloomsbury, pp. 143–60.
Byrd, P. (1995), Material Writer’s Guide. New York: Heinle and Heinle.
Byrd, P. (2001), ‘Textbooks: Evaluation for selection and analysis for implementation’, in M. Celce-Murcia (ed.), Teaching English as a Second or Foreign Language (3rd edn). Boston, MA: Heinle and Heinle, pp. 415–27.
Canale, M. and Swain, M. (1980), ‘Theoretical bases of communicative approaches to second language teaching and testing’, Applied Linguistics, 1 (1), 11–47.
Candlin, C. N. and Breen, M. (1980), ‘Evaluating and designing language teaching materials’, in Practical Papers in English Language Education Vol. 2. Lancaster: Institute for English Language Education, University of Lancaster.
Craik, F. I. M. and Lockhart, R. S. (1972), ‘Levels of processing: A framework for memory research’, Journal of Verbal Learning and Verbal Behaviour, 11, 671–84.
Cunningsworth, A. (1984), Evaluating and Selecting EFL Teaching Material. London: Heinemann.
Cunningsworth, A. (1995), Choosing Your Coursebook. Oxford: Heinemann.
Damasio, A. and Carvalho, G. B. (2013), ‘The nature of feelings: Evolutionary and neurobiological origins’, Nature Reviews Neuroscience, 14 (2), 143–52.
Daoud, A. and Celce-Murcia, M. (1979), ‘Selecting and evaluating textbooks’, in M. Celce-Murcia and L. McIntosh (eds), Teaching English as a Second or Foreign Language. New York: Newbury House, pp. 302–7.
Donovan, P. (1998), ‘Piloting – a publisher’s view’, in B. Tomlinson (ed.), Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 149–89.
Dörnyei, Z. and Ryan, S. (2015), The Psychology of the Language Learner Revisited. New York: Routledge.
Dörnyei, Z. and Ushioda, E. (eds) (2009), Motivation, Language Identity and the L2 Self. Bristol: Multilingual Matters.
Dörnyei, Z., Henry, A. and Muir, C. (2015), Motivational Currents in Language Learning: Frameworks for Focused Interventions. New York: Routledge.
Edge, J. and Wharton, S. (1998), ‘Autonomy and development: Living in the material world’, in B. Tomlinson (ed.), Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 295–310.
Ellis, R. (1995), ‘Does it “work”?’, Folio, 2 (1), 19–21.
Ellis, R. (1998), ‘The evaluation of communicative tasks’, in B. Tomlinson (ed.), Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 217–38.
Ellis, R. (2011), ‘Macro- and micro-evaluations of task-based teaching’, in B. Tomlinson (ed.), Materials Development in Language Teaching (2nd edn). Cambridge: Cambridge University Press, pp. 21–35.
Ellis, R. (2015), Understanding Second Language Acquisition (2nd edn). Oxford: Oxford University Press.
Farrell, T. S. C. (2018), ‘Reflective practice for language teachers’, in J. Liontas (ed.), The TESOL Encyclopedia of Language Teaching. https://doi.org/10.1002/9781118784235.eelt0873
Garton, S. and Graves, K. (eds) (2014), International Perspectives on Materials in ELT. Basingstoke: Palgrave Macmillan.
Gearing, K. (1999), ‘Helping less experienced teachers of English to evaluate teacher’s guides’, ELT Journal, 53 (2), 122–7.
Grant, N. (1987), Making the Most of Your Textbook. Harlow: Longman.
De Guerrero, M. C. M. (2005), Inner Speech – Thinking Words in a Second Language. New York: Springer-Verlag.
De Guerrero, M. C. M. (2018), ‘Going covert: Inner and private speech in language learning’, Language Teaching, 51 (1), 1–35. https://doi.org/10.1017/S0261444817000295
Harwood, N. (ed.) (2014), English Language Teaching Textbooks: Content, Consumption, Production. Basingstoke: Palgrave Macmillan.
Hidalgo, A. C., Hall, D. and Jacobs, G. M. (1995), Getting Started: Materials Writers on Materials Writing. Singapore: RELC.
Hughes, S. H. (2019), ‘Coursebooks – is there more than meets the eye?’, ELT Journal, 73 (4), 447–55. https://doi.org/10.1093/elt/ccz040
Jacobs, B. and Schumann, J. A. (1992), ‘Language acquisition and the neurosciences: Towards a more integrative perspective’, Applied Linguistics, 13 (3), 282–301.
Jolly, D. and Bolitho, R. (1998), ‘A framework for materials writing’, in B. Tomlinson (ed.), Materials Development in Language Teaching. Cambridge: Cambridge University Press, pp. 90–115.
Jolly, D. and Bolitho, R. (2011), ‘A framework for materials development’, in B. Tomlinson (ed.), Materials Development in Language Teaching (2nd edn). Cambridge: Cambridge University Press, pp. 107–34.
Kelly, C. (1997), ‘David Kolb, the theory of experiential learning and ESL’, The Internet TESL Journal, III (9), http://iteslj.org/Articles/Kelly-Experiential/
Kern, R. (2008), ‘Making connections through texts in language teaching’, Language Teaching, 41 (3), 367–87. https://doi.org/10.1017/S0261444808005053
Kolb, A. Y. and Kolb, D. A. (2009), ‘The learning way: Meta-cognitive aspects of experiential learning’, Simulation and Gaming: An Interdisciplinary Journal of Theory, Practice and Research, 40 (3), 297–327.
