Technology and Translation part 2.pdf

Full Transcript

Technology and Translation Part 2
Dr. Manal Alghannam
Week 4-2

Computer-aided translation (CAT) tools

Computer-aided translation (CAT) tools are designed to increase productivity while leaving the core task to the human translator. CAT tools for translation and software localization nowadays offer, almost without exception, a bundle of functions that tend to include:

1. Alignment
2. Concordancing and term extraction
3. Quality assurance
4. Workflow management

This section focuses on the distinctive features of translation and localization tools, as well as tools for subtitling, that support the translator’s core task.

4.1 Translation memory (TM) tools

Based on the insight that existing translations contain solutions to many of the problems faced daily by translators, TM tools enable the efficient creation and searching of databases of translated documents and their originals.

What is translation memory (TM)?

A translation memory is a database that stores sentences, paragraphs or segments of text that have been translated before. These memories comprise translation units (TUs), each consisting of corresponding source and target segments. While the segments may often be full sentences, a TU can also pair captions, headings, list items, contents of individual table cells or even single words.

How does TM work?

TMs are designed to increase productivity by detecting that the segment currently being translated matches, wholly or partially, the source side of one or more TUs and then presenting the corresponding target segment or segments to the translator. If a similar sentence is found in the translation memory, the CAT tool indicates this to the translator and offers the option to use the TM translation, replace it with a new translation, or modify the suggestion provided by the TM. New TUs are added to the memory, so the volume of reusable TUs grows progressively. Large companies often have TMs numbering millions of units.
The TM used for a particular job can be created by importing TUs from existing sources (hence the importance of TMX, the Translation Memory eXchange format).

Sentences identical to those in the translation memory are called 100% matches. This means that the translation memory contains an example where the exact sentence has been translated before. It is even possible to have 101% and 102% matches, which mean that not only is the current sentence identical, but one or both of the sentences before and after it are identical as well.

Sentence matches that are similar but not identical are called fuzzy matches. They are usually ranked from 0% to 99% according to their similarity. A 99% match typically means that the segments differ only minimally, for example by a single character. Matches below 70% are generally not useful and are ignored by the translator. Thus, translation memory is most beneficial when the texts to be translated are highly repetitive, such as manuals or catalogues.

How useful is translation memory?

The idea behind leveraging a translation memory is simple: it saves translators from translating the same sentences repeatedly. With this comes a list of benefits:

Consistent translations: When the same sentences need to be translated again and again, translation memory suggests previous translations to keep the work consistent throughout.

Centralized database: When several translators are working on the same texts, they are more likely to use the same translations, guided by the references and suggestions from the translation memory.

Cost savings: Time saved by translators translates into cost savings. Moreover, translation memory is like an accumulated investment: the more translated sentence pairs it stores, the more quickly the translator can translate new (similar) texts.

Improved quality: Editors, managers, and other translators can refine and optimize translation memory entries so that they adhere to a consistent tone and brand voice.
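The match lookup described under "How does TM work?" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular CAT tool scores matches: the similarity measure (`difflib.SequenceMatcher` from the standard library), the sample memory and the names `similarity` and `lookup` are all invented for the example. Only the 70% cut-off comes from the text.

```python
from difflib import SequenceMatcher

# A toy translation memory: each TU pairs a source segment with its target.
# The segments are invented for illustration.
TM = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Close the cover before use.", "Fermez le couvercle avant utilisation."),
]

FUZZY_THRESHOLD = 70  # matches below 70% are generally ignored (see text)


def similarity(a: str, b: str) -> int:
    """Similarity of two segments as a whole percentage (assumed measure).

    The value is rounded down so that only identical segments reach 100.
    """
    return int(SequenceMatcher(None, a, b).ratio() * 100)


def lookup(segment: str, memory=TM, threshold: int = FUZZY_THRESHOLD):
    """Return (score, source, target) suggestions, best match first."""
    hits = []
    for source, target in memory:
        score = similarity(segment, source)
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for query in ("Press the power button.", "Press the power buttons."):
        for score, source, target in lookup(query):
            kind = "exact" if score == 100 else "fuzzy"
            print(f"{query!r}: {score}% {kind} match -> {target!r}")
```

Sorting the suggestions by score mirrors the point that the most readily reusable suggestion should appear at the top of the list. A production tool would also need segmentation, tag handling and an index that scales to millions of TUs.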
Clearly, the success or otherwise of the tool in placing the most readily reusable suggestion at the top of the list will affect the translator’s productivity. It can also affect remuneration.

How does TM affect the translator’s rewards?

Rewards are calculated automatically, based on the level of effort required to review or edit a job. For segments that have no matches, rewards are unaffected. When a match is found in the translation memory, the reward is adjusted for that segment only, based on the quality and length of the match.

Examples:

New / Unique (0–74%): no match or an unusable match that requires full translation; 0% reduction.
Fuzzy (75–99%): a partial match that requires minor edits; 60% reduction.
Exact (100%): a perfect character-by-character match that requires spot checking; 80% reduction.
Contextual (101%): an exact match whose preceding and following segments are also exact matches, and which requires spot checking; 80% reduction.
Repetition: a duplicate exact match in the same job that requires little or no work; 100% reduction (in other words, they will not pay you a single riyal).

4.2 Software localization (l10n) tools

In addition to the terminology, TM matching and pre-translate functions, software localization tools add specialized functions that reflect their prime use in translating text embedded in computer programs. Just as TM tools protect the formatting, so localization tools protect the program code by extracting the translatable text (mostly text that appears in the user interface, often called ‘strings’) for translation in a safe environment and eventually reinserting the translated version in the right places in the right files.

4.3 Subtitling tools

Dedicated subtitling tools provide no help for the core task of finding the right words. Their specificity is to display the draft subtitles as the viewer will see them and to alert the subtitler to any violations of timing constraints.
These constraints are imposed jointly by the assumed reading speeds of different viewers (adult, child, hard-of-hearing), the medium (film, DVD, TV) and the rhythm of shot changes. The tool flags subtitles that are too long to be read in the time they are displayed or whose separation from the next subtitle is too short.

5. Machine translation (MT) tools

There are two basic approaches to building an MT system:

1. Encode linguistic knowledge about the morphological, lexical, syntactic and functional structures of the source and target languages and the mappings between them.
2. Provide enough aligned data to ‘train’ the system to ‘learn’ the statistically most likely mappings between strings of characters in the two languages.

The first approach is that of rule-based MT (RBMT) and the second that of statistics-based MT (SMT).

5.1 Current use and deployment of MT

1. The overwhelming use of MT today, certainly of free online MT, is for assimilation: the understanding of incoming information.
2. A growing use of MT is for dissemination: the publication of outgoing information.

5.2 Architectures and limitations on improvability

The RBMT (rule-based machine translation) model is the transfer architecture. RBMT technology applies large collections of linguistic rules in three different phases:

a. An initial analysis stage is intended to identify the constituents of the input sentence and the functional relations (predicate, subject, object, etc.) between them, as well as sentential features such as tense, aspect and modality. Analysis relies on knowledge of the source language (SL) only, expressed as far as possible in terms of generalizations about combinations of part-of-speech categories rather than individual lexical items.
b. The following transfer stage relies on a bilingual dictionary and on mappings between the abstract structure describing an SL sentence and a structure underlying the corresponding target language (TL) sentence.
c. The final generation stage aims to linearize this TL structure as a grammatically correct sequence of TL words.

The translator can directly improve system performance by creating user dictionaries that remedy defects in the MT dictionaries supplied.

SMT (statistics-based machine translation) systems rely on two models of statistical probabilities:

a. The translation model
b. The (target-)language model

Both are calculated on the basis of a large bilingual corpus (preferably of millions of words).

The translation model is ‘the set of probabilities for each word on the source side of the corpus that it corresponds to … each word on the target side of the corpus’ (Somers 2008). In this model, readily usable translation equivalents are expected to have a high statistical probability.

The (target-)language model is the set of probabilities of the relative ordering of a given set of TL words.

These two models are then used in conjunction by a so-called decoder, whose task ‘consists of applying the translation model to a given sentence S to produce a set of probable [TL] words, and then applying the language model to those words to produce the target sentence T’ (ibid.), such that the probability of T is the highest possible. Recent approaches include phrases in both models, with improved results.

RBMT vs SMT

The main challenge for RBMT is ambiguity at any linguistic level, hence the attraction of controlled languages. For SMT the main challenge is data sparsity: words in the current source text that have been encountered only rarely (or even not at all) in the training data. RBMT systems are judged more robust in maintaining their translation quality. While SMT errors may more often be incomprehensible, the errors made by RBMT systems tend to be more consistent and, as a result, easier for posteditors to find, since they are the product of a rule-based process.

6. Project management tools

In response to the technological and human complexity of larger projects, specialist translation management tools have appeared. They cover every step from costing and quoting to invoicing. They interface with TM tools to be able to import the results of source text analysis (word counts for each category of match) and come populated with features peculiar to translation, such as setting rates for defined roles (translator, reviser, reviewer, etc.), SL–TL pairs or subject specialisms. They can be used to enforce certain workflows by requiring one process to be signed off before the next begins.

The need for quality assurance (QA) has resulted in tools which seek to automate parts of this process, available either as standalone tools or, more commonly, as plug-ins for different TM workstations. The kinds of checks they offer include:

a. identifying untranslated or partially translated segments,
b. detecting inconsistent translations of words or segments, and inconsistencies in punctuation, numbers, approved terminology and tags.

7. Collaborative translation tools

This chapter began with the assertion that translation is increasingly a collaborative activity. Collaborative translation is done by self-organizing communities of committed enthusiasts. This model has worked well in the open source software community, not only for writing program code but also for authoring and translating documentation. Possibly the best-known source of user-generated content is Wikipedia, which likewise provides guidance on good practice and a mechanism for flagging translations in progress:

http://meta.wikimedia.org/wiki/Translation
https://meta.wikimedia.org/wiki/Translation_of_the_week

8. Evaluation techniques

There are many reasons for evaluating translation technologies:

1. determining whether a tool is fit for purpose,
2. tracking its performance on different kinds of data,
3. measuring its cost-effectiveness over time.
MT evaluation is difficult

The absence of a gold standard for a whole text makes the evaluation of translation quality a hard task, open to subjective variation between judges. Several judges are therefore required to provide enough data points to support reliable general conclusions about the translation capabilities of a person or of a system.
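One simple way to combine the ratings of several judges is to report the mean together with a measure of spread. The sketch below is purely illustrative: the 1-to-5 scale, the scores and the function name `summarize` are assumptions made for the example and do not come from the lecture.

```python
from statistics import mean, stdev

# Hypothetical ratings (1 = poor, 5 = excellent) given by four judges
# to the same translated text from two systems. Invented for illustration.
ratings = {
    "system_a": [4, 5, 4, 3],
    "system_b": [2, 4, 5, 1],
}


def summarize(scores):
    """Return (mean, sample standard deviation) of one set of judge scores."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    for system, scores in ratings.items():
        avg, spread = summarize(scores)
        # A large spread signals that the judges disagree, so the mean alone
        # is a weak basis for conclusions about that system.
        print(f"{system}: mean={avg:.2f}, spread={spread:.2f}")
```

With only a handful of judges the spread itself is a rough estimate, which is one reason several judges are needed before drawing general conclusions.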
