Translation Problems PDF
Document Details
Uploaded by SaneNarrative
Qassim University
Summary
This document discusses the various challenges of translation, particularly focusing on the concept of ambiguity and structural differences in languages. It explores how lexical differences and multi-word units (such as idioms) can pose significant difficulties in the translation process.
Full Transcript
Translation Problems
It is useful to think of these problems under the following headings:
1. Problems of ambiguity.
2. Problems that arise from structural and lexical differences between languages.
3. Multiword units like idioms and collocations.

Ambiguity
In the best of all possible worlds (as far as most Natural Language Processing is concerned, anyway) every word would have one and only one meaning. But, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous. Ambiguity is a pervasive phenomenon in human languages. It is very hard to find words that are not at least two ways ambiguous, and sentences which are (out of context) several ways ambiguous are the rule, not the exception. This is problematic not only because some of the alternatives are unintended (i.e. represent wrong interpretations), but also because ambiguities ‘multiply’. In the worst case, a sentence containing two words, each of which is two ways ambiguous, may be four ways ambiguous (2 × 2 = 4).

What is Ambiguity?
1. Uncertainty or inexactness of meaning in language.
2. A word, phrase or statement which contains more than one meaning.
3. The quality of being open to more than one interpretation.

What are the types of Ambiguity?
1. When a word has more than one meaning, it is said to be lexically ambiguous.
2. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous.

Why is Lexical Ambiguity Problematic?
1. Some of the alternatives are unintended.
2. Ambiguities ‘multiply’.

Lexical and Structural Mismatches
In the best of all possible worlds for NLP (Natural Language Processing), every word would have exactly one sense. While this is true for most NLP, it is an exaggeration as regards MT.
It would be a better world, but not the best of all possible worlds, because we would still be faced with difficult translation problems. Some of these problems are to do with lexical differences between languages: differences in the ways in which languages seem to classify the world, what concepts they choose to express by single words, and which they choose not to lexicalize. Other problems arise because different languages use different structures for the same purpose, and the same structure for different purposes. In either case, the result is that we have to complicate the translation process.

Why do lexical and structural mismatches appear?
1. Lexical differences between languages: differences in the ways in which languages seem to classify the world.
2. Different languages use different structures for the same purpose.
3. Different languages use the same structure for different purposes.

Why is classification not so easy when one turns to cases of structural mismatch?
When one turns to cases of structural mismatch, classification is not so easy, because one may often think that the reason one language uses one construction where another uses a different one is the stock of lexical items the two languages have. Thus, the distinction between lexical and structural mismatches is to some extent a matter of taste and convenience. A particularly obvious example of this involves problems arising from what are sometimes called lexical holes, that is, cases where one language has to use a phrase to express what another language expresses in a single word. The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words. One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose.

What are Lexical Holes?
Lexical holes are the cases where one language has to use a phrase to express what another language expresses in a single word.

Multiword units: Idioms
Idioms are expressions whose meaning cannot be completely understood from the meanings of the component parts. For example, whereas it is possible to work out the meaning of (a) on the basis of knowledge of English grammar and the meaning of words, this would not be sufficient to work out that (b) can mean something like ‘If Sam dies, her children will be rich’. This is because ‘kick the bucket’ is an idiom.
a. If Sam mends the bucket, her children will be rich.
b. If Sam kicks the bucket, her children will be rich.
In many cases, a natural translation for an idiom will be a single word. Lexical holes and idioms are frequently instances of word-to-phrase (or phrase-to-word) translation. The difference is that with lexical holes, the problem typically arises when one translates from the language with the word into the language that uses the phrase, whereas with idioms, one usually gets the problem in translating from the language that has the idiom (i.e. the phrase) into the language which uses a single word. In general, there are two approaches one can take to the treatment of idioms. The first is to try to represent them as single units in the monolingual dictionaries. The second is to treat them with special rules that change the idiomatic source structure into an appropriate target structure.

What are the problems with sentences that contain idioms?
One problem with sentences which contain idioms is that they are typically ambiguous, in the sense that either a literal or an idiomatic interpretation is generally possible. Another problem is that they need special rules, in addition to the normal rules for ordinary words and constructions. The real problem with idioms is that they are not generally fixed in their form, and that the variation of forms is not limited to variations in inflection.
Thus, there is a serious problem in recognizing idioms.

Machine Translation in Practice
The Scenario
What are the Responsibilities of the Language Centre in a company?
1. The translation of documents created within the company into a variety of European and Oriental languages.
2. Exercising control over the content and presentation of company documentation in general.
3. Specifying standards for the final appearance of documents in distributed form, including style, terminology, and content in general.

What is the Overall Policy of the Language Centre?
The overall policy is enshrined in the form of a corporate Document Design and Content Guide, which the Centre periodically updates and revises.

For what is Machine Translation to be used in a Multinational Company?
1. Technical documentation such as User and Repair manuals.
2. Some classes of highly routine internal business correspondence.
3. Legal and marketing material.

What is ETRANS?
The MT system which you use is called ETRANS and forms part of the overall documentation system. (ETRANS is just a name we have invented for a prototypical MT system.) Parts of an electronic document on the system can be sent to the MT system in the same way that they can be sent to a printer or to another device or facility on the network. ETRANS is simultaneously available from any workstation and, for each person using it, behaves as if it is his or her own personal MT system.

How are technical documents submitted to the MT system?
(Annotation: this topic comes up in the exam.)
1. All the sentences are relatively short and rather plain.
2. The document must be written in accordance with the Language Centre document specification and with MT very much in mind.
3. There are no obvious idioms or complicated linguistic constructions.
4. Many or all of the technical terms relating to printers are in regular use in the company and are stored and defined in paper or electronic dictionaries available to the company’s technical authors and translators.
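The conditions above lend themselves to a rough automated check before a document is sent to the MT system. The following is a minimal Python sketch and not part of the scenario itself: the 20-word limit, the tiny idiom list and the function names are assumptions made for illustration, and only conditions 1 (short sentences) and 3 (no obvious idioms) are covered.

```python
import re

# Assumed values for illustration only; the scenario gives no concrete limits.
MAX_WORDS_PER_SENTENCE = 20
IDIOMS = ("kick the bucket", "piece of cake")


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a deliberately naive rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_document(text):
    """Return (sentence, problem) pairs for sentences that break a rule."""
    problems = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z'-]+", sentence)
        if len(words) > MAX_WORDS_PER_SENTENCE:
            problems.append((sentence, "sentence too long"))
        lowered = sentence.lower()
        for idiom in IDIOMS:
            # Exact substring match only: inflected or reordered forms such as
            # "kicks the bucket" are missed, which is the recognition problem
            # with idioms described earlier.
            if idiom in lowered:
                problems.append((sentence, f"idiom: {idiom}"))
    return problems


if __name__ == "__main__":
    sample = "Switch the printer on. Replacing the cartridge is a piece of cake."
    for sentence, problem in check_document(sample):
        print(problem, "->", sentence)
```

A check of this kind can only flag candidates for the author to rewrite; it does not decide whether a flagged phrase is meant literally or idiomatically.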
What are the main groups responsible for Translation in the company? 1. Documentation managers, who specify company policy on documentation. 2. Authors of texts who (ideally) write with MT in mind, following certain established Guidelines. 3. Translators who manage the translation system in all respects pertaining to its day to day operation and its linguistic performance. In many cases the document management role will be fulfilled by translators or technical authors. For obvious reasons, there will be fairly few individuals who are both technical authors and translators. What are the important entities in the process of translation in the company? 1. Multi-Lingual Electronic Documents which contain text for translation. 2. The Document Preparation system which helps to create, revise, distribute and archive electronic documents. 3. The Translation System which operates on source text in a document to produce a translated text of that document. What are the various processes or steps in the whole business? 1. Document Preparation (which includes authoring and pre-editing). 2. The Translation Process, mediated by the translation system, perhaps in conjunction with the translator. 3. Document Revision (which is principally a matter of post-editing by the translator). Document Preparation Authoring and Pre-Editing What does the corporate language policy as described in the scenario try to do? The corporate language policy as described in the scenario tries to ensure that text which is submitted to an MT system is written in a way which helps to achieve the best possible raw MT output. What are the basic writing rules? 1. Keep sentences short. 2. Make sure sentences are grammatical. 3. Avoid complicated grammatical constructions. 4. Avoid words which have several meanings. 5. In technical documents, only use technical words and terms which are well established, well defined and known to the system. Types of Machine Translation according to the Users 1. 
Machine Translation for Watchers (MT-W)
2. Machine Translation for Revisers (MT-R)
3. Machine Translation for Translators (MT-T)
4. Machine Translation for Authors (MT-A)

Machine Translation for Watchers (MT-W)
This is intended for readers who want access to information written in a foreign language and who are prepared to accept a possibly bad translation rather than nothing. It arose from the need to translate military technological documents. It was mostly dictionary-based translation, far from linguistics-based (rule-based, RBMT) machine translation.

Machine Translation for Revisers (MT-R)
This type aims at producing a raw translation automatically, with a quality comparable to that of first drafts produced by humans. The output is treated only as a draft to be brushed up, so that professional translators, freed from a very boring and time-consuming task, can be promoted to revisers.

Machine Translation for Translators (MT-T)
This aims at helping human translators do their job by providing online dictionaries, thesauri and translation memory. This type of machine translation system is usually incorporated into translation workstations and PC-based translation tools. Systems that run on standard platforms and are integrated with several text processors are the ones that have attained operational and commercial success.

Machine Translation for Authors (MT-A)
This is aimed at authors who want to have their texts translated into one or several languages and who accept writing under the control of the system, or helping the system disambiguate their utterances, so that a satisfactory translation can be obtained without any revision.
This is “interactive MT”. The interaction, however, was done both during analysis and during transfer, and not by authors but by specialists of the system and language(s). In short, there have been no operational successes yet in MT-A, but the designs are becoming increasingly user-oriented and geared towards the right kind of potential users: people needing to produce translations, preferably into several languages.

Evaluation of Machine Translation Systems
Evaluating machine translation systems is important not only for potential users and buyers, but also for researchers and developers. Various evaluation metrics have been developed, such as:
1. BLEU (BiLingual Evaluation Understudy)
2. WER (Word Error Rate)
3. PER (Position-independent word Error Rate)
4. TER (Translation Error Rate)

1. BLEU (BiLingual Evaluation Understudy)
The BLEU metric, proposed by Papineni et al. in 2001, was the first automatic measurement accepted as a reference for the evaluation of translations. The principle of this method is to calculate the degree of similarity between a candidate (machine) translation and one or more reference translations, based on n-gram precision.

2. WER (Word Error Rate)
It is used in machine translation to evaluate the quality of a translation hypothesis in relation to a reference translation. The idea is to calculate the minimum number of edits (insertion, deletion or substitution of a word) that must be performed on the hypothesis translation to make it identical to the reference translation; this count is usually divided by the number of words in the reference.

3. PER (Position-independent word Error Rate)
The PER metric compares the words of the machine translation with those of the reference regardless of their sequence in the sentence.

4. TER (Translation Error Rate)
It is defined as the minimum number of edits needed to change a hypothesis so that it exactly matches one of the references. The possible edits in TER include:
a. Insertion.
b. Deletion.
c. Substitution of single words.
d. Shifts, i.e. edits which move sequences of contiguous words.

The Importance of Machine Translation
The topic of MT is one that we have found sufficiently interesting to spend most of our professional lives investigating, and we hope the reader will come to share, or at least understand, this interest. But whatever one may think about its intrinsic interest, it is undoubtedly an important topic: socially, politically, commercially, scientifically, and intellectually or philosophically.

Why MT Matters
1. The social or political importance.
2. The commercial importance.
3. The scientific importance.
4. The philosophical and intellectual importance.

1. The social or political importance of MT arises from the socio-political importance of translation in communities where more than one language is generally spoken.
2. The commercial importance of MT is a result of related factors:
a. Translation itself is commercially important.
b. Translation is expensive.
3. Scientifically, MT is interesting because it is an obvious application and testing ground for many ideas in Computer Science, Artificial Intelligence, and Linguistics, and some of the most important developments in these fields have begun in MT.
4. Philosophically and intellectually, MT is interesting because it represents an attempt to automate an activity that can require the full range of human knowledge; that is, for any piece of human knowledge, it is possible to think of a context where the knowledge is required.

Popular Misconceptions
1. False: MT is a waste of time because you will never make a machine that can translate Shakespeare.
2. False: There was/is an MT system which translated ‘The spirit is willing, but the flesh is weak’ into the Russian equivalent of ‘The vodka is good, but the steak is lousy’, and ‘hydraulic ram’ into the French equivalent of ‘water goat’. MT is useless.
3. False: Generally, the quality of translation you can get from an MT system is very low. This makes them useless in practice.
4. False: MT threatens the jobs of translators.
5. False: The Japanese have developed a system that you can talk to on the phone. It translates what you say into Japanese, and translates the other speaker’s replies into English.
6. False: There is an amazing South American Indian language with a structure of such logical perfection that it solves the problem of designing MT systems.
7. False: MT systems are machines, and buying an MT system should be very much like buying a car.

Popular Conceptions: Some Facts about MT
1. True: MT is useful. The METEO system has been in daily use since 1977. As of 1990, it was regularly translating around 45 000 words daily. In the 1980s, the diesel engine manufacturer Perkins Engines was saving around £4 000 and up to 15 weeks on each manual translated.
2. True: While MT systems sometimes produce howlers, there are many situations where the ability of MT systems to produce reliable, if less than perfect, translations at high speed is valuable.
3. True: In some circumstances, MT systems can produce good quality output: less than 4% of METEO output requires any correction by human translators at all (and most of these corrections are due to transmission errors in the original texts). Even where the quality is lower, it is often easier and cheaper to revise ‘draft quality’ MT output than to translate entirely by hand.
4. True: MT does not threaten translators’ jobs. The need for translation is vast and unlikely to diminish, and the limitations of current MT systems are too great. However, MT systems can take over some of the boring, repetitive translation jobs and allow human translators to concentrate on more interesting tasks, where their specialist skills are really needed.
5. True: Speech-to-Speech MT is still a research topic. In general, there are many open research problems to be solved before MT systems will come close to the abilities of human translators.
6. True: Not only are there many open research problems in MT, but building an MT system is an arduous and time-consuming job, involving the construction of grammars and very large monolingual and bilingual dictionaries. There is no ‘magic solution’ to this.
7. True: In practice, before an MT system becomes really useful, a user will typically have to invest a considerable amount of effort in customizing it.

Rule-Based Machine Translation (RBMT) vs. Statistical Machine Translation (SMT)
What is Machine Translation?
Machine translation (MT) is automated translation, or “translation carried out by a computer”, as defined in the Oxford English Dictionary. It is a process, often treated as part of Natural Language Processing, which uses a bilingual data set and other language assets to build language and phrase models used to translate text. As computational activities become more mainstream and the internet opens up the wider multilingual and global community, research and development in Machine Translation continues to grow at a rapid rate.

What are the MT systems available in the market?
1. Statistical Machine Translation (SMT).
2. Rule-Based Machine Translation (RBMT).
3. Hybrid Systems, which combine RBMT and SMT.

Human vs. Machine Translation
What must be done in any translation, whether human or automated?
In any translation, whether human or automated, the meaning of a text in the source (original) language must be fully transferred to its equivalent meaning in the target language’s translation. While on the surface this seems straightforward, it is often far more complex. Translation is never a mere word-for-word substitution.

What must a human translator do?
A human translator must interpret and analyze all of the elements within the text and understand how each word may influence the context of the text.

What does a human translator’s work require?
Human translator’s work requires extensive expertise in grammar, syntax (sentence structure), semantics (meanings), etc., in the source and target languages, as well as expertise in the domain. Types of Making Machine Translation Systems (Machine Translation Engines) Rule-Based Machine Translation (RBMT) Technology On what does RBMT rely? RBMT relies on countless built-in linguistic rules and millions of bilingual dictionaries for each language pair. How does RBMT system work? The RBMT system parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets and then transfers the grammatical structure of the source language into the target language. On what are RBMT systems built? Rule-based Machine Translation systems are built on gigantic dictionaries and sophisticated linguistic rules. Users can improve translation quality by adding terminology into the translation process by creating user-defined dictionaries, which override the system's default settings. While rule-based MT may bring a company to a reasonable quality threshold, the quality improvement process is generally long, expensive and needs to be carried out by trained experts. This has been a contributing factor to the slow adoption and usage of MT in the localization industry. Statistical Machine Translation (SMT) Technology What does SMT do? SMT utilizes statistical translation models generated from the analysis of monolingual and bilingual training data. What does SMT use? It uses computing power to build sophisticated data models to translate one source language into another. The translation is selected from the training data using algorithms to select the most frequently occurring words or phrases. How is SMT building models? 
Building SMT models is a relatively quick and simple process which involves uploading files to train the engine for a specific language pair and domain. A minimum of two million words is normally recommended to train an engine for a specific domain, although an acceptable quality threshold can sometimes be reached with much less.

On what does SMT technology rely?
SMT technology relies on bilingual corpora such as translation memories and glossaries to train it to learn language patterns, and it uses monolingual data to improve its fluency. SMT engines will produce higher-quality output if trained on domain-specific data, for example from the medical, financial or technical domains. SMT technology is CPU intensive and requires an extensive hardware configuration to run translation models at acceptable performance levels. Because of this, cloud-based systems are preferred, since they can scale to meet the demands of their users without the users having to invest heavily in hardware and software.

RBMT vs. SMT
1. Cost and effort
RBMT: RBMT can achieve good results, but the training and development costs are very high for a good quality system. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly.
SMT: SMT systems can be built in much less time and do not require linguistic experts to apply language rules to the system.
2. Data and resources
RBMT: RBMT systems are built with much less data than SMT systems, instead using dictionaries and language rules to translate. This sometimes results in a lack of fluency.
SMT: SMT models require state-of-the-art computer processing power and storage capacity to build and manage large translation models.
3. Language change and style
RBMT: Language is constantly changing, which means rules must be managed and updated where necessary in RBMT systems.
SMT: SMT systems can mimic the style of the training data to generate output based on the frequency of patterns, allowing them to produce more fluent output.
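The contrast drawn in the comparison above can be illustrated with a toy sketch in Python. It is only an illustration under stated assumptions: the word lists, the English-French examples and all function names are invented here, real systems work on phrases and sentence structure rather than single words, and the "statistical" side is reduced to the idea that the most frequent translation in the training data is selected.

```python
from collections import Counter, defaultdict

# Rule/dictionary side: a hand-built default dictionary, with a
# user-defined dictionary that overrides it (both hypothetical).
DEFAULT_DICTIONARY = {"printer": "imprimante", "driver": "conducteur"}
USER_DICTIONARY = {"driver": "pilote"}  # domain term: software driver


def dictionary_translate(word):
    """Look the word up; user entries take priority over default entries."""
    merged = {**DEFAULT_DICTIONARY, **USER_DICTIONARY}
    return merged.get(word, word)  # unknown words are passed through


# Statistical side: aligned (source, target) word pairs standing in for a
# bilingual training corpus (hypothetical data).
ALIGNED_PAIRS = [
    ("driver", "pilote"),
    ("driver", "pilote"),
    ("driver", "conducteur"),
    ("printer", "imprimante"),
]


def train(pairs):
    """Count how often each target word was aligned with each source word."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts


def statistical_translate(word, counts):
    """Pick the target word seen most often with this source word."""
    if word not in counts:
        return word  # unseen words are passed through
    return counts[word].most_common(1)[0][0]


if __name__ == "__main__":
    model = train(ALIGNED_PAIRS)
    print(dictionary_translate("driver"))
    print(statistical_translate("driver", model))
```

In the dictionary version, quality improves only when someone edits the dictionaries; in the counting version, it changes whenever the training data changes, which is why domain-specific training data matters so much for SMT.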
The Verdict (A Decision-Judgement)
Statistical Machine Translation technology is growing in acceptance and is by far the clear leader of the two technologies. The increasing availability of cloud-based computing is providing a solution to the high computer processing power and storage capacity required to run SMT technology effectively, making SMT a game changer for the localization industry. Training data for SMT engines is becoming more widely available, thanks to the internet and the increasing volumes of multilingual content being created by both companies and private internet users. High quality aligned bilingual corpora are still expensive and time-consuming to create but, once created, become a valuable asset to any organization implementing SMT technology, with translations benefiting from economies of scale over time.

Course: Machine Translation (ENG 391)
Academic Year: 1445
Semester: 452
Section: 810
Instructor: Dr. Khalid Abdurrahman Jabir Othman
College: College of Sciences and Arts – Ar Rass
Tel Number: 0163012144
Email: [email protected]
Office Hours: Sunday and Monday 10-10:50; Tuesday 9-9:50; Wednesday 8-10

Course Objective
This course aims to familiarize students with machine translation by explaining its theories, and to introduce students to a variety of machine translation methodologies which will ultimately assist them in their practice of translation in general.

Topics to be covered
No | Topic | Week No. | Contact Hours
1 | Introduction to machine translation | 1-2 | 6
2 | Aims, needs, and state of the art of machine translation; advantages and disadvantages of machine translation | 3-4 | 6
3 | Theory and Approaches for MT | 5-6 | 3
4 | Discussion of Pre-editing and Post-editing | 7-8-9 | 3
5 | Human translation theory and practice | 10-11 | 6
6 | The Future of Machine Translation | 12-13 | 6
7 | Presentation of various free automatic translation tools on web and evaluation of results | 14-15 | 3
8 | Samples of Google translation (In-class Practice) | All along | 12

Prescribed textbook
Nirenburg, Sergei (ed.) (1987) "Machine Translation: Theoretical and Methodological Issues", Cambridge University Press. “A compiled coursebook is available”

Assessment Tools
No | Assessment task | Week due | Weight
1 | Translation Practice | All along | 10%
2 | Quiz-1 | 5 | 5%
3 | Quiz-2 | 11 | 5%
4 | Project | 12 | 5%
5 | Midterm Exam | 9 | 25%
6 | Final Exam | TBA | 50%

General Introduction and Brief History
The mechanization of translation has been one of humanity’s oldest dreams. In the twentieth century it has become a reality, in the form of computer programs capable of translating a wide variety of texts from one natural language into another. There are no ‘translating machines’ which can take any text in any language and produce a perfect translation in any other language without human intervention or assistance. What has been achieved is the development of programs which can produce ‘raw’ translations of texts in relatively well-defined subject domains, which can be revised to give good-quality translated texts at an economically viable rate, or which in their unedited state can be read and understood by specialists in the subject for information purposes. In some cases, with appropriate controls on the language of the input texts, translations can be produced automatically that are of higher quality, needing little or no revision. These are solid achievements by what is now traditionally called Machine Translation (henceforth in this book, MT), but they have often been obscured and misunderstood.

What is the public perception of Machine Translation?
1. There are those who are unconvinced that there is anything difficult about analyzing language, since even young children are able to learn languages so easily, and who are convinced that anyone who knows a foreign language must be able to translate with ease.
2. There are those who believe that because automatic translation of Shakespeare, Goethe, Tolstoy and lesser literary authors is not feasible, there is no role for any kind of computer-based translation.

The aims of Machine Translation
What are the fields for the great majority of professional translators?
1. Translations of scientific and technical documents,
2. Commercial and business transactions,
3. Administrative memoranda,
4. Legal documentation,
5. Instruction manuals,
6. Agricultural and medical textbooks,
7. Industrial patents, publicity leaflets,
8. Newspaper reports, etc.

How is the practical usefulness of an MT system determined?
The practical usefulness of an MT system is determined ultimately by the quality of its output. But what counts as a ‘good’ translation, whether produced by human or machine, is an extremely difficult concept to define precisely. Much depends on the particular circumstances in which it is made and the particular recipient for whom it is intended.

What are the criteria which can be applied to determine the practical usefulness of an MT system?
1. Fidelity.
2. Accuracy.
3. Intelligibility.
4. Appropriate style and register.

The major obstacles to translating by computer are, as they have always been, not computational but linguistic. They are the problems of:
1. lexical ambiguity,
2. syntactic complexity,
3. vocabulary differences between languages,
4. elliptical and ‘ungrammatical’ constructions,
5. extracting the ‘meaning’ of sentences and texts from analysis of written signs and producing sentences and texts in another set of linguistic symbols with an equivalent meaning.

MT is not in itself an independent field of 'pure' research. It takes from:
1. linguistics,
2. computer science,
3. artificial intelligence,
4. translation theory,
5. any other ideas, methods and techniques which may serve the development of improved systems.

What is Machine Translation?
The term Machine Translation (MT) is the now traditional and standard name for computerised systems responsible for the production of translations from one natural language into another, with or without human assistance. How can the translation quality of MT be improved? The translation quality of MT systems may be improved by: 1. Developing better methods. 2. Imposing certain restrictions on the input. According to what are MT systems designed? (1) according to number of languages: a. for one particular pair of languages (bilingual systems). b. for more than two languages (multilingual systems). (2) according to translation direction: a. in one direction only (uni-directional systems). b. in both directions (bi-directional systems). What are the three basic types in the overall system design? 1. The direct translation approach: The MT system is designed in all details specifically for one particular pair of languages in one direction, the source language and the target language. Source texts are analysed no more than necessary for generating texts in the other language. 2. The interlingua approach: Which assumes the possibility of converting texts to and from ‘meaning’ representations common to more than one language. 3. The less ambitious transfer approach: Rather than operating in two stages through a single interlingual meaning representation, there are three stages involving, usually, syntactic representations for both source and target texts. What are the stages of translation in the interlingua approach? 1. From the source language to the interlingua, 2. And from the interlingua into the target language. What are the stages of translation in the less ambitious transfer approach? 1. The first stage converts texts into intermediate representations in which ambiguities have been resolved irrespective of any other language. 2. In the second stage these are converted into equivalent representations of the target language; and 3. 
In the third stage, the final target texts are generated.
Analysis and generation programs are specific to particular languages and independent of each other. Differences between languages, in vocabulary and structure, are handled in the intermediary transfer program.

Stages of Analysis and Generation
Within the stages of analysis and generation, most MT systems exhibit clearly separated components dealing with different levels of linguistic description. Hence, analysis may be divided into:
1. morphological analysis.
2. syntactic analysis.
3. semantic analysis.

History of Machine Translation
Introduction
Over the years, Machine Translation has been a focus of investigations by linguists, psychologists, philosophers, computer scientists and engineers. It is no exaggeration to state that early work on MT contributed very significantly to the development of such fields as computational linguistics, artificial intelligence and application-oriented natural language processing.

Definition
Machine translation, commonly known as MT, can be defined as “translation from one natural language (source language (SL)) to another language (target language (TL)) using computerized systems, with or without human assistance”.

What are the four (4) points through which the development of MT is presented?
1. The chronological development of machine translation.
2. The different approaches developed (linguistic and computational).
3. The types of machine translation.
4. How to evaluate a machine translation.

History of Machine Translation
Although we may trace the origins of machine translation (MT) back to seventeenth century ideas of universal (and philosophical) languages and of ‘mechanical’ dictionaries, it was not until the twentieth century that the first practical suggestions could be made.

The history of machine translation can be divided into five (5) periods:
1. First period (1948-1960): The beginning.
2. Second period (1960-1966): Parsing and disillusionment.
3. Third period (1966-1980): New birth and hope.
4. Fourth period (1980-1990): Japanese invaders.
5. Fifth period (since 1990): The Web and the new wave of translators.