Questions and Answers
What is meant by 'grammatical data' in MT systems?
Why is the term 'lexicon' preferred to 'dictionary' in MT systems?
What is not required in MT systems?
Why are grammars kept separate from lexical information in MT systems?
What is specific to each individual lexical item in the vocabulary of the languages concerned?
What is the primary characteristic of the direct translation design in MT systems?
What is the function of the morphological analysis phase in first generation direct MT systems?
What is the primary limitation of the direct approach in MT systems?
What is the purpose of the bilingual dictionary look-up program in first generation direct MT systems?
What is the characteristic of the output produced by first generation direct MT systems?
Study Notes
Linguistic Data in MT Systems
- Linguistic data in MT systems can be divided into two categories: lexical data and grammatical data
- Grammatical data refers to the information used by analysis and generation routines, stated in terms of acceptable combinations of categories and features
- Lexical data refers to the specific information about each individual lexical item (word or phrase) in the vocabulary of the languages concerned
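To make "acceptable combinations of categories and features" concrete, here is a minimal sketch of grammatical data held separately from any lexicon. The rule table `RULES`, the function `is_acceptable`, and the feature name `number` are hypothetical names chosen for illustration, not taken from any particular MT system.

```python
# Grammatical data (toy): for each phrase type, the category sequences that
# are acceptable and the features that must agree across the words.
RULES = {
    "NP": [(("det", "noun"), ("number",))],
}

def is_acceptable(phrase, words):
    """Check a list of (category, features) pairs against the rule table."""
    categories = tuple(category for category, _ in words)
    for pattern, agreeing_features in RULES.get(phrase, []):
        if categories != pattern:
            continue
        # Every listed feature must have a single shared value.
        if all(len({feats.get(f) for _, feats in words}) == 1
               for f in agreeing_features):
            return True
    return False
```

The rule table says nothing about individual words, so the same grammar can be reused with any lexicon that supplies categories and features.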
Lexical Data
- Lexical data in MT systems differs from that found in conventional dictionaries
- MT lexicons do not need information about pronunciation, etymology, synonyms, definitions, or examples of usage
- Required information includes:
  - Grammatical category
  - Morphological type
  - Subcategorization features
  - Valency information
  - Case frames
  - Semantic features
  - Selection restrictions
- The term 'lexicon' is preferred to 'dictionary' because of these differences
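One way to picture such an entry is as a record with one field per kind of required information. The sketch below is an assumption-laden illustration: the class `LexicalEntry`, its field names, the sample values for "eat", and the helper `satisfies` are all hypothetical, and real systems encode this information in their own formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class LexicalEntry:
    """Toy MT lexicon entry; every field name here is illustrative."""
    lemma: str
    category: str                                    # grammatical category
    morph_type: str                                  # morphological type
    subcat: List[str] = field(default_factory=list)  # subcategorization features
    valency: int = 0                                 # number of arguments taken
    case_frame: Dict[str, str] = field(default_factory=dict)   # role -> slot
    semantic_features: Set[str] = field(default_factory=set)
    selection_restrictions: Dict[str, Set[str]] = field(default_factory=dict)

# Hypothetical entry: note there is no pronunciation, etymology, or definition.
eat = LexicalEntry(
    lemma="eat", category="verb", morph_type="irregular",
    subcat=["transitive"], valency=2,
    case_frame={"agent": "subject", "patient": "object"},
    semantic_features={"action"},
    selection_restrictions={"agent": {"animate"}, "patient": {"edible"}},
)

def satisfies(entry, role, filler_features):
    """True if the filler carries every feature the role requires."""
    required = entry.selection_restrictions.get(role, set())
    return required <= filler_features
```

For example, `satisfies(eat, "patient", {"edible", "concrete"})` holds, while a filler marked only `{"abstract"}` would be rejected for the same role.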
Design Decisions and Organization of Lexical Data
- Design decisions have implications for the organization of lexical data
- In MT systems of the direct translation design, there is typically one bilingual lexicon containing data about lexical items of the source language and their equivalents in the target language
- Each entry in the lexicon combines grammatical data for the source item, its target language equivalents, and the information necessary to select between target language alternatives and change syntactic structures
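A bilingual entry of this kind can be sketched as a single record that holds the source item's category, its candidate equivalents, and the condition used to choose among them. The structure `BILINGUAL_LEXICON`, the key `if_object_has`, and the function `select_equivalent` are hypothetical; the English-to-French pair is a simplified textbook contrast, not data from an actual system.

```python
# Toy bilingual lexicon for a direct-design system (English -> French).
BILINGUAL_LEXICON = {
    "know": {
        "category": "verb",  # grammatical data for the source item
        "equivalents": [
            # Alternatives are tried in order; None marks the default.
            {"target": "connaître", "if_object_has": "human"},
            {"target": "savoir", "if_object_has": None},
        ],
    },
}

def select_equivalent(source_word, object_features):
    """Pick the first target equivalent whose condition is met."""
    entry = BILINGUAL_LEXICON[source_word]
    for option in entry["equivalents"]:
        condition = option["if_object_has"]
        if condition is None or condition in object_features:
            return option["target"]
    raise LookupError(f"no equivalent selected for {source_word!r}")
```

Because source-language data and target-language choices live in the same entry, such a lexicon serves one language pair in one direction only.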
Direct Translation Approach
- The direct approach is an MT strategy that lacks intermediate stages in translation processes
- Processing of the source language input text leads directly to the desired target language output text
- The approach remains valid today, but first-generation direct MT systems had a more primitive software design
- First-generation direct MT systems:
  - Began with a morphological analysis phase
  - Identified word endings and reduced inflected forms to their uninflected base forms
  - Fed the results into a large bilingual dictionary look-up program
  - Performed no analysis of syntactic structure or semantic relationships
  - Relied on morphological analysis for lexical identification, which led directly to bilingual dictionary look-up supplying target language word equivalents
  - Applied local reordering rules to make the target language output more acceptable
  - Produced the target language text
- The direct approach can be characterized as 'word-for-word' translation with some local word-order adjustment
- Because it performs no syntactic or semantic analysis, it has severe limitations and produces poor-quality translations
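The stages described above (morphological analysis, bilingual look-up, local reordering) can be sketched as a toy pipeline. Everything in it is an assumption made for illustration: the suffix list, the three-word English-to-French dictionary, the single adjective-noun reordering rule, and all function names.

```python
SUFFIXES = ("ing", "ed", "s")  # toy list of English endings

# Toy bilingual dictionary: base form -> (target word, category).
BILINGUAL = {
    "the": ("le", "det"),
    "red": ("rouge", "adj"),
    "car": ("voiture", "noun"),
}

def morphological_analysis(word):
    """Strip a known ending if what remains is a dictionary base form."""
    for suffix in SUFFIXES:
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in BILINGUAL:
            return base
    return word

def lookup(base):
    """Return (target word, category); unknown words pass through."""
    return BILINGUAL.get(base, (base, "unknown"))

def local_reorder(tagged):
    """Swap each adjective-noun pair into noun-adjective order."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "adj" and out[i + 1][1] == "noun":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out

def translate(sentence):
    """Morphology -> look-up -> reordering, with no syntax or semantics."""
    bases = [morphological_analysis(w) for w in sentence.lower().split()]
    tagged = [lookup(base) for base in bases]
    return " ".join(word for word, _ in local_reorder(tagged))

print(translate("the red cars"))  # prints "le voiture rouge"
```

The output has the right word order but loses the plural and ignores gender agreement (correct French would be "les voitures rouges"), which is the kind of quality problem that word-for-word substitution with local reordering cannot avoid.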
Description
This quiz is about the types of linguistic data required in Machine Translation (MT) systems, including lexical and grammatical data. It covers the information embodied in grammars and specific information about individual lexical items.