Questions and Answers
What is the correct tokenization for 'Finland's capital' in English?
- Finland’s
- Finland's capital
- Finland (correct)
- Finlands
What is the correct tokenization for 'state of the art' in English?
- the art
- state of the art (correct)
- state
- state of
What is the correct tokenization for 'L'ensemble' in French?
- L'ensemble (correct)
- L . ensemble
- L’ . ensemble
- Le ensemble
Which of the following is a correct tokenization for 'what're' in English?
In French, how should 'L'ensemble' be tokenized?
What is the correct tokenization for 'Lebensversicherungsgesellschaftsangestellter' in German?
Study Notes
Tokenization
- 'Finland's capital' in English should be tokenized as ['Finland', "'s", 'capital']
- 'state of the art' in English should be tokenized as ['state', 'of', 'the', 'art']
- 'L'ensemble' in French should be tokenized as ['L', "ensemble"]
- 'what're' in English can be correctly tokenized as ['what', "'re"]
- 'Lebensversicherungsgesellschaftsangestellter' in German should be tokenized as a single token, as it is a compound word.
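The rules in these notes can be sketched in a few lines of Python. This is a minimal illustration, not a production tokenizer: the function name `tokenize`, the `lang` parameter, and the short clitic list are assumptions made for this example, and punctuation attached to words is not handled. The French branch drops the apostrophe only because the notes above give ['L', "ensemble"]; other conventions keep it on the first token.

```python
import re

# Illustrative, incomplete list of English clitics split off as separate tokens.
EN_CLITIC = re.compile(r"^(\w+)('s|'re|'ve|'ll|'d|'m)$", re.IGNORECASE)
# French elision such as L'ensemble: a one- or two-letter prefix, apostrophe, word.
FR_ELISION = re.compile(r"^(\w{1,2})'(\w+)$")


def tokenize(text, lang="en"):
    """Split on whitespace, then apply one language-specific rule per word.

    lang="en" splits clitics ('s, 're, ...); lang="fr" splits elisions;
    any other value (e.g. "de") only splits on whitespace, so a German
    compound stays a single token.
    """
    text = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    tokens = []
    for word in text.split():
        match = None
        if lang == "en":
            match = EN_CLITIC.match(word)
        elif lang == "fr":
            match = FR_ELISION.match(word)
        if match:
            tokens.extend([match.group(1), match.group(2)])
        else:
            tokens.append(word)
    return tokens


print(tokenize("Finland's capital"))
print(tokenize("state of the art"))
print(tokenize("what're"))
print(tokenize("L'ensemble", lang="fr"))
print(tokenize("Lebensversicherungsgesellschaftsangestellter", lang="de"))
```

Keeping one rule per language makes the trade-offs visible: every choice here (splitting the clitic, dropping the apostrophe, leaving the compound whole) is a convention that a different tokenizer might reasonably make differently.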
Description
Test your knowledge of tokenization and its pitfalls in English, French, and German. Learn to identify and handle tokenization errors in different contexts.