Tokenization

Questions and Answers

Which of the following accurately describes tokenization in Chinese and Japanese?

  • Only Chinese has no spaces between words
  • Only Japanese has no spaces between words
  • Chinese and Japanese have no spaces between words (correct)
  • Words are separated by spaces in both languages
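
Because Chinese and Japanese text has no spaces, the tokenizer has to find word boundaries itself. One common baseline is greedy forward maximum matching against a dictionary. The sketch below assumes a small hypothetical dictionary (`TOY_DICT`) and a maximum word length of 4 characters; both are illustrative assumptions, not values from this lesson.

```python
def max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}
print(max_match("我们喜欢自然语言处理", TOY_DICT))
# ['我们', '喜欢', '自然语言', '处理']
```

Preferring the longest match keeps "自然语言" (natural language) as one token rather than two; the greedy choice can still go wrong on genuinely ambiguous strings.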

What makes tokenization in Japanese even more complicated than in Chinese?

  • The use of kanji characters
  • The use of romaji
  • The lack of spaces between words
  • The use of multiple alphabets (correct)

What is an example of a challenge in tokenization for Japanese dates and amounts?

  • Multiple formats (correct)
  • Use of kanji characters
  • Use of romaji
  • Lack of spaces between words
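
The same Japanese date can appear as 2024年3月5日, as an imperial-era date (令和6年3月5日), or as 2024/3/5, possibly with full-width digits. A minimal sketch of normalizing these to one form follows. The two-era offset table (`ERA_OFFSETS`) and the pattern list are illustrative assumptions that cover only a few formats; amounts would need a similar treatment.

```python
import re
import unicodedata
from typing import Optional

# Era year N corresponds to Gregorian year OFFSET + N.
# Only two eras are listed; this table is an illustrative assumption.
ERA_OFFSETS = {"令和": 2018, "平成": 1988}

DATE_PATTERNS = [
    # 2024年3月5日 or 令和6年3月5日
    re.compile(r"(?P<era>令和|平成)?(?P<y>\d+)年(?P<m>\d+)月(?P<d>\d+)日"),
    # 2024/3/5 or 2024-03-05
    re.compile(r"(?P<y>\d{4})[/-](?P<m>\d{1,2})[/-](?P<d>\d{1,2})"),
]

def normalize_date(text: str) -> Optional[str]:
    """Map several Japanese date formats to ISO YYYY-MM-DD, else None."""
    text = unicodedata.normalize("NFKC", text)  # full-width digits -> ASCII
    for pattern in DATE_PATTERNS:
        m = pattern.fullmatch(text)
        if m:
            year = int(m["y"])
            era = m.groupdict().get("era")
            if era:
                year += ERA_OFFSETS[era]
            return f"{year:04d}-{int(m['m']):02d}-{int(m['d']):02d}"
    return None

for s in ["2024年3月5日", "令和6年3月5日", "２０２４/３/５"]:
    print(normalize_date(s))
```

All three inputs normalize to the same string, which is what lets a search for one format match documents written in another.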

Which of the following is an issue in English tokenization?

  • Capitalization of proper nouns (correct)
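
Lowercasing every token conflates proper nouns with ordinary words ("White House" vs "white house"). A simple heuristic is to lowercase a capitalized token only at the start of a sentence and keep mid-sentence capitals. The sketch below assumes sentences end at `.`, `!` or `?` and uses a crude regex tokenizer; both are simplifying assumptions.

```python
import re

SENTENCE_END = {".", "!", "?"}

def truecase_heuristic(tokens: list) -> list:
    """Lowercase a capitalized token only at the start of a sentence.

    Mid-sentence capitals are kept because they are more likely to mark
    proper nouns. A proper noun that starts a sentence is still lowercased.
    """
    result = []
    at_sentence_start = True
    for tok in tokens:
        if at_sentence_start and tok[:1].isupper():
            result.append(tok.lower())
        else:
            result.append(tok)
        at_sentence_start = tok in SENTENCE_END
    return result

text = "Fog hid the White House. We saw a white house."
tokens = re.findall(r"\w+|[^\w\s]", text)
print(truecase_heuristic(tokens))
# ['fog', 'hid', 'the', 'White', 'House', '.', 'we', 'saw', 'a', 'white', 'house', '.']
```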

What is the correct tokenization for the French word 'L'ensemble'?

  • L'ensemble (correct)
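
A tokenizer that splits on every non-letter breaks "L'ensemble" into "L" and "ensemble". A minimal regex sketch that keeps apostrophe-joined words whole, consistent with the answer above, follows. Accepting both the ASCII `'` and the typographic `’` is an assumption about the input text.

```python
import re

# A word is one or more \w runs joined by apostrophes (' or ’);
# any other non-space character becomes a token of its own.
WORD_WITH_APOSTROPHE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")

def tokenize_keep_apostrophes(text: str) -> list:
    return WORD_WITH_APOSTROPHE.findall(text)

print(tokenize_keep_apostrophes("L'ensemble des données."))
# ["L'ensemble", 'des', 'données', '.']
print(re.findall(r"\w+", "L'ensemble des données."))
# ['L', 'ensemble', 'des', 'données']
```

Keeping the elided article attached preserves the surface form; a system that wants queries for "ensemble" to match would instead split off the "L'".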

What is the correct tokenization for the German compound noun 'Lebensversicherungsgesellschaftsangestellter'?

  • Lebensversicherungsgesellschaftsangestellter (correct)
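
Whitespace tokenization leaves a German compound as one token, as the answer above shows. Retrieval systems for German often add a separate compound-splitting step so that a query for "Versicherung" can match it. A minimal recursive sketch follows; the four-word lexicon (`LEXICON`) and the single linking element "s" (the Fugenelement) are hypothetical simplifications.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical toy lexicon (lowercase); a real splitter needs a full dictionary.
LEXICON = {"leben", "versicherung", "gesellschaft", "angestellter"}
# Linking elements allowed between parts: none, or the linking "s".
LINKERS = ("", "s")

def split_compound(word: str) -> Optional[list]:
    """Split a German compound into lexicon words, or return None."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def solve(i: int) -> Optional[tuple]:
        # Tuple of parts covering word[i:], or None if no split exists.
        if i == len(word):
            return ()
        for j in range(len(word), i, -1):  # try longer parts first
            part = word[i:j]
            if part not in LEXICON:
                continue
            for linker in LINKERS:
                if word.startswith(linker, j):
                    rest = solve(j + len(linker))
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = solve(0)
    return list(parts) if parts is not None else None

print(split_compound("Lebensversicherungsgesellschaftsangestellter"))
# ['leben', 'versicherung', 'gesellschaft', 'angestellter']
```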

What is an example of a challenge in English tokenization?

  • Tokenizing acronyms like PhD. (correct)
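
A period can end a sentence or belong to an abbreviation such as "Ph.D.", so splitting punctuation off every word loses information. A minimal sketch, assuming a small hypothetical abbreviation list (`ABBREVIATIONS`), keeps listed abbreviations whole and splits leading and trailing punctuation off everything else.

```python
import re

# Hypothetical abbreviation list (lowercase); real tokenizers use larger ones.
ABBREVIATIONS = {"dr.", "ph.d.", "u.s.", "e.g."}

def tokenize_with_abbrevs(text: str) -> list:
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)  # keep the abbreviation's periods
            continue
        # Split leading/trailing punctuation off the word core.
        lead, core, trail = re.fullmatch(r"(\W*)(.*?)(\W*)", chunk).groups()
        tokens.extend(lead)   # one token per punctuation character
        if core:
            tokens.append(core)
        tokens.extend(trail)
    return tokens

print(tokenize_with_abbrevs("Dr. Smith earned a Ph.D. in 2010."))
# ['Dr.', 'Smith', 'earned', 'a', 'Ph.D.', 'in', '2010', '.']
```

A form missing from the list, such as "PhD." at the end of a sentence, is treated as a word plus a sentence-final period. When a listed abbreviation ends a sentence, its period does double duty and no separate sentence-end token is produced.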
