Tokenization

Questions and Answers

Which of the following accurately describes tokenization in Chinese and Japanese?

  • Only Chinese has no spaces between words
  • Only Japanese has no spaces between words
  • Chinese and Japanese have no spaces between words (correct)
  • Words are separated by spaces in both languages
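Because neither language marks word boundaries with spaces, a tokenizer has to find them some other way. One simple baseline is greedy longest-match segmentation against a word list. The sketch below is only an illustration under stated assumptions: the function name `max_match` and the tiny `vocab` set are hypothetical, and practical segmenters usually rely on much larger lexicons or statistical models.

```python
def max_match(text, vocab):
    """Greedy left-to-right longest-match segmentation (illustrative baseline)."""
    # Longest vocabulary entry bounds how far ahead we need to look.
    max_len = max(map(len, vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the vocabulary matches at this position.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


# Hypothetical toy vocabulary, chosen only for this example.
vocab = {"莎拉波娃", "现在", "居住", "在", "美国", "东南部", "的", "佛罗里达"}
print(max_match("莎拉波娃现在居住在美国东南部的佛罗里达", vocab))
# ['莎拉波娃', '现在', '居住', '在', '美国', '东南部', '的', '佛罗里达']
```

The greedy choice means the result depends entirely on the word list: a missing entry makes the segmenter fall back to single characters.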

What makes tokenization in Japanese more complicated?

  • The use of kanji characters
  • The use of romaji
  • The lack of spaces between words
  • The use of multiple alphabets (correct)

What is an example of a challenge in tokenization for Japanese dates and amounts?

  • Multiple formats (correct)
  • Use of kanji characters
  • Use of romaji
  • Lack of spaces between words

Which of the following is an issue in English tokenization?

  • Capitalization of proper nouns (correct)

What is the correct tokenization for the French word 'L'ensemble'?

  • L'ensemble (correct)

What is the correct tokenization for the German compound noun 'Lebensversicherungsgesellschaftsangestellter'?

  • Lebensversicherungsgesellschaftsangestellter (correct)
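German compounds like this one are written as a single orthographic word, so applications that need the component nouns typically run a compound splitter. The following is a minimal sketch of a dictionary-based splitter, not a definitive method: `split_compound`, the four-word `lexicon`, and the assumption that the only linking element is "s" are all hypothetical simplifications.

```python
def split_compound(word, lexicon, linkers=("", "s")):
    """Return one segmentation of `word` into lexicon entries, or None."""
    word = word.lower()
    if not word:
        return []
    # Prefer the longest possible first component.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        if not rest:
            return [head]
        # Allow an optional linking element between components.
        for link in linkers:
            if rest.startswith(link) and len(rest) > len(link):
                tail = split_compound(rest[len(link):], lexicon, linkers)
                if tail:
                    return [head] + tail
    return None


# Hypothetical toy lexicon, chosen only for this example.
lexicon = {"leben", "versicherung", "gesellschaft", "angestellter"}
print(split_compound("Lebensversicherungsgesellschaftsangestellter", lexicon))
# ['leben', 'versicherung', 'gesellschaft', 'angestellter']
```

Note that the sketch lowercases its input and returns `None` when no full segmentation exists in the lexicon.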

What is an example of a challenge in English tokenization?

  • Tokenizing acronyms like PhD. (correct)
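Abbreviations are awkward because a period can either belong to the token or end the sentence. A rough sketch of one way to handle this with a regular expression is shown below. The pattern and the name `TOKEN_RE` are assumptions made for illustration, and the example uses the dotted spellings "Ph.D." and "U.S.A."; it is a heuristic, not a complete English tokenizer.

```python
import re

TOKEN_RE = re.compile(
    r"""
    (?:[A-Za-z]{1,2}\.){2,}    # dotted abbreviations such as Ph.D. or U.S.A.
    | \w+(?:[-']\w+)*          # words, with internal hyphens or apostrophes
    | [^\w\s]                  # any other single punctuation mark
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split text into tokens, keeping dotted abbreviations in one piece."""
    return TOKEN_RE.findall(text)


print(tokenize("She earned a Ph.D. in the U.S.A. last year."))
# ['She', 'earned', 'a', 'Ph.D.', 'in', 'the', 'U.S.A.', 'last', 'year', '.']
```

The abbreviation branch is tried first, so the periods in "Ph.D." stay attached, while the period after "year" becomes its own token.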

What is the correct tokenization for the German word 'Lebensversicherungsgesellschaftsangestellter'?

  • Lebensversicherungsgesellschaftsangestellter (correct)
