DTM powerpoint 3

Questions and Answers

What describes string patterns?

  • String sequences
  • Literal characters
  • Character classes
  • Regular expressions (correct)

What does /st.k/ describe?

  • Strings starting with `st` and ending with `k` with one symbol in between (correct)
  • Strings starting with `k` and ending with `st`
  • All strings that contain `st` and `k`
  • All of the above

What characters enclose a regular expression?

  • ``
  • ''
  • / (correct)
  • \

What is used to specify character classes in regular expressions?

  • Square brackets `[]` (correct)

What does the regular expression /[Ss]pam/ allow?

  • `Spam` or `spam` (correct)

In regular expressions, what character is used to negate a set?

  • ^ (correct)

What does \d represent in regular expressions?

  • Any digit (correct)

What is used to indicate that a character is optional in regular expressions?

  • ? (correct)

What does the Kleene star * match?

  • Zero or more occurrences (correct)

In regular expressions, what does a period . match?

  • Any single character (correct)

What is used to match a fixed length in regular expressions?

  • Curly brackets `{}` (correct)

What does it mean for a regular expression to exhibit greedy matching?

  • It matches the longest possible string (correct)

How can minimal matching be achieved in regular expressions?

  • Using the `?` character (correct)

What character anchors a regular expression to the start of a string?

  • ^ (correct)

If a character is used as a metacharacter, how can it be searched for literally?

  • By preceding it with a backslash (correct)

What is a practical application of named entity recognition?

  • Finding patterns or information within strings in texts (correct)

What is phonotactics?

  • The distinction between possible and impossible sound combinations within the words of a language. (correct)

What does morphosyntax primarily deal with?

  • The distinction between possible and impossible morpheme and word combinations in a language. (correct)

What is a string in the context of digital tools and methods?

  • A sequence of symbols. (correct)

What levels of language can strings correspond to?

  • Letter, word, sound, and sentence sequences. (correct)

What is the main function of grep?

  • To search text files using regular expressions. (correct)

On which operating systems is grep typically standard?

  • Unix, Linux, and Mac OS X (correct)

What is egrep?

  • A version of `grep` that supports extended regular expressions. (correct)

What purpose does the backslash (\) serve in regular expressions, particularly with special characters?

  • It escapes the special character, treating it as a literal. (correct)

What is the Kleene Star character?

  • * (correct)

What does NER mean?

  • Named Entity Recognition (correct)

What is a tool associated with NER?

  • egrep (correct)

What is the purpose of curly brackets {} in regular expressions?

  • Match a fixed length (correct)

What do meta-characters have in REs?

  • Special meaning (correct)

What character can be used to specify character classes?

  • `[]` (correct)

What type of character is the .?

  • A special character (correct)

What RE is considered the most general?

  • `.*` (correct)

What provides a short way of writing long disjunctions?

  • Shorter ranges (correct)

When searching for special characters in text, what action must be taken?

  • They require escaping (correct)

What does 'minimal matching' involve?

  • Using '?' to match as little as possible (correct)

What is a 'string' defined as?

  • A sequence of symbols. (correct)

At which of these levels can language correspond to strings?

  • All of the above. (correct)

What do regular expressions (REs) primarily describe?

  • String patterns. (correct)

What is the start and end symbol for regular expressions?

  • / (correct)

What do square brackets [ ] specify in regular expressions?

  • Character classes. (correct)

According to the examples, what does [A-Z] represent in regular expressions?

  • Any uppercase letter. (correct)

What does the ? symbol indicate in a regular expression?

  • That a character is optional. (correct)

In regular expressions, what is the function of the plus sign +?

  • Matches one or more occurrences. (correct)

What character is used as a wildcard to match any single character?

  • . (correct)

What is the most general regular expression?

  • /.*/ (correct)

Which of the following is matched by hello{3}?

  • hellooo (correct)

What does 'greedy' matching refer to in the context of regular expressions?

  • Matching as much as possible. (correct)

The expression /ab.*d/ when used on abcdaaad would find what match?

  • abcdaaad (correct)

Which character facilitates 'minimal matching' in regular expressions?

  • ? (correct)

What does ^ signify in regular expressions?

  • The beginning of a string. (correct)

What character is used to anchor an expression to the end of a string?

  • $ (correct)

In regular expressions, what does 'escaping' a character mean?

  • Treating a metacharacter as a literal character. (correct)

What character escapes metacharacters?

  • \ (correct)

What is phonotactics related to?

  • Possible sound combinations in a language. (correct)

What is morphosyntax?

  • The study of possible morpheme and word combinations. (correct)

What can grep be used for?

  • Searching text files using regular expressions. (correct)

Which operating systems typically include grep as a standard tool?

  • Unix, Linux, and macOS. (correct)

What does egrep generally stand for?

  • Extended Grep. (correct)

What task does NER help with?

  • Identifying named entities in text. (correct)

What is a key part of NER?

  • Formulating smart regular expressions. (correct)

What is a primary goal of NER?

  • To maximize hits while minimizing false alarms. (correct)

What kind of information does Nederlandse Voornamenbank provide?

  • Information about first names in the Netherlands. (correct)

Flashcards

What is a string?

A sequence of symbols.

What do Regular Expressions (REs) do?

Describe string patterns.

What do Regular Expressions provide?

A language for specifying search patterns.

What does disjunction in REs do?

Specifies alternative character choices.

What does the Kleene star (*) do?

Matches zero or more occurrences of a pattern.

What does the Kleene plus (+) do?

Matches one or more occurrences of a pattern.

What does the wildcard character (.) do?

Matches any single character.

How to match a specific length in REs?

Match a fixed length using { }.

What does the caret (^) do in REs?

Anchors the match to the beginning of the string.

What does the dollar sign ($) do in REs?

Anchors the match to the end of the string.

What are metacharacters?

Special characters in regular expressions.

What is grep?

A program for searching in text files using regular expressions.

What is Named Entity Recognition (NER)?

The identification of names, dates, addresses in text.

Study Notes

  • Regular expressions are a toolset rooted in a fundamental theoretical concept.
  • Regular expressions describe patterns
  • /st.k/ describes strings starting with "st", ending with "k", and having one symbol in between
  • Example strings include stak, stbk, and stck
  • Patterns depend on the symbols
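
The /st.k/ pattern can be tried out directly. The sketch below assumes Python's `re` module, which the lesson itself does not prescribe; the candidate strings beyond stak, stbk, and stck are made up for illustration.

```python
import re

# /st.k/ from the notes: "st", then exactly one symbol, then "k".
pattern = re.compile(r"st.k")

# Hypothetical test strings; the last two should be rejected.
candidates = ["stak", "stbk", "stck", "stk", "staak"]

# fullmatch requires the whole string to fit the pattern.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)  # ['stak', 'stbk', 'stck']
```

Note that Python writes the pattern as a raw string rather than between slashes; the slashes in the notes are only delimiters.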

Regular Expression Syntax

  • Use / as the start and end symbol of a regular expression
  • Simple strings of characters can be used, such as /c/, /A100/, /natural language/, and /30 years!/

Disjunction

  • Ordinary disjunction examples: /devoured|ate/ and /famil(y|ies)/
  • Character classes are specified using square brackets
  • /[Ss]pam/ matches Spam or spam
  • /[Tt]he/ matches The or the
  • /bec[oa]me/ matches become or became
  • Ranges are short ways to write disjunctions
  • [A-Z] matches uppercase letters
  • [a-z] matches lowercase letters
  • [0-9] matches digits
  • Character classes can be combined
  • [A-Za-z] matches any letter
  • [A-Za-z0-9] matches alphanumeric characters
  • Sets can be negated with ^
  • [^Ss] matches neither S nor s
  • [^A-Z] matches not an uppercase letter
  • [^A-Za-z0-9] matches not an alphanumeric character
  • Shorthand notations for common ranges:
  • \d matches any digit
  • \s matches whitespace
  • \w matches alphanumeric characters, including underscore
  • The negated versions use the uppercase letter
  • \D matches non-digits
  • \S matches non-whitespace
  • \W matches non-alphanumeric characters
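
The disjunction rules above can be checked with a short sketch. It assumes Python's `re` module, and `fits` is a hypothetical helper introduced only for this illustration.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True when the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# Ordinary disjunction with | and grouping with parentheses.
print(fits(r"devoured|ate", "ate"))        # True
print(fits(r"famil(y|ies)", "families"))   # True

# Character classes and ranges.
print(fits(r"[Ss]pam", "spam"))            # True
print(fits(r"bec[oa]me", "becume"))        # False
print(fits(r"[A-Za-z0-9]", "_"))           # False

# Negated sets: ^ directly after the opening bracket.
print(fits(r"[^Ss]", "s"))                 # False
print(fits(r"[^Ss]", "t"))                 # True

# Shorthand classes and their uppercase negations.
digits = re.findall(r"\d", "A100")
non_word = re.findall(r"\W", "30 years!")
print(digits)    # ['1', '0', '0']
print(non_word)  # [' ', '!']
```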

Optionality

  • '?' indicates an optional character
  • /colou?r/ matches color or colour
  • Use parentheses to list optional multi-character sequences
  • /hello(oooooo)?/ matches hello or hellooooooo
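
A minimal sketch of optionality, again assuming Python's `re` module; the rejected test words are invented for contrast.

```python
import re

# ? makes the single preceding character optional.
colour = re.compile(r"colou?r")
words = ["color", "colour", "colouur", "colr"]
accepted = [w for w in words if colour.fullmatch(w)]
print(accepted)  # ['color', 'colour']

# Parentheses make a whole multi-character sequence optional.
greeting = re.compile(r"hello(oooooo)?")
short_ok = greeting.fullmatch("hello") is not None
long_ok = greeting.fullmatch("hello" + "o" * 6) is not None
partial_ok = greeting.fullmatch("hellooo") is not None
print(short_ok, long_ok, partial_ok)  # True True False
```

The last case fails because the group is all-or-nothing: either all six extra o's are present or none are.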

Kleene Star

  • Kleene * matches zero or more occurrences
  • /a*/ allows zero or any number of a's in a row.
  • Example: /abaab*a/
  • Matching strings include: abaaa, abaaba, abaabba, abaabbba (the b before the final a may occur zero or more times)
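
The behaviour of the star can be verified with a small sketch, assuming Python's `re` module. The candidate strings are chosen for illustration; whole-string matching with `fullmatch` is used so that partial hits inside a longer string do not count.

```python
import re

# In /abaab*a/ the star applies only to the b directly before it.
star = re.compile(r"abaab*a")
candidates = ["abaaa", "abaaba", "abaabbba", "ba", "abba"]
star_results = {s: bool(star.fullmatch(s)) for s in candidates}
print(star_results)
# {'abaaa': True, 'abaaba': True, 'abaabbba': True, 'ba': False, 'abba': False}

# /a*/ accepts zero a's (the empty string) as well as many.
empty_ok = re.fullmatch(r"a*", "") is not None
many_ok = re.fullmatch(r"a*", "aaaa") is not None
print(empty_ok, many_ok)  # True True
```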

Kleene Plus

  • Kleene + matches one or more occurrences
  • /a+/ accepts one or more a's in a row
  • Example: /abaab+a/
  • Acceptable strings include: abaaba, abaabba, abaabbba; abaaa is not accepted because at least one b is required
  • /[0-9]* years/ and /[0-9]+ dollars/ are further examples
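
To see how the plus differs from the star, here is a short sketch assuming Python's `re` module. The sample sentence is invented for the comparison.

```python
import re

# With + the b must occur at least once.
plus = re.compile(r"abaab+a")
plus_results = {s: bool(plus.fullmatch(s)) for s in ["abaaa", "abaaba", "abaabbba"]}
print(plus_results)  # {'abaaa': False, 'abaaba': True, 'abaabbba': True}

# Star versus plus on digits: * also accepts zero digits.
text = "many years and 30 years"
star_hits = re.findall(r"[0-9]* years", text)
plus_hits = re.findall(r"[0-9]+ years", text)
print(star_hits)  # [' years', '30 years']
print(plus_hits)  # ['30 years']
```

The star version also reports " years" with no number in front, which is usually not what a search for quantities intends.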

Wildcard Character

  • Use '.' to match any single character
  • Example: /beg.n/ (begin, began, begqn, beg!n, etc.)
  • The most general regular expression is /.*/
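
A brief sketch of the wildcard, assuming Python's `re` module. One Python-specific detail that the notes do not cover: by default `.` does not match a newline character.

```python
import re

wild = re.compile(r"beg.n")
words = ["begin", "began", "begqn", "beg!n", "begn", "beg\nn"]
wild_hits = [w for w in words if wild.fullmatch(w)]
print(wild_hits)  # ['begin', 'began', 'begqn', 'beg!n']

# /.*/ accepts any string without newlines, including the empty one.
anything = all(re.fullmatch(r".*", s) is not None for s in ["", "x", "30 years!"])
print(anything)  # True
```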

Scope

  • Curly brackets {} specify a fixed number of repetitions of the preceding element
  • /hello{3}/ matches hellooo (the {3} applies only to the o)
  • Default is 'greedy' matching
  • /ab.*d/ in abcdaaad matches abcdaaad
  • Add '?' after a quantifier for minimal (non-greedy) matching
  • /ab.*?d/ in abcdaaad matches abcd
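
Greedy and minimal matching can be compared side by side. This is a minimal sketch assuming Python's `re` module, using the same input string as the notes.

```python
import re

# {3} repeats only the element directly before it (the o).
fixed_ok = re.fullmatch(r"hello{3}", "hellooo") is not None
whole_word = re.fullmatch(r"hello{3}", "hellohellohello") is not None
print(fixed_ok, whole_word)  # True False

# Greedy: .* takes as much as it can while still reaching a d.
greedy = re.search(r"ab.*d", "abcdaaad").group()
# Minimal: .*? stops at the first d that completes the match.
minimal = re.search(r"ab.*?d", "abcdaaad").group()
print(greedy)   # abcdaaad
print(minimal)  # abcd
```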

Anchors

  • Regular expressions can be anchored to the start or end of a string
  • ^ marks the start of a string
  • $ marks the end of a string
  • /^abc/ anchors at the start
  • /xyz$/ anchors at the end
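
A small sketch of both anchors, assuming Python's `re` module; the test strings are invented. `re.search` is used because it looks for a match anywhere in the string, so the anchors are what restrict the position.

```python
import re

start_hit = re.search(r"^abc", "abcdef") is not None
start_miss = re.search(r"^abc", "xabcdef") is not None
end_hit = re.search(r"xyz$", "uvwxyz") is not None
end_miss = re.search(r"xyz$", "xyz!") is not None

print(start_hit, start_miss)  # True False
print(end_hit, end_miss)      # True False
```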

Special Characters

  • Metacharacters have special meanings: . ^ $ * + ? { } [ ] \ | ( )
  • These characters must be escaped with \ when searching for them literally
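
The effect of escaping is easiest to see on a concrete string. The sketch below assumes Python's `re` module; the sample sentence is invented, and `re.escape` is a Python convenience that adds the backslashes automatically.

```python
import re

text = "Is it 30 years? Yes (30 years)."

# Escaped ?: a literal question mark must follow.
literal_q = re.findall(r"years\?", text)
# Unescaped ?: the s is optional, so both occurrences match.
optional_s = re.findall(r"years?", text)
print(literal_q)   # ['years?']
print(optional_s)  # ['years', 'years']

# Unescaped parentheses only group, so they match "30 years" anywhere.
grouped = re.findall(r"(30 years)", text)
# re.escape escapes the metacharacters so the parentheses are literal.
literal_parens = re.findall(re.escape("(30 years)"), text)
print(grouped)         # ['30 years', '30 years']
print(literal_parens)  # ['(30 years)']
```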

Regular Expression Applications

  • Regular expressions are used in many search scenarios, including document retrieval, web search, and NER
  • Used in word processing to find spelling variants and errors, and to compute frequencies from corpora
  • Many Unix tools, editors, and programming languages incorporate regular expressions
  • Implementations are efficient for searching large text files
  • Tools and languages differ in the exact syntax

Grep

  • Program for searching text files using regular expressions
  • Standard on Unix, Linux, and macOS (Mac OS X); also available for Windows
  • egrep is an extended version of grep that supports the full set of operators

Grep Examples

  • 'and' in f.txt matches and, Ayn Rand, and Candy
  • 'the year [0-9][0-9][0-9][0-9]' in f.txt matches the year 1776, the year 1812, and the year 2001
  • 'why?' in f.txt matches why? with grep, where ? is a literal character, while with egrep 'why?' makes the y optional and matches wh or why
  • 'couch|sofa' in f.txt matches couch or sofa
  • 'un(interest|excit)ing' in f.txt matches uninteresting or unexciting
  • 'o.e' in f.txt matches ore, one, and ole
  • 'a*rgh' in f.txt matches argh, aargh, and aaargh
  • 'sha(la)*' in f.txt matches sha, shala, and shalala
  • 'john+y' in f.txt matches johny and johnny, but not johy
  • 'joh?n' in f.txt matches jon and john
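
For readers without grep at hand, the examples above can be approximated in Python. This is only a sketch: `grep_lines` is a hypothetical helper, the sample lines stand in for the unknown contents of f.txt, and Python's `re` syntax is assumed to be close enough to egrep for these particular operators.

```python
import re

def grep_lines(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines that contain a match, roughly like egrep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Invented stand-in for f.txt.
sample = [
    "Ayn Rand wrote novels",
    "a couch and a sofa",
    "johnny said aaargh",
    "the year 1776",
]

and_hits = grep_lines(r"and", sample)
year_hits = grep_lines(r"the year [0-9][0-9][0-9][0-9]", sample)
argh_hits = grep_lines(r"a*rgh", sample)
john_hits = grep_lines(r"john+y", sample)

print(and_hits)   # ['Ayn Rand wrote novels', 'a couch and a sofa']
print(year_hits)  # ['the year 1776']
print(argh_hits)  # ['johnny said aaargh']
print(john_hits)  # ['johnny said aaargh']
```

Like grep, the helper reports whole lines that contain a match somewhere, which is why "Ayn Rand" counts as a hit for 'and'.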

Named Entity Recognition (NER)

  • The challenge is to identify names, dates, addresses, etc. in a text
  • Relies on formulating smart regular expressions
  • The goal is to maximize hits while minimizing false alarms
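
The trade-off between hits and false alarms can be illustrated with a toy year recognizer. This is a sketch under stated assumptions, not a real NER system: the sentence is invented, the year range 1000 to 2099 is an arbitrary choice, and `\b` (word boundary) and `(?:...)` (non-capturing group) are Python `re` features that the notes do not cover.

```python
import re

text = "In 1776 the order number 123456 was filed; by 2001 it was lost."

# Loose pattern: any four digits in a row. It finds both years
# but also raises a false alarm inside the order number.
loose = re.findall(r"[0-9]{4}", text)
print(loose)  # ['1776', '1234', '2001']

# Tighter pattern: a standalone number from 1000 to 2099.
tight = re.findall(r"\b(?:1[0-9]{3}|20[0-9]{2})\b", text)
print(tight)  # ['1776', '2001']
```

The tighter pattern removes the false alarm, at the price of missing any year outside the assumed range.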

Exam Practice Questions

  • Need to write regular expressions that match specific words
  • Understanding regular expressions to identify patterns in language
  • Apply regular expressions for language variants
