Regular Expressions in Python

Questions and Answers

Which function is used to create Regex objects?

  • Regex.new()
  • re.build()
  • re.compile() (correct)
  • re.create()

Raw strings are not necessary when creating Regex objects.

False

What method would you use to find all occurrences of a pattern in a string?

findall()

In a regex pattern such as r'\btest\b', the '\b' denotes a _______.

word boundary

Match the following regex characters with their meanings:

  • + = One or more occurrences
  • * = Zero or more occurrences
  • ? = Zero or one occurrence
  • {} = Specifies a number or range of occurrences

What does the | character signify in regular expressions?

Logical OR

The .* pattern matches any sequence of characters except newlines by default.

True

How can you make a regex search case-insensitive?

By using re.IGNORECASE or '(?i)' in the pattern.

Flashcards

Regex object creation

Regex objects are created using the re.compile() function.

Raw strings in Regex

Raw strings stop Python from interpreting backslashes as string escape sequences, so the backslashes reach the regex engine unchanged.

search() method return

The search() method returns a Match object or None if no match.

Getting matches from Match object

Use the group() method to retrieve matched strings from a Match object.

Matching actual parentheses in Regex

Use backslashes: \( for '(', \) for ')', and \. for '.'.

Meaning of | in Regex

The | character signifies a logical OR between patterns.

Difference between + and *

'+' requires one or more occurrences, '*' allows zero or more occurrences.

Case-insensitive Regex

Use re.IGNORECASE to make a regex search case-insensitive.

Study Notes

Regular Expression Functions and Methods

  • re.compile() creates Regex objects.
  • Raw strings (prefixed with 'r') are often used for Regex patterns so that backslashes do not have to be doubled (r'\d' instead of '\\d').
  • search() returns a Match object if a match is found; otherwise, it returns None.
  • To get the matched strings from a Match object, use the group() method. For instance, match.group(0) returns the entire match, match.group(1) returns the first captured group, and so on.
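
The points above can be sketched in a few lines. This is a minimal illustration, not a definitive recipe: the sample sentences and the variable names (phone_regex, whole, area_code, no_match) are made up for the example.

```python
import re

# Compile the pattern once; the raw string keeps the backslashes intact.
phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

# search() returns a Match object for the first match in the string.
match = phone_regex.search('Call 415-555-1011 tomorrow.')
if match is not None:
    whole = match.group(0)      # the entire match
    area_code = match.group(1)  # the first captured group
    print(whole, area_code)

# When nothing matches, search() returns None, so check before calling group().
no_match = phone_regex.search('No phone number here.')
print(no_match)
```

Checking the result against None before calling group() avoids an AttributeError on strings that do not contain the pattern.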

Regex Specifics with Examples

  • r'(\d\d\d)-(\d\d\d-\d\d\d\d)': This regex matches a phone number with the format ###-###-####.
    • group(0) covers the entire match (e.g., '123-456-7890').
    • group(1) covers the first set of three digits (e.g., '123').
    • group(2) covers the second set of three digits followed by four digits (e.g., '456-7890').
  • To match literal parentheses or periods, escape them with a backslash (e.g., \(, \) or \.).
  • findall() returns a list of strings if the regex has no capturing groups (or exactly one, in which case each string is that group's text), or a list of tuples of strings if it has two or more capturing groups.
  • | signifies OR (alternation) within a regular expression.
  • ? signifies zero or one occurrence of the preceding element; placed after another quantifier (as in *?, +? or {3,5}?), it makes that quantifier non-greedy.
  • + signifies one or more occurrences of the preceding element.
  • * signifies zero or more occurrences of the preceding element.
  • {3} signifies exactly three occurrences of the preceding element.
  • {3,5} signifies between three and five occurrences of the preceding element.
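  • \d matches any digit (0-9; for str patterns, also other Unicode decimal digits unless re.ASCII is set).
  • \w matches any word character: a letter, a digit, or the underscore (a-z, A-Z, 0-9, _; for str patterns, also Unicode letters and digits unless re.ASCII is set).
  • \s matches any whitespace character (space, tab, newline).
  • \D, \W, \S match the complements of \d, \w and \s respectively (non-digits, non-word characters, non-whitespace).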
  • .* is greedy (matches as much text as possible); .*? is non-greedy (matches as little text as possible).
  • [0-9a-z] matches any lowercase letter or digit.
  • To make a regex case-insensitive, use the re.IGNORECASE (or re.I) flag.
  • . normally matches any character except a newline. re.DOTALL (or re.S) makes . match any character including a newline.
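  • numRegex = re.compile(r'\d+'): calling numRegex.sub('X', text) returns a copy of text in which every run of digits is replaced with 'X'.
  • re.VERBOSE allows for more readable regexes: whitespace in the pattern is ignored (unless escaped or inside a character class) and comments starting with # can be added.
  • Regex for numbers with commas every three digits: r'^\d{1,3}(,\d{3})*$' (the ^ and $ anchors require the whole string to fit the format).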
  • Regex for names with a capitalized first name followed by last name "Watanabe": r'[A-Z][a-z]+\sWatanabe'
  • Regex for sentences in which Alice, Bob, or Carol eats, pets, or throws apples, cats, or baseballs, ending with a period: r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.', compiled with re.IGNORECASE so that it is case-insensitive.
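
As a rough sketch of how several of the items above behave, the following compares findall() with and without groups, escaped parentheses, greedy and non-greedy matching, alternation, and a {3,5} range. The sample strings and variable names are invented for illustration.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

# No capturing groups: findall() returns a list of whole-match strings.
no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d').findall(text)

# Two capturing groups: findall() returns a list of tuples, one per match.
with_groups = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)').findall(text)

# Escaped parentheses match literal '(' and ')'; the inner pair still captures.
paren = re.compile(r'\((\d{3})\)').search('(415) 555-4242')

# Greedy .* grabs as much as possible; non-greedy .*? stops at the first '>'.
html = '<b>bold</b> and <i>italic</i>'
greedy = re.search(r'<.*>', html).group(0)
lazy = re.search(r'<.*?>', html).group(0)

# Alternation: group(1) holds whichever alternative matched.
either = re.search(r'Bat(man|mobile)', 'The Batmobile').group(1)

# {3,5} is greedy too: with six repetitions available it takes five.
repeated = re.search(r'(Ha){3,5}', 'HaHaHaHaHaHa').group(0)

print(no_groups)
print(with_groups)
print(paren.group(0), paren.group(1))
print(greedy)
print(lazy)
print(either, repeated)
```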

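The flags, sub(), and the sample patterns from the notes can be tried out in the same way. This is only a sketch under stated assumptions: the test strings and variable names are hypothetical, and the ^ and $ anchors on the comma-number pattern are an addition made here so that the whole string has to fit the format.

```python
import re

# re.IGNORECASE (or re.I) makes letters match regardless of case.
robocop = re.compile(r'robocop', re.IGNORECASE)
found = robocop.search('ROBOCOP protects the innocent.').group(0)

# By default . stops at a newline; re.DOTALL lets it cross newlines.
two_lines = 'first line\nsecond line'
default_dot = re.search(r'.*', two_lines).group(0)
dotall_dot = re.search(r'.*', two_lines, re.DOTALL).group(0)

# sub() replaces every run of digits with 'X'.
num_regex = re.compile(r'\d+')
masked = num_regex.sub('X', '12 drummers, 11 pipers')

# re.VERBOSE: whitespace is ignored and # starts a comment inside the pattern.
comma_number = re.compile(r'''
    ^\d{1,3}     # one to three leading digits
    (,\d{3})*    # zero or more groups of a comma and three digits
    $            # nothing may follow (anchor assumed for this example)
''', re.VERBOSE)
good_number = comma_number.search('1,234,567')
bad_number = comma_number.search('12,34,567')

# The three-word sentence pattern, compiled case-insensitively.
sentence = re.compile(
    r'(Alice|Bob|Carol)\s(eats|pets|throws)\s(apples|cats|baseballs)\.',
    re.IGNORECASE,
)
good_sentence = sentence.search('ALICE THROWS BASEBALLS.')
bad_sentence = sentence.search('Carol eats 7 cats.')

print(found, repr(default_dot), repr(dotall_dot), masked)
print(good_number is not None, bad_number is None)
print(good_sentence is not None, bad_sentence is None)
```

Several flags can be combined with the | operator, for example re.IGNORECASE | re.VERBOSE.
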