Questions and Answers
Which of the following is the most accurate description of a regular expression (regex)?
- A module used for complex mathematical calculations.
- A sequence of characters that defines a search pattern. (correct)
- A method within the string module to find substrings.
- A specific data type in Python used for pattern matching.
Which Python module is primarily used for working with regular expressions?
- regex
- string
- pattern
- re (correct)
Which of the following functions from the re module returns a match object if a pattern is found anywhere in a string?
- findall()
- search() (correct)
- sub()
- split()
What does the re.search() function return if it cannot find a match for the specified regex pattern in the given string?
If result = re.search('123', 'foo123bar'), what would result.span() evaluate to?
Which of the following metacharacters, when placed as the first character inside square brackets [], negates the character class?
What is the function of the '.' metacharacter in regular expressions?
Which special sequence matches any digit character?
What does the special sequence \b do in a regular expression?
Which of the following is true regarding the use of raw strings in regular expressions?
What does the quantifier * do in a regular expression?
Given the regex a+, which of the following strings would not be a match?
Given the regex foo[0-9]{3}bar, which of the following strings would match?
What is the purpose of grouping constructs in regular expressions?
If you have a match object m and you want to retrieve all captured groups as a tuple, which method should you use?
What do backreferences in regular expressions allow you to do?
What is the purpose of a non-capturing group in regular expressions, denoted by (?:regex)?
What is a key benefit of using a non-capturing group (?:...) in a regular expression?
What does a lookahead assertion do in a regular expression?
In a regular expression, what is the key characteristic of lookahead and lookbehind assertions?
What is the difference between a positive lookahead (?=...) and a negative lookahead (?!...) assertion?
Which of the following is the correct usage for the 'or' operator in regular expressions?
If you want to match either 'cat' or 'dog' in a string, what regex pattern would you use?
Which function from the re module would you use to find all non-overlapping matches of a pattern in a string, returning them as a list?
Which re module function is best suited for replacing occurrences of a pattern with a replacement string?
What is the purpose of flags in re.search() and other regular expression functions?
Which flag makes alphabetic character matching case-insensitive?
What is the effect of the re.MULTILINE flag?
Which flag causes the dot (.) metacharacter to match newline characters as well?
What is the primary purpose of the re.VERBOSE flag?
What is the output of bool(re.search('abc', 'def'))?
If you have result = re.search('[0-9]{2}', 'abc12def'), what would result.group() return?
What is the output of the following code?
import re
s = 'hello world'
result = re.search('^hello', s)
print(bool(result))
What is the output of the following code?
import re
s = 'hello world'
result = re.search('world$', s)
print(result.group())
What does re.search(r'\d+', 'abc 123 def').group() return?
What will the following code print?
import re
text = "The cat in the hat."
result = re.search(r"t.e", text, re.IGNORECASE)
print(result.group())
What will the following code print?
import re
text = "apple, banana, cherry"
result = re.split(r",\s*", text)
print(result)
What will be the output of the code?
import re
text = '123abc456def'
new_text = re.sub(r'\d+', '#', text)
print(new_text)
What is the output of the following Python code snippet?
import re
string = 'hello123world'
pattern = r'(\D+)(\d+)(\D+)'
match = re.search(pattern, string)
print(match.groups())
What does the regular expression r'\btest\b' match?
Which of the following regular expressions correctly matches a string that starts with 'foo', followed by any number of digits, and ends with 'bar'?
Flashcards
Regular Expression (RegEx)
A sequence of characters that forms a complex string-matching pattern.
re Module in Python
A module in Python used for working with regular expressions.
re.search()
Returns a Match object if there is a match anywhere in the string.
Falsy Values
Values that evaluate to False in a Boolean context, such as None, False, 0, 0.0, and empty sequences or collections.
Truthy Values
Values that evaluate to True in a Boolean context, such as non-empty sequences and collections, nonzero numbers, and match objects.
Metacharacters
Characters with a special meaning in a regex pattern, such as . ^ $ * + ? {} [] \ | ().
[0-9] in Regex
A character class that matches any single decimal digit from 0 to 9.
Dot (.) Metacharacter
Matches any single character except a newline (unless re.DOTALL is set).
Special Sequence (Regex)
A backslash followed by a character that together have a special meaning, such as \d, \w, \s, or \b.
\w in Regex
Matches any word character (a letter, a digit, or the underscore); for ASCII text, equivalent to [a-zA-Z0-9_].
\W in Regex
Matches any character that is not a word character; for ASCII text, equivalent to [^a-zA-Z0-9_].
\d in Regex
Matches any decimal digit; for ASCII text, equivalent to [0-9].
\D in Regex
Matches any character that is not a decimal digit; for ASCII text, equivalent to [^0-9].
\B in Regex
Matches at a position that is not a word boundary.
\b in Regex
Matches at a word boundary, the beginning or end of a word.
Raw string
A string literal prefixed with r or R in which backslashes are not treated as escape characters; recommended for regex patterns.
Star (*) Quantifier
Matches zero or more repetitions of the preceding regex.
Plus (+) Quantifier
Matches one or more repetitions of the preceding regex.
? Quantifier
Matches zero or one repetition of the preceding regex; appended to *, +, or ?, it makes that quantifier lazy (non-greedy).
{m} Quantifier
Matches exactly m repetitions of the preceding regex.
Grouping Constructs
Parentheses that break a regex into subexpressions (groups) that act as single units and can capture the text they match.
m.groups()
Returns a tuple containing all the captured groups of the match object m.
m.group()
Returns the entire match; m.group(n) returns the text captured by group n.
Backreferences
Sequences such as \1 that match the same text previously captured by the corresponding group.
re.IGNORECASE
A flag that makes matching of alphabetic characters case-insensitive (short form re.I).
findall()
Returns a list of all non-overlapping matches of a pattern in a string.
split()
Returns a list where the string has been split at each match.
sub()
Replaces matches of a pattern with a replacement string and returns the new string.
re.MULTILINE
A flag that makes ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
re.DOTALL
A flag that makes the dot (.) match any character, including a newline.
Study Notes
- A regular expression (regex) is a sequence of characters that defines a complex string-matching pattern
- The `re` module is used in Python for working with regular expressions
Matching a Substring
- Given a string object `s`, you can check whether it contains a specific substring such as `'123'` with the `in` operator: `'123' in s` evaluates to `True` or `False`
- The `.find()` and `.index()` methods locate the position of a substring like `'123'`; if the substring is absent, `.find()` returns `-1` while `.index()` raises `ValueError`
- `find()` and `index()` are methods of Python's built-in `str` type, not functions of the `string` module
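A minimal sketch of these string-based checks; the example string `s = 'foo123bar'` is an assumption chosen for illustration:

```python
s = 'foo123bar'  # example string, assumed for illustration

# The in operator answers "is the substring present?"
print('123' in s)       # True

# .find() and .index() answer "where does it start?"
print(s.find('123'))    # 3
print(s.index('123'))   # 3

# They differ when the substring is absent
print(s.find('xyz'))    # -1
try:
    s.index('xyz')
except ValueError:
    print('index() raised ValueError')
```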
Python's Regular Expressions
- Simple character-by-character comparisons work in many situations, but they are not always sufficient for complex string matching
- For example, a regular expression can identify any sequence of three consecutive decimal digits within strings like `'foo123bar'`, `'foo456bar'`, `'234baz'`, and `'qux678'`
- Python's regular expressions (regexes) solve complex matching issues
The ‘re’ Module
- Python has a built-in module called `re` for working with regular expressions
- To search a string for matches with regular expressions, first import the module with `import re`
Key Functions
- `findall()` returns a list containing all matches
- `search()` returns a Match object if there is a match anywhere in the string
- `split()` returns a list where the string has been split at each match
- `sub()` replaces one or many matches with a string
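The four functions can be compared side by side on one input. This is a sketch only; the sample text and the patterns `ai` and `\s` are illustrative assumptions:

```python
import re

text = 'The rain in Spain'  # sample text, assumed for illustration

print(re.findall(r'ai', text))    # ['ai', 'ai']
m = re.search(r'ai', text)        # only the first match is returned
print(m.span(), m.group())        # (5, 7) ai
print(re.split(r'\s', text))      # ['The', 'rain', 'in', 'Spain']
print(re.sub(r'\s', '-', text))   # The-rain-in-Spain
```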
The re.search() Function
- These notes focus on the `re.search()` function for regex matching
- `re.search(<regex>, <string>)` scans `<string>` to find the first location where the pattern `<regex>` matches
- `re.search()` returns a match object if a match is found; otherwise, it returns `None`
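A short sketch of both outcomes of `re.search()`; the two sample strings are assumptions for illustration:

```python
import re

found = re.search('123', 'foo123bar')   # pattern present
missing = re.search('123', 'foobar')    # pattern absent

print(found is None)    # False: a match object was returned
print(found.group())    # 123
print(missing is None)  # True: no match, so None was returned
```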
Examining the Returned Match Object
- For example, with `s = 'foo123bar'`, calling `re.search('123', s)` uses `123` as the search pattern and `s` as the search string
- A successful call returns a match object rather than `None`
- `span=(3, 6)` shows the indices where the match was found in `s`; it is the same substring you would get with slice notation, `s[3:6]`, which is `'123'`
- `match='123'` indicates the characters from the string that matched
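The example can be reproduced directly. The repr format in the first comment is what Python 3.7 and later print; older versions display match objects differently:

```python
import re

s = 'foo123bar'
result = re.search('123', s)

print(result)                     # <re.Match object; span=(3, 6), match='123'>
print(result.span())              # (3, 6)
print(s[3:6])                     # 123
print(s[3:6] == result.group())   # True: the span slices out the matched text
```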
Truthiness of Match Objects
- A match object is "truthy", enabling its use in a Boolean context such as a conditional statement
- A value such as `False` is considered falsy, and a value such as `True` is considered truthy; `None`, which `re.search()` returns when there is no match, is falsy
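Because of this, the result of `re.search()` can be tested directly in an `if` statement. A minimal sketch, where `has_123` is a hypothetical helper name used only for illustration:

```python
import re

def has_123(text):
    # re.search() returns a match object (truthy) or None (falsy),
    # so its result can be used directly as the if condition.
    if re.search('123', text):
        return 'match found'
    return 'no match'

print(has_123('foo123bar'))  # match found
print(has_123('foo12bar'))   # no match
```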
Truthy and Falsy Values according to Python Documentation
Truthy
- By default, an object is considered truthy
- Non-empty sequences and collections (lists, tuples, strings, dictionaries, sets) are truthy
- Numeric values that are not zero are truthy
Falsy
- Empty lists, such as `[]`
- Empty tuples, such as `()`
- Empty dictionaries, such as `{}`
- Empty sets, such as `set()`
- Empty strings, such as `""`
- Empty ranges, such as `range(0)`
- Zero of any numeric type, for example:
- Integer: `0`
- Float: `0.0`
- Complex: `0j`
- The constants `None` and `False`
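The lists above can be checked with `bool()`. The truthy examples in this sketch are arbitrary non-empty or nonzero values chosen for illustration:

```python
falsy = [[], (), {}, set(), '', range(0), 0, 0.0, 0j, None, False]
truthy = [[0], (None,), {'k': 0}, {0}, ' ', range(1), -1, 0.5, 1j, True, object()]

# Every falsy value converts to False, every truthy value to True
print([bool(v) for v in falsy])
print(all(not v for v in falsy))  # True
print(all(truthy))                # True
```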
Metacharacters
- The real power of regex matching in Python comes from special characters called metacharacters in the pattern
- The regex matching engine interprets metacharacters specially, which greatly expands search capabilities
- For example, a character class can determine whether a string contains any sequence of three consecutive decimal digits
- A character class is a set of characters enclosed in square brackets (`[]`); it matches any single character that belongs to the class
Commonly Used Metacharacters
- `[0-9]` matches any single decimal digit character, that is, any character between `'0'` and `'9'`, inclusive
- The expression `[0-9][0-9][0-9]` matches any sequence of three consecutive decimal digit characters; `s` matches because it contains the three consecutive digits `'123'`
- The same pattern also finds other digit sequences, such as `'465'` in `'foo465'`
- A string that does not contain three consecutive digits will not match
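A sketch that applies the three-digit pattern to the sample strings from these notes, plus `'foo12bar'` (an added assumption) as a non-matching case:

```python
import re

pattern = '[0-9][0-9][0-9]'
for s in ['foo123bar', 'foo456bar', '234baz', 'qux678', 'foo465', 'foo12bar']:
    m = re.search(pattern, s)
    print(s, '->', m.group() if m else None)
# foo12bar -> None, because it has only two consecutive digits
```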
Obtaining Match Details
- When you want details about a match, you can call these match object methods:
- `.start()` returns the index where the match begins
- `.end()` returns the index just past the end of the match (that index is not included)
- `.span()` returns a tuple of the starting and ending (not included) indexes
- `.group()` returns the matched text
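A quick sketch of these methods on the string `'abc12def'`, an example input chosen for illustration:

```python
import re

m = re.search('[0-9]+', 'abc12def')
print(m.start())  # 3
print(m.end())    # 5 (one past the last matched character)
print(m.span())   # (3, 5)
print(m.group())  # 12
```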
Metacharacters Listed
Character(s) | Meaning |
---|---|
`.` | Matches any single character except newline |
`^` | Anchors a match at the start of a string; complements a character class |
`$` | Anchors a match at the end of a string |
`*` | Matches zero or more repetitions |
`+` | Matches one or more repetitions |
`?` | Matches zero or one repetition; specifies the non-greedy versions of `*`, `+`, and `?`; introduces a lookahead or lookbehind assertion; creates a named group |
`{}` | Matches an explicitly specified number of repetitions |
`\` | Escapes a metacharacter of its special meaning; introduces a special character class |
`[]` | Specifies a character class |
`\|` | Designates alternation |
`()` | Creates a group |
`:` `#` `=` `!` | Designate a specialized group |
`<>` | Creates a named group |
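A few rows of the table can be made concrete with one-line checks. This sketch exercises only some of the metacharacters, and the sample strings are illustrative assumptions:

```python
import re

print(bool(re.search('^foo', 'foobar')))          # True: ^ anchors at the start
print(bool(re.search('^bar', 'foobar')))          # False
print(bool(re.search('bar$', 'foobar')))          # True: $ anchors at the end
print(re.search('cat|dog', 'hot dog').group())    # dog: | designates alternation
print(re.search('(ab)+', 'xxababab').group())     # ababab: + repeats the whole group
print(re.search('x{2}', 'axxxb').group())         # xx: {} gives an exact count
```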
Enumerating Characters
- Characters contained in square brackets (`[]`) represent a character class, an enumerated set of characters to match from
- A character class matches any single character that is contained in the class
- The matching characters can be listed individually, as in `[artz]`
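A sketch of an enumerated class, assuming the pattern `ba[artz]` and three sample strings for illustration:

```python
import re

# [artz] matches exactly one of the characters a, r, t, or z
for s in ['foobarqux', 'foobazqux', 'foobagqux']:
    m = re.search('ba[artz]', s)
    print(s, '->', m.group() if m else None)
# foobarqux -> bar, foobazqux -> baz, foobagqux -> None
```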
Representing Ranges
- A character class can include a range of characters separated by a hyphen (`-`); it matches any single character that falls within that range
- For instance, `[a-z]` matches any lowercase letter from `'a'` to `'z'`, inclusive; `[0-9]` matches any digit character; and `[0-9a-fA-F]` matches any hexadecimal digit character
- The returned match is always the leftmost one: `re.search()` scans the search string from left to right and stops as soon as it finds a match for the regex pattern
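A sketch of ranges and of the leftmost-match rule; the sample strings are assumptions for illustration:

```python
import re

print(re.search('[a-z]', 'FOObar').group())     # b: first lowercase letter

m = re.search('[0-9a-fA-F]', '--- a0 ---')
print(m.span(), m.group())                      # (4, 5) a

# Scanning stops at the leftmost match, even if longer matches exist later
print(re.search('[0-9]+', 'a1 b22 c333').group())  # 1
```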
Complementation
- You can complement a character class by specifying `^` as its first character; the class then matches any character that is not in the set, so `[^0-9]` matches any character that isn't a digit
- If `^` appears in a character class somewhere other than first, it has no special meaning and matches a literal `'^'`
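A sketch contrasting a leading `^` with a `^` elsewhere in the class; the input strings are illustrative:

```python
import re

# Leading ^: match any character that is NOT a digit
print(re.search('[^0-9]', '12345foo').group())       # f

# ^ not first: it is just a literal caret in the set
print(re.search('[#:^]', 'foo^bar:baz#qux').span())  # (3, 4)
```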
Other Usage Cases for Metacharacters
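- To include a character that is normally special inside a character class, such as `-` or `]`, place it where it cannot be read as special, or escape it with a backslash (`\`)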
Hyphen Usage
- A literal hyphen can be included in a character class, instead of specifying a range of characters, in three ways:
- Place it as the first character, as in `[-abc]`
- Place it as the last character, as in `[abc-]`
- Escape it with a backslash (`\`), as in `[ab\-c]`
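A sketch checking all three hyphen placements against the assumed sample string `'foo-bar'`, with a true range for contrast:

```python
import re

s = 'foo-bar'
# Each class includes a literal hyphen, so the leftmost match is '-'
patterns = ['[-abc]', '[abc-]', r'[ab\-c]']
for p in patterns:
    print(p, repr(re.search(p, s).group()))

# A hyphen between two characters defines a range instead
print(re.search('[a-c]', s).group())  # b
```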
Square Brackets
- A literal closing bracket `]` can be included in a character class in two ways:
- Place it as the first character, as in `[]abc]`
- Escape it with a backslash (`\`), as in `[ab\]c]`
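A sketch of both ways to match a literal `]`, using the assumed input `'foo[1]'`:

```python
import re

s = 'foo[1]'
print(re.search('[]]', s).span())        # (5, 6): ] placed first is literal
print(re.search(r'[ab\]cd]', s).span())  # (5, 6): escaped ] is literal
```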
Other Regex Metacharacters
- Most other regex metacharacters lose their special meaning inside a character class, so they match literally there
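A brief sketch showing metacharacters treated literally inside `[]`; the classes and inputs are illustrative choices:

```python
import re

# Inside [], ) * + | . are ordinary characters
print(re.search('[)*+|]', '123*456').group())  # *
print(re.search('[.]', 'foo.bar').span())       # (3, 4): a literal dot
```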
Dot metacharacter
- Matches any single character except a newline
- The regex `foo.bar` means the characters `'foo'`, then any character except newline, then the characters `'bar'`
- For example, in the string `'fooxbar'`, the `.` metacharacter matches the character `'x'`
- The match fails when the character in that position is a newline, as in `'foo\nbar'`
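A sketch of the dot with and without a newline in the way. The `re.DOTALL` line goes slightly beyond this section, showing the flag that lets `.` match a newline:

```python
import re

print(re.search('foo.bar', 'fooxbar').group())  # fooxbar
print(re.search('foo.bar', 'foo\nbar'))         # None: . does not match \n

# With re.DOTALL, . also matches a newline
print(repr(re.search('foo.bar', 'foo\nbar', re.DOTALL).group()))  # 'foo\nbar'
```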
Special Sequences in RegEx
- A special sequence is a backslash (`\`) followed by one of the characters listed in the table below; each sequence has a special meaning
List of Special Character Sequences
Character | Description | Example |
---|---|---|
\A | Returns a match if the specified characters are at the beginning of the string | r"\AThe" |
\b | Returns a match where the specified characters are at the beginning or end of a word. The `r` prefix makes the pattern a raw string; it is a modifier of the string literal, not a regex command | r"\bain" |
\B | Returns a match where the specified characters are present, but NOT at the beginning or end of a word (also written as a raw string) | r"\Bain" |
\d | Matches any decimal digit (0-9) | r"\d" |
\D | Matches any character that is NOT a digit | r"\D" |
\s | Matches any whitespace character | r"\s" |
\S | Matches any character that is NOT whitespace | r"\S" |
\w | Matches any word character (letters a-z and A-Z, digits 0-9, and the underscore _) | r"\w" |
\W | Matches any character that is NOT a word character | r"\W" |
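A sketch running several of these sequences with `re.findall()`; the sample sentence is an assumption for illustration:

```python
import re

text = 'The rain in Spain 2024'

print(re.findall(r'\AThe', text))    # ['The']
print(re.findall(r'\bain', text))    # []: 'ain' never starts a word here
print(re.findall(r'ain\b', text))    # ['ain', 'ain']
print(re.findall(r'\Bain', text))    # ['ain', 'ain']
print(re.findall(r'\d', text))       # ['2', '0', '2', '4']
print(re.findall(r'\w+', text))      # ['The', 'rain', 'in', 'Spain', '2024']
print(len(re.findall(r'\s', text)))  # 4
```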
Commonly Used Special Sequences
\w and \W
Feature | \w | \W |
---|---|---|
Matches | Any word character | Any non-word character |
Equivalent class (ASCII) | [a-zA-Z0-9_] | [^a-zA-Z0-9_] |
Includes | Uppercase and lowercase letters, digits, and "_" | Symbols and whitespace, e.g. # @ % & * ( ) . |
\d and \D
Feature | \d | \D |
---|---|---|
Matches | Any decimal digit | Any character that is not a decimal digit |
Equivalent class (ASCII) | [0-9] | [^0-9] |
Examples | "0", "5", "9" | "Q", "?", "@", "{", "=", "~" |
\s and \S
Feature | \s | \S |
---|---|---|
Matches | Any whitespace character | Any character that is not whitespace |
Matches newline (\n) | Yes | No |
Matches tab (\t) | Yes | No |
Relationship | Complement of \S | Complement of \s |
Anchors and Boundaries
- Anchors are zero-width: they match a position in the string rather than a character
Anchor | Matches at |
---|---|
`^` and `\A` | The beginning of the string (with `re.MULTILINE`, `^` also matches at the start of each line) |
`$` and `\Z` | The end of the string (with `re.MULTILINE`, `$` also matches at the end of each line) |
`\b` | A word boundary |
`\B` | A position that is not a word boundary |
- Anchors do not consume any part of the search string
- An anchor requires the match to occur at a specific location in the search string
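A sketch of the string anchors on a two-line sample string (an illustrative assumption), including how `re.MULTILINE` changes `^` and `$` but not `\A`:

```python
import re

text = 'foo\nbar'
print(bool(re.search('^bar', text)))                  # False: only the string start
print(bool(re.search('^bar', text, re.MULTILINE)))    # True: also after a newline
print(bool(re.search(r'\Abar', text, re.MULTILINE)))  # False: \A ignores MULTILINE
print(bool(re.search('foo$', text, re.MULTILINE)))    # True: end of the first line
print(bool(re.search(r'foo\Z', text)))                # False: not the string end

# Anchors are zero-width, so a bare anchor matches an empty string
print(repr(re.search('^', 'abc').group()))            # ''
```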
Boundaries with Raw Strings
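- In a normal Python string literal, `'\b'` is the backspace character, so a pattern that uses `\b` as a word boundary must be written as a raw string (or with a doubled backslash, `'\\b'`); raw strings are also recommended for `\B` and the other special sequences
- A raw string begins with the prefix `r` or `R`, which tells Python not to process backslash escapes in the literal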
\b Special Sequence
- Anchors a match to a word boundary
- A word boundary is a position at the beginning or end of a word, where a word is a run of alphanumeric characters or underscores (`[A-Za-z0-9_]`)
- Placing `\b` at both ends of a pattern, as in `r'\bbar\b'`, matches `bar` only when it appears as a whole word
Important side note
- The start and end of the string also count as word boundaries when they are next to a word character, so `\b` can match there
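\B: Opposite of \b in Regex
- Matches only at a position that is not the start or end of a word
- For example, `r'\Bfoo\B'` matches inside `'barfoobaz'`, where `foo` is surrounded by word characters, but it does not match the standalone word `'foo'`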
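A sketch of `\b` and `\B`, plus the raw-string pitfall described under Boundaries with Raw Strings; the sample strings are illustrative:

```python
import re

print(bool(re.search(r'\bbar\b', 'foo bar baz')))  # True: bar is a whole word
print(bool(re.search(r'\bbar\b', 'foobarbaz')))    # False
print(bool(re.search(r'\Bfoo\B', 'barfoobaz')))    # True: foo is inside a word
print(bool(re.search(r'\Bfoo\B', 'foo')))          # False

# Without the r prefix, '\b' is a backspace character, not a word boundary
print('\b' == '\x08')                              # True
print(bool(re.search('\bbar\b', 'foo bar baz')))   # False
```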
"Escaping Metacharacters"
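- A backslash (`\`) removes the special meaning of a metacharacter
First Example: Escaping the Dot
- Unescaped, the dot `.` is a wildcard that matches any character except newline
- Escaped as `\.`, it matches only a literal `'.'` character
Second Example: Escaping a Backslash
- To match a literal backslash, escape it with another backslash, giving the regex `\\`
- In a normal string literal each of those backslashes must be escaped again (`'\\\\'`), so it is simpler to write the pattern as a raw string: `r'\\'`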
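A sketch of both escaping examples; the strings `'foo.bar'` and `r'foo\bar'` are assumptions for illustration:

```python
import re

print(re.search('.', 'foo.bar').group())   # f: unescaped dot matches anything
print(re.search(r'\.', 'foo.bar').span())  # (3, 4): escaped dot is literal

s = r'foo\bar'   # raw string: contains a real backslash at index 3
print(len(s))                          # 7
print(re.search(r'\\', s).span())      # (3, 4)
print(re.search('\\\\', s).span())     # (3, 4): same regex without a raw string
```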
Quantifiers
- A quantifier metacharacter immediately follows a portion of a regex and specifies how many times that portion must occur for the match to succeed
- `*` matches zero or more repetitions of the preceding regex
- `+` matches one or more repetitions of the preceding regex
- `?` matches zero or one repetition, making the preceding regex optional
- Each quantifier comes in a greedy version and a lazy (non-greedy) version
The Regex `<.*>`
- `<.*>` matches a `'<'` character, then any sequence of characters, then a `'>'` character
- Because `*` is greedy, `.*` extends as far as possible, so the match ends at the last `'>'` in the string
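A sketch comparing the three basic quantifiers on the same inputs; the patterns and strings are illustrative assumptions:

```python
import re

strings = ['foobar', 'foo-bar', 'foo--bar']
for pattern in ['foo-*bar', 'foo-+bar', 'foo-?bar']:
    print(pattern, [bool(re.search(pattern, s)) for s in strings])
# foo-*bar [True, True, True]    zero or more hyphens
# foo-+bar [False, True, True]   one or more hyphens
# foo-?bar [True, True, False]   zero or one hyphen
```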
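Summary of the Greedy and Lazy Versions
Greedy | Lazy |
---|---|
`*`, `+`, `?` | `*?`, `+?`, `??` |
Matches the longest possible string | Matches the shortest possible string |
`<.+>` matches from the first `<` to the last `>` | `<.+?>` matches from the first `<` to the nearest `>` |
`?` matches one repetition when it can | `??` matches zero repetitions when it can |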
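A sketch of greedy versus lazy matching; the sample string `'%<foo> <bar> <baz>%'` is an assumption for illustration:

```python
import re

s = '%<foo> <bar> <baz>%'
print(re.search('<.*>', s).group())    # <foo> <bar> <baz>  (greedy)
print(re.search('<.*?>', s).group())   # <foo>              (lazy)
print(re.search('<.+>', s).group())    # <foo> <bar> <baz>
print(re.search('<.+?>', s).group())   # <foo>

print(re.search('ba?', 'baaaa').group())   # ba: ? takes one 'a' if it can
print(re.search('ba??', 'baaaa').group())  # b:  ?? takes zero if it can
```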
Additional Quantifiers
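- `{m}` matches exactly `m` repetitions of the preceding regex
- `{m,n}` matches any number of repetitions of the preceding regex from `m` to `n`, inclusive
Bound | Meaning |
---|---|
`m`, `n` | Must be non-negative integers |
`m` omitted, as in `{,n}` | The lower bound defaults to 0 |
`n` omitted, as in `{m,}` | There is no upper bound |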
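Greedy and Lazy Versions of {m,n}
- `{m,n}` is greedy: it matches as many repetitions as possible, so `a{3,5}` applied to `'aaaaaaaa'` matches `'aaaaa'`
- `{m,n}?` is the lazy version: it matches as few repetitions as possible, so `a{3,5}?` applied to `'aaaaaaaa'` matches `'aaa'`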
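A sketch of the brace quantifiers, including omitted bounds; all patterns and inputs are illustrative:

```python
import re

print(bool(re.search('x-{3}x', 'x---x')))       # True: exactly three hyphens
print(bool(re.search('x-{3}x', 'x--x')))        # False
print(re.search('a{3,5}', 'aaaaaaaa').group())  # aaaaa (greedy)
print(re.search('a{3,5}?', 'aaaaaaaa').group()) # aaa   (lazy)
print(re.search('a{,2}', 'aaaa').group())       # aa:   m omitted, lower bound 0
print(re.search('a{2,}', 'aaaa').group())       # aaaa: n omitted, no upper bound
```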
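Terms
Grouping Constructs
- Break a regex up into parts called subexpressions or groups
Subexpression (Group)
- A group represents a single syntactic unit
Additional Metacharacters
- A metacharacter that follows a group, such as a quantifier, applies to the entire group as a unit
Capturing
- Some grouping constructs also capture the portion of the search string that matched the group, so that text can be retrieved later
Regex Terms and Applications
Situation | What to do |
---|---|
The pattern contains `\b` | Write it as a raw string |
Several characters should act as one unit | Enclose them in parentheses to create a group, so any following metacharacter applies to the whole group |
Groups and Syntax
- `(bar)+` matches one or more consecutive occurrences of the string `'bar'`
- Without the parentheses, `bar+` matches `'ba'` followed by one or more `'r'` characters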
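A sketch of grouping and capturing. The comma-separated input and the backreference lines are illustrative additions (backreferences appear in the quiz and flashcards rather than these notes):

```python
import re

print(re.search('(bar)+', 'foo barbarbar baz').group())  # barbarbar
print(re.search('bar+', 'foo barrrr baz').group())       # barrrr

m = re.search(r'(\w+),(\w+),(\w+)', 'foo,quux,baz')
print(m.groups())   # ('foo', 'quux', 'baz')
print(m.group(0))   # foo,quux,baz (the entire match)
print(m.group(2))   # quux

# Backreference: \1 must match the same text that group 1 captured
print(re.search(r'(\w+),\1', 'foo,foo').group())  # foo,foo
print(re.search(r'(\w+),\1', 'foo,bar'))          # None
```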