How to Use Python Regular Expressions

Regular expressions (often abbreviated as regex or regexp) are among the most useful text-processing tools in programming, and Python supports them through its built-in re module. They let you search, match, and manipulate text with precision. If you’ve ever wondered how to validate an email, extract data from logs, or parse complex text, then regex is your new best friend. Let’s dive into Python’s re module, starting from the basics and building up to advanced concepts.


What is a Regular Expression?

At its core, a regular expression is a sequence of characters that defines a search pattern. Think of it as a special language for finding patterns in text. For example:

  • Pattern: cat
    Text: “The cat sat on the mat.”
    Match: Yes, the word “cat” is present.

Python provides the re module to work with regular expressions. To use it, simply import the module:

import re

Basic Concepts and Syntax

1. Matching Literal Characters

A regex matches characters exactly unless you use special symbols. For example:

pattern = r"cat"
text = "The cat is here."
match = re.search(pattern, text)
print(match.group() if match else "No match")
  • r"cat": The r prefix denotes a raw string, ensuring backslashes are treated literally.
  • Output: cat
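The example above uses re.search, which scans the whole string. As a minimal sketch of how it differs from two related functions in the re module, re.match and re.fullmatch, the snippet below runs the same literal pattern through all three. The variable names are illustrative only.

```python
import re

text = "The cat is here."

# re.search scans the whole string and returns the first match it finds.
found = re.search(r"cat", text)

# re.match only tries to match at the very start of the string,
# so it returns None here because the text starts with "The".
at_start = re.match(r"cat", text)

# re.fullmatch succeeds only if the entire string matches the pattern.
whole = re.fullmatch(r"cat", "cat")

print(found.group(), found.span())  # cat (4, 7)
print(at_start)                     # None
print(whole.group())                # cat
```

All three return a match object on success and None on failure, which is why the original example checks the result before calling group().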

2. Metacharacters

Metacharacters are symbols with special meanings in regex. Some of the most common ones are:

Metacharacter    Description
.                Matches any character except a newline (\n).
^                Matches the start of a string.
$                Matches the end of a string.
*                Matches 0 or more repetitions.
+                Matches 1 or more repetitions.
?                Matches 0 or 1 repetition.
|                Acts as an OR operator.

Example:

pattern = r"ca."
text = "cat, car, cab"
matches = re.findall(pattern, text)
print(matches)  # ['cat', 'car', 'cab']
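The example above only exercises the dot. The following sketch tries the remaining metacharacters from the table on small sample strings. The strings and variable names are made up for illustration.

```python
import re

# ^ anchors the pattern at the start of the string, $ at the end.
starts_with = bool(re.search(r"^cat", "cat nap"))
ends_with = bool(re.search(r"cat$", "cat nap"))
print(starts_with, ends_with)  # True False

# * allows zero or more, + one or more, ? zero or one of the preceding item.
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")
optional = re.findall(r"colou?r", "color colour")
print(star)      # ['a', 'ab', 'abb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['color', 'colour']

# | matches either the alternative on its left or the one on its right.
either = re.findall(r"cat|dog", "cat, dog, bird")
print(either)    # ['cat', 'dog']
```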

3. Character Classes

Character classes let you specify a set of characters to match. For example:

  • [abc]: Matches any one of a, b, or c.
  • [a-z]: Matches any lowercase letter.
  • [^a-z]: Matches anything except lowercase letters.

Example:

pattern = r"[aeiou]"
text = "Python is awesome."
matches = re.findall(pattern, text)
print(matches)  # ['o', 'i', 'a', 'e', 'o', 'e']
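To complement the vowel example, here is a small sketch of a range, a negated class, and a class combined with a quantifier. The sample strings are arbitrary.

```python
import re

# A range matches any single character between its endpoints.
lowercase = re.findall(r"[a-z]", "Hi Bob")
print(lowercase)      # ['i', 'o', 'b']

# A leading ^ inside the brackets negates the class.
non_lowercase = re.findall(r"[^a-z]", "Hi Bob")
print(non_lowercase)  # ['H', ' ', 'B']

# Ranges can be combined, and a quantifier applies to the whole class.
hex_runs = re.findall(r"[0-9a-f]+", "ff 1a zz 07")
print(hex_runs)       # ['ff', '1a', '07']
```

Note that ^ means "start of string" outside brackets but "not" when it is the first character inside them.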

Intermediate Techniques

1. Quantifiers

Quantifiers define how many times a character or group can repeat:

Quantifier    Description
{n}           Exactly n times.
{n,}          At least n times.
{n,m}         Between n and m times.

Example:

pattern = r"a{2,3}"
text = "aaa a aa aaa"
matches = re.findall(pattern, text)
print(matches)  # ['aaa', 'aa', 'aaa']
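The three quantifier forms can be compared side by side. This sketch assumes one extra piece of syntax not covered above: \b, a word boundary, which keeps a pattern from matching part of a longer run of digits.

```python
import re

text = "1 22 333 4444 55555"

# {n}: exactly three digits (\b stops it matching inside longer numbers).
exactly_three = re.findall(r"\b\d{3}\b", text)
print(exactly_three)   # ['333']

# {n,}: three or more digits.
at_least_three = re.findall(r"\b\d{3,}\b", text)
print(at_least_three)  # ['333', '4444', '55555']

# {n,m}: between two and four digits.
two_to_four = re.findall(r"\b\d{2,4}\b", text)
print(two_to_four)     # ['22', '333', '4444']
```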

2. Grouping and Capturing

Parentheses () group parts of a regex and capture matched text:

pattern = r"(\w+)@(\w+)\.com"
text = "Contact us at support@example.com."
match = re.search(pattern, text)
print(match.groups())  # ('support', 'example')
  • \w+: Matches one or more word characters.
  • groups(): Returns all captured groups.
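Besides groups(), individual groups can be read by number, and re.findall changes its return shape when a pattern contains groups. A short sketch, using a made-up sentence with two addresses:

```python
import re

text = "Contact support@example.com or sales@test.com."
pattern = r"(\w+)@(\w+)\.com"

match = re.search(pattern, text)
whole = match.group(0)   # group 0 is the entire match
user = match.group(1)    # first pair of parentheses
domain = match.group(2)  # second pair of parentheses
print(whole, user, domain)  # support@example.com support example

# When the pattern has capturing groups, findall returns one tuple per match.
pairs = re.findall(pattern, text)
print(pairs)  # [('support', 'example'), ('sales', 'test')]
```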

3. Escape Sequences

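Escape special characters with \ if you want to match them literally. A backslash followed by certain letters forms a shorthand for a character class instead:

  • \. matches a literal period (.).
  • \d matches any digit ([0-9]; for str patterns it also matches other Unicode decimal digits unless re.ASCII is set).
  • \D matches any non-digit character.
  • \s matches any whitespace (spaces, tabs, newlines, etc.).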

Example:

pattern = r"\d{4}"
text = "Year: 2023."
match = re.search(pattern, text)
print(match.group())  # 2023
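The effect of escaping is easiest to see by comparing an escaped and an unescaped dot. The sketch below also tries \s and \D. It uses re.split, which is part of the re module but not otherwise discussed in this article, and the sample strings are invented.

```python
import re

text = "file.txt filextxt"

# Unescaped, the dot matches any character, including the "x".
unescaped = re.findall(r"file.txt", text)
print(unescaped)   # ['file.txt', 'filextxt']

# Escaped, it matches only a literal period.
escaped = re.findall(r"file\.txt", text)
print(escaped)     # ['file.txt']

# \s+ matches runs of whitespace, here used to split a string.
parts = re.split(r"\s+", "one  two\tthree")
print(parts)       # ['one', 'two', 'three']

# \D+ matches runs of non-digit characters.
non_digits = re.findall(r"\D+", "ab12cd")
print(non_digits)  # ['ab', 'cd']
```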

Advanced Topics

1. Lookahead and Lookbehind

Lookaround assertions check what comes before or after the current position without including that text in the match.

  • Positive Lookahead (?=): Ensures a pattern is followed by another.
  • Negative Lookahead (?!): Ensures a pattern is not followed by another.
  • Positive Lookbehind (?<=): Ensures a pattern is preceded by another.
  • Negative Lookbehind (?<!): Ensures a pattern is not preceded by another.

Example:

pattern = r"\d+(?=\sUSD)"
text = "Price: 50 USD, 70 EUR"
matches = re.findall(pattern, text)
print(matches)  # ['50']
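The other three assertions work the same way. The sketch below is illustrative rather than canonical: the \b word boundaries are an added assumption, needed so that the negative assertions cannot succeed by matching only part of a number. Also note that Python's re module requires lookbehind patterns to have a fixed width.

```python
import re

text = "Price: 50 USD, 70 EUR"

# Negative lookahead: whole numbers NOT followed by " USD".
not_usd = re.findall(r"\b\d+\b(?!\sUSD)", text)
print(not_usd)      # ['70']

# Positive lookbehind: digits preceded by "Price: " (fixed width).
after_label = re.findall(r"(?<=Price:\s)\d+", text)
print(after_label)  # ['50']

# Negative lookbehind: numbers NOT preceded by a dollar sign.
no_dollar = re.findall(r"(?<!\$)\b\d+", "$5 and 10")
print(no_dollar)    # ['10']
```

Without the word boundaries, a pattern like \d+(?!\sUSD) would still match the "5" in "50 USD", because the engine backtracks to a shorter run of digits that satisfies the assertion.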

2. Flags

Flags modify regex behavior. Common flags include:

  • re.IGNORECASE or re.I: Makes matching case-insensitive.
  • re.MULTILINE or re.M: Allows ^ and $ to match at the start and end of each line.
  • re.DOTALL or re.S: Makes . match newline characters as well.

Example:

pattern = r"^hello"
text = "Hello\nhello"
matches = re.findall(pattern, text, re.I | re.M)
print(matches)  # ['Hello', 'hello']
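To see what each flag changes, it helps to run the same pattern with and without it. A minimal sketch for re.DOTALL and re.MULTILINE on a two-line string:

```python
import re

text = "start\nend"

# By default the dot does not match a newline, so this search fails.
default = re.search(r"start.end", text)
print(default)         # None

# With re.DOTALL the dot also matches "\n".
dotall = re.search(r"start.end", text, re.DOTALL)
print(dotall.group() == text)  # True

# Without re.M, ^ matches only at the very start of the string.
single = re.findall(r"^\w+", text)
print(single)          # ['start']

# With re.M, ^ also matches right after each newline.
multi = re.findall(r"^\w+", text, re.M)
print(multi)           # ['start', 'end']
```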

3. Substitutions

The re.sub function replaces patterns with a specified string:

pattern = r"\d+"
text = "Replace 123 with 456."
result = re.sub(pattern, "456", text)
print(result)  # Replace 456 with 456.
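re.sub can do more than insert a fixed string. The sketch below shows three documented options: group references in the replacement, a function as the replacement, and the count argument. The sample inputs are invented.

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups.
date = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2025-01-20")
print(date)        # 20/01/2025

# A function used as the replacement receives each match object
# and returns the text to substitute.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)     # 6 apples, 20 pears

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "#", "1 2 3", count=1)
print(first_only)  # # 2 3
```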

Real-World Examples

1. Email Validation

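Check that a string has the general shape of an email address (this is a simplified pattern, not a full implementation of the email specification):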

pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
email = "test@example.com"
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")
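If the check is needed in several places, one option is to compile the pattern once and wrap it in a helper. This is only a sketch: EMAIL_RE and is_valid_email are hypothetical names, and re.fullmatch is used in place of the ^ and $ anchors because $ also matches just before a trailing newline.

```python
import re

# Same simplified pattern as above, without the ^ and $ anchors,
# compiled once so it can be reused.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_RE.fullmatch(address) is not None


print(is_valid_email("test@example.com"))    # True
print(is_valid_email("test@example"))        # False
print(is_valid_email("test@example.com\n"))  # False
```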

2. Extracting URLs

Find all URLs in text:

pattern = r"https?://[\w-]+(?:\.[\w-]+)+"
text = "Visit https://example.com and http://test.org."
urls = re.findall(pattern, text)
print(urls)  # ['https://example.com', 'http://test.org']
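One pitfall when extracting URLs from prose is sentence punctuation. A character class such as [\w.-]+ also accepts the period that ends the sentence. The sketch below compares it with a stricter variant that only allows a period between hostname labels. Both patterns are illustrative and ignore paths, ports, and query strings.

```python
import re

text = "Visit https://example.com and http://test.org."

# The loose class lets the match run into the sentence-ending period.
loose = re.findall(r"https?://[\w.-]+", text)
print(loose)   # ['https://example.com', 'http://test.org.']

# Requiring each period to be followed by more label characters
# leaves the trailing period out of the match.
strict = re.findall(r"https?://[\w-]+(?:\.[\w-]+)+", text)
print(strict)  # ['https://example.com', 'http://test.org']
```

The (?:...) group is non-capturing, so re.findall still returns whole matches rather than group contents.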

3. Data Cleaning

Remove non-alphanumeric characters:

pattern = r"[^a-zA-Z0-9 ]"
text = "Hello, World! 123."
cleaned_text = re.sub(pattern, "", text)
print(cleaned_text)  # Hello World 123
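Cleaning often takes more than one pass. As a sketch under the assumption that the input also has irregular spacing, the snippet below strips punctuation first and then collapses runs of whitespace. The input string is made up.

```python
import re

raw = "  Hello,   World!\t123.  "

# Step 1: drop everything that is not a letter, digit, or whitespace.
no_punct = re.sub(r"[^a-zA-Z0-9\s]", "", raw)

# Step 2: collapse each run of whitespace into one space, then trim the ends.
normalized = re.sub(r"\s+", " ", no_punct).strip()

print(normalized)  # Hello World 123
```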

4. Log Parsing

Extract timestamps from logs:

pattern = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
log = "2025-01-20 10:30:45 - INFO: Task completed"
timestamps = re.findall(pattern, log)
print(timestamps)  # ['2025-01-20 10:30:45']
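Capturing groups make it possible to pull out the other fields of the line at the same time. This sketch assumes every log line follows the "timestamp - LEVEL: message" layout of the sample. LOG_RE is a hypothetical name, and datetime.strptime from the standard library converts the timestamp text into a datetime object.

```python
import re
from datetime import datetime

# Three groups: timestamp, level, and the rest of the line as the message.
LOG_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - (\w+): (.*)")

log = "2025-01-20 10:30:45 - INFO: Task completed"

m = LOG_RE.match(log)
if m:
    timestamp_text, level, message = m.groups()
    # Convert the matched text into a datetime for comparisons or sorting.
    timestamp = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    print(timestamp.year, level, message)  # 2025 INFO Task completed
```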

Tips for Mastery

  1. Test Patterns Online: Tools like regex101.com allow you to experiment with regex interactively.
  2. Start Simple: Break complex patterns into smaller pieces and build up.
  3. Use Comments: Python lets you write verbose regex for clarity:
pattern = re.compile(r"""
    ^               # Start of string
    (\w+)           # Capture a word
    \s+             # One or more spaces
""", re.VERBOSE)
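For a complete, runnable illustration of verbose mode, the sketch below extends the idea with a second group and an end anchor. Those two extra lines are hypothetical additions, not part of the original pattern. With re.VERBOSE, whitespace in the pattern is ignored and everything after a # is treated as a comment.

```python
import re

pattern = re.compile(r"""
    ^        # Start of string
    (\w+)    # Capture a word
    \s+      # One or more spaces
    (\d+)    # Capture a number (hypothetical addition)
    $        # End of string (hypothetical addition)
""", re.VERBOSE)

match = pattern.search("order 42")
print(match.groups())  # ('order', '42')
```

To match a literal space or # in verbose mode, escape it with a backslash or put it inside a character class.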
