re.sub() / re.split() / re.compile()

Functions for replacing and splitting strings using regular expressions, and for compiling patterns. Commonly used in text processing and parsing tasks.

Syntax

import re

# Replaces parts of the string that match the pattern.
new_string = re.sub(pattern, replacement, string)
new_string, count = re.subn(pattern, replacement, string)

# Splits the string at pattern matches.
parts = re.split(pattern, string)  # Avoid naming the result `list`, which would shadow the built-in.

# Compiles a pattern for reuse.
compiled = re.compile(pattern)
compiled.search(string)
compiled.findall(string)

Function List

Function	Description
re.sub(pattern, repl, string, count=0, flags=0)	Replaces occurrences of the pattern in the string with the replacement. By default all occurrences are replaced; use `count` to limit the maximum number of replacements. Returns the new string.
re.subn(pattern, repl, string, count=0, flags=0)	Same as `re.sub()`, but returns a tuple of `(new_string, number_of_replacements)`.
re.split(pattern, string, maxsplit=0, flags=0)	Splits the string at each match of the pattern and returns a list. Use `maxsplit` to limit the number of splits; the remainder of the string becomes the last element. If the pattern contains a capturing group, the matched separators are also included in the result.
re.compile(pattern, flags=0)	Compiles a regular expression pattern into a reusable object. Efficient when the same pattern is used repeatedly.
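
The `count` and `maxsplit` parameters listed above can be illustrated with a minimal sketch. The input strings and variable names here are made up for illustration only.

```python
import re

# Replace only the first two runs of digits; the third is left untouched.
limited = re.sub(r'\d+', 'N', 'item1 and item2 and item3', count=2)
print(limited)  # Outputs: 'itemN and itemN and item3'

# Split at most once; the rest of the string stays in the last element.
head_tail = re.split(r',\s*', 'a, b, c, d', maxsplit=1)
print(head_tail)  # Outputs: ['a', 'b, c, d']
```

Passing `count` and `maxsplit` by keyword, as here, keeps the call readable and avoids confusing them with `flags`.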

Sample Code

import re

# Replace matches using sub().
text = 'Hello World hello python'
result = re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)
print(result)  # Outputs: 'Hi World Hi python'

# Collapse consecutive whitespace into a single space.
messy = 'This    is    a    test.'
clean = re.sub(r'\s+', ' ', messy)
print(clean)  # Outputs: 'This is a test.'

# Use capturing groups in the replacement string (referenced with \1).
date = '2025/04/15'
iso = re.sub(r'(\d{4})/(\d{2})/(\d{2})', r'\1-\2-\3', date)
print(iso)  # Outputs: '2025-04-15'

# Use subn() to also get the number of replacements.
new_text, count = re.subn(r'\d+', 'N', 'item1 and item2 and item3')
print(new_text)  # Outputs: 'itemN and itemN and itemN'
print(f'{count} replacement(s) made.')  # Outputs: '3 replacement(s) made.'

# Use split() to split a string by a regular expression pattern.
csv_like = 'Alice,Bob、Charlie　Dave'  # Mixed delimiters: ASCII comma, ideographic comma (、), full-width space
names = re.split(r'[,、\s]+', csv_like)
print(names)  # Outputs: ['Alice', 'Bob', 'Charlie', 'Dave']

# Use compile() to reuse a pattern.
email_pattern = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}')
emails = [
    'user@example.com',
    'not-an-email',
    'info@test.co.jp',
]
for e in emails:
    if email_pattern.fullmatch(e):
        print(f'{e} is valid.')
    else:
        print(f'{e} is invalid.')

Notes

The replacement string in re.sub() can reference capturing groups using \1, \2, and so on. You can also pass a function instead of a replacement string to generate the replacement dynamically for each match.
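
As a rough sketch of the function form described above: the callable receives a match object for each match and returns the replacement text. The helper name `double_number` and the sample string are hypothetical.

```python
import re

def double_number(match):
    # match.group(0) is the full text of the current match.
    return str(int(match.group(0)) * 2)

prices = 'apple 120, banana 80'
doubled = re.sub(r'\d+', double_number, prices)
print(doubled)  # Outputs: 'apple 240, banana 160'
```

A function is useful when the replacement depends on the matched text itself, which a fixed replacement string with \1-style references cannot express.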

When re.split() is used with a pattern that contains a capturing group, the matched separator is included in the returned list. To split without including the separator, use a non-capturing group ((?:...)).
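
The difference can be seen in a small sketch, assuming a made-up input where the separator is either a comma or a semicolon followed by optional whitespace:

```python
import re

text = 'a, b; c'

# Capturing group: the matched separators appear in the result.
with_seps = re.split(r'(,|;)\s*', text)
print(with_seps)  # Outputs: ['a', ',', 'b', ';', 'c']

# Non-capturing group: only the pieces between separators are returned.
without_seps = re.split(r'(?:,|;)\s*', text)
print(without_seps)  # Outputs: ['a', 'b', 'c']
```

Note that only the text matched by the group itself is kept, so the trailing whitespace matched by `\s*` does not appear in the first result.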

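re.compile() improves performance when the same pattern is used repeatedly. Note that the re module internally caches recently compiled patterns, so module-level functions such as re.sub() do not recompile the pattern on every call, and the benefit is minimal for a small number of uses. It is most effective when the pattern is used heavily inside a loop. A compiled pattern also gives the regular expression a descriptive name, which can make code easier to read.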
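
A minimal sketch of this usage, with hypothetical pattern names and sample data: the patterns are compiled once outside the loop, and the compiled objects expose the same operations as methods (`findall()`, `sub()`, `split()`).

```python
import re

# Compile once, outside the loop.
word_pattern = re.compile(r'[a-z]+', re.IGNORECASE)
space_pattern = re.compile(r'\s+')

lines = ['Hello World', '123', 'foo bar baz']
counts = [len(word_pattern.findall(line)) for line in lines]
print(counts)  # Outputs: [2, 0, 3]

# Compiled patterns also provide sub() and split() as methods.
squeezed = space_pattern.sub(' ', 'a   b    c')
print(squeezed)  # Outputs: 'a b c'
pieces = space_pattern.split('a   b    c')
print(pieces)  # Outputs: ['a', 'b', 'c']
```

Flags such as `re.IGNORECASE` are given once to `re.compile()` rather than on each call.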

For pattern matching, see re.match() / re.search() / re.fullmatch().
