re.sub() / re.split() / re.compile()
Functions for replacing and splitting strings with regular expressions, and for compiling patterns for reuse. Commonly used in text processing and parsing tasks.
Syntax
import re

# Replace the parts of the string that match the pattern.
new_string = re.sub(pattern, replacement, string)
new_string, count = re.subn(pattern, replacement, string)

# Split the string at pattern matches.
parts = re.split(pattern, string)

# Compile a pattern for reuse.
compiled = re.compile(pattern)
compiled.search(string)
compiled.findall(string)
Function List
| Function | Description |
|---|---|
| re.sub(pattern, repl, string, count=0, flags=0) | Replaces every non-overlapping occurrence of the pattern in the string with the replacement and returns the new string. Use count to limit the number of replacements (0 means no limit). |
| re.subn(pattern, repl, string, count=0, flags=0) | Same as re.sub(), but returns a tuple of (new_string, number_of_replacements). |
| re.split(pattern, string, maxsplit=0, flags=0) | Splits the string at each match of the pattern and returns a list. Use maxsplit to limit the number of splits (0 means no limit). If the pattern contains a capturing group, the matched separators are also included in the result. |
| re.compile(pattern, flags=0) | Compiles a regular expression pattern into a reusable object. Efficient when the same pattern is used repeatedly. |
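The count and maxsplit parameters listed in the table can be sketched as follows. This is a minimal illustration; the sample strings and variable names are made up for the example.

```python
import re

# count limits how many matches re.sub() replaces (0 means replace all).
first_only = re.sub(r'\d+', 'N', 'item1 and item2 and item3', count=1)
print(first_only)  # Outputs: 'itemN and item2 and item3'

# maxsplit limits how many splits re.split() performs (0 means no limit).
head, rest = re.split(r',\s*', 'a, b, c, d', maxsplit=1)
print(head)  # Outputs: 'a'
print(rest)  # Outputs: 'b, c, d'
```

Passing count and maxsplit as keyword arguments keeps them from being confused with flags, which comes next in each signature.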
Sample Code
import re
# Replace matches using sub().
text = 'Hello World hello python'
result = re.sub(r'hello', 'Hi', text, flags=re.IGNORECASE)
print(result) # Outputs: 'Hi World Hi python'
# Collapse consecutive whitespace into a single space.
messy = 'This   is  a    test.'
clean = re.sub(r'\s+', ' ', messy)
print(clean) # Outputs: 'This is a test.'
# Use capturing groups in the replacement string (referenced with \1).
date = '2025/04/15'
iso = re.sub(r'(\d{4})/(\d{2})/(\d{2})', r'\1-\2-\3', date)
print(iso) # Outputs: '2025-04-15'
# Use subn() to also get the number of replacements.
new_text, count = re.subn(r'\d+', 'N', 'item1 and item2 and item3')
print(new_text) # Outputs: 'itemN and itemN and itemN'
print(f'{count} replacement(s) made.') # Outputs: '3 replacement(s) made.'
# Use split() to split a string by a regular expression pattern.
csv_like = 'Alice,Bob、Charlie Dave' # Mixed delimiters
names = re.split(r'[,、\s]+', csv_like)
print(names) # Outputs: ['Alice', 'Bob', 'Charlie', 'Dave']
# Use compile() to reuse a pattern.
email_pattern = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}')
emails = [
    'user@example.com',
    'not-an-email',
    'info@test.co.jp',
]
for e in emails:
    if email_pattern.fullmatch(e):
        print(f'{e} is valid.')
    else:
        print(f'{e} is invalid.')
Notes
The replacement string in re.sub() can reference capturing groups using \1, \2, and so on. You can also pass a function instead of a replacement string to generate the replacement dynamically for each match.
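Passing a function as the replacement might look like the sketch below. The function name double_number and the sample text are hypothetical; re.sub() calls the function once per match and uses its return value as the replacement text.

```python
import re

def double_number(match):
    # match.group(0) is the full text matched by the pattern.
    return str(int(match.group(0)) * 2)

prices = 'apple 100, banana 250'
doubled = re.sub(r'\d+', double_number, prices)
print(doubled)  # Outputs: 'apple 200, banana 500'
```

The function receives a match object, so it can also read individual groups with match.group(1), match.group(2), and so on.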
When re.split() is used with a pattern that contains a capturing group, the matched separator is included in the returned list. To split without including the separator, use a non-capturing group ((?:...)).
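The difference between the two kinds of group can be seen in a small sketch. The expression string is just an illustrative input.

```python
import re

expr = '1+2-3'

# Capturing group: the matched separators are kept in the result.
with_separators = re.split(r'([+-])', expr)
print(with_separators)  # Outputs: ['1', '+', '2', '-', '3']

# Non-capturing group: the separators are dropped.
without_separators = re.split(r'(?:[+-])', expr)
print(without_separators)  # Outputs: ['1', '2', '3']
```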
re.compile() can improve performance when the same pattern is used repeatedly. Python's re module caches recently compiled patterns internally, so the benefit is minimal for a small number of uses. Precompiling helps most when the pattern is used heavily inside a loop.
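A typical way to apply this is to compile the pattern once, outside the loop, and reuse the resulting object. The following is a minimal sketch with made-up input lines; it does not measure the actual speed difference.

```python
import re

# Compile once, outside the loop, and reuse the pattern object.
word_pattern = re.compile(r'\b\w+\b')

lines = ['alpha beta', 'gamma', 'delta epsilon zeta']
word_counts = [len(word_pattern.findall(line)) for line in lines]
print(word_counts)  # Outputs: [2, 1, 3]

# Flags are passed at compile time and apply to every later call.
greeting = re.compile(r'hello', re.IGNORECASE)
print(greeting.sub('Hi', 'Hello World hello python'))  # Outputs: 'Hi World Hi python'
```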
For pattern matching, see re.match() / re.search() / re.fullmatch().