Remove a Substring from a String in Python

Tags: Python, String, Regex

This article explains how to remove a substring (i.e., a part of a string) from a string in Python.

See the following article to learn how to remove file extensions and directory parts from a path string.

If you want to remove part of the contents of a text file, you can read the file into a string, process it, and then save it again.

Remove a substring by replacing it with an empty string

You can remove a substring by replacing it with an empty string ('').

The examples below demonstrate the basic usage of replace() and re.sub(). For a more in-depth guide on string replacement, refer to the following article.

Remove exact match string: replace()

You can replace a substring that exactly matches the given string using the replace() method of the str class. If the match is replaced with an empty string (''), the substring is effectively removed.

s = 'abc-xyz-123-789-ABC-XYZ'

print(s.replace('xyz', ''))
# abc--123-789-ABC-XYZ
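
By default, replace() removes every occurrence of the substring. As a brief sketch (the sample string here is made up for illustration), the optional third argument, count, limits how many occurrences are replaced, counting from the left.

```python
s = 'abc-xyz-123-xyz'

# Without count, every occurrence of 'xyz' is removed.
removed_all = s.replace('xyz', '')
print(removed_all)
# abc--123-

# With count=1, only the first occurrence is removed.
removed_first = s.replace('xyz', '', 1)
print(removed_first)
# abc--123-xyz
```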

Remove substrings using regex: re.sub()

To remove substrings using regular expressions, use the sub() function from the re module.

The following example uses the regular expression pattern \d+, which matches a sequence of one or more digits. 123 and 789 are each replaced by the empty string ('') and thus removed. The pattern is written as a raw string (r'...') so that the backslash is passed to the re module unchanged.

import re

s = 'abc-xyz-123-789-ABC-XYZ'

print(re.sub(r'\d+', '', s))
# abc-xyz---ABC-XYZ
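
Because the pattern can include surrounding characters, a regular expression can also remove the delimiter along with the match, or ignore case. The following is a small sketch of both ideas. The patterns are illustrative choices, not the only way to write them.

```python
import re

s = 'abc-xyz-123-789-ABC-XYZ'

# Remove each run of digits together with the hyphen in front of it.
no_digits = re.sub(r'-\d+', '', s)
print(no_digits)
# abc-xyz-ABC-XYZ

# Remove 'xyz' regardless of case using the re.IGNORECASE flag.
no_xyz = re.sub(r'xyz', '', s, flags=re.IGNORECASE)
print(no_xyz)
# abc--123-789-ABC-
```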

Remove leading and/or trailing characters

To remove leading and/or trailing characters from a string, you can use the strip(), lstrip(), rstrip(), removeprefix(), and removesuffix() methods.

Remove leading and trailing characters: strip()

Use strip() to remove specified leading and trailing characters from a string.

By default, consecutive whitespace characters at both ends are removed. Newlines \n, full-width spaces \u3000, tabs \t, etc., are considered whitespace characters.

s = ' \n a b c\u3000\t'

print(s)
#  
#  a b c    

print(repr(s))
# ' \n a b c\u3000\t'

print(s.strip())
# a b c

print(repr(s.strip()))
# 'a b c'

Here, the built-in repr() function is used so that whitespace characters appear as visible escape sequences in the output.

strip() returns a new string, leaving the original string unchanged. You can assign the result back to the original variable if desired. This behavior also applies to other string methods such as replace(), lstrip(), and rstrip().

s_strip = s.strip()
print(repr(s_strip))
# 'a b c'

print(repr(s))
# ' \n a b c\u3000\t'

s = s.strip()
print(repr(s))
# 'a b c'

When a string is passed to strip(), the characters it contains are removed from both ends.

Each character in the specified string is removed individually, not as a single unit. For example, the result would be the same for either 'abc' or 'cba'. If you want to remove the matched strings at both ends, use removeprefix() and removesuffix() as described below.

s = 'aabbcc-abc-aabbcc'

print(s.strip('abc'))
# -abc-

print(s.strip('cba'))
# -abc-

print(s.strip('ab'))
# cc-abc-aabbcc

When an argument is specified, whitespace characters are not removed unless they are included in it.

s = ' \n aabbcc-abc-aabbcc\u3000\t'

print(repr(s))
# ' \n aabbcc-abc-aabbcc\u3000\t'

print(repr(s.strip('abc')))
# ' \n aabbcc-abc-aabbcc\u3000\t'

If you want to remove whitespace characters in addition to the specified characters, you need to either include the whitespace characters in the argument or apply strip() multiple times.

print(repr(s.strip('abc \n\u3000\t')))
# '-abc-'

print(repr(s.strip().strip('abc')))
# '-abc-'

Remove leading characters: lstrip()

lstrip() removes only the leading characters (left side) of a string. The l stands for left.

Its usage is the same as strip(), but it only affects the left side of the string.

s = ' \n a b c \u3000\t'

print(repr(s.lstrip()))
# 'a b c \u3000\t'

s = 'aabbcc-abc-aabbcc'

print(s.lstrip('abc'))
# -abc-aabbcc

Remove trailing characters: rstrip()

rstrip() removes only the trailing characters (right side) of a string. The r stands for right.

Its usage is the same as strip(), but it only affects the right side of the string.

s = ' \n a b c \u3000\t'

print(repr(s.rstrip()))
# ' \n a b c'

s = 'aabbcc-abc-aabbcc'

print(s.rstrip('abc'))
# aabbcc-abc-

Remove prefix: removeprefix() (Python 3.9 or later)

removeprefix() removes the specified prefix from a string. This method was added in Python 3.9.

If the string starts with the specified prefix, the method returns a new string without it. If there is no match, the original string is returned unchanged.

s = 'abc-abcxyz'

print(s.removeprefix('abc-'))
# abcxyz

print(s.removeprefix('aabc-'))
# abc-abcxyz

Note that lstrip(), by contrast, treats its argument as a set of characters and keeps removing any of them from the left side until it reaches a character that is not in the set.

print(s.lstrip('abc-'))
# xyz

Remove suffix: removesuffix() (Python 3.9 or later)

removesuffix() removes the specified suffix from a string. This method was added in Python 3.9.

It works the same way as removeprefix(), but at the end of the string.

s = 'abcxyz-xyz'

print(s.removesuffix('-xyz'))
# abcxyz

print(s.removesuffix('-xyzz'))
# abcxyz-xyz

To remove both the prefix and suffix, simply chain the removeprefix() and removesuffix() methods.

s = 'abc-abcxyz-xyz'

print(s.removeprefix('abc-').removesuffix('-xyz'))
# abcxyz
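
On Python versions earlier than 3.9, similar behavior can be approximated with startswith(), endswith(), and slicing. The helper names remove_prefix() and remove_suffix() below are hypothetical and written only for this sketch. They are not part of the standard library.

```python
def remove_prefix(s, prefix):
    # Hypothetical stand-in for str.removeprefix() on Python < 3.9.
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


def remove_suffix(s, suffix):
    # Hypothetical stand-in for str.removesuffix() on Python < 3.9.
    # The `suffix and` guard matters: s[:-0] would return an empty string.
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


s = 'abc-abcxyz-xyz'

print(remove_suffix(remove_prefix(s, 'abc-'), '-xyz'))
# abcxyz

print(remove_suffix(s, ''))
# abc-abcxyz-xyz
```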

Remove a substring by position and length: slicing

You can use slicing to extract a portion of a string at a specific position.

s = '0123456789'

print(s[3:7])
# 3456

print(s[3:-3])
# 3456

print(s[:5])
# 01234

print(s[5:])
# 56789

If you want to remove characters from either end of a string, you can use slicing to specify the part to be kept. For example, to delete the 6th character and everything after it, slice the string so that it ends at the 5th character (s[:5]).

To remove a substring from the middle of a string, slice and concatenate the parts you want to keep.

print(s[:3] + s[6:])
# 0126789

For example, you may define the following functions.

A function that removes a substring from start to end (inclusive):

def remove_str_start_end(s, start, end):
    return s[:start] + s[end + 1:]

print(remove_str_start_end(s, 3, 5))
# 0126789

A function that removes a substring of length characters from start:

def remove_str_start_length(s, start, length):
    return s[:start] + s[start + length:]

print(remove_str_start_length(s, 3, 5))
# 01289
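
When the position is not known in advance, one option is to look it up first and then slice. The sketch below uses rfind() to remove only the last occurrence of a substring. The function name remove_last() and the sample text are assumptions made for illustration.

```python
def remove_last(text, sub):
    # Hypothetical helper: remove only the last occurrence of sub.
    i = text.rfind(sub)
    if i == -1:
        # rfind() returns -1 when sub is not found; return text unchanged.
        return text
    return text[:i] + text[i + len(sub):]


text = 'abc-xyz-123-xyz'

print(remove_last(text, 'xyz'))
# abc-xyz-123-

print(remove_last(text, '000'))
# abc-xyz-123-xyz
```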

Remove substrings from a list of strings

To remove substrings from a list of strings, you can use a list comprehension to apply string methods such as strip(), or slicing, to each element.

l = ['Alice', 'Bob', 'Charlie']

print([s.strip('bce') for s in l])
# ['Ali', 'Bo', 'Charli']

print([s[:2] for s in l])
# ['Al', 'Bo', 'Ch']
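
The same approach should work with replace() and re.sub(). The list names below is a made-up example used only for this sketch.

```python
import re

names = ['Alice-01', 'Bob-02', 'Charlie-03']

# Remove the hyphen from each element with replace().
no_hyphen = [name.replace('-', '') for name in names]
print(no_hyphen)
# ['Alice01', 'Bob02', 'Charlie03']

# Remove the hyphen and the digits after it with re.sub().
no_number = [re.sub(r'-\d+', '', name) for name in names]
print(no_number)
# ['Alice', 'Bob', 'Charlie']
```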

Remove substrings from multiline strings

Use the following multiline string as an example.

s = 'Alice\nBob\nCharlie'
print(s)
# Alice
# Bob
# Charlie

For more information on line breaks in Python, see the following article.

Remove a part of each line

When removing a part of each line in a multiline string, you can use methods that operate on the entire string, such as replace(), without any special considerations.

print(s.replace('li', ''))
# Ace
# Bob
# Chare

On the other hand, methods like strip() act on the leading and trailing characters of the entire string, as demonstrated below.

print(s.strip('bce'))
# Alice
# Bob
# Charli

Slicing also operates on the entire string.

print(s[2:-2])
# ice
# Bob
# Charl

To process each line individually, you should first split the lines using the splitlines() method.

l_s = s.splitlines()
print(l_s)
# ['Alice', 'Bob', 'Charlie']

Apply a list comprehension to this list.

l_s_strip = [line.strip('bce') for line in l_s]
print(l_s_strip)
# ['Ali', 'Bo', 'Charli']

Then, join the modified lines back into a single string.

s_line_strip = '\n'.join(l_s_strip)
print(s_line_strip)
# Ali
# Bo
# Charli

You can also combine these steps into a single operation. In the following example, a slice is applied to each line.

print('\n'.join([line[:2] for line in s.splitlines()]))
# Al
# Bo
# Ch
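
As an alternative sketch, re.sub() with the re.MULTILINE flag can process each line without splitting the string, because ^ and $ then match at the start and end of every line. The patterns below are illustrative.

```python
import re

s = 'Alice\nBob\nCharlie'

# Remove the first two characters of each line.
head_removed = re.sub(r'^.{2}', '', s, flags=re.MULTILINE)
print(head_removed)
# ice
# b
# arlie

# Remove trailing 'b', 'c', or 'e' characters from each line,
# similar to calling rstrip('bce') on every line.
tail_removed = re.sub(r'[bce]+$', '', s, flags=re.MULTILINE)
print(tail_removed)
# Ali
# Bo
# Charli
```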

Remove lines based on a condition

To remove lines that either meet or don't meet a specific condition, you can add a condition to the list comprehension.

l_remove = [line for line in s.splitlines() if not line.startswith('B')]
print(l_remove)
# ['Alice', 'Charlie']

Once you have processed the lines, you can concatenate them back into a single string using the join() method.

s_line_remove = '\n'.join(l_remove)
print(s_line_remove)
# Alice
# Charlie

You can also combine these steps into a single expression.

print('\n'.join([line for line in s.splitlines() if 'li' in line]))
# Alice
# Charlie
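
A common variation is removing blank lines. In this sketch, which uses a made-up string text, a line is kept only if something remains after strip(), so lines containing only whitespace are dropped as well.

```python
text = 'Alice\n\nBob\n   \nCharlie'

# line.strip() is an empty string (falsy) for empty or whitespace-only lines.
cleaned = '\n'.join([line for line in text.splitlines() if line.strip()])
print(cleaned)
# Alice
# Bob
# Charlie
```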

See the following article for more information on string conditions.
