Split a String in Python (Delimiter, Line Breaks, Regex)
This article explains how to split strings in Python using delimiters, line breaks, regular expressions, or a number of characters.
For related topics such as concatenating and extracting strings, see the following articles:
- Concatenate strings in Python (+ operator, join, etc.)
- Extract a substring from a string in Python (position, regex)
Split a string by delimiter: split()
Use the split() method to split a string using a specified delimiter.
If no argument is provided, the string is split using whitespace (spaces, newlines \n, tabs \t, etc.), treating consecutive whitespace characters as a single delimiter.
The method returns a list of substrings.
s_blank = 'one two three\nfour\tfive'
print(s_blank)
# one two three
# four five
print(s_blank.split())
# ['one', 'two', 'three', 'four', 'five']
print(type(s_blank.split()))
# <class 'list'>
To join the resulting list back into a string, use the join() method, called on the separator string.
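As a minimal sketch of that round trip (the variable names words and joined are chosen here for illustration), the list from split() can be passed to join(), with the separator string on the left:

```python
# Split on any whitespace, then rebuild the string with a comma separator.
words = 'one two three\nfour\tfive'.split()
joined = ','.join(words)
print(joined)
# one,two,three,four,five
```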
Specify the delimiter: sep
You can specify a custom delimiter using the first argument, sep.
s_comma = 'one,two,three,four,five'
print(s_comma.split(','))
# ['one', 'two', 'three', 'four', 'five']
print(s_comma.split('three'))
# ['one,two,', ',four,five']
To use multiple delimiters, refer to the section on regular expressions below.
When a delimiter appears consecutively, the resulting list will include empty strings (''). Additionally, if the delimiter is at the beginning or end of the string, empty strings will also appear at those positions.
s_hyphen = '-one--two-'
print(s_hyphen.split('-'))
# ['', 'one', '', 'two', '']
Since empty strings are considered falsy in Python, you can use a list comprehension to filter them out.
print([s for s in s_hyphen.split('-') if s])
# ['one', 'two']
When sep is omitted, the string is split by any whitespace and consecutive whitespace is treated as one. In this case, the resulting list will not include empty strings, even if the original string starts or ends with spaces.
Note that this behavior differs from explicitly specifying a whitespace character as the delimiter.
s_blank = ' one two three '
print(s_blank.split())
# ['one', 'two', 'three']
print(s_blank.split(' '))
# ['', 'one', 'two', '', 'three', '']
Limit the number of splits: maxsplit
You can limit the number of splits by specifying the second argument, maxsplit.
If provided, at most maxsplit splits are performed. The result will be a list containing at most maxsplit + 1 elements.
s_comma = 'one,two,three,four,five'
print(s_comma.split(',', 2))
# ['one', 'two', 'three,four,five']
print(s_comma.split(',', 10))
# ['one', 'two', 'three', 'four', 'five']
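A common use of maxsplit is splitting only at the first delimiter, so that later occurrences stay inside the last element. The sketch below assumes a hypothetical key=value line whose value itself contains the delimiter:

```python
# Hypothetical "key=value" line; the value contains further '=' characters.
line = 'query=a=1&b=2'

# maxsplit=1 splits at the first '=' only, so exactly two parts come back.
key, value = line.split('=', 1)
print(key)
# query
print(value)
# a=1&b=2
```

Note that the two-name unpacking raises ValueError if the line contains no delimiter at all, because split() then returns a single-element list.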
Split a string from the right by delimiter: rsplit()
The rsplit() method splits a string from the right.
This behaves like split(), but when maxsplit is specified, the splitting begins from the right.
s_comma = 'one,two,three,four,five'
print(s_comma.rsplit(','))
# ['one', 'two', 'three', 'four', 'five']
print(s_comma.rsplit(',', 2))
# ['one,two,three', 'four', 'five']
print(s_comma.rsplit(',', 10))
# ['one', 'two', 'three', 'four', 'five']
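Splitting from the right is handy when only the last delimiter matters. As an illustrative sketch (the file name is a made-up example), rsplit() with maxsplit=1 separates the final extension from the rest of a name:

```python
# Hypothetical file name with more than one dot.
filename = 'archive.tar.gz'

# Split once, starting from the right: only the last '.' is used.
stem, ext = filename.rsplit('.', 1)
print(stem)
# archive.tar
print(ext)
# gz
```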
Split a string by line breaks: splitlines()
The splitlines() method splits a string at line boundaries.
As shown in the previous examples, split() and rsplit() split the string by whitespace, including line breaks, by default. You can also specify line breaks explicitly using the sep argument. However, splitlines() is more appropriate for line-based processing.
For example, consider a string with both \n (LF, used on Unix-like systems including macOS) and \r\n (CR+LF, used on Windows):
s_lines_multi = '1 one\n2 two\r\n3 three\n'
print(s_lines_multi)
# 1 one
# 2 two
# 3 three
#
By default, split() splits on all types of whitespace, including line breaks:
print(s_lines_multi.split())
# ['1', 'one', '2', 'two', '3', 'three']
Because sep is matched as a single literal string, split('\n') does not handle mixed newline formats as expected: the \r of a \r\n pair is left in the result, and a trailing newline produces an empty string at the end:
print(s_lines_multi.split('\n'))
# ['1 one', '2 two\r', '3 three', '']
In contrast, splitlines() recognizes all common line boundaries but does not split on other whitespace characters:
print(s_lines_multi.splitlines())
# ['1 one', '2 two', '3 three']
If you set the keepends argument to True, newline characters are preserved in the output:
print(s_lines_multi.splitlines(True))
# ['1 one\n', '2 two\r\n', '3 three\n']
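The two approaches also differ on edge cases such as an empty string or a trailing line break. The short comparison below is a sketch using made-up inputs; the result names are chosen here for illustration:

```python
# An empty string: split('\n') yields one empty element, splitlines() yields none.
empty_split = ''.split('\n')
empty_lines = ''.splitlines()
print(empty_split)
# ['']
print(empty_lines)
# []

# A trailing line break: split('\n') adds a final empty string, splitlines() does not.
trailing_split = 'a\n'.split('\n')
trailing_lines = 'a\n'.splitlines()
print(trailing_split)
# ['a', '']
print(trailing_lines)
# ['a']
```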
Split a string into three parts: partition(), rpartition()
The partition() method splits a string into three parts: the part before the delimiter, the delimiter itself, and the part after the delimiter.
Unlike split(), it returns a tuple rather than a list. The delimiter is retained as the middle element.
s = 'abc@xyz'
print(s.partition('@'))
# ('abc', '@', 'xyz')
print(type(s.partition('@')))
# <class 'tuple'>
If the delimiter is not found, or if it appears at the start or end of the string, the corresponding elements of the tuple will be empty strings:
print(s.partition('123'))
# ('abc@xyz', '', '')
print(s.partition('abc'))
# ('', 'abc', '@xyz')
print(s.partition('xyz'))
# ('abc@', 'xyz', '')
If the delimiter appears more than once, partition() splits at the first (left) occurrence. Use rpartition() to split at the last (right) occurrence:
s = 'abc@xyz@123'
print(s.partition('@'))
# ('abc', '@', 'xyz@123')
print(s.rpartition('@'))
# ('abc@xyz', '@', '123')
If the string contains only one occurrence of the delimiter, both methods return the same result.
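Because partition() always returns exactly three elements, the result can be unpacked without checking its length first. The helper below is a hypothetical sketch (split_address is not a standard function) that uses the middle element to detect a missing delimiter:

```python
def split_address(address):
    """Return (local, domain) for an 'x@y' string, or None if '@' is absent."""
    local, sep, domain = address.partition('@')
    # sep is an empty string when the delimiter was not found.
    if not sep:
        return None
    return local, domain

print(split_address('user@example.com'))
# ('user', 'example.com')
print(split_address('no-at-sign'))
# None
```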
Split a string by regex: re.split()
The split() and rsplit() methods match the delimiter exactly.
To split a string using a regular expression pattern, use the split() function from the re module.
Pass the regex pattern as the first argument and the target string as the second. You can optionally provide the third argument, maxsplit.
For example, to split a string on consecutive digits:
import re
s_nums = 'one1two22three333four'
print(re.split(r'\d+', s_nums))
# ['one', 'two', 'three', 'four']
print(re.split(r'\d+', s_nums, 2))
# ['one', 'two', 'three333four']
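One further re.split() behavior worth knowing: if the pattern contains a capturing group, the matched delimiters are included in the result. A short sketch reusing the digit pattern from above (the name parts is chosen here for illustration):

```python
import re

s_nums = 'one1two22three333four'

# Parentheses around the pattern keep each matched delimiter in the list.
parts = re.split(r'(\d+)', s_nums)
print(parts)
# ['one', '1', 'two', '22', 'three', '333', 'four']
```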
Split by multiple delimiters
These two examples are good to know, even if you are not familiar with regex:
Use square brackets ([]) to match any single one of the enclosed characters. This is useful for splitting on multiple individual characters:
s_marks = 'one-two+three#four'
print(re.split('[-+#]', s_marks))
# ['one', 'two', 'three', 'four']
Use the pipe symbol (|) to match one of several string patterns. Each pattern may include special regex characters or simple strings. This is useful for splitting on multiple different strings:
s_strs = 'oneXXXtwoYYYthreeZZZfour'
print(re.split('XXX|YYY|ZZZ', s_strs))
# ['one', 'two', 'three', 'four']
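When the delimiters are literal strings that happen to contain regex metacharacters (such as + or |), they need escaping before being joined with the pipe symbol. The helper below is a hypothetical sketch (split_any is not part of the re module) that uses re.escape() and tries longer delimiters first, so that '||' is not consumed as two separate '|' matches:

```python
import re

def split_any(text, delimiters):
    """Split text on any of the given literal delimiter strings."""
    # Longer delimiters go first so they win over their own prefixes.
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes each delimiter match literally.
    pattern = '|'.join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

print(split_any('a+b||c.d', ['+', '|', '||', '.']))
# ['a', 'b', 'c', 'd']
```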
Split a string by character count: slicing
To split a string based on character count, use slicing.
s = 'abcdefghij'
print(s[:5])
# abcde
print(s[5:])
# fghij
You can store the results in a tuple or assign them to individual variables:
s_tuple = s[:5], s[5:]
print(s_tuple)
# ('abcde', 'fghij')
print(type(s_tuple))
# <class 'tuple'>
s_first, s_last = s[:5], s[5:]
print(s_first)
# abcde
print(s_last)
# fghij
For example, to split a string into three parts:
s_first, s_second, s_last = s[:3], s[3:6], s[6:]
print(s_first)
# abc
print(s_second)
# def
print(s_last)
# ghij
Use the built-in len() function to get the number of characters in a string. This allows you to split the string into two halves:
half = len(s) // 2
print(half)
# 5
s_first, s_last = s[:half], s[half:]
print(s_first)
# abcde
print(s_last)
# fghij
To join segments back together, use the + operator:
print(s_first + s_last)
# abcdefghij
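The same slicing idea extends to fixed-size chunks. This is a minimal sketch, assuming a chunk size n of 3 (an arbitrary value chosen for illustration); the last chunk is shorter when the length is not a multiple of n:

```python
s = 'abcdefghij'
n = 3  # assumed chunk size, for illustration only

# Step through the string n characters at a time and slice out each chunk.
chunks = [s[i:i + n] for i in range(0, len(s), n)]
print(chunks)
# ['abc', 'def', 'ghi', 'j']

# Joining the chunks restores the original string.
print(''.join(chunks))
# abcdefghij
```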