Split a String in Python (Delimiter, Line Breaks, Regex)

Modified: | Tags: Python, String, Regex

This article explains how to split strings in Python by delimiter, line break, regular expression, or character count.

Split a string by delimiter: split()

Use the split() method to split a string using a specified delimiter.

If no argument is provided, the string is split using whitespace (spaces, newlines \n, tabs \t, etc.), treating consecutive whitespace characters as a single delimiter.

The method returns a list of substrings.

s_blank = 'one two   three\nfour\tfive'
print(s_blank)
# one two   three
# four  five

print(s_blank.split())
# ['one', 'two', 'three', 'four', 'five']

print(type(s_blank.split()))
# <class 'list'>

To join the resulting list back into a string, use the join() method.
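As a minimal sketch of that round trip, the example below splits on whitespace and rejoins the pieces with a chosen separator. The variable names are illustrative only.

```python
s_blank = 'one two   three\nfour\tfive'

# Split on any run of whitespace, then rejoin with a single space.
words = s_blank.split()
normalized = ' '.join(words)
print(normalized)
# one two three four five

# Any string can serve as the separator for join().
csv_line = ','.join(words)
print(csv_line)
# one,two,three,four,five
```

Note that join() is called on the separator string and takes the list as its argument.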

Specify the delimiter: sep

You can specify a custom delimiter using the first argument, sep.

s_comma = 'one,two,three,four,five'
print(s_comma.split(','))
# ['one', 'two', 'three', 'four', 'five']

print(s_comma.split('three'))
# ['one,two,', ',four,five']

To use multiple delimiters, refer to the section on regular expressions below.

When a delimiter appears consecutively, the resulting list will include empty strings (''). Additionally, if the delimiter is at the beginning or end of the string, empty strings will also appear at those positions.

s_hyphen = '-one--two-'
print(s_hyphen.split('-'))
# ['', 'one', '', 'two', '']

Since empty strings are considered falsy in Python, you can use a list comprehension to filter them out.

print([s for s in s_hyphen.split('-') if s])
# ['one', 'two']

When sep is omitted, the string is split by any whitespace and consecutive whitespace is treated as one. In this case, the resulting list will not include empty strings, even if the original string starts or ends with spaces.

Note that this behavior differs from explicitly specifying a whitespace character as the delimiter.

s_blank = ' one two  three '
print(s_blank.split())
# ['one', 'two', 'three']

print(s_blank.split(' '))
# ['', 'one', 'two', '', 'three', '']

Limit the number of splits: maxsplit

You can limit the number of splits by specifying the second argument, maxsplit.

If provided, at most maxsplit splits are performed. The result will be a list containing at most maxsplit + 1 elements.

s_comma = 'one,two,three,four,five'
print(s_comma.split(',', 2))
# ['one', 'two', 'three,four,five']

print(s_comma.split(',', 10))
# ['one', 'two', 'three', 'four', 'five']
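A common use of maxsplit is separating a key from a value that may itself contain the delimiter. The sketch below assumes a hypothetical key=value line. It also shows that maxsplit can be passed by keyword while sep is omitted, in which case the remainder keeps its original whitespace.

```python
# Hypothetical input: the value part contains further '=' characters.
line = 'url=https://example.com/?a=1&b=2'

# Split only at the first '=' so the rest of the value stays intact.
key, value = line.split('=', 1)
print(key)
# url

print(value)
# https://example.com/?a=1&b=2

# With sep omitted, pass maxsplit by keyword.
s_blank = ' one two  three '
print(s_blank.split(maxsplit=1))
# ['one', 'two  three ']
```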

Split a string from the right by delimiter: rsplit()

The rsplit() method splits a string from the right.

It behaves like split(), except that when maxsplit is specified, the splits are counted from the right end of the string.

s_comma = 'one,two,three,four,five'
print(s_comma.rsplit(','))
# ['one', 'two', 'three', 'four', 'five']

print(s_comma.rsplit(',', 2))
# ['one,two,three', 'four', 'five']

print(s_comma.rsplit(',', 10))
# ['one', 'two', 'three', 'four', 'five']
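One situation where splitting from the right helps is separating the last component of a dotted name. The following sketch uses a hypothetical file name and compares rsplit() with split() for the same maxsplit.

```python
# Hypothetical file name containing several dots.
filename = 'archive.2024.tar.gz'

# Split once from the right: everything before the last dot, and the last part.
stem, ext = filename.rsplit('.', 1)
print(stem)
# archive.2024.tar

print(ext)
# gz

# The same maxsplit with split() cuts at the first dot instead.
print(filename.split('.', 1))
# ['archive', '2024.tar.gz']
```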

Split a string by line breaks: splitlines()

The splitlines() method splits a string at line boundaries.

As shown in the previous examples, split() and rsplit() split the string by whitespace, including line breaks, by default. You can also specify line breaks explicitly using the sep argument. However, splitlines() is more appropriate for line-based processing.

For example, consider a string with both \n (LF, used on Unix-like systems including macOS) and \r\n (CR+LF, used on Windows):

s_lines_multi = '1 one\n2 two\r\n3 three\n'
print(s_lines_multi)
# 1 one
# 2 two
# 3 three
# 

By default, split() splits on all types of whitespace, including line breaks:

print(s_lines_multi.split())
# ['1', 'one', '2', 'two', '3', 'three']

Because sep is matched as one exact string, split('\n') does not handle mixed newline formats well: the \r of a \r\n pair is left in the result, and a trailing newline produces an empty string at the end:

print(s_lines_multi.split('\n'))
# ['1 one', '2 two\r', '3 three', '']

In contrast, splitlines() recognizes all common line boundaries but does not split on other whitespace characters:

print(s_lines_multi.splitlines())
# ['1 one', '2 two', '3 three']

If you set the keepends argument to True, newline characters are preserved in the output:

print(s_lines_multi.splitlines(True))
# ['1 one\n', '2 two\r\n', '3 three\n']
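A few edge cases illustrate how splitlines() differs from split('\n'). The strings below are made up for illustration: a lone \r also counts as a line boundary, an empty string yields an empty list, and blank lines in the middle are kept.

```python
# A lone '\r' is treated as a line boundary, as are '\n' and '\r\n'.
s_mixed = 'a\rb\nc\r\nd'
print(s_mixed.splitlines())
# ['a', 'b', 'c', 'd']

# An empty string gives an empty list, whereas split('\n') gives [''].
print(''.splitlines())
# []

print(''.split('\n'))
# ['']

# Blank lines inside the text are preserved; the trailing newline adds nothing.
s_gap = 'one\n\ntwo\n'
print(s_gap.splitlines())
# ['one', '', 'two']
```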

Split a string into three parts: partition(), rpartition()

The partition() method splits a string into three parts: the part before the delimiter, the delimiter itself, and the part after the delimiter.

Unlike split(), it returns a tuple rather than a list. The delimiter is retained as the middle element.

s = 'abc@xyz'
print(s.partition('@'))
# ('abc', '@', 'xyz')

print(type(s.partition('@')))
# <class 'tuple'>

If the delimiter is not found, or if it appears at the start or end of the string, the corresponding elements of the tuple will be empty strings:

print(s.partition('123'))
# ('abc@xyz', '', '')

print(s.partition('abc'))
# ('', 'abc', '@xyz')

print(s.partition('xyz'))
# ('abc@', 'xyz', '')

If the delimiter appears more than once, partition() splits at the first (left) occurrence. Use rpartition() to split at the last (right) occurrence:

s = 'abc@xyz@123'
print(s.partition('@'))
# ('abc', '@', 'xyz@123')

print(s.rpartition('@'))
# ('abc@xyz', '@', '123')

If the string contains only one occurrence of the delimiter, both methods return the same result.
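The two methods do differ when the delimiter is missing: partition() puts the whole string in the first element, while rpartition() puts it in the last. The sketch below shows this, followed by a hypothetical header line parsed with partition(), where the middle element doubles as a "found" flag.

```python
s = 'abc@xyz'

# Delimiter not found: the original string lands at opposite ends.
print(s.partition('#'))
# ('abc@xyz', '', '')

print(s.rpartition('#'))
# ('', '', 'abc@xyz')

# Hypothetical "name: value" line; sep is empty if ': ' is absent.
header = 'Content-Type: text/html; charset=utf-8'
name, sep, value = header.partition(': ')
if sep:
    print(name)
    # Content-Type
    print(value)
    # text/html; charset=utf-8
```

Because partition() always returns exactly three elements, the unpacking never raises an error, unlike unpacking the result of split().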

Split a string by regex: re.split()

The split() and rsplit() methods match the delimiter exactly.

To split a string using a regular expression pattern, use the split() function from the re module.

Pass the regex pattern as the first argument and the target string as the second. You can optionally limit the number of splits with maxsplit. Passing it as a keyword argument is preferable, since passing it positionally is deprecated as of Python 3.13.

For example, to split a string on consecutive digits:

import re

s_nums = 'one1two22three333four'
print(re.split(r'\d+', s_nums))
# ['one', 'two', 'three', 'four']

print(re.split(r'\d+', s_nums, maxsplit=2))
# ['one', 'two', 'three333four']
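If the delimiters themselves are needed, wrapping the pattern in a capturing group makes re.split() include the matched text in the result. A brief sketch using the same string:

```python
import re

s_nums = 'one1two22three333four'

# Parentheses form a capturing group, so each matched delimiter is kept.
parts = re.split(r'(\d+)', s_nums)
print(parts)
# ['one', '1', 'two', '22', 'three', '333', 'four']

# Joining the parts reproduces the original string.
print(''.join(parts) == s_nums)
# True
```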

Split by multiple delimiters

These two examples are good to know, even if you are not familiar with regex:

Use square brackets ([]) to match any single character listed inside them. This is useful for splitting on multiple individual characters:

s_marks = 'one-two+three#four'
print(re.split('[-+#]', s_marks))
# ['one', 'two', 'three', 'four']

Use the pipe symbol (|) to match one of several string patterns. Each pattern may include special regex characters or simple strings. This is useful for splitting on multiple different strings:

s_strs = 'oneXXXtwoYYYthreeZZZfour'
print(re.split('XXX|YYY|ZZZ', s_strs))
# ['one', 'two', 'three', 'four']
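When the delimiters contain characters that are special in regex (such as ., +, or |), they need to be escaped before being combined with the pipe symbol. The sketch below builds the pattern with re.escape(). The delimiter list is hypothetical, and sorting it longest-first is one way to make sure '||' is tried before '|'.

```python
import re

# Hypothetical delimiters, several of which are regex metacharacters.
delimiters = ['.', '||', '|', '+']

# Escape each delimiter and try longer ones first so '||' wins over '|'.
ordered = sorted(delimiters, key=len, reverse=True)
pattern = '|'.join(re.escape(d) for d in ordered)

s_mixed = 'one.two||three|four+five'
print(re.split(pattern, s_mixed))
# ['one', 'two', 'three', 'four', 'five']
```

Without the sort, '||' would be matched as two separate '|' delimiters, leaving an empty string in the result.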

Split a string by character count: slicing

To split a string based on character count, use slicing.

s = 'abcdefghij'
print(s[:5])
# abcde

print(s[5:])
# fghij

You can store the results in a tuple or assign them to individual variables:

s_tuple = s[:5], s[5:]
print(s_tuple)
# ('abcde', 'fghij')

print(type(s_tuple))
# <class 'tuple'>

s_first, s_last = s[:5], s[5:]
print(s_first)
# abcde

print(s_last)
# fghij

For example, to split a string into three parts:

s_first, s_second, s_last = s[:3], s[3:6], s[6:]
print(s_first)
# abc

print(s_second)
# def

print(s_last)
# ghij

Use the built-in len() function to get the number of characters in a string. This allows you to split the string into two halves. With integer division (//), an odd-length string puts the extra character in the second half:

half = len(s) // 2
print(half)
# 5

s_first, s_last = s[:half], s[half:]
print(s_first)
# abcde

print(s_last)
# fghij

To join segments back together, use the + operator:

print(s_first + s_last)
# abcdefghij
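The same slicing idea extends to splitting a string into chunks of a fixed size. The following is a minimal sketch using a list comprehension. The chunk size n is an arbitrary value chosen for illustration, and the last chunk may be shorter than the others.

```python
s = 'abcdefghij'
n = 3  # chunk size, chosen arbitrarily for this example

# Step through the string n characters at a time and slice each chunk.
chunks = [s[i:i + n] for i in range(0, len(s), n)]
print(chunks)
# ['abc', 'def', 'ghi', 'j']

# Joining the chunks restores the original string.
print(''.join(chunks))
# abcdefghij
```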
