Count Characters or Substrings in a String in Python
This article explains how to count the number of specific characters or substrings within a string (str) in Python.
To get the length of the entire string (the total number of characters), use the built-in len() function.
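As a quick illustration of len(), the following sketch measures two sample strings. The strings and variable names are chosen only for this example.

```python
s = 'abc_aabbcc_abc'
total = len(s)
print(total)
# 14

# len() counts characters (Unicode code points), not bytes,
# so each non-ASCII character below counts as one.
jp_total = len('あいう')
print(jp_total)
# 3
```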
For more information on reading a text file as a string or searching for a substring, see the following articles:
- Read, write, and create files in Python (with and open())
- Search for a string in Python (Check if a substring is included/Get a substring position)
Count characters and substrings in a string: count()
The count() method allows you to count the number of specific characters or substrings in a string.
s = 'abc_aabbcc_abc'
print(s.count('abc'))
# 2
print(s.count('a'))
# 4
print(s.count('xyz'))
# 0
If the second (start) and third (end) arguments are specified, count() only searches within the substring [start:end].
print(s.count('a', 4, 10))
# 2
print(s[4:10])
# aabbcc
print(s[4:10].count('a'))
# 2
As with slicing, negative values can be used to specify positions from the end of the string. If the end argument is omitted, the range extends to the end of the string.
print(s.count('a', -9))
# 2
print(s[-9:])
# abbcc_abc
print(s[-9:].count('a'))
# 2
count() only counts non-overlapping occurrences of the specified substring. Once a match is found, the search resumes after the end of that match, so a second occurrence that shares characters with the first is not counted.
s = 'abc_abc_abc'
print(s.count('abc_abc'))
# 1
To count overlapping substrings, you can use regular expressions, as described below.
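A plain loop around str.find() is another way to count overlapping matches. The following is a minimal sketch; count_overlapping() is a hypothetical helper name introduced here, not a built-in.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    if not sub:
        raise ValueError('sub must not be empty')
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character only, so overlapping matches are found.
        start = text.find(sub, start + 1)
    return count

print(count_overlapping('abc_abc_abc', 'abc_abc'))
# 2
print(count_overlapping('aaaa', 'aa'))
# 3
```

By contrast, 'aaaa'.count('aa') returns 2, because count() skips past each match before searching again.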
Count the number of specific words in a string
For example, if you count "am" using the count() method, "Sam" will also be counted.
s = 'I am Sam'
print(s.count('am'))
# 2
To count the exact number of specific words, you can first split the string into a list of words using the split() method, then use count() on the resulting list.
l = s.split()
print(l)
# ['I', 'am', 'Sam']
print(l.count('am'))
# 1
For longer texts, the Counter class from the collections module is helpful for counting the frequency of each word.
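As a short sketch of the Counter approach, the example below counts every word in a sample sentence at once. The sentence and variable names are made up for illustration.

```python
from collections import Counter

text = 'the cat and the dog and the bird'
word_counts = Counter(text.split())

print(word_counts['the'])
# 3

# A word that never appears returns 0 instead of raising KeyError.
print(word_counts['fish'])
# 0

# most_common(n) returns the n most frequent words with their counts.
top_two = word_counts.most_common(2)
print(top_two)
# [('the', 3), ('and', 2)]
```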
Keep in mind that using split() is a basic approach. Since real-world sentences often include punctuation and other symbols, it is safer to use a natural language processing library such as NLTK.
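For simple cases, a regex word boundary (\b) may be enough to handle punctuation without a full NLP library. The following is a rough sketch with a sample sentence made up for illustration. Note that \b treats letters, digits, and underscores as word characters, so it will not handle every case (contractions and hyphenated words, for example).

```python
import re

s = 'I am Sam. Am I? Yes, I am!'

# split() keeps punctuation attached, so 'am!' is not equal to 'am'.
split_count = s.split().count('am')
print(split_count)
# 1

# \b matches a word boundary, so 'Sam' is excluded and 'am!' is included.
boundary_count = len(re.findall(r'\bam\b', s))
print(boundary_count)
# 2

# Adding re.IGNORECASE also counts 'Am'.
boundary_count_ci = len(re.findall(r'\bam\b', s, flags=re.IGNORECASE))
print(boundary_count_ci)
# 3
```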
Count with regex: re.findall()
You can use re.findall() to count substrings that match a regex pattern.
re.findall() returns a list of all substrings that match the given pattern. Use the built-in len() function to get the total number of matches.
import re
s = '123-456-789'
print(re.findall('[0-9]{3}', s))
# ['123', '456', '789']
print(len(re.findall('[0-9]{3}', s)))
# 3
In the example above, [0-9]{3} is a regex pattern matching any three consecutive digits.
You can also count overlapping substrings using a lookahead assertion (?=...) and grouping ().
s = 'abc_abc_abc'
print(re.findall('(?=(abc_abc))', s))
# ['abc_abc', 'abc_abc']
print(len(re.findall('(?=(abc_abc))', s)))
# 2
s = '12345'
print(re.findall('(?=([0-9]{3}))', s))
# ['123', '234', '345']
print(len(re.findall('(?=([0-9]{3}))', s)))
# 3
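When the substring to count is a literal string rather than a pattern, characters such as . or + would be interpreted as regex metacharacters. A minimal sketch, assuming a hypothetical helper named count_overlapping_regex(), escapes the substring with re.escape() before wrapping it in a lookahead:

```python
import re

def count_overlapping_regex(text, sub):
    """Count overlapping literal occurrences of sub in text (hypothetical helper)."""
    # re.escape() makes metacharacters such as '.' and '+' match literally.
    pattern = '(?=' + re.escape(sub) + ')'
    # Each zero-width lookahead match yields one (empty) list item.
    return len(re.findall(pattern, text))

print(count_overlapping_regex('1+1+1', '1+1'))
# 2

# Without escaping, '1+1' means "one or more 1s followed by 1",
# which never occurs in '1+1+1'.
unescaped_count = len(re.findall('(?=(1+1))', '1+1+1'))
print(unescaped_count)
# 0
```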
Case-insensitive counting
count() is case-sensitive.
s = 'abc_ABC'
print(s.count('abc'))
# 1
For case-insensitive counting, you can first convert the string to either uppercase or lowercase. Use upper() to make a string all uppercase, or lower() to make it all lowercase.
print(s.lower())
# abc_abc
print(s.lower().count('abc'))
# 2
print(s.upper())
# ABC_ABC
print(s.upper().count('ABC'))
# 2
With regex, you can set re.IGNORECASE as the flags argument in functions like re.findall() for case-insensitive counting.
print(re.findall('abc', s, flags=re.IGNORECASE))
# ['abc', 'ABC']
print(re.findall('ABC', s, flags=re.IGNORECASE))
# ['abc', 'ABC']
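To turn these lists into a count, wrap the call in len() as before. The sketch below also shows str.casefold(), which is a more aggressive form of lower() intended for caseless comparison. The German sample string is an assumption chosen for illustration.

```python
import re

s = 'abc_ABC'

# len() of the match list gives the case-insensitive count.
regex_count = len(re.findall('abc', s, flags=re.IGNORECASE))
print(regex_count)
# 2

# casefold() maps 'ß' to 'ss', which lower() does not do.
text = 'Straße STRASSE'
lower_count = text.lower().count('strasse')
casefold_count = text.casefold().count('strasse')
print(lower_count)
# 1
print(casefold_count)
# 2
```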