Count Characters or Substrings in a String in Python

This article explains how to count the number of specific characters or substrings within a string (str) in Python.

To get the length of the entire string (the total number of characters), use the built-in len() function.
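As a quick illustration of len(), here is a minimal sketch using the same sample string that appears in the count() examples in this article:

```python
s = 'abc_aabbcc_abc'

# len() returns the total number of characters, underscores included.
print(len(s))
# 14
```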

Count characters and substrings in a string: count()

The count() method allows you to count the number of specific characters or substrings in a string.

s = 'abc_aabbcc_abc'
print(s.count('abc'))
# 2

print(s.count('a'))
# 4

print(s.count('xyz'))
# 0
source: str_count.py

If the second (start) and third (end) arguments are specified, count() searches only within the slice s[start:end].

print(s.count('a', 4, 10))
# 2

print(s[4:10])
# aabbcc

print(s[4:10].count('a'))
# 2
source: str_count.py

As with slicing, negative values can be used to specify positions from the end of the string. If the end argument is omitted, the range extends to the end of the string.

print(s.count('a', -9))
# 2

print(s[-9:])
# abbcc_abc

print(s[-9:].count('a'))
# 2
source: str_count.py

count() counts only non-overlapping occurrences of the specified substring. After a match is found, the search resumes from the end of that match, so characters that already belong to one match cannot be part of another.

s = 'abc_abc_abc'
print(s.count('abc_abc'))
# 1
source: str_count.py

To count overlapping substrings, you can use regular expressions, as described below.
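For comparison, overlapping occurrences can also be counted with a plain loop over the find() method. The following is a minimal sketch, not part of the original sample file; count_overlapping is a hypothetical helper name introduced here for illustration.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing overlaps (hypothetical helper)."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Resume one character after the START of the last match,
        # not after its end, so overlapping matches are also found.
        start = text.find(sub, start + 1)
    return count

s = 'abc_abc_abc'
print(count_overlapping(s, 'abc_abc'))
# 2

print(count_overlapping('aaaa', 'aa'))
# 3
```

The only difference from the behavior of count() is where the search resumes after each match.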

Count the number of specific words in a string

count() matches substrings, not whole words. For example, if you count "am" using the count() method, the "am" inside "Sam" is also counted.

s = 'I am Sam'
print(s.count('am'))
# 2
source: str_count.py

To count the exact number of specific words, you can first split the string into a list of words using the split() method, then use count() on the resulting list.

l = s.split()
print(l)
# ['I', 'am', 'Sam']

print(l.count('am'))
# 1
source: str_count.py

For longer texts, the Counter class from the collections module is helpful for counting the frequency of each word.
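A minimal sketch of word counting with Counter might look like the following. The sample sentence is made up for illustration, and the words are obtained with a simple split().

```python
from collections import Counter

s = 'the cat and the dog and the bird'

# Counter accepts any iterable; here, the list of words from split().
c = Counter(s.split())

print(c['the'])
# 3

# A missing key returns 0 instead of raising KeyError.
print(c['fish'])
# 0

# most_common(n) returns the n most frequent items as (item, count) tuples.
print(c.most_common(2))
# [('the', 3), ('and', 2)]
```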

Keep in mind that split() is a basic approach: it splits only on whitespace, so punctuation stays attached to the neighboring word (for example, 'Sam.' is not equal to 'Sam'). For real-world text, it is safer to use a natural language processing library such as NLTK.
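As a lightweight middle ground, a regex with word boundaries (\b) can count whole words even when punctuation is attached. This is only a sketch under a simplifying assumption: \b treats any run of letters, digits, and underscores as a word, which is far cruder than a real tokenizer. The sample sentence is made up for illustration.

```python
import re

s = 'I am Sam. Am I Sam? I am!'

# split() leaves punctuation attached, so 'am!' is not counted as 'am'.
words = s.split()
print(words.count('am'))
# 1

# \b matches only at a word boundary, so the 'am' inside 'Sam' is skipped
# while 'am!' still matches. re.escape() guards against special characters
# in the search word.
matches = re.findall(r'\b' + re.escape('am') + r'\b', s)
print(len(matches))
# 2
```

The capitalized 'Am' is not counted here because the pattern is case-sensitive.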

Count with regex: re.findall()

You can use re.findall() to count substrings that match a regex pattern.

re.findall() returns a list of all substrings that match the given pattern. Use the built-in len() function to get the total number of matches.

import re

s = '123-456-789'
print(re.findall('[0-9]{3}', s))
# ['123', '456', '789']

print(len(re.findall('[0-9]{3}', s)))
# 3
source: str_count.py

In the example above, [0-9]{3} is a regex pattern that matches any three consecutive digits.

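You can also count overlapping substrings by combining a lookahead assertion (?=...) with a capturing group (). A lookahead matches without consuming any characters, so the search can start again at the very next position, and the capturing group makes re.findall() return the captured text instead of the empty match.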

s = 'abc_abc_abc'
print(re.findall('(?=(abc_abc))', s))
# ['abc_abc', 'abc_abc']

print(len(re.findall('(?=(abc_abc))', s)))
# 2

s = '12345'
print(re.findall('(?=([0-9]{3}))', s))
# ['123', '234', '345']

print(len(re.findall('(?=([0-9]{3}))', s)))
# 3
source: str_count.py

Case-insensitive counting

count() is case-sensitive.

s = 'abc_ABC'
print(s.count('abc'))
# 1
source: str_count.py

For case-insensitive counting, you can first convert the string to either uppercase or lowercase. Use upper() to make a string all uppercase, or lower() to make it all lowercase.

print(s.lower())
# abc_abc

print(s.lower().count('abc'))
# 2

print(s.upper())
# ABC_ABC

print(s.upper().count('ABC'))
# 2
source: str_count.py

With regex, you can set re.IGNORECASE as the flags argument in functions like re.findall() for case-insensitive counting.

print(re.findall('abc', s, flags=re.IGNORECASE))
# ['abc', 'ABC']

print(re.findall('ABC', s, flags=re.IGNORECASE))
# ['abc', 'ABC']
source: str_count.py
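To turn the matches into a count, wrap the call in len(), as in the earlier regex examples. The sketch below also shows casefold(), a string method intended for caseless comparison, as an alternative to lower(); for ASCII text like this sample, the two behave the same.

```python
import re

s = 'abc_ABC'

# len() of the list returned by re.findall() gives the number of matches.
n = len(re.findall('abc', s, flags=re.IGNORECASE))
print(n)
# 2

# casefold() normalizes case more aggressively than lower().
print(s.casefold().count('abc'))
# 2
```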
