String manipulations in Pandas DataFrame
Last Updated: 18 Mar, 2025
String manipulation is the process of changing, parsing, splicing, pasting or analyzing strings. Raw string data is often not in a form that is suitable for analysis or for describing the data, and Python is well known for its ability to manipulate strings. In this article we will see how Pandas lets us modify and process string data in a DataFrame using its built-in functions.
Create a String Dataframe using Pandas
First, let's create a DataFrame that contains string data using Pandas.
Python
import pandas as pd
import numpy as np
data = {'Names': ['Gulshan', 'Shashank', 'Bablu', 'Abhishek', 'Anand', np.nan, 'Pratap'],
'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Bangalore', 'Hyderabad']}
df = pd.DataFrame(data)
print(df)

Change Column Datatype in Pandas
To convert the columns of the DataFrame to Pandas' dedicated string dtype, we can use .astype('string'). Note that astype() returns a new DataFrame and leaves df itself unchanged. Let's have a look at the example below.
Python
print(df.astype('string'))

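A minimal sketch of what the conversion actually changes, using a trimmed copy of the sample data (an assumption made for brevity): astype() hands back a new DataFrame, and in the 'string' dtype the missing name is stored as pd.NA rather than NaN.

```python
import numpy as np
import pandas as pd

# A trimmed copy of the sample data, keeping the missing name
df = pd.DataFrame({'Names': ['Gulshan', np.nan, 'Pratap']})

# astype() returns a new DataFrame; df keeps its original dtype
converted = df.astype('string')

print(df.dtypes['Names'])         # dtype before conversion
print(converted.dtypes['Names'])  # string

# In the 'string' dtype the missing value is represented by pd.NA
print(converted['Names'].iloc[1] is pd.NA)  # True
```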
String Manipulations in Pandas
Now let's look at string manipulations inside a Pandas DataFrame. We first create a DataFrame and then apply every string operation below to this single DataFrame, so that the effect of each method is easy to follow.
Example:
Python
import pandas as pd
import numpy as np
data = {'Names': ['Gulshan', 'Shashank', 'Bablu', 'Abhishek', 'Anand', np.nan, 'Pratap'],
'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Bangalore', 'Hyderabad']}
df = pd.DataFrame(data)
print(df)

Let's have a look at the various methods Pandas provides for string manipulation. They are accessed through the .str accessor of a column (a Series).
- lower(): Converts all uppercase characters in each string of the column to lowercase and returns the resulting strings.
Python
print(df['Names'].str.lower())

- upper(): Converts all lowercase characters in each string of the column to uppercase and returns the resulting strings.
Python
print(df['Names'].str.upper())

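A common use of lower() and upper() is normalizing text before comparing it. The sketch below assumes hypothetical city values with inconsistent case and stray spaces (they are not part of the article's data) and chains .str methods to match them.

```python
import pandas as pd

# Hypothetical city values with inconsistent case and surrounding spaces
cities = pd.Series(['Delhi', 'DELHI', ' delhi ', 'Mumbai'])

# Normalize before comparing: strip surrounding spaces, then lowercase
is_delhi = cities.str.strip().str.lower() == 'delhi'
print(is_delhi.tolist())  # [True, True, True, False]

# upper() gives a consistent form for display or for use as keys
normalized = cities.str.strip().str.upper()
print(normalized.tolist())  # ['DELHI', 'DELHI', 'DELHI', 'MUMBAI']
```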
- strip(): Removes leading and trailing whitespace from each string. It does not remove spaces inside a string. The sample names contain no surrounding spaces, so the output here matches the original column.
Python
print(df['Names'].str.strip())

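Because the sample names have no padding, here is a sketch with hypothetical padded names (not from the article's data). It contrasts strip() with lstrip() and rstrip(), and shows one way to collapse repeated inner spaces with a regular-expression replace, which strip() alone does not do.

```python
import pandas as pd

# Hypothetical names padded with extra spaces
names = pd.Series(['  Gulshan', 'Shashank  ', '  Anand  Kumar  '])

stripped = names.str.strip()  # leading and trailing whitespace removed
left = names.str.lstrip()     # leading whitespace removed only
right = names.str.rstrip()    # trailing whitespace removed only

# strip() keeps the double space inside 'Anand  Kumar'; a regex replace collapses it
collapsed = stripped.str.replace(r'\s+', ' ', regex=True)

print(stripped.tolist())   # ['Gulshan', 'Shashank', 'Anand  Kumar']
print(collapsed.tolist())  # ['Gulshan', 'Shashank', 'Anand Kumar']
```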
- split(pat): Splits each string at every occurrence of the given pattern and stores the resulting pieces in a list. In the example below, each name is split on the character 'a' and the lists are stored in a new column, Split_Names.
Python
df['Split_Names'] = df['Names'].str.split('a')
print(df[['Names', 'Split_Names']])

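When the pieces should land in separate columns instead of lists, split() accepts expand=True, and n limits the number of splits. A minimal sketch, assuming hypothetical full names that are not in the article's data:

```python
import pandas as pd

# Hypothetical full names
full = pd.Series(['Gulshan Kumar', 'Anand Pratap Singh', 'Bablu'])

# expand=True returns a DataFrame with one column per piece;
# n=1 stops after the first split, so at most two columns are produced
parts = full.str.split(' ', n=1, expand=True)
parts.columns = ['First', 'Rest']
print(parts)

# Without expand, .str[0] picks the first element of each list
first = full.str.split(' ').str[0]
print(first.tolist())  # ['Gulshan', 'Anand', 'Bablu']
```

For 'Bablu' there is nothing to split, so its Rest cell is filled with a missing value.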
- len(): Computes the length of each string in the column. For a missing value (NaN), it returns NaN.
Python
print(df['Names'].str.len())

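The lengths returned by str.len() can be used directly as a filter. This sketch rebuilds the sample Names column and keeps names longer than six characters; the threshold is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Names': ['Gulshan', 'Shashank', 'Bablu', 'Abhishek', 'Anand', np.nan, 'Pratap']})

lengths = df['Names'].str.len()

# Comparing NaN with a number gives False, so the missing name is excluded
long_names = df[lengths > 6]
print(long_names['Names'].tolist())  # ['Gulshan', 'Shashank', 'Abhishek']
```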
- cat(sep=...): Concatenates all the strings in the column into a single string, joined by the given separator. By default, missing values are left out of the result.
Python
print(df)
print("\nafter using cat:")
print(df['Names'].str.cat(sep=', '))

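str.cat() has two further parameters worth knowing: na_rep substitutes a placeholder for missing values, and passing another Series as others concatenates element-wise instead of collapsing everything into one string. A short sketch on a subset of the sample data ('Unknown' is an arbitrary placeholder):

```python
import numpy as np
import pandas as pd

names = pd.Series(['Gulshan', np.nan, 'Pratap'])
cities = pd.Series(['Delhi', 'Bangalore', 'Hyderabad'])

# Without na_rep, the missing name is skipped
joined = names.str.cat(sep=', ')
print(joined)  # Gulshan, Pratap

# With na_rep, the missing name is replaced by the placeholder
joined_na = names.str.cat(sep=', ', na_rep='Unknown')
print(joined_na)  # Gulshan, Unknown, Pratap

# Passing others concatenates row by row and returns a Series
pairs = names.str.cat(cities, sep=' - ', na_rep='Unknown')
print(pairs.tolist())  # ['Gulshan - Delhi', 'Unknown - Bangalore', 'Pratap - Hyderabad']
```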
- get_dummies(): Returns a DataFrame of one-hot encoded values: one column per distinct value, with 1 in the rows that contain that value and 0 elsewhere. Each string is first split on sep, which defaults to '|'.
Python
print(df['City'].str.get_dummies())

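The sep parameter matters when a single cell holds several values. The sketch below assumes a hypothetical skills column (not in the article's data) in which values are separated by '|'; each distinct token becomes its own column, in sorted order.

```python
import pandas as pd

# Hypothetical multi-valued column: skills separated by '|'
skills = pd.Series(['python|sql', 'sql', 'excel|python'])

# Split each string on sep and create one indicator column per distinct token
dummies = skills.str.get_dummies(sep='|')
print(dummies)
```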
- startswith(pat): Returns True for each string that starts with the given pattern and False otherwise.
Python
print(df['Names'].str.startswith('G'))

- endswith(pat): Returns True for each string that ends with the given pattern and False otherwise.
Python
print(df['Names'].str.endswith('h'))

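startswith() and endswith() are most often used as boolean masks for selecting rows. In this sketch on the sample data, na=False treats the missing name as "no match", so the mask contains only True and False and can be passed straight to .loc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Names': ['Gulshan', 'Shashank', 'Bablu', 'Abhishek', 'Anand', np.nan, 'Pratap'],
                   'City': ['Delhi', 'Mumbai', 'Kolkata', 'Delhi', 'Chennai', 'Bangalore', 'Hyderabad']})

# na=False turns the missing name into False instead of NaN
starts_with_a = df['Names'].str.startswith('A', na=False)
ends_with_k = df['Names'].str.endswith('k', na=False)

print(df.loc[starts_with_a, 'City'].tolist())  # ['Delhi', 'Chennai']
print(df.loc[ends_with_k, 'Names'].tolist())   # ['Shashank', 'Abhishek']
```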
- replace(pat, repl): Replaces occurrences of pat in each string with repl. In the example below, 'Gulshan' is replaced by 'Gaurav'.
Python
print(df['Names'].str.replace('Gulshan', 'Gaurav'))

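replace() can match a literal substring or a regular expression, controlled by the regex parameter (its default became False in pandas 2.0, so it is passed explicitly here). A sketch on three sample names; the replacement strings are arbitrary.

```python
import pandas as pd

names = pd.Series(['Gulshan', 'Shashank', 'Anand'])

# Literal, case-sensitive replacement: the leading capital 'A' in 'Anand' is not matched
literal = names.str.replace('an', 'AN', regex=False)
print(literal.tolist())  # ['GulshAN', 'ShashANk', 'AnANd']

# Regular expression: replace a leading 'A' or 'G' with '*'
pattern = names.str.replace(r'^[AG]', '*', regex=True)
print(pattern.tolist())  # ['*ulshan', 'Shashank', '*nand']

# case=False makes the match case-insensitive (used together with regex=True)
no_case = names.str.replace('a', '_', case=False, regex=True)
print(no_case.tolist())  # ['Gulsh_n', 'Sh_sh_nk', '_n_nd']
```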
- repeat(repeats): Repeats each string the given number of times. In the example below, each name appears twice in a row, for example 'GulshanGulshan'.
Python
print(df['Names'].str.repeat(2))

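Besides a single integer, repeat() also accepts a sequence with one repeat count per element. A brief sketch on three sample names; the counts [1, 2, 3] are arbitrary and must match the length of the Series.

```python
import pandas as pd

names = pd.Series(['Bablu', 'Anand', 'Pratap'])

# Same count for every element
twice = names.str.repeat(2)
print(twice.tolist())  # ['BabluBablu', 'AnandAnand', 'PratapPratap']

# One count per element
varied = names.str.repeat([1, 2, 3])
print(varied.tolist())  # ['Bablu', 'AnandAnand', 'PratapPratapPratap']
```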
- count(pat): Returns the number of occurrences of the pattern in each string. In the example below, it counts the lowercase 'a' characters in each name. The pattern is treated as a regular expression and the match is case-sensitive.
Python
print(df['Names'].str.count('a'))

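Because count() interprets its pattern as a regular expression, two details are easy to trip over: matching is case-sensitive unless a flag is passed, and characters such as '.' are special. A sketch on three sample names:

```python
import re
import pandas as pd

names = pd.Series(['Gulshan', 'Abhishek', 'Anand'])

# Case-sensitive by default: capital 'A' is not counted
lower_a = names.str.count('a')
print(lower_a.tolist())  # [1, 0, 1]

# flags=re.IGNORECASE counts both 'a' and 'A'
any_a = names.str.count('a', flags=re.IGNORECASE)
print(any_a.tolist())  # [1, 1, 2]

# '.' matches any character, so this counts every character;
# re.escape('.') counts literal dots instead
all_chars = names.str.count('.')
dots = names.str.count(re.escape('.'))
print(all_chars.tolist())  # [7, 8, 5]
print(dots.tolist())       # [0, 0, 0]
```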
- find(sub): Returns the lowest index at which the substring occurs in each string, or -1 if it is not found. In the example below, it returns the position of the first 'a' in each name.
Python
print(df['Names'].str.find('a'))

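A sketch of the related options: rfind() searches from the right, and the start argument of find() begins the search at a given position. Positions are zero-based, and -1 means "not found" (note that 'Abhishek' contains only a capital 'A').

```python
import pandas as pd

names = pd.Series(['Shashank', 'Abhishek', 'Anand'])

first_a = names.str.find('a')            # first lowercase 'a'
last_a = names.str.rfind('a')            # last lowercase 'a'
from_three = names.str.find('a', start=3)  # search begins at index 3

print(first_a.tolist())     # [2, -1, 2]
print(last_a.tolist())      # [5, -1, 2]
print(from_three.tolist())  # [5, -1, -1]
```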
- findall(pat): Returns a list of all occurrences of the pattern (a regular expression) in each string. In the example below, each list contains one 'a' for every occurrence of 'a' in the name, and the list is empty when the name contains no 'a'.
Python
print(df['Names'].str.findall('a'))

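Since findall() takes a regular expression, it can collect more than a single character. The sketch below finds all lowercase vowels with a character class, repeats the 'a' search case-insensitively, and chains .str.len() to count the matches per row.

```python
import re
import pandas as pd

names = pd.Series(['Shashank', 'Abhishek', 'Pratap'])

# A character class matches any lowercase vowel
vowels = names.str.findall('[aeiou]')
print(vowels.tolist())  # [['a', 'a'], ['i', 'e'], ['a', 'a']]

# flags=re.IGNORECASE also returns the capital 'A'
a_any_case = names.str.findall('a', flags=re.IGNORECASE)
print(a_any_case.tolist())  # [['a', 'a'], ['A'], ['a', 'a']]

# .str.len() on the lists gives the number of matches per row
vowel_counts = vowels.str.len()
print(vowel_counts.tolist())  # [2, 2, 2]
```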
- islower(): Checks whether all cased characters in each string are lowercase and returns a Boolean value.
Python
print(df['Names'].str.islower())

- isupper(): Checks whether all cased characters in each string are uppercase and returns a Boolean value.
Python
print(df['Names'].str.isupper())

- isnumeric(): Checks whether all characters in each string are numeric and returns a Boolean value.
Python
print(df['Names'].str.isnumeric())

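Every sample name is mixed case and non-numeric, so all three checks above return False for it. The sketch below uses hypothetical values (not from the article's data) chosen so each check has a True case. It also shows that isnumeric() rejects '12.5' because '.' is not a numeric character; pd.to_numeric with errors='coerce' is one alternative when decimal numbers should count as numbers.

```python
import pandas as pd

# Hypothetical values chosen to exercise each check
values = pd.Series(['delhi', 'MUMBAI', 'Kolkata', '2025', '12.5'])

lower_mask = values.str.islower()
upper_mask = values.str.isupper()
numeric_mask = values.str.isnumeric()

print(lower_mask.tolist())    # [True, False, False, False, False]
print(upper_mask.tolist())    # [False, True, False, False, False]
print(numeric_mask.tolist())  # [False, False, False, True, False]

# Values that cannot be parsed as numbers become NaN
parsed = pd.to_numeric(values, errors='coerce')
print(parsed.notna().tolist())  # [False, False, False, True, True]
```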
- swapcase(): Swaps the case of every character: uppercase characters become lowercase and lowercase characters become uppercase, as the example below shows.
Python
print(df['Names'].str.swapcase())
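swapcase() sits alongside other case-changing methods on the .str accessor. A sketch comparing it with title() and capitalize() on hypothetical, inconsistently cased names:

```python
import pandas as pd

# Hypothetical names with mixed case
names = pd.Series(['gulshan KUMAR', 'sHASHANK'])

swapped = names.str.swapcase()      # flip the case of every character
titled = names.str.title()          # first letter of each word uppercase
capitalized = names.str.capitalize()  # first character uppercase, rest lowercase

print(swapped.tolist())      # ['GULSHAN kumar', 'Shashank']
print(titled.tolist())       # ['Gulshan Kumar', 'Shashank']
print(capitalized.tolist())  # ['Gulshan kumar', 'Shashank']
```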