Split a column in Pandas dataframe and get part of it
Last Updated: 21 Jan, 2019
When only part of the values in a DataFrame column is needed, we can split the column on a delimiter and keep just the required piece.
Pandas provides the .str accessor, which performs vectorized string operations on a Series (or Index) and returns a new Series, Index, or DataFrame depending on the method called. One of its most useful methods is str.split, which splits each string on a delimiter and returns a Series of lists. To get the nth part of each string, first split the column by the delimiter and then index the result with .str[n-1], i.e. Dataframe.columnName.str.split(" ").str[n-1]. The index is n-1 because positions are counted from zero.
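The pattern described above can be wrapped in a small helper. This is only a minimal sketch: `nth_part` is a hypothetical name introduced for illustration (it is not part of pandas), and `n` is assumed to be 1-based to match the description.

```python
import pandas as pd


def nth_part(series: pd.Series, delimiter: str, n: int) -> pd.Series:
    """Return the n-th (1-based) piece of each string in the Series.

    Hypothetical helper for illustration; not a pandas function.
    """
    # str.split gives a Series of lists; .str[n - 1] picks one item per list.
    return series.str.split(delimiter).str[n - 1]


names = pd.Series(["alpha beta gamma", "one two three"])
second = nth_part(names, " ", 2)
print(second.tolist())  # ['beta', 'two']
```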
Let's make it clear by examples.
Code #1: Print the Series obtained by taking the first part of the split column.
import pandas as pd
import numpy as np

# Build a sample DataFrame; Geek_R holds random values
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Split each ID on '_' and keep the first part
print(df.Geek_ID.str.split('_').str[0])
Output:
0    Geek1
1    Geek2
2    Geek3
3    Geek4
4    Geek5
Name: Geek_ID, dtype: object
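In practice the extracted part is often stored as a new column rather than printed. The following is a minimal sketch of that use; the column name `Geek_name` is an assumption made up for illustration and does not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id']})

# Keep the text before the underscore in a new column
# ('Geek_name' is a hypothetical column name).
df['Geek_name'] = df['Geek_ID'].str.split('_').str[0]
print(df)
```

The original `Geek_ID` column is left unchanged, so the full value remains available alongside the extracted part.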
Code #2: Convert the returned Series to a list and print it.
import pandas as pd
import numpy as np

# Build a sample DataFrame; Geek_R holds random values
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Take the first part of each ID and convert the Series to a plain list
print(df.Geek_ID.str.split('_').str[0].tolist())
Output:
['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']
Code #3: Print a list of the second part of each value.
import pandas as pd
import numpy as np

# Build a sample DataFrame; Geek_R holds random values
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Take the second part of each ID (index 1) and convert it to a list
print(df.Geek_ID.str.split('_').str[1].tolist())
Output:
['id', 'id', 'id', 'id', 'id']
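The examples above assume that every value contains the delimiter. When a value does not, the split list is shorter and the requested position is missing, so pandas returns NaN for that row. A small sketch with made-up data (the Series `ids` is an assumption for illustration):

```python
import pandas as pd

ids = pd.Series(['Geek1_id', 'Geek2', 'Geek3_id'])

# 'Geek2' has no underscore, so its split list has a single element
# and there is nothing at position 1 for that row.
second = ids.str.split('_').str[1]
print(second.isna().tolist())  # [False, True, False]
```

Checking the result with isna() before using it is one way to catch such rows.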