pandas: Concat multiple DataFrame/Series with concat()

Posted: | Tags: Python, pandas

The pandas.concat() function allows you to concatenate (join) multiple pandas.DataFrame and pandas.Series objects.

To add new rows and columns to pandas.DataFrame, refer to the following article:

To merge multiple pandas.DataFrame objects based on columns or indexes, use the pandas.merge() function or the merge() and join() methods of pandas.DataFrame.

The sample code in this article uses pandas version 2.0.3. The following pandas.DataFrame and pandas.Series objects are used as examples.

import pandas as pd

print(pd.__version__)
# 2.0.3

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A3'],
                    'B': ['B1', 'B2', 'B3'],
                    'C': ['C1', 'C2', 'C3']},
                   index=['ONE', 'TWO', 'THREE'])
print(df1)
#         A   B   C
# ONE    A1  B1  C1
# TWO    A2  B2  C2
# THREE  A3  B3  C3

df2 = pd.DataFrame({'C': ['C2', 'C3', 'C4'],
                    'D': ['D2', 'D3', 'D4']},
                   index=['TWO', 'THREE', 'FOUR'])
print(df2)
#         C   D
# TWO    C2  D2
# THREE  C3  D3
# FOUR   C4  D4

s1 = pd.Series(['X1', 'X2', 'X3'], index=['ONE', 'TWO', 'THREE'], name='X')
print(s1)
# ONE      X1
# TWO      X2
# THREE    X3
# Name: X, dtype: object

s2 = pd.Series(['Y2', 'Y3', 'Y4'], index=['TWO', 'THREE', 'FOUR'], name='Y')
print(s2)
# TWO      Y2
# THREE    Y3
# FOUR     Y4
# Name: Y, dtype: object

Basic usage of pandas.concat()

Concatenating objects: objs

To concatenate DataFrame and Series objects, pass them as a list or tuple to the first argument, objs. Examples using Series are provided later.

df_concat = pd.concat([df1, df2])
print(df_concat)
#          A    B   C    D
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN
# TWO    NaN  NaN  C2   D2
# THREE  NaN  NaN  C3   D3
# FOUR   NaN  NaN  C4   D4

You can concatenate any number of objects.

df_concat_multi = pd.concat([df1, df2, df1])
print(df_concat_multi)
#          A    B   C    D
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN
# TWO    NaN  NaN  C2   D2
# THREE  NaN  NaN  C3   D3
# FOUR   NaN  NaN  C4   D4
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN

A new object is generated, and the original object is not changed.

Concatenation direction (vertical or horizontal): axis

Use the axis argument to specify the direction of concatenation, either vertical or horizontal.

If axis=0 (default), the objects are combined vertically.

df_v = pd.concat([df1, df2], axis=0)
print(df_v)
#          A    B   C    D
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN
# TWO    NaN  NaN  C2   D2
# THREE  NaN  NaN  C3   D3
# FOUR   NaN  NaN  C4   D4

If axis=1, the objects are combined horizontally.

df_h = pd.concat([df1, df2], axis=1)
print(df_h)
#          A    B    C    C    D
# ONE     A1   B1   C1  NaN  NaN
# TWO     A2   B2   C2   C2   D2
# THREE   A3   B3   C3   C3   D3
# FOUR   NaN  NaN  NaN   C4   D4

Concatenation method (outer or inner): join

The join argument determines whether all columns/rows or only those with common names should be retained after concatenation.

With join='outer' (default), all columns/rows are retained. Absent elements in the original object are represented as NaN.

With join='inner', only columns/rows with common names are retained.

df_v_out = pd.concat([df1, df2], join='outer')
print(df_v_out)
#          A    B   C    D
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN
# TWO    NaN  NaN  C2   D2
# THREE  NaN  NaN  C3   D3
# FOUR   NaN  NaN  C4   D4

df_v_in = pd.concat([df1, df2], join='inner')
print(df_v_in)
#         C
# ONE    C1
# TWO    C2
# THREE  C3
# TWO    C2
# THREE  C3
# FOUR   C4

The same join rules apply for horizontal concatenation (axis=1).

df_h_out = pd.concat([df1, df2], axis=1, join='outer')
print(df_h_out)
#          A    B    C    C    D
# ONE     A1   B1   C1  NaN  NaN
# TWO     A2   B2   C2   C2   D2
# THREE   A3   B3   C3   C3   D3
# FOUR   NaN  NaN  NaN   C4   D4

df_h_in = pd.concat([df1, df2], axis=1, join='inner')
print(df_h_in)
#         A   B   C   C   D
# TWO    A2  B2  C2  C2  D2
# THREE  A3  B3  C3  C3  D3

You can rename the columns or index using the rename() or set_axis() method.

df_h = pd.concat([df1, df2.rename(index={'FOUR': 'ONE'})], axis=1)
print(df_h)
#         A   B   C   C   D
# ONE    A1  B1  C1  C4  D4
# TWO    A2  B2  C2  C2  D2
# THREE  A3  B3  C3  C3  D3

Concatenate multiple pandas.DataFrame objects

As shown in the previous examples, when concatenating multiple DataFrame objects, a DataFrame is returned.

df_concat = pd.concat([df1, df2])
print(df_concat)
#          A    B   C    D
# ONE     A1   B1  C1  NaN
# TWO     A2   B2  C2  NaN
# THREE   A3   B3  C3  NaN
# TWO    NaN  NaN  C2   D2
# THREE  NaN  NaN  C3   D3
# FOUR   NaN  NaN  C4   D4

print(type(df_concat))
# <class 'pandas.core.frame.DataFrame'>

Concatenate multiple pandas.Series objects

When concatenating multiple Series objects, a Series is returned for vertical concatenation (axis=0, default).

s_v = pd.concat([s1, s2])
print(s_v)
# ONE      X1
# TWO      X2
# THREE    X3
# TWO      Y2
# THREE    Y3
# FOUR     Y4
# dtype: object

print(type(s_v))
# <class 'pandas.core.series.Series'>

For horizontal concatenation (axis=1), a DataFrame is returned. You can use the join argument.

s_h = pd.concat([s1, s2], axis=1)
print(s_h)
#          X    Y
# ONE     X1  NaN
# TWO     X2   Y2
# THREE   X3   Y3
# FOUR   NaN   Y4

print(type(s_h))
# <class 'pandas.core.frame.DataFrame'>
s_h_in = pd.concat([s1, s2], axis=1, join='inner')
print(s_h_in)
#         X   Y
# TWO    X2  Y2
# THREE  X3  Y3

You can also create a DataFrame by concatenating multiple Series using the pandas.DataFrame() constructor. In this case, the Series can also be arranged as rows in the DataFrame. Refer to the following article for details.

Concatenate pandas.DataFrame and pandas.Series

When you concatenate a DataFrame and a Series horizontally (axis=1), the Series becomes a new column, and its name is used as the column name. You can use the join argument.

df_s_h = pd.concat([df1, s2], axis=1)
print(df_s_h)
#          A    B    C    Y
# ONE     A1   B1   C1  NaN
# TWO     A2   B2   C2   Y2
# THREE   A3   B3   C3   Y3
# FOUR   NaN  NaN  NaN   Y4

df_s_h_in = pd.concat([df1, s2], axis=1, join='inner')
print(df_s_h_in)
#         A   B   C   Y
# TWO    A2  B2  C2  Y2
# THREE  A3  B3  C3  Y3

For vertical concatenation (axis=0, default), the result is as follows:

df_s_v = pd.concat([df1, s1])
print(df_s_v)
#          A    B    C    0
# ONE     A1   B1   C1  NaN
# TWO     A2   B2   C2  NaN
# THREE   A3   B3   C3  NaN
# ONE    NaN  NaN  NaN   X1
# TWO    NaN  NaN  NaN   X2
# THREE  NaN  NaN  NaN   X3

To add a Series as a new row, you could either convert it into a DataFrame before concatenating, or use .loc.

print(s1.set_axis(df1.columns).to_frame().T)
#     A   B   C
# X  X1  X2  X3

df_s_v = pd.concat([df1, s1.set_axis(df1.columns).to_frame().T])
print(df_s_v)
#         A   B   C
# ONE    A1  B1  C1
# TWO    A2  B2  C2
# THREE  A3  B3  C3
# X      X1  X2  X3
print(s2.values)
# ['Y2' 'Y3' 'Y4']

df1.loc[s2.name] = s2.values
print(df1)
#         A   B   C
# ONE    A1  B1  C1
# TWO    A2  B2  C2
# THREE  A3  B3  C3
# Y      Y2  Y3  Y4

Related Categories

Related Articles