What is the difference between join and merge in Pandas?
Last Updated :
09 Apr, 2025
In Pandas, join() combines DataFrames based on their indices and defaults to a left join, while merge() joins on specified columns and defaults to an inner join. Choosing the right method depends on how your data is aligned. To illustrate the difference between join() and merge() visually, Let's understand with help of examples.
Using join() for index- based merging
The join() method is useful when merging DataFrames based on their index values. By default, it performs a left join, meaning all rows from the left DataFrame are retained, and matching rows from the right DataFrame are added based on the index.
Example: In this example, two DataFrames are joined on their indexes. An outer join retains all index values, filling missing ones with NaN.
Python
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'B': [4, 5, 6]}, index=['a', 'b', 'd'])
print(df1, "\n")
print(df2, "\n")
# joins on index
res = df1.join(df2, how='outer')
print(res)
Output
Using join()
Explanation: join() method merges df1 and df2 on their indexes. With how='outer', it retains all unique index values, combining common ones and filling missing values with NaN.
For different types of joins, refer to Different Types of Joins in Pandas to explore alternatives like merge() for column-based joins.
Using merge() for coloumn-based merging
The merge() function is more versatile than join() and is commonly used for column-based merging. By default, merge() performs an inner join, which means it returns only the rows with matching values in both DataFrames based on the specified key columns. Unlike join(), which merges based on index values, merge() allows you to explicitly specify which columns to use as keys for merging.
Example 1 : In this example, we are merging two DataFrames based on the common column 'ID'. This allows us to combine matching rows from both DataFrames where the 'ID' values are the same.
Python
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 3, 4], 'Score': [90, 85, 80]})
print(df1, "\n")
print(df2, "\n")
# Performing merge on 'ID'
res = pd.merge(df1, df2, on='ID')
print(res)
Output
Using merge()Explanation: merge() function combines df1 and df2 using the common column 'ID'. It performs an inner join, so only rows with matching 'ID' values in both DataFrames are included in the result.
Refer to Joining Two Pandas DataFrames Using merge() to understand the various types of joins available in Pandas.
Example 2: In this example, we are merging two DataFrames using their indexes. Since both DataFrames have a column named 'P', suffixes are used to differentiate the overlapping column names in the merged result.
Python
import pandas as pd
df1 = pd.DataFrame([['a', 'b'], ['d', 'f']], index=['X', 'Y'], columns=['P', 'Q'])
df2 = pd.DataFrame([[1, 2], [4, 5]], index=['X', 'Y'], columns=['P', 'R'])
print(df1, "\n")
print(df2, "\n")
res = df1.merge(df2, left_index=True, right_index=True, suffixes=['_', ''])
print(res)
Output
Using merge()Explanation: merge() function merges df1 and df2 using their shared index labels ('X' and 'Y') by setting left_index=True and right_index=True. Since both have a 'P' column, suffixes=['_', ''] differentiates them as 'P_' (from df1) and 'P' (from df2).
Note: For simple index-based joins, use join(). For more complex merges involving specific columns or multiple keys, use merge().
Difference between join() and merge()
In real-world datasets, combining or enriching data is common. Choosing the right method, join() or merge() ensures accurate relationships, efficient processing and clear code. Understanding their differences helps avoid issues like missing rows, duplicates or incorrect joins.
Feature | join() | merge() |
---|
Merge Basis | Index-based | Column-based (default) |
---|
Default Join Type | Left join | Inner join |
---|
Specify Merge Columns? | No (works on index) | Yes (can specify columns) |
---|
Can Merge on Both Index & Columns? | No | Yes |
---|
Handles Overlapping Columns? | Requires suffix | Allows explicit column selection |
---|
Performance | Generally faster for index-based joins | More flexible but can be slower |
---|
Similar Reads
Python Pandas - Difference between INNER JOIN and LEFT SEMI JOIN In this article, we see the difference between INNER JOIN and LEFT SEMI JOIN. Inner Join An inner join requires two data set columns to be the same to fetch the common row data values or data from the data table. In simple words, and returns a data frame or values with only those rows in the data fr
3 min read
Difference between JOIN and UNION in SQL Pre-requisites: JOIN, UNION JOIN in SQL is used to combine data from many tables based on a matched condition between them. The data combined using the JOIN statement results in new columns. Consider the two tables: Boys Girls Example: sql> SELECT Boys.Name, Boys.Age, Girls.Address, FROM Boys INN
2 min read
Difference between JOIN and UNION in SQL Pre-requisites: JOIN, UNION JOIN in SQL is used to combine data from many tables based on a matched condition between them. The data combined using the JOIN statement results in new columns. Consider the two tables: Boys Girls Example: sql> SELECT Boys.Name, Boys.Age, Girls.Address, FROM Boys INN
2 min read
Difference between JOIN and UNION in SQL Pre-requisites: JOIN, UNION JOIN in SQL is used to combine data from many tables based on a matched condition between them. The data combined using the JOIN statement results in new columns. Consider the two tables: Boys Girls Example: sql> SELECT Boys.Name, Boys.Age, Girls.Address, FROM Boys INN
2 min read
Pandas - Find the Difference between two Dataframes In this article, we will discuss how to compare two DataFrames in pandas. First, let's create two DataFrames. Creating two dataframes Python3 import pandas as pd # first dataframe df1 = pd.DataFrame({ 'Age': ['20', '14', '56', '28', '10'], 'Weight': [59, 29, 73, 56, 48]}) display(df1) # second dataf
2 min read
Different Types of Joins in Pandas The Pandas module contains various features to perform various operations on Dataframes like join, concatenate, delete, add, etc. In this article, we are going to discuss the various types of join operations that can be performed on Pandas Dataframe. There are five types of Joins in Pandas. Inner Jo
4 min read