PostgreSQL - Deleting Duplicate Rows using Subquery
Last Updated: 26 Aug, 2024
In PostgreSQL, handling duplicate rows is a common task, especially when working with large datasets. Fortunately, PostgreSQL provides several techniques to efficiently delete duplicate rows, and one of the most effective approaches is using subqueries.
In this article, we will demonstrate how to identify and remove duplicate rows while keeping the row with either the lowest or highest ID, depending on your requirements.
Setting Up a Sample Table
For demonstration, let's create a sample table named 'basket' that stores fruit names and insert a few duplicate values:
CREATE TABLE basket(
id SERIAL PRIMARY KEY,
fruit VARCHAR(50) NOT NULL
);
INSERT INTO basket(fruit) values('apple');
INSERT INTO basket(fruit) values('apple');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('orange');
INSERT INTO basket(fruit) values('banana');
SELECT * FROM basket;
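On a freshly created table, this should return the following rows:

 id | fruit
----+--------
  1 | apple
  2 | apple
  3 | orange
  4 | orange
  5 | orange
  6 | banana
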
Now that we have set up the sample table, we will query for the duplicates using the following.
Query:
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
For the sample data, this should return the following result:

 fruit  | count
--------+-------
 apple  |     2
 orange |     3

Deleting Duplicate Rows with a Subquery
To delete the duplicate rows while keeping the row with the lowest ID, you can use a subquery with the 'ROW_NUMBER()' window function. This method ensures that only one row per fruit is retained, and all other duplicates are removed.
Query:
DELETE FROM basket
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY fruit ORDER BY id ) AS row_num
FROM basket ) t
WHERE t.row_num > 1 );
Explanation:
- The inner subquery assigns a row number to each row within each partition (grouped by 'fruit'), ordered by 'id'.
- The ROW_NUMBER() function starts counting from 1 for each group, so the first row in each group is retained, and the rest are marked for deletion.
- The outer DELETE statement removes the rows identified by the subquery.
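Before running the DELETE, it can help to preview which rows the subquery would select. The following is a minimal sketch that reuses the same 'ROW_NUMBER()' expression in a plain SELECT; the 'would_be_deleted' column alias is introduced here for illustration only.

```sql
-- Preview only: nothing is deleted here.
-- row_num = 1 marks the row the DELETE would keep (lowest id per fruit);
-- row_num > 1 marks the rows it would remove.
SELECT id,
       fruit,
       row_num,
       row_num > 1 AS would_be_deleted
FROM (
    SELECT id,
           fruit,
           ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS row_num
    FROM basket
) t
ORDER BY fruit, id;
```

Assuming the sample rows received ids 1 through 6 in insertion order, the rows flagged for deletion would be ids 2, 4, and 5.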
Keeping the Row with the Highest ID
If you want to keep the row with the highest id in each duplicate group instead, change the sort order in the subquery to descending:
DELETE FROM basket
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY fruit ORDER BY id DESC ) AS row_num
FROM basket ) t
WHERE t.row_num > 1 );
This query will retain the row with the highest ID for each duplicate group and delete all other duplicates.
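Because a DELETE cannot be undone once committed, one cautious option is to run it inside a transaction and inspect what was removed before deciding. The sketch below is one possible workflow, not part of the original method; it uses PostgreSQL's RETURNING clause to list the deleted rows and sorts by id in descending order so the highest id is kept.

```sql
BEGIN;

DELETE FROM basket
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               -- ORDER BY id DESC gives row_num = 1 to the highest id per fruit
               ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id DESC) AS row_num
        FROM basket
    ) t
    WHERE t.row_num > 1
)
RETURNING id, fruit;   -- lists the rows that were removed

-- Inspect the output, then finish with exactly one of:
-- COMMIT;    -- keep the deletion
ROLLBACK;     -- undo it
```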
Deleting Duplicates Based on Multiple Columns
To delete duplicates based on the combined values of multiple columns, use the following query template.
Query:
DELETE FROM table_name
WHERE id IN
(SELECT id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY column_1, column_2 ORDER BY id ) AS row_num
FROM table_name ) t
WHERE t.row_num > 1 );
Explanation:
- The PARTITION BY clause includes multiple columns ('column_1', 'column_2'), ensuring duplicates are identified based on the combination of those columns.
- The rest of the logic remains the same.
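As a concrete illustration of the template, the sketch below applies it to a small table with two columns. The table 'basket_by_color' and its 'color' column are hypothetical and exist only for this example.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE basket_by_color (
    id    SERIAL PRIMARY KEY,
    fruit VARCHAR(50) NOT NULL,
    color VARCHAR(50) NOT NULL
);

INSERT INTO basket_by_color (fruit, color) VALUES
    ('apple', 'red'),
    ('apple', 'red'),     -- duplicate of the first row
    ('apple', 'green'),   -- same fruit, different color: not a duplicate
    ('orange', 'orange');

-- Rows count as duplicates only when both fruit and color match.
DELETE FROM basket_by_color
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY fruit, color ORDER BY id) AS row_num
        FROM basket_by_color
    ) t
    WHERE t.row_num > 1
);
```

Here only the second ('apple', 'red') row is removed, because ('apple', 'green') differs in one of the partitioning columns.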
Verifying the Result
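The template above deletes every row whose combination of 'column_1' and 'column_2' values repeats an earlier row, keeping the row with the lowest id in each group. To verify that the duplicates in the 'basket' table are gone, rerun the duplicate check from earlier.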
Query:
SELECT
fruit,
COUNT( fruit )
FROM
basket
GROUP BY
fruit
HAVING
COUNT( fruit )> 1
ORDER BY
fruit;
If the deletion was successful, this query should return an empty result set, indicating no duplicates remain.
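An alternative check, sketched here as an assumption rather than part of the original walkthrough, compares the total row count with the number of distinct fruit values. The column aliases are illustrative.

```sql
-- If every fruit appears exactly once, both counts are equal
-- and no_duplicates is true.
SELECT COUNT(*)                         AS total_rows,
       COUNT(DISTINCT fruit)            AS distinct_fruits,
       COUNT(*) = COUNT(DISTINCT fruit) AS no_duplicates
FROM basket;
```

This works for 'basket' because 'fruit' is declared NOT NULL; COUNT(DISTINCT ...) ignores NULL values, so a nullable column would need separate handling.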