Remove/Extract Duplicate Elements from List in Python
This article describes how to generate a new list in Python by removing or extracting duplicate elements from a list. Note that removing duplicate elements is equivalent to extracting only unique elements.
The same idea can be applied to tuples instead of lists.
To check if a list or tuple has duplicate elements, or to extract elements that are common or non-common among multiple lists, see the following articles.
- Check if a list has duplicates in Python
- Extract common/non-common/unique elements from multiple lists in Python
Remove duplicate elements (Extract unique elements) from a list
Do not keep the order of the original list: set()
Use set() if you don't need to keep the order of the original list.
Passing a list to set() returns a set, which ignores duplicate values and keeps only unique values as elements. A set can be converted back to a list or tuple with list() or tuple().
l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(set(l))
# {1, 2, 3, 4, 5}
print(list(set(l)))
# [1, 2, 3, 4, 5]
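The elements of a set have no guaranteed order. The ascending output above happens to appear because of how CPython hashes small integers. If you want the unique elements in ascending order, a minimal approach is to pass the set to sorted(). The same calls work on a tuple. The tuple t below is a hypothetical example.

```python
l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
t = (3, 3, 2, 1, 5, 1, 4, 2, 3)  # hypothetical tuple with the same values

# Set iteration order is not guaranteed, so sort explicitly if order matters
print(sorted(set(l)))
# [1, 2, 3, 4, 5]

# sorted() always returns a list; convert with tuple() to get a tuple back
print(tuple(sorted(set(t))))
# (1, 2, 3, 4, 5)
```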
Of course, you can also use the set as it is. See the following article for more information about set.
Keep the order of the original list: dict.fromkeys(), sorted()
If you want to keep the order of the original list, use dict.fromkeys() or sorted().
dict.fromkeys() creates a new dictionary whose keys are taken from the given iterable. If the second argument is omitted, every value is set to None.
Since a dictionary cannot have duplicate keys, duplicate values are ignored, just as with set(). Passing a dictionary to list() returns a list with the dictionary's keys as elements.
l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(dict.fromkeys(l))
# {3: None, 2: None, 1: None, 5: None, 4: None}
print(list(dict.fromkeys(l)))
# [3, 2, 1, 5, 4]
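dict.fromkeys() accepts any iterable of hashable values, so the same approach works for a tuple or a list of strings. Here is a short sketch using hypothetical example data, t and words:

```python
t = (3, 3, 2, 1, 5, 1, 4, 2, 3)     # hypothetical tuple
words = ["b", "a", "b", "c", "a"]   # hypothetical list of strings

# Keys keep the order of first occurrence (guaranteed from Python 3.7)
print(tuple(dict.fromkeys(t)))
# (3, 2, 1, 5, 4)
print(list(dict.fromkeys(words)))
# ['b', 'a', 'c']
```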
Starting from Python 3.7, dictionaries are guaranteed to preserve insertion order (CPython 3.6 already does so as an implementation detail). As a result, dict.fromkeys() keeps the order of the original list. In earlier versions, you can use the built-in function sorted() as described below.
The index() method returns the index of the first occurrence of a value. Pass l.index as the key parameter of sorted(), and the unique elements are sorted by the position where each first appears in the original list.
print(sorted(set(l), key=l.index))
# [3, 2, 1, 5, 4]
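Note that l.index() scans the list from the beginning on each call. This approach therefore takes time proportional to the number of unique elements times the length of the list. A common alternative, which also works in older versions, is to loop over the list once and track already-seen elements in a set. The function name unique_in_order is hypothetical. This is a minimal sketch that assumes all elements are hashable.

```python
def unique_in_order(seq):
    """Return the unique elements of seq in order of first occurrence."""
    seen = set()  # average O(1) lookups; elements must be hashable
    result = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(unique_in_order(l))
# [3, 2, 1, 5, 4]
```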
For a two-dimensional list (list of lists)
For a two-dimensional list (list of lists), using set() or dict.fromkeys() raises a TypeError.
l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]
# l_2d_unique = list(set(l_2d))
# TypeError: unhashable type: 'list'
# l_2d_unique_order = dict.fromkeys(l_2d)
# TypeError: unhashable type: 'list'
This is because unhashable objects such as lists cannot be elements of a set or keys of a dict.
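If every inner list contains only hashable values, such as numbers, one workaround is to convert each inner list to a tuple. Tuples are hashable, so you can deduplicate them with dict.fromkeys() and then convert them back to lists. This sketch assumes the inner values are hashable.

```python
l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

# Tuples are hashable, so they can be dict keys; first-occurrence order is kept
unique_tuples = dict.fromkeys(tuple(x) for x in l_2d)
print([list(x) for x in unique_tuples])
# [[1, 1], [0, 1], [0, 0], [1, 0]]
```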
Define the following function to resolve this issue. This function preserves the order of the original list and works for both one-dimensional lists and tuples.
def get_unique_list(seq):
    # seen is a list (not a set) so that unhashable elements such as lists work
    seen = []
    return [x for x in seq if x not in seen and not seen.append(x)]
print(get_unique_list(l_2d))
# [[1, 1], [0, 1], [0, 0], [1, 0]]
print(get_unique_list(l))
# [3, 2, 1, 5, 4]
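The function uses a list comprehension. seen.append(x) returns None, so not seen.append(x) is always True. Its only job is to record x the first time it appears. Because x not in seen scans the list linearly for each element, this function can be slow for long lists.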
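The same logic can be written as an explicit loop that does not rely on calling append() inside the condition, which may be easier to read. The name get_unique_list_loop is hypothetical and used only for illustration.

```python
def get_unique_list_loop(seq):
    """Order-preserving deduplication that also works for unhashable elements."""
    seen = []  # a list rather than a set, so elements need not be hashable
    for x in seq:
        if x not in seen:  # linear scan comparing with ==
            seen.append(x)
    return seen

l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]
print(get_unique_list_loop(l_2d))
# [[1, 1], [0, 1], [0, 0], [1, 0]]
```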
Extract duplicate elements from a list
Do not keep the order of the original list
If you want to extract only the duplicate elements from the original list, use collections.Counter(). It returns a collections.Counter (a dictionary subclass) whose keys are the elements and whose values are their counts.
import collections
l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(collections.Counter(l))
# Counter({3: 3, 2: 2, 1: 2, 5: 1, 4: 1})
Since collections.Counter is a subclass of dict, you can retrieve keys and values with items(). A list comprehension can then extract the keys whose count is two or more.
print([k for k, v in collections.Counter(l).items() if v > 1])
# [3, 2, 1]
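If you also want to know how many times each duplicated element appears, a dictionary comprehension over the same items() keeps the counts. A minimal sketch:

```python
import collections

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]

# Keep only the elements that occur two or more times, together with their counts
dup_counts = {k: v for k, v in collections.Counter(l).items() if v > 1}
print(dup_counts)
# {3: 3, 2: 2, 1: 2}
```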
Keep the order of the original list
As the above example shows, since Python 3.7 the keys of collections.Counter preserve the order of the original list, that is, the order in which each element first appears.
In earlier versions, you can sort with sorted(), as in the example for removing duplicate elements.
l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(sorted([k for k, v in collections.Counter(l).items() if v > 1], key=l.index))
# [3, 2, 1]
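Sorting with key=l.index calls index() once per key, and each call scans the list again. Here is a sketch that does not rely on dictionary order. It walks the original list once, using the counts from collections.Counter and a set of elements already added to the result. duplicates_in_order is a hypothetical name, and the elements are assumed to be hashable.

```python
import collections

def duplicates_in_order(seq):
    """Return each duplicated element once, in order of first occurrence."""
    counts = collections.Counter(seq)
    emitted = set()  # elements already added to the result
    result = []
    for x in seq:
        if counts[x] > 1 and x not in emitted:
            emitted.add(x)
            result.append(x)
    return result

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(duplicates_in_order(l))
# [3, 2, 1]
```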
If you want to extract the duplicate elements in their duplicated state, keep every element of the original list that occurs two or more times. The order of the original list is also preserved.
cc = collections.Counter(l)
print([x for x in l if cc[x] > 1])
# [3, 3, 2, 1, 1, 2, 3]
For a two-dimensional list (list of lists)
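For a two-dimensional list (list of lists), you can define functions like the following. get_duplicate_list() returns each duplicated element once, in the order of its second occurrence. get_duplicate_list_order() returns them in the order of their first occurrence. Both also work with one-dimensional lists.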
l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]
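def get_duplicate_list(seq):
    seen = []
    # Record every element; keep x at the moment it is seen for the second time
    return [x for x in seq if not seen.append(x) and seen.count(x) == 2]

def get_duplicate_list_order(seq):
    seen = []
    # Record only duplicated elements; keep x at its first occurrence
    return [x for x in seq if seq.count(x) > 1 and not seen.append(x) and seen.count(x) == 1]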
print(get_duplicate_list(l_2d))
# [[0, 1], [1, 1]]
print(get_duplicate_list_order(l_2d))
# [[1, 1], [0, 1]]
print(get_duplicate_list(l))
# [3, 1, 2]
print(get_duplicate_list_order(l))
# [3, 2, 1]
print([x for x in l_2d if l_2d.count(x) > 1])
# [[1, 1], [0, 1], [0, 1], [1, 1], [1, 1]]
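Note that count() takes O(n) time. The functions above call count() for each element, so they take O(n²) time and become very slow for long lists. If the elements can be made hashable (for example, by converting inner lists to tuples), you can use set, dict, or collections.Counter instead.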
Since collections.Counter is a subclass of dict, passing it a list or tuple whose elements are unhashable, such as lists, raises an error.
# print(collections.Counter(l_2d))
# TypeError: unhashable type: 'list'
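If the inner lists contain only hashable values, a common workaround is to convert them to tuples before counting. This sketch assumes that. It counts in a single pass instead of calling count() for every element, and it converts the results back to lists.

```python
import collections

l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

# Tuples are hashable, so they can be counted
cc = collections.Counter(tuple(x) for x in l_2d)

# Each duplicated element once, in order of first occurrence
print([list(k) for k, v in cc.items() if v > 1])
# [[1, 1], [0, 1]]

# Duplicated elements in their duplicated state
print([x for x in l_2d if cc[tuple(x)] > 1])
# [[1, 1], [0, 1], [0, 1], [1, 1], [1, 1]]
```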