Remove/Extract Duplicate Elements from List in Python

Modified: 2023-05-08 | Tags: Python, List

This article describes how to generate a new list in Python by removing or extracting duplicate elements from a list. Note that removing duplicate elements is equivalent to extracting only unique elements.

Contents

Remove duplicate elements (Extract unique elements) from a list
Extract duplicate elements from a list

The same idea can be applied to tuples instead of lists.

To check whether a list or tuple has duplicate elements, or to extract elements that are common or non-common among multiple lists, see the related articles on those topics.

Remove duplicate elements (Extract unique elements) from a list

Do not keep the order of the original list: `set()`

Use set() if you don't need to keep the order of the original list.

Passing a list to set() returns a set object, which discards duplicate values and keeps only unique values as elements.

Built-in Functions - set() — Python 3.11.3 documentation

A set can be converted back to a list or tuple with list() or tuple(). Because a set is unordered, the order of the resulting list or tuple is not guaranteed.

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]

print(set(l))
# {1, 2, 3, 4, 5}

print(list(set(l)))
# [1, 2, 3, 4, 5]

source: list_unique.py

You can also use the set object as it is. See the following article for more information about set.

Set operations in Python (union, intersection, symmetric difference, etc.)
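As a small illustration of the points above, the following sketch applies set() to a tuple and uses sorted() when a deterministic order is wanted. The variable names t and unique_t are only for illustration and do not come from the original source file.

```python
t = (3, 3, 2, 1, 5, 1, 4, 2, 3)

# Convert the set back to a tuple; the order of the elements is not guaranteed.
unique_t = tuple(set(t))
print(len(unique_t))
# 5

# sorted() returns a list in ascending order regardless of set iteration order.
print(sorted(set(t)))
# [1, 2, 3, 4, 5]
```

Sorting gives ascending order, not the order of the original sequence, so it is only a substitute when the original order does not matter.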

Keep the order of the original list: `dict.fromkeys()`, `sorted()`

If you want to keep the order of the original list, use dict.fromkeys() or sorted().

dict.fromkeys() creates a new dictionary whose keys are taken from the iterable. If the second argument is omitted, every value is set to None.

Built-in Types - dict.fromkeys() — Python 3.11.3 documentation

Since dictionary keys are unique, duplicate values are ignored, as with set(). Passing a dictionary to list() returns a list with the dictionary keys as elements.

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]

print(dict.fromkeys(l))
# {3: None, 2: None, 1: None, 5: None, 4: None}

print(list(dict.fromkeys(l)))
# [3, 2, 1, 5, 4]

source: list_unique.py

Starting from Python 3.7 (3.6 for CPython), dictionaries preserve insertion order, so dict.fromkeys() keeps the order of the original sequence. In earlier versions, you can use the built-in function sorted() as described below.

The index() method of a list returns the index of the first occurrence of a value. By passing l.index as the key argument of sorted(), the unique elements are sorted by their first position in the original list.

How to use a key parameter in Python (sorted, max, etc.)

print(sorted(set(l), key=l.index))
# [3, 2, 1, 5, 4]

source: list_unique.py
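Note that l.index() is itself O(n), so sorted(set(l), key=l.index) can become slow for long lists. A minimal sketch of a single-pass alternative for hashable elements follows. The function name get_unique_list_hashable is hypothetical and is not part of the original source file.

```python
def get_unique_list_hashable(seq):
    # Hypothetical helper: keeps the first occurrence of each hashable element.
    seen = set()
    result = []
    for x in seq:
        if x not in seen:
            seen.add(x)       # remember the element (average O(1) lookup later)
            result.append(x)  # keep it in original order
    return result

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]
print(get_unique_list_hashable(l))
# [3, 2, 1, 5, 4]
```

This works on any Python version and does not rely on dictionary ordering, but like set() it requires the elements to be hashable.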

For a two-dimensional list (list of lists)

For a two-dimensional list (list of lists), using set() or dict.fromkeys() will raise a TypeError.

l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

# l_2d_unique = list(set(l_2d))
# TypeError: unhashable type: 'list'

# l_2d_unique_order = dict.fromkeys(l_2d)
# TypeError: unhashable type: 'list'

source: list_unique.py

This is because unhashable objects such as lists cannot be elements of a set or keys of a dict.

Define the following function to resolve this issue. It preserves the order of the original list and also works for one-dimensional lists and tuples.

python - How do you remove duplicates from a list whilst preserving order? - Stack Overflow

def get_unique_list(seq):
    seen = []
    return [x for x in seq if x not in seen and not seen.append(x)]

print(get_unique_list(l_2d))
# [[1, 1], [0, 1], [0, 0], [1, 0]]

print(get_unique_list(l))
# [3, 2, 1, 5, 4]

source: list_unique.py

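The function uses a list comprehension. Because append() always returns None, `not seen.append(x)` is always True; it is there only to record x in seen as a side effect.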

List comprehensions in Python
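When every inner list contains only hashable values (an assumption that holds for the l_2d example but not in general), another possible approach is to convert each inner list to a tuple so that dict.fromkeys() can be used. The following is only a sketch under that assumption; the variable name unique_rows is for illustration.

```python
l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

# Assumption: every inner list contains only hashable values,
# so tuple(row) can be used as a dictionary key.
unique_rows = [list(t) for t in dict.fromkeys(map(tuple, l_2d))]
print(unique_rows)
# [[1, 1], [0, 1], [0, 0], [1, 0]]
```

Unlike the function above, which searches a list for every element, this relies on hash lookups, so it should scale better for long lists.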

Extract duplicate elements from a list

Do not keep the order of the original list

To extract only the duplicate elements from the original list, use collections.Counter(). It returns a collections.Counter object (a dictionary subclass) whose keys are the elements and whose values are their counts.

Count elements in a list with collections.Counter in Python

import collections

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]

print(collections.Counter(l))
# Counter({3: 3, 2: 2, 1: 2, 5: 1, 4: 1})

source: list_unique.py

Since it is a subclass of a dictionary, you can retrieve keys and values with items(). You can extract the keys with a count of two or more using a list comprehension.

Iterate dictionary (key and value) with for loop in Python

print([k for k, v in collections.Counter(l).items() if v > 1])
# [3, 2, 1]

source: list_unique.py

Keep the order of the original list

As in the above example, since Python 3.7, the keys of collections.Counter preserve the order in which elements first appear in the original list.

In earlier versions, you can sort with sorted(), as in the example for removing duplicate elements.

l = [3, 3, 2, 1, 5, 1, 4, 2, 3]

print(sorted([k for k, v in collections.Counter(l).items() if v > 1], key=l.index))
# [3, 2, 1]

source: list_unique.py

If you want to extract elements in their duplicated state, simply keep every element of the original list that occurs two or more times. The order is also preserved.

cc = collections.Counter(l)
print([x for x in l if cc[x] > 1])
# [3, 3, 2, 1, 1, 2, 3]

source: list_unique.py

For a two-dimensional list (list of lists)

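For a two-dimensional list (list of lists), functions like the following can be used. The first returns each duplicate at the position of its second occurrence; the second preserves the order of the original list. Both also work for one-dimensional lists.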

l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

def get_duplicate_list(seq):
    seen = []
    return [x for x in seq if not seen.append(x) and seen.count(x) == 2]

def get_duplicate_list_order(seq):
    seen = []
    return [x for x in seq if seq.count(x) > 1 and not seen.append(x) and seen.count(x) == 1]

print(get_duplicate_list(l_2d))
# [[0, 1], [1, 1]]

print(get_duplicate_list_order(l_2d))
# [[1, 1], [0, 1]]

print(get_duplicate_list(l))
# [3, 1, 2]

print(get_duplicate_list_order(l))
# [3, 2, 1]

source: list_unique.py

# Extract elements in their duplicated state (every occurrence of a repeated element).
print([x for x in l_2d if l_2d.count(x) > 1])
# [[1, 1], [0, 1], [0, 1], [1, 1], [1, 1]]

source: list_unique.py

Note that count() takes O(n) time, so the functions above, which call count() once per element, take O(n^2) time overall and are inefficient for large lists. More efficient approaches may be available.

Since collections.Counter is a subclass of dict, passing a list or tuple whose elements are unhashable, such as lists, to collections.Counter() raises an error.

# print(collections.Counter(l_2d))
# TypeError: unhashable type: 'list'

source: list_unique.py
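If the inner lists contain only hashable values, one possible workaround is to convert them to tuples before counting. This avoids both the TypeError and the repeated count() calls. The following is a sketch under that assumption; the names counts, duplicates, and duplicated_state are for illustration only, and the order of the results relies on Python 3.7 or later.

```python
import collections

l_2d = [[1, 1], [0, 1], [0, 1], [0, 0], [1, 0], [1, 1], [1, 1]]

# Assumption: inner lists hold only hashable values, so tuple(row) is hashable.
counts = collections.Counter(map(tuple, l_2d))

# Each duplicated row once, in order of first occurrence (Python 3.7+).
duplicates = [list(t) for t, n in counts.items() if n > 1]
print(duplicates)
# [[1, 1], [0, 1]]

# Every occurrence of the duplicated rows, in original order.
duplicated_state = [row for row in l_2d if counts[tuple(row)] > 1]
print(duplicated_state)
# [[1, 1], [0, 1], [0, 1], [1, 1], [1, 1]]
```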
