Sort a 2D List in Python
This article explains how to sort a 2D list (list of lists) in Python.
Working with external libraries like NumPy or pandas can simplify the task. We recommend using them if possible.
The sample code in this article uses the pprint module for improved readability.
import pprint
In the sample code, width=20 is passed to pprint.pprint(). This argument only controls where the output is wrapped and needs no special attention.
Default behavior of sorted() or sort() for 2D lists
Let's consider the following 2D list (list of lists) as an example:
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]
pprint.pprint(l_2d, width=20)
# [[20, 3, 100],
# [1, 200, 30],
# [300, 10, 2]]
By default, the sorted() function and the sort() method order the inner lists by comparing them with each other.
Lists are compared element by element, and the first pair of unequal elements decides the result. In this example all first elements differ, so the lists are ordered by their first element.
pprint.pprint(sorted(l_2d), width=20)
# [[1, 200, 30],
# [20, 3, 100],
# [300, 10, 2]]
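The element-by-element comparison can be checked directly with the < operator. The following is a minimal sketch (the names l_sorted and result are only illustrative); it also shows that sorted() returns a new list, while the sort() method reorders the list in place and returns None.

```python
# Lists are compared element by element; the first unequal pair decides.
print([1, 200, 30] < [20, 3, 100])
# True (1 < 20, so the remaining elements are not examined)

print([1, 3, 100] < [1, 200, 30])
# True (the first elements tie, so 3 < 200 decides)

l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]

# sorted() returns a new list and leaves the original unchanged.
l_sorted = sorted(l_2d)
print(l_2d[0])
# [20, 3, 100]

# sort() reorders the list in place and returns None.
result = l_2d.sort()
print(result)
# None
print(l_2d[0])
# [1, 200, 30]
```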
Sort rows/columns independently in 2D lists
Use sorted() and list comprehension
Sorting each row is equivalent to sorting each inner list. To do this, you can use a list comprehension to apply sorted() to each list.
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]
pprint.pprint(l_2d, width=20)
# [[20, 3, 100],
# [1, 200, 30],
# [300, 10, 2]]
pprint.pprint([sorted(l) for l in l_2d], width=20)
# [[3, 20, 100],
# [1, 30, 200],
# [2, 10, 300]]
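If a new list is not needed, each row can also be sorted in place with the sort() method, and reverse=True gives descending order. This is a small sketch of both variants; the name l_desc is only illustrative.

```python
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]

# Sort every inner list in place; the order of the rows themselves is kept.
for row in l_2d:
    row.sort()

print(l_2d)
# [[3, 20, 100], [1, 30, 200], [2, 10, 300]]

# Descending order per row with reverse=True, building a new list.
l_desc = [sorted(row, reverse=True) for row in l_2d]
print(l_desc)
# [[100, 20, 3], [200, 30, 1], [300, 10, 2]]
```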
If you want to sort each column, you need to transpose the original list of lists, sort each list, and then transpose it back. You can use zip() and * for transposing.
pprint.pprint([list(x) for x in zip(*[sorted(l) for l in zip(*l_2d)])], width=20)
# [[1, 3, 2],
# [20, 10, 30],
# [300, 200, 100]]
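The one-liner above is dense, so here is the same transpose, sort, transpose-back sequence split into separate steps. The intermediate names (columns, sorted_columns, l_col_sorted) are only illustrative.

```python
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]

# Step 1: transpose, so that each column becomes a tuple.
columns = list(zip(*l_2d))
print(columns)
# [(20, 1, 300), (3, 200, 10), (100, 30, 2)]

# Step 2: sort each column independently.
sorted_columns = [sorted(col) for col in columns]
print(sorted_columns)
# [[1, 20, 300], [3, 10, 200], [2, 30, 100]]

# Step 3: transpose back and convert the tuples to lists.
l_col_sorted = [list(row) for row in zip(*sorted_columns)]
print(l_col_sorted)
# [[1, 3, 2], [20, 10, 30], [300, 200, 100]]
```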
Use NumPy: np.sort()
NumPy simplifies the process. Let's consider the following 2D list (list of lists) as an example:
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]
pprint.pprint(l_2d, width=20)
# [[20, 3, 100],
# [1, 200, 30],
# [300, 10, 2]]
By default, the np.sort() function sorts each row. If you set axis=0, it will sort each column instead. The function returns a NumPy array (ndarray).
import numpy as np
print(np.sort(l_2d))
# [[ 3 20 100]
# [ 1 30 200]
# [ 2 10 300]]
print(np.sort(l_2d, axis=0))
# [[ 1 3 2]
# [ 20 10 30]
# [300 200 100]]
print(type(np.sort(l_2d)))
# <class 'numpy.ndarray'>
If you need to convert the ndarray back to a list, you can use the tolist() method.
print(np.sort(l_2d).tolist())
# [[3, 20, 100], [1, 30, 200], [2, 10, 300]]
print(type(np.sort(l_2d).tolist()))
# <class 'list'>
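np.sort() sorts in ascending order. One common way to get descending order, shown here as a sketch rather than the only option, is to reverse the sorted array along the sorted axis with a [::-1] slice. The names rows_desc and cols_desc are only illustrative.

```python
import numpy as np

l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]

# Sort each row in ascending order, then reverse the order within each row.
rows_desc = np.sort(l_2d)[:, ::-1]
print(rows_desc.tolist())
# [[100, 20, 3], [200, 30, 1], [300, 10, 2]]

# Sort each column in ascending order, then reverse the order of the rows.
cols_desc = np.sort(l_2d, axis=0)[::-1]
print(cols_desc.tolist())
# [[300, 200, 100], [20, 10, 30], [1, 3, 2]]
```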
Note that NumPy only treats lists with an equal number of elements as multi-dimensional arrays. Lists with an unequal number of elements will result in an error.
l_2d_error = [[1, 2], [3, 4, 5]]
# print(np.sort(l_2d_error))
# ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
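For lists with rows of unequal length, the list comprehension approach still works for sorting each row, since every inner list is sorted on its own. The sketch below uses a hypothetical list l_2d_ragged. Note that transposing with zip() is not suitable here, because zip() stops at the shortest row and silently drops the extra elements.

```python
l_2d_ragged = [[2, 1], [5, 3, 4]]

# Each inner list is sorted independently, so unequal lengths are fine.
rows_sorted = [sorted(row) for row in l_2d_ragged]
print(rows_sorted)
# [[1, 2], [3, 4, 5]]

# zip() stops at the shortest row: the third element of the second row is lost.
transposed = list(zip(*l_2d_ragged))
print(transposed)
# [(2, 5), (1, 3)]
```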
Sort 2D lists according to given rows/columns
This section explains how to reorder whole rows or columns according to the values in a given row or column, rather than sorting each row or column independently.
Use the key argument of sorted() or sort()
Basic idea
As mentioned above, by default, sorting is based on the first column (the first element of each list).
l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]
pprint.pprint(l_2d, width=20)
# [[20, 3, 100],
# [1, 200, 30],
# [300, 10, 2]]
pprint.pprint(sorted(l_2d), width=20)
# [[1, 200, 30],
# [20, 3, 100],
# [300, 10, 2]]
If you want to sort according to the second or third column, you can use the key argument of the sorted() function or the sort() method.
This argument should be a callable object, such as a function. The function is applied to each element (here, each inner list), and the elements are sorted by comparing the returned values.
In this case, you should specify a function that retrieves the element at the desired index of each inner list. You can use a lambda expression or the itemgetter() function from the operator module.
Here's an example using a lambda expression:
pprint.pprint(sorted(l_2d, key=lambda x: x[1]), width=20)
# [[20, 3, 100],
# [300, 10, 2],
# [1, 200, 30]]
pprint.pprint(sorted(l_2d, key=lambda x: x[2]), width=20)
# [[300, 10, 2],
# [1, 200, 30],
# [20, 3, 100]]
Here's an example using operator.itemgetter():
import operator
pprint.pprint(sorted(l_2d, key=operator.itemgetter(1)), width=20)
# [[20, 3, 100],
# [300, 10, 2],
# [1, 200, 30]]
pprint.pprint(sorted(l_2d, key=operator.itemgetter(2)), width=20)
# [[300, 10, 2],
# [1, 200, 30],
# [20, 3, 100]]
For more details on the key argument and the operator module, refer to the following articles:
- How to use the key argument in Python (sorted, max, etc.)
- The operator module in Python (itemgetter, attrgetter, methodcaller)
To sort according to a specific row, transpose the list, apply the sorting operation as described above, and then transpose the result back.
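As a sketch of this row-based sorting, the following reorders the columns of l_2d according to the values in its second row. The names row_index, columns, sorted_columns, and l_sorted_by_row are only illustrative.

```python
import operator

l_2d = [[20, 3, 100], [1, 200, 30], [300, 10, 2]]

row_index = 1  # illustrative choice: sort according to the second row, [1, 200, 30]

# Transpose: each tuple is now one column of the original list.
columns = list(zip(*l_2d))

# Sort the columns by the element that came from the chosen row.
sorted_columns = sorted(columns, key=operator.itemgetter(row_index))

# Transpose back and convert the tuples to lists.
l_sorted_by_row = [list(row) for row in zip(*sorted_columns)]
print(l_sorted_by_row)
# [[20, 100, 3], [1, 30, 200], [300, 2, 10]]
```

The second row of the result is now in ascending order, and the elements of the other rows have moved together with their columns.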
Sort according to multiple rows/columns
Let's consider a case with duplicated values.
l_2d_dup = [[1, 3, 100], [1, 200, 30], [1, 3, 2]]
pprint.pprint(l_2d_dup, width=20)
# [[1, 3, 100],
# [1, 200, 30],
# [1, 3, 2]]
By default, sorting compares and orders each list based on the first unequal element. Thus, if the first column's elements are identical, sorting will proceed according to the second column, and so forth.
pprint.pprint(sorted(l_2d_dup), width=20)
# [[1, 3, 2],
# [1, 3, 100],
# [1, 200, 30]]
If you want to sort according to multiple columns in an arbitrary order, you can use the key argument.
When you provide multiple values (indices) to operator.itemgetter(), it returns a tuple of the corresponding elements, so rows with an identical first value are compared by the second value. In the following example, sorting is based on the first and third columns.
pprint.pprint(sorted(l_2d_dup, key=operator.itemgetter(0, 2)), width=20)
# [[1, 3, 2],
# [1, 200, 30],
# [1, 3, 100]]
The same process can also be executed with a lambda expression.
pprint.pprint(sorted(l_2d_dup, key=lambda x: (x[0], x[2])), width=20)
# [[1, 3, 2],
# [1, 200, 30],
# [1, 3, 100]]
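When the columns should be sorted in different directions, two approaches are commonly used. The sketch below sorts by the second column in ascending order and, for ties, by the third column in descending order. Negating the value in the key works only for numeric data; the two-pass variant relies on Python's sort being stable. The names by_negation and by_two_passes are only illustrative.

```python
l_2d_dup = [[1, 3, 100], [1, 200, 30], [1, 3, 2]]

# Approach 1: negate the numeric value that should be sorted in descending order.
by_negation = sorted(l_2d_dup, key=lambda x: (x[1], -x[2]))
print(by_negation)
# [[1, 3, 100], [1, 3, 2], [1, 200, 30]]

# Approach 2: sort by the secondary key first, then by the primary key.
# Because the sort is stable, ties in the second pass keep the first pass's order.
by_two_passes = sorted(l_2d_dup, key=lambda x: x[2], reverse=True)
by_two_passes = sorted(by_two_passes, key=lambda x: x[1])
print(by_two_passes)
# [[1, 3, 100], [1, 3, 2], [1, 200, 30]]
```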
Again, to sort according to specific rows, transpose the list, apply the sorting operation as described above, and then transpose the result back.
Use pandas: sort_values()
Pandas offers an even simpler approach. Let's consider the following 2D list (list of lists) as an example:
l_2d_dup = [[1, 3, 100], [1, 200, 30], [1, 3, 2]]
pprint.pprint(l_2d_dup, width=20)
# [[1, 3, 100],
# [1, 200, 30],
# [1, 3, 2]]
You can generate a pandas.DataFrame from a list of lists. Row and column names are optional and can be anything that suits your needs.
import pandas as pd
df = pd.DataFrame(l_2d_dup, columns=['A', 'B', 'C'], index=['X', 'Y', 'Z'])
print(df)
# A B C
# X 1 3 100
# Y 1 200 30
# Z 1 3 2
The sort_values() method sorts according to specific columns or rows. By default, it reorders the rows according to the values in the specified column; by setting axis=1, you can reorder the columns according to the values in the specified row.
print(df.sort_values('C'))
# A B C
# Z 1 3 2
# Y 1 200 30
# X 1 3 100
print(df.sort_values('Z', axis=1))
# A C B
# X 1 100 3
# Y 1 30 200
# Z 1 2 3
You can specify multiple columns or rows by passing a list.
print(df.sort_values(['A', 'C']))
# A B C
# Z 1 3 2
# Y 1 200 30
# X 1 3 100
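With multiple columns, the ascending argument of sort_values() can also be given as a list with one boolean per column. The sketch below sorts by column B in ascending order and, for ties, by column C in descending order. The name df_mixed is only illustrative, and the result is shown as lists to keep the output compact.

```python
import pandas as pd

l_2d_dup = [[1, 3, 100], [1, 200, 30], [1, 3, 2]]
df = pd.DataFrame(l_2d_dup, columns=['A', 'B', 'C'], index=['X', 'Y', 'Z'])

# B ascending; rows with the same B are ordered by C descending.
df_mixed = df.sort_values(['B', 'C'], ascending=[True, False])

print(df_mixed.index.tolist())
# ['X', 'Z', 'Y']
print(df_mixed.values.tolist())
# [[1, 3, 100], [1, 3, 2], [1, 200, 30]]
```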
If you omit the columns and index arguments when generating a pandas.DataFrame, the row and column names will default to sequential numbers starting from 0. Having numeric names for both rows and columns may look confusing, but it does not affect the processing.
df = pd.DataFrame(l_2d_dup)
print(df)
# 0 1 2
# 0 1 3 100
# 1 1 200 30
# 2 1 3 2
print(df.sort_values(2))
# 0 1 2
# 2 1 3 2
# 1 1 200 30
# 0 1 3 100
print(df.sort_values(2, axis=1))
# 0 2 1
# 0 1 100 3
# 1 1 30 200
# 2 1 2 3
print(df.sort_values([0, 2]))
# 0 1 2
# 2 1 3 2
# 1 1 200 30
# 0 1 3 100
sort_values() also supports other options, such as specifying the sorting order with the ascending argument.
You can also convert a pandas.DataFrame to a list or numpy.ndarray.
- Convert pandas.DataFrame, Series and list to each other
- Convert pandas.DataFrame, Series and numpy.ndarray to each other
For example, you can convert a pandas.DataFrame to a list as follows:
print(df.sort_values([0, 2]).values.tolist())
# [[1, 3, 2], [1, 200, 30], [1, 3, 100]]
print(type(df.sort_values([0, 2]).values.tolist()))
# <class 'list'>