Sorting in Pandas

It’s easy to sort the rows of a Pandas data frame by the values in one or more columns, using the sort_values method.

By default this returns a new, sorted data frame and leaves the original untouched. If you set the inplace parameter to True, the data frame you call the method on is itself sorted and the method returns None.
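
As a minimal sketch of that difference, the snippet below builds a small data frame directly in code (an assumption made so it runs without cakes.csv; the variable names are illustrative) and compares the two ways of calling sort_values.

```python
import pandas as pd

# Illustrative data, built inline so the example runs without cakes.csv.
df = pd.DataFrame(
    {"Sugar": [10.2, 15.3, 22.1], "Fat": [8.3, 7.3, 6.1], "Salt": [0.5, 0.2, 0.7]},
    index=["Vienna Tart", "Bakewell Tart", "Victoria Sponge"],
)

# Default call: a new, sorted data frame comes back and df keeps its order.
sorted_df = df.sort_values(by="Salt")
original_order = list(df.index)
sorted_order = list(sorted_df.index)

# inplace=True: df itself is reordered and the call returns None.
result = df.sort_values(by="Salt", inplace=True)
inplace_order = list(df.index)

print(original_order)
print(sorted_order)
print(result)
```

Because the in-place call returns None, writing df = df.sort_values(by="Salt", inplace=True) would replace the data frame with None, so use one style or the other.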

Here we’ll sort some cakes by salt content.

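cakes.csv
                      Sugar   Fat     Salt
"Vienna Tart"         10.2    8.3     0.5
"Bakewell Tart"       15.3    7.3     0.2
"Victoria Sponge"     22.1    6.1     0.7

import pandas as pd

# Columns are separated by runs of whitespace; the quoted cake names become the index.
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent Pandas versions.
df = pd.read_csv('cakes.csv', sep=r'\s+', index_col=0)
print(df)

# Sort the rows by the Salt column, smallest value first.
df.sort_values(by='Salt', inplace=True)
print(df)

                 Sugar  Fat  Salt
Vienna Tart       10.2  8.3   0.5
Bakewell Tart     15.3  7.3   0.2
Victoria Sponge   22.1  6.1   0.7

                 Sugar  Fat  Salt
Bakewell Tart     15.3  7.3   0.2
Vienna Tart       10.2  8.3   0.5
Victoria Sponge   22.1  6.1   0.7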

Notice that we print the data frame twice: once before sorting and once after.

Sorting by Multiple Columns

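Now let’s try sorting first by salt content, and then by sugar content, so that cakes with the same salt content are ordered by their sugar content.

We’ll also reverse the order by setting the ascending parameter to False, so the largest values come first.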

First we’ll need to add some more cakes to the table.

cakes.csv
                      Sugar   Fat     Salt
"Vienna Tart"         10.2    8.3     0.5
"Bakewell Tart"       15.3    7.3     0.2
"Victoria Sponge"     22.1    6.1     0.7
"Fairy Cake"          18.4    2.3     0.2
"Battenburg"          15.3    3.8     0.5

import pandas as pd

# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent Pandas versions.
df = pd.read_csv('cakes.csv', sep=r'\s+', index_col=0)

# Sort by Salt first, then by Sugar within equal Salt values, largest values first.
df.sort_values(by=['Salt', 'Sugar'], inplace=True, ascending=False)
print(df)

                 Sugar  Fat  Salt
Victoria Sponge   22.1  6.1   0.7
Battenburg        15.3  3.8   0.5
Vienna Tart       10.2  8.3   0.5
Fairy Cake        18.4  2.3   0.2
Bakewell Tart     15.3  7.3   0.2
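
The ascending parameter also accepts a list with one value per column in by, so each column can be given its own direction. Here is a minimal sketch, assuming the same five cakes built inline rather than read from cakes.csv, that sorts by salt from largest to smallest and breaks ties by sugar from smallest to largest.

```python
import pandas as pd

# Illustrative data matching the table above, built inline for a self-contained example.
df = pd.DataFrame(
    {
        "Sugar": [10.2, 15.3, 22.1, 18.4, 15.3],
        "Fat": [8.3, 7.3, 6.1, 2.3, 3.8],
        "Salt": [0.5, 0.2, 0.7, 0.2, 0.5],
    },
    index=["Vienna Tart", "Bakewell Tart", "Victoria Sponge", "Fairy Cake", "Battenburg"],
)

# One ascending flag per column in `by`: Salt descending, Sugar ascending.
mixed = df.sort_values(by=["Salt", "Sugar"], ascending=[False, True])
mixed_order = list(mixed.index)

print(mixed)
```

The list passed to ascending must have the same length as the list passed to by.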

Specifying an Axis

We can specify an axis for sorting with the axis parameter. This defaults to 0, meaning the rows are reordered. If we set axis to 1, the columns are reordered instead, and the by parameter must then name row labels rather than column labels. Here we order the columns by the values in the Vienna Tart row.

import pandas as pd

# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent Pandas versions.
df = pd.read_csv('cakes.csv', sep=r'\s+', index_col=0)
print(df)

# Reorder the columns by the values in the Vienna Tart row, smallest value first.
df.sort_values(by=['Vienna Tart'], inplace=True, axis=1)
print(df)

                 Sugar  Fat  Salt
Vienna Tart       10.2  8.3   0.5
Bakewell Tart     15.3  7.3   0.2
Victoria Sponge   22.1  6.1   0.7
Fairy Cake        18.4  2.3   0.2
Battenburg        15.3  3.8   0.5

                 Salt  Fat  Sugar
Vienna Tart       0.5  8.3   10.2
Bakewell Tart     0.2  7.3   15.3
Victoria Sponge   0.7  6.1   22.1
Fairy Cake        0.2  2.3   18.4
Battenburg        0.5  3.8   15.3