

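We can filter Pandas dataframes using the same boolean-mask syntax we use to filter NumPy arrays. A mask can be applied with square brackets directly on the dataframe, but passing it to loc makes it explicit that we are selecting rows, and loc also lets us pick columns in the same step.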
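As a rough illustration of the parallel with NumPy, the sketch below applies the same comparison to an array and to a small dataframe. The people dataframe and its values are made up for this example and are not taken from the mall customers data.

```python
import numpy as np
import pandas as pd

# NumPy: the comparison produces a boolean array that selects matching elements
ages = np.array([19, 66, 35, 66])
aged_66 = ages[ages == 66]
print(aged_66)

# Pandas: the comparison produces a boolean Series, which loc uses to select rows
# (hypothetical data, for illustration only)
people = pd.DataFrame({'Age': [19, 66, 35, 66], 'Income': [15, 63, 120, 63]})
mask = people['Age'] == 66
rows_aged_66 = people.loc[mask]
print(rows_aged_66)

# loc can also select columns in the same step
incomes_aged_66 = people.loc[mask, 'Income']
print(incomes_aged_66)
```

In both cases the comparison is evaluated element by element, and only the positions marked True are kept.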

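Separate conditions can be combined using the bitwise operators & (and), | (or) and ~ (not) rather than Python's and, or and not keywords. Each condition must be wrapped in round brackets, because these operators bind more tightly than comparisons such as == and >.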

Here are two examples. We load the mall_customers dataset, then select everyone aged 66. Then we select all females with an income over 110k (the Income column is in thousands).

import pandas as pd

df = pd.read_csv('mall_customers.csv', index_col=0)
df.columns = ['Gender', 'Age', 'Income', 'Spending']

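# Everyone aged exactly 66
filtered1 = df.loc[df['Age'] == 66]
print(filtered1)

# Females with an income over 110 (the Income column is in thousands)
filtered2 = df.loc[(df['Gender'] == 'Female') & (df['Income'] > 110)]
print(filtered2)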

            Gender  Age  Income  Spending
107         Female   66      63        50
110           Male   66      63        48

            Gender  Age  Income  Spending
194         Female   38     113        91
195         Female   47     120        16
196         Female   35     120        79
197         Female   45     126        28
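The same pattern extends to "or" and "not" conditions. The sketch below is a minimal example on a handful of made-up rows (the sample dataframe is hypothetical and only borrows the column names used above). It also shows what happens when the round brackets are left out.

```python
import pandas as pd

# Hypothetical sample rows using the same column names as above
sample = pd.DataFrame(
    {
        'Gender': ['Male', 'Female', 'Female', 'Male'],
        'Age': [19, 66, 35, 70],
        'Income': [15, 63, 120, 49],
        'Spending': [39, 50, 79, 55],
    },
    index=[1, 107, 196, 71],
)

# | means "or": keep rows where either condition holds
young_or_rich = sample.loc[(sample['Age'] < 20) | (sample['Income'] > 110)]
print(young_or_rich)

# ~ means "not": keep rows where the condition does not hold
not_female = sample.loc[~(sample['Gender'] == 'Female')]
print(not_female)

# Without the round brackets, | is evaluated before the comparisons,
# so Python ends up with a chained comparison it cannot reduce to True/False
try:
    sample.loc[sample['Age'] < 20 | sample['Income'] > 110]
    brackets_needed = False
except ValueError:
    brackets_needed = True
print(brackets_needed)
```

The error in the last case is a useful signal: if a filter complains that the truth value of a Series is ambiguous, missing brackets are a likely cause.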

Notice that in this example we’ve specified that the first column (customer ID) should be used as the index (rather than letting Pandas generate one), and we’ve renamed the columns to make them easier to refer to.
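To see the effect of index_col without needing the file, the sketch below reads a two-row stand-in for the CSV from a string. The header names in csv_text are an assumption about what the original file contains; the two data rows are the ones shown in the first output above.

```python
import io
import pandas as pd

# Stand-in for mall_customers.csv; the header names are assumed, not confirmed
csv_text = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
107,Female,66,63,50
110,Male,66,63,48
"""

# Without index_col, Pandas generates an index (0, 1, ...) and
# keeps the customer ID as an ordinary column
generated = pd.read_csv(io.StringIO(csv_text))
print(generated.index.tolist())

# With index_col=0, the first column becomes the index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Shorter column names are easier to use in filter conditions
indexed.columns = ['Gender', 'Age', 'Income', 'Spending']
print(indexed.index.tolist())

# Rows can now be looked up by customer ID
age_of_107 = indexed.loc[107, 'Age']
print(age_of_107)
```

Using the customer ID as the index means the filtered results keep the original IDs as row labels, which is why the outputs above show 107 and 110 rather than positions.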
