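We can filter Pandas dataframes using the same kind of boolean-mask syntax we use to filter NumPy arrays. Here we pass the condition to loc rather than applying square brackets directly to the dataframe (df[condition] also works for simple row filters, but loc lets us select columns in the same step).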
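Separate conditions can be combined using the bitwise operators & (and), | (or) and ~ (not), with each condition in round brackets to ensure correct operator precedence. Python's and, or and not keywords do not work element-wise, so they cannot be used here.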
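As a minimal sketch of the parallel with NumPy, the snippet below applies the same kind of mask to an array and to a small dataframe. The ages values and the toy dataframe are made up for illustration and are not part of the mall_customers data.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not taken from mall_customers.csv
ages = np.array([19, 66, 35, 66, 47])
toy = pd.DataFrame({'Age': ages})

# NumPy: a boolean mask inside square brackets
mask = (ages > 30) & (ages < 60)
print(ages[mask])

# Pandas: the same kind of mask, passed to loc
toy_filtered = toy.loc[(toy['Age'] > 30) & (toy['Age'] < 60)]
print(toy_filtered)
```

The round brackets matter because & binds more tightly than > and <. Without them, Python would evaluate 30 & ages first, which is not the intended condition.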
Here are two examples. We load the mall_customers dataset, then display everyone aged 66. Then we display all females with an income over 110k (the Income column is in thousands, so the code compares against 110).
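import pandas as pd
# Use the first column (customer ID) as the index
df = pd.read_csv('mall_customers.csv', index_col=0)
# Shorter column names are easier to refer to
df.columns = ['Gender', 'Age', 'Income', 'Spending']
# Single condition: everyone aged 66
filtered1 = df.loc[df['Age'] == 66]
# Two conditions joined with &, each in round brackets
filtered2 = df.loc[(df['Gender'] == 'Female') & (df['Income'] > 110)]
print(filtered1)
print()
print(filtered2)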
           Gender  Age  Income  Spending
CustomerID
107        Female   66      63        50
110          Male   66      63        48

           Gender  Age  Income  Spending
CustomerID
194        Female   38     113        91
195        Female   47     120        16
196        Female   35     120        79
197        Female   45     126        28
Notice that in this example we’ve specified that the first column (customer ID) should be used as the index (rather than letting Pandas generate one), and we’ve renamed the columns to make them easier to refer to.
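The other operators follow the same pattern. The sketch below uses a small made-up dataframe (the demo name and its four rows are assumptions for illustration, reusing the column names above) to show | for or, ~ for not, and a column selection passed to loc alongside the row condition.

```python
import pandas as pd

# Made-up rows that reuse the column names from the example above
demo = pd.DataFrame(
    {'Gender': ['Male', 'Female', 'Female', 'Male'],
     'Age': [66, 38, 23, 45],
     'Income': [63, 113, 40, 126],
     'Spending': [48, 91, 70, 28]},
    index=pd.Index([1, 2, 3, 4], name='CustomerID'))

# OR: anyone aged under 25 or with an income over 110
young_or_rich = demo.loc[(demo['Age'] < 25) | (demo['Income'] > 110)]

# NOT: everyone whose gender is not 'Male'
not_male = demo.loc[~(demo['Gender'] == 'Male')]

# Row condition plus a list of columns to keep
ages_only = demo.loc[demo['Income'] > 110, ['Age']]

print(young_or_rich)
print(not_male)
print(ages_only)
```

Passing the column list as the second argument to loc avoids chaining a second square-bracket selection onto the filtered result.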