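We can filter pandas DataFrames using the same kind of boolean-mask syntax we use to filter NumPy arrays. Here we pass the condition to loc rather than applying square brackets directly to the DataFrame; both forms accept a boolean mask, but loc makes it explicit that we are selecting rows.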
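To make the parallel with NumPy concrete, here is a minimal sketch using a handful of made-up ages (the values and the names ages, age_mask, people and matches are assumptions for illustration, not part of the mall_customers data).

```python
import numpy as np
import pandas as pd

# Hypothetical ages, made up for illustration
ages = np.array([19, 66, 35, 66, 47])

# NumPy: the comparison yields a boolean array, which then indexes the array
age_mask = ages == 66
print(ages[age_mask])

# pandas: the comparison yields a boolean Series, which loc uses to pick rows
people = pd.DataFrame({'Age': ages})
matches = people.loc[people['Age'] == 66]
print(matches)
```

In both cases the comparison is evaluated element by element first, and the resulting True/False values decide which entries are kept.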
Separate conditions can be combined using the bitwise operators & (and), | (or) and ~ (not). Each condition must be wrapped in round brackets, because these operators bind more tightly than comparisons such as == and >.
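As a rough sketch of how the three operators behave, the snippet below filters a tiny made-up table (the rows and the variable names are hypothetical, chosen only to illustrate the syntax).

```python
import pandas as pd

# Hypothetical rows, made up for illustration
customers = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male'],
    'Age': [19, 66, 35, 70],
    'Income': [15, 63, 120, 113],
})

# | keeps rows where either condition holds
young_or_old = customers.loc[(customers['Age'] < 20) | (customers['Age'] > 65)]

# ~ inverts a condition
not_female = customers.loc[~(customers['Gender'] == 'Female')]

# & keeps rows where both conditions hold
rich_females = customers.loc[
    (customers['Gender'] == 'Female') & (customers['Income'] > 110)
]

print(young_or_old)
print(not_female)
print(rich_females)
```

If the round brackets are left out, Python tries to evaluate 'Female' & customers['Income'] before the comparisons, and pandas typically raises an error rather than filtering anything.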
Here are two examples. We load the mall_customers dataset, then display everyone aged 66, followed by all females with an income over 110k (the Income column is in thousands).
    import pandas as pd

    df = pd.read_csv('mall_customers.csv', index_col=0)
    df.columns = ['Gender', 'Age', 'Income', 'Spending']

    filtered1 = df.loc[df['Age'] == 66]
    filtered2 = df.loc[(df['Gender'] == 'Female') & (df['Income'] > 110)]

    print(filtered1)
    print()
    print(filtered2)
               Gender  Age  Income  Spending
    CustomerID
    107        Female   66      63        50
    110          Male   66      63        48

               Gender  Age  Income  Spending
    CustomerID
    194        Female   38     113        91
    195        Female   47     120        16
    196        Female   35     120        79
    197        Female   45     126        28
Notice that in this example we’ve passed index_col=0, so the first column (customer ID) is used as the index rather than pandas generating one, and we’ve renamed the four remaining columns to make them easier to refer to.
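The effect of index_col=0 can be seen with a small sketch that reads CSV text from memory instead of from the file. The two data rows are taken from the output above, but the header names in csv_text are assumptions about what the original file might contain.

```python
import io
import pandas as pd

# Hypothetical CSV text; the header names are assumed, not taken from the real file
csv_text = """CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
107,Female,66,63,50
110,Male,66,63,48
"""

# Default: pandas generates a 0-based index and keeps CustomerID as a column
generated = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the index, leaving four data columns
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
indexed.columns = ['Gender', 'Age', 'Income', 'Spending']

print(generated.index.tolist())
print(indexed.index.tolist())

# With the customer ID as the index, loc can look a row up by its ID
print(indexed.loc[107, 'Age'])
```

Because the customer ID has moved into the index, only four column names are needed when renaming, and rows can be selected by ID with loc as well as by a boolean condition.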