Referencing Data with loc and iloc

We’ll begin by selecting 3 of the 4 columns from the iris flower dataset. We’ll also rename the columns slightly to make them easier to work with.

To select entire columns, we specify the column name or a list of column names in square brackets.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = pd.DataFrame(iris['data'])

df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width']

df = df[['sepal length', 'petal length', 'petal width']]

print(df)
     sepal length  petal length  petal width
0             5.1           1.4          0.2
1             4.9           1.4          0.2
2             4.7           1.3          0.2
3             4.6           1.5          0.2
4             5.0           1.4          0.2
..            ...           ...          ...
145           6.7           5.2          2.3
146           6.3           5.0          1.9
147           6.5           5.2          2.0
148           6.2           5.4          2.3
149           5.9           5.1          1.8

[150 rows x 3 columns]
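As a small aside on the square-bracket syntax: a single column name gives back a Series, while a list of names gives back a DataFrame. The sketch below uses a made-up three-row frame rather than the real iris data, so it runs without scikit-learn; the variable names are purely illustrative.

```python
import pandas as pd

# Tiny hypothetical frame standing in for the iris data
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 4.7],
    'petal length': [1.4, 1.4, 1.3],
    'petal width': [0.2, 0.2, 0.2],
})

one_column = df['petal length']                    # single name -> Series
two_columns = df[['sepal length', 'petal width']]  # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.shape)
```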

iloc

To refer to Pandas cells by integer position, you can use iloc (integer location).

Supply the row index or range first, then the column index.

Here we set the cells in rows 2 up to (but not including) 6 to 9.9, in column 1 only.

Then we display the first 10 rows of the dataframe using head.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = pd.DataFrame(iris['data'])

df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
df = df[['sepal length', 'petal length', 'petal width']]

df.iloc[2:6, 1] = 9.9

print(df.head(10))
   sepal length  petal length  petal width
0           5.1           1.4          0.2
1           4.9           1.4          0.2
2           4.7           9.9          0.2
3           4.6           9.9          0.2
4           5.0           9.9          0.2
5           5.4           9.9          0.4
6           4.6           1.4          0.3
7           5.0           1.5          0.2
8           4.4           1.4          0.2
9           4.9           1.5          0.1
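The same syntax also reads values back. Here is a minimal sketch of a few common iloc lookups, again on a small made-up frame rather than the full iris data.

```python
import pandas as pd

# Hypothetical four-row frame, used only for illustration
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 4.7, 4.6],
    'petal length': [1.4, 1.4, 1.3, 1.5],
    'petal width': [0.2, 0.2, 0.2, 0.2],
})

cell = df.iloc[0, 1]              # row 0, column 1: a single value
rows = df.iloc[1:3]               # rows 1 and 2 (3 is excluded), all columns
last_row = df.iloc[-1]            # negative positions count from the end
picked = df.iloc[[0, 3], [0, 2]]  # lists of row and column positions

print(cell)
print(rows)
print(last_row)
print(picked)
```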

We can supply a list of items to set the cell values to if we prefer.

Note that you can also mix data types in Pandas. Here we’ll set the values up to (but not including) the column at index 3, in row 4, to some strings.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = pd.DataFrame(iris['data'])

df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
df = df[['sepal length', 'petal length', 'petal width']]

df.iloc[4, :3] = ['hello', 'to', 'you']

print(df.head(10))
  sepal length petal length petal width
0          5.1          1.4         0.2
1          4.9          1.4         0.2
2          4.7          1.3         0.2
3          4.6          1.5         0.2
4        hello           to         you
5          5.4          1.7         0.4
6          4.6          1.4         0.3
7          5.0          1.5         0.2
8          4.4          1.4         0.2
9          4.9          1.5         0.1
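Mixing types has a cost: a column that holds both numbers and strings can no longer be stored as float64 and ends up with the generic object dtype. Newer Pandas releases may also warn about (or refuse) writing a string straight into a float column, so the sketch below converts the frame to object first. That astype(object) step is an assumption about your Pandas version, and the three-row frame is made up for the example.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 4.7],
    'petal length': [1.4, 1.4, 1.3],
    'petal width': [0.2, 0.2, 0.2],
})
print(df.dtypes)  # every column starts out as float64

# Convert to the generic object dtype so strings and numbers can coexist
df = df.astype(object)
df.iloc[1, :3] = ['hello', 'to', 'you']

print(df)
print(df.dtypes)  # every column is now object
```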

loc

Both iloc and loc allow you to either set cell values, or retrieve them.

However, where iloc works with integer positions, loc works with labels.
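The difference is easiest to see when the row labels are themselves integers that do not match the row positions. The following sketch uses a hypothetical frame whose rows are labelled 10, 20 and 30.

```python
import pandas as pd

# Row labels (10, 20, 30) deliberately differ from row positions (0, 1, 2)
df = pd.DataFrame({'value': ['a', 'b', 'c']}, index=[10, 20, 30])

by_position = df.iloc[0, 0]     # first row, first column, by position
by_label = df.loc[10, 'value']  # row labelled 10, column labelled 'value'

# There is no row labelled 0, so loc cannot find it
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(by_position, by_label, label_zero_exists)
```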

To illustrate the use of loc as clearly as possible, let’s first create a table using the following data.

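Notice we’ve used delim_whitespace to set white space as the delimiter. Since the cake names have spaces in them, we must place those in quotes so that each name is read as a single field. (From Pandas 2.2 onwards delim_whitespace is deprecated; sep=r'\s+' is the equivalent.)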

cakes.csv
                      Sugar   Fat     Salt
"Vienna Tart"         10.2    8.3     0.5
"Bakewell Tart"       15.3    7.3     0.2
"Victoria Sponge"     22.1    6.1     0.7

import pandas as pd

df = pd.read_csv('cakes.csv', delim_whitespace=True, index_col=0)

print(df)
                 Sugar  Fat  Salt
Vienna Tart       10.2  8.3   0.5
Bakewell Tart     15.3  7.3   0.2
Victoria Sponge   22.1  6.1   0.7
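If you would rather try this without creating a file, the sketch below reads the same text from an in-memory string. It assumes io.StringIO as a stand-in for the file and uses sep=r'\s+', which splits on runs of white space in the same way.

```python
import io
import pandas as pd

# The same cake data as above, held in a string instead of a file
cakes_text = '''                      Sugar   Fat     Salt
"Vienna Tart"         10.2    8.3     0.5
"Bakewell Tart"       15.3    7.3     0.2
"Victoria Sponge"     22.1    6.1     0.7
'''

# sep=r'\s+' treats any run of white space as one delimiter;
# the quotes keep each two-word cake name together as a single field
df = pd.read_csv(io.StringIO(cakes_text), sep=r'\s+', index_col=0)

print(df)
```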

We can now reference or set values in the data frame using the row and column labels.

Here’s all the Vienna Tart data, returned as a Pandas Series.

import pandas as pd

df = pd.read_csv('cakes.csv', delim_whitespace=True, index_col=0)

print(df.loc['Vienna Tart'])
Sugar    10.2
Fat       8.3
Salt      0.5
Name: Vienna Tart, dtype: float64

Here is the fat data for Vienna Tarts and Bakewell Tarts. Notice that here, the range does include the value specified as the end of the range, unlike with iloc.

import pandas as pd

df = pd.read_csv('cakes.csv', delim_whitespace=True, index_col=0)

print(df.loc['Vienna Tart':'Bakewell Tart', 'Fat'])
Vienna Tart      8.3
Bakewell Tart    7.3
Name: Fat, dtype: float64
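To make the end-of-range difference concrete, here is a sketch that takes the "same" slice both ways. The frame is built inline from the cake figures so that the example does not depend on the file.

```python
import pandas as pd

df = pd.DataFrame(
    {'Sugar': [10.2, 15.3, 22.1], 'Fat': [8.3, 7.3, 6.1], 'Salt': [0.5, 0.2, 0.7]},
    index=['Vienna Tart', 'Bakewell Tart', 'Victoria Sponge'],
)

# loc: the end label 'Bakewell Tart' is included
by_label = df.loc['Vienna Tart':'Bakewell Tart', 'Fat']

# iloc: the end position 1 is excluded, so only row 0 comes back
by_position = df.iloc[0:1, 1]

print(len(by_label))
print(len(by_position))
```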

Assigning through loc or iloc in a single step, as below, always changes the original data frame. Retrieving a selection is different: depending on how you specify the values, Pandas may hand back either a view or a copy, so you should not rely on changes made to a retrieved object reaching the original. If you need an independent object, call .copy() on the selection.

import pandas as pd

df = pd.read_csv('cakes.csv', delim_whitespace=True, index_col=0)

df.loc['Vienna Tart':'Bakewell Tart', 'Fat'] = [11, 22]

print(df)
                 Sugar   Fat  Salt
Vienna Tart       10.2  11.0   0.5
Bakewell Tart     15.3  22.0   0.2
Victoria Sponge   22.1   6.1   0.7
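When you want to work on a selection without touching the original, an explicit copy removes any doubt. This is a minimal sketch with the cake figures built inline; the name tarts is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame(
    {'Sugar': [10.2, 15.3, 22.1], 'Fat': [8.3, 7.3, 6.1], 'Salt': [0.5, 0.2, 0.7]},
    index=['Vienna Tart', 'Bakewell Tart', 'Victoria Sponge'],
)

# An explicit copy is independent of the original frame
tarts = df.loc['Vienna Tart':'Bakewell Tart'].copy()
tarts.loc['Vienna Tart', 'Fat'] = 99.0

# The original still holds its own value for that cell
original_fat = df.loc['Vienna Tart', 'Fat']

# Assigning through loc on the original, in one step, changes the original
df.loc['Vienna Tart', 'Fat'] = 11.0

print(original_fat)
print(tarts)
print(df)
```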
