The pandas package lets us create “dataframes”, which are a lot like sheets in a spreadsheet app.
This means you can handle numerical and textual data together in the same data structure.
There are many different ways to create a dataframe, and we’ll take a look at a few here.
Creating and Populating Dataframes from Scratch
We can create an empty dataframe and then populate it one column at a time.
import pandas as pd
df = pd.DataFrame()
df['weight'] = [80, 50, 90]
df['height'] = [182, 159, 181]
print(df)
   weight  height
0      80     182
1      50     159
2      90     181
Notice the first pseudo-column is an index that labels all the rows. It can be accessed via df.index, which is an Index object (here a RangeIndex) that you can iterate over.
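To make the index a bit more concrete, here is a small sketch that reuses the weight and height values from above. The variable names labels and first_row are just mine for illustration.

```python
import pandas as pd

df = pd.DataFrame({'weight': [80, 50, 90], 'height': [182, 159, 181]})

# The default index simply counts the rows.
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# It can be looped over, or converted to a plain list of row labels.
labels = list(df.index)
print(labels)

# A row can be looked up by its index label with .loc.
first_row = df.loc[0]
print(first_row['weight'])
```

If you later give the rows meaningful labels, .loc lookups work the same way with those labels.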
We can alternatively supply data in a dictionary.
import pandas as pd
# Each key becomes a column name; each list holds that column's values.
data = {
    'weight': [80, 50, 90],
    'height': [182, 159, 181],
}
df = pd.DataFrame(data)
print(df)
   weight  height
0      80     182
1      50     159
2      90     181
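Another way to build a dataframe from scratch, sketched here on the assumption that your data arrives one record at a time, is to pass a list of dictionaries. Each dictionary is treated as a row. The name rows is only illustrative.

```python
import pandas as pd

# Each dictionary is one row; the keys become the column names.
rows = [
    {'weight': 80, 'height': 182},
    {'weight': 50, 'height': 159},
    {'weight': 90, 'height': 181},
]
df_rows = pd.DataFrame(rows)
print(df_rows)
```

This tends to be convenient when the rows come from a loop or from parsed records, whereas the column-wise dictionary suits data you already hold as lists.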
Loading Existing Data
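We can easily load CSV data from files. Suppose the file heightweight.txt contains the following:
weight,height
80,182
50,159
90,181
import pandas as pd
# The first line of the file is used for the column names.
df = pd.read_csv('heightweight.txt', delimiter=',')
print(df)
   weight  height
0      80     182
1      50     159
2      90     181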
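If you want to try read_csv without creating a file first, one option is to wrap the text in io.StringIO, since read_csv accepts file-like objects as well as paths. The sketch below also uses semicolons as separators to show what the delimiter argument is for. The text string is a made-up stand-in for a real file.

```python
import io
import pandas as pd

# Stand-in for a file on disk: the same numbers, separated by semicolons.
text = "weight;height\n80;182\n50;159\n90;181\n"

# delimiter tells read_csv which character separates the fields.
df_csv = pd.read_csv(io.StringIO(text), delimiter=';')
print(df_csv)

# The numeric columns are parsed as integers, not strings.
print(df_csv.dtypes)
```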
Loading Larger Datasets
Now let’s try to load the iris flower dataset. We’ll add the species into its own column.
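import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
# as_frame=True returns the measurements as a pandas dataframe.
iris = load_iris(as_frame=True)
df = pd.DataFrame(iris['data'])
# iris['target'] holds integer codes (0, 1, 2); map each code to its species name.
df['species'] = np.choose(iris['target'], iris['target_names'])
print(df)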
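The species step is just a lookup from integer codes to names. Here is a minimal sketch of that idea on its own, using small made-up arrays (codes and names) as stand-ins for iris['target'] and iris['target_names'], so it runs without scikit-learn.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for iris['target'] and iris['target_names'].
codes = pd.Series([0, 0, 1, 2, 1])
names = np.array(['setosa', 'versicolor', 'virginica'])

# np.choose picks names[code] for every code ...
via_choose = np.choose(codes.to_numpy(), names)

# ... which gives the same result as indexing the names array with the codes.
via_indexing = names[codes.to_numpy()]

df_species = pd.DataFrame({'code': codes, 'species': via_indexing})
print(df_species)
```

Either form should do; indexing is perhaps the more common idiom, while np.choose reads closer to "choose a name for each code".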
The printed iris dataframe is easier to read in a terminal. In a browser the rows may be too long to fit on one line and will wrap, though this depends on your font size.
Notice that by default not all rows are displayed. If you have many columns, some of those may be hidden too. Usually this is what you want, but the display limits can be changed if needed.
The output of the next listing is rather long, so I won’t reproduce it. But notice the calls to pd.set_option.
import pandas as pd
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris(as_frame=True)
df = pd.DataFrame(iris['data'])
df['species'] = np.choose(iris['target'], iris['target_names'])
# Raise the display limits so that all 150 rows and every column are printed.
pd.set_option("display.max_rows", 200)
pd.set_option("display.max_columns", 20)
print(df)
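Note that set_option changes the setting for the rest of the session. If you only want a different limit for a single print, pandas also provides option_context, which restores the previous value when the with-block ends. A small sketch, using a made-up 100-row frame called big:

```python
import pandas as pd

# A hypothetical frame with 100 rows.
big = pd.DataFrame({'n': range(100)})

# Remember the current setting so we can check it is left untouched.
before = pd.get_option("display.max_rows")

# With a low limit, the middle rows are replaced by "...".
with pd.option_context("display.max_rows", 10):
    short_text = repr(big)

# With a high limit, every row is shown.
with pd.option_context("display.max_rows", 200):
    full_text = repr(big)

print(len(short_text.splitlines()), len(full_text.splitlines()))

# Outside the with-blocks the option is back to what it was.
after = pd.get_option("display.max_rows")
print(before == after)
```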