Pandas

  • Zipf’s Law: A Mysterious Word Frequency Law

    Zipf’s law says that, in any large body of ordinary text in any language, the probability of a word occurring is approximately inversely proportional to its frequency rank. For example, suppose the most common word (frequency rank 1) is ‘the’, the second most common word is ‘and’ (frequency rank 2) and the third most common word…
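
    As a rough sketch (not the post’s own code), the ranks and probabilities that Zipf’s law talks about can be computed with Pandas. The sample sentence is made up and far too short to show the law itself. It only illustrates the mechanics: count words, rank them, and check whether probability × rank stays roughly constant, which is what the law predicts for a large corpus.

```python
import re
import pandas as pd

# A tiny made-up sample; real Zipf behaviour only shows up in large corpora.
text = "the cat sat on the mat and the dog sat on the rug"

# Lower-case the text and split it into words.
words = re.findall(r"[a-z']+", text.lower())

# Count each word; value_counts sorts from most to least frequent.
counts = pd.Series(words).value_counts()

# Turn the counts into a table with one row per distinct word.
freq = counts.rename_axis("word").reset_index(name="count")

# Rank 1 is the most frequent word (ties get arbitrary but distinct ranks here).
freq["rank"] = freq.index + 1

# Relative frequency, used as an estimate of each word's probability.
freq["probability"] = freq["count"] / freq["count"].sum()

# Under Zipf's law this product should be roughly constant for large texts.
freq["prob_x_rank"] = freq["probability"] * freq["rank"]

print(freq)
```

    With a real book-length text in place of the sample sentence, the `prob_x_rank` column is where any Zipf-like pattern would show up.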

  • Filtering

    We can filter Pandas dataframes using the same kind of boolean-mask syntax we use to filter NumPy arrays. Here we use loc rather than applying square brackets directly to the dataframe; plain square brackets also accept a boolean mask, but loc additionally lets us choose which columns to return. Separate conditions can be combined using the bitwise-style operators & (and), | (or) and ~ (not), with each condition in round brackets to ensure correct operator precedence. Here are two examples.…
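
    A minimal sketch of filtering with loc, using a small made-up dataframe (the names, ages and cities are hypothetical, not the post’s data). It shows a single condition, two conditions combined with &, and a row filter plus a column selection in one loc call.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cat", "Dan"],
    "age": [23, 35, 41, 19],
    "city": ["Leeds", "York", "Leeds", "Hull"],
})

# A boolean mask: one True/False value per row.
mask = df["age"] > 30

# Keep only the rows where the mask is True.
print(df.loc[mask])

# Combine conditions with & (and), | (or), ~ (not).
# Each condition needs its own round brackets because & and |
# bind more tightly than comparisons such as > and ==.
leeds_over_30 = df.loc[(df["age"] > 30) & (df["city"] == "Leeds")]
print(leeds_over_30)

# loc also accepts a column selection after the row mask.
print(df.loc[df["city"] != "Leeds", ["name", "age"]])
```

    Leaving out the brackets, as in `df["age"] > 30 & df["city"] == "Leeds"`, makes Python evaluate `30 & df["city"]` first, which is why the round brackets matter.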

  • Grouping

    We can group Pandas dataframes by one or more columns. Rather than a DataFrame, this returns a DataFrameGroupBy object. You can then call aggregate methods on this to summarise the data; for example, mean(), sum() and count(). You can also retrieve particular columns via square brackets and run aggregate functions on them. Here we load…
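
    A short sketch of grouping, assuming a hypothetical sales table (the columns `region`, `product` and `units` are invented for illustration). It shows that `groupby` gives a DataFrameGroupBy rather than a DataFrame, how to aggregate one column picked out with square brackets, and how to group by two columns with several aggregates at once.

```python
import pandas as pd

# Hypothetical sales data, made up for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, 4, 6, 8, 5],
})

grouped = sales.groupby("region")
print(type(grouped))  # a DataFrameGroupBy, not a DataFrame

# Select one column with square brackets, then aggregate it per group.
units_per_region = grouped["units"].sum()
print(units_per_region)

# Group by more than one column and apply several aggregates together.
summary = sales.groupby(["region", "product"])["units"].agg(["sum", "mean", "count"])
print(summary)
```

    The group keys become the index of the result, so `units_per_region["North"]` looks up the North total directly.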

  • Sorting in Pandas

    It’s easy to sort data in a Pandas data frame using the sort_values method. By default this returns a new, sorted data frame and leaves the original unchanged; if you set the inplace parameter to True, the data frame you call the method on is sorted in place instead (and the method returns None). Here we’ll sort some cakes from cakes.csv by salt content. Notice we’re printing…
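
    A minimal sketch of the two behaviours of sort_values. The cake names and salt figures are made up and are not the contents of cakes.csv; the column name `salt_g` is likewise an assumption.

```python
import pandas as pd

# Made-up values; NOT the contents of the post's cakes.csv.
cakes = pd.DataFrame({
    "cake": ["Sponge", "Carrot", "Fruit", "Brownie"],
    "salt_g": [0.4, 0.6, 0.3, 0.5],
})

# Default: returns a new, sorted data frame; `cakes` is unchanged.
by_salt = cakes.sort_values("salt_g")
print(by_salt)

# Descending order, highest salt first.
print(cakes.sort_values("salt_g", ascending=False))

# inplace=True sorts `cakes` itself and the method returns None.
result = cakes.sort_values("salt_g", inplace=True)
print(result)
print(cakes)
```

    Sorting keeps the original row labels attached to each row, so the index of `by_salt` comes out in a shuffled order rather than 0, 1, 2, 3.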

  • Referencing Data with loc and iloc

    We’ll begin by selecting only 3 of the 4 columns from the iris flower dataset. Also we’ll rename the columns slightly, to make them easier to work with. To select entire columns, we specify the column name or a list of column names in square brackets. iloc: To refer to Pandas cells via indices, you…
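
    A small sketch contrasting column selection, loc and iloc. It does not load the real iris file; the three rows below are hypothetical iris-style values, and the snake_case column names are our own choice rather than the names used in the post.

```python
import pandas as pd

# Hypothetical iris-style rows and column names, for illustration only.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "petal_length": [1.4, 1.4, 6.0],
    "species": ["setosa", "setosa", "virginica"],
})

# Whole columns: one name gives a Series, a list of names gives a DataFrame.
print(iris["species"])
print(iris[["sepal_length", "species"]])

# loc selects by label: row labels first, then column labels.
print(iris.loc[2, "species"])            # a single cell
print(iris.loc[0:1, ["petal_length"]])   # label slices INCLUDE the end label

# iloc selects by integer position, much like a 2D NumPy array.
print(iris.iloc[2, 2])                   # row position 2, column position 2
print(iris.iloc[0:2, 0:2])               # position slices EXCLUDE the end
```

    Here the row labels happen to equal the row positions, so the main visible difference is the slicing rule: `loc[0:1]` returns two rows, while `iloc[0:2]` is needed for the same two rows.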

  • Referencing Columns

    To obtain the row indices in Pandas, we use the index property. To obtain an entire column, we put the column name in square brackets, rather as if we were specifying an index in a 2D array. This returns a Pandas Series object, which represents a single column or data series. This can also be used to create…
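
    A brief sketch of the index property and single-column selection, using a hypothetical dataframe with string row labels (the people and measurements are made up). It shows that a column comes back as a Series that keeps the dataframe’s row labels.

```python
import pandas as pd

# Hypothetical data with string row labels, for illustration.
df = pd.DataFrame(
    {"height_cm": [170, 182, 165], "weight_kg": [65, 80, 58]},
    index=["ann", "bob", "cat"],
)

# The row labels are available through the index property.
print(df.index)

# A single column name in square brackets gives a pandas Series.
heights = df["height_cm"]
print(type(heights))

# The Series keeps the row labels, so values can be looked up by label.
print(heights["bob"])  # 182
```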

  • Creating DataFrames

    The Pandas package enables us to create “dataframes”, which are a lot like sheets in a spreadsheet app. This means you can handle numerical and textual data together in the same data structure. There are many different ways to create a dataframe, and we’ll take a look at a few here. Creating and Populating Dataframes…
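
    A sketch of three common ways to build the same small dataframe. These are not necessarily the approaches the post goes on to cover; the item names and prices are made up. Each version mixes a text column with a numeric one, which is the spreadsheet-like property described above.

```python
import pandas as pd

# 1. From a dict of columns: the keys become the column names.
df1 = pd.DataFrame({
    "item": ["pen", "ink", "pad"],
    "price": [1.20, 3.50, 2.00],
})

# 2. From a list of rows, with the column names passed separately.
rows = [["pen", 1.20], ["ink", 3.50], ["pad", 2.00]]
df2 = pd.DataFrame(rows, columns=["item", "price"])

# 3. Start from an empty dataframe and add columns one at a time.
df3 = pd.DataFrame()
df3["item"] = ["pen", "ink", "pad"]
df3["price"] = [1.20, 3.50, 2.00]

print(df1)
print(df1.dtypes)  # a text column and a float column in one structure
print(df2)
print(df3)
```

    The dict form is usually the most readable when the data is already organised by column; the list-of-rows form suits data that arrives one record at a time.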

Blog at WordPress.com.