Other Useful Techniques

  • Naive Bayes

    Naive Bayes is a technique that has found wide application, notably in spam filters. While Bayes’ Theorem is a theorem in mathematics, there is no “Naive Bayes’ Theorem”. Rather, the “naive” refers to the simplifying assumption that, within each class, the probability of one value occurring in your data is independent of the probabilities of the other values.… (A short code sketch for this technique appears after this list.)

    Read more

  • Decision Tree Classifiers

    Decision tree classifiers work by repeatedly splitting your data samples on data series values, at every stage attempting to reduce the degree to which the resulting subsets are “mixed”, as judged by Gini impurity or Shannon entropy. For example, if you have a collection of measurements on plants, a decision tree classifier might first… (A sketch of such a classifier follows after this list.)

    Read more

  • Using PCA and Logistic Regression to Predict Breast Cancer Diagnosis

    Let’s take a look at applying PCA to a dataset with many more than just a few data series. The Wisconsin breast cancer dataset from Scikit-learn contains thirty columns of data, describing the shapes of cell nuclei measured from digitised images of breast tumour biopsies. We are also told whether the diagnosis… (A sketch of a PCA-plus-logistic-regression pipeline follows after this list.)

    Read more

  • PCA for the Iris Flower Dataset

    In this post we’ll analyse the Iris Flower Dataset using principal component analysis and agglomerative clustering. We’ll use PCA both to reduce the number of data series we’re feeding to our agglomerative clustering model (potentially making clustering more efficient, although in this case we’ve only got a total of four data series so it won’t… (A sketch combining PCA with agglomerative clustering follows after this list.)

    Read more

  • Principal Component Analysis

    In this course we’ve created graphs of the well-known iris flower dataset repeatedly, but we were always faced with a frustrating choice. Even though we’ve often used all four data series in the dataset to fit models, we could only show two of them on any one plot, because plots are 2D. By using 3D plots… (A sketch of a PCA projection for plotting follows after this list.)

    Read more
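
As a taster for the Naive Bayes post, here is a minimal spam-filter sketch using Scikit-learn’s MultinomialNB. The tiny messages and labels are made up purely for illustration and aren’t drawn from the post itself.

```python
# Minimal Naive Bayes spam-filter sketch (illustrative data only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical example messages, not taken from the post.
messages = [
    "win a free prize now",
    "cheap meds limited offer",
    "meeting moved to thursday",
    "lunch tomorrow with the team",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts feed a multinomial Naive Bayes classifier,
# which treats each word count as independent given the class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["free prize meeting"]))
```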
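
For the decision tree post, a minimal sketch of how such a classifier might be fitted with Scikit-learn, using the iris measurements as the “collection of measurements on plants”. The depth limit and the choice of Gini impurity over entropy are illustrative assumptions, not settings taken from the post.

```python
# Decision tree classifier on the iris measurements, splitting to
# reduce Gini impurity at each node (criterion="entropy" would use
# Shannon entropy instead).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned splits, e.g. "petal width (cm) <= 0.80".
print(export_text(tree, feature_names=list(iris.feature_names)))
```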
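
For the breast cancer post, a minimal sketch of the kind of pipeline described there, assuming Scikit-learn throughout. Compressing to two components and relying on the default train/test split are assumptions made here for illustration, not the post’s own settings.

```python
# PCA followed by logistic regression on the Wisconsin breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Standardise the thirty columns, compress them to two principal
# components, then fit a logistic regression on the compressed data.
model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```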
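
For the Iris clustering post, a minimal sketch combining PCA with agglomerative clustering. Standardising the data first and keeping two components are assumptions made for this sketch rather than choices quoted from the post.

```python
# PCA plus agglomerative clustering on the iris flower dataset.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Reduce the four data series to two principal components...
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# ...then cluster in the reduced space; three clusters matches the
# three iris species.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_2d)
print(labels[:10])
```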
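
And for the final PCA post, a minimal sketch of using PCA to squeeze all four iris data series into a single 2-D scatter plot (the post itself also discusses 3D plots). The use of matplotlib and the standardisation step are assumptions made here for illustration.

```python
# Project all four iris data series onto two principal components so
# that a single 2-D scatter plot reflects every measurement.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
components = PCA(n_components=2).fit_transform(X_scaled)

# Colour the points by species to see how well the projection
# separates the three classes.
plt.scatter(components[:, 0], components[:, 1], c=iris.target)
plt.xlabel("first principal component")
plt.ylabel("second principal component")
plt.show()
```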
