Other Useful Techniques
-
Naive Bayes
Naive Bayes is a technique that has found wide application, notably in spam filters. While Bayes’ Theorem is a theorem in mathematics, there is no “Naive Bayes’ Theorem”. Rather, the “naive” refers to the naive assumption that, within each class, the probability of one value occurring in your data is independent of the values occurring in your other data series.…
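To make the idea concrete, here is a minimal sketch of a Naive Bayes spam filter using scikit-learn; the four messages below are made-up illustrative data, not anything from the post.

    # Sketch: word counts as features, Naive Bayes as the classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    messages = [
        "win a free prize now",
        "limited offer click here",
        "meeting rescheduled to friday",
        "lunch tomorrow with the team",
    ]
    labels = ["spam", "spam", "ham", "ham"]

    # Turn each message into a vector of word counts; Naive Bayes then treats
    # each word's count as independent of the others, given the class label.
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(messages)

    model = MultinomialNB()
    model.fit(X, labels)

    print(model.predict(vectorizer.transform(["free prize inside"])))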
-
Decision Tree Classifiers
Decision tree classifiers work by repeatedly dividing up your data samples based on data series values, at every stage attempting to reduce the degree to which the resulting subsets are “mixed”, as judged by Gini impurity or Shannon entropy. For example, if you have a collection of measurements on plants, a decision tree classifier might first…
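As a rough sketch of that splitting process (not the post’s own code), scikit-learn’s DecisionTreeClassifier picks, at each node, the feature threshold that most reduces the impurity of the resulting subsets:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()

    # criterion="gini" splits on Gini impurity; "entropy" would use Shannon entropy.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
    tree.fit(iris.data, iris.target)

    # Print the learned splits as nested if/else-style rules.
    print(export_text(tree, feature_names=list(iris.feature_names)))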
-
Using PCA and Logistic Regression to Predict Breast Cancer Diagnosis
Let’s take a look at applying PCA to a dataset that has many more than just a few data series. The Wisconsin breast cancer dataset from Scikit-learn contains thirty columns of data, describing the shape and texture of the cell nuclei in breast tumour biopsies, as measured from digitised images. We are also told whether the diagnosis…
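A minimal sketch of the overall approach, assuming the standard scikit-learn loaders; the scaling step, the choice of two components and the train/test split are illustrative choices here rather than the post’s exact settings:

    from sklearn.datasets import load_breast_cancer
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Scale first so no single measurement dominates the principal components,
    # then reduce the thirty columns to two before fitting the classifier.
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=2),
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))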
-
PCA for the Iris Flower Dataset
In this post we’ll analyse the Iris Flower Dataset using principal component analysis and agglomerative clustering. We’ll use PCA both to reduce the number of data series we’re feeding to our agglomerative clustering model (potentially making clustering more efficient, although in this case we’ve only got a total of four data series so it won’t…
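An illustrative sketch of the combination (not the post’s code); three clusters is an assumption made here because the dataset contains three species:

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X = load_iris().data

    # Project the four data series onto two principal components...
    X_2d = PCA(n_components=2).fit_transform(X)

    # ...and feed the reduced data to the agglomerative clustering model.
    clusters = AgglomerativeClustering(n_clusters=3).fit_predict(X_2d)
    print(clusters[:10])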
-
Principal Component Analysis
In this course we’ve created graphs of the well-known iris flower dataset repeatedly, but we were always faced with a frustrating choice. Even though we’ve often used all four data series in the dataset to fit models, we could only show two of them at a time, because plots are 2D. By using 3D plots…
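A rough sketch of what PCA buys us here (not the course’s own code): compress the four data series into two principal components so that all four measurements can contribute to a single 2D plot:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    iris = load_iris()

    # Reduce the four measurements per flower to two derived columns.
    components = PCA(n_components=2).fit_transform(iris.data)

    plt.scatter(components[:, 0], components[:, 1], c=iris.target)
    plt.xlabel("first principal component")
    plt.ylabel("second principal component")
    plt.show()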