# Cave of Python

## The DBSCAN Algorithm

DBSCAN is short for density-based spatial clustering of applications with noise, although I can’t help but suspect that the words were chosen partly to fit the cool-sounding acronym!

It’s an algorithm for automatically clustering data, and unlike K-means clustering, you don’t have to specify the number of clusters in advance. DBSCAN will automatically find them for you, and classify any remaining points (points not in clusters) as “noise”.

#### How Does DBSCAN Work?

We start by picking a point p. Draw a circle around that point (or a sphere if we’re dealing with 3-dimensional data, or a hypersphere if we have more dimensions). We’ll call the radius of this circle ε, the Greek letter epsilon.

We’ll have to choose a value for ε. In other words, we’ll have to choose how big to draw our circle.

Now the question is, how many points are in that circle? If the count is equal to or greater than some number n (which we also have to choose), we’ll say that our original point is a core point.

A common rule of thumb is to choose n to be twice the number of dimensions in our data. If our data is two-dimensional (it contains two data series, meaning we have two values for every data sample), then we might choose n to be 4.
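
The core-point test can be sketched in a few lines of plain Python. This is only an illustration: the helper names `neighbours` and `is_core_point` and the sample data are made up for this example, and it assumes that a point counts itself when we count the points in its circle, which is a convention the description above doesn’t pin down.

```python
import math

def neighbours(points, p, eps):
    """Return every point within distance eps of p (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_point(points, p, eps, n):
    """p is a core point if its circle of radius eps holds at least n points."""
    return len(neighbours(points, p, eps)) >= n

# Made-up 2-dimensional data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
eps = 0.5                 # radius of the circle, chosen by hand
n = 2 * len(points[0])    # rule of thumb: twice the number of dimensions

print(is_core_point(points, (0.0, 0.0), eps, n))  # True
print(is_core_point(points, (5.0, 5.0), eps, n))  # False
```

The first point has four points (including itself) inside its circle, so it passes the test with n = 4. The isolated point only has itself, so it doesn’t.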

If p is a core point, we say that all the points within its circle are directly reachable from p.

Now we can look at each of the points in the circle and ask if each of those points is itself a core point. In other words, if we take each of those points in turn and draw a circle around it, are there at least n points within that circle?

Often we can now connect many other points to our original point by hopping from core point to core point. While only the points within the circle around p are directly reachable from p, many other points are still reachable from p in this way.

We only consider a point to be in the same cluster as p if it’s reachable from p.
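
Putting these steps together, here is a minimal sketch of the whole procedure in plain Python. It’s not an optimised or definitive implementation: the names `region`, `dbscan` and `NOISE`, the use of -1 as the noise label, and the sample data are all choices made for this illustration, and it again assumes a point counts itself when counting the points in its circle.

```python
import math
from collections import deque

NOISE = -1  # label for points that end up in no cluster (illustrative choice)

def region(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, n):
    """Give each point a cluster number, or NOISE if it is in no cluster."""
    labels = [None] * len(points)    # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region(points, i, eps)
        if len(nbrs) < n:
            labels[i] = NOISE        # may still be claimed by a cluster later
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable from a core point after all
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region(points, j, eps)
            if len(j_nbrs) >= n:     # j is itself a core point: keep hopping
                queue.extend(j_nbrs)
    return labels

points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.4, 0.1),
    (0.8, 0.1),                                       # only reachable by hopping
    (3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1),
    (10.0, 10.0),                                     # far from everything
]
labels = dbscan(points, eps=0.5, n=4)
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Notice the point (0.8, 0.1). It’s too far from (0.0, 0.0) to be directly reachable from it, but it lies inside the circle around the core point (0.4, 0.1), so it’s still reachable by hopping and lands in the same cluster. The point (10.0, 10.0) isn’t reachable from any core point, so it’s labelled as noise.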

#### An Analogy

Since this explanation is a bit involved, let’s consider an analogy involving clusters of people in a crowd.