Incremental Clustering with example: BIRCH Algorithm

BIRCH (Incremental) Clustering Algorithm: Introduction

In one of the previous posts, we talked about incremental clustering with k-means and saw an example. Here, we will look at a more advanced incremental clustering technique called BIRCH. I recommend reading the fundamentals of machine learning and the introduction to incremental learning first before proceeding to this article.

BIRCH stands for Balanced Iterative Reducing and Clustering using Hierarchies. It is an unsupervised data mining algorithm used to perform hierarchical clustering over large datasets.

It is a memory-efficient, incremental clustering technique that can serve as an alternative to MiniBatchKMeans, the online version of k-means clustering.

Its creators claim that BIRCH was the "first clustering algorithm proposed in the database area to handle 'noise' (data points that are not part of the underlying pattern) effectively", beating DBSCAN by a couple of months.

Earlier clustering algorithms performed poorly on very large databases and did not adequately handle the case where a dataset was too large to fit in main memory.

Consequently, there was a great deal of overhead in maintaining high clustering quality while minimizing the cost of additional I/O (input/output) operations. Furthermore, most of BIRCH's predecessors inspect all data points (or all currently existing clusters) equally for each clustering decision, and do not perform heuristic weighting based on the distance between these data points.

Structure of BIRCH

BIRCH is based on the notion of a CF (Clustering Feature) and a CF tree.

A CF tree is a height-balanced tree that stores the clustering features needed to perform hierarchical clustering on a given dataset.

A cluster of data instances is represented by a triple of numbers (N, LS, SS), where:


N = number of data points in the sub-cluster

LS = linear sum of the points

SS = sum of the squares of the points

Each non-leaf node holds at most B entries.

Each leaf node holds at most L CF entries, each of which must satisfy a threshold T on its maximum diameter (or radius).

P (page size in bytes) is the maximum size of a node

Compact: every leaf entry is a sub-cluster, not a single data sample.
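To make the (N, LS, SS) representation concrete, here is a minimal Python sketch (the function names are illustrative, not from any BIRCH reference implementation) showing the key property that makes CF triples useful: they are additive, so two sub-clusters can be merged without revisiting the raw points, and the centroid and radius fall out of the triple directly.

```python
import math

def merge_cf(cf_a, cf_b):
    """Merge two clustering features; CF triples are additive."""
    n_a, ls_a, ss_a = cf_a
    n_b, ls_b, ss_b = cf_b
    return (n_a + n_b,
            [x + y for x, y in zip(ls_a, ls_b)],
            ss_a + ss_b)

def centroid(cf):
    """Centroid = LS / N."""
    n, ls, _ = cf
    return [x / n for x in ls]

def radius(cf):
    """RMS distance of the points from the centroid: sqrt(SS/N - ||LS/N||^2)."""
    n, ls, ss = cf
    c = [x / n for x in ls]
    return math.sqrt(max(ss / n - sum(x * x for x in c), 0.0))

# Two 2-D sub-clusters: {(0,0), (1,0)} and {(2,0)}
cf1 = (2, [1.0, 0.0], 1.0)   # N=2, LS=(1,0), SS=0^2 + 1^2 = 1
cf2 = (1, [2.0, 0.0], 4.0)   # N=1, LS=(2,0), SS=2^2 = 4
merged = merge_cf(cf1, cf2)
print(merged)            # (3, [3.0, 0.0], 5.0)
print(centroid(merged))  # [1.0, 0.0]
```

In BIRCH, this additivity is what lets the tree absorb a new point by updating a handful of triples along one root-to-leaf path.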

BIRCH incremental clustering Algorithm: Step by Step

Phase 1:

Scan the entire dataset and build an initial in-memory CF tree, using the given amount of memory and recycling space on disk.

Phase 2 (optional):

Condense the tree to a desirable size by building a smaller CF tree.

Phase 3:

A global clustering technique is used: an adapted agglomerative hierarchical clustering algorithm is applied directly to the sub-clusters represented by their CF vectors.

Phase 4 (optional):  

Cluster refinement. This step is optional and requires additional scans over the dataset to refine the results.
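As a rough illustration of phase 3 (a sketch, not BIRCH's actual implementation), the snippet below treats each leaf sub-cluster as a single point, namely its CF centroid LS / N, and applies scikit-learn's AgglomerativeClustering to those centroids. The CF triples here are made up for the example, and sub-cluster sizes are ignored for simplicity.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical leaf-level CF triples (N, LS, SS) for 2-D sub-clusters
cfs = [
    (3, np.array([0.3, 3.0]), 3.1),
    (2, np.array([0.1, 1.9]), 1.9),
    (4, np.array([4.2, 0.2]), 4.5),
    (3, np.array([3.1, 0.0]), 3.3),
]

# Phase 3: cluster the sub-cluster centroids (LS / N), not the raw points
centroids = np.array([ls / n for n, ls, _ in cfs])
labels = AgglomerativeClustering(n_clusters=2).fit_predict(centroids)
print(labels)  # the first two centroids share a label, as do the last two
```

Because the global step sees only one point per sub-cluster, it stays cheap even when the original dataset is huge.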

Benefits of BIRCH

It is local, in that each clustering decision is made without scanning all data points and all currently existing clusters. It exploits the observation that the data space is usually not uniformly occupied, and that not every data point is equally important.

It makes full use of available memory to derive the finest possible sub-clusters while minimizing I/O costs. Moreover, as an incremental method, it does not require the whole dataset in advance.

A key benefit of BIRCH is its ability to incrementally and dynamically cluster incoming multi-dimensional metric data points, in an attempt to produce the best-quality clustering for a given set of resources (memory and time constraints).

In most cases, BIRCH requires only a single scan of the database.

Possible Drawbacks:

Since every node in a CF tree can hold only a limited number of entries due to its size, a CF tree node does not always correspond to what a user would consider a natural cluster.

Furthermore, if the clusters are not spherical in shape, BIRCH does not perform well, because it uses the notion of radius or diameter to control the boundary of a cluster.

[Figure: clustering output]

Performance tuning parameters of BIRCH:

1. Threshold:

The radius of the sub-cluster obtained by merging a new sample with its closest sub-cluster must be smaller than the threshold value. Sklearn sets the default value of this parameter to 0.5.

2. Branching Factor:

Maximum number of CF sub-clusters in each node. If a new data instance arrives such that the number of sub-clusters would exceed the branching factor, that node is split into two nodes, with the sub-clusters redistributed between them. The parent sub-cluster of that node is removed, and two new sub-clusters are added as parents of the two split nodes. Sklearn sets the default value of this parameter to 50.

Example of BIRCH (Python, scikit-learn implementation):

>>> from sklearn.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(n_clusters=None)
>>> brc.fit(X)
Birch(branching_factor=50, compute_labels=True, copy=True, n_clusters=None,
      threshold=0.5)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])
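Because BIRCH is incremental, scikit-learn's Birch also supports partial_fit, which updates the CF tree batch by batch instead of requiring the whole dataset at once. A small sketch with the same data, fed as a stream:

```python
from sklearn.cluster import Birch

# n_clusters=None keeps the raw CF-tree sub-clusters as the final clusters
brc = Birch(n_clusters=None)

# Feed the data in two batches; partial_fit updates the CF tree in place
brc.partial_fit([[0, 1], [0.3, 1], [-0.3, 1]])
brc.partial_fit([[0, -1], [0.3, -1], [-0.3, -1]])

labels = brc.predict([[0.1, 1], [0.1, -1]])
print(labels)  # the two query points fall into different clusters
```

This is the usage pattern that makes BIRCH suitable for streaming or out-of-core settings: each batch only touches the in-memory tree, never the points seen so far.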

