K-Means Clustering

Akash Patel
3 min read · Jun 17, 2021

K-Means Clustering is an unsupervised machine learning algorithm used for clustering problems, that is, for grouping unlabeled data.

Content

  1. Definition
  2. Working of K-Means
  3. Elbow Method
  4. Assumptions in K-Means
  5. Advantages of K-Means
  6. Disadvantages of K-Means
  7. Applications of K-Means
  8. References

Definition :-

K-Means separates unlabeled data into different groups (also known as clusters) on the basis of similar features and common patterns.

Figure: K-Means Clustering Algorithm

It is an iterative algorithm that divides the whole dataset into K clusters (subgroups). Each data point is assigned to the cluster whose centroid, the mean of the points in that cluster, is nearest to it.

Working of K-Means Algorithm :-

The following steps explain how K-Means works :-

Step 1 : Choose the number of clusters K, for example by using the Elbow Method.

Step 2 : Randomly initialize K points (the centroids) in the dataset.

Step 3 : Assign every data point to its closest centroid.

Step 4 : Calculate the mean of the points in each cluster and move that cluster's centroid to it.

Step 5 : Repeat Step 3 and Step 4 until no further reassignment occurs.

Step 6 : Stop the K-Means algorithm when any of the following criteria is met :-

a . The newly computed centroids do not change

b . The points remain in the same clusters

c . The maximum number of iterations is reached
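
The steps above can be sketched in plain Python. This is only a minimal illustration, not a production implementation: the function names, the toy data, the default of 100 iterations and the fixed seed are all assumptions made for this example, and K is passed in directly instead of being chosen with the Elbow Method.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):  # stopping criterion (c)
        # Step 3: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stopping criterion (b): no point changed its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels

# Toy data (made up for illustration): two well-separated groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(data, k=2)
print(centroids, labels)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random initialization.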

Figure: Working of K-Means Algorithm

Elbow Method :-

One of the most important steps in K-Means is determining the optimal value of K, and the Elbow Method is a common way to do so.

Suppose we run K-Means for each value of K (the number of clusters) from 1 to 10. For each value of K we calculate the WCSS, where WCSS stands for Within-Cluster Sum of Squares. WCSS is the sum, over all clusters, of the squared distances between each data point and the centroid of its cluster.

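Within-Cluster Sum of Squares equation:

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the set of data points in cluster $k$ and $\mu_k$ is the centroid of that cluster.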
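
The WCSS definition above can be written as a short Python function. This is a small sketch; the function name `wcss`, its arguments and the four sample points are assumptions made for this example.

```python
def wcss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )

# Made-up example: every point lies exactly 1 unit from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
total = wcss(points, centroids, labels)
print(total)  # 4.0
```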

When we plot the WCSS against the value of K, the curve looks like an elbow.

Figure: Elbow Method

Looking at the graph, we notice that the WCSS drops sharply at first and then flattens out. The point where the slope changes abruptly forms the elbow, and the K-value at this point is taken as the optimal value of K (the optimal number of clusters).
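
The elbow is usually read off the plot by eye, but one simple way to approximate it in code is to pick the K where the curve bends the most, measured by the largest second difference of the WCSS values. This heuristic is an assumption of this sketch, not part of K-Means itself, and the WCSS numbers below are made up for illustration.

```python
def elbow_k(wcss_by_k):
    """Return the K with the sharpest bend in the WCSS curve.

    Assumes the K values are evenly spaced (e.g. 1, 2, 3, ...).
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # Second difference: large when a steep drop is followed by a flat part.
        bend = wcss_by_k[prev_k] - 2 * wcss_by_k[k] + wcss_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical WCSS values for K = 1..6 (not real measurements).
wcss_by_k = {1: 1000, 2: 600, 3: 200, 4: 170, 5: 150, 6: 140}
print(elbow_k(wcss_by_k))  # 3
```

Here the WCSS falls steeply up to K = 3 and barely improves afterwards, so the sketch reports 3. On real data the curve is often smoother, and it is still worth checking the plot.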

Assumptions in K-Means :-

Following are a few assumptions of the K-Means algorithm :-

  1. K-Means assumes that clusters are spherical.
  2. The prior probability of all K clusters is the same, which means all clusters have approximately the same number of observations. In simple words, it assumes that the clusters are of similar size.

Advantages :-

Following are a few advantages of K-Means :-

  1. K-Means Algorithm is simple to implement.
  2. It is scalable to large datasets.
  3. It can easily adapt to new examples.
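  4. It can be generalized (for example, by allowing cluster widths to vary) to handle clusters of different shapes and sizes, such as elliptical clusters. Plain K-Means does not do this on its own.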
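
To illustrate the point about adapting to new examples: once the centroids are known, a new data point can be placed in a cluster just by finding its nearest centroid, without re-running the whole algorithm. The function name and the numbers below are assumptions made for this sketch.

```python
def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )

# Hypothetical centroids from an earlier K-Means run.
centroids = [(1.0, 1.5), (8.5, 8.5)]
new_label = nearest_centroid((7.0, 9.0), centroids)
print(new_label)  # 1
```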

Disadvantages :-

Following are a few disadvantages of K-Means :-

  1. K-Means is sensitive to outliers.
  2. The value of K has to be chosen manually.
  3. As the dimensionality increases, distance-based similarity becomes less informative, so K-Means scales poorly to high-dimensional data.
  4. It does not perform well with clusters of different sizes and densities.
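
The sensitivity to outliers comes from the fact that a centroid is a mean, and a mean is pulled strongly by extreme values. The small sketch below shows this with made-up points; the helper name `centroid` is an assumption for this example.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Four points forming a tight cluster (made-up data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
clean = centroid(cluster)
# The same cluster with a single far-away outlier added.
with_outlier = centroid(cluster + [(50.0, 50.0)])
print(clean, with_outlier)
```

A single outlier moves the centroid from the middle of the cluster to a location far away from every original point.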

Applications :-

Following are a few applications of K-Means :-

  • Recommendation System
  • Customer Segmentation
  • Crime Hot-Spot detection
  • Optical Character Recognition

References :-

  • Wikipedia
  • Javatpoint tutorials
  • Few Other Sources
