K-Nearest Neighbors (K-NN)
K-NN is a supervised Machine Learning algorithm that can be used for both classification and regression problems.
Content
- Definition
- Working of K-NN
- Distance Metrics in K-NN
- Advantages of K-NN
- Disadvantages of K-NN
- Applications of K-NN
Definition :-
K-NN stands for K-Nearest Neighbors. It is one of the simplest supervised Machine Learning algorithms and is mostly used for classification.
K-NN is also called a Lazy Learning Algorithm (instance-based learning) because it does not learn from the training set immediately; instead, it stores the dataset and does all of its work at classification time.
K-NN is a Non-parametric Learning Algorithm, which means it does not assume anything about the underlying data distribution.
Working of K-NN :-
K-Nearest Neighbors uses feature similarity to predict the values of new data points. In other words, it classifies a new data point based on how its neighbors are classified.
In K-NN we first choose the value of K, which can be any positive integer; for binary classification it is usually chosen as an odd number so that a majority vote cannot end in a tie between the two classes.
After choosing the value of K, we find the K neighbors nearest to the test point using a distance metric such as Euclidean Distance, Manhattan Distance, or Hamming Distance. Most of the time, Euclidean Distance is used.
The test point is then assigned to the class that is in the majority among those K neighbors, as shown in the sketch below.
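To make these steps concrete, here is a minimal from-scratch sketch of K-NN classification in Python with NumPy. The toy dataset, the function name `knn_predict`, and the choice of K are purely illustrative assumptions, not part of any particular library.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test point by a majority vote of its k nearest neighbors."""
    # Step 1: Euclidean distance from the test point to every training point
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    # Step 2: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among the labels of those k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy dataset (illustrative): two features, two classes (0 and 1)
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0],
                    [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0])))  # -> 0
print(knn_predict(X_train, y_train, np.array([7.0, 6.0])))  # -> 1
```

Note that there is no separate fit step: all the work happens inside `knn_predict`, which is exactly what makes K-NN a lazy learner.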
Distance Metrics in K-NN :-
Following are the two major distance metrics used in K-NN :-
- Euclidean Distance :- It represents the shortest straight-line distance between two points: the square root of the sum of squared differences between their coordinates.
- Manhattan Distance :- It represents the sum of absolute differences between two points across all the dimensions (see the sketch after this list for both computations).
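As a quick illustration, both metrics can be computed in a few lines of NumPy; the two points `p` and `q` below are arbitrary examples.

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((p - q) ** 2))  # sqrt(9 + 16 + 0) = 5.0

# Manhattan distance: sum of absolute differences across all dimensions
manhattan = np.sum(np.abs(p - q))          # 3 + 4 + 0 = 7.0

print(euclidean, manhattan)  # 5.0 7.0
```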
Advantages :-
Following are a few major advantages of K-NN :-
- K-NN is robust to noisy training data.
- K-NN is a simple algorithm that is easy to interpret and understand.
- K-NN makes no assumptions about the data, so it is very useful for non-linear data.
- It has no training step because it does not explicitly build any model; new data points are simply classified by the majority class of their nearest neighbors.
- Since K-NN does not require training before making predictions, new data can be added seamlessly.
Disadvantages :-
Following are a few major disadvantages of K-NN :-
- K-NN has no built-in way to deal with missing values.
- The main problem with this algorithm is choosing the optimal value of K (one common approach, cross-validation, is sketched after this list).
- K-NN is a slow algorithm: since every prediction scans the entire training set, its speed declines as the dataset grows.
- As the number of features grows, K-NN finds it increasingly difficult to predict the output of new data points (the curse of dimensionality).
- K-NN is very sensitive to outliers, and it does not perform well on imbalanced data.
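One common way to tackle the K-selection problem from the list above is cross-validation: try several values of K and keep the one with the best validation accuracy. The sketch below uses scikit-learn's `KNeighborsClassifier` on the Iris dataset; the range of K values and the 5-fold split are arbitrary example choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate odd values of K with 5-fold cross-validation
scores = {}
for k in range(1, 20, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, mean accuracy = {scores[best_k]:.3f}")
```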
Applications of the K-NN Algorithm :-
Following are a few applications of K-NN :-
- Used for pattern recognition.
- Used in finance as well as in agriculture.
- Used for facial recognition and fingerprint detection.
- Used for gene expression analysis and protein-protein interaction prediction.