Implementing k-Nearest Neighbor in OpenCV Python
Machine learning algorithms have become a standard part of modern software development, giving developers new ways to analyze and interpret data. One of the simplest and most widely used of these algorithms is k-Nearest Neighbor.
K-Nearest Neighbor is one of the simplest classification algorithms in machine learning. It is non-parametric and can be used for both classification and regression. In this article, we will look at how to implement k-Nearest Neighbor in Python: we’ll train and evaluate a classifier with scikit-learn, and then show the equivalent using OpenCV’s built-in k-NN module (cv2.ml.KNearest).
What is k-Nearest Neighbor?
K-Nearest Neighbor (k-NN) is a type of supervised machine-learning algorithm used for classification and regression. It is non-parametric, which means it makes no assumption about the underlying distribution of the data. The basic idea behind k-NN is that similar things are near each other, so it uses a distance metric (typically Euclidean distance) to measure how similar two points are: a new point is assigned the class that is most common among its k closest training points. The algorithm is quite simple and easy to implement.
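To make this concrete, here is a minimal sketch of the k-NN classification rule written in plain NumPy. The function name knn_predict is just for illustration; in the rest of the article we use ready-made implementations instead:

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of those neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]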
The Data Set
Before we move ahead with the implementation of k-NN, let’s first understand the data set we’ll be using. Here, we’ll be using the famous Iris flower data set. It contains 150 instances in total: 3 classes (the species setosa, versicolor, and virginica) of 50 instances each, with four features per instance (sepal length, sepal width, petal length, and petal width).
We will divide the data into two parts – training data and testing data. We use the training data to train our model and the testing data to evaluate its accuracy.
Steps to Implement k-NN in OpenCV Python
Step 1: Importing Libraries
To begin with, we’ll need to import the required libraries in Python. Let’s start by importing the OpenCV library, whose built-in k-NN module we’ll use at the end of the article:
import cv2
Now, let’s import other necessary libraries:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
- numpy – used for handling arrays and mathematical operations on those arrays.
- pandas – used for handling data frames; handy for previewing the data set.
- sklearn.datasets.load_iris – used to load the iris data set.
- sklearn.model_selection.train_test_split – used for splitting the data set into training and testing parts.
- sklearn.neighbors.KNeighborsClassifier – used to implement the k-NN algorithm.
- sklearn.metrics.accuracy_score – used to evaluate the accuracy of the model.
Step 2: Loading the Data
Next, let’s load the data set. As mentioned earlier, we’ll be using the iris data set. We can load this data set using the load_iris() function provided by sklearn.datasets. Here’s the code:
iris_dataset = load_iris()
X = iris_dataset.data
y = iris_dataset.target
In the above code, we’ve loaded the iris data set and assigned the feature values to X and the label values (that is, output values) to y.
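If you want a quick look at the data (this is where the pandas import comes in handy), you can print its shape and preview the features as a data frame. The shape is (150, 4) – 150 samples with 4 features each:

print(X.shape)
print(iris_dataset.target_names)
print(pd.DataFrame(X, columns=iris_dataset.feature_names).head())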
Step 3: Splitting the Data Set into Training and Testing Sets
We divide the data set into two: training data and testing data. We use the training data to train our model and check the accuracy using the testing data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In the above code, we’ve used the train_test_split function from sklearn.model_selection to split the data set into training and testing data. Here, test_size is the proportion of the data set held out for testing; we’ve used 0.2 (20%). Setting random_state=42 makes the split reproducible, so you get the same split every time you run the code.
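With a 20% test size, the 150 samples are split into 120 training samples and 30 testing samples, which you can verify by printing the shapes:

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)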
Step 4: Implementing the k-NN Algorithm
Now, we’ll train our k-Nearest Neighbor classifier using the training data set. Let’s initialize the classifier by specifying the value of k (the number of neighbors to consider). For simplicity, we’ll set k to 3.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
Here, we’ve initialized the KNeighborsClassifier object from the scikit-learn library. Then we’ve used the fit() method to train the model on the training data.
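The choice of k matters: very small values are sensitive to noise, while very large values blur the class boundaries. As a quick illustration (not part of the main steps), you could loop over a few values of k and compare the resulting test accuracy:

for k in (1, 3, 5, 7):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    print(k, accuracy_score(y_test, model.predict(X_test)))

In practice, cross-validation on the training data is a better way to choose k than repeatedly evaluating on the test set.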
Step 5: Testing the Model
After training our model, we’ll use the testing data set to check how accurate it is. We’ll use the accuracy_score() function from the scikit-learn library to measure the accuracy of our model.
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(accuracy*100,2)}%')
Here, we’ve used the predict() method to predict the output values for the X_test set. Then, we’ve used the accuracy_score() function to calculate the accuracy by comparing the predicted values with the actual values from the y_test set.
Step 6: Final Code
Here’s the complete code from the steps above (the scikit-learn version; the OpenCV equivalent follows below):
import cv2
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
iris_dataset = load_iris()
X = iris_dataset.data
y = iris_dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(accuracy*100,2)}%')
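Since the focus of this article is OpenCV Python, here’s how the same classifier can be built with OpenCV’s own k-NN implementation, cv2.ml.KNearest, reusing the arrays from the code above. Note that OpenCV expects float32 samples and responses; the two implementations run the same basic algorithm, so the accuracy on this split should be comparable:

# OpenCV's k-NN expects float32 samples and responses
knn_cv = cv2.ml.KNearest_create()
knn_cv.train(X_train.astype(np.float32), cv2.ml.ROW_SAMPLE, y_train.astype(np.float32))
# findNearest returns the predicted label of each test sample,
# along with its k nearest neighbours and their distances
ret, results, neighbours, dist = knn_cv.findNearest(X_test.astype(np.float32), 3)
cv_accuracy = accuracy_score(y_test, results.ravel().astype(int))
print(f'OpenCV k-NN Accuracy: {round(cv_accuracy*100, 2)}%')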
Conclusion
In this article, we’ve covered the implementation of the k-NN algorithm in Python, using scikit-learn for training and evaluation and OpenCV’s cv2.ml.KNearest module as the equivalent OpenCV implementation. We’ve walked through the steps involved: importing the libraries, loading the data set, splitting it into training and testing sets, training the k-NN classifier, testing the model, and measuring its accuracy. By following these steps, you can easily add a k-NN classifier to your own OpenCV Python project.