kNN and Decision Tree

Lab Manual: k-Nearest Neighbors (KNN) and Decision Tree Machine Learning Tools

Objective

The objective of this lab is to understand the working principles of K-Nearest Neighbors (KNN) and Decision Tree algorithms, implement them using Python, and evaluate their performance on a dataset.

Prerequisites

Basic knowledge of Python programming
Familiarity with NumPy, Pandas, Matplotlib, and Scikit-learn
Understanding of machine learning concepts like classification and regression

1. K-Nearest Neighbors (KNN)

Introduction

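KNN is a simple, non-parametric, lazy learning algorithm used for classification and regression. "Lazy" means it builds no explicit model during training: it stores the training set and defers the computation to prediction time. For classification, it assigns a data point the majority class among its k nearest neighbors; for regression, it averages their target values.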

Steps to Implement KNN

Load and preprocess the dataset.
Split the dataset into training and testing sets.
Normalize the feature variables.
Choose an appropriate value of k.
Compute the distance between test points and training points.
Predict the class based on the majority of k neighbors.
Evaluate the model using performance metrics.
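
The distance and majority-vote steps above can be sketched from scratch in a few lines. This is a minimal illustration, not the scikit-learn implementation: the function name knn_predict and the toy arrays X_toy and y_toy are made up for this example, and Euclidean distance is assumed as the distance measure.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, for illustration only): two small clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, np.array([4.9, 5.0]), k=3))  # query near the second cluster
```

Using an odd k with two classes avoids tied votes; with more classes, ties are still possible and need a tie-breaking rule.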

Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
data = pd.read_csv('dataset.csv')

# Preprocess data: features are all columns except the last; the last column is the target
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize data: fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train KNN model
k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)

# Predict and evaluate
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
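
The code above fixes k = 5 without saying how that value might be chosen. One common option is cross-validation, sketched below. Several things here are assumptions made for the sake of a runnable example: the Iris dataset bundled with scikit-learn stands in for dataset.csv, the candidate values are the odd numbers from 1 to 15, and 5 folds are used. The scaler is placed inside a pipeline so that it is refit on each training fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset (assumption: replaces dataset.csv for this sketch)
X_demo, y_demo = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k
cv_scores = {}
for k_candidate in range(1, 16, 2):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k_candidate))
    cv_scores[k_candidate] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

# Pick the k with the highest mean accuracy
best_k = max(cv_scores, key=cv_scores.get)
for k_candidate, score in cv_scores.items():
    print(f"k={k_candidate:2d}  mean CV accuracy={score:.3f}")
print("Best k:", best_k)
```

Selecting k this way uses only training data, so the held-out test set remains an unbiased check of the final model.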

2. Decision Tree

Introduction

A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It recursively splits the dataset based on feature conditions to form a tree-like structure for decision-making.
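
To make "splits the dataset based on feature conditions" concrete, the sketch below computes the Gini impurity that a candidate threshold would leave behind. Gini is the criterion used in the implementation in this section. The helper names gini and split_gini and the toy lists feature and target are hypothetical, chosen only for this illustration; a real tree learner searches over many features and thresholds and repeats the process recursively.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting into values <= threshold and values > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Toy feature and labels (hypothetical, for illustration only)
feature = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
target = ["a", "a", "a", "b", "b", "b"]

print(gini(target))                      # impurity before any split
print(split_gini(feature, target, 3.0))  # threshold that separates the two classes
print(split_gini(feature, target, 2.0))  # threshold that leaves one side mixed
```

A lower weighted impurity means a purer split, so among these two candidates a Gini-based learner would prefer the threshold 3.0.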

Steps to Implement Decision Tree

Load and preprocess the dataset.
Split the dataset into training and testing sets.
Train the Decision Tree model using the training set.
Make predictions on the test set.
Evaluate the model using performance metrics.

Implementation

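from sklearn.tree import DecisionTreeClassifier

# Reuses X_train, X_test, y_train, y_test and the metric imports from the KNN section.
# Decision Trees do not require feature scaling, but the scaled features work as well.

# Train Decision Tree model
dt = DecisionTreeClassifier(criterion='gini', max_depth=5, random_state=42)
dt.fit(X_train, y_train)

# Predict and evaluate
y_pred_dt = dt.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt))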
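
A fitted tree can be printed as a set of readable if/else rules, which is one way to see the tree-like structure described in the introduction. The sketch below uses scikit-learn's export_text for this. As an assumption for the sake of a runnable example, it trains on the bundled Iris dataset rather than dataset.csv and limits the depth to 2 so the output stays short; the names iris, tree_demo, and rules are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in dataset (assumption: replaces dataset.csv for this sketch)
iris = load_iris()

# A shallow tree keeps the printed rules short
tree_demo = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=42)
tree_demo.fit(iris.data, iris.target)

# Render the learned splits as indented text rules
rules = export_text(tree_demo, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one split condition, and each "class:" line is a leaf, so a prediction can be traced by hand from the root to a leaf.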

Comparison of KNN and Decision Tree

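+---------------------------+--------+----------------+
| Aspect                    | KNN    | Decision Tree  |
+---------------------------+--------+----------------+
| Complexity                | Simple | Can be complex |
| Training Time             | Fast   | Slower         |
| Prediction Time           | Slow   | Fast           |
| Interpretability          | Low    | High           |
| Performance on Large Data | Poor   | Good           |
+---------------------------+--------+----------------+

KNN does almost no work at training time because it only stores the data, but it must compare every query against the stored training points at prediction time. A Decision Tree pays its cost up front when building the tree and then predicts by following a single root-to-leaf path.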

Conclusion

In this lab, we implemented KNN and Decision Tree algorithms, analyzed their performance, and understood their key differences. While KNN is a simple and intuitive approach, Decision Trees provide a more structured and interpretable method for classification tasks. The choice of algorithm depends on the dataset characteristics and performance requirements.

Exercises

Experiment with different values of k in KNN and analyze the impact on accuracy.
Compare KNN and Decision Tree performance on a different dataset.
Visualize the decision boundary for both models using a 2D dataset.

References

Scikit-learn Documentation: https://scikit-learn.org/stable/
Python Data Science Handbook by Jake VanderPlas
