The objective of this lab is to understand the working principles of K-Nearest Neighbors (KNN) and Decision Tree algorithms, implement them using Python, and evaluate their performance on a dataset.
Basic knowledge of Python programming
Familiarity with NumPy, Pandas, Matplotlib, and Scikit-learn
Understanding of machine learning concepts like classification and regression
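KNN is a simple, non-parametric, lazy learning algorithm used for classification and regression. "Lazy" means that it builds no explicit model during training: it stores the training data and defers all computation to prediction time. For classification, it assigns a data point the majority class among its k nearest neighbors; for regression, it averages their target values.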
Load and preprocess the dataset.
Split the dataset into training and testing sets.
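Standardize the feature variables so that features measured on large scales do not dominate the distance.
Choose an appropriate value of k.
Compute the distance (Euclidean by default in scikit-learn) between each test point and the training points.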
Predict the class based on the majority of k neighbors.
Evaluate the model using performance metrics.
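The distance and majority-vote steps above can be sketched from scratch in NumPy. This is a minimal illustration and not how scikit-learn implements KNN internally; the function name knn_predict and the toy data are made up for this example, Euclidean distance is assumed, and ties are broken in favor of the label of the nearest tied neighbor.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each row of X_test by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in X_test:
        # Euclidean distance from this test point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances, nearest first
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbors' labels
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_pred = knn_predict(X_toy, y_toy, np.array([[1.1, 1.0], [5.1, 5.0]]), k=3)
print(toy_pred)  # [0 1]
```

The loop makes the cost of prediction visible: every test point is compared against every training point, which is why KNN becomes slow at prediction time as the training set grows.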
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
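# Load dataset (assumes the target label is stored in the last column)
data = pd.read_csv('dataset.csv')
# Separate the features (all columns except the last) from the target (last column)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)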
# Standardize features (fit the scaler on the training set only to avoid data leakage)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Train KNN model
k = 5
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Predict and evaluate
y_pred = knn.predict(X_test)
print("KNN accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
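The value k = 5 above is only a starting point. One common way to choose k is to compare several candidates with cross-validation. The sketch below is an illustration under stated assumptions: it uses scikit-learn's bundled Iris data as a stand-in because dataset.csv is not available here, and the names X_demo, y_demo, k_values, and cv_scores are chosen for this example. In the lab itself, the same loop would be run on the training set only, so that the test set stays untouched.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): the Iris dataset replaces dataset.csv
X_demo, y_demo = load_iris(return_X_y=True)

# Odd values of k reduce the chance of tied votes
k_values = range(1, 16, 2)
cv_scores = {}
for k_candidate in k_values:
    # Scaling sits inside the pipeline so it is refit on each training fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k_candidate))
    cv_scores[k_candidate] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
for k_candidate, score in cv_scores.items():
    print(f"k={k_candidate:2d}  mean CV accuracy={score:.3f}")
print("Best k:", best_k)
```

Very small k tends to follow noise in the training data, while very large k smooths over real class boundaries, so the best value usually lies somewhere in between and depends on the dataset.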
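A Decision Tree is a supervised learning algorithm used for classification and regression tasks. It recursively splits the dataset on feature conditions (for example, "feature <= threshold"), choosing at each node the split that most reduces an impurity measure such as Gini impurity, to form a tree-like structure for decision-making.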
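To make the idea of splitting concrete, the sketch below computes Gini impurity, the measure selected by criterion='gini' in scikit-learn, and scores candidate thresholds on a single feature. It is a simplified illustration and not the library's implementation; the helper names gini and split_gini and the toy arrays are made up for this example.

```python
import numpy as np


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return float(1.0 - np.sum(proportions ** 2))


def split_gini(feature, labels, threshold):
    """Size-weighted Gini impurity of the two groups produced by feature <= threshold."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = labels.size
    return (left.size / n) * gini(left) + (right.size / n) * gini(right)


# Toy data (made up for illustration): one feature, two classes
feature_toy = np.array([1.0, 2.0, 3.0, 4.0])
labels_toy = np.array([0, 0, 1, 1])

print(gini(labels_toy))                          # 0.5 (two classes, evenly mixed)
print(split_gini(feature_toy, labels_toy, 2.5))  # 0.0 (both groups are pure)
print(split_gini(feature_toy, labels_toy, 1.5))  # about 0.333 (right group still mixed)
```

A tree-building procedure would evaluate many such thresholds across all features, keep the one with the lowest weighted impurity, and then repeat the process on each resulting group.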
Load and preprocess the dataset.
Split the dataset into training and testing sets.
Train the Decision Tree model using the training set.
Make predictions on the test set.
Evaluate the model using performance metrics.
from sklearn.tree import DecisionTreeClassifier
# Train Decision Tree model (trees are insensitive to feature scaling, so reusing the standardized data is fine)
dt = DecisionTreeClassifier(criterion='gini', max_depth=5, random_state=42)
dt.fit(X_train, y_train)
# Predict and evaluate
y_pred_dt = dt.predict(X_test)
print("Decision Tree accuracy:", accuracy_score(y_test, y_pred_dt))
print(classification_report(y_test, y_pred_dt))
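Because a fitted tree is just a set of nested feature conditions, its rules can be printed and read directly. The sketch below uses scikit-learn's export_text for this. As an assumption for illustration, it trains on the bundled Iris data instead of dataset.csv and uses a shallow tree (max_depth=2) to keep the output short; the names iris, tree_demo, and rules are chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): the Iris dataset replaces dataset.csv
iris = load_iris()

# A shallow tree keeps the printed rules short
tree_demo = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
tree_demo.fit(iris.data, iris.target)

# Render the learned splits as indented if/else style text
rules = export_text(tree_demo, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one split condition and each "class:" line is a leaf, so a prediction can be traced by hand from the root to a leaf.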
+---------------------------+-------------------------------+------------------------------+
| Criterion                 | KNN                           | Decision Tree                |
+---------------------------+-------------------------------+------------------------------+
| Complexity                | Simple                        | Can be complex               |
| Training Time             | Fast (stores the data)        | Slower (builds the tree)     |
| Prediction Time           | Slow (searches training data) | Fast (one root-to-leaf path) |
| Interpretability          | Low                           | High                         |
| Performance on Large Data | Poor                          | Good                         |
+---------------------------+-------------------------------+------------------------------+
In this lab, we implemented KNN and Decision Tree algorithms, analyzed their performance, and understood their key differences. While KNN is a simple and intuitive approach, Decision Trees provide a more structured and interpretable method for classification tasks. The choice of algorithm depends on the dataset characteristics and performance requirements.
Experiment with different values of k in KNN and analyze the impact on accuracy.
Compare KNN and Decision Tree performance on a different dataset.
Visualize the decision boundary for both models using a 2D dataset.
Scikit-learn Documentation: https://scikit-learn.org/stable/
Python Data Science Handbook by Jake VanderPlas