To apply the Naïve Bayes classifier to predict whether given weather conditions are suitable for visiting the beach, based on past data.
Naïve Bayes is a probabilistic classifier based on Bayes' theorem, assuming independence among features. The theorem states:
P(Y|X) = P(X|Y) P(Y)/P(X)
where:
P(Y|X) is the posterior probability of class Y given feature X.
P(X|Y) is the likelihood of X given class Y.
P(Y) is the prior probability of class Y.
P(X) is the evidence, i.e. the marginal probability of feature X.
For prediction, we compute the posterior probability for each class (Yes/No for a beach visit) and select the class with the higher probability. Since P(X) is the same for both classes, it can be dropped and the unnormalized products compared directly.
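As a compact illustration of this decision rule, the sketch below multiplies each class prior by its per-feature likelihoods and returns the class with the larger product. It is a minimal, generic sketch; the function and argument names are illustrative and not part of the implementation that follows.

def naive_bayes_predict(priors, likelihoods, features):
    """priors: {class: P(class)}.
    likelihoods: {class: {(feature, value): P(feature=value | class)}}.
    features: {feature: observed value}."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in features.items():
            # Naive independence: multiply the per-feature conditional probabilities
            score *= likelihoods[cls].get((feature, value), 0.0)
        scores[cls] = score
    # Return the class with the largest unnormalized posterior
    return max(scores, key=scores.get)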
The dataset includes attributes: Outlook, Temperature, Humidity with the target class Beach? (Yes/No).
To classify:
(Cloudy, Mild, Normal)
(Sunny, Low, High)
We calculate probabilities for both conditions using the dataset.
Step 1: Compute Priors
P(Yes) = 4/10
P(No) = 6/10
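As a quick check, these priors can be counted directly from the Beach column; a minimal sketch, with the label list copied from the dataset used in the implementation below:

from collections import Counter

# Beach labels copied from the dataset given in the implementation below
beach = ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No']

counts = Counter(beach)             # Counter({'No': 6, 'Yes': 4})
p_yes = counts['Yes'] / len(beach)  # 4/10 = 0.4
p_no = counts['No'] / len(beach)    # 6/10 = 0.6
print(p_yes, p_no)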
Step 2: Compute Likelihoods
For each feature given Yes/No:
Example:
P(Outlook=Cloudy | Yes) = 1/4
P(Outlook=Cloudy | No) = 2/6
Following the same approach, compute the likelihoods for Temperature and Humidity.
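A likelihood here is a relative frequency within one class: count how often a feature value occurs among the Yes rows (or the No rows). A minimal counting sketch for the Outlook example above, using the Outlook and Beach columns of the dataset given below:

outlook = ['Sunny', 'Sunny', 'Sunny', 'Sunny', 'Rain', 'Rain', 'Rain', 'Cloudy', 'Cloudy', 'Cloudy']
beach = ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No']

def likelihood(values, labels, value, label):
    # P(feature = value | class = label) as a relative frequency within that class
    in_class = [v for v, l in zip(values, labels) if l == label]
    return in_class.count(value) / len(in_class)

print(likelihood(outlook, beach, 'Cloudy', 'Yes'))  # 1/4 = 0.25
print(likelihood(outlook, beach, 'Cloudy', 'No'))   # 2/6 ≈ 0.33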
Step 3: Compute Posterior
Using Bayes' theorem together with the independence assumption, the likelihood factorizes over the features, so for a condition X = (Outlook, Temp, Humidity):
P(Yes|X) ∝ P(Yes) · P(Outlook|Yes) · P(Temp|Yes) · P(Humidity|Yes)
P(No|X) ∝ P(No) · P(Outlook|No) · P(Temp|No) · P(Humidity|No)
Compute both products for the given conditions and assign the class with the larger value.
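For the first condition (Cloudy, Mild, Normal), the counts can be read off the 10-row dataset and multiplied out; a minimal sketch of this hand computation:

# Priors from Step 1
p_yes, p_no = 4/10, 6/10

# Likelihoods for (Outlook=Cloudy, Temp=Mild, Humidity=Normal), counted per class
like_yes = (1/4) * (1/4) * (2/4)  # Cloudy, Mild, Normal among the 4 Yes rows
like_no = (2/6) * (2/6) * (4/6)   # Cloudy, Mild, Normal among the 6 No rows

score_yes = p_yes * like_yes      # 0.0125
score_no = p_no * like_no         # about 0.0444

# The class with the larger unnormalized posterior is the prediction
print('Yes' if score_yes > score_no else 'No')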
Below is a Python implementation that trains a Naïve Bayes model on the dataset and predicts the class for the two given conditions.
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
# Given dataset
data = {
'Outlook': ['Sunny', 'Sunny', 'Sunny', 'Sunny', 'Rain', 'Rain', 'Rain', 'Cloudy', 'Cloudy', 'Cloudy'],
'Temp': ['High', 'High', 'Low', 'Mild', 'Mild', 'High', 'Low', 'High', 'High', 'Mild'],
'Humidity': ['High', 'Normal', 'Normal', 'High', 'Normal', 'High', 'Normal', 'High', 'Normal', 'Normal'],
'Beach': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'No']
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Encode categorical variables, keeping one fitted encoder per column so values can be decoded later
encoders = {}
for col in ['Outlook', 'Temp', 'Humidity', 'Beach']:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
# Train Naïve Bayes model
X = df[['Outlook', 'Temp', 'Humidity']]
y = df['Beach']
model = CategoricalNB()
model.fit(X, y)
# Encode the two conditions to classify: (Cloudy, Mild, Normal) and (Sunny, Low, High)
test_data = pd.DataFrame([['Cloudy', 'Mild', 'Normal'], ['Sunny', 'Low', 'High']],
                         columns=['Outlook', 'Temp', 'Humidity'])
for col in test_data.columns:
    test_data[col] = encoders[col].transform(test_data[col])

# Predict and decode the predictions back to Yes/No labels
predictions = model.predict(test_data)
print(encoders['Beach'].inverse_transform(predictions))
The model predicts whether the given conditions are suitable for visiting the beach. The reliability of the predictions depends on the amount of data and on how well the conditional-independence assumption holds; note also that CategoricalNB applies Laplace smoothing by default (alpha=1), so its estimated probabilities differ slightly from the raw relative frequencies computed by hand.
Compute probabilities manually for one case.
Modify the dataset and check how predictions change.
Compare Naïve Bayes with other classifiers.