This lab manual is designed to provide hands-on experience with data science concepts, including data collection, preprocessing, analysis, visualization, and machine learning. The labs are structured to help students develop practical skills in Python using libraries like Pandas, NumPy, Matplotlib, and SciPy.
Learn how to collect and preprocess data
Understand structured and unstructured data
Tasks:
Create and manipulate arrays using Python.
Load structured data into Pandas DataFrames.
Clean data by handling missing values and incorrect entries.
Python Code:
import pandas as pd
import numpy as np
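# Sample data with one missing value (np.nan) in col2
data = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, np.nan, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)
# Replace each missing value with the mean of its column
df = df.fillna(df.mean())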
print(df)
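The code above handles missing values only. For the "incorrect entries" part of the task, one possible approach is sketched below. The `age` column, its sample values, and the 0 to 120 valid range are hypothetical choices made for illustration, and filling with the median is just one reasonable strategy.

```python
import pandas as pd

# Hypothetical raw column: one non-numeric string and one impossible value
raw = pd.DataFrame({'age': ['25', '31', 'unknown', '-4', '47']})

# Parse strings as numbers; anything that cannot be parsed becomes NaN
raw['age'] = pd.to_numeric(raw['age'], errors='coerce')

# Treat values outside an assumed valid range (0 to 120) as missing
raw['age'] = raw['age'].where(raw['age'].between(0, 120))

# Fill the resulting gaps with the median of the remaining valid values
raw['age'] = raw['age'].fillna(raw['age'].median())
print(raw)
```

Converting bad entries to NaN first means the same fill step can repair both missing and incorrect values.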
Learn how to analyze and visualize data
Use descriptive statistics and correlation analysis
Tasks:
Compute basic statistics like mean, median, and standard deviation.
Visualize data using Matplotlib.
Python Code:
import matplotlib.pyplot as plt
x = [80, 85, 90, 95, 100]
y = [240, 250, 260, 270, 280]
plt.plot(x, y, marker='o')
plt.xlabel('Average Pulse')
plt.ylabel('Calorie Burnage')
plt.title('Pulse vs Calorie Burnage')
plt.show()
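The first task, computing basic statistics, could be approached as in the following minimal sketch. It reuses the pulse and calorie values from the plot; the DataFrame name `health` and the result variable names are assumptions introduced here.

```python
import pandas as pd

# Same sample values as in the plotting example
health = pd.DataFrame({
    'Average Pulse': [80, 85, 90, 95, 100],
    'Calorie Burnage': [240, 250, 260, 270, 280],
})

mean_values = health.mean()      # arithmetic mean of each column
median_values = health.median()  # middle value of each column
std_values = health.std()        # sample standard deviation (ddof=1)

print('Mean:', mean_values, sep='\n')
print('Median:', median_values, sep='\n')
print('Standard deviation:', std_values, sep='\n')
```

Calling `health.describe()` would report these figures together with the count, minimum, maximum, and quartiles in a single table.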
Implement linear regression using Python
Understand the relationship between variables
Tasks:
Calculate the slope and intercept for a linear function.
Use NumPy to perform linear regression.
Python Code:
import numpy as np
x = np.array([80, 85, 90, 95, 100])
y = np.array([240, 250, 260, 270, 280])
slope, intercept = np.polyfit(x, y, 1)
print(f'Slope: {slope}, Intercept: {intercept}')
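To see where the slope and intercept come from, the least-squares formulas can also be written out by hand. The sketch below is one way to do this on the same data; the prediction at a pulse of 110 is a hypothetical usage example rather than part of the original exercise.

```python
import numpy as np

x = np.array([80, 85, 90, 95, 100])
y = np.array([240, 250, 260, 270, 280])

# Least-squares estimates for the line y = slope * x + intercept
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Use the fitted line to estimate calorie burnage at a pulse of 110
predicted = slope * 110 + intercept
print(f'Slope: {slope}, Intercept: {intercept}, Prediction at 110: {predicted}')
```

For this data set the manual calculation should agree with `np.polyfit(x, y, 1)` up to floating-point rounding.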
Clean and normalize data
Convert categorical data to numerical values
Tasks:
Convert object types to numerical data.
Normalize data using Min-Max scaling.
Python Code:
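# Reuses the DataFrame df created in the data-cleaning code above
df['col1'] = df['col1'].astype(float)
# Min-Max scaling: (x - min) / (max - min) maps the column onto the range [0, 1]
df['col2'] = (df['col2'] - df['col2'].min()) / (df['col2'].max() - df['col2'].min())
print(df)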
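The objective of converting categorical data to numerical values can be illustrated with a short sketch. The `workouts` DataFrame and its `activity` column are hypothetical; integer codes and one-hot encoding are shown as two common options, not as the only ways to do this.

```python
import pandas as pd

# Hypothetical data with one categorical (text) column
workouts = pd.DataFrame({
    'activity': ['run', 'walk', 'run', 'swim'],
    'duration': [30, 45, 60, 40],
})

# Option 1: integer codes, assigned in alphabetical order of the categories
workouts['activity_code'] = workouts['activity'].astype('category').cat.codes

# Option 2: one-hot encoding, one indicator column per category
one_hot = pd.get_dummies(workouts['activity'], prefix='activity')

print(workouts)
print(one_hot)
```

Integer codes imply an ordering between categories, so one-hot encoding is often the safer choice when the categories have no natural order.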
Learn to distinguish correlation from causation
Use correlation matrices to find relationships
Tasks:
Compute correlation coefficients between variables.
Visualize correlations using a heatmap.
Python Code:
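import matplotlib.pyplot as plt
import seaborn as sns
# Reuses the DataFrame df from the earlier code; corr() returns pairwise Pearson coefficients
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True)
plt.show()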
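A high correlation coefficient does not by itself show that one variable causes the other. The sketch below uses invented numbers (temperature, ice cream sales, sunburn cases) purely for illustration: the last two are strongly correlated, yet both plausibly depend on the third.

```python
import pandas as pd

# Invented illustrative data; not real measurements
summer = pd.DataFrame({
    'temperature': [20, 22, 25, 28, 30, 33],
    'ice_cream_sales': [110, 125, 150, 170, 190, 215],
    'sunburn_cases': [3, 4, 6, 7, 9, 11],
})

# Pairwise Pearson correlation coefficients
corr = summer.corr()
r = corr.loc['ice_cream_sales', 'sunburn_cases']

print(corr)
print(f'Ice cream vs sunburn correlation: {r:.3f}')
```

Ice cream sales and sunburn cases move together here, but a more plausible explanation is that temperature drives both, which is why a correlation matrix is a starting point for investigation and not evidence of cause.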
This lab manual provides structured exercises to help students understand and apply data science concepts. Through these labs, students will gain proficiency in data handling, analysis, visualization, and predictive modeling.