This lab manual provides hands-on experience with data science concepts: data collection, preprocessing, analysis, visualization, and machine learning. The labs build practical Python skills using libraries such as Pandas, NumPy, Matplotlib, Seaborn, and SciPy.
Learn how to collect and preprocess data
Understand structured and unstructured data
Create and manipulate arrays using Python.
Load structured data into Pandas DataFrames.
Clean data by handling missing values and incorrect entries.
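The array task above can be sketched as follows. This is a minimal illustration rather than part of the original lab code; the variable names (`arr`, `matrix`, and so on) and the values are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration
arr = np.array([1, 2, 3, 4, 5, 6])

# Element-wise arithmetic applies to every entry at once
doubled = arr * 2

# Reshape the 1-D array into a 2 x 3 matrix
matrix = arr.reshape(2, 3)

# Slicing and boolean masks select subsets of the data
first_three = arr[:3]
evens = arr[arr % 2 == 0]

print(doubled)
print(matrix)
print(first_three, evens)
```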
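import pandas as pd
import numpy as np
# Build a small DataFrame; col2 contains one missing value (NaN)
data = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, np.nan, 9, 5], 'col3': [7, 8, 12, 1, 11]}
df = pd.DataFrame(data)
# Replace each missing value with the mean of its column
df = df.fillna(df.mean())
print(df)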
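The code above fills missing values, but the task list also mentions incorrect entries. One possible way to handle them is sketched below. The `raw` DataFrame, its `age` column, and the rule that negative ages are invalid are all assumptions made for this example; they do not come from the lab data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: 'age' mixes valid numbers with bad entries
raw = pd.DataFrame({'age': ['25', '31', 'unknown', '-4', '40']})

# Convert to numbers; anything that cannot be parsed becomes NaN
raw['age'] = pd.to_numeric(raw['age'], errors='coerce')

# Assumption for this sketch: negative ages are invalid, so mark them as missing too
raw.loc[raw['age'] < 0, 'age'] = np.nan

# Fill the gaps with the column median, which is less sensitive to outliers than the mean
raw['age'] = raw['age'].fillna(raw['age'].median())
print(raw)
```

Whether to fill, drop, or manually correct bad entries depends on the dataset; filling with the median is only one reasonable default.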
Learn how to analyze and visualize data
Use descriptive statistics and correlation analysis
Compute basic statistics like mean, median, and standard deviation.
Visualize data using Matplotlib.
import matplotlib.pyplot as plt
x = [80, 85, 90, 95, 100]
y = [240, 250, 260, 270, 280]
plt.plot(x, y, marker='o')
plt.xlabel('Average Pulse')
plt.ylabel('Calorie Burnage')
plt.title('Pulse vs Calorie Burnage')
plt.show()
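The plot covers the visualization task; the descriptive statistics task can be sketched with NumPy on the same values. The variable names `pulse` and `calories` are introduced here for readability, and the choice of the sample standard deviation (`ddof=1`) is an assumption about what the lab intends.

```python
import numpy as np

# Same values as the plotting example above
pulse = np.array([80, 85, 90, 95, 100])
calories = np.array([240, 250, 260, 270, 280])

for name, values in [('Average Pulse', pulse), ('Calorie Burnage', calories)]:
    mean = np.mean(values)
    median = np.median(values)
    # ddof=1 gives the sample standard deviation; NumPy's default (ddof=0) is the population value
    std = np.std(values, ddof=1)
    print(f'{name}: mean={mean:.2f}, median={median:.2f}, std={std:.2f}')
```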
Implement linear regression using Python
Understand the relationship between variables
Calculate the slope and intercept for a linear function.
Use NumPy to perform linear regression.
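import numpy as np
x = np.array([80, 85, 90, 95, 100])
y = np.array([240, 250, 260, 270, 280])
# Fit a degree-1 polynomial (a straight line) by least squares;
# polyfit returns the coefficients from highest degree to lowest
slope, intercept = np.polyfit(x, y, 1)
print(f'Slope: {slope:.2f}, Intercept: {intercept:.2f}')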
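To show where the slope and intercept come from, the sketch below computes them with the closed-form least-squares formulas instead of `np.polyfit`, then uses the fitted line for a prediction. The input `new_pulse = 92` is a hypothetical value picked for illustration.

```python
import numpy as np

x = np.array([80, 85, 90, 95, 100])
y = np.array([240, 250, 260, 270, 280])

# Closed-form least-squares estimates for a line y = slope * x + intercept
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Use the fitted line to predict calorie burnage for a pulse not in the data
new_pulse = 92  # hypothetical input
predicted = slope * new_pulse + intercept
print(f'Slope: {slope:.2f}, Intercept: {intercept:.2f}, Prediction at {new_pulse}: {predicted:.2f}')
```

For this data the points lie exactly on a line, so the manual result should agree with `np.polyfit` up to floating-point rounding.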
Clean and normalize data
Convert categorical data to numerical values
Convert object types to numerical data.
Normalize data using Min-Max scaling.
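# Reuses df from the data-cleaning lab above
# Convert col1 to a floating-point column
df['col1'] = df['col1'].astype(float)
# Min-Max scaling: rescale col2 to the range [0, 1]
df['col2'] = (df['col2'] - df['col2'].min()) / (df['col2'].max() - df['col2'].min())
print(df)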
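The objective of converting categorical data to numerical values can be sketched as follows. The `items` DataFrame, its `size` and `color` columns, and the ordering in `size_order` are hypothetical and exist only to illustrate two common encodings.

```python
import pandas as pd

# Hypothetical data: 'size' is ordered, 'color' has no natural order
items = pd.DataFrame({'size': ['small', 'large', 'medium', 'small'],
                      'color': ['red', 'blue', 'red', 'green']})

# Ordinal encoding: map each category to an integer that preserves the order
size_order = {'small': 0, 'medium': 1, 'large': 2}
items['size_code'] = items['size'].map(size_order)

# One-hot encoding: one indicator column per category, for unordered values
color_dummies = pd.get_dummies(items['color'], prefix='color')
encoded = pd.concat([items[['size_code']], color_dummies], axis=1)
print(encoded)
```

Ordinal codes suit categories with a meaningful order; one-hot columns avoid implying an order that does not exist.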
Learn to distinguish correlation from causality
Use correlation matrices to find relationships
Compute correlation coefficients between variables.
Visualize correlations using a heatmap.
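import matplotlib.pyplot as plt
import seaborn as sns
# Pairwise Pearson correlations between the numeric columns of df (from the earlier labs)
corr_matrix = df.corr()
# Annotated heatmap of the correlation matrix
sns.heatmap(corr_matrix, annot=True)
plt.show()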
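To make the correlation coefficient concrete, the sketch below computes Pearson's r for one pair of columns by hand and compares it with `DataFrame.corr()`. It rebuilds the small DataFrame from the data-cleaning lab so that it runs on its own; the names `r_manual` and `r_pandas` are introduced here. A coefficient close to 1 or -1 indicates a strong linear association only; it does not show that one variable causes the other.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the data-cleaning lab, filling the missing value with the column mean
df = pd.DataFrame({'col1': [1, 2, 3, 4, 7],
                   'col2': [4, 5, np.nan, 9, 5],
                   'col3': [7, 8, 12, 1, 11]})
df = df.fillna(df.mean())

# Pearson r: sum of products of deviations, divided by the square root
# of the product of the two sums of squared deviations
a, b = df['col1'], df['col2']
da, db = a - a.mean(), b - b.mean()
r_manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

# pandas computes the same quantity for every pair of columns
r_pandas = df.corr().loc['col1', 'col2']
print(f'manual: {r_manual:.4f}, pandas: {r_pandas:.4f}')
```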
This lab manual provides structured exercises to help students understand and apply data science concepts. Through these labs, students will gain proficiency in data handling, analysis, visualization, and predictive modeling.