Objective: To use the pandas library in Python to compute descriptive statistics (mean, median, and standard deviation) for a small dataset, and to visualize the data's distribution with a matplotlib histogram.
Prerequisites:
Basic knowledge of Python
pandas and matplotlib installed (for example, with pip install pandas matplotlib)
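A quick way to confirm the prerequisites before starting is to check that both packages can be found. The sketch below uses only the standard library (importlib.util.find_spec and importlib.metadata.version); the helper name check_libraries is hypothetical, chosen for illustration.

```python
import importlib.metadata
import importlib.util


def check_libraries(names=("pandas", "matplotlib")):
    """Map each package name to its installed version, or None if it cannot be found.

    check_libraries is a hypothetical helper for this lab, not part of any library.
    """
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not importable in this environment
            continue
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            status[name] = "unknown"  # importable, but no distribution metadata
    return status


for name, version in check_libraries().items():
    if version is None:
        print(f"{name}: missing (install with: pip install {name})")
    else:
        print(f"{name}: {version}")
```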
Required Libraries:
import pandas as pd
import matplotlib.pyplot as plt
Dataset: The following are weight values (in pounds) for 20 people:
164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
143, 176, 177, 162, 141, 151, 182, 185, 171, 152
Step 1: Create a pandas Series
# Create a pandas Series with given data
weights = pd.Series([164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
                     143, 176, 177, 162, 141, 151, 182, 185, 171, 152])
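If a DataFrame is preferred over a Series (for example, to add more columns such as height later), the Series can be wrapped in a one-column DataFrame with Series.to_frame. This is a minimal sketch; the column name weight_lbs is an assumption made for illustration.

```python
import pandas as pd

# Same 20 weight values (lbs) as in the dataset above
weights = pd.Series([164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
                     143, 176, 177, 162, 141, 151, 182, 185, 171, 152])

# Wrap the Series in a one-column DataFrame; "weight_lbs" is a hypothetical column name
df = weights.to_frame(name="weight_lbs")

print(df.shape)  # (20, 1)
print(df.head())

# The DataFrame column is a Series, so the same statistics methods apply
print(df["weight_lbs"].mean())
```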
Step 2: Compute Statistical Measures
# Calculate mean
mean_weight = weights.mean()
print(f"Mean: {mean_weight}")
# Calculate median
median_weight = weights.median()
print(f"Median: {median_weight}")
# Calculate standard deviation (pandas returns the sample standard deviation, ddof=1, by default)
std_dev_weight = weights.std()
print(f"Standard Deviation: {std_dev_weight}")
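As a cross-check, the same three measures can be computed with Python's built-in statistics module. Series.std() divides by n - 1 by default (ddof=1), which matches statistics.stdev; passing ddof=0 divides by n and matches the population version, statistics.pstdev.

```python
import math
import statistics

import pandas as pd

data = [164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
        143, 176, 177, 162, 141, 151, 182, 185, 171, 152]
weights = pd.Series(data)

# Mean and median agree between pandas and the standard library
assert math.isclose(weights.mean(), statistics.mean(data))
assert weights.median() == statistics.median(data)

# pandas' default std is the sample standard deviation (divides by n - 1)
sample_sd = weights.std()            # ddof=1 by default
population_sd = weights.std(ddof=0)  # divides by n instead

assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(f"Sample SD: {sample_sd:.2f}")          # Sample SD: 15.45
print(f"Population SD: {population_sd:.2f}")  # Population SD: 15.06
```

With only 20 values, the choice of ddof changes the result noticeably (about 15.45 vs 15.06), so it is worth stating which one a report uses.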
Step 3: Plot a Histogram
# Plot histogram
plt.hist(weights, bins=5, color='blue', edgecolor='black')
plt.xlabel('Weight (lbs)')
plt.ylabel('Frequency')
plt.title('Histogram of Weights')
plt.show()
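To see the exact counts behind each bar, the bin edges and frequencies can be computed with numpy.histogram, which matplotlib's hist uses for its binning. With bins=5, the range from the minimum (134) to the maximum (189) is split into five equal-width bins of 11 lbs; each bin is half-open [left, right) except the last, which also includes the maximum.

```python
import numpy as np
import pandas as pd

weights = pd.Series([164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
                     143, 176, 177, 162, 141, 151, 182, 185, 171, 152])

# Same binning as plt.hist(weights, bins=5): 5 equal-width bins from min to max
counts, edges = np.histogram(weights, bins=5)

# Print each bin's range and how many weights fall into it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f} lbs: {count}")
```

For this dataset the counts are 4, 3, 6, 4, and 3 from the lightest bin to the heaviest, so the tallest bar is the 156-167 lbs bin.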
Expected Output:
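Mean: 161.6
Median: 160.5
Standard Deviation: approximately 15.45 (the sample standard deviation; the printed value shows more decimal places)
A histogram of the 20 weights in five equal-width bins spanning 134 to 189 lbs.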
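pandas can also report these measures in a single call: Series.describe() returns the count, mean, standard deviation (sample, ddof=1), minimum, quartiles, and maximum, where the "50%" entry is the median. A short sketch that prints the lab's three measures rounded to two decimals:

```python
import pandas as pd

weights = pd.Series([164, 158, 172, 153, 144, 156, 189, 163, 134, 159,
                     143, 176, 177, 162, 141, 151, 182, 185, 171, 152])

# describe() returns a Series indexed by statistic name
summary = weights.describe()
print(summary)

# Pick out the lab's three measures; the "50%" entry is the median
print(f"Mean: {summary['mean']:.2f}")                # Mean: 161.60
print(f"Median: {summary['50%']:.2f}")               # Median: 160.50
print(f"Standard Deviation: {summary['std']:.2f}")   # Standard Deviation: 15.45
```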
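Conclusion: This lab demonstrates that pandas computes common descriptive statistics with single method calls (mean(), median(), and std()), and that a histogram shows how the values are spread across their range. For these weights, the mean (161.6 lbs) is slightly above the median (160.5 lbs).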