CPL Practical 1
Name:
CRN No:
Course: 310302: Computational Programming Laboratory
Instructor: Prof.
Date of Completion:
Assignment Objectives:
● To compute the estimators of statistical measures such as mean, variance, and standard
deviation.
● To calculate covariance, correlation, and standard error for a given dataset.
● To visualize the distribution of samples graphically.
● To understand the relationship between different statistical measures through
computation.
Problem Statement:
Compute estimators of the main statistical measures, namely Mean, Variance, Standard Deviation,
Covariance, Correlation, and Standard Error, for an example dataset. Display the distribution of
the samples graphically.
Software Requirements:
Hardware Requirements:
Theory:
1. Mean
The mean, often referred to as the average, is one of the most fundamental concepts in
statistics. It represents the central value of a dataset and is calculated by summing all data
points and dividing by the number of points. The formula is:
Mean = (1/n) Σ Xᵢ   (sum over i = 1, ..., n)
Where:
● n is the number of data points.
● Xᵢ is each individual data point.
The mean gives a general idea of where the center of the data is but may not always represent
the dataset well if there are extreme values (outliers).
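For illustration, here is a minimal NumPy sketch (using the X values from the example later in this assignment) that computes the mean both directly from the formula and with np.mean:

import numpy as np

data = [2, 8, 18, 20, 28, 30]         # sample data points
mean_manual = sum(data) / len(data)   # (1/n) * sum of X_i
mean_numpy = np.mean(data)
print(mean_manual, mean_numpy)        # both print 17.666...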
2. Variance
Variance is a measure of the spread or dispersion within a dataset. It tells us how much the
individual data points deviate from the mean. A higher variance indicates that the data points
are more spread out, while a lower variance means they are closer to the mean. The formula
for variance differs slightly between population and sample datasets:
Population variance:
σ² = (1/N) Σ (Xᵢ − μ)²   (sum over i = 1, ..., N)
Sample variance:
s² = (1/(n − 1)) Σ (Xᵢ − X̄)²   (sum over i = 1, ..., n)
Where:
● N is the total number of data points in the population.
● n is the number of data points in the sample.
● Xi is each data point.
● μ is the population mean, and X is the sample mean.
Variance gives a sense of how widely spread the data is around the mean.
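As a brief sketch (assuming the same X values as in the example below), np.var computes the population variance by default, while ddof=1 gives the sample variance:

import numpy as np

data = [2, 8, 18, 20, 28, 30]
pop_var = np.var(data)             # divides by N      -> about 100.56
samp_var = np.var(data, ddof=1)    # divides by n - 1  -> about 120.67
print(pop_var, samp_var)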
3. Standard Deviation
Standard deviation is a widely used measure of the dispersion or spread of a dataset. It tells
us how much the individual data points deviate from the mean, but unlike variance, it is
expressed in the same units as the data itself, making it easier to interpret. It is simply the
square root of the variance. The formula differs slightly for population and sample datasets:
Population: σ = √[ (1/N) Σ (Xᵢ − μ)² ]
Sample: s = √[ (1/(n − 1)) Σ (Xᵢ − X̄)² ]
Where:
● σ is the population standard deviation and s is the sample standard deviation.
● N, n, Xᵢ, μ, and X̄ are as defined for variance.
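A corresponding sketch for the standard deviation (same assumed data); np.std simply takes the square root of the respective variance:

import numpy as np

data = [2, 8, 18, 20, 28, 30]
pop_std = np.std(data)             # population SD, about 10.03
samp_std = np.std(data, ddof=1)    # sample SD, about 10.99
print(pop_std, samp_std)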
4. Covariance
Covariance is a statistical measure that indicates the direction of the linear relationship
between two variables. It tells us whether the variables tend to increase or decrease together.
If the covariance is positive, it means that as one variable increases, the other tends to
increase as well. Conversely, a negative covariance indicates that as one variable increases,
the other tends to decrease. However, covariance does not provide information about the
strength of this relationship.
Cov(X, Y) = (1/n) Σ (Xᵢ − X̄)(Yᵢ − Ȳ)   (sum over i = 1, ..., n)
Where:
● Xᵢ and Yᵢ are the paired data points.
● X̄ and Ȳ are the means of X and Y.
● n is the number of paired observations.
Covariance provides insight into the joint variability of two variables. If the covariance is
zero, it implies that there is no linear relationship between the variables. While covariance
indicates direction, its magnitude depends on the scale of the variables, which can make
interpretation difficult.
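A short sketch of the covariance for the example X and Y; note that np.cov divides by n − 1 by default, while bias=True matches the 1/n formula above:

import numpy as np

x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]
cov_pop = np.cov(x, y, bias=True)[0, 1]   # divides by n      -> about 157.83
cov_samp = np.cov(x, y)[0, 1]             # divides by n - 1  -> about 189.40
print(cov_pop, cov_samp)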
5. Correlation
Correlation is a statistical measure that expresses the strength and direction of a linear
relationship between two variables. Unlike covariance, which only shows direction,
correlation standardizes the relationship, allowing for easier comparison between different
datasets. The most common type of correlation is Pearson’s correlation coefficient, which
ranges from -1 to 1. A value of 1 indicates a perfect positive relationship, -1 indicates a
perfect negative relationship, and 0 means no linear relationship.
Pearson's correlation coefficient is computed as:
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
Where:
● Cov(X, Y) is the covariance between X and Y.
● σ_X and σ_Y are the standard deviations of X and Y.
Correlation is unitless, making it easier to interpret than covariance. It provides both the
strength and direction of the relationship. A correlation close to 1 or -1 indicates a strong
relationship, while a value near 0 indicates a weak or no linear relationship.
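A minimal sketch of Pearson's correlation for the example data, computed both from the definition and with np.corrcoef:

import numpy as np

x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]
corr_manual = np.cov(x, y, bias=True)[0, 1] / (np.std(x) * np.std(y))
corr_numpy = np.corrcoef(x, y)[0, 1]
print(corr_manual, corr_numpy)            # both are about 0.95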
6. Standard Error
The standard error (SE) measures the accuracy with which a sample mean represents the
population mean. It quantifies how much the sample mean is expected to vary from sample to
sample if you repeatedly draw random samples from the population. A smaller standard error
indicates that the sample mean is a more precise estimate of the true population mean. The
standard error decreases as the sample size increases.
SE = s / √n
Where:
● s is the sample standard deviation.
● n is the sample size.
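A brief sketch of the standard error of the mean for the example X values; scipy.stats.sem performs the same computation:

import numpy as np
from scipy.stats import sem

x = [2, 8, 18, 20, 28, 30]
se_manual = np.std(x, ddof=1) / np.sqrt(len(x))   # s / sqrt(n)
se_scipy = sem(x)                                 # same result
print(se_manual, se_scipy)                        # both are about 4.49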
Example:
X: 2, 8, 18, 20, 28, 30
Y: 5, 12, 18, 23, 45, 50
Calculation of Mean:
Mean for X = (2 + 8 + 18 + 20 + 28 + 30) / 6 = 106 / 6 ≈ 17.667
Mean for Y = (5 + 12 + 18 + 23 + 45 + 50) / 6 = 153 / 6 = 25.5
Calculation of Variance:
Variance for X = (1/6) [(2 − 17.67)² + (8 − 17.67)² + (18 − 17.67)² + (20 − 17.67)² + (28 − 17.67)² + (30 − 17.67)²] ≈ 100.56, so σ_X ≈ 10.03
Variance for Y = (1/6) [(5 − 25.5)² + (12 − 25.5)² + (18 − 25.5)² + (23 − 25.5)² + (45 − 25.5)² + (50 − 25.5)²] = 274.25, so σ_Y ≈ 16.56
Calculation of Covariance:
Cov(X, Y) = (⅙) [(2 – 17.67)(5 – 25.5) + (8 – 17.67)(12 – 25.5) + (18 – 17.67)(18 – 25.5) +
(20 – 17.67)(23 – 25.5) + (28 – 17.67)(45 – 25.5) + (30 – 17.67)(50 – 25.5)]
Cov(X, Y) = 157.83
Calculation of Correlation:
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
ρ(X, Y) = 157.83 / 166.064
ρ(X, Y) ≈ 0.9504
Code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x = [2,8,18,20,28,30]
y = [5,12,18,23,45,50]
mean_x = np.mean(x)
mean_y = np.mean(y)
print(f'Mean of X: {mean_x:.3f}')
print(f'Mean of Y: {mean_y:.3f}')
#Line Graph
# plt.plot(x,y)
# plt.xlabel('X Axis')
# plt.ylabel('Y Axis')
# plt.title('First graph', fontdict={'color':'blue'})
# plt.show()
#Scatter Plot
# plt.scatter(x, y, color='blue')
# plt.axhline(mean_y, color='red', linestyle='--', label=f'Mean Y: {mean_y}')
# plt.axvline(mean_x, color='green', linestyle='--', label=f'Mean X: {mean_x}')
# plt.title('Scatter Plot of Sample Data')
# plt.xlabel('X values')
# plt.ylabel('Y values')
# plt.legend()
# plt.show()
#Line Plot
plt.plot(x, y, marker='o', linestyle='-', color='green')
plt.title('Line Plot of Sample Data')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
#Histogram
# plt.hist(x, bins=5, color='orange', edgecolor='black')
# plt.title('Histogram of Sample Data')
# plt.xlabel('X values')
# plt.ylabel('Frequency')
# plt.show()
# Box plot
plt.boxplot([x, y], labels=['X', 'Y'])
plt.title('Box Plot of Sample Data')
plt.ylabel('Values')
plt.show()
# Population standard deviations (np.std defaults to ddof=0)
std_x = np.std(x)
std_y = np.std(y)
# Create histogram of X (density=True normalises it so a fitted PDF can be overlaid)
plt.hist(x, bins=5, edgecolor='black', density=True, alpha=0.6, color='g')
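# NOTE: the overlay below is an assumption, not part of the original code; the
# otherwise-unused 'norm' import suggests the histogram was meant to be compared
# against a fitted normal curve, so a minimal sketch of that overlay follows.
xs = np.linspace(min(x) - 5, max(x) + 5, 200)
plt.plot(xs, norm.pdf(xs, mean_x, std_x), 'k--', label='Fitted normal PDF')
plt.title('Histogram of X with Fitted Normal Curve')
plt.xlabel('X values')
plt.ylabel('Density')
plt.legend()
plt.show()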
# ------------------------------------------------------------------------------
# Calculate covariance
cov_matrix = np.cov(x, y)
cov_xy = cov_matrix[0, 1] # Covariance between x and y
# Scatter plot
plt.scatter(x, y, color='blue')
plt.title(f'Scatter Plot with Covariance: {cov_xy:.2f}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
# Calculate correlation
correlation_matrix = np.corrcoef(x, y)
corr_xy = correlation_matrix[0, 1] # Correlation between x and y
# Scatter plot
plt.scatter(x, y, color='green')
plt.title(f'Scatter Plot with Correlation: {corr_xy:.2f}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
# ------------------------------------------------------------------------------
# Standard error of the mean: sample standard deviation (ddof=1) divided by sqrt(n)
def standard_error(data):
    return np.std(data, ddof=1) / np.sqrt(len(data))

se_x = standard_error(x)
se_y = standard_error(y)
print(f'Standard error of X: {se_x:.3f}')
print(f'Standard error of Y: {se_y:.3f}')