CPL Practical 1
Name:
CRN No:
Course: 310302: Computational Programming Laboratory
Instructor: Prof.
Date of Completion:
Assignment Objectives:
● To compute the estimators of statistical measures such as mean, variance, and standard
deviation.
● To calculate covariance, correlation, and standard error for a given dataset.
● To visualize the distribution of samples graphically.
● To understand the relationship between different statistical measures through
computation.
Problem Statement:
Compute estimators of the main statistical measures, namely Mean, Variance, Standard Deviation,
Covariance, Correlation, and Standard Error, for an example dataset. Display the distribution of
the samples graphically.
Software Requirements:
Hardware Requirements:
Theory:
1. Mean
The mean, often referred to as the average, is one of the most fundamental concepts in
statistics. It represents the central value of a dataset and is calculated by summing all data
points and dividing by the number of points. The formula is:
Mean = (1/n) Σ Xᵢ   (sum over i = 1, ..., n)
Where:
● n is the number of data points.
● Xᵢ is each individual data point.
The mean gives a general idea of where the center of the data is but may not always represent
the dataset well if there are extreme values (outliers).
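For illustration, here is a minimal NumPy sketch (using the X values from the example later in this assignment) that computes the mean both directly from the formula and with np.mean:

import numpy as np

data = [2, 8, 18, 20, 28, 30]         # sample data points
mean_manual = sum(data) / len(data)   # (1/n) * sum of X_i
mean_numpy = np.mean(data)
print(mean_manual, mean_numpy)        # both print 17.666...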
2. Variance
Variance is a measure of the spread or dispersion within a dataset. It tells us how much the
individual data points deviate from the mean. A higher variance indicates that the data points
are more spread out, while a lower variance means they are closer to the mean. The formula
for variance differs slightly between population and sample datasets:
Population variance:
σ² = (1/N) Σ (Xᵢ − μ)²   (sum over i = 1, ..., N)
Sample variance:
s² = (1/(n − 1)) Σ (Xᵢ − X̄)²   (sum over i = 1, ..., n)
Where:
● N is the total number of data points in the population.
● n is the number of data points in the sample.
● Xi is each data point.
● μ is the population mean, and X is the sample mean.
Variance gives a sense of how widely spread the data is around the mean.
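As a brief sketch (assuming the same X values as in the example below), np.var computes the population variance by default, while ddof=1 gives the sample variance:

import numpy as np

data = [2, 8, 18, 20, 28, 30]
pop_var = np.var(data)             # divides by N      -> about 100.56
samp_var = np.var(data, ddof=1)    # divides by n - 1  -> about 120.67
print(pop_var, samp_var)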
3. Standard Deviation
Standard deviation is a widely used measure of the dispersion or spread of a dataset. It tells
us how much the individual data points deviate from the mean, but unlike variance, it is
expressed in the same units as the data itself, making it easier to interpret. It is simply the
square root of the variance. The formula differs slightly for population and sample datasets:
Population: σ = √[ (1/N) Σ (Xᵢ − μ)² ]
Sample: s = √[ (1/(n − 1)) Σ (Xᵢ − X̄)² ]
Where:
● σ is the population standard deviation and s is the sample standard deviation.
● N, n, Xᵢ, μ, and X̄ are as defined for variance.
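A corresponding sketch for the standard deviation (same assumed data); np.std simply takes the square root of the respective variance:

import numpy as np

data = [2, 8, 18, 20, 28, 30]
pop_std = np.std(data)             # population SD, about 10.03
samp_std = np.std(data, ddof=1)    # sample SD, about 10.99
print(pop_std, samp_std)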
4. Covariance
Covariance is a statistical measure that indicates the direction of the linear relationship
between two variables. It tells us whether the variables tend to increase or decrease together.
If the covariance is positive, it means that as one variable increases, the other tends to
increase as well. Conversely, a negative covariance indicates that as one variable increases,
the other tends to decrease. However, covariance does not provide information about the
strength of this relationship.
Cov(X, Y) = (1/n) Σ (Xᵢ − X̄)(Yᵢ − Ȳ)   (sum over i = 1, ..., n)
Where:
● Xᵢ and Yᵢ are the paired data points.
● X̄ and Ȳ are the means of X and Y.
● n is the number of paired observations.
Covariance provides insight into the joint variability of two variables. If the covariance is
zero, it implies that there is no linear relationship between the variables. While covariance
indicates direction, its magnitude depends on the scale of the variables, which can make
interpretation difficult.
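A short sketch of the covariance for the example X and Y; note that np.cov divides by n − 1 by default, while bias=True matches the 1/n formula above:

import numpy as np

x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]
cov_pop = np.cov(x, y, bias=True)[0, 1]   # divides by n      -> about 157.83
cov_samp = np.cov(x, y)[0, 1]             # divides by n - 1  -> about 189.40
print(cov_pop, cov_samp)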
5. Correlation
Correlation is a statistical measure that expresses the strength and direction of a linear
relationship between two variables. Unlike covariance, which only shows direction,
correlation standardizes the relationship, allowing for easier comparison between different
datasets. The most common type of correlation is Pearson’s correlation coefficient, which
ranges from -1 to 1. A value of 1 indicates a perfect positive relationship, -1 indicates a
perfect negative relationship, and 0 means no linear relationship.
Pearson's correlation coefficient is computed as:
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
Where:
● Cov(X, Y) is the covariance between X and Y.
● σ_X and σ_Y are the standard deviations of X and Y.
Correlation is unitless, making it easier to interpret than covariance. It provides both the
strength and direction of the relationship. A correlation close to 1 or -1 indicates a strong
relationship, while a value near 0 indicates a weak or no linear relationship.
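A minimal sketch of Pearson's correlation for the example data, computed both from the definition and with np.corrcoef:

import numpy as np

x = [2, 8, 18, 20, 28, 30]
y = [5, 12, 18, 23, 45, 50]
corr_manual = np.cov(x, y, bias=True)[0, 1] / (np.std(x) * np.std(y))
corr_numpy = np.corrcoef(x, y)[0, 1]
print(corr_manual, corr_numpy)            # both are about 0.95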
6. Standard Error
The standard error (SE) measures the accuracy with which a sample mean represents the
population mean. It quantifies how much the sample mean is expected to vary from sample to
sample if you repeatedly draw random samples from the population. A smaller standard error
indicates that the sample mean is a more precise estimate of the true population mean. The
standard error decreases as the sample size increases.
SE = s / √n
Where:
● s is the sample standard deviation.
● n is the sample size.
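A brief sketch of the standard error of the mean for the example X values; scipy.stats.sem performs the same computation:

import numpy as np
from scipy.stats import sem

x = [2, 8, 18, 20, 28, 30]
se_manual = np.std(x, ddof=1) / np.sqrt(len(x))   # s / sqrt(n)
se_scipy = sem(x)                                 # same result
print(se_manual, se_scipy)                        # both are about 4.49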
Example:
X: 2, 8, 18, 20, 28, 30
Y: 5, 12, 18, 23, 45, 50
Calculation of Mean:
Mean for X = (2 + 8 + 18 + 20 + 28 + 30) / 6 = 106 / 6 ≈ 17.667
Mean for Y = (5 + 12 + 18 + 23 + 45 + 50) / 6 = 153 / 6 = 25.5
Calculation of Variance:
Variance for X = (1/6) [(2 − 17.67)² + (8 − 17.67)² + (18 − 17.67)² + (20 − 17.67)² + (28 − 17.67)² + (30 − 17.67)²] ≈ 100.56, so σ_X ≈ 10.03
Variance for Y = (1/6) [(5 − 25.5)² + (12 − 25.5)² + (18 − 25.5)² + (23 − 25.5)² + (45 − 25.5)² + (50 − 25.5)²] = 274.25, so σ_Y ≈ 16.56
Calculation of Covariance:
Cov(X, Y) = (⅙) [(2 – 17.67)(5 – 25.5) + (8 – 17.67)(12 – 25.5) + (18 – 17.67)(18 – 25.5) +
(20 – 17.67)(23 – 25.5) + (28 – 17.67)(45 – 25.5) + (30 – 17.67)(50 – 25.5)]
Cov(X, Y) = 157.83
Calculation of Correlation:
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
ρ(X, Y) = 157.83 / 166.064
ρ(X, Y) ≈ 0.9504
Code:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x = [2,8,18,20,28,30]
y = [5,12,18,23,45,50]
mean_x = np.mean(x)
mean_y = np.mean(y)
print(f'Mean of X: {mean_x:.3f}')
print(f'Mean of Y: {mean_y:.3f}')
#Line Graph
# plt.plot(x,y)
# plt.xlabel('X Axis')
# plt.ylabel('Y Axis')
# plt.title('First graph', fontdict={'color':'blue'})
# plt.show()
#Scatter Plot
# plt.scatter(x, y, color='blue')
# plt.axhline(mean_y, color='red', linestyle='--', label=f'Mean Y: {mean_y}')
# plt.axvline(mean_x, color='green', linestyle='--', label=f'Mean X: {mean_x}')
# plt.title('Scatter Plot of Sample Data')
# plt.xlabel('X values')
# plt.ylabel('Y values')
# plt.legend()
# plt.show()
#Line Plot
plt.plot(x, y, marker='o', linestyle='-', color='green')
plt.title('Line Plot of Sample Data')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
#Histogram
# plt.hist(x, bins=5, color='orange', edgecolor='black')
# plt.title('Histogram of Sample Data')
# plt.xlabel('X values')
# plt.ylabel('Frequency')
# plt.show()
# Box plot
plt.boxplot([x, y], labels=['X', 'Y'])
plt.title('Box Plot of Sample Data')
plt.ylabel('Values')
plt.show()
# Population standard deviations (np.std defaults to ddof=0)
std_x = np.std(x)
std_y = np.std(y)
# Create histogram of X (density=True normalises it so a fitted PDF can be overlaid)
plt.hist(x, bins=5, edgecolor='black', density=True, alpha=0.6, color='g')
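# NOTE: the overlay below is an assumption, not part of the original code; the
# otherwise-unused 'norm' import suggests the histogram was meant to be compared
# against a fitted normal curve, so a minimal sketch of that overlay follows.
xs = np.linspace(min(x) - 5, max(x) + 5, 200)
plt.plot(xs, norm.pdf(xs, mean_x, std_x), 'k--', label='Fitted normal PDF')
plt.title('Histogram of X with Fitted Normal Curve')
plt.xlabel('X values')
plt.ylabel('Density')
plt.legend()
plt.show()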
# ------------------------------------------------------------------------------
# Calculate covariance
cov_matrix = np.cov(x, y)
cov_xy = cov_matrix[0, 1] # Covariance between x and y
# Scatter plot
plt.scatter(x, y, color='blue')
plt.title(f'Scatter Plot with Covariance: {cov_xy:.2f}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
# Calculate correlation
correlation_matrix = np.corrcoef(x, y)
corr_xy = correlation_matrix[0, 1] # Correlation between x and y
# Scatter plot
plt.scatter(x, y, color='green')
plt.title(f'Scatter Plot with Correlation: {corr_xy:.2f}')
plt.xlabel('X values')
plt.ylabel('Y values')
plt.show()
# ------------------------------------------------------------------------------
# Standard error of the mean: sample standard deviation (ddof=1) divided by sqrt(n)
def standard_error(data):
    return np.std(data, ddof=1) / np.sqrt(len(data))

se_x = standard_error(x)
se_y = standard_error(y)
print(f'Standard error of X: {se_x:.3f}')
print(f'Standard error of Y: {se_y:.3f}')