
Statistics

Statistics is the mathematical study of data collection, analysis, interpretation, and presentation, utilizing measures such as mean, median, and mode. It is divided into descriptive statistics, which summarizes data, and inferential statistics, which makes predictions about a population based on a sample. Key concepts include measures of central tendency, variability, and the importance of understanding populations and samples.

Uploaded by yogitas804
Copyright © All Rights Reserved

Statistics

Definition of Statistics
Any raw data, when collected and organized in the form of numbers or tables, is known as statistics. Statistics is also the mathematical study of the probability of events occurring based on known quantitative data or a collection of data.
Statistics attempts to infer the properties of a large collection of data from inspection of a sample of that collection, thereby allowing educated guesses to be made at minimal expense. There are generally three kinds of averages commonly used in statistics: (i) mean, (ii) median, and (iii) mode.
Statistics is the study of data collection, analysis, interpretation, presentation, and organization. Mathematical methods used in statistical analysis include mathematical analysis, linear algebra, stochastic analysis, measure-theoretic probability theory, and differential equations. Collecting, classifying, organizing, and displaying numerical data are all part of statistics. This helps one understand the outcomes contained in the data and anticipate the possible outcomes of various events. Statistics deals with information, observations, and data in numerical form.
Types of Statistics
There are two kinds of statistics: descriptive statistics and inferential statistics. Descriptive statistics summarize the data or a collection of data. Inferential statistics use that data to draw conclusions that go beyond it, such as conclusions about a larger population. Both are used on a large scale. In practice they overlap, because descriptive summaries of a sample are often the starting point for inferential statistics.
Statistics is mainly divided into the following two categories.
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics
In descriptive statistics, the data are described in a summarized way. The summary is computed from a sample of the population using measures such as the mean or the standard deviation. Descriptive statistics use charts, graphs, and summary measures to organize, represent, and explain a set of data.
• Data are typically arranged and displayed in tables or in graphs such as histograms, pie charts, bar charts, or scatter plots.
• Descriptive statistics are purely descriptive, so they do not generalize beyond the data collected.
A) Types of Measures
Descriptive measures are classified into two types:

1. Measure of central tendency

A measure of central tendency is a single value that seeks to describe the entire set of data. The three main measures of central tendency are as follows:

a. Mean
It is calculated by dividing the sum of the observations by the total number of observations. It can also be described as the sum divided by the count.

b. Median
It is the middle value of the data set, and it divides the data into two halves. First sort the n values. If n is odd, the median is the center element, at position (n + 1)/2. If n is even, the median is the average of the two center elements, at positions n/2 and n/2 + 1.

c. Mode
It is the most frequently occurring value in the given data set. If all values occur with the same frequency, the data set may be said to have no mode. A data set can also have several modes when two or more values share the highest frequency.
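The three definitions above can be sketched in plain Python without any library. The helper names `mean_of`, `median_of` and `modes_of` are hypothetical and were chosen only for this illustration. The "no mode when all frequencies are equal" rule follows the convention described in the text. This is a minimal sketch, not a reference implementation.

```python
from collections import Counter

# Hypothetical helper names, for illustration only.


def mean_of(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 counting from 1 is index n // 2 counting from 0.
        return ordered[middle]
    # Even n: average the two center elements.
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes_of(values):
    """All values sharing the highest frequency; empty if every value is equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # convention from the text: no mode when all frequencies are equal
    return [value for value, count in counts.items() if count == highest]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_of(data))    # 5.0
print(median_of(data))  # 4.5
print(modes_of(data))   # [4]
```

For real work, the `statistics` module functions shown later in this document are the better choice. This sketch only mirrors the verbal definitions step by step.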
2. Measure of variability
A measure of variability describes the spread of the data, that is, how widely the values are dispersed. The most common measures of variability are:

a. Standard deviation
It is the square root of the variance. To compute it, first find the mean (average), then subtract the mean from each value and square the result. Add these squared differences, divide by the number of values, and finally take the square root. For the sample standard deviation, divide by n - 1 instead of n; this is what Python's statistics.stdev does.

b. Range
The range is the difference between the largest and smallest data points in the data set. It reflects the spread of the data: the wider the range, the more spread out the data, and vice versa.
Range = Largest data value – smallest data value

c. Variance
It is defined as the average squared deviation from the mean. To compute it, square the difference between each data point and the mean (average), add all of these squared differences, and then divide by the number of data points in the data set. The sample variance divides by n - 1 instead. Python's statistics.variance uses this sample form, while statistics.pvariance divides by n.
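Here is a minimal sketch of the range, variance, and standard deviation steps above, assuming the data fit in a plain Python list. The helper name `spread_measures` and its `sample` flag are hypothetical. Setting `sample=True` switches the divisor from n to n - 1, which is the convention used by `statistics.variance` and `statistics.stdev`.

```python
import math
import statistics


def spread_measures(values, sample=False):
    """Hypothetical helper: return (range, variance, standard deviation).

    sample=False divides the squared deviations by n (population form);
    sample=True divides by n - 1 (sample form).
    """
    n = len(values)
    data_range = max(values) - min(values)        # largest value - smallest value
    mean = sum(values) / n                        # step 1: the mean
    squared = [(x - mean) ** 2 for x in values]   # step 2: squared deviations from the mean
    divisor = n - 1 if sample else n
    variance = sum(squared) / divisor             # step 3: add them up and divide
    std_dev = math.sqrt(variance)                 # step 4: square root of the variance
    return data_range, variance, std_dev


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread_measures(data))  # (7, 4.0, 2.0)
# The standard library's population forms should agree with sample=False.
print(statistics.pvariance(data), statistics.pstdev(data))
```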
B) Population and Samples
The population is the complete set of elements or items that you are interested in studying. Populations are frequently large, which makes it impractical to collect and analyze data for every member. That is why statisticians typically attempt to draw conclusions about a population by selecting and analyzing a representative subset of that group.
This subset of a population is referred to as a sample. Ideally, the sample should preserve the population's key statistical traits to a reasonable degree, so that you can draw conclusions about the population based on the sample.
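One simple way to draw a sample is simple random sampling. The sketch below assumes a hypothetical population (the integers 1 to 1000) and uses `random.sample` to pick 50 members without replacement. The seed only makes the draw repeatable. The sample mean will usually be close to the population mean of 500.5, but not exactly equal to it.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000, purely for illustration.
population = list(range(1, 1001))

random.seed(42)                            # fixed seed so the draw is repeatable
sample = random.sample(population, k=50)   # simple random sample, without replacement

print("Population mean:", statistics.mean(population))
print("Sample mean:    ", statistics.mean(sample))
```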

C) Outliers
A data point that deviates significantly from the rest of the data in a sample or population is referred to as an outlier.
Outliers can have a variety of causes, including:
• Natural variation in the data
• Changes in the behavior of the observed system
• Errors in data gathering
Data-gathering problems are a frequent cause of outliers.
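The text does not prescribe a rule for spotting outliers. One common choice, used here as an assumption, is Tukey's fences: flag any value below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch uses `statistics.quantiles`, which requires Python 3.8 or later. The function name `iqr_outliers` and the readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical measurements; 100 looks like a data-entry error.
readings = [12, 10, 13, 11, 12, 14, 100, 10, 13, 15, 12, 10]
print(iqr_outliers(readings))  # [100]
```

The multiplier k = 1.5 is a convention. A flagged point should still be checked against the possible causes listed above before it is discarded.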
Inferential Statistics
In inferential statistics, we try to interpret the meaning of descriptive statistics. After the data have been collected, analyzed, and summarized, we use inferential statistics to describe what the collected data mean.
• Inferential statistics use probability theory to assess whether trends observed in the research sample can be generalized to the larger population from which the sample was drawn.
• Inferential statistics are intended to test hypotheses and investigate relationships between variables, and they can be used to make predictions about a population.
• Inferential statistics are used to draw conclusions and inferences, i.e., to make valid generalizations from samples.
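As a small illustration of inference from a sample, the sketch below estimates an approximate 95% confidence interval for a population mean. It uses the normal approximation: mean ± z × standard error. The marks are hypothetical, and the function name `mean_confidence_interval` was made up for this example. For small samples, a Student's t critical value (available, for example, from `scipy.stats.t`) would be more appropriate than the normal z value used here.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Hypothetical helper: approximate CI for the population mean (normal approximation)."""
    n = len(sample)
    centre = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)       # about 1.96 for 95%
    return centre - z * std_error, centre + z * std_error


# Hypothetical marks of a sample of students, for illustration only.
marks = [72, 85, 90, 64, 88, 79, 93, 70, 81, 86, 77, 95]
low, high = mean_confidence_interval(marks)
print(f"95% CI for the mean mark: {low:.1f} to {high:.1f}")
```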

Example
Suppose the data are the marks obtained by 50 students in a class. Taking the average of these data gives the average of the 50 students' marks. If the average mark of the 50 students is 88 out of 100, we can draw a conclusion on the basis of this outcome.
Mean, Median and Mode in Statistics
Mean: The mean is the arithmetic average of a data set. It is found by adding the numbers in the set and dividing by the number of observations in the data set.
Median: The median is the middle number of the data set when it is listed in either ascending or descending order.
Mode: The mode is the number that occurs most often in a data set.

For n observations $x_1, x_2, \ldots, x_n$, listed in ascending order for the median, we have

Mean = $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Median = $x_{(n+1)/2}$ if n is odd, and $\dfrac{x_{n/2} + x_{n/2+1}}{2}$ if n is even
Mode = The value which occurs most frequently
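As a quick worked example of these formulas, take the illustrative values 2, 4, 4, 4, 5, 5, 7, 9. They are already sorted, and n = 8 is even:

$$\bar{x} = \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5$$
$$\text{Median} = \frac{x_4 + x_5}{2} = \frac{4+5}{2} = 4.5$$

The mode is 4, since it occurs three times and no other value occurs as often.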


Measures of Dispersion in Statistics
The measures of central tendency are not sufficient to describe all the information in a given data set. Therefore, the variability is described by a value called a measure of dispersion.
The different measures of dispersion include:
1. The range is calculated as the difference between the maximum value and the minimum value of the data points.
2. The quartile deviation is an absolute measure of dispersion. Three quartiles divide the data points, sorted in ascending order, into four quarters. First find the median of the data points. The median of the data points to the left of this median (the lower half) is the lower quartile. The median of the data points to the right of it (the upper half) is the upper quartile. The upper quartile minus the lower quartile is the interquartile range, and half of this is the quartile deviation.
3. The mean deviation is the average of the absolute differences between the items in a distribution and the mean or median of that series.
4. The standard deviation is a measure of the amount of variation in a set of values.
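The four measures of dispersion above can be sketched as follows, assuming the data are a plain list of numbers with at least two values. `quartile_deviation` follows the median-of-halves procedure described in item 2, excluding the overall median when n is odd. Other software may use different quartile conventions, so results can differ slightly from those of, for example, `statistics.quantiles`. The function names are hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Hypothetical helper: half the interquartile range, median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]              # values below the median
    upper_half = ordered[half + (n % 2):]    # values above the median (skip it when n is odd)
    q1 = statistics.median(lower_half)       # lower quartile
    q3 = statistics.median(upper_half)       # upper quartile
    return (q3 - q1) / 2


def mean_deviation(values, about=None):
    """Hypothetical helper: average absolute difference from the mean (or another centre)."""
    centre = statistics.mean(values) if about is None else about
    return sum(abs(x - centre) for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print("Range:", max(data) - min(data))                  # 7
print("Quartile deviation:", quartile_deviation(data))  # 1.0
print("Mean deviation:", mean_deviation(data))          # 1.5
print("Standard deviation:", statistics.pstdev(data))   # 2.0
```

Passing `about=statistics.median(data)` gives the mean deviation about the median, the other variant mentioned in item 3.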
Stages of Statistics

1. Collection of Data:
This is the first step of statistical analysis, in which we collect the data using different methods depending on the case.
2. Organizing the Collected Data:
In the next step, we organize the collected data in a meaningful manner so that it is easier to understand.
3. Presentation of Data:
In the third step, we simplify the data and present it in the form of tables, graphs, and diagrams.
4. Analysis of the Data:
Analysis is required to get the right results. It is often carried out using measures of central tendency, measures of dispersion, correlation, regression, and interpolation.
5. Interpretation of Data:
In this last stage, conclusions are drawn and comparisons are made. On this basis, forecasts are made.
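The first four stages can be illustrated with a tiny, hypothetical survey. The sketch below organizes the responses into a frequency table with `collections.Counter`, presents that table as plain text, and analyzes it with the `statistics` module. Interpretation, the fifth stage, is left to the reader.

```python
import statistics
from collections import Counter

# Stage 1 (collection): hypothetical survey responses, for illustration only.
responses = [3, 5, 4, 4, 2, 5, 4, 3, 4, 5, 1, 4]

# Stage 2 (organization): build a frequency table sorted by value.
frequency = Counter(responses)
table = sorted(frequency.items())

# Stage 3 (presentation): a simple text table.
print("Value | Frequency")
for value, count in table:
    print(f"{value:>5} | {count}")

# Stage 4 (analysis): summary measures.
print("Mean:", statistics.mean(responses))
print("Median:", statistics.median(responses))
print("Mode:", statistics.mode(responses))
```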
Uses of Statistics

• Statistics helps to obtain appropriate quantitative data.
• Statistics helps to present complex data in a suitable tabular, diagrammatic, and graphic form for simple and consistent interpretation.
• Statistics helps to explain the nature and pattern of variability of a phenomenon through quantitative observations.
• Statistics helps to depict data in tabular or graphical form in order to understand it properly.
Applications of Statistics

• Statistics is used in machine learning and data mining.
• Statistics is used in mathematics.
• Statistics is used in economics.
Calculating Descriptive Statistics in Python

Python's statistical modules provide simple and effective techniques for working with data.

Let's get our hands dirty by applying these libraries and techniques in Python.

1. Measures of Central Tendency

a. Mean

import statistics

# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]

# using mean() to calculate the average of the list elements
print("The average of list values is :", statistics.mean(li))

Output:
The average of list values is : 2


b. Median

from statistics import median
from fractions import Fraction as fr

# tuple of positive integers
data1 = (2, 3, 4, 5, 7, 9, 11)
# tuple of floating point values
data2 = (2.4, 5.1, 6.7, 8.9)
# tuple of fractional numbers
data3 = (fr(1, 2), fr(44, 12), fr(10, 3), fr(2, 3))
# tuple of negative integers
data4 = (-5, -1, -12, -19, -3)
# tuple of positive and negative integers
data5 = (-1, -2, -3, -4, 4, 3, 2, 1)

# Printing the median of the above data sets
print(f"Median of data-set 1 is {median(data1)}")
print(f"Median of data-set 2 is {median(data2)}")
print(f"Median of data-set 3 is {median(data3)}")
print(f"Median of data-set 4 is {median(data4)}")
print(f"Median of data-set 5 is {median(data5)}")

Output:
Median of data-set 1 is 5
Median of data-set 2 is 5.9
Median of data-set 3 is 2
Median of data-set 4 is -5
Median of data-set 5 is 0.0


c. Mode

from statistics import mode
from fractions import Fraction as fr

# tuple of positive integer numbers
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
# tuple of a set of floating point values
data2 = (2.4, 1.3, 1.3, 1.3, 2.4, 4.6)
# tuple of a set of fractional numbers
data3 = (fr(1, 2), fr(1, 2), fr(10, 3), fr(2, 3))
# tuple of a set of negative integers
data4 = (-1, -2, -2, -2, -7, -7, -9)
# tuple of strings
data5 = ("red", "blue", "black", "blue", "black", "black", "brown")

# Printing out the mode of the above data sets
print(f"Mode of data set 1 is {mode(data1)}")
print(f"Mode of data set 2 is {mode(data2)}")
print(f"Mode of data set 3 is {mode(data3)}")
print(f"Mode of data set 4 is {mode(data4)}")
print(f"Mode of data set 5 is {mode(data5)}")

Output:
Mode of data set 1 is 5
Mode of data set 2 is 1.3
Mode of data set 3 is 1/2
Mode of data set 4 is -2
Mode of data set 5 is black
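Since Python 3.8, `statistics.mode` returns the first of several equally common values instead of raising an error, and `statistics.multimode` returns all of them. This suits data sets with several modes, as described in the definition of the mode. Note that `multimode` returns every value when all frequencies are equal, whereas the text treats such a data set as having no mode.

```python
from statistics import multimode

# Two values tie for the highest frequency, so there are two modes.
print(multimode([1, 1, 2, 2, 3]))             # [1, 2]
# Every value occurs once: multimode returns all of them.
print(multimode(["red", "blue", "green"]))    # ['red', 'blue', 'green']
```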


2. Measure of variability

a. Range

# Sample data
arr = [1, 2, 3, 4, 5]

# Finding the maximum and minimum
maximum = max(arr)
minimum = min(arr)

# The range is the difference between the maximum and the minimum
data_range = maximum - minimum

print(f"Maximum = {maximum}, Minimum = {minimum} and Range = {data_range}")

Output:
Maximum = 5, Minimum = 1 and Range = 4


b. Variance

# Python code to demonstrate variance()
# on a varying range of data types
from statistics import variance
from fractions import Fraction as fr

# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the (sample) variance of each sample
print(f"Variance of Sample1 is {variance(sample1)}")
print(f"Variance of Sample2 is {variance(sample2)}")
print(f"Variance of Sample3 is {variance(sample3)}")
print(f"Variance of Sample4 is {variance(sample4)}")
print(f"Variance of Sample5 is {variance(sample5)}")

Output:
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006


c. Standard Deviation

from statistics import stdev

# creating a varying range of sample sets
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of floating point values
sample4 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the (sample) standard deviation of each set of observations
print(f"The Standard Deviation of Sample1 is {stdev(sample1)}")
print(f"The Standard Deviation of Sample2 is {stdev(sample2)}")
print(f"The Standard Deviation of Sample3 is {stdev(sample3)}")
print(f"The Standard Deviation of Sample4 is {stdev(sample4)}")

Output:
The Standard Deviation of Sample1 is 3.9761191895520196
The Standard Deviation of Sample2 is 1.8708286933869707
The Standard Deviation of Sample3 is 7.8182478855559445
The Standard Deviation of Sample4 is 0.4196784483387
