Statistics
Definition of Statistics
Any raw data, when collected and organized in the form of numbers or
tables, is known as statistics. Statistics is also the mathematical study of
the probability of events occurring, based on known quantitative data or
a collection of data.
Statistics attempts to infer the properties of a large collection of data
from inspection of a sample of the collection, thereby allowing educated
guesses to be made at minimal expense. There are generally three
kinds of averages commonly used in statistics: (i) mean, (ii) median,
and (iii) mode.
Statistics is the study of the collection, analysis, interpretation,
presentation, and organization of data. Mathematical methods used in
statistical analysis include mathematical analysis, linear algebra,
stochastic analysis, measure-theoretic probability theory, and
differential equations. Statistics is concerned with collecting, classifying,
organizing, and displaying numerical data. This helps one understand the
different outcomes contained in the data and foresee the possibilities of
various events. In short, statistics deals with information, observations,
and data in numerical form.
Types of Statistics
There are two kinds of statistics: descriptive statistics and inferential
statistics. In descriptive statistics, the collected data is described in a
summarized way, whereas in inferential statistics, we use the data to
interpret the descriptive results and draw conclusions from them. Both
are used on a large scale. There is also a middle ground where
descriptive statistics transitions into inferential statistics.
Statistics is mainly divided into the following two categories.
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics
In descriptive statistics, the data is described in a summarized way.
The summarization is done from a sample of the population using
measures such as the mean or standard deviation. Descriptive
statistics uses charts, graphs, and summary measures to organize,
represent, and explain a set of data.
Data is typically arranged and displayed in tables or in summary graphs
such as histograms, pie charts, bar charts, or scatter plots.
Descriptive statistics is purely descriptive and thus does not require
generalization beyond the data collected.
A) Types of Measures
The measures used in descriptive statistics are classified into two types:
1. Measure of central tendency
a. Mean
It is calculated by dividing the sum of the observations by the total
number of observations. In short, it is the sum divided by the count.
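The definition of the mean translates directly into code. The following is a minimal Python sketch; the function name mean and the list of observations are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


observations = [4, 8, 15, 16, 23, 42]  # hypothetical data
print(mean(observations))  # 108 / 6 = 18.0
```

Python's built-in statistics.mean() performs the same calculation.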
b. Median
It is the middle value of the data set once the data is sorted, and it
divides the data into two halves. If the number of items n is odd, the
center element, at position (n+1)/2, is the median; otherwise, the
median is the average of the two center elements, at positions n/2
and n/2 + 1.
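A minimal sketch of the odd/even rule for the median in Python, assuming the input is a non-empty list of numbers; the function name median and the sample lists are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the element at 1-based position (n + 1) / 2, i.e. index n // 2
        return ordered[mid]
    # even n: average the elements at 1-based positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 3]))  # sorted [1, 3, 5, 7] -> (3 + 5) / 2 = 4.0
```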
c. Mode
It is the most frequently occurring value in the given data set. If all
values occur with the same frequency, the data set may be considered
to have no mode. A data set can also have several modes if two or more
values share the highest frequency.
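One way to find the mode (or modes) is to count how often each value occurs. This sketch uses collections.Counter; the function name modes and the choice to return an empty list when every value is equally frequent are assumptions made to match the "no mode" case described above.

```python
from collections import Counter


def modes(values):
    """Return every value that has the highest frequency.

    Returns an empty list when all values occur equally often
    (treated here as the 'no mode' case).
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    winners = [value for value, count in counts.items() if count == highest]
    if len(winners) == len(counts) and len(counts) > 1:
        return []  # every distinct value is equally frequent: no mode
    return winners


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
print(modes([1, 2, 3]))        # []      (all frequencies equal)
```

The standard library's statistics.multimode() (Python 3.8+) also returns every most-frequent value, but when all frequencies are equal it returns all of the values rather than an empty list.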
2. Measure of variability
A measure of variability describes the spread of data, that is, how
widely our data is dispersed. The most common measures of variability are:
a. Standard deviation
It is the square root of the variance. To compute it, first find the
mean (average), then subtract the mean from each value and square the
result. Add up these squared differences, divide by the number of
values, and finally take the square root.
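The four steps above can be followed literally in code. This is a minimal sketch that divides by the number of values (the population standard deviation); the function name and the data are hypothetical.

```python
import math


def population_std_dev(values):
    """Standard deviation following the steps above (dividing by n)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mean = sum(values) / n                       # step 1: find the mean
    squared = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    variance = sum(squared) / n                  # step 3: add them up, divide by n
    return math.sqrt(variance)                   # step 4: take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(population_std_dev(data))  # mean 5, squared differences sum to 32, 32 / 8 = 4, sqrt -> 2.0
```

statistics.pstdev() implements this population version, while statistics.stdev() divides by n − 1 and gives the sample standard deviation.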
b. Range
The range is the difference between the largest and smallest data
points in our data set. It reflects the spread of the data: the wider the
range, the wider the spread, and vice versa.
Range = Largest data value – smallest data value
c. Variance
It is defined as the average of the squared deviations from the mean. It
is computed by squaring the difference between each data point and the
mean (average), adding all of these squares, and then dividing by the
number of data points in our data set.
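A minimal sketch of the variance as described above, dividing by the number of data points. The optional sample flag, which divides by n − 1 instead, is an addition for comparison: that is the usual estimate when the data is only a sample of a larger population. The function name and data are hypothetical; the last line cross-checks against the standard library's statistics.pvariance() and statistics.variance().

```python
import statistics


def variance(values, sample=False):
    """Average squared deviation from the mean.

    sample=False divides by n, as in the definition above.
    sample=True divides by n - 1 (the usual sample estimate).
    """
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(variance(data))               # 32 / 8 = 4.0
print(variance(data, sample=True))  # 32 / 7, about 4.571
# cross-check with the standard library
print(statistics.pvariance(data), statistics.variance(data))
```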
B) Population and Samples
The population is the group of all the elements or things you are
interested in studying. Populations are frequently too large for all of
their data to be collected and analyzed. That is why statisticians
typically attempt to draw conclusions about a population by selecting and
analyzing a representative subset of that group.
This subset of a population is referred to as a sample. Ideally, the
sample should preserve the population's key statistical traits to a
reasonable degree, so that you can draw conclusions about the
population based on the sample.
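To illustrate the relationship between a population and a sample, the following sketch simulates a hypothetical population of 10,000 measurements and draws a simple random sample of 100 from it with random.sample(). The numbers (mean 170, standard deviation 10) are arbitrary assumptions; the point is that the sample mean usually lands close to the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated measurements
population = [random.gauss(170, 10) for _ in range(10_000)]

# Simple random sample of 100 elements, drawn without replacement
sample = random.sample(population, k=100)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```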
C) Outliers
A data point that deviates significantly from the rest of the data in a
sample or population is referred to as an outlier.
Outliers can have a variety of causes; here are a few to get you
started:
Natural data variation
Changes in the observed system's behavior
Data gathering errors
Outliers are frequently caused by data-gathering problems.
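The text does not prescribe a way to detect outliers. One common rule of thumb, used here as an assumption, is Tukey's fences: flag any point more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. This sketch uses statistics.quantiles() (Python 3.8+); the function name and the readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical; 95 might be a recording error
print(iqr_outliers(readings))  # Q1 = 11, Q3 = 13, fences 8 and 16 -> [95]
```

Flagging a point only shows that it is unusual; deciding whether it reflects natural variation, a change in the system, or a data-gathering error still requires looking at how the data was collected.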
Inferential Statistics
In inferential statistics, we try to interpret the meaning of descriptive
statistics. After the data has been collected, analyzed, and summarized,
we use inferential statistics to describe the meaning of the collected
data.
Inferential statistics uses probability theory to assess whether
trends observed in the research sample can be generalized to the
larger population from which the sample was drawn.
Inferential statistics is intended to test hypotheses and
investigate relationships between variables, and it can be used to
make predictions about the population.
Inferential statistics is used to draw conclusions and inferences,
i.e., to make valid generalizations from samples.
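A typical inferential task is estimating a population value from a sample. The sketch below computes an approximate 95% confidence interval for a population mean. It assumes the normal approximation (mean ± z · s/√n, with z taken from statistics.NormalDist), which is a simplification: for small samples like this one, a t-distribution critical value is usually preferred. The function name and the marks are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Normal approximation: mean +/- z * s / sqrt(n). For small samples a
    t-distribution critical value would be more appropriate.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_error, mean + z * std_error


# Hypothetical exam marks, treated as a sample from a larger population of students
marks = [72, 85, 90, 66, 78, 88, 95, 70, 81, 84, 77, 69, 92, 74, 80]
low, high = mean_confidence_interval(marks)
print(f"95% CI for the population mean: {low:.1f} to {high:.1f}")
```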
Example
In a class, the data is the set of marks obtained by 50 students. When
we take the average of this data, the result is the average mark of the
50 students. If the average mark obtained by the 50 students is 88 out
of 100, we draw a conclusion on the basis of that outcome.
Mean, Median and Mode in Statistics
Mean: The mean is the arithmetic average of a data set, found by adding
the numbers in the set and dividing by the number of observations in
the data set.
Median: The median is the middle number in the data set when it is
listed in either ascending or descending order.
Mode: The mode is the number that occurs most often in the data set;
it always lies between the lowest and highest values.
Statistical analysis is generally carried out in the following steps.
1. Collection of Data:
This is the first step of statistical analysis, where we collect the data
using different methods depending upon the case.
2. Organizing the Collected Data:
In the next step, we organize the collected data in a meaningful
manner so that it is easier to understand.
3. Presentation of Data:
In the third step, we present the data in a simplified form, such as
tables, graphs, and diagrams.
4. Analysis of the Data:
Analysis is required to get the right results. It is often carried out using
measures of central tendency, measures of dispersion, correlation,
regression, and interpolation.
5. Interpretation of Data:
In this last stage, conclusions are drawn and comparisons are made.
On this basis, forecasts are made.
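The five steps above can be sketched end to end on a small, hypothetical data set (hours of sleep reported in a survey). Each stage is deliberately minimal and the survey itself is an assumption for illustration: a text bar chart stands in for a real graph, and the analysis uses only the mean and standard deviation.

```python
import statistics
from collections import Counter

# 1. Collection: hypothetical survey answers (hours of sleep per night)
responses = [7, 6, 8, 7, 5, 7, 9, 6, 7, 8, 6, 7]

# 2. Organization: a frequency table sorted by value
frequency = dict(sorted(Counter(responses).items()))

# 3. Presentation: a simple text bar chart of the frequency table
for hours, count in frequency.items():
    print(f"{hours} h | {'#' * count}")

# 4. Analysis: a measure of central tendency and a measure of dispersion
mean = statistics.mean(responses)
spread = statistics.stdev(responses)

# 5. Interpretation: state the result in plain words
print(f"Respondents sleep about {mean:.1f} h on average (sd {spread:.1f} h).")
```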
Uses of Statistics
Let’s get our hands dirty by implementing these techniques in Python
using its built-in statistics module.
a. Mean
import statistics
# initializing list
li = [1, 2, 3, 3, 2, 2, 2, 1]
# using mean() to calculate the average of the list elements
print(statistics.mean(li))
Output:
2
b. Median
Output:
Median of data-set 1 is 5
Median of data-set 3 is 2
Median of data-set 4 is -5
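A minimal sketch of calling statistics.median() on hypothetical data sets of odd and even length. For even-length data, median() averages the two middle values, while median_low() and median_high() return the lower or higher of the two actual data points.

```python
from statistics import median, median_high, median_low

odd_data = (9, 2, 7, 4, 5)  # hypothetical; sorted: 2, 4, 5, 7, 9
even_data = (8, 1, 6, 3)    # hypothetical; sorted: 1, 3, 6, 8

print(median(odd_data))        # 5
print(median(even_data))       # (3 + 6) / 2 = 4.5
print(median_low(even_data))   # 3
print(median_high(even_data))  # 6
```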
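c. Mode
from statistics import mode
# tuple of positive integers
data1 = (2, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7)
# printing the most frequent value of the data set
print("Mode of data set 1 is %s" % (mode(data1)))
Output:
Mode of data set 1 is 5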
a. Range
# sample data
arr = [1, 2, 3, 4, 5]
# finding the maximum
maximum = max(arr)
# finding the minimum
minimum = min(arr)
# range = largest value - smallest value
data_range = maximum - minimum
print(data_range)
Output:
4
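b. Standard deviation
from statistics import stdev  # sample standard deviation (divides by n - 1)
sample2, sample3, sample4 = (1, 3, 5), (10, 20, 30), (0, 4, 8)  # hypothetical samples
print("The standard deviation of sample2 is %s" % (stdev(sample2)))
print("The standard deviation of sample3 is %s" % (stdev(sample3)))
print("The standard deviation of sample4 is %s" % (stdev(sample4)))
Output:
The standard deviation of sample2 is 2.0
The standard deviation of sample3 is 10.0
The standard deviation of sample4 is 4.0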