Stats Unit I Notes

Uploaded by

ronichotu2310
Copyright
© © All Rights Reserved

Unit – I

Introduction to Statistics: Definition and Scope

Statistics and its Types: Statistics is a branch of math focused on collecting,
organizing, and understanding numerical data. It involves analyzing and
interpreting data to solve real-life problems, using various quantitative models. Some
view statistics as a separate scientific discipline rather than just a branch of
math. It simplifies complex tasks and offers clear insights into regular
activities. Statistics finds applications in diverse fields like weather forecasting,
stock market analysis, insurance, betting, and data science.

What are Statistics?


Statistics in Mathematics is the study and manipulation of data. It involves the
analysis of numerical data, enabling the extraction of meaningful conclusions from the
collected and analyzed data sets.

 According to Merriam-Webster: Statistics is the science of collecting,
analyzing, interpreting, and presenting masses of numerical data.
 According to Oxford English Dictionary: Statistics is a branch of mathematics
dealing with the collection, analysis, interpretation, presentation, and organization
of data.

Statistics Terminologies:
Some of the most common terms you might come across in statistics are:
 Population: The complete collection of individuals, objects, or events whose
properties are to be analyzed.
 Sample: A subset of the population.
 Variable: A characteristic that can take different values.
 Parameter: A numerical characteristic of the population.

Statistics Examples:
Some real-life examples of statistics that you might have seen:
Example 1: In a class of 45 students, we calculate the mean marks to evaluate
the performance of that class.
Example 2: Around elections, you might have seen exit polls. Exit polls collect the
opinions of a sample of voters and are used to predict election results.

Types of Statistics
There are 2 types of statistics:
 Descriptive Statistics
 Inferential Statistics

Descriptive statistics summarizes and describes the collected data, while inferential
statistics uses sample data to draw conclusions about the population.


Applications of Statistics
Some applications of statistics are listed below:
 Statistics is used in mathematical computing.
 Statistics is used in finding probabilities and chances.
 Statistics is used in weather forecasting, etc.

Scope of Statistics
Statistics is a branch of mathematics that deals with the collection, organization,
analysis, interpretation, and presentation of data. It is used in a wide variety of fields,
including:
 Science: Statistics is used to design experiments, analyze data, and draw
conclusions about the natural world.
 Business: Statistics is used to market products, track sales, and make financial
decisions.
 Government: Statistics is used to track economic trends, measure the effectiveness
of government programs, and allocate resources.
 Healthcare: Statistics is used to develop new drugs, track the spread of
diseases, and assess the effectiveness of medical treatments.
 Sports: Statistics is used to analyze player performance, scout new talent, and
predict the outcome of games.

Concept of Statistical Population and Sample

A population refers to the entire set of individuals, objects, or data points that you want to
study. It can be large or small depending on the scope of your research. For example, all
students in a school or all people in a country.
A sample is a subset of the population that is selected for analysis. It’s used when
studying the entire population is impractical or impossible. Sampling allows for
inferences about the population using statistical techniques.

Population and Sample

The population gives a complete picture, while the sample provides an estimate.
Parameters (like the population mean) describe the population; statistics (like the sample
mean) describe the sample. The population is the entire group of individuals or items that
we are interested in studying and drawing conclusions about, that is, the entire set of
items from which data is drawn in the statistical study. The size of the population is
usually denoted by N.
A sample is a subset of the population selected for study. It is a representative portion of
the population from which we collect data in order to make inferences or draw
conclusions about the entire population. The size of the sample is denoted by n.
Population | Sample
The population includes all members of a specified group. | A sample is a subset of the population.
Collecting data from an entire population can be time-consuming, expensive, and sometimes impractical or impossible. | Samples offer a more feasible approach to studying populations, allowing researchers to draw conclusions based on smaller, manageable datasets.
Includes all residents in the city. | Consists of 1000 households, a subset of the entire population.

Populations are used when your research question requires, or when you have
access to, data from every member of the population. Usually, it is only
straightforward to collect data from a whole population when it is small,
accessible and cooperative.

When your population is large in size, geographically dispersed, or difficult to
contact, it’s necessary to use a sample. With statistical analysis, you can use
sample data to make estimates or test hypotheses about population data.
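The relationship between a parameter and a statistic can be illustrated with a minimal sketch. The population below is made-up data (randomly generated marks for 1,000 hypothetical students), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical population: marks of 1,000 students (made-up data).
random.seed(42)  # fixed seed so the run is repeatable
population = [random.randint(0, 100) for _ in range(1000)]

# Parameter: computed from every member of the population (size N).
N = len(population)
population_mean = statistics.mean(population)

# Statistic: computed from a simple random sample (size n),
# drawn without replacement from the population.
n = 50
sample = random.sample(population, n)
sample_mean = statistics.mean(sample)

print(f"N = {N}, population mean = {population_mean:.2f}")
print(f"n = {n}, sample mean     = {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; it is an estimate of the parameter.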

When Should Samples be used?


 When studying a large population where it is impractical or impossible to collect data
from every individual.
 When resources such as time, cost, and manpower are limited, making it more
feasible to collect data from a subset of the population.
 When conducting research or experiments where it is important to minimize potential
biases in data collection.
(Lecture: https://youtu.be/PFMfNJiqt-s?si=OU3sQPA6aG9CbE7i)
Data in Statistics

Data is a record or collection of numbers, characters, images, and other items
that are processed to form information. In statistics, data is analyzed to obtain
meaningful information, so categorizing data into different types is very important:
the type of data determines which methods can be used to analyze it.

Types of Data in Statistics


The data in statistics is classified into four categories:
 Nominal data
 Ordinal data
 Discrete data
 Continuous data

Qualitative Data (Categorical Data)


As the name suggests, qualitative data describes the qualities or features of the items
being studied. It is also called categorical data because it sorts observations into
categories. Examples include the gender of people, their family names, and similar
attributes in a sample of population data.
Qualitative data is further divided into two categories:
 Nominal Data
 Ordinal Data

Nominal Data
Nominal data is a type of data that consists of categories or names that cannot be
ordered or ranked. Nominal data is often used to categorize observations into groups,
and the groups are not comparable. In other words, nominal data has no inherent order
or ranking. Examples of nominal data include gender (male or female), race (White,
Black, Asian), religion (Hinduism, Christianity, Islam, Judaism), and blood type (A, B,
AB, O).
Nominal data can be represented using frequency tables and bar charts, which display
the number or proportion of observations in each category. For example, a frequency
table for gender might show the number of males and females in a sample of people.
Nominal data is analyzed using non-parametric tests, which do not make any
assumptions about the underlying distribution of the data. Common non-parametric tests
for nominal data include Chi-Squared Tests and Fisher’s Exact Tests. These tests are
used to compare the frequency or proportion of observations in different categories.
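As a minimal sketch of the frequency table described above, the following counts how often each category occurs in a small sample. The list of blood types is made-up data for illustration only.

```python
from collections import Counter

# Hypothetical sample of blood types (made-up data).
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A"]

counts = Counter(blood_types)
total = len(blood_types)

# Nominal categories have no inherent order; they are sorted
# alphabetically here only to make the printed table stable.
print("Blood type  Frequency  Proportion")
for category in sorted(counts):
    proportion = counts[category] / total
    print(f"{category:<11} {counts[category]:<10} {proportion:.2f}")
```

Only counts and proportions are computed, because arithmetic such as a mean is not meaningful for nominal categories.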

Ordinal Data
Ordinal data is a type of data that consists of categories that can be ordered or ranked.
However, the distance between categories is not necessarily equal. Ordinal data is often
used to measure subjective attributes or opinions, where there is a natural order to the
responses. Examples of ordinal data include education level (Elementary, Middle, High
School, College), job position (Manager, Supervisor, Employee), etc.
Ordinal data can be represented using bar charts and line charts. These displays show the
order or ranking of the categories, but they do not imply that the distances between
categories are equal.
Ordinal data is analyzed using non-parametric tests, which make no assumptions about
the underlying distribution of the data. Common non-parametric tests for ordinal data
include the Wilcoxon Signed-Rank test and Mann-Whitney U test.

Quantitative Data (Numerical Data)


Quantitative data is data that takes numerical values. It is also called numerical data.
This data type is used to represent quantities such as height, weight, and length.
Quantitative data is further classified into two categories:
 Discrete Data
 Continuous Data

Discrete Data
Discrete data is quantitative data that can take only distinct, separate values.
These values can be counted, usually as whole numbers.
Examples of discrete data are,
 Number of students in a class
 Marks of the students in a class test (when awarded as whole numbers)
 Number of members in a family, etc.

Continuous Data
Continuous data is quantitative data that can take any value within a
continuous range, so it is measured rather than counted.
Examples of continuous data are,
 Temperature
 Height and weight of students in a class
 Salary of workers in a factory, etc.

Difference between Quantitative and Qualitative Data


Quantitative and qualitative data differ in several basic ways, which are summarized in
the table below.

Quantitative data | Qualitative data
Data is depicted in numerical terms. | Data is not depicted in numerical terms.
Can be shown in numbers and variables like ratio, percentage, and more. | Could be about the behavioral attributes of a person, or thing.
Example: 100%, 1:3, 123 | Examples: loud behavior, fair skin, soft quality, and more.

Difference between Discrete and Continuous Data


Discrete data and continuous data are both types of quantitative data. The differences
between them are summarized in the table below.

Discrete Data | Continuous Data
The type of data that has clear spaces between values is discrete data. | This information falls into a continuous series.
Discrete Data is Countable | Continuous Data is Measurable
There are distinct or different values in discrete data. | Every value within a range is included in continuous data.
Discrete Data is depicted using bar graphs | Continuous Data is depicted using histograms
Ungrouped frequency distribution of discrete data is performed against a single value. | Grouped frequency distribution of continuous data is tabulated against a value group.

Scales of Measurement

Data can be classified as being on one of four scales: nominal, ordinal, interval or
ratio.

Properties of Measurement Scales:


 Identity – Each value on the measurement scale has a unique meaning.
 Magnitude – Values on the measurement scale have an ordered relationship to
one another. That is, some values are larger and some are smaller.
 Equal intervals – Scale units along the scale are equal to one another. For
example, the difference between 1 and 2 would be equal to the difference between
11 and 12.
 A minimum value of zero – The scale has a true zero point, below which no
values exist.

1. Nominal Scale –
Nominal variables can be placed into categories. These don’t have a numeric value
and so cannot be added, subtracted, divided or multiplied. These also have no order,
and nominal scale of measurement only satisfies the identity property of
measurement.
For example, gender is an example of a variable that is measured on a nominal scale.
Individuals may be classified as “male” or “female”, but neither value represents more
or less “gender” than the other.

2. Ordinal Scale –
The ordinal scale contains things that you can place in order. It measures a variable
in terms of magnitude, or rank. Ordinal scales tell us relative order, but give us no
information regarding differences between the categories. The ordinal scale has the
property of both identity and magnitude.
For example, if Ram takes first place and Vidur takes second place in a race, we know
the order in which they finished but not by how many seconds the race was won.

3. Interval Scale –
An interval scale has ordered numbers with meaningful divisions: the distances
between consecutive values are equal. Interval scales do not have a true zero;
for example, 0 degrees Celsius does not mean the absence of heat.
Interval scales have the properties of:

 Identity
 Magnitude
 Equal distance
For example, on a Fahrenheit or Celsius thermometer, 90° is hotter than
45°, and the difference between 10° and 30° is the same as the difference between
60° and 80°.

4. Ratio Scale –
The ratio scale of measurement is similar to the interval scale in that it also
represents quantity and has equality of units, with one major difference: zero is
meaningful (no values exist below zero). The true zero allows us to know how
many times greater one case is than another. Ratio scales have all of the
characteristics of the nominal, ordinal and interval scales. The simplest example of a
ratio scale is the measurement of length. Zero length or zero money means
that there is no length or no money, whereas zero degrees on the Celsius or Fahrenheit
scale is not a true zero.
Properties of Ratio Scale:

 Identity
 Magnitude
 Equal distance
 Absolute/true zero
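The practical difference between an interval scale and a ratio scale can be sketched with temperatures. The Kelvin conversion is an added illustration not taken from these notes: Kelvin has a true zero, so it behaves as a ratio scale, while Celsius does not.

```python
# Two temperatures recorded in degrees Celsius (an interval scale).
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = t2_c - t1_c  # 10 degrees

# The ratio of the Celsius readings is 2, but it does not mean
# "twice as hot", because 0 degrees Celsius is not a true zero.
celsius_ratio = t2_c / t1_c

# Kelvin has a true zero, so ratios of Kelvin readings are meaningful.
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
difference_k = t2_k - t1_k
kelvin_ratio = t2_k / t1_k

print(f"Celsius: difference = {difference_c}, ratio = {celsius_ratio}")
print(f"Kelvin:  difference = {difference_k:.2f}, ratio = {kelvin_ratio:.3f}")
```

The difference is the same on both scales, but the ratio changes when the zero point moves, which is why ratios are only interpreted on scales with a true zero.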

Presentation of Data:

Tabular Form
It is a table that helps to represent even a large amount of data in an
engaging, easy to read, and coordinated manner. The data is arranged in
rows and columns. This is one of the most popularly used forms of
presentation of data as data tables are simple to prepare and read.

The most significant benefit of tabulation is that it coordinates data for
additional statistical treatment and decision making. The classification used in
tabulation is of four types. They are:

1. Qualitative
2. Quantitative
3. Temporal
4. Spatial

1. Qualitative classification: When the classification is done according to
traits such as physical status, nationality, social status, etc., it is known as
qualitative classification.

2. Quantitative classification: In this, the data is classified on the basis of
features that are quantitative in nature. In other words, these features can
be estimated quantitatively.

3. Temporal classification: In this classification, time becomes the
categorizing variable and data are classified according to time. Time may be
measured in years, months, weeks, days, hours, etc.

4. Spatial classification: When the categorization is done on the basis of
location, it is known as spatial classification. The place may be a country,
state, district, block, village/town, etc.

Graphical Form

Graphical representation is a way of presenting data in pictorial form. It
helps a reader understand a large set of data easily, as it shows the patterns in the
data visually.
There are two ways of representing data,
 Tables
 Pictorial representation through graphs.

Types of Graphical Representations

Line Graphs

A line graph is used to show how the value of a particular variable changes with time.
We plot this graph by connecting the points at different values of the variable. It can be
useful for analyzing the trends in the data and predicting further trends.
Bar Graphs

A bar graph is a type of graphical representation of the data in which bars of uniform width
are drawn with equal spacing between them on one axis (x-axis usually), depicting the
variable. The values of the variables are represented by the height of the bars.
Histograms

A histogram is similar to a bar graph, but it is based on the frequency of numerical values
rather than their actual values. The data is organized into intervals and the bars represent
the frequency of the values in each interval. That is, it counts how many values of the data
lie in a particular range.

Line Plot
It is a plot that displays data as points and checkmarks above a number line, showing
the frequency of the point.
Stem and Leaf Plot
This is a type of plot in which each value is split into a “leaf”(in most cases, it is the last
digit) and “stem”(the other remaining digits). For example: the number 42 is split into
leaf (2) and stem (4).
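The stem and leaf split can be sketched in a few lines. This is a minimal illustration that assumes non-negative whole numbers and uses the last digit as the leaf; the function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by stem (all digits but the last)
    and leaf (the last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 42 -> stem 4, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical data for illustration.
data = [42, 47, 35, 51, 38, 42, 56, 33]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row shows one stem followed by its leaves in ascending order, so the shape of the data can be read directly from the row lengths.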

Box and Whisker Plot


These plots divide the data into four parts (quartiles) to summarize it. They focus on
the median and the spread of the data.
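The values a box and whisker plot is usually drawn from (minimum, quartiles, median, maximum) can be computed with the standard library. This is a minimal sketch on made-up marks; `statistics.quantiles` uses its default method here, and other quartile conventions can give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical test marks (made-up data for illustration).
marks = [35, 42, 47, 51, 56, 58, 63, 70, 88]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(marks, n=4)

five_number_summary = {
    "minimum": min(marks),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "maximum": max(marks),
}
for name, value in five_number_summary.items():
    print(f"{name:<8} {value}")
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.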
Pie Chart
It is a type of graph which represents the data in form of a circular graph. The circle is
divided such that each portion represents a proportion of the whole.

Frequency Distribution in Statistics

A frequency distribution is an overview of all values of some variable and the
number of times they occur. It tells us how frequencies are distributed over the
values, that is, how many values lie in each interval. It gives us an idea
about the ranges where most values fall and the ranges where values are scarce.

Frequency Distribution Table


A frequency distribution table is a way to organize and present data in tabular
form, which helps us summarize a large dataset into a concise table. The
table has two columns: one lists the data, either as class intervals or as
individual values, and the other shows the frequency of each interval or value.

For example, let’s say we have a dataset of students’ test scores in a class.
Test Score Frequency

0-20 6

20-40 12

40-60 22

60-80 15

80-100 5

Types of Frequency Distribution


There are four types of frequency distribution:
1. Grouped Frequency Distribution
2. Ungrouped Frequency Distribution
3. Relative Frequency Distribution
4. Cumulative Frequency Distribution

1.Grouped Frequency Distribution


In Grouped Frequency Distribution observations are divided between different
intervals known as class intervals and then their frequencies are counted for each
class interval. This Frequency Distribution is used mostly when the data set is very large.
Example: Make the Frequency Distribution Table for the ungrouped data given as
follows:
23, 27, 21, 14, 43, 37, 38, 41, 55, 11, 35, 15, 21, 24, 57, 35, 29, 10, 39, 42, 27, 17, 45,
52, 31, 36, 39, 38, 43, 46, 32, 37, 25
Solution:
As the observations lie between 10 and 57, we can choose the class intervals 10-20,
20-30, 30-40, 40-50, and 50-60. These class intervals cover all the observations, and
we count the frequency of observations falling in each interval.
Thus, the Frequency Distribution Table for the given data is as follows:
Class Interval Frequency

10 – 20 5

20 – 30 8

30 – 40 11

40 – 50 6

50 – 60 3
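The counting step for this example can be checked with a short script. This is a minimal sketch that assumes each class interval includes its lower limit and excludes its upper limit; the variable names are illustrative.

```python
# Observations from the example above.
data = [23, 27, 21, 14, 43, 37, 38, 41, 55, 11, 35, 15, 21, 24, 57, 35, 29,
        10, 39, 42, 27, 17, 45, 52, 31, 36, 39, 38, 43, 46, 32, 37, 25]

# Class intervals from the solution, treated as low <= x < high (assumption).
class_intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

frequencies = {
    (low, high): sum(1 for x in data if low <= x < high)
    for low, high in class_intervals
}

print("Class Interval  Frequency")
for (low, high), freq in frequencies.items():
    print(f"{low} - {high}         {freq}")

# Every observation should fall in exactly one interval.
print("Total:", sum(frequencies.values()), "of", len(data), "observations")
```

Comparing the total of the frequencies with the number of observations is a quick way to catch counting mistakes in a hand-built table.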

2.Ungrouped Frequency Distribution


In Ungrouped Frequency Distribution, all distinct observations are mentioned and
counted individually. This Frequency Distribution is often used when the given
dataset is small.
Example: Make the Frequency Distribution Table for the ungrouped data given
as follows:
10, 20, 15, 25, 30, 10, 15, 10, 25, 20, 15, 10, 30, 25
Solution:
The unique observations in the given data are 10, 15, 20, 25, and 30, and we count
how many times each of them occurs.
Thus the Frequency Distribution Table of the given data is as follows:
Value Frequency

10 4

15 3

20 2

25 3

30 2
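The same table can be produced by counting each distinct value directly. A minimal sketch using the data from this example:

```python
from collections import Counter

# Observations from the example above.
data = [10, 20, 15, 25, 30, 10, 15, 10, 25, 20, 15, 10, 30, 25]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(data)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:<6} {frequency[value]}")
```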
3.Relative Frequency Distribution
This distribution displays the proportion or percentage of observations in each interval
or class. It is useful for comparing different data sets or for analyzing the distribution of
data within a set.
Relative Frequency is given by:
Relative Frequency = (Frequency of Event)/(Total Number of Events)
Example: Make the Relative Frequency Distribution Table for the following data:

Score Range 0-20 21-40 41-60 61-80 81-100

Frequency 5 10 20 10 5

Solution:
To Create the Relative Frequency Distribution table, we need to calculate Relative
Frequency for each class interval. Thus Relative Frequency Distribution table is given
as follows:

Score Range Frequency Relative Frequency

0-20 5 5/50 = 0.10

21-40 10 10/50 = 0.20

41-60 20 20/50 = 0.40

61-80 10 10/50 = 0.20

81-100 5 5/50 = 0.10

Total 50 1.00
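The relative frequency formula can be applied to the whole table at once. A minimal sketch using the frequencies from this example; the variable names are illustrative.

```python
# Score ranges and frequencies from the example above.
score_ranges = ["0-20", "21-40", "41-60", "61-80", "81-100"]
frequencies = [5, 10, 20, 10, 5]

# Relative frequency = frequency of the class / total number of observations.
total = sum(frequencies)
relative_frequencies = [f / total for f in frequencies]

print("Score Range  Frequency  Relative Frequency")
for label, f, rel in zip(score_ranges, frequencies, relative_frequencies):
    print(f"{label:<12} {f:<10} {rel:.2f}")
print(f"{'Total':<12} {total:<10} {sum(relative_frequencies):.2f}")
```

The relative frequencies always sum to 1 (up to rounding), which makes tables built from datasets of different sizes directly comparable.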
4.Cumulative Frequency Distribution

Cumulative frequency is defined as the sum of all the frequencies in the
previous values or intervals up to the current one. Frequency distributions
that are expressed using cumulative frequencies are called
cumulative frequency distributions. There are two types of cumulative frequency
distributions:

 Less than Type: We sum the frequencies of the current interval and all the intervals before it.
 More than Type: We sum the frequencies of the current interval and all the intervals after it.

Let’s see how to represent a cumulative frequency distribution through an
example,
Example: The table below gives the values of runs scored by Virat Kohli in the last 25
T-20 matches. Represent the data in the form of less-than-type cumulative frequency
distribution:

45 34 50 75 22

56 63 70 49 33

0 8 14 39 86

92 88 70 56 50

57 45 42 12 39
Solution:
Since there are a lot of distinct values, we’ll express this in the form of a grouped
distribution with intervals like 0-10, 10-20 and so on. First let’s represent the data in the
form of a grouped frequency distribution.

Runs Frequency

0-10 2

10-20 2

20-30 1

30-40 4

40-50 4

50-60 5

60-70 1

70-80 3

80-90 2

90-100 1
Now we will convert this frequency distribution into cumulative frequency distribution
by summing up the values of current interval and all the previous intervals.

Runs scored by Virat Kohli Cumulative Frequency

Less than 10 2

Less than 20 4

Less than 30 5

Less than 40 9

Less than 50 13

Less than 60 18

Less than 70 19

Less than 80 22

Less than 90 24

Less than 100 25

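This table represents the cumulative frequency distribution of less than type. Summing
the frequency of each interval and all the intervals after it instead gives the more than type: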
Runs scored by Virat Kohli Cumulative Frequency

More than 0 25

More than 10 23

More than 20 21

More than 30 20

More than 40 16

More than 50 12

More than 60 7

More than 70 6

More than 80 3

More than 90 1

This table represents the cumulative frequency distribution of more than type.
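Both cumulative tables can be derived from the grouped frequencies with running sums. A minimal sketch using the frequencies from this example; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the grouped table above (0-10, ..., 90-100).
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 2, 1, 4, 4, 5, 1, 3, 2, 1]

# Less-than type: running total from the first interval up to the current one.
less_than = list(accumulate(frequencies))

# More-than type: running total from the current interval to the last one,
# computed by accumulating the reversed list and reversing the result.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for upper, cf in zip(upper_limits, less_than):
    print(f"Less than {upper:<4} {cf}")
for lower, cf in zip(lower_limits, more_than):
    print(f"More than {lower:<4} {cf}")
```

The last less-than value and the first more-than value both equal the total number of observations, which is a useful consistency check.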
