By Dr. Neha Mathur
Statistics
Statistics is the study of data, including how to collect, analyze, and present
it.
It's a branch of applied mathematics that uses mathematical and
computational tools to help people make decisions based on data.
What does statistics involve?
Data collection: Gathering data using sampling techniques like simple
random, systematic, stratified, or cluster sampling.
Data analysis: Using mathematical and computational tools to analyze data.
Data interpretation: Drawing conclusions from data.
Data presentation: Communicating results using tables, charts, and other
visual aids.
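The four sampling techniques named under data collection can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 unit IDs, the two groups "A" and "B", and the function names are all hypothetical, and the seed is fixed only so that the output is repeatable.

```python
import random

def simple_random_sample(population, n, rng):
    """Pick n items so that every item has an equal chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick every k-th item after a random start, with k = len(population) // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of fixed size from every stratum (group)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))   # hypothetical population of 100 unit IDs
groups = {                         # hypothetical grouping of the same IDs
    "A": population[:50],
    "B": population[50:],
}

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 5, rng))
print(stratified_sample(groups, 2, rng))
print(cluster_sample(groups, 1, rng))
```

Note the difference between the last two: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.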
Why is statistics important?
Statistics helps us make informed decisions by
reducing guesswork and grounding decisions in
evidence.
Statistics can help us forecast future outcomes based
on past data.
Statistics is used in many scientific fields, as well as in
business and investing.
What is DATA?
Data is a collection of measurements and facts.
It is a tool that helps an individual or a group
reach a sound conclusion by providing relevant
information.
It helps the analyst to understand, analyze, and
interpret socio-economic problems such as
unemployment, poverty, and inflation.
Besides clarifying the issues, it also helps in
determining the reasons behind a problem so that
possible solutions can be found.
Data includes not only theoretical information but
also numerical facts that support it.
DATA
Qualitative: This type of data covers descriptions such as color,
size, quality, and appearance.
Quantitative: This type of data deals with numbers, such as
statistics, poll numbers, percentages, etc.
DATA COLLECTION
Data Collection refers to the systematic process of
gathering, measuring, and analyzing information from
various sources to get a complete and accurate picture
of an area of interest.
It is an essential phase in all types of research,
analysis, and decision-making.
NEED FOR DATA COLLECTION
Data collection is a crucial process for gathering
information that can be used to make informed decisions,
identify trends, and solve problems.
Whether you’re in academia, trying to conduct research,
or part of the commercial sector, thinking of how to
promote a new product, you need data collection to help
you make better choices.
The collection of data is the first step of the statistical
investigation.
SOURCES OF DATA COLLECTION
Primary Source
Data collected by the investigator from primary sources for the
first time, from scratch, is known as primary data.
It is real-time data and is always specific to the researcher’s needs.
It is available in raw form.
The accuracy and reliability of primary data are higher.
Secondary Source
Data already in existence, previously collected by someone else for
other purposes, is known as secondary data.
It does not include any real-time data, as the research has already
been done on that information.
The cost of collecting secondary data is lower.
It can be found in refined form.
Its accuracy and reliability are relatively lower.
Techniques of Primary Data Collection
Direct Personal Investigation:
This method involves collecting data personally from the
source of origin.
In simple words, the investigator makes direct contact with the
person from whom he/she wants to obtain information.
Indirect Oral Investigation:
The investigator collects the data orally from some other person
who has the required information.
For example, collecting data on employees from their superiors
or managers.
Information from Local Sources or Correspondents:
In this method, the investigator appoints correspondents or local
persons at various places to collect the data, which they then
furnish to the investigator.
With the help of correspondents and local persons, the
investigators can cover a wide area.
Information through Questionnaires and Schedules:
In this method, the investigator prepares a questionnaire with the
purpose of the study in mind. The investigator can
collect data through the questionnaire in two ways:
Mailing Method:
a) This method involves mailing the questionnaires to the
informants for the collection of data.
b) The investigator attaches a letter to the questionnaire in
the mail to explain the purpose of the study or research.
c) The investigator assures the informants that their information
will be kept confidential; the informants then answer the
questionnaire and return the completed copy.
Enumerator’s Method:
a) This method involves the preparation of a questionnaire
according to the purpose of the study or research.
b) In this case, the enumerator personally visits the informants
with the prepared questionnaire.
c) Enumerators are not the investigators themselves; they are
the people who help the investigator collect the data.
Techniques of Secondary Data Collection
Published Sources:
Researchers refer to books, academic journals, magazines, newspapers,
government reports, and other published materials that contain relevant
data.
Online Databases:
Numerous online databases provide access to a wide range of
secondary data, such as research articles, statistical information,
economic data, and social surveys.
Government and Institutional Records:
Government agencies, research institutions, and other organizations
often maintain databases or records that can be used for
research purposes.
Publicly Available Data:
Data shared by individuals, organizations, or communities on
public platforms, websites, or social media can be accessed and utilized
for research.
Past Research Studies:
Previous research studies and their findings can serve as
valuable secondary data sources. Researchers can review and analyze the
data to gain insights or build upon existing knowledge.
Principal Differences between Primary and Secondary Data
Difference in Objective:
Primary data is always collected by the investigator
for a specific objective. Therefore, there is no
need to adjust it for the purpose of the
study.
Secondary data has already been collected by
someone else for some other purpose. Therefore, the
investigator has to make the necessary adjustments to
the data to suit the main objective of the present study.
Difference in Originality:
Because primary data is collected from scratch at
the source of origin, it is original.
Secondary data already exists somewhere and
hence is not original.
Difference in Cost of Collection:
The cost of collecting primary data is higher than
the cost of collecting secondary data in terms of
time, effort, and money. This is because the data is
being collected for the first time from the source of
origin.
The cost of collecting secondary data is lower, as
the data is gathered from published or unpublished
sources.
In short, the differences can be summarized as follows:
Difference Between Primary and Secondary Data
Primary Data | Secondary Data
Collected by a researcher for the first time. | Already collected earlier by someone else.
Called real-time data. | Not real-time data; it relates to the past.
Collection is a very involved process. | Collection involves little process and is quick and easy.
Expensive. | Economical.
Takes a long time to collect. | Takes less time to collect than primary data.
Available in crude form. | Available in processed or refined form.
More accurate than secondary data. | Less accurate than primary data.
More reliable than secondary data. | Less reliable than primary data.
Primary data can also be difficult to collect.
Conclusion
Primary and secondary data each have their own
strengths and best-use cases.
Primary data offers specificity and control but at a
higher cost and time investment.
Secondary data is more accessible and
cost-effective but may lack the precise relevance
and freshness of primary data.
Understanding these differences helps in choosing
the right type of data for our research needs.
Frequency Distribution
Frequency distribution is a method of organizing
and summarizing data to show the frequency (count)
of each distinct outcome in a dataset.
It is an essential tool in statistics for understanding the
distribution and pattern of data.
A frequency distribution reports the number of
occurrences (frequency) of each distinct value, or of
the values falling within each interval, as a list, table,
or graph.
Types of frequency distributions
Grouped: In this type, the data is arranged and separated into groups called
class intervals. The frequency of data belonging to each class interval is noted
in a frequency distribution table. The grouped frequency table shows the
distribution of frequencies in class intervals.
Ungrouped: It shows the frequency of each individual data value
rather than of groups of data values.
Cumulative frequency distribution: Cumulative frequency is defined as the
sum of the frequencies of all values or intervals up to and including the
current one.
Relative frequency distribution: This distribution displays the proportion or
percentage of observations in each interval or class. It is useful for comparing
different data sets or for analyzing the distribution of data within a set.
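As a minimal sketch of the grouped type, the function below counts how many values fall into each class interval. The scores are hypothetical illustration data, and the function name and the convention that each interval includes its lower limit but excludes its upper limit are assumptions. Some convention of this kind is needed so that a value on a class boundary is counted only once.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, upper)."""
    table = {}
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        # Lower limit included, upper limit excluded (assumed convention).
        table[(lower, upper)] = sum(1 for v in values if lower <= v < upper)
    return table

# Hypothetical test scores, used only for illustration.
scores = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 15, 27]

for (lower, upper), count in grouped_frequency(scores, 10, 10, 4).items():
    print(f"{lower}-{upper}: {count}")
```

The counts across all class intervals add up to the number of observations, which is a quick check that no value was dropped or counted twice.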
With a very large dataset, it is often not easy or
feasible to see how frequently each value occurs from
the raw data alone. To make sense of the data, we build
a frequency table and graphs.
Example: Take the heights of ten students, in cm:
139, 145, 150, 145, 136, 150, 152, 144, 138, 138
Frequency Distribution Table
This frequency table will help us make better sense of the data given.
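One way to tabulate the ten heights is sketched below with Python's `collections.Counter`. The variable names and the printed layout are assumptions made for illustration, not necessarily the layout of the table on the original slide.

```python
from collections import Counter

# Heights of the ten students, in cm, as given in the example.
heights_cm = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]

# Counter maps each distinct height to the number of times it occurs.
frequency = Counter(heights_cm)

print("Height (cm)  Frequency")
for height in sorted(frequency):
    print(f"{height:<12} {frequency[height]}")
print(f"{'Total':<12} {sum(frequency.values())}")
```

Three heights (138, 145, and 150 cm) occur twice and the other four occur once, so the frequencies add up to the ten students.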
When the data set is large (say, if we
were dealing with 100 students),
we use tally marks for counting.
This makes the task easier and more organised.
The example below shows how tally marks are used.
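As a minimal sketch, a count can be turned into tally marks as follows. The function name is hypothetical, and rendering a full group of five as `||||/` (four strokes plus a cross stroke) is an assumed plain-text convention.

```python
def tally(count):
    """Render a count as tally marks: '||||/' per group of five, then strokes."""
    groups, remainder = divmod(count, 5)
    parts = ["||||/"] * groups
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)

for n in (3, 5, 7, 12):
    print(n, tally(n))
```

Grouping in fives is what makes large counts quick to read back: the groups are counted first and the leftover strokes are added.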
Example
The table gives the number of snacks ordered and the
number of days as a tally. Find the frequency of snacks
ordered.
Solution: From the frequency table, the number of days on which the
number of snacks ordered fell in each range is:
2 to 4: 4 days
4 to 6: 3 days
6 to 8: 9 days
8 to 10: 9 days
10 to 12: 7 days
So the frequencies for the snacks ordered are 4, 3, 9, 9, 7.
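Starting from these five frequencies, the cumulative and relative frequency distributions can be derived as sketched below. The variable names and the printed layout are assumptions for illustration.

```python
from itertools import accumulate

# Class intervals and frequencies (number of days) from the solution.
intervals = ["2-4", "4-6", "6-8", "8-10", "10-12"]
frequencies = [4, 3, 9, 9, 7]

total = sum(frequencies)                      # total number of days
cumulative = list(accumulate(frequencies))    # running total of frequencies
relative = [f / total for f in frequencies]   # share of days in each interval

for interval, f, c, r in zip(intervals, frequencies, cumulative, relative):
    print(f"{interval:<6} {f:>3} {c:>4} {r:>7.2%}")
```

The last cumulative frequency equals the total of 32 days, and the relative frequencies add up to 1.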