Unit 1
• Single and isolated figures are not statistics for the reason that such figures are
unrelated and cannot be compared.
• Ex: The income of Mr. X is 90,000 per month.
• This would not constitute statistics although it is a numerical statement of fact.
Spatial classification
It is done with respect to space or places.
Ex: Production of cereals in quintals in various states, population of a
country according to states, etc.
STATISTICAL SERIES (DISCRETE/CONTINUOUS)
• Statistical series may be either discrete or continuous.
• A discrete series may be formed from items which are exactly
measurable.
For example, the number of students getting exactly 40, 50, 60, 70
marks can be easily counted.
• A continuous series may be formed from items which cannot be
measured with absolute accuracy.
For example, the height or weight of students in a class.
FREQUENCY
The number of occurrences of a value is termed the “frequency” of
that value.
FREQUENCY DISTRIBUTION
The way of tabulating a pool of data of a variable and their respective
frequencies side by side is called a ‘frequency distribution’ of those
data.
Discrete or Simple or ungrouped frequency distribution
It does not condense the data much and is quite cumbersome to grasp
and comprehend.
It becomes handy if the values of the variable are largely repeated.
• Now let us present the above data in the form of a simple (or,
ungrouped) frequency distribution using the tally marks.
• A tally mark is an upward slanted stroke (/) which is put against a
value each time it occurs in the raw data.
• The fifth occurrence of the value is represented by a cross tally mark
(\) drawn across the first four tally marks.
• Finally, the tally marks are counted and the total of the tally marks
against each value is its frequency.
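The tallying procedure can be sketched in Python as follows. This is only an illustration: the raw data `marks` and the helper names `tally_marks` and `simple_frequency_distribution` are assumptions, not part of the original notes. Each complete group of five is printed as four strokes followed by a cross stroke.

```python
from collections import Counter

def tally_marks(count):
    """Return tally marks for a count, one group per five occurrences."""
    full_groups, remainder = divmod(count, 5)
    # A complete group is four strokes plus the cross stroke for the fifth.
    groups = ["////\\"] * full_groups
    if remainder:
        groups.append("/" * remainder)
    return " ".join(groups)

def simple_frequency_distribution(values):
    """Return (value, tally, frequency) rows in ascending order of value."""
    counts = Counter(values)
    return [(v, tally_marks(counts[v]), counts[v]) for v in sorted(counts)]

# Hypothetical raw data, for illustration only.
marks = [40, 50, 40, 60, 50, 40, 70, 40, 50, 40, 40, 60]
for value, tally, frequency in simple_frequency_distribution(marks):
    print(f"{value:>5}  {tally:<12} {frequency}")
```

The frequency column is simply the count of tally marks in each row, so the frequencies add up to the number of raw observations.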
Grouped Frequency Distribution
• The above data can be further condensed by putting them into
smaller groups, or, classes called “class-Intervals”.
• The number of items which fall in a class-interval is called its “class
frequency”.
• The tabulation of raw data by dividing the whole range of
observations into a number of classes and indicating the
corresponding class-frequencies against the class-intervals, is called
“grouped frequency distribution”.
• Let us now represent the data in Table 1.3 as grouped frequency
distribution.
• Find the lowest value (56) and the highest value (73) in the given
data. For approximately 10 classes, the width of each class
will be (73 − 56)/10 = 17/10 = 1.7 ≈ 2.
The steps in preparing the grouped frequency distribution are:
1. Determining the class intervals.
2. Recording the data using tally marks.
3. Finding frequency of each class by counting the tally marks.
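The three steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the data are integers, the classes are of the inclusive type, the class width is the range divided by the target number of classes and rounded up, and the list `heights` is hypothetical data chosen only so that its lowest value is 56 and its highest is 73.

```python
import math

def grouped_frequency_distribution(values, target_classes=10):
    """Group integer data into inclusive class-intervals of equal width."""
    lowest, highest = min(values), max(values)
    # Step 1: class width = range / desired number of classes, rounded up.
    width = max(1, math.ceil((highest - lowest) / target_classes))
    classes = []
    lower = lowest
    while lower <= highest:
        upper = lower + width - 1  # inclusive upper limit (integer data)
        # Steps 2 and 3: count the items falling in this class-interval.
        frequency = sum(lower <= v <= upper for v in values)
        classes.append((lower, upper, frequency))
        lower = upper + 1
    return width, classes

# Hypothetical raw data with lowest value 56 and highest value 73.
heights = [56, 57, 58, 60, 60, 61, 63, 64, 64, 65, 66, 68, 69, 70, 72, 73]
width, classes = grouped_frequency_distribution(heights)
print("class width:", width)
for lower, upper, frequency in classes:
    print(f"{lower}-{upper}: {'/' * frequency} ({frequency})")
```

With a width of 2 the range 56 to 73 is covered by nine classes (56-57, 58-59, ..., 72-73), which is close to the ten classes aimed for.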
• Several Important Terms
• The class-boundaries of the class-intervals of Table 1.5 will be 55.5 – 57.5; 57.5 – 59.5;
59.5 – 61.5; etc., since d = 58 – 57 = 60 – 59 = ...= 1.
• The class-boundaries convert a grouped frequency distribution (inclusive type) into a
continuous frequency distribution.
Exclusive type
• It is suitable for continuous variable data and facilitates mathematical
computations.
Inclusive type
• This is suitable for discrete variable data.
• There is no ambiguity as to which class an item belongs, but the idea of
continuity is lost.
• To make it continuous, the class limits are converted into class
boundaries.
(e) Width or Length (or size) of a Class-interval:
• Width of a class-interval = Upper class boundary − Lower class-
boundary
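The conversion of inclusive class limits into class boundaries, and the width formula, can be sketched as follows. The function names are illustrative assumptions. The gap d is taken between the upper limit of the first class and the lower limit of the second, and half of it is subtracted from each lower limit and added to each upper limit.

```python
def class_boundaries(class_limits):
    """Convert inclusive class limits into class boundaries."""
    # d = gap between the upper limit of one class and the lower limit of the next.
    d = class_limits[1][0] - class_limits[0][1]
    half = d / 2
    return [(lower - half, upper + half) for lower, upper in class_limits]

def class_width(boundary):
    """Width = upper class boundary - lower class boundary."""
    lower, upper = boundary
    return upper - lower

limits = [(56, 57), (58, 59), (60, 61)]
boundaries = class_boundaries(limits)
print(boundaries)                  # [(55.5, 57.5), (57.5, 59.5), (59.5, 61.5)]
print(class_width(boundaries[0]))  # 2.0
```

Because each upper boundary coincides with the next lower boundary, the classes now join without gaps.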
• In the less than type, the cumulative frequency of each class-interval is obtained by adding
the frequencies of the given class and all the preceding classes, when the classes are
arranged in the ascending order of the value of the variable.
• In the more than type, the cumulative frequency of each class-interval is obtained by adding
the frequencies of the given class and the succeeding classes.
• For a grouped frequency distribution, the cumulative frequencies are shown against the class-
boundary points.
Here, d = Gap between two consecutive classes = 1.
Hence, (1/2) d = 0.5
∴ Lower class-boundary points are 9.5, 19.5, 29.5, etc. and the
last upper class-boundary point is 59.5. Hence, the class
boundary points are 9.5, 19.5, ..., 59.5.
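Both types of cumulative frequency can be computed against the class-boundary points, as in the sketch below. The class limits 10-19, ..., 50-59 follow from the boundary points 9.5, 19.5, ..., 59.5 given above; the frequencies and the function name are hypothetical and serve only as an illustration.

```python
from itertools import accumulate

def cumulative_frequencies(class_limits, frequencies):
    """Return (boundary point, less-than cf, more-than cf) rows."""
    d = class_limits[1][0] - class_limits[0][1]   # gap between classes
    # First lower boundary, then the upper boundary of every class.
    points = [class_limits[0][0] - d / 2]
    points += [upper + d / 2 for _, upper in class_limits]
    total = sum(frequencies)
    # Less than type: running total of the class and all preceding classes.
    less_than = [0] + list(accumulate(frequencies))
    # More than type: frequencies of all classes lying above the point.
    more_than = [total - c for c in less_than]
    return list(zip(points, less_than, more_than))

limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
frequencies = [3, 7, 12, 6, 2]   # hypothetical class frequencies
table = cumulative_frequencies(limits, frequencies)
for point, less, more in table:
    print(f"{point:>5}  less than: {less:>3}  more than: {more:>3}")
```

At every boundary point the two cumulative frequencies add up to the total frequency, which gives a quick check on the table.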
PRESENTATION OF STATISTICAL DATA
Statistical data can be presented in three different ways:
(1) Textual presentation
(2) Tabular presentation, and
(3) Graphical presentation.
Textual presentation: This is a descriptive form.
The disadvantages of textual presentation are:
• it is too lengthy
• there is repetition of words
• comparisons cannot be made easily
• it is difficult to get an idea and take appropriate action.
Tabular presentation, or, Tabulation
Tabulation may be defined as the systematic presentation of numerical data
in rows and/or columns according to certain characteristics.
It expresses the data in concise and attractive form which can be easily
understood and used to compare numerical figures.
Before drafting a table, you should be sure what you want to show and who
will be the reader.
The advantages of a tabular presentation over the textual presentation are:
• it is concise
• there is no repetition of explanatory matter
• comparisons can be made easily
• the important features can be highlighted and
• errors in the data can be detected.
An ideal statistical table should contain the following items:
• Table number: A number must be allotted to the table for
identification, particularly when there are many tables in a study.
• Title: The title should explain what is contained in the table. It should
be clear, brief and set in bold type on top of the table. It should also
indicate the time and place to which the data refer.
• Date: The date of preparation of the table should be given.
• Stubs or Row designations: Each row of the table should be given a
brief heading. Such designations of rows are called “stubs”, or, “stub
items” and the entire column is called “stub column”.
• Column headings, or, Captions: Column designation is given on top
of each column to explain to what the figures in the column refer. It
should be clear and precise. This is called a “caption”, or, “heading”.
Columns should be numbered if there are four or more columns.
• Body of the table: The data should be arranged in such a way that any
figure can be located easily. Various types of numerical variables should
be arranged in an ascending order, i.e., from left to right in rows and
from top to bottom in columns. Column and row totals should be given.
• Unit of measurement: If the unit of measurement is uniform
throughout the table, it is stated at the top right-hand corner of the
table along with the title. If different rows and columns contain figures
in different units, the units may be stated along with “stubs”, or,
“captions”. Very large figures may be rounded off, but the method of
rounding should be explained.
• Source: At the bottom of the table a note should be added indicating
the primary and secondary sources from which data have been
collected.
• Footnotes and references: If any item has not been explained
properly, a separate explanatory note should be added at the bottom
of the table.
• A table should be logical, well-balanced in length and breadth and
the comparable columns should be placed side by side.
Light/heavy/thick or double rulings may be used to distinguish sub
columns, main columns and totals. For large data more than one table
may be used.
• Exercise Problem :
Draw up a blank table to show the number of employees in a large
commercial firm, classified according to (i) Sex: Male and Female; (ii)
Three age-groups: below 30, 30 and above but below 45, 45 and above;
and (iii) Four income-groups: below Rs. 400, Rs. 400–750, Rs. 750–1,000,
above Rs. 1,000.
Objectives of Tabulation
The main objectives of tabulation are stated below:
• to carry out investigation;
• to do comparison;
• to locate omissions and errors in the data;
• to use space economically;
• to study the trend;
• to simplify data;
• to use it for future reference.
Sorting
• Sorting of data is the last process of tabulation. It is a time-consuming
process when the data set is very large.
• After classification the data may be sorted using either of the
following methods:
• Manual method: Here the sorting is done by hand by giving tally
marks for the number of times each event has occurred. Next the
total tally marks are counted. The method is simple and suitable for
limited data.
• Mechanical and electrical method:
To reduce the sorting time mechanical devices may be used. This is
described as mechanical tabulation. For electrical tabulation, the data
should be codified first and then punched on cards. A separate card is
used for each data item. The punched cards are checked by a machine
called ‘verifier’. Next the cards are sorted out into different groups as
desired by a machine called ‘sorter’. Finally, the tabulation is done by
using a tabulator. The same card may be sorted out more than once for
completing tables under different titles.
• Tabulation using electronic computer:
It is convenient to use an electronic computer for sorting when (a) the data
are very large; (b) the data have to be sorted for future use; and (c) the
requirements of the table are changing. Such a tabulation is less time-
consuming and more accurate than the manual method.
Diagrams
Diagrams are various geometrical shapes such as bars, circles, etc.
Diagrams are based on scale but are not confined to points or lines. They
are more attractive and easier to understand than graphs.
Merits
1. Most people are attracted by diagrams.
2. Technical knowledge or education is not necessary.
3. Time and effort required are less.
4. Diagrams show the data in proper perspective.
5. Diagrams leave a lasting impression.
6. Language is not a barrier.
7. Diagrams are a widely used tool.
• Demerits (or) limitations
1. Diagrams are approximations.
2. Minute differences in values cannot be represented properly in
diagrams.
3. Large differences in values spoil the look of the diagram.
4. Some of the diagrams can be drawn by experts only, e.g. pie charts.
5. Different scales portray different pictures to laymen.
• Types of Diagrams
The important diagrams are
1. Simple Bar diagram.
2. Multiple Bar diagram.
3. Component Bar diagram.
4. Percentage Bar diagram.
5. Pie chart
6. Pictogram
7. Statistical maps or cartograms
In all the diagrams and graphs, the groups or classes are represented on
the x-axis and the volumes or frequencies are represented on the y-axis.
• Simple bar diagram
If the classification is based on attributes and if
the attributes are to be compared with respect to a single character,
we use a simple bar diagram.
Example
1. The area under different crops in a state.
2. The food grain production of different years.
3. The yield performance of different varieties of a crop.
4. The effect of different treatments etc.
A simple bar diagram consists of vertical bars of equal width. The
heights of these bars are proportional to the volume or magnitude of
the attribute. All bars stand on the same baseline. The bars are
separated from each other by equal intervals. The bars may be
coloured or marked.
• Example
The cropping pattern in Tamil Nadu in the year 1974-75 was as follows.
• Multiple bar diagram
If the data is classified by attributes and if two or more characters or
groups are to be compared within each attribute we use multiple bar
diagrams. If only two characters are to be compared within each
attribute, then the resultant bar diagram used is known as double bar
diagram.
The multiple bar diagram is simply the extension of simple bar diagram.
For each attribute two or more bars representing separate characters
or groups are to be placed side by side. Each bar within an attribute will
be marked or coloured differently in order to distinguish them. Same
type of marking or colouring should be done under each attribute. A
footnote has to be given explaining the markings or colourings.
• Component bar diagram
This is also called a sub-divided bar diagram. Instead of placing the bars
for each component side by side, we may place them one on top of the
other. This will result in a component bar diagram.
Graphical representation
Graphs
Graphs are charts consisting of points, lines and curves. Charts are
drawn on graph sheets. Suitable scales are to be chosen for both x and
y axes, so that the entire data can be presented in the graph sheet.
Graphical representations are used for grouped quantitative data.
Histogram
• When the data are classified based on class intervals, they can be
represented by a histogram. A histogram is just like a simple bar
diagram with minor differences. There is no gap between the bars,
since the classes are continuous. The bars are drawn only in outline
without colouring or marking as in the case of simple bar diagrams. It
is the suitable form to represent a frequency distribution.
• Class intervals are to be presented on the x-axis and the bases of the bars
are the respective class intervals. Frequencies are to be represented
on the y-axis. The heights of the bars are equal to the corresponding
frequencies.
• Frequency Polygon
The frequencies of the classes are plotted by dots against the mid-
points of each class. The adjacent dots are then joined by straight lines.
The resulting graph is known as frequency polygon.
• Frequency curve
The procedure for drawing a frequency curve is the same as that for a frequency
polygon, but the points are joined by a smooth or free-hand curve.
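The points plotted for a frequency polygon or a frequency curve can be computed as in the sketch below. The class boundaries and frequencies are hypothetical, and the function name is an assumption; the sketch only pairs each class mid-point with its class frequency and leaves the drawing itself aside.

```python
def frequency_polygon_points(class_boundaries, frequencies):
    """Pair each class mid-point with its class frequency."""
    return [((lower + upper) / 2, frequency)
            for (lower, upper), frequency in zip(class_boundaries, frequencies)]

# Hypothetical classes and frequencies, for illustration only.
boundaries = [(55.5, 57.5), (57.5, 59.5), (59.5, 61.5)]
frequencies = [2, 1, 3]
points = frequency_polygon_points(boundaries, frequencies)
print(points)   # [(56.5, 2), (58.5, 1), (60.5, 3)]
```

Joining these points with straight lines gives the polygon; joining them with a smooth curve gives the frequency curve.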
Ogives
Ogives are also known as cumulative frequency curves. There are
two kinds of ogives: the less than ogive and the more than
ogive.
• Less than ogive: Here the cumulative frequencies are plotted against
the upper boundaries of the respective class intervals.
• More than ogive: Here the cumulative frequencies are plotted
against the lower boundaries of the respective class intervals.