Data Collection & Analysis
Introduction: The word research originates from the French word ‘recerchier’, which means
‘search again’. It assumes that the earlier search was not exhaustive and complete, and hence a
repeated search is called for. Today research means a scientific and systematic investigation or
inquiry in search of new facts in any branch of knowledge. In short, research is the search for
knowledge through an objective and systematic method of finding a solution to a problem.
Ultimately, research seeks knowledge for the development and better future of all living
beings. Data are the raw materials of all types of research.
To make a decision in any business situation we need data. Facts expressed in quantitative
form can be termed as data. Collection of data and their statistical analysis are two important
operations in any research. Investigators generate data through some process of measurement,
counting or observation. Success of any statistical investigation depends on the availability of
accurate and reliable data. These depend on the appropriateness of the method chosen for data
collection. Therefore, data collection is a very basic activity in decision-making. In this chapter,
a brief overview of the important data collection methods is given first, and then the important
statistical methods used to analyze the collected data are presented.
Research Process:
The research process consists of a series of actions or steps necessary to carry out research
effectively, together with the desired sequencing of these steps. The important steps of the research process are:
1. Identification of the problem
2. Literature review
3. Setting objectives and developing hypotheses
4. Research design
5. Sample design
6. Collection of data
7. Analysis of data
8. Report writing.
Data: The raw material of statistics consists of numbers or observations, usually obtained by
some process of counting or measurement; collectively, they are referred to as data. Thus, ‘a set of
observations is called data’.
Classification of Data: Data can be classified in a number of ways. Now we can classify data
according to some distinct criteria:
1. Data according to origin: (a) Population data (b) Sample data.
2. Data according to variable: (a) Qualitative (categorical) data (b) Quantitative data.
Again quantitative data can be classified as (i) Discrete data (ii) Continuous data
3. Data according to time: (a) Time series data (b) Cross-section data (c) Panel data
4. Data according to scale of measurement: (a) Nominal data (b) Ordinal data (c) Interval
data (d) Ratio data.
5. Data according to subject (discipline): Data are named according to different disciplines:
(a) Economic data (b) Agricultural data (c) Medical data (d) Business data (e) Meteorological
data (f) Import data (g) Export data, etc.
Population data: A census inquiry requires population data. In this case we have to take
observations from all the experimental units of the population; such data are also called
census data. Population census, agriculture census and animal census are some
examples where we study the population as a whole. A census inquiry involves a great deal of time,
money and energy. Sometimes the experimental units are destroyed in the course of taking the
observation. Blood testing, bulb testing and rice testing are some examples where the experimental
units are destroyed. In all such cases a sample inquiry is the appropriate method for collecting data.
Sample data: When we take observations from only the sampled experimental units under study, we get
sample data. Sample data play a vital role in inferential statistics.
Qualitative data: In certain statistical investigations we are concerned only with the presence or
absence of some characteristic in a set of objects or individuals. This type of data is called
qualitative data. The characteristic used to classify an individual into different categories is called
an attribute. For example, honesty, sex and religion constitute qualitative data.
Quantitative data: When we are concerned with numerical observations, obtained by counting or
measuring, on a set of objects or individuals, the data are called quantitative data. For example,
the export values of Bangladesh from 1972 to 2012 constitute quantitative data.
Cross-sectional data: Data that are observed at one point in time are referred to as cross-sectional
data. The numbers of students at different universities for the year 2012 constitute
cross-sectional data. The salaries of the workers of a factory in a particular month, the heights of the students
of a class at a particular time and the weights of the patients of a clinic at a particular time are
other examples of cross-sectional data.
Time series data: A set of figures observed over a period of time is called time series data.
Although it is not essential, it is common for these points to be equidistant in time. The GDP of
Bangladesh corresponding to different years constitutes time series data. Year-wise production
of a firm for the last 15 years and month-wise prices of potatoes for the last six years are
other examples of time series data.
Panel Data: The combination of cross-section and time series data is called panel data. Year-wise
prices of different foodstuffs for the last ten years are an example of panel data. The year-wise
GDP growth rates of different countries for 1972-2012, laid out as follows, are another example.
Year    Bangladesh    India    Nepal    Pakistan    Bhutan
1972
1973
1974
1975
...
2012
Experimental data: Data collected by experiment, as in the natural sciences, are called
experimental data. Here the investigator collects data to find the effect of some factors on a
given phenomenon while the effects of the other factors are held constant. For example, to find
the impact of iodine on arsenic, a doctor collects data while keeping the eating, smoking and
drinking habits of the people constant.
Sources of Data: The collection of data may range from a simple observation to a large scale
survey in any defined population. The tools and techniques to be employed to collect data depend
largely on the objectives of the study, the research design and the availability of time and money.
In any field of inquiry data can be collected from two sources, namely (i) primary sources and (ii)
secondary sources. The data collected from these sources are known as primary data and
secondary data, respectively.
Primary Data: The primary data are those which are collected afresh and for the first time, and
thus happen to be original in character. Primary data are published by authorities who themselves
are responsible for their collection.
Methods of Collecting Primary Data: Questioning and observation are the two basic methods
of collecting primary data. In the observation method, the investigator asks no questions but
simply observes the phenomenon under consideration and records the necessary data. Sometimes
individuals make the observations; on other occasions, mechanical and electronic devices do the
job. In the observation method, it may be difficult to produce accurate data.
On the other hand, questioning as the name suggests is distinguished by the fact that data
are collected by asking questions of people who are thought to have the desired information.
Questions may be asked in person or in writing. A formal list of such questions is called a
questionnaire. Three different methods of communication with questionnaires are available: (a)
personal interview, (b) telephone and (c) mail. Personal interviews are those in which an
interviewer obtains information from respondents in face-to-face meetings. Telephone interviews
are similar except that communication between interviewer and respondent is via telephone
instead of direct personal contact. In most mail surveys, questionnaires are mailed to respondents
who also return them by mail.
Designing a Good Questionnaire:
(a) Number of questions should be kept to the minimum.
(b) Questions should be simple, short, and unambiguous.
(c) Questions of sensitive or personal nature should be avoided.
(d) Answers to questions should not require calculations.
(e) Questions should be capable of an objective answer.
(f) Questions should be arranged logically.
(g) Proper words should be used in the questionnaire.
(h) Questionnaire should look attractive.
(i) Questionnaire should be pre-tested to find out its shortcomings if any.
(j) Cross-Check and footnotes should be considered in the questionnaire.
(k) Necessary instructions should be given to the informant.
Editing of Primary Data: Editing involves reviewing the data collected by investigators to
ensure maximum accuracy and unambiguity. It should be done as soon as possible after the data
have been collected. If the size of the data is relatively small, it is desirable that only one person
edit all the data for the entire study. The different steps of editing are discussed below:
1. Checking legibility: Obviously, the data must be legible to be used. If a response is not
presented clearly, the concerned investigator should be asked to rewrite it.
2. Checking Completeness: An omitted entry on a fully structured questionnaire may mean
that no attempt was made to collect data from the respondent or that the investigator
simply did not record the data. If the investigator did not record the data, prompt editing
and questioning of the investigator may provide the missing item. If an entry is missing
because of the first possible cause, there is not much that can be done, except to make
another attempt to get the missing data. Obviously, this requires knowing why the entry is
missing.
3. Checking Consistency: The editor should examine each questionnaire to check
inconsistency or inaccuracy if any, in the statement. The income and expenditure figures
may be unduly inconsistent. The age and the date of birth may disagree. The area of an
agricultural plot may be unduly large. The concerned investigators should be asked to
make the necessary corrections. If there is any repetitive response pattern in the reports of
individual investigators they may represent investigator bias or perhaps attempted
dishonesty.
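The completeness and consistency checks described above can be partly automated. The following is only a minimal sketch in Python: the field names (`age`, `birth_year`, `income`, `expenditure`), the survey year and the cut-off values are all hypothetical and are not taken from the text.

```python
def edit_record(record, survey_year):
    """Return a list of problems found in one questionnaire record.

    Field names and thresholds are hypothetical, chosen only for illustration.
    """
    problems = []

    # Completeness: every expected entry must be present.
    for field in ("age", "birth_year", "income", "expenditure"):
        if record.get(field) is None:
            problems.append(f"missing: {field}")

    # Consistency: age and year of birth should agree to within one year.
    age, birth_year = record.get("age"), record.get("birth_year")
    if age is not None and birth_year is not None:
        if abs((survey_year - birth_year) - age) > 1:
            problems.append("age and year of birth disagree")

    # Consistency: flag expenditure far above income (2x is an arbitrary cut-off).
    income, expenditure = record.get("income"), record.get("expenditure")
    if income is not None and expenditure is not None:
        if expenditure > 2 * income:
            problems.append("expenditure unduly high relative to income")

    return problems


record = {"age": 40, "birth_year": 1982, "income": 1000, "expenditure": None}
print(edit_record(record, survey_year=2012))
```

A flagged record would still go back to the concerned investigator; the sketch only locates candidates for correction.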
Secondary Data: When an investigator uses data which have already been collected
and processed by others to satisfy their own needs, such data are called secondary
data. Such data are primary for the agency that collected them and become secondary for
someone else who uses them for his own purposes. Secondary data are available in various
published and unpublished documents. Generally, secondary data are obtained from books,
magazines, newspapers, journals, reports, government and international publications, publications of
research organizations and professional bodies, etc. The suitability, reliability, adequacy and
accuracy of the secondary data should be ensured before they are used for the study.
Scrutiny of Secondary Data: Primary data are to be scrutinized after the questionnaires are
completed by the interviewers. Likewise, secondary data are to be scrutinized before they are
compiled from the source. The scrutiny should assess the suitability, reliability,
adequacy and accuracy of the data to be compiled and used for the proposed study.
1. Suitability: The compiler should be satisfied that the data contained in the publication
will be suitable for the study. In particular, the conformity of the definitions, units of
measurement and time frame should be checked. For example, one US gallon is different
from one British gallon.
2. Reliability: The reliability of the secondary data can be ascertained from the collecting
agency, mode of collection and the time period of collection. For instance, secondary data
collected by a voluntary agency with unskilled investigators are unlikely to be reliable.
3. Adequacy: The source of data may be suitable and reliable but the data may not be
adequate for the proposed enquiry. The original data may cover a bigger or narrower
geographical region or the data may not cover suitable periods. For instance, per capita
income of Pakistan prior to 1971 is inadequate for reference during the subsequent periods
as it became separated into two different countries with considerable variation in standard
of living.
4. Accuracy: The user must be satisfied about the accuracy of the secondary data. The
process of collecting raw data, the reproduction of processed data in the publication, the
degree of accuracy desired and achieved should also be satisfactory and acceptable to the
researcher.
Measure of location: A single value that summarizes a set of data. It locates the center of the
values.
Arithmetic mean: The sum of observations divided by the total number of observations.
The mean is calculated as follows:
In terms of symbols, the formula for the arithmetic mean of a population is:
Population Mean: \[ \mu = \frac{\sum X}{N} \] [3-1]
Where:
\(\mu\) is the population mean.
N is the number of items in the population.
X is a particular value.
∑ indicates the operation of adding all the values. It is pronounced “sigma.”
∑X is the sum of the X values. It is pronounced “sigma X.”
[3-1] indicates the formula number from the text.
Sample Mean: \[ \bar{X} = \frac{\sum X}{n} \] [3-2]
Where:
\(\bar{X}\) is the sample mean; it is read “X bar.”
n is the number of values in the sample.
X is a particular value.
∑ indicates the operation of adding all the values.
∑X is the sum of the X values.
[3-2] is the formula number from the text.
The mean of a sample, or any other measure based on sample data, is called a statistic.
Statistic: A characteristic of a sample. Any function of sample observations.
“The mean weight of a sample of laptop computers is 15 pounds,” is an example of a statistic.
Note that in both of the above formulas the mean is calculated by summing the observations and dividing
by the total number of observations.
As an example, the Kellogg Company had quarterly earnings per share of $0.89, $0.77, $1.05, $0.79, and
$0.95. The mean is found by:
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$0.89 + \$0.77 + \$1.05 + \$0.79 + \$0.95}{5} = \frac{\$4.45}{5} = \$0.89 \]
The mean quarterly earnings per share is $0.89. In some situations the mean may not be representative of
the data.
As an example, the annual salaries of five executives are $40,000, $42,000, $44,000, $48,000, and
$300,000. The mean is:
\[ \bar{X} = \frac{\$40{,}000 + \$42{,}000 + \$44{,}000 + \$48{,}000 + \$300{,}000}{5} = \frac{\$474{,}000}{5} = \$94{,}800 \]
Notice how the one extreme value ($300,000) pulled the mean upward. Four of the five executives
earned less than the mean, raising the question whether the arithmetic mean value of $94,800 is typical
of the salary of the five executives.
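The two worked examples can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is our own and not part of the text.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


earnings = [0.89, 0.77, 1.05, 0.79, 0.95]                  # Kellogg earnings per share
salaries = [40_000, 42_000, 44_000, 48_000, 300_000]      # executive salaries

print(round(arithmetic_mean(earnings), 2))
print(arithmetic_mean(salaries))

# How many executives earn less than the mean? Shows the pull of the extreme value.
mean_salary = arithmetic_mean(salaries)
print(sum(1 for s in salaries if s < mean_salary))
```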
6. Mean of composite series: If \(\bar{X}_i\) (i = 1, 2, …, k) are the means of k component series of sizes
\(n_i\) (i = 1, 2, …, k) respectively, then the mean \(\bar{X}\) of the composite series obtained on combining
the component series is given by the formula:
\[ \bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{X}_i}{\sum_{i=1}^{k} n_i} \]
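As a rough illustration of the composite-series formula, the sketch below combines group means weighted by group sizes. The function name `combined_mean` is an assumption of ours; the figures are the two student groups from the exercises at the end of the chapter (100 students with mean 49.96 and 200 students with mean 52.32).

```python
def combined_mean(sizes, means):
    """Mean of the composite series: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


print(round(combined_mean([100, 200], [49.96, 52.32]), 2))
```

The result lies closer to the mean of the larger group, which is what the weighting by \(n_i\) is meant to achieve.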
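The geometric mean can be used for averaging percents. Suppose the return on investment for
McDermoll International for the past 4 years is 0.4%, 2.9%, 2.1%, and 12.3%. The GM increase over the
period is 4.3 percent, found by taking the fourth root of the product of the growth factors:
\[ GM = \sqrt[4]{(1.004)(1.029)(1.021)(1.123)} - 1 \approx 1.043 - 1 = 0.043 \]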
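The 4.3 percent figure can be reproduced if each percentage return r is first converted to a growth factor 1 + r/100. That conversion is our reading of the example rather than something stated in the text, so the following Python sketch should be taken as an assumption-laden check; `geometric_mean_rate` is a hypothetical helper name.

```python
import math


def geometric_mean_rate(percent_returns):
    """Average percentage increase via the geometric mean of growth factors.

    Assumes each return r (in percent) corresponds to a growth factor 1 + r/100.
    """
    factors = [1 + r / 100 for r in percent_returns]
    gm = math.prod(factors) ** (1 / len(factors))
    return (gm - 1) * 100


returns = [0.4, 2.9, 2.1, 12.3]
print(round(geometric_mean_rate(returns), 1))
```

For comparison, the arithmetic mean of the same four percentages is 4.425, slightly above the geometric mean.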
Uses of GM:
i) To find the rate of population growth and the rate of interest
ii) In the construction of index numbers
Harmonic mean of a number of observations is the reciprocal of the AM of the reciprocals of the given
values:
\[ HM = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_n}} \]
Uses of HM:
It is used for calculating average speed of automobiles.
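A short Python sketch of the average-speed use case follows. The speeds (40 and 60 km/h over two equal distances) are made-up illustration values, and `harmonic_mean` is our own helper name.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: the same distance driven at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(harmonic_mean(speeds))           # average speed over the whole trip
print(sum(speeds) / len(speeds))       # the arithmetic mean overstates it
```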
Weighted Mean
The weighted mean is a special case of the arithmetic mean. It is often useful when there are several
observations of the same value.
Weighted mean: The value of each observation is multiplied by the number of times it occurs.
The sum of these products is divided by the total number of observations to give the weighted
mean.
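In general, the weighted mean of a set of numbers, designated \(X_1, X_2, X_3, \ldots, X_n\), with the corresponding
weights \(w_1, w_2, w_3, \ldots, w_n\), is computed by:
Weighted Mean: \[ \bar{X}_w = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \cdots + w_nX_n}{w_1 + w_2 + w_3 + \cdots + w_n} \]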
The weighted mean is particularly useful when various classes or groups contribute differently to the
total. For example, the coronary care unit of a hospital consists of nurses' aides who are paid $12 per
hour, nurses' assistants who earn $15 per hour, and registered nurses who earn $21 per hour.
It would not be accurate to say the average hourly wage for the coronary unit is $16 per hour,
($12 + $15 + $21) / 3, unless there was the same number of people in each group.
Suppose the coronary care unit has ten employees: two aides who earn $12 per hour, three nurses'
assistants who earn $15 per hour, and five registered nurses who earn $21 per hour. The weighted mean
is:
\[ \bar{X}_w = \frac{2(\$12) + 3(\$15) + 5(\$21)}{2 + 3 + 5} = \frac{\$174}{10} = \$17.40 \]
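The coronary care unit example can be verified numerically. A minimal Python sketch, with `weighted_mean` as our own helper name:

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


hourly_wages = [12, 15, 21]   # aides, assistants, registered nurses
staff_counts = [2, 3, 5]      # number of employees in each group

print(weighted_mean(hourly_wages, staff_counts))   # weighted by head count
print(sum(hourly_wages) / len(hourly_wages))       # unweighted figure of 16
```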
Fifty percent of the observations are above the median and 50 percent are below the median. To
determine the median, the values are ordered from low to high, or high to low, and the middle value
selected. Hence, half the observations are above the median and half are below it. For the executive
incomes, the middle value is $44,000, the median.
$40,000 $42,000 $44,000 $48,000 $300,000
median
Obviously, it is a more representative value in this problem than the mean of $94,800.
Note that there were an odd number of executive incomes (5). For an odd number of ungrouped values
we just order them and select the middle value. To determine the median of an even number of
ungrouped values, the first step is to arrange them from low to high as usual, and then determine the
value half way between the two middle values.
As an example, the final grades of the six students in Mathematics 126 were 87, 62, 91, 58, 99, and 85.
Ordering these from low to high:
58 62 85 87 91 99
The median grade is halfway between the two middle values of 85 and 87. The median grade is 86. Thus
we note that the median (86) may not be one of the values in a set of data.
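The odd and even cases described above can be sketched in Python as follows; the function is a plain illustration, not a library routine.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([40_000, 42_000, 44_000, 48_000, 300_000]))   # executive incomes
print(median([87, 62, 91, 58, 99, 85]))                    # Mathematics 126 grades
```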
The formula for finding the median for grouped data is given below:
\[ \text{Median} = L_1 + \frac{\frac{N}{2} - C_f}{f_m} \times i \]
Where
\(L_1\) is the lower limit of the median class
N is the total number of observations
\(C_f\) is the cumulative frequency of the class just preceding the median class
\(f_m\) is the frequency of the median class
i is the width of the median class
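The grouped-data formula can be sketched in Python as below. The representation of each class as a `(lower_limit, width, frequency)` tuple is our own choice; the figures are the daily-savings series from the solved problems, with the lower limits taken as printed there.

```python
def grouped_median(classes):
    """Median for grouped data.

    classes: list of (lower_limit, width, frequency) in ascending order.
    """
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, width, freq in classes:
        if cumulative + freq >= half:
            # This is the median class: apply L1 + ((N/2 - Cf) / fm) * i.
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("empty frequency table")


daily_savings = [(30, 6, 3), (36, 6, 10), (42, 6, 18), (48, 6, 25), (54, 6, 8), (60, 6, 6)]
print(round(grouped_median(daily_savings), 2))
```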
Advantages of Median:
i) Well defined
ii) Readily comprehensible and easy to calculate.
iii) Not affected by extreme values
iv) Can be calculated for a distribution when extreme class is open
Disadvantages of Median:
i) Not based on all observations.
ii) Not suitable for further mathematical treatment
iii) Compared with the AM, it is affected more by sampling fluctuations
Uses of Median:
i) It is the only average to be used while dealing with qualitative data, which cannot be
measured quantitatively but still can be arranged in ascending or descending order of
magnitude. e.g., to find the average intelligence, or average honesty among a group of
people.
ii) It is used to determine typical values in problems concerning wages,
distribution of wealth, etc.
Properties of the Median
The major properties of the median are:
1. The median is a unique value, that is, like the mean, there is only one median for a set of data.
2. It is not influenced by extremely large or small values.
3. It can be computed for ratio level, interval level, and ordinal-level data.
4. Fifty percent of the observations are greater than the median and fifty percent of the
observations are less than the median.
Solved Problems
Problem 1: The monthly income (in Tk.) of 10 persons working in a firm as follows:
1500, 1600, 1800, 1700, 1600, 1200, 1500, 2000, 1500, 1800. Calculate the Arithmetic Mean.
Solution:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16200}{10} = 1620 \]
We know that,
\[ \bar{x} = A + \frac{\sum d}{n} = 50 + \frac{40}{10} = 54 \]
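The relation \(\bar{x} = A + \sum d / n\) is the assumed-mean (shortcut) form, in which d = x − A is the deviation of each value from an assumed mean A. Since the data behind the figures 50 and 40 are not shown here, the Python sketch below applies the same relation to the monthly incomes of Problem 1, with A = 1600 picked arbitrarily for illustration.

```python
def mean_by_assumed_mean(values, assumed):
    """x-bar = A + (sum of deviations from A) / n."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


incomes = [1500, 1600, 1800, 1700, 1600, 1200, 1500, 2000, 1500, 1800]

# The assumed mean 1600 is an arbitrary choice; any other value gives the same mean.
print(mean_by_assumed_mean(incomes, 1600))
print(mean_by_assumed_mean(incomes, 1000))
```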
Problem 3: Calculate the mean from the following data:
Value 1 2 3 4 5 6 7 8 9 10
Frequency 21 30 28 40 26 34 40 9 15 57
Solution:
x f fx
1 21 21
2 30 60
3 28 84
4 40 160
5 26 130
6 34 204
7 40 280
8 9 72
9 15 135
10 57 570
∑f = 300    ∑fx = 1716
We know that
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{1716}{300} = 5.72, \quad \text{where } n = \sum f_i \]
Solution:
Profits per shop Mid-point Number of Shops(f) fm
100-200 150 10 1500
200-300 250 18 4500
300-400 350 20 7000
400-500 450 26 11700
500-600 550 30 16500
600-700 650 28 18200
700-800 750 18 13500
∑f = 150    ∑fm = 72900
We know that
\[ \bar{x} = \frac{\sum fm}{\sum f} = \frac{72900}{150} = 486 \]
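The mid-point calculation in the table above can be reproduced with a short Python sketch. Representing each class as `(lower_limit, upper_limit, frequency)` is our own convention.

```python
def grouped_mean(classes):
    """Mean of grouped data, each class represented by its mid-point.

    classes: list of (lower_limit, upper_limit, frequency).
    """
    total_f = sum(freq for _, _, freq in classes)
    total_fm = sum(freq * (low + high) / 2 for low, high, freq in classes)
    return total_fm / total_f


profits = [
    (100, 200, 10), (200, 300, 18), (300, 400, 20), (400, 500, 26),
    (500, 600, 30), (600, 700, 28), (700, 800, 18),
]
print(grouped_mean(profits))
```

The result is an approximation to the mean of the raw data, since every observation in a class is treated as if it sat at the mid-point.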
Problem 5: Compute the median from the following series:
Daily Savings Number of Workers
30-35 3
36-41 10
42-47 18
48-53 25
54-59 8
60-65 6
Solution:
Daily Savings Number of Workers Cumulative frequency
30-35 3 3
36-41 10 13
42-47 18 31
48-53 25 56
54-59 8 64
60-65 6 70
Total 70
Here n = 70 and n/2 = 35, so the median item lies in the class 48-53, with L = 48, \(f_c\) = 31,
\(f_m\) = 25 and i = 6.
We know that
\[ \bar{m} = L + \frac{\frac{n}{2} - f_c}{f_m} \times i = 48 + \frac{35 - 31}{25} \times 6 = 48.96 \]
31-36 21
37-42 47
43-48 62
49-54 37
55-60 16
61-66 5
Solution: By inspection mode lies in the class 43-48.
Here L = 43, \(\Delta_1\) = 62 – 47 = 15, \(\Delta_2\) = 62 – 37 = 25 and i = 6.
We know that
\[ M_0 = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = 43 + \frac{15}{15 + 25} \times 6 = 45.25 \]
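A possible Python sketch of the grouped-mode calculation is given below. It takes the class with the highest frequency as the modal class and ignores ties; the `(lower_limit, width, frequency)` tuple layout is our own assumption.

```python
def grouped_mode(classes):
    """Mode for grouped data: L + d1 / (d1 + d2) * i.

    classes: list of (lower_limit, width, frequency) in ascending order.
    Ties between classes are not handled in this sketch.
    """
    k = max(range(len(classes)), key=lambda j: classes[j][2])   # modal class index
    lower, width, f_modal = classes[k]
    f_before = classes[k - 1][2] if k > 0 else 0
    f_after = classes[k + 1][2] if k < len(classes) - 1 else 0
    d1 = f_modal - f_before   # excess over the preceding class
    d2 = f_modal - f_after    # excess over the following class
    return lower + d1 / (d1 + d2) * width


table = [(31, 6, 21), (37, 6, 47), (43, 6, 62), (49, 6, 37), (55, 6, 16), (61, 6, 5)]
print(grouped_mode(table))
```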
Ages of Students
20, 20, 21, 22, 60
The range of ages is 40 years, yet four of the five students’ ages are within two years of each other. The
60-year old student has distorted the spread. Another disadvantage is that only two values, the largest
and the smallest, are used in its calculation.
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.
In terms of symbols, the formula for the mean deviation is:
Mean Deviation: \[ MD = \frac{\sum |X - \bar{X}|}{n} \] [3-5]
Where:
X is the value of each observation.
\(\bar{X}\) is the arithmetic mean of the values.
n is the number of observations in the sample.
| | indicates the absolute value.
We disregard the signs of the deviations from the mean because if we didn’t, the positive and negative
deviations from the mean exactly offset each other, and the mean deviation would always be zero. Such a
measure (zero) would be a useless statistic.
The mean deviation is computed by first determining the difference between each observation and the
mean. These differences are then averaged without regard to their signs. For the PM statistics class the
mean deviation is 4.0 years, found from the following table:

X     X − X̄            Absolute Deviation
17    17 − 21 = −4     |−4| = 4
17    17 − 21 = −4     |−4| = 4
18    18 − 21 = −3     |−3| = 3
20    20 − 21 = −1     |−1| = 1
25    25 − 21 = +4     |+4| = 4
29    29 − 21 = +8     |+8| = 8
                       Total = 24
Then
\[ MD = \frac{\sum |X - \bar{X}|}{n} = \frac{24}{6} = 4.0 \]
The parallel lines indicate absolute value. To interpret, 4.0 years is the mean amount by which the
ages differ from the arithmetic mean age of 21.0 years for the PM students.
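The same calculation in Python, as a minimal sketch (`mean_deviation` is our own helper name), using the six ages of the PM statistics class:

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


pm_ages = [17, 17, 18, 20, 25, 29]
print(mean_deviation(pm_ages))

# Without the absolute value the deviations cancel out.
mean_age = sum(pm_ages) / len(pm_ages)
print(sum(x - mean_age for x in pm_ages))
```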
dollars. Because the standard deviation is easier to interpret, it is more widely used than the mean
deviation or the variance.
Population Variance
The formulas for the population variance and the sample variance are slightly different. The formula for
the population variance is:
Population Variance: \[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \] [3 – 6]
Where:
\(\sigma^2\) is the symbol for the population variance.
X is a value of an observation in the population.
\(\mu\) is the arithmetic mean of the population.
N is the total number of observations in the population.
Sample Variance
The conversion of the population variance formula to the sample variance formula is not as direct as the
change made when we went from the population mean formula to the sample mean formula. Recall that
we replaced \(\mu\) with \(\bar{X}\) and N with n.
The conversion from population variance to sample variance requires a change in the denominator.
Instead of substituting n, the number in the sample, for N, the number in the population, we replace N
with (n – 1). Thus the formula for the sample variance is:
Sample Variance: \[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \] [3 – 8]
Where:
\(s^2\) is the symbol for the sample variance. It is pronounced as “s squared.”
X is the value of each observation in the sample.
\(\bar{X}\) is the mean of the sample.
n is the total number of observations in the sample.
Changing the denominator to (n – 1) seems insignificant; however, the use of n tends to underestimate the
population variance. The use of (n – 1) in the denominator provides an appropriate correction factor.
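The difference between the two denominators is easy to see numerically. The sketch below, in Python, applies both formulas to the six PM class ages used earlier (17, 17, 18, 20, 25, 29); treating the same six values once as a population and once as a sample is purely for illustration.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


ages = [17, 17, 18, 20, 25, 29]
print(round(population_variance(ages), 2))   # divisor 6
print(round(sample_variance(ages), 2))       # divisor 5, always the larger figure
print(round(sample_variance(ages) ** 0.5, 2))  # sample standard deviation
```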
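If \(\sigma_i^2\) (i = 1, 2, …, k) are the variances of k component series of sizes \(n_i\) (i = 1, 2, …, k) respectively,
then the variance \(\sigma^2\) of the composite series obtained on combining the component
series is given by the formula:
\[ \sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2) + \cdots + n_k(\sigma_k^2 + d_k^2)}{n_1 + n_2 + \cdots + n_k} \]
where \(d_i = \bar{X}_i - \bar{X}\) is the deviation of the mean of the i-th component series from the mean of the
composite series.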
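As a sanity check on the composite-variance formula, the Python sketch below combines two groups and compares the result with the variance computed directly from the pooled values. It takes \(d_i\) to be the difference between each group mean and the composite mean, and uses population variances (divisor N); the split of six ages into two groups is hypothetical.

```python
def pvariance(values):
    """Population variance (divisor N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def combined_variance(sizes, means, variances):
    """sum(n_i * (var_i + d_i**2)) / sum(n_i), with d_i = mean_i - composite mean."""
    total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total
    return sum(
        n * (v + (m - grand_mean) ** 2)
        for n, m, v in zip(sizes, means, variances)
    ) / total


# Hypothetical split of six ages into two component series.
group_a, group_b = [17, 17, 18], [20, 25, 29]
groups = [group_a, group_b]

sizes = [len(g) for g in groups]
means = [sum(g) / len(g) for g in groups]
variances = [pvariance(g) for g in groups]

print(combined_variance(sizes, means, variances))
print(pvariance(group_a + group_b))   # direct calculation on the pooled data
```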
Relative Dispersion
Suppose we want to compare the variability of two sets of data that are measured in different units such
as one in dollars and the other in years. How can this be done? Relative dispersion is the answer. Below
are the four relative measures of dispersion:
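Coefficient of Range = \(\frac{L - S}{L + S}\), where L and S are the largest and smallest values
Coefficient of Quartile deviation = \(\frac{Q_3 - Q_1}{Q_3 + Q_1}\)
Coefficient of Mean deviation = \(\frac{MD}{\bar{X}}\)
Coefficient of Standard deviation = \(\frac{\sigma}{\bar{X}}\)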
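Because each relative measure is a ratio of two quantities in the same unit, the unit cancels. The Python sketch below illustrates this for two of the four coefficients using the student ages 20, 20, 21, 22, 60 from the range example, expressed first in years and then in months; the function names are our own.

```python
def coefficient_of_range(values):
    """(L - S) / (L + S), with L the largest and S the smallest value."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


def coefficient_of_sd(values):
    """Standard deviation (divisor N) divided by the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sd / mean


ages_years = [20, 20, 21, 22, 60]
ages_months = [12 * a for a in ages_years]   # same data in a different unit

print(coefficient_of_range(ages_years), coefficient_of_range(ages_months))
print(round(coefficient_of_sd(ages_years), 4), round(coefficient_of_sd(ages_months), 4))
```

Both pairs of figures agree, which is why these coefficients can compare data sets measured in different units.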
CHAPTER PROBLEMS
Problem 1
A comparison shopper employed by a large grocery chain recorded these prices for a 340-gram jar of
Kraft blackberry preserves at a sample of six supermarkets selected at random.

Supermarket    Price X
1              $1.31
2              1.35
3              1.26
4              1.42
5              1.31
6              1.33
Total          $7.98

a. Compute the arithmetic mean.
b. Compute the median.
c. Compute the mode.
Solution:
a. Determine the mean price of this raw data by summing the prices for the six jars and dividing the
total by six. Using the formula for the mean of a sample we get.
\[ \bar{X} = \frac{\sum X}{n} = \frac{\$7.98}{6} = \$1.33 \]
b. As noted above, the median is defined as the middle value of a set of data after the data are arranged
from smallest to largest. The prices for the six jars of blackberry preserves have been ordered from a
low of $1.26 up to $1.42. Because this is an even number of prices, the median price is halfway
between the third and the fourth price. The median is $1.32.
Prices Arranged from Low to High:
$1.26 $1.31 $1.31 $1.33 $1.35 $1.42
Suppose there are an odd number of blackberry preserve prices, such as shown in the table.
$1.31 $1.31 $1.33 $1.35 $1.42
The median is the middle value ($1.33). To find the median, the values must first be ordered from low
to high.
c. The mode is the price that occurs most often. The price of $1.31 occurs twice in the original data and
is the mode.
Problem 2
A sample of the amounts spent in November for propane gas to heat homes of similar sizes in Duluth
revealed these amounts (to the nearest dollar):
191 212 176 129 106 92 108 109 103 121 175 194
What is the range? Interpret your results.
Solution:
Recall that the range is the difference between the largest value and the smallest value.
Range = Highest Value – Lowest Value = $212 - $92 = $120
This indicates that there is a difference of $120 between the largest and the smallest heating cost.
Problem 3
Using the heating cost data in Problem 2, compute the mean deviation.
Solution:
The mean deviation is the mean of the absolute deviations from the arithmetic mean. For raw, or
ungrouped data, it is computed by first determining the mean. Next, the difference between each value
and the arithmetic mean is determined. Finally, these differences are totaled and the total divided by the
number of observations.
The table below shows the data values, each data value minus the mean, and the absolute value of
the deviations from the mean. In other words, the signs of the deviations from the mean are disregarded.
Payment    Absolute Deviation
$191       | +48 | = $48
212        | +69 | = 69
176        | +33 | = 33
129        | –14 | = 14
106        | –37 | = 37
92         | –51 | = 51
108        | –35 | = 35
109        | –34 | = 34
103        | –40 | = 40
121        | –22 | = 22
175        | +32 | = 32
194        | +51 | = 51
Total $1,716     $466
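\[ \bar{X} = \frac{\$1{,}716}{12} = \$143.00, \qquad MD = \frac{\sum |X - \bar{X}|}{n} = \frac{\$466}{12} = \$38.83 \]
The mean deviation of $38.83 indicates that the typical heating cost deviates $38.83 from the mean of
$143.00.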
Problem 4
The hourly wages for a sample of plumbers were grouped into the following frequency distribution.
Since the wages have been grouped into classes, we refer to the following distribution as grouped
data.

Hourly Wages      Number f
$8 up to $10      3
$10 up to $12     6
$12 up to $14     12
$14 up to $16     10
$16 up to $18     7
$18 up to $20     2
Total             40

a. Compute the arithmetic mean.
b. Compute the mode.
Solution:
a. The arithmetic mean of this sample data, grouped into a frequency distribution, is computed by the
formula:
\[ \bar{X} = \frac{\sum fX}{n} \]
Where:
\(\bar{X}\) is the designation for the arithmetic mean.
X is the mid-value, or midpoint, of each class.
f is the frequency in each class.
fX is the frequency in each class times the midpoint of the class.
\(\sum fX\) is the sum of these products.
n is the total number of frequencies.
It is assumed that the observations in each class are represented by the midpoint of the class. The
midpoint of the first class is $9.00, found by ($8.00 + $10.00)/2. For the next higher class, the midpoint is
$11.00.
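Using the formula, the arithmetic mean hourly wage is $13.90, found by:

Wage Rate         Frequency f    Class Midpoint X    fX
$8 up to $10      3              $9                  $27
$10 up to $12     6              11                  66
$12 up to $14     12             13                  156
$14 up to $16     10             15                  150
$16 up to $18     7              17                  119
$18 up to $20     2              19                  38
Total             40                                 $556

\[ \bar{X} = \frac{\sum fX}{n} = \frac{\$556}{40} = \$13.90 \]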
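b. The mode is the value that occurs most often. The class with the highest frequency is $12 up to $14,
so the mode of this distribution lies in that class. For data grouped into a frequency distribution, the
mode is
\[ \text{Mode} = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i = 12 + \frac{6}{6 + 2} \times 2 = 13.5 \]
where \(\Delta_1\) = 12 – 6 = 6 and \(\Delta_2\) = 12 – 10 = 2 are the differences between the frequency of the modal class
and the frequencies of the preceding and following classes, and i = 2 is the class width.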
Problem 5
Determine the mean and SD of sales of 100 First Food Restaurants in the Eastern Districts (in ’ 000$)
Sales Number of Restaurants
700 - 800 4 Solution:
2. The mean marks obtained in an examination by a group of 100 students were found to be 49.96. The
mean marks obtained in the same examination by another group of 200 students were 52.32. Find
the mean of marks obtained by both groups of students taken together.[Ans. 51.53]
3. The mean marks obtained by 300 students in the subject statistics is 45. The mean of the top 100 of them
was found to be 70 and the mean of the last 100 was known to be 20. What is the mean of the remaining
100 students?
4. The mean weekly salary paid to all employees in a company is Tk. 500. The mean weekly salaries paid
to male and female employees are Tk. 520 and Tk. 420 respectively. Determine the percentages of males
and females employed by the company.
2. Runs scored by two cricketers in 10 ODI matches are as follows:
Cricketer – A: 90, 27, 08, 80, 13, 105, 06, 60, 45, 00
Cricketer – B: 25, 50, 65, 43, 75, 56, 16, 67, 49, 37
Which cricketer may be considered a more consistent player?
3. The sums and sums of squares corresponding to length X (in cm) and weight Y (in gm) of 50 tapioca tubers are
given below:
\(\sum X = 212\), \(\sum X^2 = 902.8\), \(\sum Y = 261\), and \(\sum Y^2 = 1457.6\)
Which is more variable, the length or the weight?
4. For a group containing 100 observations, the AM and SD are 8 and \(\sqrt{10.5}\) respectively. For 50
observations selected from those 100 observations, the mean and SD are 10 and 2 respectively. Find
the AM and SD of the other 50 observations.
5. In two factories A and B engaged in the same industry, the average weekly wages and SD’s are as
follows:
Factory Ave. weekly wage SD of wage No. of wage earners
A 460 50 100
B 490 40 80
a) Which factory, A or B, pays the larger total amount in weekly wages?
b) Which factory shows greater variability in the distribution of wages?
c) What are the mean and SD of the wages of all workers in the two factories taken together?
6. FundInfo provides information to its subscribers to enable them to evaluate the performance of
mutual funds they are considering as potential investment vehicles. A recent survey of funds whose
stated investment goals were growth and income produced the following data on total annual rate of
return over the past five years:

Annual rate of return    11 - 12    12 - 13    13 - 14    14 - 15    15 - 16    16 - 17    17 - 18    18 - 19
Frequency                2          2          8          10         11         8          3          1

Calculate the mean, variance and SD of the annual rate of return for this sample of 45 funds.