Data Collection and Descriptive Statistics
Lecture Notes
Week 2
Introduction
• Data collection and descriptive statistics are
fundamental in epidemiology, biostatistics,
and public health research.
Methods of Data Collection
Data collection methods:
• Surveys
• Experiments
• Secondary Data
Surveys
A structured method of collecting information
from a sample of individuals using
questionnaires, interviews, or online forms.
Types of Surveys
• Cross-sectional and longitudinal surveys
• Self-administered and interviewer-
administered surveys
Advantages & Disadvantages
Advantages:
• Collect data from a large sample.
• Relatively inexpensive and quick.
• Allows collection of both qualitative and quantitative data.
Disadvantages:
• Responses may be biased (e.g., social desirability bias).
• Low response rates can affect validity.
• Designing effective questions can be challenging.
Experiments
An experiment involves manipulating one or more
variables to study their effects on an outcome.
This method is commonly used in clinical trials
and intervention studies.
Types of Experiments
• Randomized Controlled Trials (RCTs)
• Quasi-experimental studies
Advantages & Disadvantages
Advantages:
• Allows for causal inference.
• Provides high internal validity.
• Can control for confounding variables.
Disadvantages:
• Expensive and time-consuming.
• Ethical concerns may arise (e.g., withholding treatment).
• Results may not be generalizable to real-world settings.
Secondary Data Collection
• Data collected by others, used for new
research.
• Examples: Government records, hospital data,
census.
Advantages & Disadvantages
Advantages:
• Cost-effective and time-saving.
• Allows for large-scale analysis.
• Useful for longitudinal and trend analysis.
Disadvantages:
• May not be specific to the research question.
• Data accuracy and reliability depend on the original source.
• May contain missing or outdated information.
Data Organization & Presentation
Once data is collected, it needs to be organized
and presented in a meaningful way using tables,
graphs, and histograms.
Methods:
• Tables
• Graphs (Bar, Pie, Line charts)
• Histograms
Tables
Tables organize raw data in rows and columns
for easy interpretation.
Example of a Frequency Table
Age Group (Years)    Number of Respondents
0-10                 15
11-20                30
21-30                50
31-40                40
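A frequency table of this kind is built by counting how many raw observations fall into each class interval. The following is a minimal Python sketch, assuming a hypothetical list of ages (not data from these notes) and ten-year intervals with inclusive bounds; the names `ages`, `bins`, and `frequency_table` are illustrative only.

```python
from collections import Counter

# Hypothetical raw ages, for illustration only (not data from the lecture).
ages = [3, 8, 15, 19, 22, 25, 27, 30, 34, 38]

# Class intervals as (lower, upper) bounds, both inclusive (an assumption).
bins = [(0, 10), (11, 20), (21, 30), (31, 40)]

def frequency_table(values, intervals):
    """Count how many values fall inside each inclusive interval."""
    counts = Counter()
    for v in values:
        for lo, hi in intervals:
            if lo <= v <= hi:
                counts[(lo, hi)] += 1
                break  # each value belongs to exactly one interval
    return [(f"{lo}-{hi}", counts[(lo, hi)]) for lo, hi in intervals]

table = frequency_table(ages, bins)
for label, n in table:
    print(f"{label:<8}{n}")
```

The intervals must not overlap and should cover the whole range of the data, otherwise some observations would be counted twice or not at all.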
Graphs
Graphs visually represent data trends and
distributions.
Types of Graphs
1. Bar Chart: Used for categorical data
representation.
2. Pie Chart: Displays proportions of a whole.
3. Line Graph: Shows trends over time.
Histograms
A histogram is a graphical representation of the
distribution of numerical data. It differs from a bar
chart because it represents continuous data
rather than categories.
Example: Age Distribution of Participants in a
Study
• X-axis: Age groups
• Y-axis: Number of participants
• Bars represent frequency of each age group
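The idea that bar length encodes frequency can be sketched without any plotting library. The Python snippet below is an illustrative sketch only: it reuses the counts from the frequency table example in these notes, assumes ten-year age groups as labels, and draws one `#` per five participants (the scale and the function name `text_histogram` are arbitrary choices).

```python
# Age-group counts reused from the frequency table example in these notes;
# the ten-year group labels are an assumption made for illustration.
frequencies = [("0-10", 15), ("11-20", 30), ("21-30", 50), ("31-40", 40)]

def text_histogram(freqs, per_mark=5):
    """Return one text line per group, with one '#' per `per_mark` participants."""
    lines = []
    for label, count in freqs:
        bar = "#" * (count // per_mark)
        lines.append(f"{label:>6} | {bar} ({count})")
    return lines

for line in text_histogram(frequencies):
    print(line)
```

In a real analysis a plotting library would normally draw the bars, but the principle is the same: the age groups sit along one axis and the bar length is proportional to the number of participants.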
Measures of Central Tendency
Measures of central tendency summarize data
by identifying the center or most representative
value.
The three common measures are:
• Mean
• Median
• Mode
Mean (Arithmetic Average)
Formula: Mean = Sum of values / Total number of observations
Mean (x̄) = Σx / n
Example: If the weights of five individuals are 50 kg, 55 kg, 60 kg, 65
kg, and 70 kg, what is the mean value?
Mean = (50 + 55 + 60 + 65 + 70) / 5
= 300 / 5
= 60 kg
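The same calculation can be checked in Python. This is a small sketch using the five weights from the example; the variable names are illustrative, and `statistics.mean` is the standard-library function for the arithmetic mean.

```python
from statistics import mean

# Weights (kg) of the five individuals from the example.
weights_kg = [50, 55, 60, 65, 70]

# Mean = sum of values / number of observations.
manual_mean = sum(weights_kg) / len(weights_kg)

# The standard library gives the same result.
library_mean = mean(weights_kg)

print(manual_mean, library_mean)
```

Note that the whole sum is divided by the number of observations, which is why the sum is computed first.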
Median (Middle Value)
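• Odd number of observations: the middle value
of the sorted data
• Even number of observations: the average of the
two middle values of the sorted data
• Example: For the values 50, 55, 60, 65, 70, 75
(six observations), the two middle values are 60 and 65, so
Median = (60 + 65) / 2 = 62.5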
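The odd and even cases can be written out explicitly. The sketch below is one possible Python implementation (the function name `middle_value` is illustrative); it sorts the data first, then picks the middle value or averages the two middle values, and compares the result with the standard-library `statistics.median`.

```python
from statistics import median

def middle_value(values):
    """Median: middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = [50, 55, 60, 65, 70, 75]  # six observations
odd_case = [50, 55, 60, 65, 70]       # five observations

print(middle_value(even_case), median(even_case))
print(middle_value(odd_case), median(odd_case))
```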
Mode (Most Frequent Value)
• The value appearing most often in a dataset.
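A dataset can have one mode, several modes, or (when every value is distinct) as many modes as values. The sketch below uses hypothetical values, not data from these notes, and the standard-library function `statistics.multimode` (available in Python 3.8 and later), which returns every most-frequent value.

```python
from statistics import multimode

# Hypothetical values, for illustration only.
values = [2, 3, 3, 5, 7, 3, 5]

# 3 appears three times, more often than any other value.
modes = multimode(values)

# When two values tie for most frequent, both are returned.
tied_modes = multimode([1, 1, 2, 2])

print(modes, tied_modes)
```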
Measures of Dispersion
Measures of dispersion describe the spread of data:
• Range
• Variance
• Standard Deviation
Range
• Formula: Range = Max - Min
• Example: Data (12, 17, 24, 30, 35),
Range = 35 - 12 = 23
Variance
• Measures how far values are from the mean, as the
average squared deviation.
• Formula (sample variance): Variance = Σ(X - Mean)² / (N - 1),
where N is the number of observations.
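The formula can be followed step by step in code. This is a minimal Python sketch that reuses the data from the range example (12, 17, 24, 30, 35); the function name `sample_variance` is illustrative, and the result is compared with the standard-library `statistics.variance`, which also divides by N - 1.

```python
from statistics import variance

# Data reused from the range example in these notes.
data = [12, 17, 24, 30, 35]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (N - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

print(round(sample_variance(data), 2))  # mean is 23.6, variance is about 87.3
print(round(variance(data), 2))
```

Here the mean is 118 / 5 = 23.6, the squared deviations sum to 349.2, and 349.2 / 4 = 87.3.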
Standard Deviation
• The square root of the variance; it expresses
dispersion in the same units as the data.
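As a brief illustration, the standard deviation can be obtained by taking the square root of the sample variance. The sketch below reuses the data from the range example (12, 17, 24, 30, 35) and the standard-library functions `statistics.variance` and `statistics.stdev`; the variable names are illustrative.

```python
import math
from statistics import stdev, variance

# Data reused from the range example in these notes.
data = [12, 17, 24, 30, 35]

# Standard deviation = square root of the (sample) variance.
sd_from_variance = math.sqrt(variance(data))

# The standard library computes the same quantity directly.
sd = stdev(data)

print(round(sd, 2))
```

Because the square root undoes the squaring in the variance, the result is on the same scale as the original observations, which makes it easier to interpret than the variance itself.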
Conclusion
• Understanding data collection and statistics is
crucial for research analysis and
interpretation.
Assignment
1. Methods of Data Collection
a) Explain the difference between primary and secondary data collection.
b) Describe three advantages and three disadvantages of using surveys for data
collection.
2. Data Organization and Presentation
a) Create a frequency table for the following dataset of student ages: 18, 19, 18, 20, 21,
19, 22, 23, 20, 21.
b) Differentiate between bar charts and histograms with examples.
3. Measures of Central Tendency
a) Calculate the mean, median, and mode for the dataset: 5, 10, 15, 20, 25, 10, 15, 20.
4. Measures of Dispersion
a) Given the data 4, 6, 8, 10, 12, calculate the range.
b) Explain the significance of standard deviation in data analysis.