Second Lecture
Experiments
Dr. Suzan Abdel Rahman (د. سوزان عبد الرحمن)
Lecturer, Faculty of Graduate Studies for Statistical Research
Design of study → Data collection → Data analysis → Interpretation and presentation
Population: the entire group for which information is wanted
Sample: a subset of the larger group (population) for which information is collected to learn about the larger group
Categorical data: an extension of binary data to include more than 2 possible values
• Nominal categorical data: the categories have no order [country of birth, marital status]
• Ordinal categorical data: the categories have a natural order [income level classified into four levels; degree of agreement in five categories from strongly disagree to strongly agree]
Describing continuous data
▪ Measures of the center of data: mean, median
▪ Measures of data variability: variance, standard deviation
▪ Other measures of location: percentiles
Measures of the center of data
▪ Systolic Blood pressure of 20 patients
x1 = 120, x2 = 80, x3 = 95, x4 = 115, x5 = 89, x6 = 160, x7 = 140, x8 = 120, x9 = 115, x10 = 120
x11 = 110, x12 = 115, x13 = 180, x14 = 190, x15 = 110, x16 = 105, x17 = 95, x18 = 80, x19 = 120, x20 = 115
▪ Disadvantages of the sample mean
▪ Sensitive to extreme values (especially in smaller samples): changing the value of a single data point can change the sample mean noticeably. For example, replace x18 = 80 with 220:
▪ x̄ = 125.7 mmHg instead of 118.7 mmHg (about 119 mmHg)
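The effect of one extreme value on the mean can be checked with a short Python sketch. It uses the 20 systolic blood pressure values listed above; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean

# Systolic blood pressure (mmHg) of the 20 patients, x1..x20, as listed in the lecture.
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

original_mean = mean(sbp)

# Replace x18 = 80 with the extreme value 220 (list index 17 holds x18).
sbp_extreme = sbp.copy()
sbp_extreme[17] = 220
extreme_mean = mean(sbp_extreme)

print(f"mean with x18 = 80:  {original_mean:.1f} mmHg")
print(f"mean with x18 = 220: {extreme_mean:.1f} mmHg")
```

Changing a single observation out of 20 moves the mean by 7 mmHg, which is the sensitivity described above.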
Median
▪ The median is the middle value in an ordered set of continuous data
▪ The median is also called the 50th percentile
▪ The median is not sensitive to the influence of extreme sample values
▪ The median value of five patients:
x1 = 120, x2 = 80, x3 = 95, x4 = 115, x5 = 89
Ordered: 80 89 95 115 120, so the median is 95
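A minimal Python sketch of the five-patient example follows. The replacement of 120 by 220 is a hypothetical change, added only to illustrate that the median does not react to an extreme value while the mean does.

```python
from statistics import mean, median

# Five systolic blood pressures (mmHg) from the lecture example.
five = [120, 80, 95, 115, 89]
print(sorted(five))             # the ordered sample
print(median(five))             # middle value of the ordered sample

# Hypothetical change (not in the lecture): the largest value 120 becomes 220.
five_extreme = [220, 80, 95, 115, 89]
print(median(five_extreme))     # the middle value is unchanged
print(mean(five), mean(five_extreme))  # the mean shifts upward
```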
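▪ The sample variance is the sum of squared deviations from the sample mean, divided by n − 1
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$
▪ The sample standard deviation is the square root of the sample variance
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$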
► s is the best estimate of the population standard deviation (σ)
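▪ Systolic blood pressures (mmHg), n = 5: 120 mmHg, 80 mmHg, 90 mmHg, 110 mmHg, 95 mmHg. The mean x̄ is 99 mmHg.
► The sample variance computation, numerator:
(120 − 99)² + (80 − 99)² + (90 − 99)² + (110 − 99)² + (95 − 99)² = 441 + 361 + 81 + 121 + 16 = 1020
► The sample variance: s² = 1020 / (5 − 1) = 255 mmHg², so s = √255 ≈ 15.97 mmHg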
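The variance computation for the five systolic blood pressures can be reproduced step by step in Python. This is a sketch with illustrative variable names; the cross-check uses `variance` and `stdev` from the standard `statistics` module, which also divide by n − 1.

```python
from math import sqrt
from statistics import stdev, variance

# Systolic blood pressures (mmHg), n = 5, from the lecture example.
sbp5 = [120, 80, 90, 110, 95]

n = len(sbp5)
x_bar = sum(sbp5) / n                             # sample mean
numerator = sum((x - x_bar) ** 2 for x in sbp5)   # sum of squared deviations
s2 = numerator / (n - 1)                          # sample variance (mmHg squared)
s = sqrt(s2)                                      # sample standard deviation (mmHg)

print(f"mean = {x_bar}, numerator = {numerator}, s^2 = {s2}, s = {s:.2f}")

# Cross-check against the standard library, which uses the same n - 1 divisor.
print(variance(sbp5), round(stdev(sbp5), 2))
```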
► The more variability there is in the sample of data, the larger the value of s
► s measures the variability (spread) of the individual sample values around the
sample mean
► s can equal 0 only if there is no variability (if all n sample observations have
the same value)
► The units of s are the same as the units of the data measurements in the
sample (for example, mmHg)
► Often abbreviated SD or sd
► s² is the best estimate from the sample of the population variance σ²; s is the best estimate of the population standard deviation σ
female (142) Female (142) Male (142) Male (162)
…………..
Estimate: …………
……………
Percentiles
Sample percentiles are used to describe the distribution of continuous data.
The pth sample percentile is the value such that p percent of the sample values are less than or equal to it.
Percentiles can be computed by hand or via computer.
Systolic blood pressure (SBP) measurements from a random sample of 113 adult men
taken from a clinical population (based on results from a computer)
► The 10th percentile for these 113 blood pressure measurements is 107 mmHg,
meaning that approximately 10% of the men in the sample have SBP ≤ 107 mmHg, and
(100−10) = 90% of the men have SBP > 107 mmHg
► The 75th percentile for these 113 blood pressure measurements is 132 mmHg,
meaning that approximately 75% of the men in the sample have SBP ≤ 132 mmHg, and
(100−75) = 25% of the men have SBP > 132 mmHg
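The 113 measurements behind these figures are not reproduced in the lecture, so the sketch below applies the percentile definition to the 20 systolic blood pressures listed earlier instead. The helper `sample_percentile` is a hypothetical name, and the nearest-rank rule it uses is only one of several conventions; statistical software often interpolates and may report slightly different values.

```python
from math import ceil

def sample_percentile(values, p):
    """Nearest-rank rule: smallest sample value with at least p percent of values <= it."""
    ordered = sorted(values)
    rank = max(1, ceil(p * len(ordered) / 100))  # 1-based position in the ordered sample
    return ordered[rank - 1]

# Systolic blood pressure (mmHg) of the 20 patients from the earlier example.
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

for p in (10, 50, 75):
    value = sample_percentile(sbp, p)
    share = 100 * sum(x <= value for x in sbp) / len(sbp)
    print(f"{p}th percentile = {value} mmHg ({share:.0f}% of values are <= {value})")
```

Because several patients share the same reading, the share of values at or below a percentile can exceed p; this is why the interpretation above says "approximately".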
Continuous Data: Visual Displays
Utilize histograms and boxplots to visualize the distributions of samples of
continuous data
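As a rough illustration of what a histogram does, the sketch below bins the 20 systolic blood pressures from the earlier example and prints one row of marks per bin. The bin width of 20 mmHg is an assumption chosen for readability; a plotting library would normally be used instead.

```python
from collections import Counter

# Systolic blood pressure (mmHg) of the 20 patients from the earlier example.
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

BIN_WIDTH = 20  # assumed bin width in mmHg

# Map each value to the lower edge of its bin and count values per bin.
counts = Counter((x // BIN_WIDTH) * BIN_WIDTH for x in sbp)

# Print every bin between the smallest and largest, including empty ones.
for lower in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    upper = lower + BIN_WIDTH - 1
    print(f"{lower:3d}-{upper:3d} | {'#' * counts[lower]}")
```

Most values fall between 80 and 139 mmHg, with a few bins of higher readings trailing off to the right.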
Q1: lower quartile (25% of data points are below Q1 when arranged in increasing order)
Q2: median (divides the data into two equal parts)
Q3: upper quartile (75% of data points are below Q3 when arranged in increasing order)
Interquartile range (IQR) = Q3 − Q1
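The quartiles and the IQR can be computed with Python's standard `statistics.quantiles` function (available from Python 3.8). The sketch uses the 20 systolic blood pressures from the earlier example. The `method="inclusive"` setting is an assumption; other quartile conventions exist and can give slightly different Q1 and Q3 values.

```python
from statistics import median, quantiles

# Systolic blood pressure (mmHg) of the 20 patients from the earlier example.
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

# n=4 splits the ordered data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(sbp, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"min = {min(sbp)}, max = {max(sbp)}")  # the remaining boxplot ingredients
```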
[Figure: example distributions, left (negatively) skewed and right (positively) skewed]
▪ Histograms and boxplots are useful visual tools for characterizing the shape of a data distribution above and beyond the information given by summary statistics.
▪ Relatively common shapes for samples of continuous data include symmetric and "bell" shaped, right skewed, left skewed, and uniform
▪ Suggest graphical approaches to comparing distributions of continuous data between two or
more samples
► Explain why a difference in sample means can be used to quantify, in a single number summary,
differences in distributions of continuous data.
Such comparisons can be used to investigate questions, such as:
► How does weight change differ between those who are on a low-fat diet compared to those on
a low-carbohydrate diet?
► How do salaries differ between males and females?
► How do cholesterol levels differ across weight groups?
Common numerical comparison: difference in means
On average, female children weigh less than male children by 0.7 kg, which is the same as stating, "on average, male children weigh more than female children by 0.7 kg."
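A difference in means is a single subtraction once each group's mean is known. The children's weights behind the 0.7 kg figure are not given in the lecture, so the values below are hypothetical and only show how the sign of the difference depends on the order of subtraction.

```python
from statistics import fmean

# Hypothetical weights in kg, for illustration only (not the lecture's data).
female_weights = [10.0, 11.0, 12.0]
male_weights = [11.0, 12.0, 13.0]

mean_female = fmean(female_weights)
mean_male = fmean(male_weights)

# The two orderings describe the same gap with opposite signs.
diff_female_minus_male = mean_female - mean_male
diff_male_minus_female = mean_male - mean_female

print(f"female - male = {diff_female_minus_male:+.1f} kg")
print(f"male - female = {diff_male_minus_female:+.1f} kg")
```

A negative value for female minus male and a positive value for male minus female state the same result, which is why both phrasings in the text above are equivalent.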
Normal distribution