Second Lecture

Uploaded by dinaelkordy
Copyright © All Rights Reserved
Course: Design and Analysis of Experiments
"The Second Lecture"
Dr. Suzan Abdel Rahman
Lecturer, Faculty of Graduate Studies for Statistical Research
Figure: stages of a study: design of study, data collection, data analysis, interpretation and presentation
Population: the entire group for which information is wanted.
Sample: a subset of the larger group (population) for which information is collected to learn about the larger group.

Continuous data:
• Blood pressure
• Weight
• Height
• Age
• Income

Binary (dichotomous) data: takes on only two values, “yes” or “no”
• Having COVID-19: Yes/No
• Sex: Male/Female
• Smoking: Yes/No

Categorical data: an extension of binary data to include more than two possible values
• Nominal categorical data: the categories have no order [country of birth, marital status]
• Ordinal categorical data: the categories have a natural order [income level classified into four levels; degree of agreement, five categories from strongly disagree to strongly agree]
Describing continuous data
• Measures of the center of data: mean, median
• Measures of data variability: variance, standard deviation
• Other measures of location: percentiles
Measures of the center of data
▪ Systolic blood pressure (mmHg) of 20 patients:

$x_1 = 120,\ x_2 = 80,\ x_3 = 95,\ x_4 = 115,\ x_5 = 89,\ x_6 = 160,\ x_7 = 140,\ x_8 = 120,\ x_9 = 115,\ x_{10} = 120$

$x_{11} = 110,\ x_{12} = 115,\ x_{13} = 180,\ x_{14} = 190,\ x_{15} = 110,\ x_{16} = 105,\ x_{17} = 95,\ x_{18} = 80,\ x_{19} = 120,\ x_{20} = 115$

The sample mean:

$$\bar{x} = \frac{120 + 80 + 95 + 115 + 89 + \cdots + 115}{20} = \frac{2374}{20} = 118.7 \approx 119\ \text{mmHg}$$
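As a quick check of the arithmetic above, here is a minimal Python sketch that adds up the 20 readings and divides by n = 20. The list name `sbp` and the helper `sample_mean` are illustrative choices, not part of the lecture; `statistics.mean` from Python's standard library is used only as a cross-check.

```python
from statistics import mean

# Systolic blood pressure (mmHg) of the 20 patients, x1 ... x20
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


total = sum(sbp)
x_bar = sample_mean(sbp)

print(total, x_bar, round(x_bar))  # 2374 118.7 119
print(mean(sbp))                   # 118.7, the same value from the standard library
```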
Sample average (arithmetic mean)

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

▪ where $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + \cdots + x_n$

▪ The sample mean ($\bar{x}$) is different from the population mean ($\mu$).
▪ $\bar{x}$ is the best estimate of the population mean ($\mu$).

▪ Disadvantages
▪ Sensitive to extreme values (in smaller samples): changing the value of a single data point can change the sample mean. For example, replacing $x_{18} = 80$ with 220 gives
▪ $\bar{x} = 2514/20 = 125.7$ mmHg instead of 119 mmHg.
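The effect of one extreme value can be reproduced with the sketch below, which assumes the same 20 readings and uses Python's `statistics.mean`. Note the index shift: the slide's $x_{18}$ sits at position 17 in a 0-based Python list.

```python
from statistics import mean

# Systolic blood pressure (mmHg) of the 20 patients, x1 ... x20
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

# Replace x18 = 80 with the extreme value 220 (x18 is index 17, 0-based)
modified = list(sbp)
modified[17] = 220

print(sum(modified))                                  # 2514
print(round(mean(sbp), 1), round(mean(modified), 1))  # 118.7 125.7
```

Changing one of the 20 values moves the mean by 7 mmHg, which is the sensitivity the slide describes.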
Median
▪ The median is the middle value in an ordered set of continuous data.
▪ The median is also called the 50th percentile.
▪ The median value of five patients: $x_1 = 120,\ x_2 = 80,\ x_3 = 95,\ x_4 = 115,\ x_5 = 89$
Order: 80, 89, 95, 115, 120, so the median is 95 mmHg.
▪ If we replace $x_1$ with 220 ($x_1 = 220,\ x_2 = 80,\ x_3 = 95,\ x_4 = 115,\ x_5 = 89$):
Order: 80, 89, 95, 115, 220, and the median is still 95 mmHg.
▪ The median is not sensitive to the influence of extreme sample values.
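A minimal sketch of the median rule in Python follows. The slide only shows an odd number of observations; for an even count the code uses the usual convention of averaging the two middle values (an assumption here, and also what `statistics.median` does). The helper name `sample_median` is hypothetical.

```python
from statistics import median


def sample_median(values):
    """Middle value of the ordered data; for an even count, the average
    of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


five = [120, 80, 95, 115, 89]          # x1 ... x5
five_extreme = [220, 80, 95, 115, 89]  # x1 replaced by 220

print(sorted(five), sample_median(five))                  # [80, 89, 95, 115, 120] 95
print(sorted(five_extreme), sample_median(five_extreme))  # [80, 89, 95, 115, 220] 95
print(median(five), median(five_extreme))                 # 95 95
```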
The sample variance
▪ The sample variance is denoted $s^2$ and the sample standard deviation $s$ (or SD).
▪ The sample variance is the average of the squared deviations about the sample mean (dividing by $n-1$ rather than $n$):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$

▪ The sample standard deviation is the square root of the sample variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

► $s$ is the best estimate of the population standard deviation ($\sigma$).
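▪ Systolic blood pressures (mmHg), n = 5: 120, 80, 90, 110, 95. The mean $\bar{x}$ is 99 mmHg.
▪ The sample variance computation, numerator:

$$\sum_{i=1}^{5} (x_i - \bar{x})^2 = (120-99)^2 + (80-99)^2 + (90-99)^2 + (110-99)^2 + (95-99)^2 = 441 + 361 + 81 + 121 + 16 = 1020$$

so $s^2 = 1020/4 = 255$ mmHg² and $s = \sqrt{255} \approx 16.0$ mmHg.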
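The same computation can be sketched in Python, following the formulas for $s^2$ and $s$ with the $n-1$ denominator. `statistics.variance` and `statistics.stdev` also divide by $n-1$, so they serve as a cross-check; the variable names are illustrative.

```python
import math
from statistics import mean, stdev, variance

sbp = [120, 80, 90, 110, 95]  # systolic blood pressure (mmHg), n = 5

x_bar = mean(sbp)                               # sample mean
numerator = sum((x - x_bar) ** 2 for x in sbp)  # sum of squared deviations
s2 = numerator / (len(sbp) - 1)                 # sample variance (mmHg squared)
s = math.sqrt(s2)                               # sample standard deviation (mmHg)

print(x_bar, numerator, s2, round(s, 2))           # 99 1020 255.0 15.97
print(float(variance(sbp)), round(stdev(sbp), 2))  # 255.0 15.97
```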
The sample variance
► The more variability there is in the sample of data, the larger the value of $s$.
► $s$ measures the variability (spread) of the individual sample values around the sample mean.
► $s$ can equal 0 only if there is no variability (if all n sample observations have the same value).
► The units of $s$ are the same as the units of the data measurements in the sample (for example, mmHg).
► Often abbreviated SD or sd.
► $s^2$ is the best estimate from the sample of the population variance $\sigma^2$; $s$ is the best estimate of the population standard deviation $\sigma$.
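To illustrate the first and third properties, the sketch below computes $s$ for three small made-up datasets (hypothetical values, not data from the lecture) that share the same center but differ in spread.

```python
from statistics import stdev

# Hypothetical readings (mmHg) chosen for illustration; all centered at 120
datasets = {
    "no spread": [120, 120, 120, 120, 120],
    "small spread": [110, 115, 120, 125, 130],
    "large spread": [90, 105, 120, 135, 150],
}

for label, values in datasets.items():
    print(f"{label:<12}  s = {stdev(values):.2f} mmHg")
# no spread     s = 0.00 mmHg
# small spread  s = 7.91 mmHg
# large spread  s = 23.72 mmHg
```

Identical values give $s = 0$, and spreading the same center further apart gives a larger $s$, reported in the data's own units.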
Female   Female   Male   Male
142      142      142    162
120      123      120    183
115      107      115    187
140      129      140    179
155      114      155    154
135      105      135    195
140      128      140    178
150      108      150    168

…………..

Estimate: …………

……………
Percentiles
Sample percentiles are used to describe the distribution of continuous data.
The $p$th sample percentile is the value in a sample of data such that $p$ percent of the sample values are less than or equal to this value.
Percentiles can be computed by hand or by computer.
Systolic blood pressure (SBP) measurements from a random sample of 113 adult men taken from a clinical population (based on results from a computer):
► The 10th percentile for these 113 blood pressure measurements is 107 mmHg, meaning that approximately 10% of the men in the sample have SBP ≤ 107 mmHg, and (100 − 10) = 90% of the men have SBP > 107 mmHg.
► The 75th percentile for these 113 blood pressure measurements is 132 mmHg, meaning that approximately 75% of the men in the sample have SBP ≤ 132 mmHg, and (100 − 75) = 25% of the men have SBP > 132 mmHg.
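The 113 SBP readings are not reproduced in the slides, so this sketch applies the percentile definition to the 20 patients from the earlier slide instead. It uses the nearest-rank rule (the smallest value with at least $p$ percent of the observations at or below it), which is one of several conventions; statistical software often interpolates between values and can give slightly different answers. The function name `nearest_rank_percentile` is hypothetical.

```python
import math

# Systolic blood pressure (mmHg) of the 20 patients from the earlier slide
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]


def nearest_rank_percentile(values, p):
    """Smallest sample value such that at least p percent of the
    observations are less than or equal to it (nearest-rank rule)."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the ordered data
    return ordered[rank - 1]


for p in (10, 50, 75, 90):
    value = nearest_rank_percentile(sbp, p)
    share = sum(x <= value for x in sbp) / len(sbp)
    print(f"P{p}: {value} mmHg ({share:.0%} of readings <= {value})")
# P10 = 80, P50 = 115, P75 = 120, P90 = 160
```

Because of tied values, the share at or below P50 (60%) and P75 (80%) exceeds $p$, which is one reason the slide says "approximately".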
Continuous Data: Visual Displays
► Use histograms and boxplots to visualize the distributions of samples of continuous data
► Identify key summary statistics on the boxplot
► Name and describe basic characteristics of some common distribution shapes for continuous data
► Means, standard deviations, and percentile values do not tell the whole story of data distributions: they do not show differences in the shape of the distribution

Histograms display the distribution of a set of data by charting the number (or percentage) of observations whose values fall within predefined numerical ranges.
Data on systolic blood pressure (SBP) from a random clinical sample of 113 men
► A histogram can be created by:
► Breaking the range of the data (blood pressure) into bins of equal width
► Counting the number of the 113 observations whose blood pressure values fall within each bin
► Plotting the number (or relative frequency) of observations that fall within each bin as a bar graph
Figure: histogram with the percentage of observations on the vertical axis and a larger bin width
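The three steps above can be sketched in Python without a plotting library by printing a text bar for each bin. Because the 113 readings are not available, the sketch uses the 20 patients from the earlier slide, with bins of width 20 mmHg starting at 80 mmHg; both the bin width and the helper `histogram_counts` are illustrative assumptions.

```python
# Systolic blood pressure (mmHg) of the 20 patients from the earlier slide
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]


def histogram_counts(values, start, width, n_bins):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width).
    Values outside the covered range are ignored."""
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


start, width, n_bins = 80, 20, 6  # bins cover 80 to 199 mmHg
counts = histogram_counts(sbp, start, width, n_bins)  # counts == [5, 7, 4, 1, 1, 2]

for k, c in enumerate(counts):
    low = start + k * width
    pct = 100 * c / len(sbp)  # relative frequency as a percentage
    # Labels like "80-99" assume integer readings
    print(f"{low:>3}-{low + width - 1:<3} {'#' * c:<8} {c:>2} ({pct:.0f}%)")
```

Changing `width` shows how the apparent shape of a histogram depends on the bin choice.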
Boxplots are graphics that display key characteristics of a dataset; they are especially useful for comparing data from multiple samples visually.

Q1: lower quartile (25% of the data points are below Q1 when arranged in increasing order)
Q2: median (divides the data into two equal parts)
Q3: upper quartile (75% of the data points are below Q3 when arranged in increasing order)
Interquartile range (IQR) = Q3 − Q1
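A short sketch of the quartiles and IQR for the 20 patients from the earlier slide, using `statistics.quantiles` from Python's standard library. Quartile conventions differ between textbooks and software; `method="inclusive"` is one choice (an assumption here), and another method may give slightly different Q1 and Q3. Whisker and outlier rules for drawing the boxplot itself also vary and are not shown.

```python
from statistics import quantiles

# Systolic blood pressure (mmHg) of the 20 patients from the earlier slide
sbp = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120,
       110, 115, 180, 190, 110, 105, 95, 80, 120, 115]

# n=4 returns the three cut points that split the ordered data into quarters
q1, q2, q3 = quantiles(sbp, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 102.5 115.0 120.0 17.5
```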
Figure: examples of left (negatively) skewed and right (positively) skewed distributions
▪ Histograms and boxplots are useful visual tools for characterizing the shape of a data distribution beyond the information given by summary statistics.
▪ Relatively common shapes for samples of continuous data include symmetric and “bell” shaped, right skewed, left skewed, and uniform.
▪ Suggest graphical approaches to comparing distributions of continuous data between two or more samples.
► Explain why a difference in sample means can be used to quantify, in a single-number summary, differences in distributions of continuous data.
Such comparisons can be used to investigate questions, such as:
► How does weight change differ between those who are on a low-fat diet compared to those on
a low-carbohydrate diet?
► How do salaries differ between males and females?
► How do cholesterol levels differ across weight groups?
Common numerical comparison: difference in means

► On average, male children weigh more than female children by 0.7 kg.

On average, female children weigh less than male children by 0.7 kg, which is the same as stating, “on average, male children weigh more than female children by 0.7 kg.”
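The difference-in-means calculation, and the fact that reversing the order only flips the sign, can be sketched in Python. No children's weight data are given on the slide, so the sketch splits the 20 SBP readings into an arbitrary first half and second half purely to show the arithmetic; the groups and the helper `difference_in_means` are hypothetical.

```python
from statistics import mean

# Arbitrary split of the 20 SBP readings (mmHg), for illustration only
group_a = [120, 80, 95, 115, 89, 160, 140, 120, 115, 120]   # x1 ... x10
group_b = [110, 115, 180, 190, 110, 105, 95, 80, 120, 115]  # x11 ... x20


def difference_in_means(first, second):
    """Mean of `first` minus mean of `second`."""
    return mean(first) - mean(second)


d_ab = difference_in_means(group_a, group_b)
d_ba = difference_in_means(group_b, group_a)

print(f"{mean(group_a):.1f} {mean(group_b):.1f}")  # 115.4 122.0
print(f"{d_ab:.1f} {d_ba:.1f}")                    # -6.6 6.6
```

Read as sentences: group B's mean is higher by 6.6 mmHg, which is the same statement as group A's mean being lower by 6.6 mmHg.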
Normal distribution

▪ The normal distribution is a theoretical probability distribution that is perfectly symmetric about its mean (and median and mode).
▪ It has a “bell”-like shape.
▪ Normal distributions are uniquely defined by two quantities: a mean (µ) and a standard deviation (σ).
▪ All normal distributions, regardless of their mean and standard deviation values, have the same structural properties:
► Mean = median (= mode)
► Values are symmetrically distributed around the mean
► Values “closer” to the mean are more frequent than values “farther” from the mean
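The structural properties listed above can be checked numerically with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The parameters µ = 120 and σ = 15 are arbitrary illustrative values, not estimates from the lecture data.

```python
import math
from statistics import NormalDist

# Illustrative parameters only (not estimated from the lecture data)
mu, sigma = 120, 15
dist = NormalDist(mu=mu, sigma=sigma)

# Mean = median: the 50th percentile (inverse CDF at 0.5) equals the mean
print(dist.mean, dist.inv_cdf(0.5))  # 120.0 120.0

# Symmetry: equal density at equal distances below and above the mean
for d in (5, 15, 30):
    print(d, math.isclose(dist.pdf(mu - d), dist.pdf(mu + d)))  # True for each d

# Values closer to the mean are more frequent: the density decreases with distance
print(dist.pdf(mu) > dist.pdf(mu + sigma) > dist.pdf(mu + 2 * sigma))  # True
```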
