Statistical Methods I
Tamekia L. Jones, Ph.D. (Tjones@cog - Ufl.edu)
Research Assistant Professor
Children’s Oncology Group Statistics & Data Center
Department of Biostatistics
Colleges of Medicine and Public Health & Health Professions

Outline of Topics:
I. Descriptive Statistics
II. Hypothesis Testing
III. Parametric Statistical Tests
IV. Nonparametric Statistical Tests
V. Correlation and Regression

Types of Data
• Nominal Data
  – Gender: Male, Female
• Ordinal Data
  – Strongly disagree, Disagree, Slightly disagree, Neutral, Slightly agree, Agree, Strongly agree
• Interval Data
  – Numeric data: Birth weight

Descriptive Statistics
• Descriptive statistical measurements are used in medical literature to summarize data or describe the attributes of a set of data
• Nominal data – summarize using rates/proportions.
  – e.g. % males, % females on a clinical study
  – Can also be used for Ordinal data
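
As a minimal sketch of summarizing nominal data with proportions, the snippet below counts each category and converts the counts to percentages. The participant list and the helper name `proportions` are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical nominal data: gender of 8 participants in a clinical study.
genders = ["Male", "Female", "Female", "Male", "Female", "Female", "Male", "Female"]

def proportions(values):
    """Return the percentage of observations in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100.0 * n / total for category, n in counts.items()}

percentages = proportions(genders)
for category, pct in sorted(percentages.items()):
    print(f"{category}: {pct:.1f}%")
```

The same function works for ordinal data such as agreement scales, since it only counts category labels.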

Descriptive Statistics (contd)
• Two parameters used most frequently in clinical medicine
  – Measures of Central Tendency
  – Measures of Dispersion

Measures of Central Tendency
• Summary Statistics that describe the location of the center of a distribution of numerical or ordinal measurements, where
  – A distribution consists of values of a characteristic and the frequency of their occurrence
Measures of Central Tendency (contd)
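
A short sketch of how the center of a distribution might be computed, assuming the usual three measures (mean, median, and mode). The birth weights are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical numerical data: birth weights in kilograms.
birth_weights = [2.9, 3.1, 3.4, 3.4, 3.6, 3.8, 4.3]

mean_weight = statistics.mean(birth_weights)      # arithmetic average
median_weight = statistics.median(birth_weights)  # middle value of the sorted data
mode_weight = statistics.mode(birth_weights)      # most frequent value

print(mean_weight, median_weight, mode_weight)
```

The median and mode also make sense for ordinal data, whereas the mean requires numerical data.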

Measures of Dispersion
• Measures that describe the spread or variation in the observations
• Common measures of dispersion
  • Range
  • Standard Deviation
  • Coefficient of Variation
  • Percentiles
  • Inter-quartile Range

Measures of Dispersion (contd)
Range = difference between the largest and the smallest observation
• Used with numerical data to emphasize extreme values
• Serum cholesterol example
  Minimum = 3.8, Maximum = 7.1
  Range = 7.1 – 3.8 = 3.3
  – Standard deviation, like the mean, requires numerical data
  – Variance = s²
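
The measures of dispersion listed above can be sketched with Python's standard library. Only the minimum (3.8) and maximum (7.1) come from the serum cholesterol example; the remaining values are hypothetical. The quartiles use the default method of `statistics.quantiles`, which is one of several common conventions.

```python
import statistics

# Hypothetical serum cholesterol values; only the minimum (3.8) and
# maximum (7.1) are taken from the slide.
cholesterol = [3.8, 4.4, 4.9, 5.2, 5.6, 6.0, 6.3, 7.1]

data_range = max(cholesterol) - min(cholesterol)      # largest minus smallest
variance = statistics.variance(cholesterol)           # sample variance s^2 (n - 1 denominator)
std_dev = statistics.stdev(cholesterol)               # sample standard deviation s
mean = statistics.mean(cholesterol)
coeff_variation = 100.0 * std_dev / mean              # CV as a percentage of the mean
q1, q2, q3 = statistics.quantiles(cholesterol, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1                                         # inter-quartile range

print(f"range={data_range:.1f} s={std_dev:.2f} CV={coeff_variation:.1f}% IQR={iqr:.2f}")
```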

Hypothesis Testing
• Confirms (or refutes) the assertion that the observed findings did not occur by chance alone but are due to a true association between the dependent and independent variables
• The aim of the researcher is to demonstrate that the observed findings from a study are statistically significant.

Hypothesis Testing (contd)
• Null Hypothesis (H0)
  – Usually the hypothesis that the researcher wants to gather evidence against
• Alternative (or Research) Hypothesis (Ha)
  – Usually the hypothesis for which the researcher wants to gather supporting evidence

Hypothesis Testing (contd)
Example: A researcher studied the relationship between Smoking and Lung cancer.

Hypothesis Testing (contd)
H0 : There is no difference between smokers and nonsmokers with respect to the risk of developing lung cancer. That is, the observed difference (in the sample), if any, is by chance alone.

Test Statistic
• Statistics whose primary use is in testing hypotheses are called test statistics

Types of Errors
[Table with column heading: Truth]

Hypothesis Testing (contd)
• Type I Error
  − Rejecting the null hypothesis when it is true
  − If H0 is true in reality and the observed finding of a study is statistically significant, the decision to reject H0 is incorrect and an error has been made.

Hypothesis Testing (contd)
Alpha (α) = Probability of Type I error; significance level of the test
Beta (β) = Probability of Type II error
Power of a test = 1 – β; probability that a test detects differences that actually exist; typically use 80%
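
To make the relationship between α, β, and power concrete, here is a minimal sketch for a one-sided, one-sample z-test with known standard deviation. The function name `z_test_power` and all numeric inputs are assumptions for illustration; the slides do not specify a particular test for the power calculation.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test to detect a true mean shift
    of `effect` when the population standard deviation `sigma` is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * sqrt(n) / sigma         # mean of the test statistic under Ha
    beta = std_normal.cdf(z_crit - shift)    # P(fail to reject H0 | Ha true) = Type II error
    return 1 - beta

# Hypothetical design: detect a shift of half a standard deviation with n = 25.
power = z_test_power(effect=0.5, sigma=1.0, n=25, alpha=0.05)
print(f"power = {power:.3f}")
```

With no true effect (`effect=0`), the function returns α itself, which is the Type I error rate of the test.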

Ha : incidence > 0.0002
because he is interested in detecting whether the true incidence of TB in the Haitian population in Miami is larger than 0.0002.
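
A one-sided alternative of this kind could be tested with an exact binomial calculation, sketched below. Only the null incidence 0.0002 comes from the slide; the sample size, the number of observed cases, and the helper name `binomial_upper_tail` are hypothetical.

```python
from math import comb

def binomial_upper_tail(cases, n, p0):
    """P(X >= cases) for X ~ Binomial(n, p0): the one-sided p-value
    for the alternative that the true incidence exceeds p0."""
    lower = sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(cases))
    return 1.0 - lower

p0 = 0.0002              # incidence under H0 (from the slide)
n, cases = 10_000, 6     # hypothetical sample size and observed TB cases
p_value = binomial_upper_tail(cases, n, p0)
print(f"one-sided p-value = {p_value:.4f}")
```

Because the test is one-sided, only counts at least as large as the observed one contribute to the p-value.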

Two-Sided Test
e.g. A researcher would like to determine whether mean age of onset of heart disease in males differs from the mean age for females. The null hypothesis of interest is
H0 : μ_M = μ_F
where μ_M is the mean age of onset of heart disease for males and μ_F is the mean age of onset for females
Versus the alternative hypothesis
Ha : μ_M ≠ μ_F

One-Sample Tests
One-Sample hypothesis tests involve inferences about a single population parameter, based on data from a single sample.
The parameter (mean, proportion) is compared to a single numeric value.

Two-Sample Tests
Two-Sample hypothesis tests involve comparisons of the parameter values between two independent groups.
The parameter (mean, proportion) value is compared between two groups.
Example: Does mean age of onset of heart disease in males differ from the mean age for females?
  − Groups: Males versus Females
  − Age of onset measured in both groups
  − The observations from the sample of males and the sample of females are used to conduct the test

Parametric Tests
• Parametric tests are based on assumptions about the distribution of the observed data. (E.g., Normal distribution)
• Hypotheses are formulated in terms of the Mean or the Standard Deviation. Some examples of tests include:
1. Z-test (when sample sizes are large, or the population standard deviation is known) used to make inferences about means or proportions
  − Example: Observe serum cholesterol levels among 150 Native Americans in Arizona to study the association with coronary artery disease
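
As a sketch of a one-sample Z-test on a mean, the function below standardizes the sample mean and converts it to a two-sided p-value. The sample size n = 150 matches the cholesterol example; the sample mean, the reference mean, and the standard deviation are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean = mu0 with known standard deviation sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))      # standardized distance from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails
    return z, p_value

# n = 150 comes from the slide; the other numbers are hypothetical.
z, p_value = one_sample_z_test(sample_mean=5.4, mu0=5.2, sigma=1.0, n=150)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```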

Parametric Tests (contd)
2. T-test (when sample sizes are small, n < 30, or the population standard deviation is not known and the sample standard deviation is used) to make inferences about means.
  − Example: Temperatures of 26 patients were recorded 48 hours after surgery. A researcher is interested in determining if the mean temperatures of the surgical patients are significantly different from the standard normal temperature of 98.6°F.

Parametric Tests (contd)
3. F-test is used to
  – Test hypotheses about a single population standard deviation, or to compare two standard deviations.
  – Compare three or more group means: Analysis of Variance (ANOVA)
  – Example: The four blood groups A, B, O, and AB were studied to compare the quantitative serologic differences among their antigenic structures. Use ANOVA to compare the 4 group means.
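
The T-test example (26 surgical patients compared against 98.6°F) can be sketched as follows. The temperatures are hypothetical; only n = 26 and the reference value 98.6 come from the slide. The critical value 2.060 is the tabled two-sided 5% point of the t distribution with 25 degrees of freedom, used here instead of an exact p-value to keep the sketch within the standard library.

```python
from math import sqrt
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical temperatures (deg F) for 26 patients, 48 hours after surgery.
temps = [98.2, 99.1, 98.8, 99.4, 98.6, 99.0, 98.9, 99.3, 98.7, 99.2, 98.5, 99.0, 99.1,
         98.4, 98.9, 99.5, 98.8, 99.0, 98.6, 99.2, 98.7, 99.1, 98.9, 99.3, 98.8, 99.0]

t_stat, df = one_sample_t(temps, mu0=98.6)
T_CRIT = 2.060                        # t table: two-sided alpha = 0.05, df = 25
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```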
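
For the ANOVA use of the F-test, the sketch below computes the F statistic by hand as the ratio of between-group to within-group mean squares. The four groups stand in for blood groups A, B, O, and AB; the measurements and the function name `one_way_anova_f` are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for blood groups A, B, O, and AB.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the group means differ by more than the within-group variation would explain; the statistic is then compared against an F distribution with the two degrees of freedom shown.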

Nonparametric Tests
Nonparametric tests
• Based on weaker assumptions
• Do not assume a normal distribution
• Called Distribution-Free Tests
• Used when the assumption of normality (required for parametric tests) of the data is not met
• Hypotheses may be framed in terms of the Median, quartiles, etc. instead of the Mean scores

Nonparametric Tests (contd)
• Sign Test
  – Used to test hypotheses about the Median of a population
  – One-Sample Test
• Wilcoxon Rank-Sum Test
  – Used to compare Medians of two groups
  – Two-Sample Test
• Fisher’s Exact test
  – Nonparametric equivalent of the Chi-square test
  – Used when the expected frequencies in a table are small (<5)
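
The Sign Test is simple enough to sketch directly: under H0 each observation falls above or below the hypothesized median with probability 1/2, so the count of observations on one side follows a Binomial(n, 0.5) distribution. The sample, the hypothesized median, and the function name `sign_test` are hypothetical.

```python
from math import comb

def sign_test(sample, median0):
    """Two-sided exact sign test of H0: population median = median0."""
    above = sum(1 for x in sample if x > median0)
    below = sum(1 for x in sample if x < median0)   # values equal to median0 are dropped
    n = above + below
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical sample and hypothesized median.
sample = [5.1, 6.3, 5.8, 6.9, 7.2, 6.1, 5.9, 6.6, 7.0, 4.8]
p_value = sign_test(sample, median0=5.0)
print(f"p = {p_value:.4f}")
```

Only the signs of the differences are used, which is why no normality assumption is needed.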

  – Example:
    • Test hypotheses about the mean reduction in weight for a group of subjects in a new weight loss program
    • Test whether the mean difference (change in weight) is greater than zero

  – Ranges from -1.0 to +1.0
  – Positive and negative correlation.
  – Example: Interested in the correlation between cholesterol and triglyceride levels (r = +1, +0.8, 0, -0.8, -1)
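
A minimal sketch of a correlation coefficient r that ranges from -1.0 to +1.0, assuming Pearson's product moment formula. The cholesterol and triglyceride values are hypothetical, and the two short series show the extreme cases r = +1 and r = -1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical cholesterol and triglyceride levels for five subjects.
cholesterol = [4.2, 4.8, 5.1, 5.9, 6.4]
triglycerides = [1.1, 1.0, 1.6, 1.9, 2.3]
r = pearson_r(cholesterol, triglycerides)

r_perfect_pos = pearson_r([1, 2, 3], [2, 4, 6])   # perfectly increasing line
r_perfect_neg = pearson_r([1, 2, 3], [6, 4, 2])   # perfectly decreasing line
print(f"r = {r:.2f}")
```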

Association and Prediction
• Pearson’s Product Moment Correlation Coefficient (parametric)
• Spearman’s Rank Correlation Coefficient (nonparametric)

Prediction
Regression Analysis
• Models the relationship between two or more variables such that one can be expressed in terms of the other variables (mathematical equation)
  Y = aX + b
  Z = aX + bY + c

Prediction
• Simple Linear Regression – only one explanatory variable
  Example: MCAT (Science) and ACT scores for 42 medical school applicants (Y = aX + b)
• Regression equation:
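
The coefficients of the simple linear regression Y = aX + b can be estimated by least squares, as sketched below. The scores are hypothetical, and treating ACT as X and MCAT (Science) as Y is an assumption; the slide does not say which variable is the explanatory one.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = aX + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                 # slope
    b = mean_y - a * mean_x       # intercept
    return a, b

# Hypothetical ACT (X) and MCAT Science (Y) scores for five applicants.
act = [22, 25, 27, 30, 33]
mcat = [7, 8, 9, 10, 12]
a, b = fit_line(act, mcat)

predicted = a * 28 + b            # predicted Y for a new applicant with X = 28
print(f"Y = {a:.3f} X + {b:.3f}; prediction at X = 28: {predicted:.2f}")
```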
Conclusions