Epidemiology: A Skill for Critically Reading Journals Kidney Genital Urinary System 2023

EPIDEMIOLOGY: A SKILL FOR CRITICALLY READING JOURNALS


Robert W. Haley, MD, FACE, FACP; Office: E5.724, Phone: 83075
Email: [email protected]

LEARNING OBJECTIVES:

• Understand the basic tools of epidemiology


 The tools for quantifying observations
• Disease risk is measured by a rate.
• Three types of rates: incidence, prevalence, attack rates
• Three ways of calculating rates: crude, specific and standardized
 The tools for comparing risks
• Relative risk (odds ratio)
• Attributable risk
• How these are calculated from a 2x2 table
• The contrasting interpretations of RR (OR) and AR
 How the RR and AR are used in clinical trials of treatments
• Relative risk reduction
• Absolute risk reduction
• Number needed to treat/harm
• Recognize the 4 types of error or bias in medical research studies
 Sampling error
• Type I error, the alpha probability and certainty of a finding
• Type II error, the beta probability and statistical power to test the
hypothesis
• The terminology of sampling error
 Selection bias
• 6 common examples of selection bias
 Information bias
• Differential vs nondifferential misclassification
• 3 common examples of information bias
 Confounding
• Given a study, understand its main design features
 Define the purpose of the study: classification of purposes
 Formulating the “null hypothesis”
 Experimental vs observational studies
 Experimental study designs used in clinical trials
• Simple experiment
• Repeated measures experiment
• Repeated measures experiment with crossover
 Observational study designs used in clinical and epidemiologic studies
• A cohort study
• A case-control study
 Prospective vs retrospective studies
• Correctly use the terminology for calculating the required sample size for a study
 Describe the strategy for calculating the required sample size

 Define the terms used in sample size calculation


• Given the performance of a new diagnostic test against a “gold standard” test in a
validation study, calculate the sensitivity, specificity and predictive values of a positive
and a negative test.

Read these excerpts from a typical medical journal article to see the types of epidemiologic
terminology you must understand to evaluate whether the findings in the article are to be
believed and acted on in your clinical practice.

What is epidemiology?

The word comes from –ology meaning “the study of,” epi meaning “upon, or things that fall upon,”
and demos meaning “the people or the population.” So Epidemiology is literally “the study of things
that happen to the population.” A more exacting definition is “the quantitative method for drawing
valid inferences from non-experimental observations on defined populations.” It is not limited to the
study of epidemics, and an epidemiologist is not a skin doctor.

I. The basic tools of epidemiology

Epidemiologists have two basic kinds of tools: tools for quantifying observations and tools for drawing
valid inferences from observations.

A. Tools for quantifying observations

1. Risk is measured by a rate.


You can’t directly observe the risk of a disease in an individual patient, but you can estimate
it by measuring the rate of that disease in a group of patients like this one.
2. Disease rates can be constructed as incidence rates, prevalence rates or attack rates.
3. Each of these rates can be presented as a crude, a specific or an adjusted rate.

B. Tools for drawing valid inferences from observations


1. Once you have measured the disease rate, you can use the Relative Risk or the
Attributable Risk to compare the risks of two groups or populations of people to infer
causal effects.
2. But before accepting a causal explanation, you must rule out the 4 types of error
or bias.
3. This requires designing valid, efficient observational studies of defined populations.

II. Tools for quantifying observations

A. The risk of a disease is measured by a disease rate. The rate has a numerator and a
denominator.

Rate = (Number in the denominator who got the disease) / (Number of subjects exposed to or at risk for the disease)
B. Three types of rates

1. Incidence Rate = the number of new cases of a disease occurring in a defined


population in a defined time period, divided by the number of people at risk.

Example: In 1991 in Dallas County, 650 of the 1.5 million county residents
developed AIDS.
The Incidence Rate = 650/1,500,000 = 43 new AIDS cases per 100,000
residents per year.

2. Prevalence Rate = the number of active cases of a disease present in a defined


population in a defined time period, divided by the number of people at risk.

Example: In 1991 in Dallas County, 1,155 of the 1.5 million living county
residents were known to be suffering from AIDS.
The Prevalence Rate = 1,155/1,500,000 = 77 AIDS patients per 100,000 living
residents.
Prevalence = incidence x duration

3. Attack Rate (a special type of Incidence or Prevalence Rate) = the number of new or
existing cases of disease occurring in a defined population during the time period of
an epidemic, divided by the number of people at risk.
Example: Thus far in the Dallas County AIDS Epidemic 1981-1991, 3,415 of the
estimated 60,000 members of the high risk group* in Dallas have contracted
AIDS, for an attack rate of 5.7 cases per 100 people at risk.

*The Centers for Disease Control currently defines the "high risk group" as men who have sex with
men, hemophiliacs, intravenous drug users, and their sexual partners and offspring.
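If it helps to see the arithmetic in one place, here is a minimal Python sketch (the function name and
scaling factor are illustrative, not part of the syllabus) that reproduces the three Dallas County rates
above.

```python
# Minimal sketch: a disease rate is just cases divided by the population at risk,
# scaled to a convenient base such as "per 100,000".
def rate(cases, population_at_risk, per=100_000):
    return cases / population_at_risk * per

print(rate(650, 1_500_000))            # incidence: ~43 new AIDS cases per 100,000 residents per year
print(rate(1_155, 1_500_000))          # prevalence: ~77 AIDS patients per 100,000 living residents
print(rate(3_415, 60_000, per=100))    # attack rate: ~5.7 cases per 100 people at risk
```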

C. Three ways of calculating any of these rates.

1. Crude Rates = simply the incidence rate, prevalence rate, or attack rate of a population,
often using a convenient denominator that does not precisely measure the population
at risk.

Example: In 1991 in Dallas County, 650 of the 1.5 million county residents
developed AIDS.
The Crude Incidence Rate = 650/1,500,000 = 43 new AIDS cases per 100,000
residents.

2. Specific Rates = an incidence, prevalence or attack rate that has been refined by
using a specific denominator that more precisely measures the population at risk.

Example: In the Dallas County AIDS Epidemic 1981-1991, 3,415 of the


estimated 60,000 members of the high risk group in Dallas developed AIDS.
The Risk-Group-Specific Attack Rate = 3,415/60,000 = 5.7 AIDS cases per 100
high risk people.

3. Standardized Rates = an incidence, prevalence or attack rate that has been adjusted to
a standard population to remove the biasing effects of extraneous factors such as age,
sex, or certain risk factors.

Example: Whereas in 1984 the crude fatality-to-case ratio of AIDS was higher in
New York City than in Dallas, when adjusted for duration of illness the adjusted
mortality ratios of New York City and Dallas were not significantly different.
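To see why adjustment matters, here is a minimal Python sketch of direct standardization using entirely
hypothetical populations and age-specific rates (none of these numbers come from the syllabus). Two
cities with identical age-specific death rates but different age structures have different crude rates;
applying their rates to a common standard population removes the distortion.

```python
# Hypothetical age-specific death rates (per person per year), identical in both cities.
rates = {"<40": 0.001, "40-64": 0.005, "65+": 0.030}

# Hypothetical age structures: city A is young, city B is old.
pop_a = {"<40": 80_000, "40-64": 15_000, "65+": 5_000}
pop_b = {"<40": 40_000, "40-64": 35_000, "65+": 25_000}

# A shared standard population used for adjustment.
standard = {"<40": 60_000, "40-64": 30_000, "65+": 10_000}

def crude_rate(age_rates, pop):
    deaths = sum(age_rates[age] * n for age, n in pop.items())
    return deaths / sum(pop.values())

def standardized_rate(age_rates, std_pop):
    expected = sum(age_rates[age] * n for age, n in std_pop.items())
    return expected / sum(std_pop.values())

print(crude_rate(rates, pop_a), crude_rate(rates, pop_b))   # crude rates differ (~0.003 vs ~0.010)
print(standardized_rate(rates, standard))                   # both cities adjust to the same rate (~0.005)
```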

III. Tools for comparing risks

A. There are two measures for comparing risks (rates) of 2 groups or populations to estimate
the causal effect of a risk factor or treatment. They are the Relative Risk and the
Attributable Risk. Here is how they are defined.

1. The Relative Risk, abbreviated RR, is the rate of disease in the group exposed to the
risk factor (RE) divided by the rate of disease in the group not exposed to the risk
factor (RU).

Relative Risk (RR) = Disease Rate in the Exposed (RE) / Disease Rate in the Unexposed (RU)
For example, the RR of lung cancer in relation to cigarette smoking is the percentage of
cigarette smokers who develop lung cancer per year divided by the percentage of
nonsmokers who develop lung cancer per year.

Most important, the magnitude of the RR estimates the strength of the causal
relationship between the risk factor and the disease.

The distribution of the RR goes from 0 to plus infinity (see figure above).
• A value of RR=1.0 implies no causal relationship ("the 2 variables are not
associated”).
• Values above 1.0 imply a causal effect, such as with a risk factor like
smoking.
• Values between 1.0 and 0 imply a protective effect, such as by an effective
treatment.
• The farther the RR diverges from 1.0 in either direction, the stronger the
causal inference (“the more strongly the causal and disease variables are
associated”). That is, the higher the RR is above 1.0 or the closer it is to 0,
the stronger the causal inference is.

When the risk factor (e.g., chewing gum) is not a cause of the disease (e.g., lung
cancer), the RR = 1.0 (because in this case [RE] = [RU]).
When the risk factor (e.g., smoking) is a cause of the disease (e.g., lung cancer), the
RR > 1 (because in this case [RE] > [RU]).
When the risk factor (e.g., vitamin D) protects from the disease (e.g., lung cancer), the
RR < 1 (because in this case [RE] < [RU]).
The RR also goes by the synonyms Risk Ratio or Rate Ratio. Do you see why?

2. The Attributable Risk, abbreviated AR, is the rate of disease in the group exposed to
the risk factor (RE) minus the rate of disease in the group not exposed to the risk
factor (RU).

Attributable Risk (AR) = [RE] - [RU]
For example, the AR of lung cancer in relation to cigarette smoking is the
percentage of cigarette smokers who develop lung cancer per year minus the
percentage of nonsmokers who develop lung cancer per year.

Most important, the magnitude of the AR has nothing to do with causality.


Instead, if you assume that the risk factor is a cause of the disease, the AR
estimates the magnitude of the public health effect.

For example, public health officials may compare the ARs of several risk factors and
decide to spend their resources to remove the risk factor with the highest AR. This
would give the resource expenditures “the biggest bang for the buck.”

The AR also goes by the synonyms Absolute Risk (as a parallel with Relative Risk) and
the Rate Difference or Risk Difference. Do you see why?

B. How the Relative Risk, the Odds Ratio, and the Attributable Risk are calculated
in an epidemiologic study.

1. Calculation strategy. In an epidemiologic study, the investigators determine for each
person in the study whether they were exposed to the risk factor and also whether they
later developed the disease. When all the data have been collected, they analyze the
data by classifying all the participants in a 2x2 contingency table (it’s called that
because it shows how the disease depends on the risk factor). The letters a-d stand for
the number of participants who fall in each cell of the table, a+c and b+d are the
marginal totals, n is the total number of participants, RE is the disease rate in those
exposed to the risk factor, and RU is that in the unexposed. The formulas show the
calculations for the RR and AR.

                        Exposure
   Disease           Yes        No
   Yes                a          b
   No                 c          d
   Total             a+c        b+d        n

   RE = a/(a+c)
   RU = b/(b+d)
   Relative Risk = RE/RU
   Odds ratio = ad/bc
   Attributable Risk = RE - RU
   RR reduction = 1 - RR
   No. needed to treat = 1/AR

The odds ratio (OR), also known as the cross-products ratio, is an alternative way of
estimating the RR in the special case of a case-control study, which will be discussed
later.
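Here is a minimal Python sketch of the formulas in the box above (the function and variable names are
illustrative only); you can plug in the counts from the practice example that follows and check the
result against the answer box at the end of this syllabus.

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2x2 table:
    a = exposed with disease,    b = unexposed with disease,
    c = exposed without disease, d = unexposed without disease."""
    re = a / (a + c)                     # disease rate in the exposed
    ru = b / (b + d)                     # disease rate in the unexposed
    return {
        "RE": re,
        "RU": ru,
        "relative risk": re / ru,
        "odds ratio": (a * d) / (b * c),
        "attributable risk": re - ru,
    }

# Arbitrary illustrative counts (not from the syllabus): RR = 3.0, OR ~ 3.9, AR = 0.2.
print(two_by_two_measures(a=30, b=10, c=70, d=90))
```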

2. Practice example from a hypothetical study of the effect of smoking on lung cancer.
Try your hand at calculating the RR, the OR and the AR from the numbers of
participants in the 4 cells of the 2x2 contingency table below. (Check your answers
with the calculation results in the box on the last page of this syllabus.)

                        Exposure
   Disease           Yes        No
   Yes                50         20
   No               1,950      9,980
   Total             a+c        b+d        n

   RE = a/(a+c) =

   RU = b/(b+d) =

   Relative Risk = RE/RU =

   Odds ratio = ad/bc =

   Attributable Risk = RE - RU =
3. Illustration of the different uses of the Relative Risk and the Attributable Risk
from the results of a famous study. (The illustration below is based on the landmark
epidemiologic study that scientifically settled the controversy over whether smoking
causes lung cancer and led to the seminal 1964 U.S. Surgeon General’s Report
initiating the national campaign to discourage smoking.)

Study Design. The researchers measured the number of cigarettes smoked per day by
thousands of British physicians in the early 1950s and followed them for 10 years
identifying the causes of death for all who died.

Analysis. In the analysis, they classified them in five 2x2 contingency tables: smoking
(heavy vs none) by lung cancer death (yes vs no), smoking by other cancer death,
smoking by chronic bronchitis death, smoking by cardiovascular death, and smoking
by deaths of all causes. They then calculated the RR and the AR for each of the 5
analyses, and summarized them in the following table.

Results. The results are given in the table below. How did they calculate the RR and
AR on each line?

Interpretation: First examine the column of RRs. Notice that there is extremely
strong evidence that heavy smoking causes death from lung cancer (RR far above 1.0)
and death from chronic bronchitis (also an RR very far above 1.0), but weak evidence for
a causal effect of smoking on death from other cancers, cardiovascular disease, and all
causes (RRs barely above 1.0). Why do you think the latter have such low RR
elevations?

But now notice the column of ARs. These tell us that, if heavy smoking causes death
from cardiovascular disease, heavy smoking kills more people by cardiovascular
disease than by lung cancer (the AR for cardiovascular disease is higher than that for
lung cancer). Specifically, heavy smoking causes 2.61 cardiovascular deaths per 100
physicians per year but only 2.20 lung cancer deaths per 100 physicians per year.

This finding from the AR led federal officials to increase research expenditures on the
effects of smoking on cardiovascular disease, and that research later supported the causal
association.

C. How the relative risk and attributable risk are used in clinical trials of treatments

In epidemiologic studies you are generally analyzing risk factors that tend to increase the
risk of disease and thus have RR values greater than 1.0. In clinical trials, however, you are
studying treatments that tend to reduce the risk of disease and thus have RR values between
1 and 0, expressed as decimal values (e.g., RR of 0.75). As a result, journal articles reporting
the results of clinical trials express the RR and AR in slightly modified terminology that
seems to convey the preventive effects more clearly.

1. The Relative Risk Reduction, abbreviated RRR, is simply 1 - RR. It is the
percentage reduction in the disease rate due to the treatment.

Relative Risk Reduction (RRR) = 1 - [Disease Rate in the Treated Group (RT) / Disease Rate in the Placebo Group (RP)]

Sometimes you will see the RRR defined as follows, but the result is the same.

Relative Risk Reduction (RRR) = (RP - RT) / RP

For example, if the RR is 0.3, then the RRR is 1 – 0.3, or 0.7, indicating a disease rate
in the group that received the treatment 70% lower than that in the group that received
the placebo (see figure).

2. The Absolute Risk Reduction, abbreviated ARR, is simply the same as the AR, just
with a name change to contrast it with the RRR. It is the disease rate in the treated
group minus the disease rate in the placebo group.

Absolute Risk Reduction (ARR) = [RT] - [RP]

3. The Number Needed to Treat, abbreviated NNT, is the number of patients who would
have to be treated with the drug to prevent or cure 1 case of the disease. It is simply
calculated as 1 over the absolute value of the attributable risk, or 1/|AR|.

NNT = 1 / |[RT] - [RP]|

If the clinical trial shows that the drug works and reduces the disease rate in the treated
group lower than in the placebo group, they call this statistic the number needed to
treat (NNT). If, however, the drug turns out to be detrimental, producing a higher
disease rate in the treated group than in the control group, they use the same calculation
but call it the Number Needed to Harm (NNH).
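The same bookkeeping works for the clinical-trial measures. Below is a minimal Python sketch (the
names and example rates are illustrative only, not from the syllabus); you can also feed it the rates
you compute from the Miraculase practice example that follows and compare with the answer box at the
end of this syllabus.

```python
def trial_measures(rt, rp):
    """Treatment-effect measures from the disease rates in the treated (rt)
    and placebo (rp) groups."""
    rr = rt / rp                # relative risk
    rrr = 1 - rr                # relative risk reduction
    arr = rt - rp               # absolute risk reduction (negative when the drug helps)
    nnt = 1 / abs(arr)          # number needed to treat (or to harm)
    return rr, rrr, arr, nnt

# Hypothetical rates: 4% of treated and 10% of placebo patients had the outcome.
print(trial_measures(rt=0.04, rp=0.10))   # RR = 0.4, RRR = 0.6, ARR = -0.06, NNT ~ 16.7
```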

4. Practice example from a hypothetical clinical trial comparing treatment with the new
drug Miraculase to placebo in preventing strokes. Try your hand at calculating the
RRR, the ARR and the NNT from the numbers of participants in the 4 cells of the 2x2
contingency table below. (Check your answers with the calculation results in the box
on the last page of this syllabus.)

                        Treatment
   Stroke          Miraculase    Placebo
   Yes                 25           75
   No                 475          425
   Total              a+c          b+d        n

   RT = a/(a+c) =

   RP = b/(b+d) =

   Relative Risk Reduction = 1 - RT/RP =

   Relative Risk Reduction = 1 - ad/bc =

   Absolute Risk Reduction = RT - RP =

   Number Needed to Treat = 1/ARR =
IV. The 4 main types of error or bias in epidemiologic studies

A. Overview. All research studies carry the risk of false conclusions because of errors or
biases in the data. The ability to read research papers in journals critically and adopt only
the valid findings in treating your patients requires the critical skill of recognizing important
errors and biases when they are present in a scientific paper. Fortunately there are only 4
types of error and bias to know, and they are listed in the box.

The 4 Main Types of Error/Bias


1. Sampling error
2. Selection bias
3. Information bias
4. Confounding

To become a good consumer of medical advances, you need to memorize this list so you
can use it like a checklist a pilot uses in preparing for a take-off. Read these over; then close
your eyes and say them over and over until they become automatic. Then when you read a
journal article, you can go down the checklist and ask yourself if you see any of them in the
article.

B. Sampling Error is a distortion in the estimate of an association between the 2 variables


(too high or too low a relative risk) resulting from chance variation in the selection of the
sample.

This usually results from selecting too small a sample of participants to


study. There are two types of sampling errors: Type I and Type II.

Type I error: Finding an association when, in truth, the 2 variables are not associated.
A false positive finding . . . incorrectly rejecting the null hypothesis.
Finding an association when none is really there.

Type II error: Finding no association when, in truth, the 2 variables are


associated. A false negative finding . . . incorrectly accepting
the null hypothesis. Overlooking an association that is really
there.

In planning a study, α is what we call the probability that a Type I error will occur, and
β is the probability that a Type II error will occur, given the sample size of participants.
(α and β are standard probabilities measured on a scale of 0 to 1 and expressed as a
decimal number or a percentage, e.g., “a probability of 0.05” or “a 5% probability.”)
Furthermore, we use the term power to mean 1-β, or the probability that our study will
not result in a Type II error, i.e., will not overlook an association if one is present.

In the final analysis of the study or when you are reading the paper, the sampling error
statistics you want to look at differ depending on whether the study found an association
between the 2 variables or not.

1. If the study found an association between the 2 variables, you want to know if the
association was true or due to a Type I sampling error, and a statistic called the p
value is literally an estimate of the probability that the association found was due to
sampling error and thus not a true causal relationship. The p value reported after the
study is completed is analogous to the α probability in the planning stage of the
study. A p value less than 0.05 (less than a 5% chance of a Type I error) is generally
considered sufficient to rule out sampling error as an explanation for the finding and
thus to accept it as a true positive finding.

2. If no association was found, you don’t care about the p value, but you want to know
the power (1-β) of the sample size to have found a meaningful difference if one was
actually there. A power value of 0.80 (80% chance of not committing a Type II
error) is generally considered sufficient to rule out sampling error as an explanation
for the finding of no association and thus to accept it as a true negative finding.

The terminology of sampling error, including Type I and Type II error, α and β probabilities,
power, 1- β, and p value, is fundamental to being able to read the medical literature
meaningfully the rest of your career in medicine. You will be expected to use them cogently in
journal clubs in your residency, and they are tested on every professional board examination.
But they are always confusing when you first encounter them. Consequently, you must really
practice reasoning them out and using them until you are comfortable with them.
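One way to make α, β and power concrete is a small simulation. The sketch below is illustrative only
(the test statistic is the usual pooled two-proportion z-test, not something specified in this
syllabus): it repeats a hypothetical two-group study thousands of times. When the true rates are equal,
the fraction of "significant" results is about α (Type I errors); when they truly differ, that fraction
is the power, 1-β.

```python
import random
from statistics import NormalDist

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p value for H0: the two group proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    if se == 0:
        return 1.0
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(p_exposed, p_unexposed, n, trials=2000, alpha=0.05):
    """Fraction of simulated studies (n subjects per group) with p < alpha."""
    rejections = 0
    for _ in range(trials):
        x1 = sum(random.random() < p_exposed for _ in range(n))
        x2 = sum(random.random() < p_unexposed for _ in range(n))
        if two_proportion_p(x1, n, x2, n) < alpha:
            rejections += 1
    return rejections / trials

print(rejection_rate(0.10, 0.10, n=200))   # ~0.05: the Type I error rate (alpha)
print(rejection_rate(0.20, 0.10, n=200))   # ~0.80: the power (1 - beta) for this sample size
```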

C. Selection Bias is a distortion in the estimate of an association (too high or too low a RR)
resulting from the manner in which subjects were selected for the study population. It
occurs when the two comparison groups have different disease risks before the exposure
occurred, that is, the groups are unfairly chosen. Below are examples of selection bias
with nicknames.

1. Example: A study compared the 10-year mortality of people who volunteered for
colonoscopy with those who did not. Those who volunteered may have been more afraid
of cancer due to a family history or recent rectal bleeding. Self-Selection Bias.

2. Example: The mortality rate in factory workers in a certain job was compared with
that of the general population. However, to be a factory worker, you have to be healthy,
i.e., without the diseases/disabilities seen in the general population. Healthy Worker
Effect.

3. Example: The efficacy of a new drug for preventing heart attacks was tested by
comparing people who chose to take it with matched controls who did not. Indication
Bias.

4. Example: In a randomized clinical trial of a new drug to treat or prevent a disease,


sometimes the scheme for randomly assigning people to the treatment and control
groups fails to generate groups of equal risk. Failed Randomization Bias or Cheating.

In the current age, where billions of dollars of corporate profits ride on the results of
clinical trials and pharmaceutical companies pay doctors and hospitals to run clinical
trials, there is the potential for random assignment to be corrupted to favor a new
blockbuster drug.

Also, clinicians may enroll a very sick patient into a clinical trial to get access to a
promising new drug and may be tempted to subvert the randomization process to ensure
the patient receives the drug rather than the placebo.

[Figure: Typical Table 1 in medical journal articles reporting clinical trial findings,
showing the baseline characteristics of the randomly assigned groups.]

Does this Table 1 from a paper reporting a clinical trial reassure you of fair assignment?

5. Example: In a randomized clinical trial of aspirin to prevent heart attacks, more patients
at higher risk of heart disease stopped taking the drug in the experimental group during
the trial, while adherence to the treatment schedule in the placebo group was far higher.
Since generally people with more severe disease fail to adhere to study protocols,
confining the analysis to the patients who adhered to the protocol (per-protocol
analysis) would be biased toward showing that the drug is effective in reducing the
heart disease outcome. Nonadherence Bias.

It is now widely agreed that this bias should be avoided by using an intention-to-treat
analysis (ITT analysis), where all patients who were randomized at baseline are
included in the analysis regardless of whether they adhered to the protocol.

While the intention-to-treat approach may underestimate the efficacy of a treatment, it


is considered a fairer estimate of how well the treatment will work in practice. Since
many people are uncomfortable with seeing non-adherent participants included in the
analysis, you will see clinical trials where investigators excluded participants after
random assignment, calling it a modified intention-to-treat analysis (mITT analysis), but
these generally have industry sponsorship or authors with conflicts of interest.

6. Example: In a randomized clinical trial of aspirin to prevent heart attacks, more high
risk heart patients dropped out of the treatment group than the control group.
Nonparticipation Bias.

Since study investigators generally cannot collect the data on the disease outcome in
participants who drop out of a study, those participants cannot be included in the ITT
analysis, and there is no way to correct the selection bias. Thus, the higher the dropout
rate or the rate of data missing for any reason, the more the results will be biased and the
less credence clinicians should place in the study's findings and recommendations.

D. Information Bias is a distortion in the estimate of an association (too high or too low a
RR) due to measurement error or misclassification of subjects.

Examples: invalid measurements, incorrect diagnostic criteria, systematic omissions in


medical records, or use of different measurement methods in exposed and unexposed
groups.

Errors in measurement can bias the RR in either direction by misclassifying participants on


either the exposure variable or the disease variable. Misclassification can occur in two ways:

1. Differential misclassification. The rate of misclassification of the disease measure


is different in the exposed and the unexposed groups. As you can see, this would bias
the RR, and the bias could drive the RR in either direction depending on the directions
of the misclassifications.

2. Non-differential misclassification. The rate of misclassification of the disease

measure is the same in the exposed and unexposed groups. This would not bias the RR
(but a high rate of misclassification, even if non-differential, will drive the p value up).

Example: A cross-sectional survey compared the rate of recalled exposures to fumes


from Love Canal in town residents who had become ill with those who remained well.
(Ill people tend to recall prior exposures more vividly than well people.) Recall Bias.

Example: Medication histories from mothers of children with birth defects were
compared with those from mothers of healthy children. (Mothers of problem babies
tend to recall exposures during pregnancy better than mothers of normal babies.)
Maternal Recall Bias.

Example: The rates of emphysema in smokers and nonsmokers were compared from
medical records reviews. (But physicians are more likely to think of, test for, and
diagnose emphysema in smokers than in nonsmokers.) Diagnosis Bias.

E. Confounding is the bias that results when the causal effect of a risk factor or treatment
(A) on a disease (B) is inflated (or deflated) in a particular set of data by the presence
of an extraneous variable (C) that is associated with both the risk factor (C→A) and
the disease variable (C→B).

For example, an epidemiologic study found that coffee drinking (A) was associated
with heart attacks (B), with the rate of heart attacks being 2.7 times greater in heavy
coffee drinkers than in non-coffee-drinkers (A→B). However, people who drink
coffee tend to be smokers (C→A), and smoking is a known cause of heart attacks
(C→B). Therefore, the association of coffee drinking and heart attacks (A→B) may
be confounded by smoking (C).

You can test for the confounding by redoing the analysis stratifying on smoking. Then
coffee drinking is found not to be associated with heart attacks (RR of approximately
1.0) in either the smoking stratum or the nonsmoking stratum. But when you put them
all back together into the same pot, it appears that coffee drinking is associated with
heart attacks, at a RR of 2.7. But this tells you that the association is confounded by
smoking and not a causal association. What other potential confounding variables
might you test in this analysis?

Besides stratified analysis, you can also control for confounding with multivariable
analysis. For example, in a multiple logistic regression analysis you obtain the OR
for the effect of the independent variable on the dependent variable, and then you add
the potential confounder and see if it changes the OR. If its addition changes the OR
significantly, it is a confounder.
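Here is a minimal Python sketch of the stratified check described above, using made-up counts (they are
chosen only to mimic the coffee/smoking pattern, not taken from any real study): the stratum-specific
RRs are about 1.0, while the crude RR pooled over the strata is elevated.

```python
def relative_risk(a, b, c, d):
    """RR from a 2x2 table: a/c = coffee drinkers with/without heart attack,
    b/d = non-coffee-drinkers with/without heart attack."""
    return (a / (a + c)) / (b / (b + d))

# Hypothetical strata: coffee drinking is more common among smokers,
# and smokers have a much higher heart attack rate.
smokers    = dict(a=90, b=45, c=910, d=455)      # RR = 0.09 / 0.09 = 1.0
nonsmokers = dict(a=10, b=60, c=490, d=2940)     # RR = 0.02 / 0.02 = 1.0

# Crude (pooled) table: just add the two strata together.
crude = {k: smokers[k] + nonsmokers[k] for k in smokers}

print(relative_risk(**smokers))      # ~1.0 within smokers
print(relative_risk(**nonsmokers))   # ~1.0 within nonsmokers
print(relative_risk(**crude))        # ~2.2: elevated only because of confounding by smoking
```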

V. Designing valid, efficient studies to avoid error and bias

The box below lists the 7 key issues in designing valid, efficient studies that you as a
critical reader of journal articles need to be aware of while reading.

   Key Issues in Designing Studies
      Define the Purpose of the Study
      Formulate the "Null Hypothesis"
      Experimental vs Observational Studies
      Experimental Study Designs
      Observational Study Designs
      Prospective vs Retrospective Studies
      Calculate the Required Sample Size

A. Define the purpose of the study

When starting to read a journal article reporting a research study, the first thing to do is
discern the purpose of the study. In this regard, studies can be classified into 3 groups,
defined in the following table.
Classification of medical research studies by main purpose

   Purpose                  General type of study design
   Descriptive              Survey utilizing a statistical (random) sample, a convenience
                            sample, or a "chunk"
   Hypothesis-generating    Cross-sectional survey, or "quick and dirty" cohort or
                            case-control study
   Hypothesis-testing       Experiment (clinical trial), OR an observational study (cohort or
                            case-control) employing extensive design features to avoid or
                            control for bias

To describe the frequency and characteristics of a disease, treatment or other medical


phenomenon, one performs a descriptive study. To obtain a description that is a true picture of
what is occurring, one may take the expensive option of studying some type of random sample,
like in an opinion survey. Alternatively, one can study what we call a “convenience sample,”
which is not randomly selected but drawn from a convenient group that, while not statistically
representative, is still known to be reasonably representative of what is happening in the world.
The burden is on the researcher to show that the convenience sample gives a representative
picture. Finally, the cheapest route is to study what we call a “chunk” of data, that is, some
convenient group with no idea of whether it gives a true picture. A discerning reader should be
skeptical of descriptions obtained from a “chunk.”

When researchers begin studying a new question for which they know too little to frame a formal
study hypothesis, they often do exploratory studies first to generate formal hypotheses to test in
later studies. They typically use cross-sectional surveys or "quick and dirty" cohort of case-
control studies, perhaps with insufficient sample sizes or with unfocused analyses exploring the
data for associations. Findings of these types of studies should be viewed as provisional at best
until the findings are verified by more formal hypothesis-testing studies.

Once researchers have enough knowledge about a subject, they design tighter hypothesis-
testing studies. These involve statement of a formal, limited hypothesis, selection of an
adequate sample size (see below), and either an experimental design, such as a prospective,
randomized, double-blind clinical trial, if studying a treatment, or an observational study, such
as a cohort or case-control design (see below), but employing extensive design features to avoid
or control for bias so that the causal interpretation becomes reasonable.

B. Formulating the "Null Hypothesis"

A hypothesis is simply a prediction of how the variables will be found to be associated


in the final statistical analysis.

The hypothesis may be stated in one of two forms: the null hypothesis and the
alternative hypothesis. Statisticians generally prefer the former.

For example, in a study to test the efficacy of a new drug in treating hypertension, the null
hypothesis (abbreviated H0:) might be stated as follows:

H0: The mean diastolic blood pressure in the treated group will
be no different from that in the untreated group.
Notice that this statement of the null hypothesis is non-directional, that is, the null
hypothesis would be rejected (proved untrue) if the BP in the treated group was found to
be either higher or lower than that in the untreated group. The corresponding alternative
hypothesis (abbreviated HA:) would be written as follows:

HA: The mean diastolic blood pressure in the treated group will be
different (either higher or lower) from that in the untreated group.
For a non-directional hypothesis like this, a two-tailed test of statistical significance must
be used in the analysis.

In contrast, the null hypothesis may sometimes be expressed in a directional


manner as follows:

H0: The mean diastolic blood pressure in the treated group will
be no lower than that in the untreated group.
The corresponding alternative hypothesis would be written as follows:

HA: The mean diastolic blood pressure in the treated group will
be lower than that in the untreated group.

For a directional hypothesis like this, a one-tailed test of statistical significance must be
used in the analysis.

The point of this discussion is that the form in which the hypothesis is expressed has
important implications for the analysis. If the null hypothesis is non-directional, the
statistical analysis will employ a two-tailed test of significance; whereas, if it is
directional, the analysis will employ a one-tailed test. A one-tailed test is more likely to
yield a statistically significant result than a two-tailed test, and thus, one should formulate
a directional null hypothesis if it is appropriate to specify a direction. In most studies,
however, the final result might plausibly go in either direction, thus requiring a non-
directional null hypothesis and subsequently the use of less powerful, two-tailed
significance tests.
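As a small numerical illustration (not from the syllabus), the sketch below converts a z statistic into
one-tailed and two-tailed p values using only the Python standard library; for the same data, the
one-tailed p value is half the two-tailed value, which is why the one-tailed test is more powerful when
a direction can honestly be specified in advance.

```python
from statistics import NormalDist

def p_values(z):
    """One-tailed (in the hypothesized direction) and two-tailed p values for a z statistic."""
    one_tailed = 1 - NormalDist().cdf(z)
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return one_tailed, two_tailed

print(p_values(1.8))   # ~ (0.036, 0.072): significant only with the one-tailed test
print(p_values(2.1))   # ~ (0.018, 0.036): significant either way
```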

C. Experimental vs Observational studies

Medical research studies can be classified into two types on the basis of their general study
design. This classification turns entirely on whether the causal variable (e.g., treatment, risk
factor) is assigned to the subjects.

Experimental study = the causal variable is under the control of (assigned by) the
investigator. These are also called randomized controlled trials.
Observational study = the investigator observes the natural occurrence of the events but
may select the events (cases) so as to reduce or eliminate bias. Generally the study subjects
choose whether to be exposed to the causal variable (e.g., smoking).

D. Experimental Study Designs used in Clinical Trials

While there are innumerable study designs and variations on the main themes, most
experimental studies adopt some version of the 3 most fundamental designs. Experimental
study designs are used to test the efficacy of treatments, mostly drugs or medical
procedures.

Simple Experiment = subjects are randomly assigned to treatment or control (placebo)


groups and followed to determine how many become ill.

R*   X   O
R        O

Ideally, both the investigator(s) and the subjects are unaware of which groups the patients
are in ("double blind" design).

*R stands for randomizing patients into two groups, one to get the treatment (X) and the other a placebo, and
O stands for the outcome measurement, such as blood pressure.

Repeated Measures Experiment = Same as simple experiment, except that two (or more)
measurements are performed in all subjects in both groups, one before the intervention
and the second after it has had time to have an effect.

R   O   X   O
R   O       O

In this design, each patient serves as his/her own control to eliminate the effects of wide
variation in the outcome event.

Repeated Measures with Crossover = After completing a repeated measures study,


the experiment is continued in the same subjects except the treatment and placebo
groups are switched, after sufficient time delay for any residual treatment effects to
be washed out.

R   O   X   O       O
R   O       O   X   O

Useful when there is extreme variation in the outcome event or when it is necessary
to increase the rigor of the study to overcome anticipated skepticism over the
results.

E. Observational Study Designs used in Epidemiologic Studies

For studies of causes and pathophysiology of diseases, cost-effectiveness of interventions,


and all other important questions in medicine besides the efficacy of drugs and medical
interventions, one cannot use experimental designs and must use observational designs.
Observational studies typically follow three design alternatives.

Cohort Study = Subjects without the disease are selected (or classified) on the basis of
whether they are exposed to the risk factor (or treatment) and are then followed forward
in time to determine which ones get the disease. Also called a Follow-up Study. The
association of the disease with the risk factor is measured by the Relative Risk.

Case-Control Study = After all the disease events have occurred, the study subjects are
selected on the basis of whether they have the disease (“cases”) or not (“controls”) and
are then studied to determine how many in each group had the risk factor (or treatment).
Also called a Backward-Going Study. The relative risk is not calculable due to the
arbitrary ratio of cases and controls selected for study; so the Odds Ratio must be used.

Cross-sectional Study (survey) = A group of subjects is surveyed to measure the


presence of disease and risk factors at the same time. Either relative risk or odds
ratio can be used to measure the association.

Epidemic Investigation = A special case of a survey in a group during an epidemic, all


of whom were healthy before the epidemic began. Has features of a survey, a cohort
study and a case-control study. Either relative risk or odds ratio can be used.

Reasons for choosing a cohort or a case-control study design. While the cohort
study design would seem preferable theoretically, in practice most medical research
studies are case-control studies. This is because cohort studies are highly inefficient
in two circumstances:

1. When the disease under study is rare, one would have to study an enormous cohort
to eventually get enough who develop the disease.
2. When the disease takes many years to develop, a cohort study would take many years
to complete.

In both of these circumstances the case-control design is a preferable way of


accomplishing the goal quickly and efficiently. You simply select a group of patients
who have had the disease as the cases, and then select a group of subjects without the
disease but similar in all other ways as the controls and measure the risk factors in both
groups.

F. Prospective vs Retrospective Studies

Originally, prospective meant cohort study and retrospective meant case-control study.
However, medical researchers no longer use the terms in that way.

In times past they took on powerful connotations: “prospective” came to imply valid, while
“retrospective” came to imply invalid, and some authors used “prospective” unfairly to hype
their poorly done studies or condemn quite valid case-control studies. Now, these terms are
used to designate the point in time at which the data were collected with respect to when
the exposure and disease occurred.

• In Prospective Studies, the data on exposures and disease are measured as they occur.

• In Retrospective Studies, the data are collected long after the events occur.

Moreover, it turns out that the two sets of terminology are not consistently associated. As
the table below portrays, most cohort studies are prospective, and most case-control studies
are retrospective; however, there are important exceptions as follows.

Relationship between the 2 Systems of Terminology for Epidemiologic Study Designs

                    Cohort                        Case-Control
   Prospective      most cohort studies           **
   Retrospective    *                             most case-control studies

   * "Historical prospective study" design belongs here.
   ** "Nested case-control study" design belongs here.

An historical prospective study is an increasingly popular example of a retrospective


cohort study. The researcher looks back in time to a set of records that were prospectively
collected, such as a set of medical records, constructs a cohort of people without the
disease, and follows them forward in time in the records to see which ones get the disease.

A nested case-control design is also an increasingly popular example of what can be a


prospective case-control design. The researcher goes into a computer database collected
prospectively in a cohort study, selects a subset of those who developed the disease (cases)
and a matched group who did not get the disease (controls), and analyzes some stored
blood samples from these subjects to see which subjects had a certain genetic difference
that might differentiate the two groups.

G. Calculate the Required Sample Size

1. Strategy for calculating the required sample size

To be sure that a study will have a large enough sample size to test the hypothesis
without committing a sampling error, the researcher must make an educated guess
about how many subjects will be needed.

To do this, the researchers must make a prediction about what their numerical results
will turn out to be at the end of the study and work backward from these predicted
results to the sample size needed to test them for statistical significance.

They then either consult a statistician or use a statistical software program to calculate
the required sample size needed to test for this predicted result.

2. Recap of terms Used in Sample Size Calculation (Practice these until memorized!)
Type I Error = Falsely rejecting the null hypothesis. A false positive finding. (Finding
an association when, in truth, the variables are not associated.)
Type II Error = Falsely accepting the null hypothesis. A false negative finding.
(Finding no association when, in truth, the variables are associated.)

Alpha (α) Probability = The probability of making a Type I sampling error. The value
of α most often chosen is 0.05.

Certainty (1-α) = The probability that you will not make a Type I sampling error.
The level of certainty most often chosen is 0.95. (This term is rarely used in
practice but is given for completeness.)

Beta (β) Probability = The probability of making a Type II sampling error. The value
of β most often chosen is 0.20.

Power (1-β) = The probability that you will not make a Type II error. The power level
most often chosen is 0.80. (In practice “power” is used far more often than “β”.)

In practice researchers select the values of alpha and power to be achieved in


their studies and use these in calculating the required sample size.
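As an illustration of the work-backward strategy, here is a minimal sketch of the standard
normal-approximation sample size formula for comparing two proportions, written with only the Python
standard library (this is one common formula, not necessarily the one the course statisticians would
use). The predicted results p1 and p2, alpha, and power are exactly the quantities defined above.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided test of H0: p1 = p2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Predicted results: 20% of controls vs 10% of treated develop the disease.
print(n_per_group(0.20, 0.10))   # ~199 subjects per group
```

This agrees with the simulation sketch shown earlier: with about 200 subjects per group, a 20% vs 10%
difference is detected with roughly 80% power at a two-sided alpha of 0.05.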

VIII. How physicians use diagnostic tests in medical decision-making: sensitivity,


specificity and predictive value

In caring for patients, physicians have a dizzying array of diagnostic tests to choose from in screening
for underlying diseases, diagnosing current illnesses and following patients’ progress. Good
physicians, however, must be judicious in ordering and interpreting tests because, while tests can be
extremely useful, they are often expensive and can also lead to serious harm when inappropriately
applied. Fortunately, there are good rules for using tests that maximize the good from testing.

A. Before ordering any tests, take a good medical history to estimate the prior
probability of the diagnosis

The first rule of medical diagnosis is to take a good history. The great philosopher and
teacher of medicine William Osler famously said, “Listen to your patient, he is telling you
the diagnosis.” This is paraphrased today as “Ninety percent of the diagnosis is in the
history.” From the history the physician forms an impression of the most likely diagnosis
and a short list of alternative possibilities and estimates the probability that each is the true
explanation for the patient’s symptoms. This probability is called the prior probability
because it is formed “prior to ordering any tests,” and it should be weighed heavily in
deciding whether to order a test and in interpreting the test result when it is later reported
back.

B. Order tests to confirm or rule out the initial diagnostic impression.

An initial diagnosis is basically a scientific hypothesis, and the physical examination and
laboratory and radiological procedures are done primarily to test the hypothesis.

Since all diagnostic tests have their own rates of false-positive and false-negative errors,
tests should be ordered for a diagnosis with reasonably high prior probability and should
be avoided for conditions with low prior probability, based on the medical history. An
unfortunately common error is to order a list of tests “just to be complete” or “so we don’t
miss anything” and the list usually includes tests for conditions that are highly unlikely
given the patient’s history. So what does the physician do, then, if one of the tests for an
unlikely diagnosis comes back positive? This often leads to further testing and possibly
to some unwarranted intervention that could have expensive or harmful consequences for
the patient.

How do we avoid this?

C. Sensitivity and specificity of a test

Every medical test has 2 complementary measures of accuracy: sensitivity and specificity.
These are properties of the test that do not change.

Before the Food and Drug Administration will approve a test for clinical use, it must be
validated against a “gold standard” test. A gold standard test is an alternative test—usually
a more expensive one—that can determine whether the disease is present with close to
100% accuracy, for example, an expensive lab test, a tissue biopsy, or an autopsy. The
gold standard test and the new test are both performed in the same group of patients who
are similar to the types of patients the new test will be run on later in clinical practice, and
the results are tabulated in a table like the following.

The sensitivity is defined as: Among those patients who truly have the disease by the gold
standard test, the sensitivity is the percentage who have a positive test result [a/(a+c) in
the table below].

The specificity is defined as: Among those patients who truly do not have the disease by
the gold standard test, the specificity is the percentage who have a negative test result
[d/(b+d) in the table below].

                        "Gold standard" test
   New test result      Disease                 No Disease
   Positive             a  True Positives       b  False Positives
   Negative             c  False Negatives      d  True Negatives
                        a/(a+c) = Sensitivity   d/(b+d) = Specificity

Thus, the sensitivity measures the ability of the new test to detect the disease when truly
present, and the specificity is its ability to indicate the absence of the disease when it is
truly absent.
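As a quick worked example, the sketch below computes sensitivity and specificity from the four cells of
a validation table; the counts used are the ones from the HIDA scan validation study shown later in this
syllabus (the function and argument names are illustrative only).

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """tp, fp, fn, tn correspond to cells a, b, c, d of the table above."""
    sensitivity = tp / (tp + fn)   # a / (a + c)
    specificity = tn / (fp + tn)   # d / (b + d)
    return sensitivity, specificity

print(sensitivity_specificity(tp=118, fp=1, fn=6, tn=171))   # ~ (0.952, 0.994)
```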

D. Predictive value of a test

But the sensitivity and specificity of the test are not what the physician needs to know to
decide whether to order the test or how to interpret the result in an individual patient.
Instead, the clinician needs to know the predictive value of a positive or a negative test
result.
Positive Predictive Value (PPV)—Of those patients with a positive test, the percentage
who truly have the disease by the gold standard test [a/(a+b) in the figure below].

Negative Predictive Value (NPV)—Of those patients with a negative test, the
percentage who truly do not have the disease by the gold standard test [d/(c+d) below].
                        "Gold standard" test
   New test result      Disease                 No Disease
   Positive             a  True Positives       b  False Positives     a/(a+b) = Positive Predictive Value (PPV),
                                                                        the "predictive value of a positive test"
   Negative             c  False Negatives      d  True Negatives      d/(c+d) = Negative Predictive Value (NPV),
                                                                        the "predictive value of a negative test"
                        a/(a+c) = Sensitivity   d/(b+d) = Specificity   n

   Prevalence of the disease = (a+c)/n

While the sensitivity and specificity are unchanging properties of the test, the
predictive values change widely depending on the prior probability of the disease
in the patient, which the physician estimates from the patient’s history. To
estimate the prior probability of disease in your patient, you imagine what the
prevalence of the disease would be in 100 patients exactly like this one.
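The dependence of the predictive values on the prior probability can also be computed directly from the
sensitivity, specificity and prevalence; this is just Bayes' rule applied to the 2x2 table, and it gives
the same answers as the fill-in-the-table method the intern uses below. A minimal sketch (illustrative
names only):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a patient population with the given disease prevalence."""
    tp = sensitivity * prevalence                  # true positives per patient seen
    fp = (1 - specificity) * (1 - prevalence)      # false positives per patient seen
    fn = (1 - sensitivity) * prevalence            # false negatives per patient seen
    tn = specificity * (1 - prevalence)            # true negatives per patient seen
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.953, 0.994                          # HIDA scan validation study values
print(predictive_values(sens, spec, prevalence=0.42))    # PPV ~0.99 in the 42%-prevalence validation sample
print(predictive_values(sens, spec, prevalence=0.005))   # PPV ~0.44 when the prior probability is only 0.5%
```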

E. Clinical example of estimating the predictive values

On morning rounds the intern presents the case of a 58 year old man with fever
fluctuating between 99° and 101°F for a week but with no other symptoms, specifically
no abdominal pain. On physical examination there is no jaundice or right upper quadrant
abdominal tenderness. The routine laboratory tests show no abnormalities, including a
normal white blood cell count and normal bilirubin level. Suspecting acute cholecystitis
(gall bladder inflammation) as the cause of the fever, the house staff had obtained a
HIDA scan, and the result was abnormal non-visualization of the gall bladder.

On the basis of the radiologist’s information that the HIDA scan has “97% accuracy”
for diagnosing acute cholecystitis, the intern makes a provisional diagnosis and
recommends surgery to remove the gall bladder. Is this a good decision?

The radiologist’s report of “97% accuracy” was based on a published validation study which
reported the results against a gold standard test, shown in the table below. (Notice the high
prevalence of cholecystitis in the validation sample, 42%.)

                        "Gold standard" test
   New test result      Disease          No Disease
   Positive             118 (95.2%)      1 (0.6%)          119      PPV = 99.16%
   Negative             6 (4.8%)         171 (99.4%)       177      NPV = 96.61%
   Total                124              172               296
                        Sens = 95.3%     Spec = 99.4%
   Prevalence of the disease = 124/296 = 42%

The experienced attending physician reminded the intern that the sensitivity and specificity
of the test are not directly relevant to the decision for surgery and challenged him to estimate
the PPV and NPV for all patients exactly like this one. The intern correctly filled in the
following table.

1. First he estimated the prior probability of acute cholecystitis from this patient’s medical history.
2. Then from the prior probability and the total number of patients in the published study
   (x 10 to get large enough numbers), he filled in the column totals for Disease and No Disease.
3. From these column totals and the sensitivity and specificity, he filled in the 4 cells of the
   table and the row totals.
4. From these he calculated the PPV and the NPV.

                        "Gold standard" test
   New test result      Disease          No Disease
   Positive             (95.2%)          (0.6%)                      PPV =
   Negative             (4.8%)           (99.4%)                     NPV =
   Total                                                   2,960
                        Sens = 95.3%     Spec = 99.4%
   Prior probability = Prevalence in 100 patients like this one = ??

Now, you fill in the table, estimate the PPV and NPV, and decide whether you would
recommend surgery to diagnose and treat acute cholecystitis.
Most importantly, what do you think was the prior probability this patient has acute
cholecystitis with no abdominal pain or tenderness, a normal white blood cell count, and
normal serum bilirubin? See the intern’s calculations and decision at the end of this syllabus.

Correct Answers for the Practice Examples

Calculation of RR, OR and AR

Exposure
Disease Yes No

Yes 50 20

No 1,950 9,980

a+c b+d n
RE = a/(a+c) = 50/2,000 = 0.025

RU = b/(b+d) = 20/10,000 = 0.002

Relative Risk = RE/ RU = 0.025/0.002 = 12.5

Odds ratio = ad/bc = (50 x 9,980)/(20 x 1,950) = 12.8

Attributable Risk = RE - RU = 0.025 – 0.002 = 0.023


or 2.3 lung cancers per 100 participants

Calculation of Number Needed to Treat


Treatment
Stroke Miraculase Placebo

Yes 25 75

No 475 425

a+c b+d n
RT = a/(a+c) = 25/500 = 0.05
RP = b/(b+d) = 75/500 = 0.15

Relative Risk Reduction = 1 - RT/ RP = 1 – 0.05/0.15 = 1 – 0.33 = 0.67

Relative Risk Reduction = 1 - ad/bc = 1 – (25 x 425)/(75 x 475) = 1 – 0.30 = 0.70


Absolute Risk Reduction = RT – RP = 0.05 – 0.15 = -0.10
Number Needed to Treat = 1/|ARR| = 1/0.10 = 10 (For every 10 people you
treat with Miraculase you prevent 1 stroke. This is a miraculous drug!)
The Intern’s Calculation of Predictive Values for the Patient

                        "Gold standard" test
   New test result      Disease          No Disease
   Positive             14 (95.2%)       18 (0.6%)          32       PPV = 45%
   Negative             1 (4.8%)         2,927 (99.4%)      2,928    NPV = 99.9%
   Total                15               2,945              2,960
                        Sens = 95.3%     Spec = 99.4%
   Est. prevalence of the disease = 0.5%

Acute cholecystitis occurs mostly in middle aged women and is unusual in older men. It almost always
presents with severe paroxysmal right upper quadrant pain and tenderness and jaundice and has an
elevated white blood cell count and high serum bilirubin. So the intern estimated that the prevalence
of acute cholecystitis in all patients exactly like this one, without any of these typical findings, would
be something like 0.5% (5 cases in 1,000 patients). The attending thought even that low estimate might
be generous.

He then multiplied his estimated prevalence (0.5%) times the total N of 2,960 to get 15 for the column
total for those with the disease, and filled in the rest of the numbers from there. From his final
calculations, he concluded that the positive HIDA scan gave the patient less than a 50/50 chance of
having acute cholecystitis (PPV = 45%) and decided to change his decision and recommend against
gall bladder surgery to make the diagnosis.

Later that day, as the team was looking for other causes of the fever, the patient almost died of a massive
pulmonary embolism from unsuspected deep vein thrombophlebitis, which had been the source of his
fever all along. Clearly surgery would have been a very bad, possibly fatal, decision.

This case illustrates how a false reliance on the sensitivity and specificity of a diagnostic test can lure
you into making a bad clinical decision that could hurt the patient. To interpret the test results for this
patient, you need to consider your prior probability of the diagnosis based on the patient’s history to
estimate the PPV and NPV, and use these to inform your decision.
