Chapter 04
Descriptive Epidemiology
4.1 Introduction
• What is Descriptive Epidemiology?
• Case Series
• Surveillance Studies
• Morbidity and mortality surveys
4.2 Epidemiologic Variables
• Person
• Place
• Time
4.3 Ecological Studies
• Unit of Observation
• Ecological correlations
• Limitations of ecological correlations
• Advanced topics (Optional): Types of Aggregate-Level Variables
4.1 INTRODUCTION
The key interrogative words in this poem are What, Who, When, Where, Why, and How
(“five Ws and one H”). The “who, when, where” elements on this checklist correspond to
the traditional elements of descriptive epidemiology: person (who), time (when), and
place (where). The “what, why, and how” questions are usually reserved for more refined
etiologic research, with the “what” referring to studies that identify new syndromes and
complete their clinical picture, “why” referring to the explanations for occurrence, and
the “how” in reference to the disease mechanism.
Case Series
Case series describe the history and clinical manifestations of a limited number of
patients with a particular disease or syndrome. Information on the population that gave
rise to the cases (“denominator data”) is absent from case series. Therefore, the incidence
and prevalence of the conditions cannot be calculated. In addition, no control group is
present. Nevertheless, the observations of astute clinicians can often signal an emerging
threat and identify hypotheses for further investigation.
Surveillance Studies
Epidemiologic surveillance systems are organizations and structures set up to collect and
analyze outcome-specific health data for planning, carrying out, and evaluating public
health practices (Thacker & Berkelman, 1988). Surveillance activities are usually
authorized by legislative action to address public health needs. Examples of national
surveillance systems include the National Notifiable Diseases Surveillance System
(NNDSS), the US Food and Drug Administration's safety and adverse event reporting
program known as MedWatch, and the Surveillance, Epidemiology, and End Results
(SEER) program, which is the primary source for cancer statistics in the United States. Results from
surveillance systems are often used as a “taking-off” point for more detailed analysis.
Data from surveillance systems are distinguished by their practicability and rapidity. At
the same time, they are often limited in scope and accuracy. For example, they may miss
cases and have only partial information on the cases they’ve identified. In addition, they
could include duplicate cases reported by separate sources. Data are thus insensitive to
subtle changes, and an apparent increase may be an artifact of the reporting
system rather than a true rise in occurrence. This is particularly true when data are based on the voluntary submission of
reports (passive surveillance). Nonetheless, when used judiciously, surveillance system
data may be used to generate case reports, track reporting rates, and signal
unanticipated problems.
Illustrative Example 4.2 (Surveillance, Suprofen-associated flank pain). Since the Food
and Drugs Act of 1906, the U.S. Food and Drug Administration has been the federal
agency responsible for protecting and promoting public health through the regulation and
supervision of food, cosmetic, drug, and medical-device safety. In this example, data
from the agency’s adverse drug reporting system (passive surveillance) were used to
signal and identify a syndrome of flank pain and transient renal failure (“flank-pain
syndrome”) caused by a nonsteroidal anti-inflammatory prescription drug called suprofen
(Rossi et al., 1988). The FDA adverse event reporting system is based on voluntary reports
submitted by physicians, other health care providers, and consumers. Reporting was
stimulated by “Dear Doctor” letters after initial reports signaled the problem (Figure 4.1).
This ultimately led to withdrawal of the drug from the market by the drug manufacturer.
Cancer statistics in the United States are compiled by the Surveillance, Epidemiology,
and End Results Program, also known as SEER. SEER collects
cancer data from various geographic regions in the United States and
uses these data to derive statistics on cancer occurrence and survival.
Illustrative Example 4.3 (Morbidity statistics, endometrial cancer). After a long period
of relative stability, endometrial (uterine) cancer rates rose sharply in the 1970s in the
United States, with increases exceeding 10 percent per year in some regions. Figure 4.2 plots
the incidence rates of uterine cancer between 1969 and 1973 in six regions for
which data were most complete. Regional differences must be interpreted cautiously
because of differences in cancer types and nonuniform classifications, and differences in
race and age among regions. Data within regions, however, are consistent throughout the
interval. This plot demonstrates increases overall. Further scrutiny suggested that the
sharpest increases were among middle-aged women, in whom rates rose by
40 to 150 percent during the interval. These increases paralleled large-scale increases in
the use of estrogen prescribed for symptoms of menopause and osteoporosis. Analytic
epidemiologic studies that followed confirmed the association, while animal studies
showed estrogen to stimulate proliferation of the cells of the inner lining of the uterus.
These findings led to discontinuing use of unopposed estrogen (estrogen without
progestin) in women with intact uteri.
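The annual and whole-interval percentages quoted in this example can be related by simple compounding. The sketch below is only an illustration: it assumes a constant annual percent increase and treats 1969 to 1973 as four yearly steps, neither of which is stated in the source. The function names are hypothetical.

```python
def cumulative_increase(annual_pct, years):
    """Cumulative percent increase after compounding a constant annual percent increase."""
    return ((1 + annual_pct / 100) ** years - 1) * 100


def annual_increase(cumulative_pct, years):
    """Constant annual percent increase that compounds to the given cumulative increase."""
    return ((1 + cumulative_pct / 100) ** (1 / years) - 1) * 100


# Assumption: 1969 to 1973 is treated as four yearly steps.
YEARS = 4

print(f"10% per year over {YEARS} years: {cumulative_increase(10, YEARS):.1f}% overall")
print(f"150% overall over {YEARS} years: {annual_increase(150, YEARS):.1f}% per year")
```

Under these assumptions, a 10 percent annual increase compounds to roughly 46 percent over the interval, which is in line with the lower end of the 40 to 150 percent range reported for middle-aged women.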
Figure 4.2. Incidence rates of cancer of the uterine corpus in six regions of the United
States by year, 1969–1973. (Based on data in Weiss et al., 1976.)
Methodological note: Figure 4.2 plots incidence rates of uterine cancer by region. Some
readers may have the mistaken impression that these and other
incidence rates derived from open populations are longitudinal. However, these
open population incidence rates are not based on individual follow-up and are therefore
“current” or “cross-sectional.” Longitudinal analysis requires the assemblage of a closed
population (cohort) with individual follow-up. For greater insight into the distinction
between current rates and longitudinal rates, see Chapter 3 (Incidence and Prevalence),
Chapter 17 (Survival Analysis) and Chapter 18 (Current Life Tables).
Two of the more important person variables are age and sex. Both of these factors are
associated with physiologic, socio-cultural, and behavioral factors. Age also parallels the
induction period required for degenerative chronic conditions.
Illustrative Example 4.4 (“Person,” sports-related injuries by age and sex). Figure 4.3
displays the age and sex distribution of nonfatal sports- and recreation-related injuries
treated in emergency departments for the period July 2000 to June 2001. Rates are high in
all groups but are highest in males between the ages of 10 and 24. This suggests that
prevention efforts to reduce injuries are needed for all population groups, but special
attention should be directed toward young males.
Figure 4.3. Rates of nonfatal unintentional sports- and recreation-related injuries treated
in emergency departments by age and sex, United States, July 2000–June 2001 (Source:
CDC, 2002).
Race and ethnicity are related to genetic tendencies, to the living habits of individuals, and
to environmental exposures.
A person's occupation is an important health determinant. People spend much of their life
at work, where they are exposed to chemical, physical, biological, and social agents as
part of their occupation. Occupation is also highly correlated with
socioeconomic status and specific behavioral and constitutional
tendencies. All these factors have a large influence on health.
. . . the brewers’ men seem to have suffered very lightly both in that and
the more recent [cholera] epidemics. The reason of this probably is, that
they never drink water, and are therefore exempted from imbibing the
cholera poison in that vehicle.
Place
Place variables are characteristics of the locale in which people live, work, and visit.
Place variables may be defined in terms of geographic boundaries (e.g., street, city, state,
region, country) or broad environmental classifications (e.g., rural/urban,
domestic/foreign, institutional/noninstitutional). Table 4.2 lists examples of host and
environmental characteristics associated with place.
Illustrative Example 4.7 (“Place,” breast cancer by country). Figure 4.4 compares
international breast cancer mortality rates for 1958–1959. At that time, Japan’s rate was
one-quarter to one-half that of the other countries listed. This raises questions about
genetic and environmental contributors to breast cancer. Many hypotheses have been
generated to explain this observation. Studies in the United States that address whether the differences are attributable
primarily to genetic or environmental factors have
shown that breast cancer rates in Japanese-American women increase over
successive generations (Buell, 1973), suggesting a strong environmental cause. In
addition, rates of breast malignancies in Japan have risen over time, as the Japanese diet
and lifestyle have progressively westernized (Wynder et al., 1991). Environmental
theories that have been put forward to explain low rates of breast cancer in mid-20th
century Japan include the lengthy breast-feeding and long lactation periods among
traditional Japanese women (Lilienfeld, 1963), the low body weights of Japanese women
(De Waard et al., 1977), dietary differences (Armstrong & Doll, 1975), age at menarche
and onset of regular ovulatory menstrual cycles (Henderson & Bernstein, 1991), and
menstrual cycle length (Wang et al., 1992). Epidemiologic hypotheses are continually
refined, reexamined, and tested. Note that dietary factors independent of body weight
have not been verified over time.
Mapping can be helpful when exploring patterns of occurrence by place. Various types
of mapping procedures may be considered. The simplest is a dot map, which merely locates cases.
Snow's celebrated map of the clustering of cholera deaths around the Broad Street pump
(Figure 1.13) provides a historical example of a dot map. Less celebrated but of no less
importance were Snow’s maps of water distribution in Victorian London.
Illustrative Example 4.8 (“Place,” Victorian Water Supplies). Recall that during the
19th century, drinking water in London was supplied by private companies via networks
of pipes. The two main suppliers in the epidemic areas of London were the Southwark &
Vauxhall Water (S&V) Company and the Lambeth Water Company. Figure 4.5 is a
section of one of Snow’s maps showing water pipe networks during the 1849 cholera
epidemic in London. The map is hatched and cross-hatched areas supplied by S&V
Company, the Lambeth Company, and areas in which the pipes of both companies were
intermingled. Snow showed that the rates of cholera were highest in the S&V supplied
areas, lowest in the Lambeth areas, and intermediate in the mixed usage area, thus
supporting the theory that S&V was disseminating the morbid matter of cholera.
Figure 4.5. A section of John Snow's map showing the distribution of water pipes in 19th
century London. (Markup of a map in the 1936 reprint of Snow, 1855.)
Time
The occurrence of disease over time can be analyzed from multiple time-perspectives,
some of which are presented in Table 4.3.
A common way to explore time patterns is with an epidemic curve.
Epidemic curves provide insight into the past and future course of the disease. They also
provide insight into the incubation period of the agent. The Y axis of an epidemic curve
represents the number or percent of cases that occurred during the epidemic. The X axis is
a time line. Figure 4.6 demonstrates these temporal patterns of occurrence:
Illustrative Example 4.9 (Epidemic curve, reassessment of the 1854 London cholera
outbreak). Figure 4.7 shows an epidemic curve for the historically important 1854
cholera epidemic investigated by John Snow (Chapter 1). Although John Snow did not
produce an epidemic curve as part of this investigation, Hill’s (1955) analyses suggested
that removal of the handle from the putative source of contamination, the Broad Street pump,
was not decisive in ending the epidemic: few susceptibles were left in the area by the
time the pump handle was removed and the epidemic was apparently burning itself out.
Figure 4.7. Epidemic curve of the London cholera epidemic of 1854 (Source of data:
Snow, 1855, p. 49).
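The construction of an epidemic curve described earlier in this section (cases on the Y axis, time on the X axis) can be sketched in a few lines of code. The example below is a minimal illustration, not a reproduction of Figure 4.7: the onset dates are invented for the purpose of the sketch, and the function name `epidemic_curve` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Return (day, case count) pairs for every day from the first to the last onset."""
    counts = Counter(onset_dates)
    day, last = min(onset_dates), max(onset_dates)
    curve = []
    while day <= last:
        # Days with no reported onsets are kept so that gaps show on the time line.
        curve.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return curve


# Hypothetical onset dates, invented for illustration only.
onsets = [
    date(2024, 3, 1),
    date(2024, 3, 2), date(2024, 3, 2),
    date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 3),
    date(2024, 3, 5),
]

curve = epidemic_curve(onsets)
for day, n in curve:
    # One '#' per case gives a crude text histogram.
    print(day.isoformat(), "#" * n)
```

Keeping zero-count days in the output matters because the shape of the curve, including lulls, is what hints at the mode of spread and the incubation period.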
Figure 4.8. Pneumonia and influenza mortality for 122 U.S. cities, 2006–2010
(Source: [Link]).
Rates of disease are tracked over time to document current occurrences and
make projections into the future.
Illustrative Example 4.10 (Trend curve, tuberculosis in the United States). Figure 4.9
plots tuberculosis rates from 1953 to 2008 in the United States. In 1953, when nationwide
tuberculosis reporting first began, there were more than 84,000 tuberculosis cases
annually for a rate of 52.6 per 100,000 per year. From 1953 through 1985, the rate of
tuberculosis dropped precipitously. Between 1985 and 1992, however, there was a
modest increase which was traced to the HIV epidemic, increases in immigration from
countries where tuberculosis was endemic, and increases in the transmission of
tuberculosis in high-risk environments such as homeless shelters. In 1993, the upward
trend was reversed and the downward trend resumed.
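The 1953 figures in this example show how a rate per 100,000 ties a case count to a population size. The sketch below is illustrative only: it uses the rounded count of 84,000 cases (the text says "more than 84,000"), so the implied population is an approximation, and the function names are hypothetical.

```python
def rate_per_100k(cases, population):
    """Number of cases per 100,000 population."""
    return cases / population * 100_000


def implied_population(cases, rate):
    """Population size consistent with a case count and a rate per 100,000."""
    return cases / rate * 100_000


# Figures quoted in the text for 1953; the case count is a rounded lower bound.
cases_1953 = 84_000
rate_1953 = 52.6

population = implied_population(cases_1953, rate_1953)
print(f"Implied population: about {population / 1e6:.0f} million")
print(f"Check: {rate_per_100k(cases_1953, population):.1f} per 100,000")
```

The implied denominator of roughly 160 million is of the right order for the U.S. population in the early 1950s, a quick plausibility check on the reported rate.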
Figure 4.9. Tuberculosis rates per 100,000 population, United States, 1953–2008. The
symbol * indicates change in reporting criteria (Source: CDC, 2009).
Before addressing ecological correlations, we must address the unit of observation upon
which data are based. The unit of observation in an epidemiologic study is the level of
human aggregation on which measurements are recorded. This is most often individual
persons, but can also constitute various levels of human aggregates, such as families,
communities, regions, and states. Epidemiologic studies based on aggregate-level data
are called ecological studies.
Bill 0 No
Dave 3650 No
Etc.
In contrast, an ecological study on the same issue would avail itself of aggregate-level
data, as illustrated in Table 4.5.
TABLE 4.5. Example of Aggregate-Level (Ecological) Data
Northwest 480 21
Northeast 1000 46
Southwest 250 11
Etc.
Ecological Correlations
Illustrative Example 4.11 (Ecological correlation, smoking and lung cancer). Table 4.6
lists ecological data for cigarette consumption in 1930 and lung cancer rates in 1950 in 11
countries. Figure 4.10 plots these data. A strong positive correlation between these
factors is observed. Figure 4.11 shows how lung cancer rates corresponded with
tobacco consumption in England and Wales in the early part of the 20th century, providing
early support for the smoking–lung cancer hypothesis.
Finland 1100 35
Switzerland 510 25
Canada 500 15
Holland 490 24
Australia 480 18
Denmark 380 17
Sweden 300 11
Norway 250 9
Iceland 230 6
Data source: Doll (1955)
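The strength of the ecological correlation in Table 4.6 can be quantified with Pearson's correlation coefficient. The sketch below uses only the nine rows that appear above, so its result need not match a coefficient computed from all 11 countries. It also assumes, from the description in the text, that the first number in each row is cigarette consumption in 1930 and the second is the lung cancer rate in 1950.

```python
import math

# Nine rows as they appear in Table 4.6 above (two of the 11 countries are absent).
# Assumed column order: (cigarette consumption in 1930, lung cancer rate in 1950).
TABLE_4_6 = {
    "Finland": (1100, 35),
    "Switzerland": (510, 25),
    "Canada": (500, 15),
    "Holland": (490, 24),
    "Australia": (480, 18),
    "Denmark": (380, 17),
    "Sweden": (300, 11),
    "Norway": (250, 9),
    "Iceland": (230, 6),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


cigarettes = [c for c, _ in TABLE_4_6.values()]
lung_cancer = [d for _, d in TABLE_4_6.values()]
r = pearson_r(cigarettes, lung_cancer)
print(f"r = {r:.2f} (n = {len(cigarettes)} countries)")
```

For these nine rows the coefficient is about 0.91, consistent with the strong positive correlation described in the text. Note that the unit of observation is the country, so the coefficient describes countries, not individual smokers.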
Figure 4.10. Lung cancer mortality (males) in 1950 and per-capita consumption of
cigarettes in 1930, various countries (Doll, 1955).
Figure 4.11. Mortality from lung cancer, tobacco consumption, and cigarette
consumption, United Kingdom, 1900–1947. The rates are based on 3-year averages for
all years except 1947 (Doll & Hill, 1950).
Figure 4.12. Atherosclerotic and degenerative heart disease mortality as a function of fat
calories as a percent of total calories (Based on Keys, 1953).
In the case of the dietary fat–coronary artery disease hypothesis, more refined
epidemiologic studies and laboratory studies were needed to sort out the effects of diet,
physical activity, genetics, and other factors in the multifactored etiology of coronary
artery disease. Dietary fat is now accepted as a valid component cause of coronary
disease, but even today, our understanding of this relation is far from complete. For
example, the dose–response relations among specific fatty acids, cholesterol, and
coronary heart disease risk have yet to be fully elucidated (Willett, 1990).
Data that form the basis of ecological studies tend to be incomplete and are not entirely
accurate. In addition, ecological data cannot be used to address the longitudinal
experience of individuals. Finally, ecological data often lack information on the multiple
factors that contribute to disease occurrence. Thus, ecological studies tend to fall
toward the descriptive end of the descriptive-analytic spectrum of epidemiologic study
designs.
Figure 4.13. London water districts arranged according to their elevation above sea
level. (Source: This is a facsimile of the table that appeared in Farr’s original 1852
article.)
Figure 4.14. Scatter plot of Farr’s data. The line represents Farr’s predictive model:
(cholera mortality rate) = 2226/(elevation + 13).
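Farr's predictive model, as given in the caption of Figure 4.14, is simple enough to evaluate directly. The sketch below does only that. The caption does not state units, so the elevations used here are arbitrary illustrative values in whatever units Farr's table uses, and the function name is hypothetical.

```python
def farr_predicted_mortality(elevation):
    """Cholera mortality rate predicted by Farr's model: 2226 / (elevation + 13)."""
    return 2226 / (elevation + 13)


# Illustrative elevations only; units follow Farr's table and are not assumed here.
for elevation in (0, 10, 30, 50, 100):
    print(f"elevation {elevation:>3}: predicted rate {farr_predicted_mortality(elevation):6.1f}")
```

The model is a hyperbola: predicted mortality falls steeply over the first few units of elevation and flattens thereafter. A good fit of this kind describes the pattern in the data but does not by itself establish that elevation is the cause.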
Extraneous factors that cause spurious associations are called confounders. The process
that causes this distortion is called confounding bias. In Illustrative Example 4.13,
for example, the association between elevation and cholera was confounded by proximity
to contaminated water sources. Confounding in ecological studies is referred to as the
ecological fallacy or aggregation bias. Traditionally, the ecological fallacy consists in
thinking that an association seen in the aggregate holds true for individuals when in fact it
does not (Thorndike, 1939; Selvin, 1958).
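A small numerical illustration may make the ecological fallacy concrete. The data below are entirely hypothetical and were constructed so that the correlation among group averages is the opposite of the correlation among individuals within each group. The community labels and the meaning of the two variables are placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (exposure, outcome) pairs for individuals in three communities.
communities = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}

# Aggregate level: one (mean exposure, mean outcome) point per community.
mean_x = [sum(x for x, _ in people) / len(people) for people in communities.values()]
mean_y = [sum(y for _, y in people) / len(people) for people in communities.values()]
group_r = pearson_r(mean_x, mean_y)

# Individual level: correlation among individuals within each community.
within_r = {
    name: pearson_r([x for x, _ in people], [y for _, y in people])
    for name, people in communities.items()
}

print(f"correlation of community means: {group_r:+.2f}")
for name, r in within_r.items():
    print(f"within community {name}: {r:+.2f}")
```

In this constructed example, communities with higher average exposure have higher average outcomes, yet within every community the individuals with higher exposure have lower outcomes. Inferring the individual-level relation from the aggregate one would get the direction wrong.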
Although ecological studies no longer play a prominent role in studies of disease etiology,
there is increasing interest in using hybrid designs that incorporate both individual- and
group-level variables. Hybrid designs are particularly useful in untangling relationships
between individual- and group-level risk factors.
EXERCISES
4.2 A national survey of college students found that 3314 of 17,096 respondents met
the criteria for being a frequent binge drinker (Wechsler et al., 1994). Binge
drinking was defined as having five or more alcoholic beverages three or more
times in the past 2-week period.
(B) Data were self-reported. How might this influence the study results?
(C) The response rate was only 69%. How might this influence the study results?
4.3 Figure 4.15 displays a scatter plot matrix from an ecological study on cigarette
consumption and cancers of the urinary tract (Fraumeni, 1968). Table 4.7 is a
correlation matrix for selected variables from the data. Interpret these results.
TABLE 4.7. Ecological correlation between cigarette consumption, cancers of the urinary
tract, and leukemia (Source: Fraumeni, 1968).
Variables: cigarettes sold per capita; bladder cancer deaths per 100,000; lung cancer
deaths per 100,000; kidney cancer deaths per 100,000; leukemia deaths per 100,000.
N = 44 for each pair of variables.
REFERENCES
Armstrong, B., & Doll, R. (1975). Environmental factors and cancer incidence and
mortality in different countries, with special reference to dietary practices.
International Journal of Cancer, 15, 617–631.
CDC. (1981). Pneumocystis pneumonia--Los Angeles. MMWR Morb Mortal Wkly Rep,
30(21), 250-252.
CDC. (2001). First report of AIDS. MMWR Morb Mortal Wkly Rep, 50(21), 429.
CDC. (2009). Reported Tuberculosis in the United States, 2008. Atlanta, GA: U.S.
Department of Health and Human Services. Available:
[Link]
De Waard, F., Cornelis, J. P., Aoki, K., & Yoshida, M. (1977). Breast cancer incidence
according to weight and height in two cities of the Netherlands and in Aichi
prefecture, Japan. Cancer, 40, 1269–1275.
Doll, R., & Hill, A. B. (1950). Smoking and carcinoma of the lung. British Medical
Journal, 2, 739–748.
Fraumeni, J. F., Jr. (1968). Cigarette smoking and cancers of the urinary tract: geographic
variation in the United States. Journal of the National Cancer Institute, 41(5),
1205-1211.
Henderson, B. E., & Bernstein, L. (1991). The international variation in breast cancer
rates: An epidemiological assessment. Breast Cancer Res Treat, 18 Suppl 1, S11–17.
Rossi, A. C., Bosco, L., Faich, G. A., Tanner, A., & Temple, R. (1988). The importance
of adverse reaction reporting by physicians. Suprofen and the flank pain
syndrome. JAMA, 259(8), 1203-1204.
Segi, M., & Kurihara, M. (1963). Trends in cancer mortality for selected sites in 24
countries, 1950–1959. Sendai, Japan: Department of Public Health, Tohoku
University School of Medicine.
Strøm, A., & Jensen, A. R. (1951). Mortality from circulatory diseases in Norway 1940–
45. Lancet, 260, 126.
Susser, M. (1994). The logic in ecological: I. The logic of analysis. American Journal of
Public Health, 84, 825–829.
Thacker, S. B., & Berkelman, R. L. (1988). Public health surveillance in the United
States. Epidemiologic Reviews, 10, 164–190.
Thorndike, E. L. (1939). On the fallacy of imputing the correlations found for groups to
the individuals or smaller groups composing them. American Journal of Psychology,
52, 122–124.
Wang, Q. S., Ross, R. K., Yu, M. C., Ning, J. P., Henderson, B. E., & Kimm, H. T.
(1992). A case-control study of breast cancer in Tianjin, China. Cancer Epidemiol
Biomarkers Prev, 1, 435–439.
Wechsler, H., Davenport, A., Dowdall, G., Moeykens, B., & Castillo, S. (1994). Health
and behavioral consequences of binge drinking in college. A national survey of
students at 140 campuses. JAMA, 272(21), 1672-1677.
Wynder, E. L., Fujita, Y., Harris, R. E., Hirayama, T., & Hiyama, T. (1991). Comparative
epidemiology of cancer between the United States and Japan. A second look. Cancer,
67, 746–763.
Yerushalmy, J., & Hilleboe, H. E. (1957). Fat in the diet and mortality from heart disease.
New York State Journal of Medicine, 57, 2343-2354.