Chapter 04

Descriptive epidemiology focuses on exploring data to generate hypotheses about disease patterns, contrasting with analytic studies that test specific causal hypotheses. It encompasses case series, surveillance studies, and morbidity and mortality surveys, which provide insights into disease occurrence based on person, place, and time variables. Key examples include the identification of AIDS through case series and the use of surveillance systems to track health data for public health planning.

Uploaded by

suyantounri2
Copyright
© © All Rights Reserved

4

Descriptive Epidemiology
4.1 Introduction
• What is Descriptive Epidemiology?
• Case Series
• Surveillance Studies
• Morbidity and mortality surveys
4.2 Epidemiologic Variables
• Person
• Place
• Time
4.3 Ecological Studies
• Unit of Observation
• Ecological correlations
• Limitations of ecological correlations
• Advanced topics (Optional): Types of Aggregate-Level Variables

4.1 INTRODUCTION

What is descriptive epidemiology?

Descriptive epidemiology is a term used to refer to epidemiologic studies whose primary
purpose is to explore data and generate hypotheses. This contrasts with analytic designs
which have as their purpose the testing of specific causal hypotheses. With this said, we
are careful to note the absence of a firm demarcation between descriptive and analytic
studies. All epidemiologic studies serve the objective of advancing the scientific
knowledge base of disease prevention and health promotion. Moreover, most
epidemiologic studies form a continuum of intentions, with many studies serving both
hypothesis generating (descriptive) and hypothesis testing (analytic) purposes. From a
historical and learning perspective, however, it remains useful to distinguish between
studies that tend to fall toward one or the other end of the descriptive-analytic spectrum.

Epidemiologic studies may be viewed as information gathering enterprises. This poem by
Rudyard Kipling reminds us of the type of information we need to gather.

I keep six honest serving men
(They taught me all I know);
Their names are what and why and when
And how and where and who.

The key interrogative words in this poem are What, Who, When, Where, Why, and How
(“five Ws and one H”). The “who, when, where” elements on this checklist correspond to
the traditional elements of descriptive epidemiology: person (who), time (when), and
place (where). The “what, why, and how” questions are usually reserved for more refined
etiologic research, with the “what” referring to studies that identify new syndromes and
complete their clinical picture, “why” referring to the explanations for occurrence, and
the “how” in reference to the disease mechanism.

As previously noted, there is no firm demarcation between descriptive and analytic
epidemiologic studies. Nevertheless, some types of studies tend to fall toward the
descriptive (hypothesis generating) end of the descriptive-analytic spectrum. Three of
these types of studies are case series, surveillance-based studies, and morbidity and
mortality surveys.

Case Series

Case series describe the history and clinical manifestations of a limited number of
patients with a particular disease or syndrome. Information on the population that gave
rise to the cases (“denominator data”) is absent from case series. Therefore, the incidence
and prevalence of the conditions cannot be calculated. In addition, no control group is
present. Nevertheless, the observations of astute clinicians can often signal an emerging
threat and identify hypotheses for further investigation.

Illustrative Example 4.1 (Case Series, Acquired Immune Deficiency Syndrome). In
1981, local clinicians and the Epidemic Intelligence Service Officer stationed at the Los
Angeles County Department of Public Health prepared and submitted a report of five
cases of Pneumocystis pneumonia in previously healthy young men (CDC, 1981). Before
publication, editorial staff at the CDC sent the submission to experts in parasitic and
sexually transmitted diseases who noted that the case histories suggested a cellular-
immune dysfunction and a disease acquired through sexual contact. At about the same
time, the sole distributor of the antifungal drug used to treat Pneumocystis pneumonia (a
drug called pentamidine) began receiving multiple requests for the medicine from
physicians throughout the country. The affected individuals were, again, young men. In
June 1981, CDC developed an investigative team to identify risk factors and to develop a
case definition for national surveillance of this new syndrome. Within a couple of years,
major risk factors for acquired immunodeficiency syndrome (AIDS) had been identified.

Surveillance Studies

Epidemiologic surveillance systems are organizations and structures set up to collect and
analyze outcome-specific health data for planning, carrying out, and evaluating public
health practices (Thacker & Berkelman, 1988). Surveillance activities are usually
authorized by legislative action to address public health needs. Examples of national
surveillance systems include the National Notifiable Diseases Surveillance System
(NNDSS), the US Food and Drug Administration's safety and adverse event reporting
program known as MedWatch, and the Surveillance, Epidemiology, and End Results
(SEER) program, which is the primary source for cancer statistics in the United States.
Results from surveillance systems often serve as a “taking-off” point for more detailed
analysis.

Data from surveillance systems are distinguished by their practicability and rapidity. At
the same time, they are often limited in scope and accuracy. For example, they may miss
cases and have only partial information on the cases they have identified. In addition, they
could include duplicate cases reported by separate sources. Surveillance data are thus
insensitive to subtle changes, and an apparent increase may be an artifact of the reporting
system. This is particularly true when data are based on the voluntary submission of
reports (passive surveillance). Nonetheless, when used judiciously, surveillance system
data may be used to generate case reports, track reporting rates, and signal
unanticipated problems.

Illustrative Example 4.2 (Surveillance, Suprofen-associated flank pain). Since the Food
and Drugs Act of 1906, the U.S. Food and Drug Administration has been the federal
agency responsible for protecting and promoting public health through the regulation and
supervision of food, cosmetic, drug, and medical-device safety. In this example, data
from the agency’s adverse drug reporting system (passive surveillance) were used to
signal and identify a syndrome of flank pain and transient renal failure (“flank-pain
syndrome”) caused by a nonsteroidal anti-inflammatory prescription drug called suprofen
(Rossi et al., 1988). The FDA adverse event reporting system is based on voluntary reports
submitted by physicians, health care providers, and consumers. The system was
stimulated by “Dear Doctor” letters after initial reports signaled the problem (Figure 4.1).
This ultimately led to withdrawal of the drug from the market by the drug manufacturer.

Figure 4.1. Surveillance of suprofen-associated flank pain syndrome. (Based on Rossi et
al., 1988). {image = [Link]}

Morbidity and Mortality Surveys

Government agencies maintain morbidity and mortality surveys on an ongoing basis to
track the incidence and prevalence of disease and health-related conditions in populations
and population subgroups. In the United States, the agency primarily responsible for
compiling population-based health statistics is the National Center for Health Statistics
([Link]/nchs). In Canada, the comparable agency is Statistics Canada
([Link]), and in Great Britain, the Office for National Statistics
([Link]) compiles health statistics. Each of these agencies maintains its
own morbidity and mortality databases. For example, the United States National Center
for Health Statistics maintains health and nutrition surveys (e.g., NHANES, NHIS),
health care surveys (e.g., National Hospital Discharge Survey, National Ambulatory
Medical Care Survey, National Hospital Ambulatory Medical Care Survey), and various
vital statistics systems (National Vital Statistics System). The methods employed by these
data systems have evolved over time and are documented on [Link]/nchs/.

Cancer statistics in the United States are compiled by the Surveillance, Epidemiology,
and End Results Program, also known as SEER ([Link]). SEER collects
and compiles cancer statistics from various geographic regions in the United States and
uses these data to derive statistics on cancer occurrence and survival.

Illustrative Example 4.3 (Morbidity statistics, endometrial cancer). After a long period
of relative stability, endometrial (uterine) cancer rates rose sharply in the 1970s in the
United States, with increases exceeding 10 percent per year in some regions. Figure 4.2 plots
the incidence rates of uterine cancer between 1969 and 1973 in 6 regions for
which data were most complete. Regional differences must be interpreted cautiously
because of differences in cancer types and nonuniform classifications, and differences in
race and age among regions. Data within regions, however, are consistent throughout the
interval. This plot demonstrates increases overall. Further scrutiny suggested that the
sharpest increases were among middle-aged women, where the increases were between
40 and 150 percent during the interval. These increases paralleled large scale increases in
the use of estrogen prescribed for symptoms of menopause and osteoporosis. Analytic
epidemiologic studies that followed confirmed the association, while animal studies
showed estrogen to stimulate proliferation of the cells of the inner lining of the uterus.
These findings led to discontinuing use of unopposed estrogen (estrogen without
progestin) in women with intact uteri.

Figure 4.2. Incidence rates of cancer of the uterine corpus in six regions of the United
States by year, 1969 – 1973. (Based on data in Weiss et al., 1976). {image = [Link]}

Methodological note: Figure 4.2 plots incidence rates of uterine cancer by region. Some
individuals may have the mistaken impression that these incidence rates, and other
incidence rates derived from open populations, are longitudinal. However, these
open population incidence rates are not based on individual follow-up and are therefore
“current” or “cross-sectional.” Longitudinal analysis requires the assemblage of a closed
population (cohort) with individual follow-up. For greater insight into the distinction
between current rates and longitudinal rates, see Chapter 3 (Incidence and Prevalence),
Chapter 17 (Survival Analysis) and Chapter 18 (Current Life Tables).

4.2 EPIDEMIOLOGIC VARIABLES

Descriptive epidemiology often studies disease occurrence according to person, place,
and time variables. Let us start by considering person variables.

Person

Person variables address characteristics and attributes of population members. Variation
in disease rates by personal attributes often provides insights into etiologic exposures and
differences in host susceptibility. Table 4.1 lists selected person variables.

TABLE 4.1. Examples of Person Variables

• Age
• Sex
• Ethnicity / race
• Genetic predispositions
• Physiologic states (e.g., pregnancy)
• Concurrent disease
• Immune status
• Physical activity
• Marital status
• Dietary practices
• Knowledge, attitudes and beliefs
• Tobacco use
• Alcohol use
• Body mass index
• Response to social and environmental stressors
• Educational level
• Socioeconomic status
• Occupation
• Customs
• Religion
• Foreign birth

Two of the more important person variables are age and sex. Both of these factors are
associated with physiologic, socio-cultural, and behavioral factors. Age also parallels the
induction period required for degenerative chronic conditions.

Illustrative Example 4.4 (“Person,” sports-related injuries by age and sex). Figure 4.3
displays the age and sex distribution of nonfatal sports- and recreation-related injuries
treated in emergency departments for the period July 2000 to June 2001. Rates are high in
all groups, but are highest in males between the ages of 10 and 24. This suggests that
prevention efforts to reduce injuries are needed for all population groups, but special
attention should be directed toward young males.

Figure 4.3. Rates of nonfatal unintentional sports- and recreation-related injuries treated
in emergency departments by age and sex, United States, July 2000–June 2001 (Source:
CDC, 2002).{image = [Link]}

Race and ethnicity are related to genetic tendencies, the living habits of individuals, and
to environmental exposures.

Illustrative Example 4.5 (“Person,” tuberculosis by ethnicity). Although African
Americans made up 12% of the U.S. population, they accounted for 33% of the
tuberculosis cases reported in 1997. In addition, 23% of the tuberculosis cases occurred
among Hispanics and 19% among Asians and Pacific Islanders, even though these groups
made up 11% and 3.5% of the population, respectively (CDC, 2000). High rates of
tuberculosis in these groups are explained in terms of known risk factors such as birth in
a country where tuberculosis is common, HIV infection, and exposure in high-risk
settings such as nursing homes, correctional facilities, and homeless shelters.
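
The over-representation in Illustrative Example 4.5 can be made concrete by dividing each group's share of reported cases by its share of the population. The sketch below uses only the percentages quoted in the example; the function and variable names are illustrative assumptions, and the result is a ratio of shares, not an incidence rate ratio adjusted for any other factor.

```python
# Percentages quoted in Illustrative Example 4.5 (CDC, 2000):
# (percent of reported tuberculosis cases, percent of U.S. population)
groups = {
    "African American": (33.0, 12.0),
    "Hispanic": (23.0, 11.0),
    "Asian and Pacific Islander": (19.0, 3.5),
}


def share_ratio(case_pct, population_pct):
    """Ratio of a group's share of cases to its share of the population.

    A value above 1 means the group is over-represented among cases.
    """
    return case_pct / population_pct


ratios = {name: share_ratio(c, p) for name, (c, p) in groups.items()}

for name, ratio in ratios.items():
    print(f"{name}: case share is {ratio:.1f} times population share")
```

A ratio near 1 would indicate that a group contributes cases in proportion to its size; all three groups in the example are well above 1.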

A person's occupation is an important health determinant. People spend much of their lives
at work, where they are exposed to chemical, physical, biological, and social agents as
part of their occupation. Occupation is also highly correlated with socioeconomic status
and specific behavioral and constitutional tendencies. All these factors have a large
influence on health.

Illustrative Example 4.6 (“Person,” benefits of being a brewery worker in Victorian
England). One of the founding members of the London Epidemiological Society,
William Augustus Guy (1810–1885), made this insightful observation about the rarity of
cholera among brewery workers (Snow, 1855, p. 124):

. . . the brewers’ men seem to have suffered very lightly both in that and
the more recent [cholera] epidemics. The reason of this probably is, that
they never drink water, and are therefore exempted from imbibing the
cholera poison in that vehicle.

Work in the brewing industry in this instance proved salubrious.

Place

Place variables are characteristics of the locale in which people live, work, and visit.
Place variables may be defined in terms of geographic boundaries (e.g., street, city, state,
region, country) or broad environmental classifications (e.g., rural/urban,
domestic/foreign, institutional/noninstitutional). Table 4.2 lists examples of host and
environmental characteristics associated with place.

TABLE 4.2. Host and Environmental Factors Associated with Place

• Presence and level of agents
• Presence of vectors that facilitate transmission
• Socioeconomic differences
• Genetic characteristics of residents
• Physiologic and anatomic attributes of residents
• Geology
• Climate
• Population density
• Nutritional practices
• Occupations
• Recreational practices
• Urban/rural differences
• Economic development
• Social disruptions (e.g., war, natural disasters, economic downturns)
• Social norms in behavior
• Medical practices
• Access to health care

Differences in the incidence or prevalence of disease by place variables may relate to
differences in the makeup of the population or the environment in which they live.

Illustrative Example 4.7 (“Place,” breast cancer by country). Figure 4.4 compares
international breast cancer mortality rates for 1958–1959. At that time, Japan’s rate was
one-quarter to one-half that of the other countries listed. This raises questions about
genetic and environmental contributors to breast cancer. Many hypotheses have been
generated to explain this observation. Studies in the United States that addressed whether
the differences are attributable primarily to genetics or to environment have
demonstrated that breast cancer rates in Japanese-American women increase over
successive generations (Buell, 1973), suggesting a strong environmental cause. In
addition, rates of breast malignancies in Japan have risen over time, as the Japanese diet
and lifestyle have progressively westernized (Wynder et al., 1991). Environmental
theories that have been put forward to explain low rates of breast cancer in mid-20th
century Japan include the lengthy breast-feeding and long lactation periods among
traditional Japanese women (Lilienfeld, 1963), the low body weights of Japanese women
(De Waard et al., 1977), dietary differences (Armstrong & Doll, 1975), age at menarche
and onset of regular ovulatory menstrual cycles (Henderson & Bernstein, 1991), and
menstrual cycle length (Wang et al., 1992). Epidemiologic hypotheses are continually
refined, reexamined, and tested. Note that dietary factors independent of body weight
have not been verified over time.

Figure 4.4. Age-adjusted mortality per 100,000 women from breast
cancer in 23 countries, 1958–1959. (Based on data in Segi & Kurihara,
1962, p. 31.) {image = [Link]}

Mapping can be helpful when exploring patterns of occurrence by place. Various types
of mapping procedures may be considered. We may merely locate cases with a dot map.
Snow's celebrated map of the clustering of cholera deaths around the Broad Street pump
(Figure 1.13) provides a historical example of a dot map. Less celebrated but of no less
importance were Snow’s maps of water distribution in Victorian London.

Illustrative Example 4.8 (“Place,” Victorian Water Supplies). Recall that during the
19th century, drinking water in London was supplied by private companies via networks
of pipes. The two main suppliers in the epidemic areas of London were the Southwark &
Vauxhall Water (S&V) Company and the Lambeth Water Company. Figure 4.5 is a
section of one of Snow’s maps showing water pipe networks during the 1849 cholera
epidemic in London. The map uses hatching and cross-hatching to show areas supplied by
the S&V Company, the Lambeth Company, and areas in which the pipes of both companies were
intermingled. Snow showed that the rates of cholera were highest in the S&V supplied
areas, lowest in the Lambeth areas, and intermediate in the mixed usage area, thus
supporting the theory that S&V was disseminating the morbid matter of cholera.

Figure 4.5. A section of John Snow's map showing the distribution of water pipes in 19th
century London. (Markup of a map in the 1936 reprint of Snow, 1855). {image = [Link]}

Time

The occurrence of disease over time can be analyzed from multiple time-perspectives,
some of which are presented in Table 4.3.

TABLE 4.3. Examples of Time Variables

• Calendar time
• Age (time since birth)
• Time since exposure to an agent
• Total exposure time
• Endocrinologic cycle
• Time under observation
• Time since diagnosis
• Circadian and other endogenous physiological rhythms
• Seasonal differences

A common way to explore time patterns is with an epidemic curve.
Epidemic curves provide insight into the past and future course of the disease. They also
provide insight into the incubation period of the agent. The Y axis of an epidemic curve
represents the number or percent of cases that occurred during the epidemic. The X axis is
a time line. Figure 4.6 demonstrates these temporal patterns of occurrence:

A. Sporadic (occurring rarely and without regularity)
B. Endemic (occurring predictably with only minor or predictable variation)
C. Point epidemic (occurring in clear excess over a time, then rapidly returning to
normal)
D. Propagating epidemic (occurring in clear excess with continuing increases over
time)

Figure 4.6. General patterns of occurrence: (A) sporadic, (B) endemic,
(C) point epidemic, and (D) propagating epidemic. {image = [Link]}
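
The axes of an epidemic curve described above (case counts on the Y axis, a time line on the X axis) can be sketched by tallying cases per day of onset. This is a minimal illustration only: the onset dates are hypothetical, not data from any outbreak discussed in this chapter, and the function name is an assumption.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates):
    """Return (day, case count) pairs for every day from first to last onset.

    Days with no cases are kept with a count of 0 so the time line is unbroken.
    """
    counts = Counter(onset_dates)
    first, last = min(counts), max(counts)
    n_days = (last - first).days + 1
    days = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, counts[day]) for day in days]


# Hypothetical onset dates, for illustration only.
onsets = (
    [date(2024, 1, 1)]
    + [date(2024, 1, 2)] * 3
    + [date(2024, 1, 4)] * 2
)

curve = epidemic_curve(onsets)
for day, n in curve:
    # One '#' per case gives a crude text version of the curve.
    print(day.isoformat(), "#" * n)
```

Keeping the zero-count days matters: dropping them would compress the time line and distort the shape of the curve.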

Illustrative Example 4.9 (Epidemic curve, reassessment of the 1854 London cholera
outbreak). Figure 4.7 shows an epidemic curve for the historically important 1854
cholera epidemic investigated by John Snow (Chapter 1). John Snow did not
produce an epidemic curve as part of this investigation. Hill’s (1955) later analyses suggested
that removal of the handle from the putative source of contamination, the Broad Street pump,
was not decisive in ending the epidemic: few susceptibles were left in the area by the
time the pump handle was removed and the epidemic was apparently burning itself out.

Figure 4.7. Epidemic curve of the London cholera epidemic of 1854 (Source of data:
Snow, 1855, p. 49). {image = Figure [Link]}

Some diseases demonstrate predictable seasonal fluctuations. Figure 4.8 shows the
expected seasonal levels and the epidemic threshold for pneumonia and influenza
in the United States from 2006 through 2010. When the observed number of cases
exceeds the epidemic threshold for two consecutive weeks, further investigation is
undertaken. Notice that epidemic thresholds were broken in the first part of 2008 and
toward the end of 2009.
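
The two-consecutive-week rule described above can be sketched as a simple scan over weekly values. The weekly figures and the flat threshold below are hypothetical, chosen only to show the rule at work; they are not the surveillance data plotted in Figure 4.8, and the function name is an assumption.

```python
def weeks_flagged(observed, threshold, run_length=2):
    """Return indices of weeks that complete a run of at least `run_length`
    consecutive weeks in which the observed value exceeds the threshold."""
    flagged = []
    run = 0
    for week, (obs, thr) in enumerate(zip(observed, threshold)):
        # Extend the run when above threshold, otherwise reset it.
        run = run + 1 if obs > thr else 0
        if run >= run_length:
            flagged.append(week)
    return flagged


# Hypothetical weekly values and a flat hypothetical threshold.
observed = [6.5, 7.1, 7.9, 8.2, 7.0, 8.1, 6.9]
threshold = [7.5] * len(observed)

flagged = weeks_flagged(observed, threshold)
print(flagged)
```

In this made-up series only week 3 (counting from 0) is flagged: week 5 exceeds the threshold but is not followed by a second week above it.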

Figure 4.8. Pneumonia and influenza mortality for 122 U.S. cities for 2006–2010
(Source: [Link]) {image = [Link]}

Rates of disease are tracked over time to document current occurrences and
make projections into the future.

Illustrative Example 4.10 (Trend curve, tuberculosis in the United States). Figure 4.9
plots tuberculosis rates from 1953 to 2008 in the United States. In 1953, when nationwide
tuberculosis reporting first began, more than 84,000 tuberculosis cases were reported,
for a rate of 52.6 per 100,000 per year. From 1953 through 1985, the rate of
tuberculosis dropped precipitously. Between 1985 and 1992, however, there was a
modest increase which was traced to the HIV epidemic, increases in immigration from
countries where tuberculosis was endemic, and increases in the transmission of
tuberculosis in high-risk environments such as homeless shelters. In 1993, the upward
trend was reversed and the downward trend resumed.

Figure 4.9. Tuberculosis rates per 100,000 population, United States, 1953 - 2008. The
symbol * indicates change in reporting criteria (Source: CDC, 2009). {image = Figure [Link]}

4.3 ECOLOGICAL STUDIES


Unit of Observation

Before addressing ecological correlations, we must address the unit of observation upon
which data are based. The unit of observation in an epidemiologic study is the level of
human aggregation on which measurements are recorded. This is most often individual
persons, but can also constitute various levels of human aggregates, such as families,
communities, regions, and states. Epidemiologic studies based on aggregate-level data
are called ecological studies.

As an illustration of a study based on individual-level observations, consider data from a
fictitious study on cigarette smoking and lung cancer. Table 4.4 illustrates how such data
might be recorded. Notice that smoking and lung cancer status are recorded for each
individual subject in the study.

TABLE 4.4. Example of Person-Level Data

Name    Cigarettes per year    Lung cancer case
Bill    0                      No
Joe     7300                   Yes
Dave    3650                   No
Etc.

In contrast, an ecological study on the same issue would avail itself of aggregate-level
data, as illustrated in Table 4.5.

TABLE 4.5. Example of Aggregate-Level (Ecological) Data

Region       Per capita cigarette     Regional lung cancer rate
             consumption per year     per 10,000 population
Northwest    480                      21
Northeast    1000                     46
Southwest    250                      11
Etc.

Once data are recorded on an aggregate-level, they cannot be disaggregated to reveal
individual status. They can, however, be used to explore ecological correlations.
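
The one-way nature of aggregation can be sketched in a few lines. The records below are hypothetical (the first three mirror the fictitious values in Table 4.4, with region assignments invented for illustration), and the field names are assumptions. The point is that the regional summary keeps only an average and a proportion, so nothing in it identifies which individuals smoked or which became cases.

```python
from collections import defaultdict

# Hypothetical person-level records: (region, cigarettes per year, lung cancer case).
# Region assignments are invented for illustration.
people = [
    ("Northwest", 0, False),
    ("Northwest", 7300, True),
    ("Northwest", 3650, False),
    ("Southwest", 0, False),
    ("Southwest", 3650, False),
]


def aggregate(records):
    """Collapse person-level records into one summary row per region."""
    by_region = defaultdict(list)
    for region, cigarettes, is_case in records:
        by_region[region].append((cigarettes, is_case))

    summary = {}
    for region, rows in by_region.items():
        n = len(rows)
        summary[region] = {
            "per_capita_cigarettes": sum(c for c, _ in rows) / n,
            "case_proportion": sum(1 for _, case in rows if case) / n,
        }
    return summary


regional = aggregate(people)
for region, row in regional.items():
    print(region, row)
```

Many different sets of individuals would produce the same regional row, which is why an analysis that starts from aggregate data cannot recover who was exposed and who was diseased.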

Ecological Correlations

A common type of aggregate-level analysis correlates disease rates with various
environmental and economic factors by geographic region. In this type of analysis, the
independent variable is the environmental characteristic (e.g., smoking level) and the
dependent variable is the rate of the disease (e.g., lung cancer rate). This type of analysis
provided evidence that helped untangle some of the causes of the chronic disease epidemics of
the 20th century.

Illustrative Example 4.11 (Ecological correlation, smoking and lung cancer). Table 4.6
lists ecological data for cigarette consumption in 1930 and lung cancer rates in 1950 in 11
countries. Figure 4.10 plots these data. A strong positive correlation between these
factors is observed. Figure 4.11 shows how lung cancer rates corresponded with
tobacco consumption in England and Wales in the early part of the 20th century, providing
early support for the smoking–lung cancer hypothesis.

TABLE 4.6. International Comparison of Per-Capita Cigarette Consumption and
Lung Cancer Mortality

Country          Per-Capita Annual Cigarette    Lung Cancer Deaths
                 Consumption (1930)             per 100,000 (1950)
United States    1300                           20
Great Britain    1100                           46
Finland          1100                           35
Switzerland      510                            25
Canada           500                            15
Holland          490                            24
Australia        480                            18
Denmark          380                            17
Sweden           300                            11
Norway           250                            9
Iceland          230                            6

Data source: Doll (1955)

Figure 4.10. Lung cancer mortality (males) in 1950 and per-capita consumption of
cigarettes in 1930, various countries (Doll, 1955).{image = [Link]}
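
The strength of the ecological correlation in Table 4.6 can be checked with a Pearson correlation coefficient. The sketch below uses the 11 country-level pairs exactly as listed in the table; the choice of Pearson's r and the function name are assumptions made for illustration, since the text does not say which statistic the original analysis used.

```python
from math import sqrt

# (country, per-capita cigarette consumption 1930, lung cancer deaths per 100,000 in 1950)
# Values as listed in Table 4.6 (Doll, 1955).
table_4_6 = [
    ("United States", 1300, 20),
    ("Great Britain", 1100, 46),
    ("Finland", 1100, 35),
    ("Switzerland", 510, 25),
    ("Canada", 500, 15),
    ("Holland", 490, 24),
    ("Australia", 480, 18),
    ("Denmark", 380, 17),
    ("Sweden", 300, 11),
    ("Norway", 250, 9),
    ("Iceland", 230, 6),
]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


cigarettes = [row[1] for row in table_4_6]
deaths = [row[2] for row in table_4_6]

r = pearson_r(cigarettes, deaths)
print(f"r = {r:.2f}")  # about 0.74
```

Each data point here is a country, not a person, so the coefficient describes how national rates move together and says nothing directly about individual smokers.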

Figure 4.11. Mortality from lung cancer, tobacco consumption, and cigarette
consumption, United Kingdom, 1900–1947. The rates are based on 3 year averages for
all years except 1947 (Doll & Hill, 1950). {image = [Link]}

Ecological observations also contributed to our early understanding of the relation
between dietary fat and coronary artery disease. In 1932, around the time of the Great
Depression, Raab remarked:

The regression of arteriosclerosis during the starvation years in Central Europe,
the relative rarity of atherosclerosis and hypertension among the chiefly
vegetable-consuming inhabitants of China, Africa and Dutch East Indies and
British India . . . on the one hand, and the enormous frequency of arteriosclerosis
and hypertension among the people of Europe and North America who consume
large quantities of eggs, butter, etc., on the other . . . seem to justify the
consideration of causal connection. (Translated by Stamler, 1989, p. S3).

After the Depression but before World War II, northern European countries
again experienced notable increases in coronary artery disease. The war years
brought with them prominent reductions in dietary fat consumption, especially in
the lands conquered by Nazi Germany. In Norway, where public health and
vital statistics were carefully maintained during the war, there was a clear and
prominent decline in mortality from circulatory disease. However, after the
war, there was a swift rise in cardiovascular mortality, returning to prewar
levels (Strøm & Jensen, 1951).

Illustrative Example 4.12 (Ecological correlation, dietary fat and
cardiovascular disease). After World War II, a strong ecological correlation between
dietary fat and coronary mortality was noted. Figure 4.12 is a replica of a graph from one
such early ecological study showing a strong correlation between percentage of calories
derived from fat and coronary mortality rates in 6 countries. One criticism that was raised
at the time suggested that countries with low coronary artery disease mortality and low
fat intake differed from high coronary disease countries in ways besides dietary habits,
notably in their higher rates of physical activity, lower levels of obesity, and lower rates
of smoking. Thus, although these data were supportive of the dietary fat–cardiovascular
disease hypothesis, they were not conclusive by themselves.

Figure 4.12. Atherosclerotic and degenerative heart disease mortality as a function of fat
calories as a percent of total calories (Based on Keys, 1953). {image = [Link]}

In the case of the dietary fat–coronary artery disease hypothesis, more refined
epidemiologic studies and laboratory studies were needed to sort out the effects of diet,
physical activity, genetics, and other factors in the multifactored etiology of coronary
artery disease. Dietary fat is now accepted as a valid component cause of coronary
disease, but even today, our understanding of this relation is far from complete. For
example, the dose–response relations between specific fatty acids, cholesterol, and
coronary heart disease risk have yet to be fully elucidated (Willett, 1990).

Limitations of Ecological Studies

Data that form the basis of ecological studies tend to be incomplete and not entirely
accurate. In addition, ecological data cannot be used to address the longitudinal
experience of individuals. Finally, ecological data often lack information on the multiple
factors that often contribute to disease occurrence. Thus, ecological studies tend to fall
toward the descriptive end of the descriptive-analytic spectrum of epidemiologic study
designs.

Illustrative Example 4.13 (The ecological fallacy). Careful consideration of
Farr’s historically important 1852 study on cholera and geographic altitude in 19th-century
London provides an opportunity to illustrate one of the limitations of ecological data. Farr
had initially accorded only a small role for contagion as a cause of cholera, placing much
greater emphasis on social and environmental conditions (Eyler, 1980). In 1852 Farr
wrote: “Notwithstanding the disturbance produced by the operation of other causes, the
mortality from cholera in London bore a certain constant relation to the elevation of the
soil, as is evident when the districts are arranged by groups in the order of their altitude.”
Figure 4.13, a replica of a table from Farr’s 1852 paper, contains data on altitude above
sea-level and corresponding cholera mortality rates by neighborhood (along with several
other variables). Figure 4.14 plots these data as a scatter plot. The curved line is the
mathematical model Farr proposed to explain the relationship. Although the model (line)
shows a remarkably good fit with the data, we now know there is no direct causal
connection between these factors. Farr had failed to account for the fact that people living
at low elevations were more likely to draw their water from contaminated sources, and it
was the contaminated water that caused cholera, not the atmosphere associated with low
altitude.

Figure 4.13. London water districts arranged according to their elevation above sea
level. (Source: This is a facsimile of the table that appeared in Farr’s original 1852
article.) {image = [Link]}

Figure 4.14. Scatter plot of Farr’s data. The line represents Farr’s predictive model:
(cholera mortality rate) = 2226/(elevation + 13). {image = [Link]}
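
Farr's predictive model in the caption of Figure 4.14 can be evaluated directly. The sketch below implements only the formula as printed; the sample elevations are hypothetical inputs chosen for illustration, and the units of elevation and mortality are assumed to follow Farr's table (Figure 4.13) because they are not restated in the caption.

```python
def farr_predicted_mortality(elevation):
    """Farr's model as given in Figure 4.14: 2226 / (elevation + 13).

    Units are assumed to match those of Farr's table (Figure 4.13).
    """
    return 2226 / (elevation + 13)


# Hypothetical elevations, chosen only to show the shape of the curve.
for elevation in (0, 10, 50, 100):
    rate = farr_predicted_mortality(elevation)
    print(f"elevation {elevation:>3}: predicted mortality {rate:.1f}")
```

The predicted rate falls steeply at low elevations and flattens at higher ones. A close fit of this kind describes the pattern in the district-level data; it does not show that elevation itself caused the deaths.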

Extraneous factors that cause spurious associations are called confounders. The process
that causes this distortion is called confounding bias. In Illustrative Example 4.13,
for example, the association between elevation and cholera was confounded by proximity
to contaminated water sources. Confounding in ecological studies is referred to as the
ecological fallacy or aggregation bias. Traditionally, the ecological fallacy consists in
thinking that an association seen in the aggregate holds true for individuals when in fact it
does not (Thorndike, 1939; Selvin, 1958).
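
The ecological fallacy can be made concrete with a small numerical sketch. All counts below are invented for three hypothetical districts and do not come from any study cited in this chapter. The helper function pearson_r is also written here only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented counts for three hypothetical districts of 100 residents each.
# In every district, all cases occur among the unexposed.
districts = [
    {"exposed": 10, "unexposed": 90, "cases_exposed": 0, "cases_unexposed": 5},
    {"exposed": 30, "unexposed": 70, "cases_exposed": 0, "cases_unexposed": 10},
    {"exposed": 50, "unexposed": 50, "cases_exposed": 0, "cases_unexposed": 15},
]

# Group-level (ecological) view: proportion exposed vs. overall disease rate.
prop_exposed = [d["exposed"] / (d["exposed"] + d["unexposed"]) for d in districts]
disease_rate = [
    (d["cases_exposed"] + d["cases_unexposed"]) / (d["exposed"] + d["unexposed"])
    for d in districts
]
group_r = pearson_r(prop_exposed, disease_rate)

# Individual-level view: risk in exposed and unexposed people, pooled over districts.
risk_exposed = sum(d["cases_exposed"] for d in districts) / sum(
    d["exposed"] for d in districts
)
risk_unexposed = sum(d["cases_unexposed"] for d in districts) / sum(
    d["unexposed"] for d in districts
)

print(f"group-level correlation: {group_r:.2f}")
print(f"risk in exposed individuals: {risk_exposed:.3f}")
print(f"risk in unexposed individuals: {risk_unexposed:.3f}")
```

In this constructed example, districts with more exposure have higher disease rates, so the group-level correlation is strongly positive. Yet no exposed individual develops the disease. Inferring from the aggregate pattern that exposure raises individual risk would be an instance of the ecological fallacy.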

Although ecological studies no longer play a prominent role in studies of disease etiology,
there is increasing interest in using hybrid designs that incorporate both individual- and
group-level variables. Hybrid designs are particularly useful in untangling relationships
between individual- and group-level risk factors.

Advanced Topic (Optional): Types of Aggregate-Level Variables


Aggregate-level variables are not a uniform lot. We may distinguish between aggregative
group property variables and aggregative integral property variables (Selvin, 1963).
Aggregative group property variables are summaries of characteristics of smaller units
within the group. For example, per-capita cigarette consumption is an aggregative group
property since it is a compilation of individual cigarette consumption. In contrast,
aggregative integral property variables are properties unique to the group and are
not a summary of individual properties. Whether a group has a written constitution, for
instance, is an integral property.

Susser (1994) considers three distinct types of aggregate-level measurements. Integral
aggregate-level variables are factors that affect all or virtually all individuals within the
group. A social intervention such as a public information campaign, for example, is an
integral variable. Because individuals within groups are relatively homogeneous with
respect to the integral variable, individual-level measurements are precluded when
working with integral variables.

Contextual variables are derived from a compilation of individual attributes that have an
effect that is beyond the sum of the effects in individuals. For example, the percentage of a
population immune to an infectious agent is contextual because herd immunity will
decrease the risk of infection beyond what an individual’s own immune status can achieve.

Finally, contagion variables are simultaneously independent and dependent variables.
Whereas contextual variables are independent variables in their own right, contagion
variables are dependent variables that have an influence on future outcomes and are thus
also independent variables. For example, the prevalence of HIV in a population modifies
the probability an individual will come in contact with HIV in the future. The concept of
contagion variables also applies to social and psychological factors that produce their
effect as the product of interacting “contagious” forces.

EXERCISES

4.1 Log on to the CDC Wonder website ([Link]) and navigate to the cancer statistics
database. Your instructor will identify a specific cancer that you will describe
according to: (a) age, (b) year, (c) age and year and, possibly, (d) gender. You will
then research the biology and natural history of the disease. Use an online medical
reference (e.g., [Link]/pubs/mmanual) to learn about the diagnosis, treatment, and
natural history of the cancer you just described epidemiologically. What different
histological types of the cancer exist? How is the disease staged? List known causes of
the disease. With this as background, generate hypotheses to explain the trends you
observed earlier in the exercise.

4.2 A national survey of college students found that 3314 of 17,096 respondents met
the criteria for being a frequent binge drinker (Wechsler et al., 1994). Binge
drinking was defined as having five or more alcoholic beverages three or more
times in the past 2-week period.

(A) Explain why this study is cross-sectional and not longitudinal.

(B) Data were self-reported. How might this influence the study results?

(C) The response rate was only 69%. How might this influence the study results?

4.3 Figure 4.15 displays a scatter plot matrix from an ecological study on cigarette
consumption and cancers of the urinary tract (Fraumeni, 1968). Table 4.7 is a
correlation matrix for selected variables from the data. Interpret these results.

TABLE 4.7. Ecological correlation between cigarette consumption, cancers of the urinary
tract, and leukemia (Source: Fraumeni, 1968).

                          cigarettes  bladder     lung        kidney      leukemia
                          sold per    cancer      cancer      cancer      deaths per
                          capita      deaths per  deaths per  deaths per  100,000
                                      100,000     100,000     100,000

cigarettes sold per capita
    Pearson Correlation   1           .704        .697        .487        -.068
    Sig. (2-tailed)                   .000        .000        .001        .659
    N                     44          44          44          44          44

bladder cancer deaths per 100,000
    Pearson Correlation   .704        1           .659        .359*       .162
    Sig. (2-tailed)       .000                    .000        .017        .293
    N                     44          44          44          44          44

lung cancer deaths per 100,000
    Pearson Correlation   .697        .659        1           .283        -.152
    Sig. (2-tailed)       .000        .000                    .063        .326
    N                     44          44          44          44          44

kidney cancer deaths per 100,000
    Pearson Correlation   .487        .359*       .283        1           .189
    Sig. (2-tailed)       .001        .017        .063                    .220
    N                     44          44          44          44          44

leukemia deaths per 100,000
    Pearson Correlation   -.068       .162        -.152       .189        1
    Sig. (2-tailed)       .659        .293        .326        .220
    N                     44          44          44          44          44

REFERENCES

Armstrong, B., & Doll, R. (1975). Environmental factors and cancer incidence and
mortality in different countries, with special reference to dietary practices.
International Journal of Cancer, 15, 617–631.

Buell, P. (1973). Changing incidence of breast cancer in Japanese-American women.
Journal of the National Cancer Institute, 51, 1479–1483.

CDC. (1981). Pneumocystis pneumonia--Los Angeles. MMWR Morb Mortal Wkly Rep,
30(21), 250-252.

CDC. (2001). First report of AIDS. MMWR Morb Mortal Wkly Rep, 50(21), 429.

CDC. (2002). Nonfatal sports- and recreation-related injuries treated in emergency
departments–United States, July 2000–June 2001. MMWR, 51, 736–740.

CDC. (2009). Reported Tuberculosis in the United States, 2008. Atlanta, GA: U.S.
Department of Health and Human Services. Available:
[Link]

De Waard, F., Cornelis, J. P., Aoki, K., & Yoshida, M. (1977). Breast cancer incidence
according to weight and height in two cities of the Netherlands and in Aichi
prefecture, Japan. Cancer, 40, 1269–1275.

Doll, R., & Hill, A. B. (1950). Smoking and carcinoma of the lung. British Medical
Journal, 2, 739–748.

Doll, R. (1955). Etiology of lung cancer. Advances in Cancer Research, 3, 1–50.

Eyler, J. M. (1980). The conceptual origins of William Farr’s epidemiology: Numerical
methods and social thought in the 1830s. In A. M. Lilienfeld (Ed.), Time, places, and
persons (pp. 1–21). Baltimore: Johns Hopkins University Press.

Farr, W. (1852). Influence of elevation on the fatality of cholera. Journal of the
Statistical Society of London, 15, 155–183.

Fraumeni, J. F., Jr. (1968). Cigarette smoking and cancers of the urinary tract: geographic
variation in the United States. Journal of the National Cancer Institute, 41(5),
1205-1211.

Henderson, B. E., & Bernstein, L. (1991). The international variation in breast cancer
rates: An epidemiological assessment. Breast Cancer Res Treat, 18 Suppl 1, S11–17.

Hill, A. B. (1955). Snow; an appreciation. Proc R Soc Med, 48(12), 1008-1012.

Keys, A. (1953). Atherosclerosis: A problem in newer public health. Journal of Mt Sinai
Hospital, 20, 118–139.

Lilienfeld, A. M. (1963). The epidemiology of breast cancer. Cancer Research, 23,
1503–1513.

Raab, W. (1932). Alimentäre Faktoren in der Entstehung von Arteriosklerose und
Hypertonie. Med Klin, 28, 487–521.

Rossi, A. C., Bosco, L., Faich, G. A., Tanner, A., & Temple, R. (1988). The importance
of adverse reaction reporting by physicians. Suprofen and the flank pain
syndrome. JAMA, 259(8), 1203-1204.

Segi, M., & Kurihara, M. (1963). Trends in cancer mortality for selected sites in 24
countries, 1950–1959. Sendai, Japan: Department of Public Health, Tohoku
University School of Medicine.

Selvin, H. (1958). Durkheim’s suicide and problems of empirical research. American
Journal of Sociology, 63, 607–619.

Selvin, H. (1963). The empirical classification of formal groups. American Sociological
Review, 28, 399–411.

Snow, J. (1849). On the pathology and mode of communication of cholera. London
Medical Gazette, 44, 745–752.

Snow, J. (1855). On the mode of communication of cholera (1936 Reprint), Snow on
cholera, being a reprint of two papers by John Snow, M.D., together with a
biographical memoir by B. W. Richardson, M.D., and an introduction by Wade
Hampton Frost, M.D. New York: The Commonwealth Fund.

Stamler, J. (1989). Opportunities and pitfalls in international comparisons related to
patterns, trends and determinants of CHD mortality. International Journal of
Epidemiology, 18, S3–18.

Strøm, A., & Jensen, A. R. (1951). Mortality from circulatory diseases in Norway 1940–
45. Lancet, 260, 126.

Susser, M. (1994). The logic in ecological: I. The logic of analysis. American Journal of
Public Health, 84, 825–829.

Thacker, S. B., & Berkelman, R. L. (1988). Public health surveillance in the United
States. Epidemiologic Reviews, 10, 164–190.

Thorndike, E. L. (1939). On the fallacy of imputing the correlations found for groups to
the individuals or smaller groups composing them. American Journal of Psychology,
52, 122–124.

Wang, Q. S., Ross, R. K., Yu, M. C., Ning, J. P., Henderson, B. E., & Kimm, H. T.
(1992). A case-control study of breast cancer in Tianjin, China. Cancer Epidemiol
Biomarkers Prev, 1, 435–439.

Wechsler, H., Davenport, A., Dowdall, G., Moeykens, B., & Castillo, S. (1994). Health
and behavioral consequences of binge drinking in college. A national survey of
students at 140 campuses. JAMA, 272(21), 1672-1677.

Willett, W. (1990). Nutritional epidemiology. New York: Oxford University Press.

Wynder, E. L., Fujita, Y., Harris, R. E., Hirayama, T., & Hiyama, T. (1991). Comparative
epidemiology of cancer between the United States and Japan. A second look. Cancer,
67, 746–763.

Yerushalmy, J., & Hilleboe, H. E. (1957). Fat in the diet and mortality from heart disease.
New York State Journal of Medicine, 57, 2343-2354.
