
ANRV297-EN52-20 ARI 21 November 2006 10:30

Biodiversity Informatics
Norman F. Johnson
Department of Entomology, The Ohio State University, Columbus,
Ohio 43212-1157; email: [email protected]
Annu. Rev. Entomol. 2007.52:421-438. Downloaded from www.annualreviews.org
Access provided by 36.74.238.33 on 11/20/16. For personal use only.

Annu. Rev. Entomol. 2007. 52:421-38
First published online as a Review in Advance on September 6, 2006
The Annual Review of Entomology is online at ento.annualreviews.org
This article's doi: 10.1146/annurev.ento.52.110405.091259
Copyright © 2007 by Annual Reviews. All rights reserved
0066-4170/07/0107-0421$20.00

Key Words
collections, databases, taxonomy

Abstract
Biodiversity informatics is an emerging field that applies information management tools to the management and analysis of species-occurrence, taxonomic character, and image data. A wide and growing range of tools is available for both curators and researchers. The development and implementation of formal data exchange standards and query protocols have made it possible to integrate data holdings from collections around the world. The current technological environment is summarized; protocols, standards, and tools for data management, sharing, and integration are reviewed; and methods and tools for analyzing species-occurrence and character data are examined. Direct access to primary data and imagery has the power to transform the means by which taxonomy is practiced and its results disseminated to the general community.

INTRODUCTION

The world's major insect collections house hundreds of millions of specimens. Each specimen bears, at least in theory, one label on which the metadata for that specimen are recorded. These data collectively document much of what is known about the diversity, geographic distribution, and phenology of insects around the world. Collections of recent insects also are snapshots of the entomofauna going back, in some cases, to the beginning of the nineteenth century. Thus, they have the potential to serve as baselines from which to document, among other things, environmental changes arising from natural or anthropogenic sources.

Metadata: data about data

Although these data are nominally accessible to researchers, for all practical purposes they have been unavailable to the general community. As a result, the investments that have been made in acquiring, processing, and storing the specimens and their data often provide little in scientific knowledge. In the publication process, the link to the underlying primary data usually is broken. After publication, data from newly collected material, frequently inspired by that piece of research, often cannot be incorporated into the collective understanding. These problems are not inherent in the data themselves. Rather, they arise from the tools that have traditionally been used to manage and disseminate information, primarily paper-based publications. A new suite of tools that can effectively address these limitations is now available.

Biodiversity informatics has been defined as the "application of information technologies to the management, algorithmic exploration, analysis and interpretation of primary data regarding life, particularly at the species level of organization" (86). These data document primarily the occurrence of organisms in space and time. The development of this field has been driven largely by the botanical and vertebrate research communities; entomologists, as a whole, have been reluctant participants. This is so possibly because even a modest-sized insect collection is one or two orders of magnitude larger than herbaria or vertebrate collections with comparable staffing and budgets. However, the large size of insect collections makes effective management of information all the more important. Entomology has more species and more data points (i.e., specimens in a museum) than perhaps any other discipline and therefore has much to contribute to the understanding of the world's biodiversity. The data in both public and private collections are also relevant to all of biology, both basic and applied aspects, from molecular to ecosystem studies.

Biodiversity informatics: application of information technologies to the management, algorithmic exploration, analysis, and interpretation of primary data regarding life

TECHNOLOGICAL ENVIRONMENT

The basic tool of information management is the database. Most commercially available products either are relational databases or emulate them. A wide range of books is available on relational theory (30). At its core, a relational database stores information in one or more two-dimensional arrays called tables or relations. Each row of a table should be a uniquely identifiable occurrence of the concept modeled by the table; the columns represent different attributes of that occurrence. The value of one or more attributes that uniquely define each row is that table's primary key. The values recorded in each cell of the table should be atomic, that is, broken down to represent indivisible values. Specific rows may be located either by searching through the entire table, figuratively from top to bottom, or by using an index, a map of the location of values within a table. An index occupies space in the computer but can significantly reduce search times in large tables.

Different types of information are represented in different tables. The relationships among these data are expressed by the use of foreign keys: In addition to its own primary key, a table may have a column that references the value of the primary key of another table.
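The relational ideas described here (primary keys, foreign keys, indexes) can be made concrete with a small sketch using Python's built-in sqlite3 module. The schema is an assumption made purely for illustration: the table names (taxon, specimen), the column names, and the sample rows are hypothetical and are not taken from any application discussed in this review. The taxon table also refers to itself through parent_id, which is one common way to store a hierarchy such as a classification.

```python
import sqlite3

# Hypothetical schema for illustration; all names and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE taxon (
    taxon_id  INTEGER PRIMARY KEY,                -- primary key: identifies each row
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES taxon(taxon_id)  -- foreign key back into the same table
);
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER NOT NULL REFERENCES taxon(taxon_id),  -- foreign key to taxon
    locality    TEXT NOT NULL
);
-- An index costs storage but avoids scanning the whole table on lookups by taxon.
CREATE INDEX idx_specimen_taxon ON specimen(taxon_id);
""")
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [(1, "Muscidae", None), (2, "Musca", 1), (3, "Musca domestica", 2)],
)
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [(100, 3, "Columbus, Ohio"), (101, 3, "Athens, Ohio")],
)

# Follow the foreign key from specimen to taxon with a join.
rows = conn.execute(
    "SELECT s.specimen_id, t.name, s.locality "
    "FROM specimen s JOIN taxon t ON t.taxon_id = s.taxon_id "
    "ORDER BY s.specimen_id"
).fetchall()

# Walk up the self-referencing key to recover the classification of a species.
lineage = [r[0] for r in conn.execute("""
WITH RECURSIVE up(taxon_id, name, parent_id, depth) AS (
    SELECT taxon_id, name, parent_id, 0 FROM taxon WHERE name = ?
    UNION ALL
    SELECT t.taxon_id, t.name, t.parent_id, up.depth + 1
    FROM taxon t JOIN up ON t.taxon_id = up.parent_id
)
SELECT name FROM up ORDER BY depth
""", ("Musca domestica",))]


def foreign_key_is_enforced(connection):
    """Return True if a specimen pointing at a nonexistent taxon is rejected."""
    try:
        connection.execute("INSERT INTO specimen VALUES (102, 999, 'Nowhere')")
    except sqlite3.IntegrityError:
        return True
    return False
```

In this sketch the join returns both specimens with the name stored once in taxon, and the recursive query climbs from species to genus to family. Whether a production schema should be this simple or far more normalized depends on the requirements of the collection.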

In a recursive relationship, the table uses its own primary key as a foreign key. This construct is useful for modeling a hierarchy, such as a taxonomic classification.

Structured query language (SQL; pronounced "s-q-l" or "sequel") is an industry standard used to add, retrieve, or update information within a database. Many commercial products offer graphical user interfaces for these same purposes or may have procedural language extensions that allow for more elaborate programming. In this context, the development of a standard is a formal process in which a proposal is developed, opened for comment and revision, and eventually adopted by means of a vote of the standards body. Once adopted, adherence to the standard by hardware or software developers provides others with a stable base toward which to work and assures users of minimal performance.

Standard: an agreed upon format or structure

The Internet is now one of the most important mechanisms for the dissemination of information. Most database products provide a mechanism, usually a form, for formulating a query to the database and displaying the response. The primary format for this is hypertext markup language (HTML) (71). This is a set of tags, most of which serve to indicate to a piece of software, typically a Web browser, the manner in which the information is to be displayed. For example, text that is contained between the tags <em> and </em> will be displayed in the browser in a predefined font and style to indicate that the text is emphasized, often by an italic font. The HTML standard provides a number of tags that, in the hands of talented and imaginative designers, can generate an amazing range of content. HTML has an important limitation: The text and images displayed have no inherent meaning. That is, the string <em>Musca domestica</em> when seen by an entomologist is readily recognized, both by the words and the italic text, as a scientific name of a common species. To a software application, though, this interpretation is not obvious: The text could just as easily have been italicized to indicate emotional emphasis. Thus, an application that seeks to find and extract scientific names from an HTML document must rely on formatting or contextual hints to identify items of interest. Such "page scraping" requires intelligence to be built into the application. Because each page on the Web is potentially unique in its formatting and can be changed at will by the data provider, such applications are highly unstable.

Application: software that uses the capability of a computer to perform a task

Extensible markup language (XML) (38) provides a mechanism by which to communicate the semantic content of items. The string <ScientificName>Musca domestica</ScientificName> explicitly indicates that the text is to be interpreted as a scientific name. There is no constraint on the number or meaning of XML tags used; these tags are defined, usually externally, by a specialized document, an XML schema. A typical Web browser does not know what to do with such a tag as <ScientificName>, and most browsers ignore such things when formatting the text for display. However, an application that understands the underlying XML schema can properly identify entities within the document and the relationships between them. XML is not really intended to be read by humans; rather, it is a medium of exchange of data between software applications. An XML style sheet can be used to process the information contained and present it in a format suitable for human consumption.

XML: extensible markup language

DATA CAPTURE AND STORAGE

Specimen-Occurrence Data

The term specimen occurrence encompasses both an individual that is captured (or subsampled) as well as observations (17): The same basic data apply to each. Observations have the disadvantage that the identity of the taxon cannot be independently verified or updated as taxonomies are revised. Specimens, on the other hand, are essentially a snapshot of an individual at one point in time in its ontogeny. For this review, specimen occurrence refers
to both physical specimens as well as observations.

Early formal attempts to explicitly enumerate the types of data that document specimen occurrences and the relationships between these data were developed by the Association of Systematics Collections (ASC) (3) and Colwell (19). The ASC information model is a fairly extensive general model, although originally it did not extend into the domains of literature. Colwell's biota model is a specific database implementation. Most recently, the data elements involved, and to some degree their interrelationships, have been enumerated in the form of XML schemas (1, 29).

The basic elements that comprise specimen-occurrence data are the place of collection (where), collecting date (when), collectors (who), method of collection (how), and taxonomic name of the specimen (what). The data elements may be supplemented by the depository of the specimen, associations with other specimens, preparation history, storage, and regime. Data relevant to biodiversity also include multimedia (e.g., images and sounds), literature, characters and their states, and analytical products, such as phylogenies and identification tools.

Database schemas can be as simple or as elaborate as needed to accomplish the user's requirements. They can be as simple as a single table (i.e., a single spreadsheet page) or as complex as dozens or hundreds of tables. There is no single best solution: Highly complex, fully normalized schemas offer the advantages of high levels of generality, flexibility, and integrity. On the other hand, they are more difficult to design, master, and query, and may impose performance costs.

Concerns for Entomology

The large size of entomological collections means that the process of data capture is a significant enterprise, in terms of both time and money. Well-defined protocols are necessary to minimize costs and to maintain acceptable levels of quality assurance (62). Although the fundamental data elements are identical, differences in organization lead to significantly different protocols for retrospective data capture (i.e., of information from material already incorporated into the collection) and prospective data capture (i.e., information for new collections) (98, 99).

Protocol: the structure of messages used to communicate between computers
ECN: Entomology Collections Network

For new material, the specimens are typically derived from a small number of collecting events, and they share all data elements except determination, sex, or other individual characteristics that are recorded. Therefore, the data entry personnel may enter the bulk of the data only once, updating unique identifiers or taxonomy only as needed. This process can progress rapidly, with relatively little expense and minimal opportunities for errors to arise.

In contrast, previously existing curated material is typically organized only by taxon. Therefore, the information for every specimen may be different, and the data must be entered individually. Retrospective data capture, as a result, often has a low priority because of increased costs, difficulty, and the often accurate perception that the data themselves are generally of lower quality. Protocols for such work seek to homogenize sets of specimens, typically by organizing them by collecting events (place, time, agent, and method of collection). This, however, requires handling of specimens and, as a result, there is increased opportunity for breakage. In addition, most insect collections house specimens from many different orders, a broader range than the taxonomic expertise of any single individual or small staff. The result is that identifications below the level of family or subfamily rely on the work of outside experts who have personally examined the specimens. The limitations of retrospective data capture, both in terms of numbers of experts and the time that they have available, mean that many determinations are outdated at best. Because of this range of difficulties, the Entomology Collections Network (ECN) has advocated that retrospective data capture be conducted as part
of the process of taxonomic research (98, 99), thus affording greater authority for identifications as well as taking advantage of the taxonomist's ability to more accurately decipher laconic or incomplete data, or bad handwriting. The advantages of such an approach are clear, but this effort can become a prescription for inaction. A preferable approach is to indicate the fitness for use of the data, e.g., by indicating the names of the determiners and the date at which the identification was made. The risk is that the data will be misapplied; the advantage is that the existence of the specimens and their data is made public. Errors can then be recognized and corrected by the specialists, resulting in a more efficient and productive use of time and expertise.

Georeferencing

One of the critical tasks in data capture is the process of georeferencing, that is, the conversion of descriptions of locations at which specimens were collected into a common coordinate system. This typically is a value-added component in the process of retrospective data capture. Global positioning system receivers largely eliminate this step for new collections. However, georeferencing is probably the most significant bottleneck in the digitization process. Effective management of this step requires clear protocols, and a number of tools are available and under development to facilitate the process (66).

The Mammal Networked Information System (MaNIS) provides an excellent review of the issues involved in georeferencing a place name (66). The most thorough representation of a locality requires a pair of coordinates (latitude and longitude), elevation or depth (with explicit or implied units of measure), geodetic datum, and error estimate (along with units). Online digital gazetteers (see below) provide a means of finding the geographic coordinates for place names (usually populated places), as well as for other prominent landmarks. Estimates of the error arising from both low accuracy and precision are critical for users of the data to assess their fitness for use (16, 18).

Tools available to facilitate the process of georeferencing include BioGeoMancer (9) and GEOLocate (46). Geographic name servers are freely available for the United States (45), Canada (14), Mexico (58), Argentina (43), Australia (41), New Zealand (72), Germany (44), Austria (5), Italy (59), United Kingdom (70), and South Africa (88). Several servers have global coverage (2, 47, 50).

Errors in data reduce their fitness for use. No amount of training or expertise can eliminate 100% of such errors. Therefore, it is important that data be reviewed on a regular basis to find and correct mistakes. General principles of data cleaning are discussed by Chapman (15); Guralnick & Neufeld (53) describe a protocol for monitoring the quality of georeferencing.

Authority Files

Authority files, of which the drop-down lists are one example, are useful mechanisms to decrease error rates. Despite their name, authority files are not necessarily authoritative, but provide a standard from which data can be selected or against which records may be compared. Many places are well known as classical collecting localities; a prominent example is Nova Teutonia, Brazil, from which thousands of specimens were collected by Fritz Plaumann (unpublished observations). Unfortunately, there is not yet a georeferenced listing of such localities for use by the general entomological community. The botanical community has developed a series of standards for some elements of occurrence data. Examples include geographic names (54), the authors of plant names (12), the structure of plant names (11), and abbreviations for herbaria (55). With the exception of the unofficial but widely used codens (abbreviations for the names of
collections) to identify insect collections (37), such resources generally do not exist for entomology.

The value of standardization of the data themselves compared to the value of standardization of data elements is debatable. Taxonomic names are a conspicuous exception: For any species-group, genus-group, or family-group taxon at any point in time, there is only one correct and valid full representation of its name. There is an active and growing effort to develop global, electronically accessible taxonomic authority files for insects by specialists in these groups. Examples in the format of searchable databases exist for the insect orders Diptera (100), Hymenoptera (60, 76), Orthoptera (33), Neuropterida (56), Trichoptera (101), and Siphonaptera (40). Simple HTML lists exist for Collembola (6), Odonata (84), Embiidina (83), Zoraptera (97), and Mecoptera (78). Authority files for the fauna of continents are available for Europe (39), temperate North America (75), and Australia (4). In large part, all of these are currently works in progress. The citations of these particular resources should not be construed as a definitive and comprehensive listing: A large number of lists that vary in geographic and taxonomic scope are available on the Internet.

There is as yet little standardization among taxonomic authority files for either data structure or functionality of the electronic interfaces. Data aggregators, such as the Integrated Taxonomic Information System (ITIS) (57), Species2000 (89), the Universal Biological Indexer and Organizer (uBIO) (103), and the Electronic Catalogue of Names of Known Organisms (ECAT) (34) program of the Global Biodiversity Information Facility (GBIF), are working to provide mechanisms for merging such efforts into a seamless whole. However, the documentation and quality assurance for the underlying lists continue to, and should, reside with those individuals who have the taxonomic expertise and motivation to develop and maintain community resources.

GBIF: Global Biodiversity Information Facility

Databases

Two different approaches to database design are commonly encountered. A species database records information at the level of a taxon, typically the species. Fauna Europaea (39) is a prominent example of this approach. Such an approach is an efficient means of storing data for rapid retrieval. Its weakness is exposed when specimens are reidentified. For example, if specimens from Germany identified as taxon A later are found to belong to taxon B, does taxon A occur in Germany or not?

Specimen-level databases or event-driven databases (the event being either the collection of specimens or, possibly, their accession into a collection) can avoid this problem. Their weakness, however, is in the sheer magnitude of data stored. For a typical collection, this approach may increase the number of data points by two or three orders of magnitude. Database engines and disk storage capacity can now store and manage datasets of this size, but it inescapably results in slower responses to queries.

A number of applications, designed for both individual scientists and institutions, are currently available for specimen-level databases. Widely used applications in entomology include Specify (90), Biota (19), and KE EMu (61); an extensive list of applications has been collated by Berendsohn (7). A common complaint regarding most software packages is that they are too complex, meaning, presumably, that the requirements of the user are exceeded by the application's design. As a result, the systematics community continues to reinvent the wheel by preferring to design and develop idiosyncratic applications rather than to adopt existing solutions. This behavior significantly contributes to the lack of adoption of standards. The complexity of the applications arises directly from the complexity of the data involved. Glossing over complications almost inevitably leads to applications that are useful only in their initial, limited context.
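The reidentification problem raised in the Databases section can be illustrated with a short sketch. It assumes a deliberately tiny specimen-level model in which each specimen keeps its own history of determinations and a taxon's distribution is computed from the specimens rather than stored. The class, field names, identifiers, and sample data are all hypothetical and do not reflect the design of any application cited above.

```python
from dataclasses import dataclass, field


@dataclass
class Specimen:
    """One specimen with its collecting country and determination history."""
    specimen_id: str
    country: str
    # Each determination is a (year, taxon_name, determiner) tuple.
    determinations: list = field(default_factory=list)

    def determine(self, year, taxon, determiner):
        """Record a new identification without discarding earlier ones."""
        self.determinations.append((year, taxon, determiner))

    def current_taxon(self):
        """Return the taxon name from the most recent determination."""
        return max(self.determinations, key=lambda d: d[0])[1]


def countries_for(taxon, specimens):
    """Derive the distribution of a taxon from current identifications."""
    return sorted({s.country for s in specimens if s.current_taxon() == taxon})


# Hypothetical data: two specimens first identified as "Taxon A".
specimens = [Specimen("X-0001", "Germany"), Specimen("X-0002", "France")]
for s in specimens:
    s.determine(1950, "Taxon A", "Determiner 1")

before = countries_for("Taxon A", specimens)

# The German specimen is later reidentified as "Taxon B".
specimens[0].determine(2005, "Taxon B", "Determiner 2")

after_a = countries_for("Taxon A", specimens)
after_b = countries_for("Taxon B", specimens)
```

Because the distribution is recomputed from the specimens, the answer to "does taxon A occur in Germany?" changes as soon as the reidentification is recorded, and the earlier determination, with its determiner and date, remains available for review. The cost, as noted above, is that every specimen becomes a data point.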

DATA SHARING AND INTEGRATION

The choices that are made in the adoption or development of software applications should be driven by the needs of the user. A database application used strictly for internal administration of a museum's collection (e.g., tracking loans, monitoring accessions, etc.) is required to meet only those immediate needs. Such administrative chores were much of the driving force in the early development of information models and database applications for biodiversity. It is rare that a single collection would have the breadth and depth to accurately document the distribution and biology of any significant taxon. For that purpose, integration of the information stored in collections, ideally all collections, is required.

Once data are ready to be shared with others, then a common language, a structure understood by both data provider and receiver, is needed for exchange and integration of information. This common language is built on data standards, and the development and implementation of standards is an active area for many data domains. Commonly, these standards are expressed as XML schemas.

Standards development for biodiversity informatics is coordinated largely through the Taxonomic Database Working Group (TDWG) (95) in conjunction with the GBIF (48). TDWG began with a rather narrow focus on taxonomy and botany. Despite its rather anachronistic name, it is a taxonomically broad, worldwide effort focused on the development, vetting, adoption, and distribution of standards for data. Inevitably, with the focus today on the Internet and electronic data sharing, the standards under development have moved toward XML schemas. These goals are relevant to all taxonomic groups, but TDWG membership remains biased toward the plant community and has little input from entomology.

TDWG: Taxonomic Database Working Group

The GBIF is an internationally supported organization with the mission of "...facilitating digitization and global dissemination of primary biodiversity data, so that people from all countries can benefit from the use of the information" (48). It has four primary work programs. Data Access and Database Interoperability aims to facilitate free global access to biodiversity information, establish data standards for biodiversity content and its exchange, develop a broad range of well-defined biodiversity data services, create linkages between biological and nonbiological information, and enable a global network to accelerate scientific investigation of global biodiversity. Digitisation of Natural History Collections seeks to facilitate the digitization of legacy and newly acquired primary species-occurrence data and the dynamic accessibility of the resulting data. The Electronic Catalogue of Names of Known Organisms provides Web access to the currently used names and synonyms for all 1.8 million described species of organisms. Outreach and Capacity Building seeks to ensure that all users access and use digital biodiversity information.

Two current XML schemas are widely used for occurrence data themselves: Darwin Core (29) and Access to Biological Collections Data (ABCD) (1). Darwin Core provides a limited set of basic elements associated with collections data. These elements both define a specimen record and serve as an access point (i.e., an element that can form the basis of a query). Extensions to the core can provide additional elements for specialized applications. The ABCD schema is highly detailed and aims to provide a complete set of data elements for natural history collection items. Although both schemas are applicable to any organism, ABCD has many more details that are taxon specific. Darwin Core is the most common standard used in the GBIF data provider network, probably partly due to its simplicity (see below).

ABCD: Access to Biological Collections Data
TCS: Taxonomic Concept Transfer Schema

The Taxonomic Concept Transfer Schema (TCS) (94) is designed to provide a mechanism to exchange data concerning the names of organisms. The basic element in the schema is the taxonomic concept, a structure that embodies the idea that a text string, usually but
not necessarily a formal scientic name, rep- of string such as the use of any number. But a
resents a particular authoritys assertion about piece of software cannot differentiate between
the circumscription of a taxon. Thus, it in- the meaning of the use of the term Head in
DELTA:
DEscriptive cludes not only the name itself, but also an ac- the phrase Head longer than wide. . . and
Language for cording to clause that can differentiate his- that in the phrase South Carolina: Hilton
Taxonomy torical or regional concepts of a taxon. TCS Head. . .. Attempts to develop XML stan-
SDD: structure of does not explicitly include the taxon itself dards that can be used to indicate the seman-
descriptive data in the model: Taxonomic concepts reference tic elements within taxonomic documents are
DiGIR: Distributed only the taxon implicitly. One concern with under way, but none have yet been formally
Generic Information this schema is that there is no clear criterion adopted. Weitzman & Lyal (104) are devel-
Retrieval to indicate when one authors concept of a oping taXMLit, using as a testbed the Bi-
taxon is sufciently different from anothers ologia Centrali-Americana, which is an impor-
to justify the recognition of a new concept. tant baseline of information on the ora and
Annu. Rev. Entomol. 2007.52:421-438. Downloaded from www.annualreviews.org

Uncontrolled concept ination would make fauna of the Americas. TaxonX (77) is under
this standard impractical. At the time of this development with the objective of delimiting
Access provided by 36.74.238.33 on 11/20/16. For personal use only.

writing, implementation of TCS is limited. the common semantic elements in taxonomic


Information on the intrinsic attributes of treatments such as the description, synonymy,
organisms, i.e., characters, is most simply en- and material-examined sections.
coded in a taxon-by-character matrix. A wide Today, the predominant means of querying
variety of formats have been used; many applications for phylogenetic analysis have converged on the Nexus standard (65). DEscriptive Language for Taxonomy (DELTA) (22, 25, 26) was designed specifically for computer processing of characters and the production of natural language descriptions. The XML schema SDD (structure of descriptive data) (92) is a standard recently adopted by TDWG that is designed to provide an exchange format for character data. Insofar as SDD meets with approval of the general community and tools are developed for its implementation, it can provide a mechanism for transparently moving data between different applications, ideally with no loss of information. As with TCS, there are yet few data providers or applications that implement SDD.

Although information on taxa and characters is largely the domain of expertise of the biodiversity community, an extensive infrastructure dealing with literature (the library and publishing communities) already exists. The standards they have developed, however, minimally address the actual content of a document and the meaning of the words or phrases within it. A document can be searched for a particular text string or, with slightly more programming, for a type

a Web-accessible database is probably the use of HTML forms. With these, the user enters, by typing or by selecting from a menu, the parameters of the query. These values are then passed on to the database management system. Although it is possible to write a program that can submit a query to such a database, essentially by mimicking the action of the form, the difficulty is that the query form for each data provider is unique and subject to frequent change in details. Thus, the automated query mechanism quickly falls out of synchrony with the individual data provider.

Standardized Web services can provide a more stable means of sending queries to data providers. Essentially, a Web service sends and receives data using the hypertext transfer protocol (HTTP), the same protocol used for typical Web pages. The Distributed Generic Information Retrieval (DiGIR) protocol (32) was developed by the biodiversity informatics community as a means of passing a query to a data provider and defining the structure of the desired response. The message is an XML document that defines the method (i.e., the action to be taken), the filter (i.e., the question being asked), and, optionally, the desired response structure. A limited number of basic methods are possible: search, metadata,


inventory, and status. For the purposes of querying a biodiversity data provider, the search option asks the provider for the data fields specified in the response section for all the records of specimens that match the filter. The filter is composed of access points, e.g., the fields of the Darwin Core, logical operators (and, or, not), and comparison operators (=, <, ≤, >, ≥, ≠, like). The code snippet below requests records for specimens from Ohio of the family Acrididae (whitespace is unimportant in XML).

UDDI: Universal Description, Discovery and Integration

GBIF provides a UDDI registry for its network of biodiversity data providers (49). As of this writing, over 9 million records are available from 167 providers around the world.

Some of the database applications available for handling collection data include the software needed to provide data using DiGIR protocols and Darwin Core fields. Alternatively, data provider software is available from GBIF (42). Probably few database applications directly use either the Darwin Core or ABCD structure to store data. Rather, they

<filter>
  <and>
    <equals><StateProvince>Ohio</StateProvince></equals>
    <equals><Family>Acrididae</Family></equals>
  </and>
</filter>
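
To make the structure of such a filter concrete, the following Python sketch assembles the same query with the standard-library xml.etree.ElementTree module. It is an illustration only: the helper name build_filter is hypothetical, and the sketch covers just the filter element shown above. The method and response-structure parts of a DiGIR message, and any XML namespaces a real provider presumably expects, are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_filter(criteria):
    """Build a <filter> element that ANDs together equality tests.

    `criteria` maps a field name (e.g., a Darwin Core field) to the value
    it must equal. Hypothetical helper, not part of any DiGIR toolkit.
    Assumption: all tests are placed under a single <and>; the article
    only shows the two-operand case.
    """
    flt = ET.Element("filter")
    conj = ET.SubElement(flt, "and")
    for field, value in criteria.items():
        equals = ET.SubElement(conj, "equals")
        ET.SubElement(equals, field).text = value
    return flt


# Same question as the snippet above: specimens of Acrididae from Ohio.
query = build_filter({"StateProvince": "Ohio", "Family": "Acrididae"})
xml_text = ET.tostring(query, encoding="unicode")
print(xml_text)
```

Because whitespace is unimportant in XML, the single-line string produced here is equivalent to the indented snippet.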

The DiGIR protocol is the primary query interface for the GBIF data provider network. An alternative query format, the Biological Collection Access Service (BioCASe) (10) was developed in conjunction with the ABCD schema. It is widely used among data providers in the European Union. TAPIR (TDWG Access Protocol for Information Retrieval) (93) is currently being developed to unite these two basic methods of querying collection databases.

Simple Object Access Protocol (SOAP) (13) is a protocol for messaging between computers. It is basically an XML document comprising an envelope, encoding for datatypes, and encoding for remote procedure calls and responses. The generality and wide use of SOAP messaging in the Internet world suggest that it will assume a prominent place in the future of biodiversity informatics.

The existence of a Web service on the Internet is of little value if potential users do not know that it is available. Universal Description, Discovery and Integration (UDDI) provides a mechanism for data providers to register the services that they provide, to indicate their location (URL) on the Internet, and to specify how to access those services. The

either map their internal structure to one of those XML schemas or save a snapshot of their data in a single table with the necessary fields. The internal structure of the database, therefore, is hidden from outside view. The data provider has full control over the access to data and can limit, modify, or even deny access to information on sensitive species. The data provider is also free to adopt or develop database software that best suits his/her needs and programming ability.

To combine results from a number of different data providers, a user first sends a DiGIR query to each data provider. Portal software that can use DiGIR and Darwin Core to query data providers and to aggregate the results is available (42). The replies, in the form of XML messages, can then be aggregated, sorted, and processed. Data for a specimen may be held in more than one database. Thus, some process is required to recognize that two records from different data sources actually refer to the same physical object. Within one collections database, the primary key uniquely identifies a specimen. In entomology, this key is usually a bar code (99); the standard recommended by the ECN is an alphabetic string, which identifies the


organization producing and storing the data (such as those in Reference 37), and a sequential number. This information should be printed both as a bar code and as human-readable text. However, when data are aggregated from different collections, and potentially from different disciplines, this combination may not ensure a globally unique identifier (GUID) for the specimen. The problem of defining such GUIDs is now being addressed through TDWG and GBIF (96).

DATA ANALYSIS

Species Richness

A primary purpose for digitizing and providing ready access to specimen-occurrence data is to use the information to address scientific or management questions. Character analysis is a comparatively advanced field, and many algorithms are available for phylogenetic analyses (81). The availability of new volumes and kinds of data has spawned new analytical tools that make it possible to address quantitatively longstanding, important questions for biodiversity.

The most straightforward measure of the number of species occurring in an area is a simple tabulation of species observed. For large-bodied, sessile, apparent organisms, such as trees, the primary practical limitations of this technique are the ability to identify the organism and the time and manpower available to find, examine, and identify each individual. Protocols are also well established in ornithology to conduct surveys, taking into account effort expended and the relative apparency of different species of birds (36, 73). Estimates of species richness of insects, however, must account for additional factors. The biology of many species is poorly known at best. Therefore, distinguishing vagrants from truly rare local species is hardly possible and apparency is difficult to quantify. A wide range of collecting techniques is available, each sampling with different efficiencies from the true full complement of species in the community. Most collecting techniques are capable of capturing such huge numbers of individuals that identifying all of them is impractical. Many species are unidentifiable to both the beginning student and the taxonomic expert. This list could continue, but the practical import is that there are very few localities for which the entire entomofauna is confidently known. Increased collecting effort almost invariably results in the discovery of additional species, as evidenced in species-accumulation curves, which, theoretically, should approach an asymptote as the count approaches the true value. In fact, because of time and energy constraints, typical curves at best show only a small reduction in the rate at which new species are collected.

Measures of species richness are statistical measures, and a number of algorithms have been developed to estimate the magnitude of richness, i.e., number of species, and confidence intervals on that estimate (21, 52). The varying assumptions of estimators produce, unsurprisingly, different results. There is no consensus on best estimators of species richness, and, typically, a range of estimators is used. Software (20) is available for using these estimators.

Geographic Distributions

One of the most important concepts to extract from species-occurrence data is a hypothesis of geographic distribution. Such models are critical for making land-use decisions, for predicting the probability of success for biological control introductions, and for predicting the potential negative impact of invasive species (see Reference 17 for more examples). However, geographic distributions of insects are rarely directly observable. Distributions are usually extrapolated from occurrence data.

The literature is rife with descriptions of the distribution of plants and animals across the planet. Rarely, however, are these hypotheses quantified or expressed with any precision. One method of representing a distribution is to place dots on maps, in which the dots


represent the locations where individuals were either collected or observed. A large number of mechanisms, from simple image manipulation programs to full-blown geographic information systems, are available for this purpose (68). The typical weaknesses of this methodology are that the map projection is unspecified; the actual collecting data are not specified; the area covered by the dot on the map may be very large; and the absence of dots may indicate either lack of collecting or true absence of the species. Sometimes authors attempt to supplement the dots-on-maps approach by drawing a bounding line to indicate the hypothesized limits to distribution. The problem here is that the decision of where to draw that line is based at best on unspoken assumptions or inside knowledge on the part of the author. At worst, these lines are entirely arbitrary.

Just as the actual number of species in a community, a geographic distribution is dynamic. Distributions change over time, perhaps in a cyclical fashion, perhaps in response to more general, long-term factors such as human population growth or climate change. A distribution, therefore, may not be expressed best as an area delimited by solid black lines, but as a probability surface, and those probabilities may fluctuate through time. The data that go into producing a model of the distribution should be accessible and the methods used to produce the model should be clearly stated.

The methods for modeling geographic distribution typically rely on the correlation between environmental variables and the observed presence or absence of the species at particular localities. Because of insects' vagility, lack of apparency, and difficulty in identification, it is rare that one can definitively state that a taxon does not occur in a locality. Presence data are more clear-cut; vagrancy is still an issue, but ideally this can be detected by measures of relative abundance. A commonly adopted approach is to model the niche of the species of interest (87) on the basis of environmental variables. The probability of occurrence, sometimes interpreted as its geographic distribution, then is a function of the values of those variables. Some models, such as BIOCLIM (74), define a set of variables for predicting the distribution. A typical set might include a combination of temperature and precipitation values, e.g., average annual temperature, maximum monthly precipitation, precipitation in warmest quarter of the year, etc. The value for each of the variables is determined for each of the localities from which specimens have been collected or observed. Suitable habitat is then defined as those geographic areas in which the value for each of the variables falls between the maximum and minimum values observed for the specimens. A distinction between marginal and more suitable habitats may be made by defining the latter as being bounded by, for example, the ninety-fifth and fifth percentile values for each variable, and thus taking into account the relative abundance of specimens and disregarding individual extreme values.

An algorithm such as BIOCLIM is relatively simple to define and calculate when the environmental data are available. It has some important drawbacks, however. The number of variables possible is unlimited, and there is no a priori manner to determine which are best at predicting the distribution of any given species. Absence data are typically not included, even if available. The model identifies only areas in which, according to the variables used, the habitat is suitable for the species to occur. It fails to factor in whether those areas are accessible to the species in question (87).

GARP: Genetic Algorithm for Ruleset Production

Another class of models uses methods of artificial intelligence to predict distributions. Basically these methods divide the dataset of observations of presence or absence into two parts: a training set and a test set. A model is constructed from the data in the training set; its accuracy in predicting distribution is tested against the test set. The model is then modified, retested, modified, retested, etc., until an acceptable level of accuracy is reached. One widely used method, Genetic Algorithm for Ruleset Production (GARP) (31), uses an


algorithm that mimics the process of natural selection to modify and, ultimately, improve models. Such a methodology allows the researcher to use any set of environmental variables as predictors of distribution. For example, the network of highways within the United States might not, at first, appear to have much to do with predicting the distribution of a species, but the distance from a road may in fact be a good predictor of the presence of a weed species. Ultimately, though, even these models depend on the extent of the environmental layers against which the presence-absence data are compared. A desktop version of GARP modeling software is available on-line (31). A large number of modeling techniques are now available; Segurado & Araujo (85) provide an evaluation of several of these techniques, and Stockman et al. (91) specifically critique the accuracy of the GARP modeling approach.

One advantage of explicit models of distribution is that they allow the researcher to ask "what if" questions. How, for example, might the distribution of an endangered species be affected by an increase in average annual temperature (67, 79, and references therein)? How far might an exotic species spread if it invades the country and becomes established (80)? A caveat is necessary, though: The scale at which the primary data were recorded and georeferenced, i.e., their accuracy and precision, must be compatible with the scale at which the results are to be used. Data for which there is an error of 1 km are inappropriate for mapping the distribution of individuals of an endangered species. This is another example of the importance of recording estimated georeferencing errors so that users can assess the fitness for use of the data (16, 18).

Discussions of endemism typically begin with the a priori definition of an area of interest; for example, how many species are endemic to the Everglades National Park? However, endemism is essentially the reciprocal of the area of a species' distribution: Taxa with limited distributions are highly endemic, and taxa with wide distributions are not (105). With explicit models of distribution, therefore, the relative endemism of a taxon can be quantified and the total level of endemism and extent of areas of endemism can be identified on biological rather than political terms.

A number of organizations have, with great fanfare, drawn the spotlight to so-called biodiversity hotspots, areas with reportedly high values of species richness or endemism. The underlying empirical basis and methodologies for such delimitations are rarely critically examined. The availability and application of objective, testable models of distribution would strengthen the scientific credibility of such demarcations and enable researchers to test the proposition that subsets of taxa (such as flowering plants, vertebrates, or butterflies) are capable of serving as proxies for overall richness and endemism.

LifeMapper (63) was a recent, now suspended, effort to build upon federated biodiversity databases to build a library of distribution models. This project included a distributed computing component in which the calculation of GARP models was spread through the world community as a screen saver computation. The lack of a sufficient body of electronically accessible data that conformed to community data standards has put the project on hold. However, the library of distributions, although currently off-line, is a community resource that could support a wide range of fruitful research.

Identifications

The classical tools for the identification of organisms are images and dichotomous keys. Optimally, these two resources are used together to enable users to tap into and apply the collective expertise of the community of taxonomic specialists. The traditional reliance on hard-copy publications, however, placed limits both on the numbers of images that could be used and on the structure of the keys. Interactive, or multiple-entry, keys provide a ready mechanism to sidestep such limitations (28).
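
The multiple-entry idea can be illustrated with a minimal Python sketch: the user may supply observed character states in any order, and the candidate taxa are narrowed accordingly. The taxa, characters, and states in MATRIX and the function names remaining_taxa and informative_characters are invented for illustration. This is a toy model of the principle under those assumptions, not the implementation of any published key program.

```python
# Hypothetical character matrix: taxon -> {character: state}.
MATRIX = {
    "Taxon A": {"wings": "present", "antennae": "filiform", "tarsi": "5-segmented"},
    "Taxon B": {"wings": "present", "antennae": "clavate", "tarsi": "5-segmented"},
    "Taxon C": {"wings": "absent", "antennae": "filiform", "tarsi": "3-segmented"},
}


def remaining_taxa(matrix, observed):
    """Return taxa whose recorded states agree with every observed state.

    A taxon with no recorded state for a character is retained, because
    missing data cannot be used to exclude it.
    """
    return sorted(
        taxon
        for taxon, states in matrix.items()
        if all(states.get(char, value) == value for char, value in observed.items())
    )


def informative_characters(matrix, candidates):
    """List the characters that still vary among the candidate taxa."""
    chars = {char for taxon in candidates for char in matrix[taxon]}
    return sorted(
        char
        for char in chars
        if len({matrix[taxon].get(char) for taxon in candidates}) > 1
    )


# Unlike a dichotomous key, the user can start from any character.
candidates = remaining_taxa(MATRIX, {"tarsi": "5-segmented"})
print(candidates)
print(informative_characters(MATRIX, candidates))
```

After each observation, informative_characters suggests which characters are still worth scoring, which is one way an interactive key can guide the user without imposing a fixed sequence of couplets.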


Such tools have a relatively long history, with early ones often marketed as expert systems. Over the past 40 years many different applications have been developed (24), and new ones continue to be released.

The DELTA project developed Intkey (22, 26, 27), one of the most widely used interactive key applications. Lucid (64) is a recent commercial application that is becoming widely used in the entomological community. Recent releases allow the key to be run from a Web browser in addition to its stand-alone version. Lucid 3 and the Electronic Field Guide project (35) can use data in SDD format, thus freeing the application from a unique data format and potentially permitting data to be freely transported to and from other SDD-compliant applications. Dallwitz (23) provides a comparison of some interactive key programs.

Images form an important part of many biodiversity studies, not only in identification keys, but also in descriptions of taxa and habitat, morphometric studies, and even documentation of data-capture from specimen labels, especially if the labels are written in non-Latin alphabets or in nonalphabetic languages. Dissemination of high-resolution images over the Internet is usually constrained by bandwidth considerations. At the very least, a user should be warned of the size of the image that is being accessed. MorphBank (69) aims to provide a secure, replicated archive of imagery used in biological studies. Metadata standards for such imagery, above and beyond that associated with the technical details of the image itself, are being developed through the TDWG/GBIF process.

THE ROAD AHEAD

A large number of initiatives are underway that will probably be felt within the entomological community in coming years, and only a few can be mentioned here. The Unified Biosciences Information Framework (102) attempts to define a common foundation for many of the TDWG/GBIF standards that have been adopted or are under development. The Universal Biological Indexer and Organizer Project (103) is working to develop tools that can contend with the practical difficulties of taxonomy and effectively use the names of organisms to find and organize the published information on every aspect of their biology. The International Commission on Zoological Nomenclature has set as one of its goals the development of an on-line registry for all animal names (82). Such a registry, optimally, would provide a mechanism for researchers to locate all newly proposed names, as well as documentation of the descriptions and specimens used to validate those names. Many of these projects, and more, are accessible primarily through the World Wide Web: through project sites, wikis (forums for interactive discussion), and blogs. Other forums for new developments include the electronic journal Biodiversity Informatics (8), the GBIF home page (48), and meetings and Web sites of groups interested in these issues, such as TDWG (95) and ECN.

Entomology collections around the world probably house more specimens and more data than any other subgroup of natural history museums. This information is important not only for the discipline of entomology itself, but for wider concerns of conservation, ecology, and evolution. Yet these data remain hidden within the collections. The data management tools of biodiversity informatics offer a powerful means to address such issues and to enable more effective stewardship of the significant investments in time and money represented by a collection. The development and broad implementation of community standards will make it possible to avoid the problem of becoming trapped in an obsolescent software application. As funding agencies emphasize digitization of collection holdings and providing reasonable open Internet access to the data (71a), both curators and their administrators can embrace these technologies as a critical enhancement of their mission and not as a secondary waste of time.


Similarly, systematic entomology in particular has much to gain by embracing these tools. Godfray (51) has offered some interesting suggestions for increasing the currency and availability of taxonomic information. Tools of biodiversity informatics make it possible for systematists to document, unconstrained by publishing costs, the material foundation upon which their work is based. This is a positive move away from argument by authority and toward a more accountable, data-driven science.

Finally, in the face of a growing human population, habitat loss, and global climatic change, the task of biodiversity discovery and conservation is an urgent imperative for the current generation. In the nearly 250 years since the publication of the tenth edition of Systema Naturae, we have collectively recognized no more than 30% of the range of organisms with which we share the planet (91a). The needed technologies are available and inexpensive and have the potential to enhance dramatically both scientific productivity and relevance to society. All that remains is for this generation of researchers and curators to grasp the opportunity and collectively deal with the inevitable bottlenecks and roadblocks that will appear.

SUMMARY POINTS
1. The rich holdings of the world's entomological collections are an irreplaceable resource of data necessary for understanding insect diversity and biology.
2. Tools and standards of biodiversity informatics can provide the means for effective dissemination of biodiversity data.
3. Entomologists, as a broad generalization, have not been extensively involved in the development and use of these standards and tools. Thus, the experience and requirements of the entomological community (both data providers and users) may be marginalized.
4. The integration of biodiversity informatics into the practice of systematics and collection curation promises to dramatically accelerate and improve the process of species discovery and description in the near future.

ACKNOWLEDGMENTS
Thanks to L. Musetti and D. Agosti for fruitful discussions. This material is based upon work
supported in part by the National Science Foundation under grant No. DEB-3044034.

LITERATURE CITED
1. ABCD schema 2.06. http://www.bgbm.org/TDWG/CODATA/Schema/
2. Alexandria Digital Library (ADL) geospatial network. http://clients.alexandria.ucsb.edu/webclient/index.jsp
3. Association of Systematics Collections. 1993. Committee on Computerization and Networking. An information model for biological collections. http://www.nscalliance.org/bioinformatics/asc%20model/Ascmodrpt.pdf
4. Australian faunal directory. http://www.deh.gov.au/biodiversity/abrs/online-resources/fauna/afd/index.html
5. Austrian map online. http://www.austrianmap.at


6. Bellinger PF, Christiansen KA, Janssens F. 2006. Checklist of the Collembola of the world. http://www.collembola.org/
7. Berendsohn W, ed. 2005. Standards, information models, and data dictionaries for biological collections. http://www.bgbm.org/TDWG/acc/Referenc.htm
8. Biodiversity Informatics. http://jbi.nhm.ku.edu/index.php/jbi
9. BioGeomancer. http://www.biogeomancer.org/
10. Biological collection access services. http://www.biocase.org/
11. Bisby F. 1995. Plant names in botanical databases. Plant taxonomic database standards No. 3. Pittsburgh: Hunt Institute Botanical Documentation. 30 pp. http://www.tdwg.org/plants.html
12. Brummitt RK, Powell CE. 1992. Authors of Plant Names. Kew, UK: Royal Botanic Gardens. 732 pp.
13. Box D, Ehnebuske D, Kakivaya G, Layman A, Mendelsohn N, et al. 2000. Simple object access protocol (SOAP) 1.1. http://www.w3.org/TR/2000/NOTE-SOAP-20000508/
14. Canadian geographical names data base (CGNDB). http://geonames.nrcan.gc.ca/
15. Chapman AD. 2005. Principles and methods of data cleaning: primary species and species-occurrence data, version 1.09. Report for the Global Biodiversity Information Facility, Copenhagen. http://www.gbif.org/prog/digit/data quality
16. Chapman AD. 2005. Principles of data quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. http://www.gbif.org/prog/digit/data quality
    [Annotation: An excellent primer on the important issues surrounding quality control and assurance for data providers.]
17. Chapman AD. 2005. Uses of primary species-occurrence data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. http://www.gbif.org/prog/digit/data quality
    [Annotation: An extensive and thorough illustration of the scientific and practical importance of data from museum specimens and observations.]
18. Chrisman NR. 1983. The role of quality information in the long-term functioning of a GIS. Proc. AUTOCARTO 6 2:303-21
19. Colwell RK. 1996. Biota: the biodiversity database manager. Sunderland, MA: Sinauer. 574 pp. http://viceroy.eeb.uconn.edu/Biota
20. Colwell RK. 2005. EstimateS 7.5 User's guide. http://viceroy.eeb.uconn.edu/estimates
21. Colwell RK, Coddington JA. 1994. Estimating terrestrial biodiversity through extrapolation. Philos. Trans. R. Soc. London B 345:101-18
22. Dallwitz MJ. 1980. A general system for coding taxonomic descriptions. Taxon 29:41-46
23. Dallwitz MJ. 2005. A comparison of interactive identification programs. http://www.delta-intkey.com/
24. Dallwitz MJ. 2006. Programs for interactive identification and information retrieval. http://delta-intkey.com/www/idprogs.htm
25. Dallwitz MJ, Paine TA. 2005. Definition of the DELTA format. http://www.delta-intkey.com/www/standard.pdf
26. Dallwitz MJ, Paine TA, Zurcher EJ. 1993. User's guide to the DELTA system: a general system for processing taxonomic descriptions, fourth edition. http://www.delta-intkey.com/
27. Dallwitz MJ, Paine TA, Zurcher EJ. 1995. User's guide to Intkey: a program for interactive identification and information retrieval. http://www.delta-intkey.com/
28. Dallwitz MJ, Paine TA, Zurcher EJ. 2000. Principles of interactive keys. http://www.delta-intkey.com/
29. Darwin Core. http://digir.net/schema/conceptual/darwin/2003/1.0/darwin2.xsd
30. Date CJ. 2004. An Introduction to Database Systems. Boston: Pearson/Addison Wesley. 983 pp. 8th ed.
31. DesktopGarp. http://www.lifemapper.org/desktopgarp/


32. Distributed generic information retrieval (DiGIR). http://digir.net/


33. Eades DC, Otte D, Naskrecki P. 2006. Orthoptera Species File Online, version 2.3.
http://osf2x.orthoptera.org/osf2.3/OSF2X2Frameset.htm
34. Electronic catalogue of names of known organisms. http://www.gbif.org/prog/ecat
35. Electronic eld guide project. http://efg.cs.umb.edu/
36. Estimation of community dynamics from breeding bird survey data using program COMDYN.
http://www.mbr-pwrc.usgs.gov/software/estimation of bird community dyn.htm
37. Evenhuis NL, Samuelson GA. 2004. The insect and spider collections of the world website.
http://hbs.bishopmuseum.org/codens/codens-r-us.html
38. Extensible markup language (XML). http://www.w3.org/XML/
39. Fauna Europaea. http://www.faunaeur.org
40. FLEAS: search interface. http://www.zin.ru/Animalia/Siphonaptera/taxfind2.htm
41. Gazetteer of Australia. http://www.ga.gov.au/map/names/
42. GBIF tools download. http://www.gbif.org/serv/gbif-tools
Annu. Rev. Entomol. 2007.52:421-438. Downloaded from www.annualreviews.org

43. GeoArgentina. http://www.geoargentina.com.ar/catalog/searchadv.htm


44. GeoDatenZentrum. http://www.geodatenzentrum.de/geodaten/gdz rahmen.gdz div
Access provided by 36.74.238.33 on 11/20/16. For personal use only.

45. Geographic names information system (GNIS). http://geonames.usgs.gov/domestic/


index.html
46. GEOLocate. http://www.museum.tulane.edu/geolocate/
47. GeoNet names server (GNS). http://earth-info.nga.mil/gns/html/
48. Global Biodiversity Information Facility. http://www.gbif.org/
    One of the main foci for monitoring and contributing to developments in biodiversity informatics.
49. Global Biodiversity Information Facility UDDI registry. http://registry.gbif.net/uddi/web
50. Global gazetteer version 2.1. http://www.fallingrain.com/world/
51. Godfray HCJ. 2002. Challenges for taxonomy. Nature 417:17-19
52. Gotelli NJ, Colwell RK. 2001. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecol. Lett. 4:379-91
53. Guralnick RP, Neufeld D. 2005. Challenges building online GIS services to support global biodiversity mapping and analysis: lessons from the Mountain and Plains Database and Informatics project. Biodivers. Inform. 2:57-69
54. Hollis S, Brummitt RK. 1992. World geographical scheme for recording plant distributions. Plant taxonomic database standards No. 2. Version 1.0. Pittsburgh: Hunt Institute for Botanical Documentation. http://www.bgbm.fu-berlin.de/TDWG/geo/default.htm
55. Holmgren PK, Holmgren NH, Barnett LC. 1990. Index Herbariorum. Part I: The Herbaria of the World. New York: New York Botanical Garden Press. 8th ed. http://sciweb.nybg.org/science2/IndexHerbariorum.asp
56. Index to the Neuropterida species of the world. Version 1.00. http://insects.tamu.edu/research/neuropterida/neur sp index/ins search.html
57. Integrated taxonomic information system. http://www.itis.usda.gov/
58. Instituto Nacional de Estadística Geografía e Informática (INEGI). http://mapserver.inegi.gob.mx/dsist/municipios/iter95.cfm?c=365
59. Istituto Geografico Militare. http://www.igmi.org/prodotti/elementi geodetici/ricerca punti.php
60. Johnson NF, ed. 2005. Hymenoptera name server, version 1.0. http://atbi.biosci.ohio-state.edu:210/hymenoptera/nomenclator.home page
61. KE EMu. http://www.kesoftware.com/emu/index.html
62. Lampe KH, Striebing D. 2005. How to digitize large insect collections: preliminary results of the DIG project. In African Biodiversity: Molecules, Organisms, Ecosystems, ed. BA Huber, BJ Sinclair, KH Lampe, pp. 385-93. New York: Springer Science/Business Media. 443 pp.


63. LifeMapper. http://www.lifemapper.org/
64. Lucidcentral.org. http://www.lucidcentral.org/
65. Maddison DR, Swofford DL, Maddison WP. 1997. NEXUS: an extensible file format for systematic information. Syst. Biol. 46:590-621
66. MaNIS/HerpNet/ORNIS georeferencing guidelines. http://manisnet.org/GeorefGuide.html
67. Martínez-Meyer E. 2005. Climate change and biodiversity: some considerations in forecasting shifts in species' potential distributions. Biodivers. Inform. 2:42-55
68. Mitchell T. 2005. Web Mapping Illustrated. Cambridge: O'Reilly. 367 pp.
69. MorphBank. http://www.morphbank.com/
70. Multimap. http://uk8.multimap.com/
71. Musciano C, Kennedy B. 1998. HTML: The Definitive Guide. Cambridge: O'Reilly. 587 pp.
71a. National Science Foundation. 2006. Biological Research Collections (BRC), program solicitation. http://www.nsf.gov/pubs/2006/nsf06569/nsf06569.htm
72. New Zealand geographic placenames database. http://www.linz.govt.nz/core/placenames/searchplacenames/
73. Nichols JD, Boulinier T, Hines JE, Pollock KH, Sauer JR. 1998. Inference methods for spatial variation in species richness and community composition when not all species are detected. Conserv. Biol. 12:1390-98
74. Nix HA. 1986. A biogeographic analysis of Australian elapid snakes. In Atlas of Australian Elapid Snakes, ed. R Longmore, Australian Flora and Fauna Series 7:4-15. Canberra: Aust. Govt. Pub. Serv. 115 pp.
75. Nomina insecta nearctica. http://www.nearctica.com/nomina/main.htm
76. Noyes JS. 2005. Universal Chalcidoidea database. http://internt.nhm.ac.uk/jdsml/perth/chalcidoids/
77. NSF taxonomic literature project: treatment markup. http://research.amnh.org/informatics/taxlit/schemas
78. Penny ND. 1997. World checklist of extant Mecoptera species. http://www.calacademy.org/research/entomology/Entomology Resources/mecoptera/index.htm
79. Peterson AT, Ortega-Huerta MA, Bartley J, Sanchez-Cordero V, Soberon J, et al. 2002. Future projections for Mexican faunas under global climate change scenarios. Nature 416:626-29
80. Peterson AT, Scachetti-Pereira R, Hargrove WW. 2004. Potential geographic distribution of Anoplophora glabripennis (Coleoptera: Cerambycidae) in North America. Am. Midl. Nat. 151:170-78
81. Phylogeny Programs. http://evolution.genetics.washington.edu/phylip/software.html
82. Polaszek AD, Agosti D, Alonso-Zarazaga M, Beccaloni G, Bjørn PP, et al. 2005. A universal register for animal names. Nature 437:477
83. Ross ES. 1999. World list of extant and fossil Embiidina (=Embioptera). http://www.calacademy.org/research/entomology/Entomology Resources/embiilist/embiilist.html
84. Schorr M, Lindeboom M, Paulson D. 2005. World Odonata list. http://www.ups.edu/x6140.xml
85. Segurado P, Araujo MB. 2004. An evaluation of methods for modelling species distributions. J. Biogeogr. 31:1555-68
86. Soberon J, Peterson AT. 2004. Biodiversity informatics: managing and applying primary biodiversity data. Philos. Trans. R. Soc. London B 359:689-98


87. Soberon J, Peterson AT. 2005. Interpretation of models of fundamental ecological niches and species' distributional areas. Biodivers. Inform. 2:1-10
    A useful summary of considerations in modeling geographic distributions with recommendations for best practices.
88. South African geographical names system. http://sagns.dac.gov.za/
89. Species 2000. http://www.sp2000.org/
90. Specify software project. http://www.specifysoftware.org/Specify
91. Stockman AK, Beamer DA, Bond JE. 2006. An evaluation of a GARP model as an approach to predicting the spatial distribution of nonvagile invertebrate species. Divers. Distrib. 12:81-89
91a. Stork NE. 1997. Measuring global biodiversity and its decline. In Biodiversity II: Understanding and Protecting our Biological Resources, ed. ML Reaka-Kudla, DE Wilson, EO Wilson, pp. 41-68. Washington, DC: Joseph Henry Press. 551 pp.
92. Structure of descriptive data WIKI. http://wiki.tdwg.org/twiki/bin/view/SDD/WebHome

93. TAPIR protocol wiki. http://ww3.bgbm.org/protocolwiki/FrontPage
94. Taxonomic concept transfer schema. http://tdwg.napier.ac.uk/index.php?pagename=HomePage
95. Taxonomic database working group. http://www.tdwg.org/
    Forum for discussions and development of standards for biodiversity data.
96. TDWG: globally unique identifiers. http://wiki.gbif.org/guidwiki/wikka.php?wakka=HomePage
97. The Zoraptera database: catalog of the order Zoraptera. http://www.famu.org/zoraptera/catalog.html
98. Thompson FC, ed. 1990. Automatic Data Processing for Systematic Entomology: Promises and
Problems. Washington, DC: Entomological Collections Network. 48 pp.
99. Thompson FC. 1994. Bar codes for specimen data management. Insect Collect. News 9:2-4
100. Thompson FC, ed. 2005. Biosystematic database of world Diptera, version 7.5. http://www.sel.barc.usda.gov/Diptera/biosys.htm
101. Trichoptera world checklist. http://entweb.clemson.edu/database/trichopt/index.html
102. Unified biosciences information framework. http://wiki.cs.umb.edu/twiki/bin/view/UBIF/WebHome
103. Universal biological indexer and organizer. http://www.ubio.org/
104. Weitzman AL, Lyal CHC. 2004. An XML schema for taxonomic literature: taXMLit. http://www.sil.si.edu/digitalcollections/bca/documentation/taXMLitv1-3Intro.pdf
105. Williams P, Gibbons D, Margules C, Rebelo A, Humphries C, Pressey R. 1996. A comparison of richness hotspots, rarity hotspots, and complementary areas for conserving diversity of British birds. Conserv. Biol. 10:155-74

