DNA Chip

The advent of DNA microarray technology has ushered in an era of systems biology whereby researchers can study the transcriptional behavior of thousands of genes in parallel. Advances in manufacturing techniques and informatics, and the availability of several genome sequences, have furthered these capabilities to the point where whole-transcriptome studies can be accomplished in yeast, flies and plants, and soon will be possible in mammals. Concomitant with the expanding ability of the technology has been the development of novel techniques and their application towards the study of cellular biology.

Corresponding author: John B. Hogenesch ([email protected]).
* This article is the third review in our ‘Interdisciplinary Biology’ series that commenced in the January 2003 issue of TCB. eds.
http://ticb.trends.com 0962-8924/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved. doi:10.1016/S0962-8924(03)00006-0
152 Review TRENDS in Cell Biology Vol.13 No.3 March 2003

Recent years have seen an exponential rise in the number of studies that have used DNA chip technology to study cell biology. Built around the basic principles of nucleic acid hybridization [1], DNA chips have so far been used most often to detect steady-state mRNA expression. These applications include basic molecular annotation (e.g. Where is my gene? When does it change? Which other genes change?), discovering disease markers and advancing the prediction of clinical outcome, as well as a growing role as the tool of choice for studying transcriptional output. New applications are being developed that will enable other areas of biology to be explored in the same, highly parallel manner. These include genomic DNA analysis to detect DNA synthesis, recombination, and chromosomal duplication and loss, as well as combined chromatin immunoprecipitation and microarray approaches to identify transcription factor-binding sites. Finally, alongside sequence query tools, the standardization of data formats and new display and query tools will further both the penetrance of these techniques and their applications in molecular biology.

Excellent descriptions of the methodologies and platforms for DNA microarray technologies, and of publication formats, are available elsewhere (e.g. see [2–4]). This review therefore focuses on the use of these technologies towards a better understanding of cell biology (outlined in Fig. 1).

[Fig. 1 schematic, boxes: Experiment design; Data analysis (statistical analysis, curve fitting, SOMs, clustering); External databases; Classification schemes; Expression validation; Functional clustering (mapping to pathways); Mapping to the genome; Regulatory element analysis within clusters; Comparative genomics (within same species, between species); Discovery/new hypothesis; Test of hypothesis.]

Fig. 1. DNA microarrays in systems biology. A typical microarray experiment is designed to measure the spatial and/or temporal expression pattern of genes in a specific set of conditions. After data acquisition, analysis is done to identify those genes that change informatively in an experiment. Expression changes can be validated, and the genes themselves can be mapped to biological pathways or the genome, used for regulatory element analysis or even analyzed across model systems. Several analysis strategies can be used to identify genes in a given process and/or to model regulatory networks. The adoption of standards in data formatting and annotation, as well as new databases to disseminate gene expression information, will facilitate analysis within and across different species. Models and/or hypotheses built on microarray experiments are eventually tested by conventional approaches to ultimately generate new knowledge in systems biology.

Use of arrays in the study of mRNA expression

DNA arrays have classically been used to investigate the effect of a given biotic or abiotic perturbation on the transcriptional output of a system. Underlying these experiments is the notion that analyzing the response of a system to a given perturbation can shed light on the mechanism of signaling, the biological response to the perturbation, or both. One of the best examples of a systematic genome-wide study comes from yeast. More than 20 different genetic and growth-condition perturbations have been analyzed in yeast and used to construct the galactose utilization regulatory network [5]. Using DNA microarrays and quantitative proteomics, changes in mRNA and protein expression were measured in response to each of the conditions. These mRNA and protein expression data, along with data on protein–protein interactions, were used to construct a metabolic regulatory network that confirmed many known regulatory mechanisms and uncovered several putative regulatory steps.

A separate systematic study in Arabidopsis has identified a transcriptional network underlying a plant’s response to pathogens [6]. Infection with avirulent pathogens elicits a broad-spectrum ‘systemic acquired resistance’ throughout the plant. This response requires the accumulation of salicylic acid and is marked by activation of a set of ‘pathogenesis-related’ (PR) marker genes. Transcriptional profiling of Arabidopsis plants under more than a dozen conditions of genetic perturbation and microbial challenge identified a large cluster of genes that are coexpressed with PR-1. Promoter analysis of these putative key mediators of systemic acquired resistance identified an over-represented W-box motif, TTGAC, which is a binding site for transcription factors of the WRKY class.

In certain areas of biology, such as development, aging, the cell cycle and biological rhythms, experiments based on temporal gene expression profiling have advanced our understanding of these processes. The cell cycle [7–9] and circadian rhythms [10–13] represent some of the best examples of successful temporal expression profiling. This is partly because these processes produce repeating patterns, to which curve-fitting techniques can be powerfully applied. In many independent studies, hundreds of transcripts have been found to show rhythmic patterns in their steady-state message levels, with a periodicity very close to that of the cell cycle (e.g. see [7]). These transcripts have been classified into separate clusters according to the stage of the cell cycle at which their expression peaks. In silico analysis of the promoter elements of these gene clusters has identified over-represented 5′ motifs, many corresponding to binding sites for known transcription
factors and some corresponding to previously unknown motifs, suggesting the involvement of additional transcription factors in regulating the cell cycle.

One study compared gene expression patterns between wild-type yeast and a strain containing null mutations in two members of the highly conserved forkhead family of transcription factors. This comparison revealed their roles in regulating the cell-cycle-dependent expression patterns of genes important for mitosis [14]. A similar study of the circadian system in Arabidopsis identified hundreds of genes showing rhythmic patterns in their steady-state mRNA levels [12]. Analysis of the promoter regions of clusters of genes cycling in the evening phase identified an over-represented cis-acting 5′ motif. This motif is very similar to the binding site for transcription factors containing a Myb domain in mammalian cells. Genetic disruption of this motif abolished rhythmic expression, establishing that this cis-acting element has a crucial role in regulating circadian gene expression. Collectively, these studies underscore the power of combining large-scale, parallel experimental approaches such as DNA arrays with computational and validation tools.

Microarrays in the study of human disease

An increasingly popular application of DNA arrays is in the study of human disease. The central goals of these studies have been the early detection of disease pathology, diagnosis (including class and outcome prediction) and the identification of causal disease genes themselves. For diagnostic markers of human diseases such as cancer, clinicians have traditionally relied on altered expression levels of serendipitously discovered genes and proteins. For example, increased serum levels of prostate-specific antigen (PSA) have long been used as an indicator of prostate cancer. Similarly, overexpression of the tyrosine kinase growth factor receptor Erb-B2 has been linked to breast cancer. Such markers are typically limited, however. Increased serum PSA can also result from benign diseases. Overexpression of Erb-B2 occurs in only a fraction of all breast tumors and does not fully predict an individual’s response to Erb-B2 antagonists such as Herceptin. Genomic and bioinformatic approaches based on microarrays can therefore supplement existing tools to produce more accurate diagnoses.

Proof of principle that these methods can identify disease-specific markers comes from the study of two types of acute leukemia: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). Both disorders are treatable by traditional chemotherapy; however, successful treatment depends largely on correct diagnosis. To find a distinct molecular signature for these two diseases, oligonucleotide microarrays were used to identify a set of 50 genes that can differentiate between AML and ALL with great accuracy [15]. These methods have also been extended to the analysis of several classes of tumor. For example, Su et al. [16] have established a list
of about 110 genes that are highly characteristic, and therefore diagnostic, of colon, bladder, kidney, liver, pancreas, ovary, prostate, lung, gastric and breast cancers. More recently, it has been shown that these methods can be applied to predicting disease outcome. Some individuals affected with ALL also have a chromosomal translocation in the mixed-lineage leukemia (MLL) gene. Unfortunately for these people, this additional translocation is linked to relapse after chemotherapy and a poor prognosis for survival. Array profiling of leukemic cells taken from individuals with ALL or ALL/MLL identified a predictor set of 100 genes whose expression patterns can differentiate between ALL and ALL/MLL [17]. This profiling also highlighted the marked changes in gene expression that can accompany chromosomal translocations. Many challenges remain in translating these techniques from the research laboratory to the clinic. Their potential benefits, however, warrant the considerable attention that these studies have received.

Applying DNA chip technology to experiments aimed at understanding disease causality has proved more difficult. When coupled with transgenic model organisms, however, microarrays can provide valuable mechanistic insight into human disease. For example, microarrays were used to compare poorly metastatic and highly metastatic melanoma lines isolated by in vivo selection in mice. These comparisons detected a strong correlation between the severity of tumor metastasis and expression of the small GTPase RhoC [18]. In follow-up experiments, overexpression of RhoC alone in weakly metastatic melanoma cells enhanced metastasis in the mouse. Microarrays thus identified a putative causal role for RhoC in the development of metastases.

DNA microarrays have also been used to identify contributing factors in the development of the neurological disease multiple sclerosis. This was done by determining changes in gene expression in the brains of rats with experimentally induced autoimmune encephalomyelitis (EAE), the best-characterized animal model for multiple sclerosis [19]. This analysis identified increased expression of the proinflammatory cytokine osteopontin. Osteopontin has also been found to be overexpressed in the brain lesions of individuals affected with multiple sclerosis. Investigation of osteopontin-deficient mice revealed a requirement for osteopontin in the normal progression and maintenance of EAE. Genes found to be differentially expressed in the initial microarray profiling of disease states, such as osteopontin and RhoC, could serve as drug targets. Their inhibition might provide relief for the many people afflicted with devastating diseases such as cancer and multiple sclerosis.

Gene annotation

One of the most basic uses of arrays in the study of cell biology is the annotation of gene function by mRNA expression. After a new gene has been cloned, basic characterization has traditionally included the analysis of mRNA expression by a multiple-tissue northern blot. This is sometimes followed by higher-resolution techniques such as in situ hybridization. These studies provide valuable information confirming the size and nature of transcripts derived from a structural gene. They also rest on the implicit idea that understanding where and when a gene is expressed sheds light on its physiological function.

The recent sequence assembly of mammalian genomes and subsequent efforts at gene prediction have highlighted the need for a higher-throughput approach to expression annotation. For example, we and others have looked at the expression of thousands of transcripts from the mouse across 50 types of tissue. This work has provided insight into global patterns of transcription, and the data have been made available in public databases (see [20]).

New methods to predict genes in silico, and the application of these methods to several recently solved genome sequences, have brought about another use for mRNA expression, namely transcript validation [21,22]. This method has several advantages over the use of expressed sequence tag (EST) sequences to quantify gene expression. Not least of these is that it can be directed to any particular target sequence, and thus can be used to validate the expression of hypothetical, predicted genes.

Transcriptional output and mechanism

An obvious use for DNA arrays is in the study of transcriptional output. By relating transcription factors to the output genes that they regulate, it is possible to construct transcriptional regulatory networks that link key factors to the biology that they control. By characterizing transcription factor deficiency and overexpression in cell lines and mice, several groups have begun to describe such networks. For example, oligonucleotide microarrays have been used to compare the transcriptional readouts of wild-type mice and transgenic mice with a null mutation in TAFII105 [23]. TAFII105 encodes a cell-type-specific component of the RNA polymerase II transcription factor TFIID. The expression of genes from the inhibin–activin–follistatin folliculogenesis pathway is markedly downregulated in the mutant strain. This explains the defects in ovarian development and fertility found in TAFII105 knockout mice.

Microarray analysis of transgenic organisms has not been limited to knockout strains. Specific activation of the forkhead transcription factor FOXO3a in rat fibroblasts was found to induce the expression of genes involved in cellular responses to stress [24]. One such gene, Gadd45a, has been characterized further, identifying a role for FOXO3a in the transcriptional regulation of DNA repair pathways.

Expanding uses and emerging possibilities

Above we have discussed the use of DNA arrays primarily in mRNA profiling experiments, and several of the ways in which they are being used to study cell biology. Recently, however, researchers have been using arrays to explore other areas of cell biology in the same, highly parallel experimental manner (summarized in Fig. 2). For example, the sensitivity and accuracy of microarray hybridization have been applied to studies of the replication
dynamics of the yeast genome. Changes in DNA copy number were compared at thousands of sites across the genome during progression through S phase, and the sites of origin of replication were identified from these changes [25]. The temporal distribution of origin activation, rates of replication fork movement across the genome, and the relationship between replicating DNA and transcription during S phase were also explored.

This genomic DNA hybridization technique has also been used in conjunction with traditional mRNA hybridization to investigate meiotic transcription in budding yeast. Primig et al. [26] compared meiotic gene transcription in two yeast strains. They described considerable differences in the underlying genomes, including polymorphisms and deletions, as well as resultant differences in mRNA expression patterns. Comparative genomic DNA hybridization has found increasingly popular use in genome-wide scans for changes in DNA copy number. Such studies on hundreds of types of tumor have identified novel gene amplifications (see [27,28] for examples; reviewed in [29]). Arrays have also been used to interrogate targets of RNA-binding proteins. For example, Brown et al. [30] sought to identify the mRNA-binding targets of fragile X mental retardation protein (FMRP). They immunoprecipitated FMRP and probed microarrays with the mRNA that co-immunoprecipitated (Fig. 2). Half of the immunoprecipitated mRNAs were found to be translated abnormally in cells from individuals affected with fragile X. Notably, many of these mRNA species correspond to genes implicated in neuronal function and development.

[Fig. 2 schematic, three workflows: Substrate (transcribed and/or untranscribed DNA; RNA; or DNA or RNA after protein immunoprecipitation) → Sample labeling → Hybridization to microarray of probe sets → Readout (detection and quantification of DNA copy number at thousands of sites across the genome; of thousands of RNA Pol II-generated transcripts; or of binding sites for DNA-binding proteins or protein-bound RNA).]

Fig. 2. Current applications for DNA microarrays. Classical microarray experiments use isolated genomic DNA or mRNA from a whole organism or tissue. The DNA or mRNA is converted and amplified into fluorescently labeled cDNA or cRNA, respectively, which is then hybridized to microarrays. These types of experiment have been used to identify changes in DNA copy number and mRNA expression patterns. Recent innovations in microarray approaches have added a purification step, protein immunoprecipitation, to identify the targets of DNA-binding proteins (chromatin immunoprecipitation, or ChIP) or mRNA-binding proteins. Protein bound to DNA or mRNA is first crosslinked and then immunoprecipitated by an antibody to a specific protein of interest. Crosslinks are then reversed, which releases the co-purified DNA or mRNA for amplification, labeling and hybridization to microarrays. These procedures have been successful in determining the targets of transcription factors, as well as of other genomic DNA-binding and mRNA-binding proteins.

Arrays are also being used to construct transcriptional networks by monitoring the binding of transcription factors to yeast genomic DNA directly [31]. This method uses chromatin immunoprecipitation (ChIP), a powerful method for detecting physical interactions between known proteins and their DNA target sites. Instead of sequencing immunoprecipitated target genes, however, Ren et al. [31] used DNA arrays to deconvolute the target identities. They describe the use of this technique, ‘ChIP-chip’, to characterize two transcription factors involved in carbon utilization and mating, thereby identifying several known and unknown target genes.

In an independent study, ChIP-chip was also used to identify unknown origins of replication by immunoprecipitation of Orc1, the complete ORC complex, and Mcm3, Mcm4 and Mcm7. These proteins are known to bind DNA and to function in the formation of origins of replication [32]. ChIP-chip investigations have also led to significant understanding of how transcription factors control specific processes and transition stages of the yeast cell cycle. With this method, it was found that stage-specific cell-cycle transcription factors often control the expression of transcription factors that regulate the next stage. This finding revealed mechanistic insight into the continuous transcriptional coordination of the cell cycle [33]. Furthermore, ChIP-chip analysis has identified distinct functional gene clusters (e.g. groups of genes involved in DNA synthesis or repair) that are controlled by specific cell-cycle transcription factors [34].

The use of ChIP-chip has not been limited to studying DNA-binding transcription factors; it can also be used to delineate the functions of histone-modifying and mRNA-binding proteins. Chromatin remodeling by
histone acetylation/deacetylation has recently emerged as a chief regulatory mechanism of gene expression. Yeast histone deacetylases (HDACs) antagonize chromatin remodeling and gene expression, and their gene targets have also been identified by using ChIP-chip [35]. This study also showed that HDACs display target-site specificity in their action. Thus, ChIP-chip is emerging as a powerful tool for the exploration of protein–DNA interactions, and its use should provide fertile ground for the growth of whole-genome transcriptional networks.

Concluding remarks

DNA arrays and the parallel biology that they empower have pushed experimental approaches to genome-wide and whole-system levels. The research community has widely adopted their use in the study of mRNA expression and transcriptional regulation to explore virtually every area of biology. Despite many successes, however, difficult challenges remain. Importantly, one should have realistic expectations for the application of DNA arrays, because many problems in cell biology cannot be addressed by looking at transcriptional responses or signatures. Technical limitations currently prevent higher-eukaryotic transcriptomes from being analyzed in a whole-genome fashion. Current sample preparation methods require relatively large quantities of RNA. This limits studies on discrete cell types in complex structures, such as small nuclei in the brain. The cost and infrastructure required for array experiments remain a significant entry barrier for many laboratories and can result in studies with unsatisfactory experimental and statistical designs. Current methods to analyze, visualize and disseminate data are sometimes cumbersome, expensive or piecemeal. They have not yet been standardized to facilitate the exchange of data and results between researchers.

Although these problems are significant, current efforts to address these and other issues involved in DNA array experiments will undoubtedly improve the situation and lead to many future discoveries in cell biology. DNA arrays have been adapted to other highly parallel experimental approaches, such as ChIP-chip, RNA binding and comparative genomic DNA hybridization. This adaptation has enabled other areas of cell biology to benefit from the same comprehensive advantages. Several developments will continue to transform experimentation from the study of a single gene in a single process to a whole-genome approach. These include the further development of these techniques and other parallel approaches to cell biology (e.g. see [36,37]), as well as the emergence of data standards [3] and computational and visualization methods.

References
1 Southern, E.M. (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98, 503–517
2 Heller, M.J. (2002) DNA microarray technology: devices, systems, and applications. Annu. Rev. Biomed. Eng. 4, 129–153
3 Brazma, A. et al. (2001) Minimum information about a microarray experiment (MIAME) – toward standards for microarray data. Nat. Genet. 29, 365–371
4 Ball, C.A. et al. (2002) Standards for microarray data. Science 298, 539
5 Ideker, T. et al. (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–934
6 Maleck, K. et al. (2000) The transcriptome of Arabidopsis thaliana during systemic acquired resistance. Nat. Genet. 26, 403–404
7 Cho, R.J. et al. (1998) A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell 2, 65–73
8 Whitfield, M.L. et al. (2002) Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol. Biol. Cell 13, 1977–2000
9 Mata, J. et al. (2002) The transcriptional program of meiosis and sporulation in fission yeast. Nat. Genet. 32, 143–147
10 Ueda, H.R. et al. (2002) A transcription factor response element for gene expression during circadian night. Nature 418, 534–539
11 Panda, S. et al. (2002) Coordinated transcription of key pathways in the mouse by the circadian clock. Cell 109, 307–320
12 Harmer, S.L. et al. (2000) Orchestrated transcription of key pathways in Arabidopsis by the circadian clock. Science 290, 2110–2113
13 Ceriani, M.F. et al. (2002) Genome-wide expression analysis in Drosophila reveals genes controlling circadian behavior. J. Neurosci. 22, 9305–9319
14 Zhu, G. et al. (2000) Two yeast forkhead genes regulate the cell cycle and pseudohyphal growth. Nature 406, 90–94
15 Golub, T.R. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537
16 Su, A.I. et al. (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res. 61, 7388–7393
17 Armstrong, S.A. et al. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30, 41–47
18 Clark, E.A. et al. (2000) Genomic analysis of metastasis reveals an essential role for RhoC. Nature 406, 532–535
19 Chabas, D. et al. (2001) The influence of the proinflammatory cytokine, osteopontin, on autoimmune demyelinating disease. Science 294, 1731–1735
20 Gardiner-Garden, M. and Littlejohn, T.G. (2001) A comparison of microarray databases. Brief Bioinform. 2, 143–158
21 Shoemaker, D.D. et al. (2001) Experimental annotation of the human genome using microarray technology. Nature 409, 922–927
22 Kapranov, P. et al. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919
23 Freiman, R.N. et al. (2001) Requirement of tissue-selective TBP-associated factor TAFII105 in ovarian development. Science 293, 2084–2087
24 Tran, H. et al. (2002) DNA repair pathway stimulated by the forkhead transcription factor FOXO3a through the Gadd45 protein. Science 296, 530–534
25 Raghuraman, M.K. et al. (2001) Replication dynamics of the yeast genome. Science 294, 115–121
26 Primig, M. et al. (2000) The core meiotic transcriptome in budding yeasts. Nat. Genet. 26, 415–423
27 Fritz, B. et al. (2002) Microarray-based copy number and expression profiling in dedifferentiated and pleomorphic liposarcoma. Cancer Res. 62, 2993–2998
28 Pollack, J.R. et al. (2002) Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc. Natl Acad. Sci. U. S. A. 99, 12963–12968
29 Struski, S. et al. (2002) Compilation of published comparative genomic hybridization studies. Cancer Genet. Cytogenet. 135, 63–90
30 Brown, V. et al. (2001) Microarray identification of FMRP-associated brain mRNAs and altered mRNA translational profiles in fragile X syndrome. Cell 107, 477–487
31 Ren, B. et al. (2000) Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309
32 Wyrick, J.J. et al. (2001) Genome-wide distribution of ORC and MCM proteins in S. cerevisiae: high-resolution mapping of replication origins. Science 294, 2357–2360
33 Simon, I. et al. (2001) Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 106, 697–708
34 Iyer, V.R. et al. (2001) Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538
35 Robyr, D. et al. (2002) Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell 109, 437–446
36 Ho, Y. et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415, 180–183
37 Giaever, G. et al. (2002) Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391