
La Souris, La Mouche Et L'Homme A. Human Genome Project: B. DNA Computers/DNA Nanorobots: C. Phylogenomics

The document provides an overview of biology concepts including: 1) DNA encodes hereditary information through complex chemical reactions that result in the development of a human from a single cell. 2) A cell contains genetic material encoded in DNA along with machinery like proteins and organelles that process information dynamically. 3) The basic components of DNA include nucleotides, sugars, phosphates, and bases that bond together to form the signature double helix structure.

Uploaded by

Fadhili Dunga

Chapter 2

Bio...

2.1 Introduction
"What is wonderful about the appearance of a new human being is not the nature of the receptacle in which the first stage takes place. It would not even be the accomplishment of making the entire development take place in a test tube. The incredible is the process itself. It is that the meeting of the sperm with the egg initiates a gigantic set of chemical reactions, hundreds and thousands of which follow each other, overlap and cross each other in an orderly network of unbelievable complexity. All this is to result...in the appearance of a human baby and never a little duck, a giraffe or a butterfly."
François Jacob, La Souris, La Mouche et L'Homme
2.1.1 Syllabus Selected
A. Human Genome Project: 4 lectures
B. DNA Computers/DNA Nanorobots: 3 lectures
C. Phylogenomics: 2 lectures
D. Linkage Analysis: 3 lectures
E. Cell Informatics: 2 lectures
F. Viral Cell Assembly: ??
G. Functional Genomics: ??
H. Single Nucleotide Polymorphisms (SNP): ??

2.2 Information Processing inside a Cell


It is an interesting exercise to look at biology simply as the study of certain special kinds of information-processing systems or, even more precisely, as a multi-agent repeated game working under some replicator dynamics. This approach necessarily disregards most of what biologists (still) study: biochemistry, molecular biology, cell biology, etc., as these can only lead up to an understanding of the structural machinery underlying biological systems. Studying only that machinery is perhaps analogous to claiming that one can understand a computer simply by looking at how p-doped and n-doped areas tile the silicon surface of a computer chip.
Who are the agents in this game? Depending on your viewpoint, you could say these are the genes (are they selfish?), the cells, or even the species or phyla. At each level, the molecular substrates for encoding the information, and the chemical reactions for transforming it, differ. But one can look at the information structures at two important levels:
1. Macroscopic/population level: hereditary and evolutionary roles of the information.
2. Microscopic/cell level: cell-biological roles of the information.
We start with the genome.

© Mishra, 1999

2.3 Genome
The hereditary information of an organism is encoded in its DNA and enclosed in a cell (unless you are a virus). All the information contained in the DNA of a single organism is its genome.
A first step in understanding how information is encoded in DNA is to envision a DNA molecule as just a very long sequence of nucleotides, or bases, over the alphabet
$\Sigma = \{a, t, c, g\}$.
DNA is a double-stranded polymer and should be thought of as a pair of sequences over $\Sigma$. However, the two sequences are related by complementarity: if there is an a (respectively, t, c, g) at a particular position on one sequence, then the other sequence must have a t (respectively, a, g, c) at the same position. Thus a and t form one complementary pair, and c and g another. It therefore suffices to describe one sequence, since the other is completely determined by it. We will, however, measure sequence length (or DNA length) in base pairs (bp): for instance, human (H. sapiens) DNA is $3.3 \times 10^9$ bp, measuring about 6 ft of DNA polymer when completely stretched out!
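Because one strand determines the other, the complementarity relation can be written as a position-by-position mapping. The following is a minimal Python sketch, assuming strands are lowercase strings over {a, t, c, g}; the name `complement` and the error handling are illustrative choices, not part of the notes.

```python
# Complementary base pairs: a-t and c-g.
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand over {a, t, c, g}.

    Position i of the result pairs with position i of the input, so the
    output is the partner strand read in the opposite (antiparallel) direction.
    """
    try:
        return "".join(COMPLEMENT[base] for base in strand)
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


top = "gattaca"
bottom = complement(top)
print(top)
print(bottom)  # ctaatgt
# One strand determines the other: complementing twice gives it back.
assert complement(bottom) == top
```

Length in base pairs is then simply `len(top)`, since each position of one strand contributes exactly one pair.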
Genomes vary widely in size: from a few thousand base pairs for viruses to roughly $2 \times 10^{11}$ to $3 \times 10^{11}$ bp for certain amphibians and flowering plants. Coliphage MS2 (a virus) has the smallest genome: only $3.5 \times 10^3$ bp. Mycoplasmas (unicellular organisms) have the smallest cellular genomes: $5 \times 10^5$ bp. C. elegans (a nematode worm, a primitive multicellular organism) has a genome of size $\approx 10^8$ bp.
The goals of a genome study (for example, the Human Genome Project) would consist of the following:
1. Genetic Maps
2. Physical Maps
(For instance, the Human Genome Project [HGP] requires a complete map of the human genome at a resolution of 100 kb = $10^5$ bp. That is, the map would consist of "markers" spaced at most $10^5$ bp apart.)
3. DNA Sequencing
4. Gene Identification
Identify genes (parts of the DNA involved in controlling metabolic processes through the proteins they encode) on physical maps or sequenced DNA.
5. Informatics
Elucidate the structure of the DNA as an encoding of all the relevant information.
(a) Diagnostic and Therapeutic Tools: necessary for the treatment of genetic diseases.
(b) Phylogenetic Tools: used in understanding the process and mechanism of evolution.
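The 100 kb resolution requirement for a physical map can be phrased as a simple check on marker coordinates. A minimal Python sketch, assuming markers are given as integer bp positions along a single chromosome; the helper name `gaps_exceeding` is hypothetical.

```python
def gaps_exceeding(markers: list[int], resolution_bp: int = 10**5) -> list[tuple[int, int]]:
    """Return consecutive marker pairs that lie more than `resolution_bp` apart.

    An empty result means the markers meet the required map resolution.
    Positions are assumed to be bp coordinates on one chromosome.
    """
    positions = sorted(markers)
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a > resolution_bp]


print(gaps_exceeding([0, 90_000, 180_000]))  # []
print(gaps_exceeding([0, 90_000, 250_000]))  # [(90000, 250000)]
```

This sketch ignores the stretches before the first and after the last marker; a fuller check would also compare against the chromosome ends.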

2.4 DNA: Structure and Components


The usual configuration of DNA is a double helix consisting of two chains, or strands, coiling around each other, with two alternating grooves of slightly different spacing. The "backbone" of each strand is made of alternating big sugar molecules (deoxyribose residues: $C_5H_{10}O_4$) and small phosphate ($(PO_4)^{3-}$) molecules.
One of the four bases (the letters in our alphabet), each an almost planar nitrogenous organic compound, is connected to each sugar molecule. The bases are adenine (a), thymine (t), cytosine (c) and guanine (g). Reading off the sequence of bases thus defines the information encoded by the DNA. Complementary bases (a-t and c-g) are connected by hydrogen bonds, and each base pair forms an essentially coplanar "rung" connecting the two strands.
Note that cytosine and thymine are smaller (lighter) molecules, called pyrimidines, whereas guanine and adenine are bigger (bulkier) molecules, called purines. Furthermore, adenine and thymine form only two hydrogen bonds with each other, while cytosine and guanine form three. Thus the chemical constraint (hydrogen bonding) and the mechanical constraint (each purine pairs with a pyrimidine) on the pairing lead to complementarity and make double-stranded DNA both chemically inert and mechanically quite rigid and stable.
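The two pairing constraints can be made concrete by counting hydrogen bonds in a duplex. A Python sketch, assuming the two strands are given aligned position by position (as in the complementarity relation of Section 2.3); the bond counts, two for a-t and three for c-g, are the ones stated above, and the function name is illustrative.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}
# Hydrogen bonds per complementary pair: a-t has two, c-g has three.
H_BONDS = {("a", "t"): 2, ("t", "a"): 2, ("c", "g"): 3, ("g", "c"): 3}


def duplex_h_bonds(top: str, bottom: str) -> int:
    """Total hydrogen bonds in a duplex whose strands are aligned position by position."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if (x, y) not in H_BONDS:
            raise ValueError(f"{x}-{y} is not a complementary pair")
        # Every allowed pair joins one purine with one pyrimidine.
        assert (x in PURINES and y in PYRIMIDINES) or (x in PYRIMIDINES and y in PURINES)
        total += H_BONDS[(x, y)]
    return total


print(duplex_h_bonds("gattaca", "ctaatgt"))  # 16
```

A c-g-rich duplex therefore carries more hydrogen bonds per base pair than an a-t-rich one of the same length.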
From a chemist's point of view, the building blocks of the DNA molecule are four kinds of deoxyribonucleotides, each made up of a sugar residue, a phosphate group and a base. So if a chemist has these four kinds of building blocks (or the related dNTPs, deoxyribonucleoside triphosphates), then they can synthesize a strand of DNA.
The sugar molecule in the strand is shaped like a pentagon (4 carbons and 1 oxygen) lying in a plane parallel to the helix axis, with the 5th carbon (5' C) sticking out. The phosphodiester bond (-O-P-O-) between consecutive sugars connects the 5' C of one sugar to a carbon in the pentagon (3' C) of the next, and gives each strand a directionality. The two strands of a double-stranded DNA molecule have opposite directions: the strands are antiparallel. When a DNA molecule breaks (say, by interacting with a restriction enzyme), it breaks at one of these -O-P-O- bonds. Also, remember that most enzymes moving along the backbone move in the 5'-3' direction. Thus when we represent a DNA sequence, say by writing gattaca, what we mean is the following:

    5' gattaca 3'
    3' ctaatgt 5'

which is also (the unpronounceable) tgtaatc.
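Reading the partner strand in its own 5'-3' direction means complementing every base and then reversing the order. A minimal Python sketch of this "reverse complement"; the name `reverse_complement` is a common convention, not something the notes define.

```python
# Map each base to its complementary partner (a<->t, c<->g).
_PAIR = str.maketrans("atcg", "tagc")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, read in its own 5'->3' direction.

    Because the strands are antiparallel, the partner strand is obtained by
    complementing every base and then reversing the order.
    """
    return seq.translate(_PAIR)[::-1]


print(reverse_complement("gattaca"))  # tgtaatc
```

Applying the function twice returns the original sequence, and a sequence such as gaattc equals its own reverse complement.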

2.5 The Cell


The next more complicated player in the game of life is the cell: a small coalition of genes held together in a set of chromosomes, together with perhaps unrelated extrachromosomal elements. A cell also has a set of machinery, made of proteins, enzymes, lipids and organelles, that takes part in a dynamic process of information processing.
In eukaryotic cells the genetic material is enclosed in the cell nucleus, separated by a membrane from the other organelles in the cytoplasm. In prokaryotic cells, which have no nucleus, the genetic material is distributed homogeneously in the cell. Bacteria are examples of prokaryotic cells; their genomes are considerably simpler.
The organelles of eukaryotic plant and animal cells include:
- Mitochondria and, in plant cells, chloroplasts (responsible for energy production and photosynthesis, respectively);
- A Golgi apparatus (responsible for modifying, sorting and packaging various macromolecules for distribution within and outside the cell);
- Endoplasmic reticulum (responsible for synthesizing proteins); and
- Nucleus (responsible for holding the DNA as chromosomes, and for replication and transcription).
The entire cell is contained in a sac made of plasma membrane. Plant cells are further surrounded by a cellulose cell wall.
The nucleus of a eukaryotic cell contains its genome in several chromosomes, where each chromosome is simply a single molecule of DNA together with some proteins (primarily histones). A chromosome can be a circular or a linear molecule; in the latter case its ends are capped with special sequences called telomeres. The proteins in the nucleus bind to the DNA and effect the compaction of the very long DNA molecules. In somatic cells (as opposed to gametes: egg and sperm cells) of most eukaryotic organisms, the chromosomes occur in homologous pairs, the only exceptions being the X and Y chromosomes, the sex chromosomes. Gametes contain only unpaired chromosomes; the egg cell contains only an X chromosome, and the sperm cell either an X or a Y chromosome. The male has X and Y chromosomes; the female, two X's. Cells with single unpaired chromosomes are called haploid; cells with homologous pairs, diploid; and cells with homologous triplets, quadruplets, etc., of chromosomes are called polyploid. Many plant cells are polyploid.
The dynamics of a cell manifest themselves in several ways: the cell cycle (the set of events that occur within a cell between its birth by mitosis and its own division into daughter cells, again by mitosis), made up of an interphase period, when DNA is synthesized, and a mitotic phase; cell division by mitosis (into 2 daughter cells) and by meiosis (into 4 gametes, from germ-line cells); and the working of the machinery within the cell, mainly the machinery involved in replication of DNA, transcription of DNA into RNA, and translation of RNA into protein.

2.6 The Central Dogma


The intermediate molecule carrying the information out of the nucleus of a eukaryotic cell is RNA, a single-stranded polymer with the same bases as DNA except that thymine is replaced by uracil, u. RNA also controls the translation process, in which amino acids are assembled into proteins. The central dogma (due to Francis Crick in 1958) states that these information flows are all unidirectional:

"The central dogma states that once 'information' has passed into protein it cannot get out again. The transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein, may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein."

2.6.1 RNA and Transcription


The polymer RNA (ribonucleic acid) is similar to DNA but differs from it in several ways: it is single-stranded; its nucleotides have a ribose sugar (instead of deoxyribose); and it has the pyrimidine base uracil, u, in place of thymine, t. Like thymine, u is complementary to a.
One consequence of an RNA molecule's single-strandedness is that it tends to fold back on itself to make helical, twisted and rigid segments. For instance, if a segment of an RNA is

    5' ggggaaaacccc 3',

then the c's fold back on the g's to make a hairpin structure (with a 4 bp stem and a 4-nucleotide loop). Secondary RNA structure can be even more complicated: for instance, E. coli Ala tRNA (transfer RNA) forms a cloverleaf. Prediction of RNA structure is an interesting computational problem.
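As one classic illustration of structure prediction (swapped in here; the notes do not prescribe a method), the following Python sketch implements Nussinov-style base-pair maximization by dynamic programming. It assumes only Watson-Crick pairs (a-u, g-c, no g-u wobble), no pseudoknots, and a minimum hairpin loop of 3 nucleotides; practical predictors typically minimize free energy instead.

```python
# Watson-Crick pairs only (no g-u wobble pairs): a simplifying assumption.
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}


def nussinov(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize the number of nested base pairs in an RNA sequence.

    Returns (number of pairs, dot-bracket string). A pair (i, j) is allowed
    only if at least `min_loop` unpaired bases lie between i and j.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = max number of pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        if j - i <= min_loop:
            return
        if dp[i][j] == dp[i + 1][j]:
            traceback(i + 1, j)
        elif dp[i][j] == dp[i][j - 1]:
            traceback(i, j - 1)
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            traceback(i + 1, j - 1)
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    traceback(i, k)
                    traceback(k + 1, j)
                    return

    traceback(0, n - 1)
    return dp[0][n - 1], "".join(structure)


print(nussinov("ggggaaaacccc"))  # (4, '((((....))))')
```

On the hairpin example above, the sketch recovers the 4 bp stem with the aaaa loop left unpaired.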
A specific region of DNA that ultimately determines the synthesis of proteins (through transcription and translation) is called a gene¹. Transcription of a gene into a messenger RNA is keyed by an RNA polymerase enzyme, which attaches to a core promoter (a specific sequence adjacent to the relevant structural gene). Regulatory sequences such as silencers and enhancers control the rate of transcription through their influence on the RNA polymerase, via a feedback control loop involving many large families of activator and repressor proteins that bind to DNA and that, in turn, transmit their effect to the RNA polymerase through coactivator proteins and basal factors. The entire structure of transcriptional regulation of gene expression is rather dispersed and fairly complicated: the enhancer and silencer sequences occur over a wide region spanning many kb from the core promoter in either direction (upstream and downstream); a gene may have many silencers and enhancers, and these can be shared among genes; they are not unique, since different genes may have different combinations; and the proteins involved in the control of the RNA polymerase number around fifty and operate in different cliques of transcription factors. Any disorder in their proper operation can lead to cancer, immune disorders, heart disease, etc.

¹ Originally, a gene meant something more abstract: a unit of hereditary inheritance. Understanding the molecular biological basis of heredity has led to an understanding of a gene as having a physical, molecular existence.
The transcription of DNA into m-RNA uses a single strand of DNA (the template, or antisense, strand) around the region corresponding to a gene. The double helix untwists momentarily to create a transcriptional bubble, which moves along the DNA in the 3'-5' direction of the template strand as synthesis of the complementary m-RNA progresses, adding one RNA nucleotide at a time at the 3' end of the RNA: a u (respectively, a, g and c) is attached for the corresponding DNA base a (respectively, t, c and g). The transcription process ends when a special sequence called the termination signal is encountered. The newly synthesized m-RNA is then modified by attaching special nucleotide structures to its 5' and 3' ends (the 5' cap and the 3' poly(A) tail). This molecule is called a pre-m-RNA.
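The base-pairing step of transcription is easy to simulate. A minimal Python sketch, assuming the DNA strand that serves as the template is given in the conventional 5'-3' orientation; the name `transcribe` is illustrative, and the sketch ignores promoters, termination signals and end modifications.

```python
# RNA base attached opposite each template DNA base: a->u, t->a, c->g, g->c.
_DNA_TO_RNA = str.maketrans("atcg", "uagc")


def transcribe(template_5to3: str) -> str:
    """Return the m-RNA (5'->3') produced from a template strand given 5'->3'.

    The polymerase reads the template in its 3'->5' direction while the RNA
    grows at its 3' end, so we reverse the template and pair each base.
    """
    return template_5to3[::-1].translate(_DNA_TO_RNA)


# Template 5' tgtaatc 3' pairs with 5' gattaca 3'; its transcript:
print(transcribe("tgtaatc"))  # gauuaca
```

The result equals the non-template strand (here gattaca) with t replaced by u, which is a handy cross-check.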
In eukaryotic cells, the region of DNA that is transcribed into a pre-m-RNA involves more than just the information needed to synthesize proteins. The DNA subsequences that contain the information, or code, for protein (somewhat indirectly) are the so-called exons, which are interrupted by introns, the noncoding regions. Note that pre-m-RNA contains both exons and introns and must be altered to excise all the intronic subsequences in preparation for translation; this is done by the spliceosome. The locations of the splice sites separating introns and exons are dictated by short sequences and simple rules (which are frequently violated), such as "introns begin with the dinucleotide gt and end with the dinucleotide ag" (the gt-ag rule).
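To make the gt-ag rule concrete, here is a Python sketch that excises given intron intervals and flags introns that break the rule. It uses the DNA alphabet, matching how the rule is stated above (in the RNA itself an intron begins with gu), and it assumes the intron coordinates are already known; finding them is the hard part of gene identification. The function name `splice` is hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> tuple[str, list[tuple[int, int]]]:
    """Excise introns from a transcript and report gt-ag rule violations.

    `introns` holds 0-based, half-open (start, end) intervals that must not
    overlap. Returns the spliced sequence and the introns that do not begin
    with "gt" and end with "ag".
    """
    pieces: list[str] = []
    violations: list[tuple[int, int]] = []
    prev = 0
    for start, end in sorted(introns):
        if start < prev:
            raise ValueError("introns overlap")
        intron = pre_mrna[start:end]
        if not (intron.startswith("gt") and intron.endswith("ag")):
            violations.append((start, end))
        pieces.append(pre_mrna[prev:start])  # exon before this intron
        prev = end
    pieces.append(pre_mrna[prev:])           # final exon
    return "".join(pieces), violations


# exon "atgaa" + intron "gtttag" + exon "cctga"
print(splice("atgaagtttagcctga", [(5, 11)]))  # ('atgaacctga', [])
```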

2.6.2 Protein and Translation


The translation process begins at a particular location on the m-RNA called the translation start sequence (usually aug) and is mediated by transfer RNA (t-RNA), a group of small RNA molecules, each with specificity for a particular amino acid. The t-RNAs carry the amino acids to the ribosomes, the site of protein synthesis, where they are attached to a growing polypeptide. Translation stops when one of the three trinucleotides uaa, uag or uga is encountered.
Each 3 consecutive (nonoverlapping) bases of m-RNA (forming a codon) code for a specific amino acid. There are $4^3 = 64$ possible trinucleotide codons, belonging to the set
$\{u, a, g, c\} \times \{u, a, g, c\} \times \{u, a, g, c\}$.
The codon aug is the start codon, and the codons uaa, uag and uga are the stop codons. That leaves 61 codons (the start codon itself codes for the amino acid Met) to code for 20 amino acids, an expected redundancy of $61/20 = 3.05$. Between one and six codons are used to code a single amino acid. The stretch of nucleotides between and including the start and stop codons is called an open reading frame (ORF), and one can assume that all the information of interest to us resides in the ORFs. The mapping from codons to amino acids (naturally extended, as a homomorphism, to a mapping from ORFs to polypeptides) is given by
$F_P : \{u, a, g, c\}^3 \to \{A, R, D, N, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V\}$,
for example $uuu \mapsto F$ (= Phe = phenylalanine), as in the following table.
RF0   RF1 = g   RF1 = a   RF1 = c   RF1 = u   RF2
g     Gly       Glu       Ala       Val       g
      Gly       Glu       Ala       Val       a
      Gly       Asp       Ala       Val       c
      Gly       Asp       Ala       Val       u
a     Arg       Lys       Thr       Met       g
      Arg       Lys       Thr       Ile       a
      Ser       Asn       Thr       Ile       c
      Ser       Asn       Thr       Ile       u
c     Arg       Gln       Pro       Leu       g
      Arg       Gln       Pro       Leu       a
      Arg       His       Pro       Leu       c
      Arg       His       Pro       Leu       u
u     Trp       Stop      Ser       Leu       g
      Stop      Stop      Ser       Leu       a
      Cys       Tyr       Ser       Phe       c
      Cys       Tyr       Ser       Phe       u

The genetic code for each triplet can be read off the table by taking the first letter (RF0, the base in reading frame 0) from the left column, the second letter (RF1, the base in reading frame 1) from the top row, and the third letter (RF2, the base in reading frame 2) from the right column. In an ORF, a given occurrence of a base is said to be in reading frame 0, 1 or 2 if it is the first, second or third letter of a codon, respectively. A codon is said to be in-frame if its first base is in reading frame 0. The ribosome is simply a transducer that reads the open reading frame one codon at a time, adding the corresponding amino acid to a growing chain that becomes a protein. The translation process is carried out by two kinds of non-protein-coding RNA molecules, r-RNA (ribosomal RNA) and t-RNA (transfer RNA).
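The codon table and the reading-frame convention translate directly into code. A Python sketch, assuming the standard genetic code shown in the table above but encoded with the widely used compact 64-letter string ordered u, c, a, g (rather than the table's g, a, c, u order); `translate_orf` is a hypothetical helper that applies $F_P$ codon by codon from the first aug.

```python
from collections import Counter

BASES = "ucag"
# Standard genetic code, codons ordered by first, second, third base over "ucag";
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(mrna: str) -> str:
    """Translate from the first aug up to (not including) the first in-frame stop codon."""
    start = mrna.find("aug")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):  # codons in reading frame 0
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
print(len(sense), len(set(sense)))        # 61 20
print(max(Counter(sense).values()))       # 6
print(translate_orf("ggaugcccuuuuaagg"))  # MPF
```

The counts reproduce the figures in the text: 61 sense codons for 20 amino acids, with at most six codons (Leu, Ser, Arg) per amino acid.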
The following shorter table lists the one- and three-letter abbreviations of the amino acids, together with the inverse homomorphism: the set of codons for each amino acid, written as a regular expression in which + denotes alternation.

1-letter   3-letter   amino acid      inverse homomorphism
A          Ala        alanine         gc(u + a + c + g)
C          Cys        cysteine        ug(u + c)
D          Asp        aspartic acid   ga(u + c)
E          Glu        glutamic acid   ga(g + a)
F          Phe        phenylalanine   uu(u + c)
G          Gly        glycine         gg(u + a + c + g)
H          His        histidine       ca(u + c)
I          Ile        isoleucine      au(u + a + c)
K          Lys        lysine          aa(a + g)
L          Leu        leucine         (c + u)u(a + g) + cu(u + c)
M          Met        methionine      aug
N          Asn        asparagine      aa(u + c)
P          Pro        proline         cc(u + a + c + g)
Q          Gln        glutamine       ca(a + g)
R          Arg        arginine        (a + c)g(a + g) + cg(u + c)
S          Ser        serine          (ag + uc)(u + c) + uc(a + g)
T          Thr        threonine       ac(u + a + c + g)
V          Val        valine          gu(u + a + c + g)
W          Trp        tryptophan      ugg
Y          Tyr        tyrosine        ua(u + c)
