Retrieval of Data

Database searching involves aligning nucleotide or protein sequences with database sequences to identify similarities, with primary databases containing experimentally derived data. Methods like BLAST and FASTA are used for efficient searching, providing local alignments that are more informative than global alignments. Secondary databases offer additional insights into sequence relationships and motifs, enhancing the analysis of biological sequences.

Uploaded by

tassera9


What is database searching?
• Database searching is the matching of a query nucleotide or protein sequence against the sequences in a database. To do this, the query sequence is aligned with the database sequences to find similarities between them. Database searching applies knowledge gained from previous biological experiments to the problem of gene discovery.
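• A DNA sequence is built from only 4 nucleotides, whereas a protein sequence is built from 20 amino acids. With only 4 letters, two unrelated DNA sequences match by chance at roughly 1 position in 4, compared with roughly 1 in 20 for proteins (assuming equal residue frequencies), so similar patterns are easier to detect reliably in proteins than in DNA. For a more reliable database search, it is therefore better to translate a protein-coding DNA sequence into its protein sequence first.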
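The translation step mentioned above can be sketched with the standard genetic code (NCBI translation table 1). This is a minimal illustration, assuming a single reading frame that starts at the first base; the function name `translate_dna` is hypothetical, and real pipelines usually rely on an established library and consider all six reading frames.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codon bases listed
# in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna: str) -> str:
    """Translate DNA in frame 1; 'X' marks codons containing ambiguous bases."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):  # a trailing partial codon is ignored
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate_dna("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR*
```

The resulting protein sequence, rather than the raw DNA, would then be used as the query.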
Types of database searching
• Primary database searching: BLAST, FASTA
• Secondary database searching: motif search, pattern search
Primary database searching
• A primary sequence is one that has been experimentally determined, and a primary database is one that contains experimentally derived data.
• Searching a database efficiently is of prime importance.
• Methods that run well on small databases may not be efficient enough, in time and memory, for larger databases.
• Because databases contain so much information, it is difficult to search them with dynamic programming: it is computationally intensive and therefore too slow.
• To save time and memory, heuristic methods (BLAST and FASTA) are used for database searching.
• These are local similarity search programs that report short matches.
• These short matches, called local alignments (a segment of sequence showing similarity), are often more informative than global alignments (similarity over the complete sequence).
• Local alignment can reveal highly conserved regions even between two sequences that do not produce any reasonable alignment when aligned globally.
• Dynamic programming algorithms (such as Smith-Waterman) can return local alignments and are guaranteed to find the best-scoring alignment, which makes them the most sensitive approach, but they are slow.
• BLAST and FASTA are the major primary database search methods; they are very fast, though less sensitive than full dynamic programming.
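To see why full dynamic programming is too slow for large databases, it helps to count the work: a dynamic-programming alignment fills one score-matrix cell for every pair of query residue and database residue. This is a back-of-the-envelope sketch; the function name `dp_cells` and the sizes below are hypothetical, not figures from the text.

```python
def dp_cells(query_length: int, database_residues: int) -> int:
    """Score-matrix cells filled by a full dynamic-programming search.

    Smith-Waterman (like Needleman-Wunsch) fills one cell for every pair of
    query residue and database residue, so the work grows as m * N.
    """
    return query_length * database_residues


# Hypothetical sizes: a 300-residue query against 100 million database residues.
print(f"{dp_cells(300, 100_000_000):.1e} cells")  # 3.0e+10 cells
```

Heuristics such as BLAST and FASTA avoid filling most of these cells by examining only the regions around short word matches.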
FASTA
• FASTA was developed by Pearson and Lipman in 1985 and was so named because it was faster than the other sequence alignment methods in use at the time.
• FASTA uses the Pearson and Lipman algorithm to search for similarity between a query sequence and a database sequence.
• Given a query sequence, FASTA searches for local alignments with the sequences in the database.
• The original program, FASTP, was designed for protein sequence similarity searching.
• It is a rapid alignment program for protein and DNA sequence pairs.
• Instead of comparing every individual residue, FASTA first looks for short identical words (k-tuples), which saves time.
• The input sequence must be in FASTA format.
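FASTA format is a header line beginning with '>' followed by one or more lines of sequence. A minimal parser, written from scratch for illustration, could look like this; the function name `parse_fasta` is hypothetical, and later records with a duplicate header simply overwrite earlier ones.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())  # multi-line sequences are joined
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example records.
records = parse_fasta(""">seq1 example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 example DNA
ATGGCC
""")
print(records)
```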
Basic FASTA programs

Program name        Query sequence          Database sequence       Algorithm used
Nucleotide search   Nucleotide              Nucleotide (DNA/RNA)    FASTA, FASTM, FASTS
Protein search      Protein                 Protein                 FASTA, SSEARCH, FASTS, FASTF
FASTX/FASTY         Translated nucleotide   Protein                 -
TFASTX/TFASTY       Protein                 Translated DNA          -
TFASTS              Peptides                Translated DNA          -
BLAST (Basic Local Alignment Search Tool)
BLAST is a local similarity search program that compares nucleotide or protein sequences with sequence databases and calculates the statistical significance of the matches. BLAST searches are used to infer functional and evolutionary relationships between sequences and to identify members of gene families.
It is a heuristic simplification of the Smith-Waterman algorithm and is faster than FASTA.
It is the algorithm most commonly used for database searching and sequence alignment. The original BLAST looked for similar regions in two sequences without allowing gaps; gapped versions (for example, WU-BLAST) are now available.
BLAST is generally more selective and less sensitive than FASTA, but this depends on the sequence type:
• FASTA is more sensitive than BLAST for nucleotide sequences.
• For protein sequences, BLAST (word size 3) is more sensitive than FASTA (word size 2).
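Both programs start from short word matches, which is where the word size matters. A minimal sketch of that seeding step finds exact word hits through a lookup table built from the query; the function names are hypothetical, and real BLAST additionally scores similar "neighbourhood" words with a substitution matrix and then extends the seeds, which this sketch omits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_word_index(query: str, word_size: int) -> Dict[str, List[int]]:
    """Map every word of length word_size in the query to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    return index


def find_word_hits(query: str, subject: str, word_size: int = 3) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where the same word occurs in both."""
    index = build_word_index(query, word_size)
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


hits = find_word_hits("MKVLAT", "GGMKVLQ")
print(hits)                      # [(0, 2), (1, 3)]
print({j - i for i, j in hits})  # {2}: both hits lie on the same diagonal
```

Hits on the same diagonal (subject position minus query position) mark a candidate region of similarity: FASTA scores such diagonal regions, while BLAST extends each seed into a longer alignment.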
Basic BLAST programs

Program name       Query sequence          Database sequence       Algorithm used
Nucleotide BLAST   Nucleotide              Nucleotide              BLASTN, MEGABLAST
Protein BLAST      Protein                 Protein                 BLASTP, PHI-BLAST, PSI-BLAST
BLASTX             Translated nucleotide   Protein                 BLASTP
TBLASTN            Protein                 Translated nucleotide   BLASTP
TBLASTX            Translated nucleotide   Translated nucleotide   BLASTP
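The program table can be read as a lookup from the query and database sequence types to a program. This is a minimal sketch; the function name `choose_blast_program` and the type labels are hypothetical, while the returned strings are the standard BLAST program names (as used by the NCBI BLAST+ command-line tools).

```python
from typing import Dict, Tuple

BLAST_PROGRAMS: Dict[Tuple[str, str], str] = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # the query is translated
    ("protein", "nucleotide"): "tblastn",   # the database is translated
}


def choose_blast_program(query_type: str, database_type: str,
                         translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database sequence types."""
    if translate_both:
        # tblastx compares a translated nucleotide query with a translated nucleotide database.
        if (query_type, database_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type!r} vs {database_type!r}") from None


print(choose_blast_program("protein", "nucleotide"))  # tblastn
```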
Secondary database searching
• Primary database searching does not always provide a satisfactory answer to the questions of sequence analysis.
• Highly repetitive and low-complexity sequences can produce irrelevant matches and may even complicate the interpretation.
• Secondary databases provide information about how a given sequence relates to other sequences in a multiple alignment, together with further information (family, domain and motif), depending on the method used.
• These databases contain the results of analyses of primary sequences.
• Important secondary database searches include motif or pattern searches and profile searches.
• PROSITE is a database and tool consisting of documentation entries that describe protein domains, families and functional sites, together with the associated patterns and profiles used to identify them.
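A PROSITE pattern can be searched by converting it to a regular expression. The sketch below handles only the basic pattern syntax (single residues, `x` for any residue, `[..]` allowed residues, `{..}` forbidden residues, repeats such as `x(2,4)`, and the `<`/`>` terminal anchors); the function name `prosite_to_regex` is hypothetical and special cases such as `>` inside brackets are not handled. The example uses the N-glycosylation site pattern from PROSITE entry PS00001.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:                   # e.g. x(2) -> .{2}, x(2,4) -> .{2,4}
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # forbidden residues
        else:
            core = element  # a single residue or a [..] set is already valid regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}.
n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
print(n_glyc)                                # N[^P][ST][^P]
print(re.search(n_glyc, "MKNGSAL").group())  # NGSA
```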
Motif search
• Motifs are specific geometric arrangements of protein secondary structure elements (alpha helices, beta strands and loops). Some motifs are associated with a particular function, and some are part of larger structural and functional arrangements. Simple motifs combine to form complex motifs. Motifs are biologically conserved regions of protein sequences.
Types of motif
• Greek key motif
• Hairpin beta motif
• Beta-alpha-beta motif
• Helix-loop-helix motif
Similarity and identity
• These terms describe the relationship between two proteins.
• A residue position at which both sequences being compared have the same residue is called an identical residue.
• Residue positions at which both sequences have amino acids with similar properties are called similar residues.
• Similarity is the likeness (resemblance) between the two sequences being compared, while identity is the number of characters that match exactly between them.
For example:

Seq1  A F N T T
      :   | | :
Seq2  L N N T S

A/L and T/S are similar residues; similar residues are represented with a colon (:). N and T are identical residues in this example and are represented with a vertical line (|).
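Percent identity and percent similarity can be computed directly from two aligned sequences. This sketch assumes a hypothetical grouping of amino acids with similar properties (search tools usually judge similarity with a substitution matrix such as BLOSUM62 instead), counts similarity as identical plus similar positions, and divides by the alignment length including gaps, which is only one of several conventions. With this grouping it reproduces the example alignment.

```python
from typing import Tuple

# Hypothetical similarity groups (an assumption for illustration only).
SIMILAR_GROUPS = ["AVLIM", "FYW", "ST", "NQ", "DE", "KRH"]


def residues_similar(x: str, y: str) -> bool:
    """True if two different residues fall in the same hypothetical group."""
    return x != y and any(x in group and y in group for group in SIMILAR_GROUPS)


def match_line(seq1: str, seq2: str) -> str:
    """Middle line of an alignment: '|' identical, ':' similar, ' ' otherwise."""
    return "".join(
        "|" if x == y and x != "-" else ":" if residues_similar(x, y) else " "
        for x, y in zip(seq1, seq2)
    )


def identity_and_similarity(seq1: str, seq2: str) -> Tuple[float, float]:
    """Percent identity and percent similarity (identical + similar positions)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    line = match_line(seq1, seq2)
    identical, similar = line.count("|"), line.count(":")
    return 100 * identical / len(seq1), 100 * (identical + similar) / len(seq1)


print(match_line("AFNTT", "LNNTS"))               # ": ||:"
print(identity_and_similarity("AFNTT", "LNNTS"))  # (40.0, 80.0)
```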
Sequence alignment
• In bioinformatics, a sequence alignment is a way of arranging DNA, RNA or protein sequences to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships between the sequences.
Pairwise sequence alignment
• This alignment is used to identify regions of similarity that may indicate functional, structural or evolutionary relationships between two biological sequences.
• EMBOSS, LAGAN, bl2seq, Dotlet and Dotter are common tools for pairwise sequence alignment.
• There are two types: local alignment and global alignment.
Local alignment
• If two sequences are not very similar and are difficult to align over their full length, local alignment can be used to align them.
• Local alignment provides information about conserved regions or domains, from which it is possible to get an idea of the evolutionary history.
• Local alignment is often more meaningful than global alignment because it can produce a useful alignment even for sequences that are not very similar. It can also be used to align sequences of unequal length, or sequences that share only a conserved domain.
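Local alignment by dynamic programming follows the Smith-Waterman recurrence: each cell takes the best of a diagonal step, a gap, or zero (start a new alignment), and the result is traced back from the highest-scoring cell. This is a compact sketch with a linear gap penalty; the scores (match +2, mismatch -1, gap -2) and the function name `smith_waterman` are assumptions for illustration, whereas real tools use substitution matrices and affine gap penalties.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best-scoring local alignment of a and b (linear gap penalty)."""
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,  # a local alignment may start afresh at any cell
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_cell
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# A conserved core flanked by unrelated residues.
print(smith_waterman("CCCCACGTACGTCCCC", "GGGACGTACGTGGG"))  # (16, 'ACGTACGT', 'ACGTACGT')
```

The unrelated flanks are simply left out, which is exactly the behaviour that makes local alignment useful for finding conserved regions.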
Global alignment
• Global alignment is performed across the entire length of the sequences, including matched characters, gaps and mismatches.
• Choosing different mismatch and gap penalties may produce different alignments for the same sequences.
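Global alignment by dynamic programming follows the Needleman-Wunsch recurrence, tracing back from the bottom-right corner so that both sequences are used in full. This sketch (function name and default scores are assumptions for illustration) also shows the point about penalties: the same two sequences give a gapped alignment with a gap penalty of -1 and a gap-free alignment with a gap penalty of -3.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Optimal global alignment of a and b with a linear gap penalty."""
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap  # leading gaps in a
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGCT", gap=-1))  # (1, 'A-CGT', 'AGC-T')
print(needleman_wunsch("ACGT", "AGCT", gap=-3))  # (0, 'ACGT', 'AGCT')
```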
Multiple sequence alignment
• Multiple sequence alignment requires more than two sequences.
• A database search usually reveals many homologous sequences. For multiple sequence alignment, the residues of these homologous sequences are aligned together in columns.
• Wherever a sequence has no amino acid at a particular position of the alignment, a dash (-) is inserted there.
• Closely related (highly identical) sequences give the most meaningful results. Multiple sequence alignments can be used to establish phylogenetic relationships.
• ClustalW, T-Coffee, MultAlin, DCA, HMMER and DIALIGN are tools for multiple sequence alignment.
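Once residues are arranged in columns, simple summaries such as a consensus sequence and per-column conservation can be read straight off the alignment. This is a minimal majority-rule sketch; the function name `consensus` and the example alignment are made up, and dedicated tools use more elaborate conservation scores.

```python
from collections import Counter
from typing import List, Tuple


def consensus(alignment: List[str]) -> Tuple[str, List[float]]:
    """Majority-rule consensus and the fraction of sequences sharing it per column.

    Gaps ('-') are counted like residues; ties go to the symbol seen first in
    the column (Counter.most_common keeps first-encountered order for ties).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    letters, conservation = [], []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        letters.append(symbol)
        conservation.append(round(count / len(column), 2))
    return "".join(letters), conservation


# A made-up three-sequence alignment; '-' marks a gap.
aln = ["MKV-LA", "MKVGLA", "MRV-LS"]
print(consensus(aln))  # ('MKV-LA', [1.0, 0.67, 1.0, 0.67, 1.0, 0.67])
```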
Homologous genes
• Homologous genes are genes that descend from a common ancestral gene, for example the same gene inherited by two species from their common ancestor. Homologous genes are often similar in sequence.
Orthologous and paralogous gene sequences
• Both orthologs and paralogs are types of homologs.
• Orthologs are homologous genes in different species that diverged after a speciation event; the gene and its main function are usually conserved.
• If a gene is duplicated within a species, the resulting duplicated genes are paralogs of each other, even though over time they may come to differ in sequence composition and function.
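[Figure: A globin gene undergoes gene duplication into an alpha-chain gene and a beta-chain gene; each then passes through speciation into frog and mouse copies. The frog and mouse alpha-chain genes are orthologs, as are the frog and mouse beta-chain genes; the alpha-chain and beta-chain genes are paralogs; all of them are homologs.]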
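The globin example suggests a simple rule: two genes are orthologs if their last common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication event. A minimal sketch of that rule, using a hypothetical hand-built tree whose node names stand in for the figure's genes:

```python
from typing import Dict, List, Optional

# Hypothetical gene tree for the globin example: child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "globin_ancestor": None,
    "alpha": "globin_ancestor",
    "beta": "globin_ancestor",
    "alpha_frog": "alpha",
    "alpha_mouse": "alpha",
    "beta_frog": "beta",
    "beta_mouse": "beta",
}
# Event that produced the two branches at each internal node.
EVENT = {"globin_ancestor": "duplication", "alpha": "speciation", "beta": "speciation"}


def ancestors(node: str) -> List[str]:
    """The node itself followed by all its ancestors up to the root."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def last_common_ancestor(a: str, b: str) -> str:
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a: str, b: str) -> str:
    """'orthologs' if the genes split at a speciation, 'paralogs' if at a duplication."""
    event = EVENT[last_common_ancestor(a, b)]
    return "orthologs" if event == "speciation" else "paralogs"


print(relationship("alpha_frog", "alpha_mouse"))  # orthologs
print(relationship("alpha_frog", "beta_frog"))    # paralogs
```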
Thank you
