Biological Databases
Pharmamatrix Workshop 2010
- Philip Winter
- Ishwar V. Hosamani
Some databases in the field of molecular biology…
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
GCRDB, GDB, GENATLAS, GenBank, GeneCards,
Genline, GenLink, GENOTK, GenProtEC, GIFTS,
GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,
MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,
PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,
PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,
SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase,
SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D,
SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE,
SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,
TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,
VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,
YPM, etc.
What we expect from a database
• Sequence, functional and structural information, plus
related bibliography
• Well structured and indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization
Biological Databases
• Sequence databases
• Structure databases
Sequence databases
• Nucleotide databases
• Protein databases
Nucleotide databases
• International Nucleotide Sequence
Database Collaboration (INSDC)
– NCBI (GenBank)
– EMBL (EMBL-EBI)
– DDBJ
Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation
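The standard contents listed above fit naturally into a single record type. The following is a minimal sketch of a hypothetical Python container. The field names are our own choice, and the structure is not the actual schema of GenBank, EMBL or DDBJ, each of which defines its own flat-file and XML formats.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """Hypothetical record mirroring the standard contents of a sequence database entry."""

    accession: str                  # stable identifier assigned by the database
    sequence: str                   # nucleotide or amino-acid residues
    organism: str                   # taxonomic data
    references: list[str] = field(default_factory=list)       # bibliography
    annotation: dict[str, str] = field(default_factory=dict)  # curated features
    keywords: list[str] = field(default_factory=list)
    cross_references: dict[str, str] = field(default_factory=dict)  # database name -> ID
    documentation: str = ""

    def length(self) -> int:
        """Number of residues in the stored sequence."""
        return len(self.sequence)


if __name__ == "__main__":
    # "HYPO0001" is a made-up accession used only for illustration.
    record = SequenceRecord(accession="HYPO0001", sequence="ATGGCC", organism="Homo sapiens")
    print(record.length())
```

Using `default_factory` ensures that every record gets its own lists and dictionaries instead of sharing one mutable default.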
NCBI
• Very comprehensive collection of biological databases
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple, easy-to-use web
interface
http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or
Sequin
• Search engine for data retrieval: Entrez
• Retrieves information across all the resources
under NCBI
Examples: PubMed, Taxonomy, SNP, PubChem,
etc.
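Entrez can also be queried from a program through NCBI's E-utilities, which are web URLs such as `esearch.fcgi` (find record IDs) and `efetch.fcgi` (download records). The sketch below only builds those URLs. The helper names are hypothetical, and the parameters `db`, `term`, `retmax`, `id`, `rettype` and `retmode` are the standard E-utilities ones as we understand them. Check the current E-utilities documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL asking Entrez for the IDs of records in `db` that match `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: list[str], rettype: str = "fasta", retmode: str = "text") -> str:
    """URL downloading the given record IDs from `db` in the requested format."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Only builds URLs; fetching them (e.g. with urllib.request.urlopen) needs network access.
    print(esearch_url("nuccore", "PAX6[gene] AND human[orgn]"))
```

For real queries, NCBI asks clients to identify themselves with the `tool` and `email` parameters and to respect its request-rate limits.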
Tools for analysis
• BLAST
• Primer-BLAST
• BLink
• ORF finder
• Genome Workbench
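An ORF finder scans a DNA sequence in all six reading frames for stretches that run from a start codon to a stop codon. The code below is a minimal sketch of that idea, not NCBI's ORF finder. It assumes ATG is the only start codon and uses the standard stop codons. It reports one ORF per ATG-to-stop stretch, starting from the first ATG, so nested ORFs are not listed. The `min_codons` default is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def _scan(seq: str, min_codons: int) -> list[tuple[int, int, int]]:
    """(frame, start, end) for ATG...stop stretches on one strand; end is exclusive."""
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                    hits.append((frame, start, i + 3))
                start = None  # look for the next ATG
    return hits


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """(strand, frame, start, end) on both strands.

    Reverse-strand coordinates refer to the reverse-complemented string.
    """
    seq = seq.upper()
    forward = [("+", *hit) for hit in _scan(seq, min_codons)]
    reverse = [("-", *hit) for hit in _scan(reverse_complement(seq), min_codons)]
    return forward + reverse


if __name__ == "__main__":
    print(find_orfs("ATGAAATAG", min_codons=2))
```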
Protein Sequence databases
• UniProt
• Pfam
• Gene Index project
UniProt
• Universal Protein Resource
• Formed through the merger of:
– Swiss-Prot and TrEMBL (SIB and EBI)
– PIR-PSD (PIR)
• Entry names usually combine a gene or protein
mnemonic with a species mnemonic (PROTEIN_SPECIES).
• Accession numbers are stable identifiers of 6 or 10
alphanumeric characters,
• e.g. P26367 (PAX6_HUMAN)
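The two identifier styles above are easy to tell apart in code. The sketch below uses the accession regular expression that UniProt documents for its 6- and 10-character accessions. We reproduced it from memory, so verify it against the current UniProt documentation. The function names are hypothetical.

```python
import re

# Accession pattern as documented by UniProt (6- or 10-character identifiers).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """True if the whole of `text` has the shape of a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(entry_name: str) -> tuple[str, str]:
    """Split an entry name such as PAX6_HUMAN into (protein mnemonic, species mnemonic)."""
    protein, sep, species = entry_name.rpartition("_")
    if not sep:
        raise ValueError(f"not an entry name: {entry_name!r}")
    return protein, species


if __name__ == "__main__":
    print(is_uniprot_accession("P26367"), split_entry_name("PAX6_HUMAN"))
```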
UniProt features
• BLAST
• Align
• Retrieve
• ID mapping
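The "Retrieve" feature can also be scripted. The sketch below assumes today's UniProt REST endpoint pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta`, which postdates this 2010 deck, so treat it as an assumption. It also parses a UniProtKB-style FASTA header. The header in the example is illustrative, and the parser assumes that no description contains a ` XX=` sequence.

```python
import re

# One " KEY=value" field; a value runs until the next " KEY=" or the end of the header.
FIELD_RE = re.compile(r" ([A-Z]{2})=(.*?)(?= [A-Z]{2}=|$)")


def uniprot_fasta_url(accession: str) -> str:
    """Assumed REST URL for one UniProtKB entry in FASTA format."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_uniprot_header(header: str) -> dict:
    """Split '>db|ACCESSION|ENTRY_NAME Description OS=... OX=...' into its parts."""
    ids, _, rest = header.strip().lstrip(">").partition(" ")
    db, accession, entry_name = ids.split("|")
    first_field = FIELD_RE.search(rest)
    description = rest[: first_field.start()] if first_field else rest
    return {
        "db": db,  # 'sp' = reviewed (Swiss-Prot), 'tr' = unreviewed (TrEMBL)
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "fields": dict(FIELD_RE.findall(rest)),
    }


if __name__ == "__main__":
    example = ">sp|P26367|PAX6_HUMAN Paired box protein Pax-6 OS=Homo sapiens OX=9606 GN=PAX6 PE=1 SV=2"
    print(uniprot_fasta_url("P26367"))
    print(parse_uniprot_header(example))
```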
Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are
classified into families
• Provides links to external databases like PDB,
SCOP, CATH etc.
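Pfam itself models families with profile hidden Markov models (HMMER). The sketch below shows only the simpler underlying idea of locating conserved regions in a multiple alignment. It computes per-column identity and then reports runs of highly conserved columns. The threshold, minimum block length and toy alignment are arbitrary assumptions.

```python
from collections import Counter


def column_identity(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column (gaps count against it)."""
    n_seqs, length = len(alignment), len(alignment[0])
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / n_seqs)
    return scores


def conserved_blocks(alignment: list[str], threshold: float = 0.8,
                     min_len: int = 3) -> list[tuple[int, int]]:
    """(start, end) column ranges, end exclusive, in which every column meets `threshold`."""
    blocks, start = [], None
    # A trailing 0.0 closes any block still open at the last column.
    for i, score in enumerate(column_identity(alignment) + [0.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


if __name__ == "__main__":
    toy = ["MKWAGLT", "MRWAGIS", "MKWAGVT"]  # made-up alignment
    print(conserved_blocks(toy))
```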
Pfam: Features
• Sequence search
• View Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search
Gene Indices
• Project aimed at indexing genes and their
variants in the various genome sequences.
• Creating a catalogue of genes in a wide range
of organisms
• Reduce redundancy
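The Gene Indices reduce redundancy by clustering and assembling overlapping transcript sequences. The sketch below is a much cruder stand-in. It keeps one representative per group of identical sequences, and it drops any sequence that occurs entirely inside a longer sequence already kept. It checks only one strand, and it is not the TGI clustering method.

```python
def remove_redundant(seqs: dict[str, str]) -> dict[str, str]:
    """Keep one representative for identical or fully contained sequences (single strand only)."""
    # Longest first, so a containing sequence is kept before anything it contains;
    # sorted() is stable, so equal-length sequences keep their input order.
    ordered = sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True)
    kept: dict[str, str] = {}
    for name, seq in ordered:
        seq = seq.upper()
        if any(seq in longer for longer in kept.values()):
            continue  # redundant: identical to, or a substring of, a kept sequence
        kept[name] = seq
    return kept


if __name__ == "__main__":
    demo = {"a": "ACGTACGT", "b": "GTAC", "c": "acgtacgt", "d": "TTTT"}
    print(remove_redundant(demo))
```

Because each sequence is compared against all kept sequences, the cost grows quadratically. Real pipelines use indexed or alignment-based clustering instead.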
Gene Indices Software Tools
• TGI Clustering tools
• Clview
• SeqClean
• Cdbfasta/cdbyank
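cdbfasta builds an index over a FASTA file, and cdbyank uses that index to pull out individual records quickly. The sketch below shows the same idea in plain Python, not the tools' actual constant-database format. It records the byte offset of each header line, then seeks straight to a record instead of scanning the whole file. The function names are hypothetical.

```python
import io
from typing import BinaryIO


def build_index(fh: BinaryIO) -> dict[str, int]:
    """Map each record ID (first word of the header) to the byte offset of its '>' line."""
    fh.seek(0)
    index, offset = {}, 0
    for line in fh:
        if line.startswith(b">"):
            words = line[1:].split()
            if words:  # skip headers with no ID
                index[words[0].decode()] = offset
        offset += len(line)
    return index


def yank(fh: BinaryIO, index: dict[str, int], record_id: str) -> tuple[str, str]:
    """Seek to one record and return (header, sequence) with line breaks removed."""
    fh.seek(index[record_id])
    header = fh.readline().decode().rstrip("\r\n")
    chunks = []
    for line in fh:
        if line.startswith(b">"):
            break  # start of the next record
        chunks.append(line.strip())
    return header, b"".join(chunks).decode()


if __name__ == "__main__":
    # A real file would be opened with open(path, "rb"); BytesIO keeps the demo self-contained.
    demo = io.BytesIO(b">seq1 first\nACGT\nAC\n>seq2\nGGGG\n")
    idx = build_index(demo)
    print(idx, yank(demo, idx, "seq2"))
```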
Structural databases
• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins
wwPDB
• Contains information about experimentally
determined structures of proteins, nucleic
acids, and complex assemblies
• RCSB-PDB, PDBe, PDBj, BMRB – repositories of
protein structure data
• Files in PDB, mmCIF, PDBML/XML formats
• Advanced search – provides comprehensive
information about a protein.
• Sequence and domain information, sequence
similarity and literature, in addition to the
details of the structure
• Cross-referenced to SCOP and CATH
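The legacy PDB file format uses fixed columns for each field. The sketch below reads atom names, residues, chains and coordinates from ATOM/HETATM records using the columns defined in the PDB format description. It ignores alternate locations, occupancies and B-factors. mmCIF is now the wwPDB's primary archive format, and very large structures are not distributed as PDB files, so a library parser is usually the better choice in practice.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed column layout of the PDB format."""
    atoms = []
    for line in lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # skip HEADER, TER, END and other record types
        atoms.append(
            Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            )
        )
    return atoms
```

Typical use would be `parse_atoms(open("structure.pdb"))`, where the file name is hypothetical.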
CATH
• Classification of proteins based on domain
structures
• Each protein is chopped into individual domains,
which are assigned to homologous superfamilies.
• Hierarchical domain classification of PDB
entries.
CATH hierarchy
• Class – derived from secondary structure content; assigned
automatically
• Architecture – describes gross orientation of secondary
structures, independent of connectivity
• Topology – clusters structures according to their
topological connections and numbers of secondary
structures
• Homologous superfamily – this level groups
together protein domains which are thought to
share a common ancestor and can therefore be
described as homologous
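CATH labels each node with a dotted code that has one number per level (Class.Architecture.Topology.Homologous superfamily), for example `3.40.50.300`. The sketch below parses such codes and checks how deep two domains share the hierarchy. It covers only the four levels on this slide and ignores CATH's finer sequence-based levels. The function names are hypothetical.

```python
from typing import NamedTuple

# Names of the main CATH classes.
CLASS_NAMES = {1: "Mainly Alpha", 2: "Mainly Beta", 3: "Alpha Beta", 4: "Few Secondary Structures"}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath(code: str) -> CathCode:
    """Parse a four-level CATH code such as '3.40.50.300'."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-level CATH code: {code!r}")
    return CathCode(*(int(p) for p in parts))


def same_node(a: CathCode, b: CathCode, depth: int) -> bool:
    """True if both codes agree on the first `depth` levels (1 = class ... 4 = superfamily)."""
    return a[:depth] == b[:depth]


if __name__ == "__main__":
    c = parse_cath("3.40.50.300")
    print(c, CLASS_NAMES[c.class_])
```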
SCOP
• Description of structural and evolutionary
relationships between all the proteins with
known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers
Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Species
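SCOP summarizes the first four levels of this hierarchy as a short dotted string (the "sccs"). The string starts with a class letter followed by fold, superfamily and family numbers, e.g. `c.37.1.8`. Species sit below family and are not encoded in it. The sketch below is a hypothetical helper that reports the deepest level two domains have in common.

```python
from typing import Optional

LEVELS = ("Class", "Fold", "Superfamily", "Family")


def parse_sccs(sccs: str) -> tuple:
    """Split a SCOP classification string such as 'c.37.1.8' into its four levels."""
    parts = sccs.split(".")
    if len(parts) != 4 or not parts[0].isalpha() or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"not a SCOP sccs: {sccs!r}")
    return (parts[0], *(int(p) for p in parts[1:]))


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Name of the most specific level shared by two domains, or None if even the class differs."""
    shared = None
    for level, x, y in zip(LEVELS, parse_sccs(a), parse_sccs(b)):
        if x != y:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(deepest_shared_level("c.37.1.8", "c.37.1.10"))
```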
Thank you

Editor's Notes

  • #8: The three databases exchange data every day. Each has its own sequence submission and retrieval tools. They follow a standardized annotation: the Collaboration created a Feature Table Definition that outlines the legal features and syntax.
  • #10: Currently, NCBI receives and processes about 20,000 direct sequence submissions per month, in addition to the approximately 200,000 bulk submissions that are processed automatically. NCBI collaborates with EMBL and DDBJ.
  • #11: The database continues to grow exponentially, doubling in size roughly every 10 months, and holds sequences from 250,000 distinct organisms.
  • #12: All tools can be downloaded and run as standalone programs on local workstations.
  • #19: The goal of this project is ultimately to represent a non-redundant view of all human genes and data on their expression patterns, cellular roles, functions, and evolutionary relationships. The database will also include links to genomic sequences, mapping data, 3D structures, and literature references
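The growth rate in note #11 can be made concrete. If the database doubles every 10 months, then after $t$ months its size relative to a starting size $N_0$ is given below. This takes the "every 10 months" figure at face value, which the note itself gives only approximately.

```latex
N(t) = N_0 \cdot 2^{t/10},
\qquad
\frac{N(12)}{N_0} = 2^{12/10} = 2^{1.2} \approx 2.30
```

So in one year the database grows to about 2.3 times its size, an increase of roughly 130%.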