Biological Databases
Pharmamatrix Workshop 2010
- Philip Winter
- Ishwar V. Hosamani
Some databases in the field of molecular biology…
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
GCRDB, GDB, GENATLAS, GenBank, GeneCards,
Genline, GenLink, GENOTK, GenProtEC, GIFTS,
GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,
MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,
PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,
PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,
SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase,
SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D,
SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE,
SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,
TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,
VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,
YPM, etc.
What we expect from a database
• Sequence, functional and structural information,
plus related bibliography
• Well structured and indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization
Biological Databases
• Sequence databases
• Structure databases
Sequence databases
• Nucleotide databases
• Protein databases
Sequence databases
Nucleotide databases
• International Nucleotide Sequence
Database Collaboration (INSDC)
– NCBI (GenBank)
– EMBL (EMBL-Bank)
– DDBJ
Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation
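The items listed above can be pictured as the fields of a single record. The following is only a minimal sketch: the class name, the field names, the sample accession and the sample sequence are all made up for illustration and do not correspond to the schema of any real database.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """Toy model of one sequence-database entry (all names are illustrative)."""
    accession: str                  # stable identifier for the entry
    sequence: str                   # nucleotide or protein residues
    organism: str                   # taxonomic data
    description: str = ""           # annotation / curation text
    keywords: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)              # citations
    cross_references: dict[str, str] = field(default_factory=dict)   # other DB -> ID

    def to_fasta(self, width: int = 60) -> str:
        """Render the record as FASTA text with lines of at most `width` residues."""
        header = f">{self.accession} {self.description} [{self.organism}]"
        lines = [self.sequence[i:i + width] for i in range(0, len(self.sequence), width)]
        return "\n".join([header, *lines])


record = SequenceRecord(
    accession="EXAMPLE0001",        # made-up accession, not a real entry
    sequence="ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    organism="Example organism",
    description="hypothetical gene",
    keywords=["example"],
    cross_references={"ExampleDB": "EX123"},
)
print(record.to_fasta(width=20))
```

Real flat-file formats carry many more fields, but the same grouping (identifier, sequence, taxonomy, annotation, references, cross-references) recurs across them.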
NCBI
• Hosts a very comprehensive collection of biological databases
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple, easy-to-use web
interface
http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or
Sequin
• Search engine for data retrieval: Entrez
• Entrez retrieves information across all the resources
under NCBI
Examples: PubMed, Taxonomy, SNP, PubChem,
etc.
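Entrez can also be queried from a script. The sketch below assumes the NCBI E-utilities web interface, which the slides do not mention, and only builds the query URL without sending any request. The helper name `esearch_url` and the search term are illustrative.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking one Entrez database for matching record IDs."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# The same search term can be pointed at different Entrez databases.
for db in ("pubmed", "taxonomy", "snp"):
    print(esearch_url(db, "PAX6"))
```

Fetching one of these URLs returns a list of matching record identifiers. NCBI publishes usage guidelines for automated access, which should be checked before running queries in bulk.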
Tools for analysis
• BLAST
• Primer-BLAST
• BLink
• ORF Finder
• Genome Workbench
Protein Sequence databases
• UniProt
• Pfam
• Gene Index Project
UniProt
• Universal Protein Resource
• Maintained by a consortium of SIB, EBI and PIR
• Formed through the merger of:
– Swiss-Prot
– TrEMBL
– PIR-PSD
• Entry names are often the name of the gene
followed by an underscore and a species code,
e.g. PAX6_HUMAN
• Accession numbers are short, stable alphanumeric
identifiers,
e.g. P26367 (the accession for PAX6_HUMAN)
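The two identifier styles can be told apart with regular expressions. This is a sketch: the accession pattern follows the one given in the UniProt help pages, while the entry-name pattern is a simplified assumption (a mnemonic, an underscore and a species code), and the function names are illustrative.

```python
import re

# Accession pattern as given in the UniProt help pages (6 or 10 characters).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified entry-name pattern (an assumption): mnemonic, underscore, species code.
ENTRY_NAME_RE = re.compile(r"([A-Z0-9]{1,10})_([A-Z0-9]{1,5})")


def is_accession(text: str) -> bool:
    """Return True if the whole string looks like a UniProt accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(name: str) -> tuple[str, str]:
    """Split an entry name such as PAX6_HUMAN into (mnemonic, species code)."""
    match = ENTRY_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a UniProt-style entry name: {name!r}")
    return match.group(1), match.group(2)


print(is_accession("P26367"))          # accession from the slide
print(is_accession("PAX6_HUMAN"))      # an entry name, not an accession
print(split_entry_name("PAX6_HUMAN"))
```

A pattern match only checks the shape of an identifier. It does not confirm that the entry exists in UniProt.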
UniProt features
• BLAST
• Align
• Retrieve
• ID mapping
Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are
classified into families
• Provides links to external databases such as PDB,
SCOP and CATH
Pfam: Features
• Sequence search
• View Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search
Gene Indices
• Project aimed at indexing genes and their
variants across the available genome sequences
• Creates a catalogue of genes in a wide range
of organisms
• Reduces redundancy
Gene Indices Software Tools
• TGI Clustering tools
• Clview
• SeqClean
• Cdbfasta/cdbyank
Structural databases
• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins
wwPDB
• Contains information about experimentally
determined structures of proteins, nucleic
acids, and complex assemblies
• RCSB PDB, PDBe, PDBj and BMRB – the member
repositories for structure data
• Files available in PDB, mmCIF and PDBML/XML formats
• Advanced search – provides comprehensive
information about a protein
• Sequence information, domain information, sequence
similarity and literature, in addition to the details of
the structure
• Cross-referenced to SCOP and CATH
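Of the file formats listed above, the legacy PDB format is a fixed-column text format, so an ATOM record can be read by slicing each line at set column positions. The sketch below assumes the column layout of the wwPDB format documentation. The `Atom` type, the function name and the sample coordinates are illustrative and are not taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed PDB column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        element=line[76:78].strip(),     # columns 77-78
    )


# A made-up record, assembled field by field so the column widths are explicit.
SAMPLE_LINE = (
    "ATOM  " + "1".rjust(5) + " " + " N  " + " " + "MET" + " " + "A"
    + "1".rjust(4) + " " * 4
    + "27.340".rjust(8) + "24.430".rjust(8) + "2.614".rjust(8)
    + "1.00".rjust(6) + "9.67".rjust(6) + " " * 10 + "N".rjust(2)
)

print(parse_atom_line(SAMPLE_LINE))
```

For real work an established parser is the safer choice, and mmCIF is the better fit for large structures that exceed the fixed-column limits.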
CATH
• Classification of proteins based on domain
structures
• Each protein is chopped into individual domains,
which are assigned to homologous superfamilies.
• Hierarchical domain classification of PDB
entries.
CATH hierarchy
• Class – derived from secondary-structure content;
assigned automatically
• Architecture – describes gross orientation of secondary
structures, independent of connectivity
• Topology – clusters structures according to their
topological connections and numbers of secondary
structures
• Homologous superfamily – this level groups
together protein domains which are thought to
share a common ancestor and can therefore be
described as homologous
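CATH identifiers are written as four dot-separated numbers, one per level of the hierarchy above. The following sketch splits such a code into its levels. The type and function names are illustrative, the class labels are the commonly listed ones and should be treated as an assumption, and the two codes are sample inputs only.

```python
from typing import NamedTuple

# Labels for the first four CATH classes as commonly listed (an assumption here).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int

    def level(self, depth: int) -> str:
        """Return the dotted prefix for the first `depth` levels (1 = C ... 4 = H)."""
        return ".".join(str(n) for n in self[:depth])


def parse_cath_code(code: str) -> CathCode:
    """Split a code such as '3.40.50.300' into its four numeric levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*map(int, parts))


a = parse_cath_code("3.40.50.300")
b = parse_cath_code("3.40.50.720")
print(CATH_CLASSES[a.class_])        # class label for the first level
print(a.level(3) == b.level(3))      # same topology?
print(a.level(4) == b.level(4))      # same homologous superfamily?
```

Comparing prefixes of different depths shows how two domains can share a class, architecture and topology while belonging to different homologous superfamilies.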
SCOP
• Description of structural and evolutionary
relationships between all the proteins with
known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers
Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Species
Thank you


Editor's Notes

  • #8: The three databases exchange data every day. Each database has its own sequence submission and retrieval tools. They follow a standardized annotation scheme: the Collaboration created a Feature Table Definition that outlines legal features and syntax.
  • #10: Currently, NCBI receives and processes about 20,000 direct-submission sequences per month, in addition to the approximately 200,000 bulk submissions that are processed automatically. It collaborates with EMBL and DDBJ.
  • #11: The database continues to grow at an exponential rate, doubling in size every 10 months. It holds sequences from 250,000 distinct organisms.
  • #12: All tools can be downloaded and run standalone on a local workstation.
  • #19: The goal of this project is ultimately to represent a non-redundant view of all human genes and data on their expression patterns, cellular roles, functions, and evolutionary relationships. The database will also include links to genomic sequences, mapping data, 3D structures, and literature references.