Lec (3) - Protein_databases

The document discusses protein classification methods based on sequence and structural similarities, detailing categories such as subfamilies, families, and superfamilies. It covers protein domains, sequence features like motifs and repeats, and introduces protein signatures used for classification. Additionally, it lists various online protein databases and resources, including NCBI, UniProtKB, Pfam, SMART, ExPASY, PIR, and InterPro.

Uploaded by

Alkadafe
Copyright
© © All Rights Reserved
Protein Databases
Protein Classification Concepts
• Classification methods group proteins based on:
- Sequence similarity
- Structural similarity
• Proteins can be classified into different groups based on:
- The families to which they belong
- The domains they contain
- The sequence features they possess
Protein Classification
• Subfamily (small group of closely related proteins)
• Family (group of evolutionarily related proteins that share one or more domains/repeats)
• Superfamily (large group of distantly related proteins)
Protein Domains
• Domain
- Discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function.
- Similar domains can be found in proteins with different functions.
Protein Sequence Features
• Motifs
- Short conserved regions, frequently the most conserved regions of a domain. Motifs are critical for the domain to function; in enzymes, for example, they contain the active sites.
Protein Sequence Features
• Repeat
- Stretch of amino acid sequence that is repeated a number of times along the length of the sequence. Many domains are built from repeats.
- Repeats may contain binding sites and contribute to the structural properties of the protein.
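The idea of a repeat can be illustrated with a small search for exact tandem repeats. This is only a sketch: the function name `find_tandem_repeats` and its parameters are made up for this example, and real protein repeats are usually imperfect copies, so dedicated tools use more tolerant models than an exact-match regular expression.

```python
import re

def find_tandem_repeats(sequence, min_unit=2, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats (hypothetical helper)."""
    # (.{n,}?) lazily captures the shortest unit of at least min_unit residues;
    # (?:\1){m,} then requires at least min_copies - 1 further copies directly after it.
    pattern = re.compile(r"(.{%d,}?)(?:\1){%d,}" % (min_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))  # start is 0-based
    return hits

print(find_tandem_repeats("MKGAGAGAGAPLT"))
```

Here the toy sequence contains the unit "GA" four times in a row, starting at 0-based index 2.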
Protein Sequence Features
• Consensus site/post-translational modification (PTM) site
- One or more conserved positions among homologous sequences. The position can potentially be modified, for example, by phosphorylation or glycosylation.
- An asparagine followed by any amino acid (except proline) followed by serine or threonine, for example, is a consensus site for N-linked glycosylation.
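The N-linked glycosylation consensus described above (asparagine, any residue, then serine or threonine) can be searched for with a regular expression. The sketch below is illustrative only: the function name is hypothetical, and it excludes proline at the middle position, following the common convention for this site. A match marks a potential site, not proof that the residue is actually glycosylated.

```python
import re

# N-x-[ST], where x is any residue except proline. The lookahead keeps the
# match one residue wide, so overlapping sites (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_n_glycosylation_sites(sequence):
    """Return 1-based positions of asparagines in N-x-S/T sequons (hypothetical helper)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

print(find_n_glycosylation_sites("MNGSANPTQNNTS"))
```

In the toy sequence, the asparagine in "NPT" is skipped because proline occupies the middle position.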
Protein Signatures
• Protein signatures are computational models used to classify proteins by properties such as:
- Protein families
- Domains
- Conserved sites
- Protein sequence features
Protein Resources
• A variety of protein resources are available online
• Several websites/resources are dedicated to providing a single interface to multiple resources
Protein Databases
• Sequence and information databases
- NCBI Protein Database: contains protein sequences from GenBank and RefSeq, as well as records from SwissProt, PIR, PRF, and PDB
- EBI UniProtKB: the "Protein knowledgebase", a comprehensive set of protein sequences. It provides functional information on proteins, with accurate, consistent, and rich annotation, together with the amino acid sequence, protein name or description, taxonomic data, and citation information. It is divided into two parts: Swiss-Prot and TrEMBL.
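The split between Swiss-Prot and TrEMBL is visible in UniProtKB FASTA headers, which begin with `sp|` for Swiss-Prot entries and `tr|` for TrEMBL entries. The sketch below parses that prefix. The function name is hypothetical, and the accessions and entry names in the example headers are placeholders written for illustration, not real database records.

```python
import re

# UniProtKB FASTA headers start with ">db|accession|entry_name description",
# where db is "sp" (Swiss-Prot) or "tr" (TrEMBL).
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")
SECTIONS = {"sp": "Swiss-Prot", "tr": "TrEMBL"}

def parse_uniprot_header(line):
    """Split a UniProtKB FASTA header into its main fields (hypothetical helper)."""
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError("not a UniProtKB FASTA header: %r" % line)
    db, accession, entry_name, description = match.groups()
    return {
        "section": SECTIONS[db],
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }

# Placeholder headers, not real entries.
print(parse_uniprot_header(">sp|P00000|EXMP_HUMAN Example protein OS=Homo sapiens"))
print(parse_uniprot_header(">tr|A0A000AAA0|A0A000AAA0_HUMAN Uncharacterized protein"))
```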
Protein Databases
Protein resources:
Pfam
• Collection of protein families and domains
• Each family is represented by
- Multiple sequence alignments
- Hidden Markov Models (HMMs)
• Two components to Pfam:
- Pfam-A entries: high-quality, manually curated families
- Pfam-B entries: automatically generated
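To give a feel for how a multiple sequence alignment becomes a model that can score new sequences, the sketch below builds a simple per-column frequency profile. This is a deliberately simplified stand-in and not Pfam's method: a profile HMM also models insertions, deletions, and transition probabilities, none of which appear here. The function names, the pseudocount, and the uniform background of 0.05 are assumptions made for this example.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per-column residue probabilities from equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        # Gap characters such as "-" are not in AMINO_ACIDS and are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

def log_odds_score(profile, segment, background=0.05):
    """Sum of log2(column probability / background) over an ungapped segment."""
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, segment))

toy_alignment = ["GDSGG", "GDSGG", "GDSAG"]  # toy data, not a real family
profile = build_profile(toy_alignment)
print(log_odds_score(profile, "GDSGG") > log_odds_score(profile, "AAAAA"))
```

A segment that resembles the aligned sequences scores higher than an unrelated one, which is the basic idea behind matching a sequence against a family model.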
SMART
• Simple Modular Architecture Research Tool
- Identification and annotation of protein domains
- Analysis of protein domain architectures
- Manually curated models for the prediction of protein domains
- http://smart.embl-heidelberg.de
ExPASy (https://www.expasy.org/)
• Expasy (Swiss Institute of Bioinformatics)
- Gives access to UniProt, PROSITE, homology modelling, docking, and many other tools covering protein sequences and identification, mass spectrometry and 2-DE data, protein characterisation and function, families, patterns and profiles, post-translational modification, protein structure, protein-protein interaction, similarity search/alignment, drug design, and molecular modelling
Protein Information Resource
• PIR
- Protein Ontology
- iProClass: reports for UniProtKB
- iProLink: literature, text mining
- http://pir.georgetown.edu/
InterPro
• Designed to integrate signature databases
- Protein families, domains, and functional sites
- http://www.ebi.ac.uk/interpro/
UniProt example: SGLT1 protein