
 
PROTEINS
ON THIS PAGE:
BASIC DATA: PROTEIN DATABASES and SEARCH ENGINES

SIGNAL and MODIFICATION SITES -- Search and Prediction

DOMAINS and MOTIFS -- Search and Prediction
PROTEIN STRUCTURE -- PREDICTION

PROTEINS LISTED BY CLASSES:

PROTEIN-PROTEIN INTERACTIONS

INDIVIDUAL PROTEINS

MISC. LINKS

NEWSGROUPS


 
 
BASIC DATA

 
 
PROTEIN DATABASES and SEARCH ENGINES
  • ATLAS The Atlas Retrieval System is a multidatabase (~24 databases) information retrieval program specifically designed to access macromolecular sequence databases.
  • BCM Search Launcher: 
  • COG Clusters of Orthologous Groups of proteins were delineated by comparing protein sequences encoded in 21 complete genomes (including Saccharomyces cerevisiae, Escherichia coli, and other microbes) representing 17 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.
  • ENTREZ Protein sequence database, NCBI
  • MIPS, Munich Information Centre for Protein Sequences.
  • OWL Web database is a non-redundant protein sequence database produced from the following source databases: SWISSPROT, PIR, GenBank translations, and NRL-3D.
  • PDB, The Protein Data Bank is the single international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR.
  • Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. Version 5.5 of Pfam (Sept 2000) contains alignments and models for 2478 protein families, based on the Swissprot 38 and SP-TrEMBL 11 protein sequence databases.
  • PIR The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japanese International Protein Sequence Database (JIPID) maintains the PIR-International Protein Sequence Database -- a comprehensive, annotated, and non-redundant protein sequence database in which entries are classified into family groups and alignments of each group are available.
  • PredictProtein is a service for sequence analysis and structure prediction.
  • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
  • Proteome, Inc. provides knowledge resources combining information on the proteins and genes of human, mouse and rat, and several model organisms.
  • PROW, Protein Reviews On the Web is an online resource that features PROW Guides, authoritative short, structured reviews on proteins and protein families. The Guides provide approximately 20 standardized categories of information (abstract, biochemical function, ligands, references, etc.) for each protein.
  • PROWL - a resource for protein chemistry and mass spectrometry
  • SRS - Sequence Retrieval System from EBI
  • SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

 
 
PROTEIN SIGNAL and MODIFICATION SITES -- Search and Prediction
  • signal, localization, targeting, sorting, and cleavage
    • ChloroP Chloroplast transit peptides and their cleavage sites in plant proteins.
    • Predotar (Prediction of Organelle Targeting sequences) is a neural-network-based prediction program capable of identifying ER signal peptides and mitochondrial or plastid transit peptides.
    • NetPicoRNA Posttranslational cleavage by picornaviral proteases
    • NetChop Proteasomal cleavages (MHC ligands)
    • TargetP Subcellular location of proteins: mitochondrial, chloroplastic, secretory pathway, or other
    • SignalP Signal peptide and cleavage sites in Gram-positive, Gram-negative and eukaryotic amino acid sequences
    • SMART A list of major signaling, nuclear, extracellular, and other domains (scroll down the page) identified by Simple Modular Architecture Research Tool: Identification of signalling domains and genetically mobile domains.
    • SIGSCAN Find and list homologies of published signal sequences with the input DNA sequence.
    • PSORT is a computer program for the prediction of protein sorting and localization sites. Input is an amino acid sequence.
      • PSORT (Old version; for bacterial/plant sequences)
      • PSORT II (Recommended for animal/yeast sequences)
    • MitoProt calculates the N-terminal protein region that can support a Mitochondrial Targeting Sequence and the cleavage site.
    • PESTfind (alternative site) predicts PEST proteolytic signals: polypeptide sequences enriched in proline (P), glutamic acid (E), serine (S) and threonine (T) that target proteins for rapid destruction. The algorithm defines PEST sequences as hydrophilic stretches of at least 12 amino acid residues. Such regions contain at least one P, one E or D and one S or T. They are flanked by lysine (K), arginine (R) or histidine (H) residues, but positively charged residues are disallowed within the PEST sequence.
  • post-translational modifications, all types
    • RESID is a database of protein post-translational modifications with descriptive, chemical, structural and bibliographic information.
  • glycosylation
    • DictyOGlyc O-(alpha)-GlcNAc glycosylation sites (trained on Dictyostelium discoideum proteins)
    • NetOGlyc O-GalNAc (mucin type) glycosylation sites in mammalian proteins.
    • YinOYang O-(beta)-GlcNAc glycosylation and Yin-Yang sites (intracellular/nuclear proteins)
  • phosphorylation
    • NetPhos Serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins
    • PhosphoSite A database dedicated to in vivo protein phosphorylation sites in Human and Mouse proteins
    • PhosphoBase - A database of phosphorylation sites
  • SUMOylation
    • SUMOplot™ can help explain higher-than-expected molecular weights on SDS gels that result from attachment of SUMO protein (11 kDa) at multiple positions of your protein.
      SUMO-1 (small ubiquitin-related modifier; also known as PIC1, UBL1, Sentrin, GMP1, and Smt3) is a member of the ubiquitin and ubiquitin-like superfamily. Most SUMO-modified proteins contain the tetrapeptide motif B-K-x-D/E, where B is a hydrophobic residue, K is the lysine conjugated to SUMO, x is any amino acid (aa), and D or E is an acidic residue. Substrate specificity appears to be derived directly from Ubc9 and the respective substrate motif. SUMOplot™ predicts the probability that the SUMO consensus sequence (SUMO-CS) is engaged in SUMO attachment. The SUMOplot™ score system is based on two criteria: 1) direct amino acid match to the SUMO-CS observed and shown to bind Ubc9, and 2) substitution of the consensus amino acid residues with amino acid residues exhibiting similar hydrophobicity.
  • membrane tethering
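
The PESTfind and SUMOplot™ entries in the list above both state sequence rules that are concrete enough to sketch in code. The Python below is a minimal illustration of those stated rules only, not the algorithm of either tool: it leaves out PESTfind's hydrophilicity requirement and SUMOplot™'s scoring system, and the function names and the set of residues treated as hydrophobic are assumptions made for this example.

```python
import re


def pest_candidates(seq, min_len=12):
    """Return (start, end, segment) tuples for PEST-like candidate regions.

    Rules taken from the PESTfind description: a stretch of at least
    `min_len` residues with no K, R or H inside, flanked on both sides by
    K, R or H, containing at least one P, one E or D, and one S or T.
    Hydrophilicity is NOT checked here (the source gives no formula).
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    # Maximal runs without K/R/H that have a K/R/H directly before and after.
    for match in re.finditer(r"(?<=[KRH])[^KRH]+(?=[KRH])", seq):
        segment = match.group()
        if (
            len(segment) >= min_len
            and "P" in segment
            and ("E" in segment or "D" in segment)
            and ("S" in segment or "T" in segment)
        ):
            hits.append((match.start(), match.end(), segment))
    return hits


def sumo_consensus_sites(seq, hydrophobic="AILMFVPW"):
    """Return (lysine_position, motif) for each B-K-x-D/E match.

    `hydrophobic` is an assumed residue set for "B"; the source only says
    "a hydrophobic residue". Positions are 1-based and point at the lysine.
    A lookahead is used so that overlapping motifs are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(f"(?=([{hydrophobic}]K[A-Z][DE]))")
    return [(m.start(1) + 2, m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    # Made-up toy sequences, used only to exercise the two functions.
    print(pest_candidates("MKPEPSTDEESPTPEEKAA"))
    print(sumo_consensus_sites("MAVKQEGG"))
```

Both functions only flag candidates that satisfy the quoted rules; a real prediction would still need the scoring steps that the tools themselves apply.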

 
 
PROTEIN DOMAINS and MOTIFS -- Search and Prediction
  • Blocks WWW Server Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks for the Blocks Database are made automatically by looking for the most highly conserved regions in groups of proteins documented in the Prosite Database. The Prosite pattern for a protein group is not used in any way to make the Blocks Database and the pattern may or may not be contained in one of the blocks representing a group.
  • DOMO is a database of homologous protein domain families. It is based on successive sequence analysis steps including similarity search, domain delineation, multiple sequence alignment and motif construction. 83054 non-redundant protein sequences from SWISSPROT and PIR have been analysed, yielding a database of 99058 domains clustered into 8877 multiple sequence alignments.
  • InterPro, Integrated resource of Protein Families, Domains and Sites. InterPro release 2.0 (October 2000) was built from Pfam 5.5, PRINTS 27.0, PROSITE 16.25, ProDom 2000.1 and the current SWISS-PROT + TrEMBL data. This release of InterPro contains 3204 entries, representing 767 domains, 2372 families, 50 repeats and 15 post-translational modification sites.
  • HLA Peptide Binding Predictions
  • Ig domains
  • MAST -- Motif Alignment and Search Tool. MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.
  • MEME  is a tool for discovering motifs in a group of related DNA or protein sequences. A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. MEME represents motifs as position-dependent letter-probability matrices which describe the probability of each possible letter at each position in the pattern. Individual MEME motifs do not contain gaps. Patterns with variable-length gaps are split by MEME into two or more separate motifs.
  • MOTIF Searching Protein and Nucleic Acid Sequence Motifs
  • PESTfind predicts PEST proteolytic signals: polypeptide sequences enriched in proline (P), glutamic acid (E), serine (S) and threonine (T) that target proteins for rapid destruction. The algorithm defines PEST sequences as hydrophilic stretches of at least 12 amino acid residues. Such regions contain at least one P, one E or D and one S or T. They are flanked by lysine (K), arginine (R) or histidine (H) residues, but positively charged residues are disallowed within the PEST sequence.
  • PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.
  • ProDom The Protein Domain database consists of an automatic compilation of homologous domains. Current versions of ProDom are built using a novel procedure based on recursive PSI-BLAST searches. Large families are much better processed with this new procedure than with the former DOMAINER program
  • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
  • SMART  Simple Modular Architecture Research Tool allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable.
  • transmembrane domains
    • DAS - Transmembrane Prediction server. Based on publication: M. Cserzo, E. Wallin, I. Simon, G. von Heijne and A. Elofsson: Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the Dense Alignment Surface method; Prot. Eng. vol. 10, no. 6, 673-676, 1997
    • HMMTOP  is an automatic server for predicting transmembrane helices and topology of proteins. The method used by this prediction server is described in G.E Tusnády and I. Simon (1998) Principles Governing Amino Acid Composition of Integral Membrane Proteins: Applications to Topology Prediction. J. Mol. Biol. 283, 489-506.
    • TMAP Membrane protein predictions using multiple sequence alignment, from the Department of Medical Biochemistry and Biophysics of Karolinska Institutet
      • If you do not have access to a multiple sequence alignment, there is a single sequence input form available. However, keep in mind that such predictions are not as reliable as those based upon multiply aligned sequences.
    • TMHMM This server is for prediction of transmembrane helices in proteins. 
      • Version 1 
      • Version 2 is very similar to Version 1, but it builds on a new model, so predictions are not identical.
    • TMpred predicts membrane-spanning regions and their orientation. The algorithm is based on the statistical analysis of TMbase, a database of naturally occurring transmembrane proteins. The prediction is made using a combination of several weight matrices for scoring.
    • TopPred2 Topology prediction of membrane proteins. Based on Publication: "Membrane Protein Structure Prediction, Hydrophobicity Analysis and the Positive-inside Rule", Gunnar von Heijne, J. Mol. Biol. (1992) 225, 487-494
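
The position-dependent letter-probability matrix mentioned in the MEME entry above can be illustrated with a few lines of Python. This sketch only builds such a matrix from sites that are already aligned and ungapped; it does not perform MEME's motif discovery, and the function names, the DNA alphabet default, the optional pseudocount and the example sites are all assumptions made for illustration.

```python
from collections import Counter


def letter_probability_matrix(sites, alphabet="ACGT", pseudocount=0.0):
    """Build one {letter: probability} dict per motif position.

    `sites` are equal-length, ungapped occurrences of a motif. Each column
    of the alignment is turned into letter frequencies that sum to 1.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length (no gaps)")
    if any(letter not in alphabet for site in sites for letter in site):
        raise ValueError("site contains a letter outside the alphabet")

    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        matrix.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return matrix


def site_probability(matrix, site):
    """Probability of `site` under the matrix, treating positions as independent."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif width")
    probability = 1.0
    for column, letter in zip(matrix, site):
        probability *= column.get(letter, 0.0)
    return probability


if __name__ == "__main__":
    # Made-up example sites, not taken from any database.
    example_sites = ["ACGT", "ACGA", "ACCT", "TCGT"]
    pwm = letter_probability_matrix(example_sites)
    for position, column in enumerate(pwm, start=1):
        print(position, column)
    print(site_probability(pwm, "ACGT"))
```

Because each matrix row covers exactly one position, a pattern with a variable-length gap cannot be expressed in a single matrix, which is consistent with the entry's note that such patterns are split into separate motifs.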

 
 
PROTEIN STRUCTURE -- PREDICTION
    • Protein Structure Prediction Center. Biology and Biotechnology Research Program Lawrence Livermore National Laboratory, Livermore, California, USA
    • 3D-PSSM Web Server. A Fast, Web-based Method for Protein Fold Recognition using 1D and 3D Sequence Profiles coupled with Secondary Structure and Solvation Potential Information.
    • PSIpred, a protein structure prediction server, lets you submit a protein sequence, perform a prediction of your choice and receive the results by e-mail.
    • UCSC HMM Applications:
      • HMM-based Protein Sequence Analysis, SAM-T99, protein database query, and secondary structure prediction. Submit a protein sequence (or alignment) in FASTA format and receive SAM-T99 alignment, HMM, database hits, and secondary structure prediction.
    • EVA: EValuation of Automatic protein structure prediction. 
  • Portals:

 
 
PROTEINS LISTED BY  CLASSES:
  • by sequence homology, i.e. protein families --  go to page  "Gene and Protein Families" 
    • Family and superfamily, PIR  Proteins are clustered into homeomorphic protein families if they have 50% sequence identity. Protein families are further clustered into protein superfamilies if they have ~30% sequence identity
    • GeneFIND (Gene Family Identification Network Design) is an integrated database search system that combines several search/alignment tools and ProClass database to provide rapid and accurate gene family classification with enriched family information.
    • Library of Protein Family Cores
    • PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs
    • PROTFAM  Protein Classification Browsers, from MIPS
  • by structure:
    • FSSP -- Fold classification based on Structure-Structure alignment of Proteins
    • Dali server is a network service for comparing protein structures in 3D.
      • Dali Domain Dictionary is a structural classification of protein domains. Domains are delineated automatically optimising topological recurrence between large, compact units in the known protein structures
    • Library of Protein Family Cores
    • PDB, The Protein Data Bank is the single international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR. 
    • SCOP, Structural Classification of Proteins
    • Structural Classification of Proteins, Glossary from CATH
  • by location:
  • by organism:
    • yeast protein classes
    • WormPep contains the predicted proteins from the Caenorhabditis elegans genome sequencing project. The current Wormpep33 database (released 15/11/2000), contains 8,596,659 residues in 19,705 protein sequences (including 405 splice variants). Wormpep33 is based on the current WS23 release of the C. elegans AceDB database.
    • HoBacGen: Homologous Bacterial Genes Database  HOBACGEN is a database system that contains all the protein sequences of bacteria organized into families. It allows one to select sets of homologous genes from bacterial species and to visualize multiple alignments and phylogenetic trees. Thus HOBACGEN is particularly useful for comparative genomics, phylogeny and molecular evolution studies on bacteria.
  • by function:
  • by biochemical properties:
    • enzymes, all classes:
      • BRENDA  -- The Comprehensive Enzyme Information System
      • ENZYME  -- Enzyme nomenclature database. ENZYME is a repository of information relative to the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB) and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided.
      • LIGAND -- Database for enzymes, compounds, and reactions
      • Worthington Enzyme Manual  -- enzymes and related biochemicals
    • restriction enzymes
    • proteases, peptidases
    • glycoproteins, proteoglycans, and carbohydrate active enzymes
      • Proteoglycans
      • N-Glycans
      • CAZy, Carbohydrate-Active enZYmes. This server describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds
      • CarbBank - Complex Carbohydrate Structure Database
      • GlycoSuiteDB is an annotated and curated relational database of glycan structures. Currently, the database contains most published O-linked and N-linked glycans. Where known, the proteins to which the glycan structures are attached are described, and cross-references to SWISS-PROT/TrEMBL are given.
      • O-GLYCBASE is a revised database of O-glycosylated proteins. Version 5.00 has 198 glycoprotein entries. The criterion for inclusion is at least one experimentally verified O-glycosylation site. The terminal sugar linked to serine or threonine is cited when known. The database is non-redundant in the sense that it contains no identical sequences, unless there is conflicting glycosylation data.
      • BPGD Bacterial Polysaccharide Gene Database
      • TGN - The Glycoscience Network
      • FCCA - Forum Carbohydrates Coming of Age
      • CARBHYD - Carbohydrate information WWW sites
      • Monosaccharide browser - Space filling Fischer projection for monosaccharides
    • phosphoproteins and kinases/phosphatases
      • PPDB - Phosphoprotein Database
      • PKR, The Protein Kinase Resource is a web accessible compendium of information on the protein kinase family of enzymes. This resource includes tools for structural and computational analyses as well as links to related information maintained by others. The PKR is a collaborative project of protein kinase researchers and computational biologists working to create a database integrating molecular and cellular information.
      • RSK tandem kinases

 
 
PROTEIN-PROTEIN interactions and  PROTEIN COMPLEXES
  • protein-protein interactions
    • BIND  Biomolecular Interaction Network Database. BIND contains protein and other biomolecular interactions, molecular complex and pathway records. This database will span the complexity of interaction information gathered through experimental studies of biomolecular interactions, from the literature, submitters and other databases.
    • BRITE  Biomolecular Relations in Information Transmission and Expression. BRITE is a database of binary relations for computation and comparison of graphs. It contains diverse sets of binary relations, including the generalized protein-protein interactions that underlie the KEGG pathway diagrams, systematic experimental data on protein-protein interactions by yeast two-hybrid systems, sequence similarity relations by SEARCH, expression similarity relations by microarray gene expression profiles, and the cross-reference links between database entries. This is a preliminary version of BRITE for simple retrieval of partners.
    • DIP Database of Interacting Proteins is a database of protein pairs that are known to interact with each other. Interaction is defined as "two amino acid chains that bind to each other for a function". The idea is to provide well defined links between proteins that interact. The database is publicly available on the web and is intended to aid those studying protein-protein interactions, signaling pathways, multiple interactions and complex systems. 
    • Interact, An Object Oriented Database for Protein-Protein Interactions.
    • ProNet Online  Protein Interactions on the Web, from Myriad Genetics, Inc
    • PIMdb, Drosophila Protein Interaction Map database
    • PFBP Protein Function and Biochemical Pathways Project
  • protein complexes

 
 
INDIVIDUAL PROTEINS

 
 
MISC.  LINKS

 
 
NEWSGROUPS