Protein Structure Prediction

  1. Radresa O. Ogata K. Wodak SJ. Ruysschaert JM. and Goormaghtigh E.
    Homology Modeling of the H+-ATPase of Neurospora crassa: proposal for a proton pathway from the analysis of internal polar cavities.
    Eur. J. Biochem. 269: 5246-58
  2. Wodak S.J. Rooman M.J. and Kocher J.-P.A.
    Effective potentials derived from known protein structures and their use in predicting 3D structure from sequence.
    In F. Sanz J. Giraldo and F. Manaut (eds.) QSAR and Molecular Modelling: Concepts computational tools and biological applications. Proceedings of the 10th European Symposium on Structure-Activity Relationships: QSAR and Molecular Modelling Barcelona Spain September 4-5 1994. J.R. Prous Science Publishers Barcelona (Spain) 206-215.
  3. Lemer C.M.-R. Rooman M.J. and Wodak S.J.
    Protein structure prediction by threading methods: evaluation of current techniques.
    Proteins: Struct. Funct. Genet. 23(3) 337-355
  4. This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure.


  5. Rooman M.J. and Wodak S.J.
    Are database-derived potentials valid for scoring both forward and inverted protein folding?
    Protein Engineering 8(9) 849-858
  6. Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.


  7. Rooman M.J. Kocher J.-P.A. Wintjens R. and Wodak S.J.
    Knowledge based potentials for predicting the three-dimensional conformation of proteins.
    In S. Doniach (ed.) Statistical mechanics protein structure and protein substrate interactions. Plenum Press New York. Proceedings of a NATO meeting held at Cargèse France 1993 327-330
  8. Kocher J.-P. A. Rooman M.J. and Wodak S.J.
    Factors influencing the ability of knowledge based potentials to identify native sequence-structure matches
    J. Mol. Biol. 235(5) 1598-1613
  9. Several types of potentials are derived from a data set of known protein structures by computing statistical relations between amino acid sequence and different descriptions of the protein conformation. These potentials formulate in different ways backbone dihedral angle preferences, pairwise distance-dependent interactions between amino acid residues, and solvation effects based on accessible surface area calculations. Parameters affecting the characteristics and the performance of the potentials are critically assessed by monitoring recognition of the native fold in a strict screening test, where each sequence in the data set is threaded through a repertoire of motifs, generated from all corresponding structures. Sequence gaps are not allowed, to avoid additional approximations. Results show that residue interaction potentials computed from distances between average side-chain centroids perform significantly better on this test than those computed considering inter-C alpha or inter-C beta distances. Combining potentials that are based on different structural descriptions and different interactions is also beneficial. The performance of some of these potentials is in fact so good that they recognize the correct fold for all the tested proteins, including subunits known to be unstable in the absence of quaternary interactions. Most strikingly, potentials representing backbone dihedral angle preferences recognize as many as 68 protein chains out of a total of 74, even though they consider solely local interactions along the chain, which, being the same as those considered in secondary structure prediction methods, are well known to be incapable of determining the full three-dimensional fold. This leads us to question the ability of procedures that screen a limited repertoire of structures to act as a stringent test for the potentials. 
    We concede, however, that they are useful and fast tests, capable of revealing gross shortcomings of the potentials, or possible biases towards native recognition due, for example, to effects of sequence memory.


  10. Wodak S.J. and Rooman M.J.
    Generating and testing protein folds.
    Curr. Opin. Struct. Biol. 3 247-259.
  11. Rooman M.J. and Wodak S.J.
    (1992) Extracting information on folding from the amino acid sequence: role of consensus stable regions in homologous proteins
    Biochemistry 31 10239-10249
  12. It is investigated whether protein segments predicted to have a well-defined conformational preference in the absence of tertiary interactions are conserved in families of homologous proteins. The prediction method follows the procedures of Rooman, M., Kocher, J.-P., and Wodak, S. (preceding paper in this issue). It uses a knowledge-based force field that incorporates only local interactions along the sequence and identifies segments whose lowest energy structure displays a sizable energy gap relative to other computed conformations. In 13 of the protein families and subfamilies considered that are sufficiently homologous to have similar 3D structures, at least one region is consistently predicted as having the same preferred conformation in virtually all family members. These regions are between 4 and 26 residues long. They are often located at chain ends and correspond primarily to segments of secondary structure heavily involved in interactions with the rest of the protein, suggesting that they could act as nuclei around which other parts of the structure would assemble. Experimental data on early folding intermediates or on protein fragments with appreciable structure in aqueous solution are available for more than half of the protein families. Comparison of our results with these data is quite favorable. They reveal that each of the experimentally identified early formed, or independently stable, substructures harbors at least one of the segments consistently predicted as having a preferred conformation by our procedure. The implications of our findings for the conservation of folding pathways in homologous proteins are discussed.


  13. Rooman M.J. Kocher J.-P. and Wodak S.J.
    Extracting information on folding from the amino acid sequence: accurate predictions for protein regions with stable conformation in absence of tertiary interactions
    Biochemistry 31 10226-10238
  14. A recently developed procedure to predict backbone structure from the amino acid sequence [Rooman, M., Kocher, J. P., & Wodak, S. (1991) J. Mol. Biol, 221, 961-979] is fine tuned to identify protein segments, of length 5-15 residues, that adopt well-defined conformations in the absence of tertiary interactions. These segments are obtained by requiring that their predicted lowest energy structures have a sizable energy gap relative to other computed conformations. Applying this procedure to 69 proteins of known structure, we find that regions with largest energy gaps--those having highly preferred conformations--are also the most accurately predicted ones. On the basis of previous findings that such regions correlate well with sites that become structured early during folding, our approach provides the means of identifying such sites in proteins without prior knowledge of the tertiary structure. Furthermore, when predictions are performed so as to ignore the influence of residues flanking each segment along the sequence, a situation akin to excising the considered peptide from the rest of the chain, they offer the possibility of identifying protein segments liable to adopt well-defined conformations on their own. The described approach should have useful applications in experimental and theoretical investigations of protein folding and stability, and aid in designing peptide drugs and vaccines.


  15. Rooman M.J. Kocher J.-P.A. and Wodak S.J.
    Prediction of protein backbone conformation based on 7 structure assignments: Influence of local interactions.
    J.Mol.Biol. 221 961-979.
  16. A method is developed to compute backbone tertiary folds from the amino acid sequence. In this method, the number of degrees of freedom is drastically reduced by neglecting side-chain flexibility, and by describing backbone conformations as combinations of only seven structural states. These are characterized by single values of the dihedral angles phi, psi and omega, representing allowed conformations of the isolated dipeptide. We show that this restrictive model is none the less capable of describing native backbones to within acceptable deviations. Using our backbone description, potentials of mean force are derived from a database of known protein structures, based on statistical influences of single residues and residue pairs on the conformational states in their vicinity along the chain. This yields the force-field component due to local interactions, which is then used to predict lowest-energy conformations from any given amino acid sequence. The prediction algorithm does not require searching conformational space and is therefore extremely fast. Another important asset of our method is that it is able to compute not only the minimum energy conformation, but any number of lowest energy structures, whose relative preferences can be determined from the corresponding computed energy values. The performance of our procedure is tested on short peptides that are likely to be stabilized by local interactions. These include several helical structures and a hexapeptide with a beta-bend conformation, corresponding to peptides shown to have relatively well-defined conformations in aqueous solution, and to protein segments believed to adopt their native conformation early during folding. In addition, several flexible peptides are analysed. 
    Except for the problems encountered in predicting observed disulphide bridges in two of the flexible peptides, and in a somewhat larger fragment comprising residues 30 to 51 of bovine trypsin inhibitor, prediction results compare very favourably with experimental data. Potential applications of our procedure to protein modelling and its extension to protein folding are discussed.


  17. Huysmans M. Richelle J. and Wodak S.J.
    SESAM: a relational database for structures of macromolecules.
    Proteins 11 (1) 59-76.
  18. A system is described that provides ways of integrating data on protein structure, sequence, and survey results, with molecular graphics and molecular mechanics software. Its major component is the relational database SESAM, presently implemented under the commercial package SYBASE. By design, the database allows full integration--within the same data organization--of raw data on protein structure, sequence, ligands, and heterogroups, obtained from the Brookhaven Protein Databank, with pure sequence information available from other databanks such as SWISS-PROT. It contains in addition higher level descriptions of structural and topological properties, as well as survey results, obtained by executing specialized computer programs. Aside from the very useful attribute of closely combining structural and nonstructural information, other important features distinguish it from analogous systems developed elsewhere. It includes a molecular dictionary with complete description of geometric properties and energy parameters used in modeling and conformational energy calculations. Using this dictionary, structural data are validated by checking for localized inconsistencies in atomic coordinates, atomic symbols, chirality definitions, and flagging errors and incomplete entries. Because of both the dictionary and the validation procedures, SESAM can be readily interfaced with conventional molecular graphics and mechanics software packages, or with other specialized application programs. With the aid of appropriate interfaces, data access is sufficiently fast for SESAM to be interrogated interactively. Prototypes of user interfaces, as well as an interface with the molecular graphics package BRUGEL, are described and the power of the system is illustrated in applications such as homology-based protein modeling, computer-aided protein design, protein structure predictions, analysis of local structure motifs, and of relationships between protein sequence and structure.


  19. Rooman M.J. and Wodak S.J.
    Weak correlation between predictive power of individual sequence patterns and overall prediction accuracy in proteins.
    Proteins 9 (1) 69-78.
  20. Patterns in amino acid properties (polar, hydrophobic, etc.) that characterize secondary structure motifs are derived from a database containing 75 protein structures, with the aim of circumventing the limitations due to data base size so as to increase structure prediction score. Many such sequence-structure associations with high intrinsic predictive power are found, which turn out to be correct 78% of the time when applied individually to proteins outside the learning set. Based on these associations, a prediction method is developed, which reaches the score of 62% on the 3 states alpha-helix, beta-strand, and loop, without using additional constraints. Though this score is quite good compared to that of other available prediction methods, it is much lower than could be expected from the high intrinsic predictive power of the associations used. The reasons underlying this surprising result, which indicate that prediction score and intrinsic predictive power are only weakly coupled, are discussed. It is also shown that the size of the present database still seriously limits prediction scores, even when property patterns are used, and that higher scores are expected in large databases. Clues are provided on the relative influence of neglecting spatial interactions on prediction efficiency, suggesting that, in sufficiently large databases, predicted secondary structures would correspond to those formed early in the folding process. This hypothesis is tested by confronting present predictions with available experimental data on early protein folding intermediates and on small peptides that adopt a relatively stable conformation in water. Although admittedly there are still too few such data, results suggest that the hypothesis might be well founded.


  21. Moens L. Wolf G. Van Hauwaert M.L. DeBaere I. Van Beumen J. Wodak S.J. and Trotman C.N.A.
    The extracellular hemoglobins of Artemia: Structure of the oxygen carrier and respiration physiology.
    In Artemia Biology CRC Press (Browne Sorgeloos and Trotman Editors) 187-220.
  22. Moens L. Van Hauwaert M.-L. De Smet K. VerDonck K. Van de Peer Y. Van Beumen J. Wodak S.J. Alard Ph. and Trotman C.
    Structural interpretation of the amino acid sequence of a second domain from the Artemia covalent polymer globin.
    J.Biol.Chem. 265 (24) 14285-14291.
  23. Artemia has a complex extracellular hemoglobin of Mr 260,000 comprising two globin chains (Mr 130,000) each of which is a polymer of eight covalently linked domains of Mr 16,000. The primary structure of this polymeric globin was studied to understand how globin folded domains are ordered within a globin chain and, in turn, how the latter associate into a functional hemoglobin molecule. Here we report the amino acid sequence of a second domain, E7 (Mr 16,081, excluding the heme), and interpretations of sequence data by computer-assisted alignment and modeling. This clearly shows that, as with domain E1 (Moens, L., Van Hauwaert, M.-L., De Smet, K., Geelen, D., Verpooten, G., Van Beeumen, J., Wodak, S., Alard, P., & Trotman, C. (1988) J. Biol. Chem. 263, 4679-4685), domain E7 is compatible with a globin folded structure of the beta-type chain. Several specific differences of domains E7 and E1 from the classic globins are identified. They possibly can be interpreted in terms of specific requirements for a double octameric functional molecule.


  24. Rooman M.J. Rodriguez J. and Wodak S.J.
    Relations between protein sequence and structure and their significance.
    J.Mol.Biol. 213 337-350.
  25. The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between C alpha atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and beta-strand. 
    The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding.


  26. Rooman M. Wodak S.J. and Thornton J.
    Amino acid sequence templates derived from recurrent turn motifs in proteins: critical evaluation of their predictive power.
    Protein Engineering 3 23-27.
  27. Amino acid sequence patterns suggested to characterize specific recurrent turn conformations in proteins are tested as to their predictive power in a database containing 75 proteins of known structure. Many of these patterns are found to be associated with local structures that differ from the motifs originally used to derive them. It is therefore concluded that, while they could be useful for improving predictions made by other methods, their stand-alone predictive power is poor. The issue of deriving and validating consensus sequence patterns for use in protein structure prediction is raised.


  28. Claessens M. Van Cutsem E. Lasters I. and Wodak S.J.
    Modelling the polypeptide backbone with 'spare parts' from known protein structures.
    Protein Engineering 2 335-345.
  29. An automatic procedure for building a protein polyalanine backbone from C alpha positions and 'spare parts' retrieved from a data base of 66 high-resolution protein structures is described. Protein backbones are constructed from overlapping fragments of variable length, which allows the backbone of regular secondary structure elements to be built in one block. The procedure is shown to yield backbones which compare very favourably with those from highly refined X-ray structures (r.m.s. deviation between generated and crystal structures less than 1 Å). The method is furthermore quite insensitive to experimental errors in C alpha positions as well as to the size of the data base, and is seen to yield valuable insight into the relationships between sequence and 3-D structure: one example on triose phosphate isomerase, a beta-barrel protein, shows that beta alpha loops can be considered as structurally more uncommon than alpha beta loops. The 'spare parts' approach is also found to be useful for general-purpose modelling of local structural changes produced by insertion or deletion of residues. It should, however, be used with caution. Crude selection criteria based solely on fragment length and geometric fit to the loop base regions yield realistic backbones in about two-thirds of the test cases (r.m.s. deviations from refined crystal structure approximately 1 Å). In the remaining cases, sequence information, in particular the presence of glycine residues which tend to adopt more unusual backbone conformations, must be considered to obtain comparable results.


  30. Rooman M.J. and Wodak S.J.
    Identification of predictive sequence motifs limited by protein structure database size.
    Nature 335 45-49
  31. Associations between short amino acid sequence patterns and protein secondary structure classes can be found by searching a data base of known protein structures. Analysis of these associations suggests that secondary structure of proteins can be determined locally by sequence motifs of high predictive value, but at present our ability to find these motifs is limited by the size of the available data bases.


  32. Moens L. Van Hauwaert M.-L. De Smet K. Geelen D. Verpooten G. Van Beeumen J. Wodak S.J. Alard Ph. and Trotman C.
    (1988). A structural domain of the covalent polymer globin chains of Artemia - Interpretation of amino acid sequence data.
    J.Biol.Chem. 263 4679-4685
  33. Artemia is unusual in having extracellular hemoglobins of Mr 260,000 comprising two globin chains (Mr 130,000), each of which is a polymer of eight covalently linked domains of about Mr 16,000. The amino acid sequence of one of these domains (E1) has been determined. It has 147 residues and Mr of 17,574 including heme. Sequence alignment revealed 19.0% identity with sperm whale myoglobin, whereas other vertebrate and invertebrate globins had between 13 and 24% identity. However, a much higher percentage of residues has a similar side chain character, suggesting that the domain E1 is very similar to other globins in showing the myoglobin fold. Template model building based on the known three-dimensional structure of myoglobin further supports this conclusion. Conversely, the differences between E1 and other globins are believed to reflect differences in the packing of the domains, first in a covalent polymeric subunit containing eight hemes and subsequently by association of two of these subunits as dimers. These findings provide further evidence for the versatility of the myoglobin fold.


  34. Claessens M. Alard P. Lasters I. and Wodak S.J.
    Fragment-matching approach to protein backbone building: a tool for structural analysis.
    In ICSU short report vol.8 pp 13.
  35. Marchal B. Kennes R. Bardiaux M. and Wodak S.
    A semantical approach to protein structure prediction.
    Journal of Molecular Graphics 3 113-114.
  36. Wodak S.J.
    Is it possible to deduce the interaction between two proteins from their 3-D structure?
    In Molecular basis of mutant hemoglobin dysfunction P. Siegler (Ed.). Amsterdam Elsevier/North Holland 199-211.
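
Several abstracts in this list describe searching sequences for simple amino acid patterns such as Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. The following is a minimal illustrative sketch of that matching step only, not the authors' implementation. The function names `pattern_to_regex` and `find_pattern` are hypothetical, sequences are assumed to be given in one-letter code, and the statistical step that associates a pattern with a structure class in a database of known structures is not covered.

```python
import re

# Standard three-letter to one-letter residue codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def pattern_to_regex(pattern):
    """Compile a pattern such as 'Gly-X-Ala-X-X-Val' into a regex.

    'X' matches any single residue; every other token must be a
    three-letter residue name (hypothetical helper, for illustration).
    """
    parts = []
    for token in pattern.split("-"):
        parts.append("." if token == "X" else THREE_TO_ONE[token])
    # Wrapping the pattern in a lookahead reports overlapping matches too.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_pattern(pattern, sequence):
    """Return 0-based start positions of every match in a one-letter sequence."""
    return [m.start() for m in pattern_to_regex(pattern).finditer(sequence)]


if __name__ == "__main__":
    # Toy sequence chosen for illustration; it is not taken from any cited paper.
    print(find_pattern("Gly-X-Ala-X-X-Val", "MGKALLVGGAAAV"))
```

In a study of the kind described above, each match position would then be compared with the observed local structure at that position to count how often the pattern is associated with a given class.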