linked to PubMed where applicable.
BACKGROUND: Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived. RESULTS: A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high-quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated data set of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites.
These differences are rationalised in terms of the geometry and energetics of the binding site. CONCLUSION: We find that although destabilizing regions as detected here cannot in general be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
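The comparison of destabilizing regions with annotated binding sites "against random" can be illustrated with a small permutation test. This is a minimal sketch, not the protocol of the study: the function names, the overlap measure (fraction of annotated residues recovered) and the random model (residues drawn uniformly, ignoring spatial clustering and surface exposure) are all assumptions made for illustration.

```python
import random

def overlap_fraction(predicted, annotated):
    """Fraction of annotated binding-site residues covered by the predicted region."""
    annotated = set(annotated)
    if not annotated:
        return 0.0
    return len(set(predicted) & annotated) / len(annotated)

def random_overlap_pvalue(predicted, annotated, all_residues, trials=10000, seed=0):
    """Empirical p-value: how often a random residue set of the same size
    overlaps the annotated site at least as well as the predicted region."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    observed = overlap_fraction(predicted, annotated)
    pool = list(all_residues)
    size = len(set(predicted))
    hits = 0
    for _ in range(trials):
        sample = rng.sample(pool, size)  # assumption: uniform draw, no spatial model
        if overlap_fraction(sample, annotated) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)

# Hypothetical 100-residue protein with a 10-residue region and a 10-residue site.
observed, p_value = random_overlap_pvalue(range(10, 20), range(12, 22), range(1, 101))
```

A realistic random model would draw spatially contiguous surface patches instead of independent residues; the uniform draw used here tends to overstate significance.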
BACKGROUND: The classical picture of the hydrophobic stabilization of proteins invokes a resemblance between the protein interior and nonpolar solvents, but the extent to which this is the case has often been questioned. The protein interior is believed to be at least as tightly packed as organic crystals, and was shown to have very low compressibility. There is also evidence that these properties are not uniform throughout the protein, and conflicting views exist on the nature of sidechain packing and on its influence on the properties of the protein. RESULTS: In order to probe the physical properties of the protein, the free energy associated with the formation of empty cavities has been evaluated for two proteins: barnase and T4 lysozyme. To this end, the likelihood of encountering such cavities was computed from room temperature molecular dynamics trajectories of these proteins in water. The free energy was evaluated in each protein taken as a whole and in submolecular regions. The computed free energies yielded information on the manner in which empty space is distributed in the system, while the latter undergoes thermal motion, a property hitherto not analyzed in heterogeneous media such as proteins. Our results showed that the free energy of cavity formation is higher in proteins than in both water and hexane, providing direct evidence that the native protein medium differs in fundamental ways from the two liquids. Furthermore, although the packing density was found to be higher in nonpolar regions of the protein than in polar ones, the free energy cost of forming atomic size cavities is significantly lower in nonpolar regions, implying that these regions contain larger chunks of empty space, thereby increasing the likelihood of containing atomic size packing defects. These larger empty spaces occur preferentially where buried hydrophobic sidechains belonging to secondary structures meet one another. 
These particular locations also appear to be more compressible than other parts of the core or surface of the protein. CONCLUSIONS: The cavity free energy calculations described here provide a much more detailed physical picture of the protein matrix than volume and packing calculations. According to this picture, the packing of hydrophobic sidechains is tight in the interior of the protein, but far from uniform. In particular, the packing is tighter in regions where the backbone forms less regular hydrogen-bonding interactions than at interfaces between secondary structure elements, where such interactions are fully developed. This may have important implications on the role of sidechain packing in protein folding and stability.
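The link between the likelihood of encountering an empty cavity and its free energy of formation can be sketched as follows. The relation used, ΔG = -k_B T ln p, is the standard one for this kind of analysis and is assumed here to match the approach of the abstract; the function names, the probe-point sampling and the definition of "empty" (no atom centre within a given radius of the probe) are illustrative assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def cavity_probability(probe_points, atom_centres, radius):
    """Fraction of probe points with no atom centre closer than `radius`."""
    empty = sum(
        1 for p in probe_points
        if all(math.dist(p, a) >= radius for a in atom_centres)
    )
    return empty / len(probe_points)

def cavity_free_energy(probability, temperature=300.0):
    """Free energy (kcal/mol) of forming a cavity found with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -K_B * temperature * math.log(probability)

# Toy snapshot: one atom at the origin, four probe points, cavity radius 1.0.
p = cavity_probability([(0, 0, 0.5), (3, 0, 0), (0, 4, 0), (5, 5, 5)], [(0, 0, 0)], 1.0)
dG = cavity_free_energy(p)
```

In practice the probability would be averaged over many trajectory frames and evaluated separately for each region of interest (polar, nonpolar, whole protein).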
Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.
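For readers unfamiliar with how such potentials are compiled, the following sketch shows the commonly used inverse-Boltzmann scoring scheme, ΔW(a, c) = -kT ln[P(c | a) / P(c)], for a residue type a and a conformational state c. It illustrates only the conventional scheme; the corrections for many-body effects and the sequence-dependent terms argued for in the abstract are not implemented, and the function name and input format are assumptions.

```python
import math
from collections import Counter

def statistical_potential(observations, kT=1.0):
    """Build an energy function from (residue, state) observations.

    The energy is -kT * ln(P(state | residue) / P(state)); negative values
    mean the residue favours the state more than the database average."""
    pair_counts = Counter(observations)
    residue_counts = Counter(res for res, _ in observations)
    state_counts = Counter(state for _, state in observations)
    total = len(observations)

    def energy(residue, state):
        p_state_given_residue = pair_counts[(residue, state)] / residue_counts[residue]
        p_state = state_counts[state] / total
        return -kT * math.log(p_state_given_residue / p_state)

    return energy

# Toy database: Ala is mostly helical (H), Gly mostly extended (E).
toy = [("A", "H")] * 3 + [("A", "E")] + [("G", "H")] + [("G", "E")] * 3
energy = statistical_potential(toy)
```

No pseudocounts are used, so a residue-state pair absent from the database raises a math domain error; real compilations need a sparse-data correction.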
Molecular dynamics simulations are used to investigate the unfolding reaction of an isolated beta-hairpin formed by residues 85 to 102 of barnase, a ribonuclease from Bacillus amyloliquefaciens. This peptide was considered following evidence from experimental studies that it may act as an initiation site for barnase folding by adopting a native-like conformation early during the folding process. Three successive molecular dynamics simulations of about 300 ps each were carried out for an all-atom model of the hairpin in water at 300 K, 450 K, and 600 K, respectively. A detailed analysis of all three simulations is presented. In particular, we investigate the behavior of the backbone hydrogen bonds and of hydrophobic interactions between side-chains, distinguishing contributions from native and non-native contacts. Furthermore, we investigate peptide-water interactions and monitor the presence and size of empty cavities. The behavior of the hairpin in the three simulations, when considered sequentially, describes a process whereby a native-like conformation evolves to an unfolded state. Unfolding starts at the beginning of the 450 K simulation with the loss of two hydrogen bonds at the free hairpin extremities. At about the same time, the centrally located H-bonds are weakened and exchange more frequently with water, but the turn tightens up as the beta-sheet extends into the turn region. All this is accompanied by a volume expansion and the formation of a large hydrophobic side-chain cluster promoted by both native and highly fluctuating non-native apolar contacts involving residues 87 to 90 and 95 to 99. This collapsed but more loosely packed state, essentially stabilized by hydrophobic interactions, is stable throughout the entire 450 K simulation and for about 150 ps at 600 K, after which point it proceeds rapidly to completely denatured conformations.
This behavior presents clear analogies with known features of the unfolding reaction of complete proteins. It may indicate that this beta-hairpin has a well-defined conformation on its own, which would be in agreement with its role as an initiation site for folding.
It is investigated whether protein segments predicted to have a well-defined conformational preference in the absence of tertiary interactions are conserved in families of homologous proteins. The prediction method follows the procedures of Rooman, M., Kocher, J.-P., and Wodak, S. (preceding paper in this issue). It uses a knowledge-based force field that incorporates only local interactions along the sequence and identifies segments whose lowest energy structure displays a sizable energy gap relative to other computed conformations. In 13 of the protein families and subfamilies considered that are sufficiently homologous to have similar 3D structures, at least one region is consistently predicted as having the same preferred conformation in virtually all family members. These regions are between 4 and 26 residues long. They are often located at chain ends and correspond primarily to segments of secondary structure heavily involved in interactions with the rest of the protein, suggesting that they could act as nuclei around which other parts of the structure would assemble. Experimental data on early folding intermediates or on protein fragments with appreciable structure in aqueous solution are available for more than half of the protein families. Comparison of our results with these data is quite favorable. They reveal that each of the experimentally identified early formed, or independently stable, substructures harbors at least one of the segments consistently predicted as having a preferred conformation by our procedure. The implications of our findings for the conservation of folding pathways in homologous proteins are discussed.
A recently developed procedure to predict backbone structure from the amino acid sequence [Rooman, M., Kocher, J. P., & Wodak, S. (1991) J. Mol. Biol. 221, 961-979] is fine-tuned to identify protein segments, of length 5-15 residues, that adopt well-defined conformations in the absence of tertiary interactions. These segments are obtained by requiring that their predicted lowest energy structures have a sizable energy gap relative to other computed conformations. Applying this procedure to 69 proteins of known structure, we find that regions with the largest energy gaps, those having highly preferred conformations, are also the most accurately predicted ones. On the basis of previous findings that such regions correlate well with sites that become structured early during folding, our approach provides the means of identifying such sites in proteins without prior knowledge of the tertiary structure. Furthermore, when predictions are performed so as to ignore the influence of residues flanking each segment along the sequence, a situation akin to excising the considered peptide from the rest of the chain, they offer the possibility of identifying protein segments liable to adopt well-defined conformations on their own. The described approach should have useful applications in experimental and theoretical investigations of protein folding and stability, and aid in designing peptide drugs and vaccines.
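The energy-gap criterion can be illustrated with a few lines of code. This is a hedged sketch: the gap is taken here as the difference between the two lowest computed energies, which is one possible reading of "relative to other computed conformations", and the segment names, energy values and threshold are hypothetical.

```python
def energy_gap(energies):
    """Gap between the lowest and the second-lowest computed energy."""
    ordered = sorted(energies)
    if len(ordered) < 2:
        raise ValueError("need at least two conformations to define a gap")
    return ordered[1] - ordered[0]

def well_defined_segments(segment_energies, min_gap):
    """Names of segments whose lowest-energy structure is separated from the
    alternatives by at least `min_gap` (same units as the energies)."""
    return [
        name for name, energies in segment_energies.items()
        if energy_gap(energies) >= min_gap
    ]

# Hypothetical energies of the lowest computed conformations for two segments.
candidates = {
    "segment_a": [-5.0, -1.0, -0.5],   # one clearly preferred conformation
    "segment_b": [-3.0, -2.8, -2.5],   # several near-degenerate conformations
}
selected = well_defined_segments(candidates, min_gap=2.0)
```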
Free energy simulation methods are used to analyse the effects of the mutation Arg-96 → His on the stability of bacteriophage T4 lysozyme and of Ile-96 → Ala on the stability of barnase. By use of thermodynamic integration, the contributions of specific interactions to the free energy change are evaluated. It is shown that a number of contributions that stabilize the wild-type or the mutant partially cancel in the overall free energy difference; some of these involve the unfolded state. Comparison of the results with conclusions based on structural and thermodynamic data leads to new insights into the origin of the stability difference between wild-type and mutant proteins. For the charged-to-charged amino acid mutation in T4 lysozyme, the contributions of more distant residues, solvent water and the covalent linkage involving the mutated amino acid are of particular interest. Also, the analysis of the Arg-96 to His mutation with respect to the interactions with the C-terminal end of a helix (residues 82-90) indicates that the nearby carbonyl groups (Tyr-88 and Asp-89) make the dominant contribution, that the amide groups do not contribute significantly and that the helix dipole model is inappropriate for this case. For the non-polar-to-non-polar amino acid mutation in barnase, the solvent contribution is unimportant, and covalent terms are shown to be significant because they do not cancel between the folded and unfolded state.
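Thermodynamic integration estimates a free energy change as ΔG = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ, where λ couples the wild-type (λ = 0) and mutant (λ = 1) Hamiltonians. The sketch below applies the trapezoidal rule to ensemble averages sampled at a few λ values and combines the folded and unfolded legs of a thermodynamic cycle. The function names, the quadrature rule, the sign convention and the numbers are illustrative assumptions, not details taken from the study.

```python
def thermodynamic_integration(lambdas, mean_dh_dlambda):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda."""
    if len(lambdas) != len(mean_dh_dlambda) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two lambda points")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        total += 0.5 * width * (mean_dh_dlambda[i] + mean_dh_dlambda[i - 1])
    return total

def stability_change(dg_mutation_folded, dg_mutation_unfolded):
    """Thermodynamic cycle: change in stability caused by the mutation.

    Assumed sign convention: a positive value means the mutation is more
    costly in the folded state than in the unfolded state."""
    return dg_mutation_folded - dg_mutation_unfolded

# Hypothetical ensemble averages (kcal/mol) at three lambda windows.
dg_folded = thermodynamic_integration([0.0, 0.5, 1.0], [2.0, 4.0, 6.0])
dg_unfolded = thermodynamic_integration([0.0, 0.5, 1.0], [1.0, 1.0, 1.0])
ddg = stability_change(dg_folded, dg_unfolded)
```

Because the integrand is a sum over energy terms, the same quadrature can be applied term by term to split the total into contributions from specific interactions. Such a decomposition depends on the integration path.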
Molecular dynamics simulations have been used to compute the difference in the unfolding free energy between wild-type barnase and the mutant in which Ile-96 is replaced by alanine. The simulations yield results (-3.42 and -5.21 kcal/mol) that compare favorably with experimental values (-3.3 and -4.0 kcal/mol). The major contributions to the free energy difference arise from bonding terms involving degrees of freedom of the mutated side chain and from nonbonded interactions of that side chain with its environment in the folded protein. By comparison with simulations of an extended peptide in the absence of solvent, used as a reference state, hydration effects are shown to play a minor role in the overall free energy balance for the Ile → Ala transformation. The implications of these results for our understanding of the hydrophobic effect and its contribution to protein stability are discussed.
A method is developed to compute backbone tertiary folds from the amino acid sequence. In this method, the number of degrees of freedom is drastically reduced by neglecting side-chain flexibility, and by describing backbone conformations as combinations of only seven structural states. These are characterized by single values of the dihedral angles phi, psi and omega, representing allowed conformations of the isolated dipeptide. We show that this restrictive model is none the less capable of describing native backbones to within acceptable deviations. Using our backbone description, potentials of mean force are derived from a database of known protein structures, based on statistical influences of single residues and residue pairs on the conformational states in their vicinity along the chain. This yields the force-field component due to local interactions, which is then used to predict lowest-energy conformations from any given amino acid sequence. The prediction algorithm does not require searching conformational space and is therefore extremely fast. Another important asset of our method is that it is able to compute not only the minimum energy conformation, but any number of lowest energy structures, whose relative preferences can be determined from the corresponding computed energy values. The performance of our procedure is tested on short peptides that are likely to be stabilized by local interactions. These include several helical structures and a hexapeptide with a beta-bend conformation, corresponding to peptides shown to have relatively well-defined conformations in aqueous solution, and to protein segments believed to adopt their native conformation early during folding. In addition, several flexible peptides are analysed. 
Except for the problems encountered in predicting observed disulphide bridges in two of the flexible peptides, and in a somewhat larger fragment comprising residues 30 to 51 of bovine trypsin inhibitor, prediction results compare very favourably with experimental data. Potential applications of our procedure to protein modelling and its extension to protein folding are discussed.
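When the energy of a chain is a sum of terms that depend only on nearby positions, the lowest-energy assignment of discrete states can be found without enumerating conformations. The sketch below uses dynamic programming (a Viterbi-style recursion) for the simplest case of per-position and adjacent-pair energies. This is a swapped-in illustration of how exhaustive search can be avoided; the abstract does not state which algorithm the method uses, and the energy tables here are hypothetical.

```python
def lowest_energy_path(site_energy, pair_energy):
    """Minimum-energy state sequence for a chain with local interactions.

    site_energy[i][s] is the energy of state s at position i;
    pair_energy[s][t] is the energy of state s followed by state t.
    Returns (energy, list of state indices)."""
    n_states = len(site_energy[0])
    best = list(site_energy[0])      # best energy of any path ending in each state
    back_pointers = []
    for i in range(1, len(site_energy)):
        new_best, pointers = [], []
        for t in range(n_states):
            prev = min(range(n_states), key=lambda s: best[s] + pair_energy[s][t])
            new_best.append(best[prev] + pair_energy[prev][t] + site_energy[i][t])
            pointers.append(prev)
        best = new_best
        back_pointers.append(pointers)
    state = min(range(n_states), key=lambda s: best[s])
    energy = best[state]
    path = [state]
    for pointers in reversed(back_pointers):   # trace the optimal path backwards
        state = pointers[state]
        path.append(state)
    path.reverse()
    return energy, path

# Two-state toy chain of length three; changing state costs 2 energy units.
energy, path = lowest_energy_path([[0, 1], [1, 0], [0, 1]], [[0, 2], [2, 0]])
```

The cost grows linearly with chain length and quadratically with the number of states, so seven states per residue remain cheap. Keeping several best partial paths per state, instead of one, would extend the recursion to a ranked list of low-energy structures.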
Basic design features of the beta-sheet portion in parallel alpha beta barrels in known protein structures are analysed in the context of a model of a regular hyperboloid. A formal description of the relationships between beta-sheet twist, number of strands in the sheet and barrel dimensions is derived, and the underlying physical principles are rationalized. Results suggest that the major constraints on the geometry of the beta-sheet portion of the barrel come from the requirements to have optimal H-bonding interactions between beta-strands and to closely pack amino acid side-chains in the barrel interior so as to exclude bulk water. In addition, we show how the hyperboloid model and the ensuing formalism can serve to derive useful geometric and graphic tools for computer-aided protein design de novo. We then illustrate how these tools are used to determine that the requirement to have a closed regular eight-stranded beta-sheet surface imposes no particular constraints on the geometry (phi, psi angles) of the polypeptide backbone. Understanding the role of the amino acid sequence in determining the observed structures remains a major challenge. Detailed comparisons of known alpha beta-barrel structures (and amino acid sequence) with each other, and with polypeptide fragments from other protein crystal structures, reveal only a limited number of common sequence-structure motifs. These belong to characteristic alpha beta 1 and alpha beta 3 loop families previously described in alpha beta proteins, and occur at least once in nearly all the alpha beta-barrel structures examined.
The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between C alpha atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and beta-strand. 
The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding.
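The search for patterns such as Ala-Ala-X-X-Lys and the tally of the structures they occur in can be sketched with regular expressions. This is an illustration under stated assumptions: sequences and structure strings are given in one-letter codes and are aligned position by position, each occurrence is assigned the most frequent structure class within the matched window, and the residue table, function names and toy data are hypothetical.

```python
import re
from collections import Counter

# Partial three-letter to one-letter table; "X" (any residue) maps to regex "."
THREE_TO_ONE = {"Ala": "A", "Gly": "G", "Lys": "K", "Val": "V", "X": "."}

def pattern_to_regex(pattern):
    """Turn 'Ala-Ala-X-X-Lys' into a regex that also finds overlapping matches."""
    body = "".join(THREE_TO_ONE[token] for token in pattern.split("-"))
    return re.compile("(?=(" + body + "))")

def structure_counts(pattern, entries):
    """Count, per structure class, the occurrences of a pattern.

    `entries` is a list of (sequence, structure) strings of equal length."""
    regex = pattern_to_regex(pattern)
    counts = Counter()
    for sequence, structure in entries:
        for match in regex.finditer(sequence):
            start = match.start()
            window = structure[start:start + len(match.group(1))]
            counts[Counter(window).most_common(1)[0][0]] += 1
    return counts

def association_strength(counts):
    """Fraction of occurrences that fall in the most common structure class."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0

# Toy database of two fragments with aligned secondary structure (H, E, C).
toy_entries = [("MAAQEKLL", "CHHHHHHC"), ("GAATLKVV", "CHHHHHEE")]
helix_counts = structure_counts("Ala-Ala-X-X-Lys", toy_entries)
```

A pattern would be retained when its association strength is close to 1. As the abstract stresses, the number of occurrences must also be large enough for the association to be statistically reliable, which this sketch does not test.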