linked to PubMed where applicable.
To examine the possible relationship of guanine-dependent GpA conformations with ribonucleotide cleavage, two potential of mean force (PMF) calculations were performed in aqueous solution. In the first calculation, the guanosine glycosidic (Gchi) angle was used as the reaction coordinate, and computations were performed on two GpA ionic species: protonated (neutral) or deprotonated (negatively charged) guanosine ribose O2'. Similar energetic profiles featuring two minima corresponding to the anti and syn Gchi regions were obtained for both ionic forms. For both forms the anti conformation was more stable than the syn, and barriers of approximately 4 kcal/mol were obtained for the anti-to-syn transition. Structural analysis showed a remarkable sensitivity of the phosphate moiety to the conformation of the Gchi angle, suggesting a possible connection between this conformation and the mechanism of ribonucleotide cleavage. This hypothesis was confirmed by the second PMF calculation, for which the O2'-P distance for the deprotonated GpA was used as the reaction coordinate. The computations were performed from two selected starting points: the anti and syn minima determined in the first PMF study of the deprotonated guanosine ribose O2'. The simulations revealed that the O2' attack along the syn Gchi was more favorable than that along the anti Gchi: energetically, significantly lower barriers were obtained in the syn than in the anti conformation for the O-P bond formation; structurally, a shorter initial O2'-P distance and a better-suited orientation for an in-line attack were observed in the syn relative to the anti conformation. 
These results are consistent with the catalytically competent conformation of the barnase-ribonucleotide complex, which requires a guanine syn conformation of the substrate to enable abstraction of the ribose H1' proton by the general base Glu73, thereby suggesting a coupling between the reactive substrate conformation and enzyme structure and mechanism. (c) 2007 Wiley-Liss, Inc.
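As an illustration of what a PMF along a single coordinate represents, the sketch below applies plain Boltzmann inversion, W(x) = -kT ln P(x), to a histogram of sampled angles. It is a minimal sketch, assuming unbiased sampling; the abstract does not state which sampling scheme was used, and biased schemes (for example umbrella sampling) would need a reweighting step that is not shown. The function name, bin width and temperature are hypothetical choices.

```python
import math
from collections import Counter

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def pmf_from_samples(angles_deg, bin_width=30.0, temperature=300.0):
    """Boltzmann-invert a histogram of angles into a PMF in kcal/mol.

    Returns {bin_start_in_degrees: free energy relative to the most
    populated bin}. Empty bins are omitted because their free energy
    is undefined. Assumes the samples are unbiased (illustrative only).
    """
    n_bins = int(round(360.0 / bin_width))
    # Wrap every angle into [0, 360) and assign it to a bin index.
    counts = Counter(int((a % 360.0) // bin_width) % n_bins for a in angles_deg)
    if not counts:
        raise ValueError("no samples given")
    max_count = max(counts.values())
    kt = KB_KCAL_PER_MOL_K * temperature
    return {
        b * bin_width: -kt * math.log(c / max_count)
        for b, c in sorted(counts.items())
    }
```

In such a profile, the difference between the two minima gives the relative stability of the two conformations, and the highest value between them gives the barrier.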
The performance of methods for predicting protein-protein interactions at the atomic scale is assessed by evaluating blind predictions performed during 2005-2007 as part of Rounds 6-12 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). These rounds also included a new scoring experiment, where a larger set of models contributed by the predictors was made available to groups developing scoring functions. These groups scored the uploaded set and submitted their own best models for assessment. The structures of nine protein complexes including one homodimer were used as targets. These targets represent biologically relevant interactions involved in gene expression, signal transduction, RNA or protein processing, and membrane maintenance. For all the targets except one, predictions started from the experimentally determined structures of the free (unbound) components or from models derived by homology, making it mandatory for docking methods to model the conformational changes that often accompany association. In total, 63 groups and eight automatic servers, a substantial increase from previous years, submitted docking predictions, of which 1994 were evaluated here. Fifteen groups submitted 305 models for five targets in the scoring experiment. Assessment of the predictions reveals that 31 different groups produced models of acceptable and medium accuracy (but only one high-accuracy submission) for all the targets except the homodimer. In the latter, none of the docking procedures reproduced the large conformational adjustment required for correct assembly, underscoring yet again that handling protein flexibility remains a major challenge. In the scoring experiment, a large fraction of the groups attained the set goal of singling out the correct association modes from incorrect solutions in the limited ensembles of contributed models. 
But in general they seemed unable to identify the best models, indicating that current scoring methods are probably not sensitive enough. With the increased focus on protein assemblies, in particular by structural genomics efforts, the growing community of CAPRI predictors is engaged more actively than ever in the development of better scoring functions and means of modeling conformational flexibility, which hold promise for much progress in the future. (c) 2007 Wiley-Liss, Inc.
BACKGROUND: In structural genomics, an important goal is the detection and classification of protein-protein interactions, given the structures of the interacting partners. We have developed empirical energy functions to identify native structures of protein-protein complexes among sets of decoy structures. To understand the role of amino acid diversity, we parameterized a series of functions, using a hierarchy of amino acid alphabets of increasing complexity, with 2, 3, 4, 6, and 20 amino acid groups. Compared to previous work, we used the simplest possible functional form, with residue-residue interactions and a stepwise distance-dependence. We used increased computational resources, however, constructing 290,000 decoys for 219 protein-protein complexes, with a realistic docking protocol where the protein partners are flexible and interact through a molecular mechanics energy function. The energy parameters were optimized to correctly assign as many native complexes as possible. To resolve the multiple minimum problem in parameter space, over 64,000 starting parameter guesses were tried for each energy function. The optimized functions were tested by cross validation on subsets of our native and decoy structures, by blind tests on series of native and decoy structures available on the Web, and on models for 13 complexes submitted to the CAPRI structure prediction experiment. RESULTS: Performance is similar to several other statistical potentials of the same complexity. For example, the CAPRI target structure is correctly ranked ahead of 90% of its decoys in 6 cases out of 13. The hierarchy of amino acid alphabets leads to a coherent hierarchy of energy functions, with qualitatively similar parameters for similar amino acid types at all levels. Most remarkably, the performance with six amino acid classes is equivalent to that of the most detailed, 20-class energy function. 
CONCLUSION: This suggests that six carefully chosen amino acid classes are sufficient to encode specificity in protein-protein interactions, and provide a starting point to develop more complicated energy functions.
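The functional form described above (residue-residue terms, a stepwise distance dependence, and a reduced amino acid alphabet) can be sketched as follows. This is a minimal sketch under stated assumptions: the two-class grouping, the distance steps and every energy value are hypothetical placeholders chosen for illustration, not the optimized parameters of the study, and the ranking helper is only one plausible reading of "ranked ahead of its decoys".

```python
import math

# Hypothetical two-class alphabet (H = hydrophobic, P = polar).
# This grouping is an assumption, not the classification used in the study.
CLASS_OF = {aa: "H" for aa in "AVLIMFWC"}
CLASS_OF.update({aa: "P" for aa in "GSTYNQDEKRHP"})

# Hypothetical step energies per class pair: (upper distance bound in A, energy).
STEP_ENERGY = {
    ("H", "H"): [(6.0, -1.0), (9.0, -0.3)],
    ("H", "P"): [(6.0, 0.4), (9.0, 0.1)],
    ("P", "P"): [(6.0, -0.2), (9.0, 0.0)],
}


def pair_energy(aa1, aa2, distance):
    """Stepwise energy of one residue pair; zero beyond the last step."""
    key = tuple(sorted((CLASS_OF[aa1], CLASS_OF[aa2])))
    for upper_bound, energy in STEP_ENERGY[key]:
        if distance <= upper_bound:
            return energy
    return 0.0


def interface_energy(chain_a, chain_b):
    """Sum pair energies between two chains.

    Each chain is a list of (one_letter_code, (x, y, z)) entries, one
    representative point per residue.
    """
    return sum(
        pair_energy(aa1, aa2, math.dist(p1, p2))
        for aa1, p1 in chain_a
        for aa2, p2 in chain_b
    )


def fraction_of_decoys_beaten(native_energy, decoy_energies):
    """Fraction of decoys with a higher (worse) energy than the native."""
    return sum(e > native_energy for e in decoy_energies) / len(decoy_energies)
```

Moving to a richer alphabet only changes `CLASS_OF` and the number of entries in `STEP_ENERGY`, which is why the number of free parameters grows quickly with the number of classes.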
CAPRI is a community-wide experiment to test protein-protein docking methods in blind predictions. The Toronto meeting assessed structure predictions made from 2005-2007 on nine target protein-protein complexes or homodimers, and reported new developments in functions used to score predicted interactions, in treatment of conformational flexibility, and in taking nonstructural information into account in the predictions.
BACKGROUND: Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, a novel benchmark consisting of a diverse set of hand-curated protein functional sites is derived. RESULTS: A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high-quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated data set of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. 
These differences are rationalised in terms of the geometry and energetics of the binding site. CONCLUSION: We find that although destabilizing regions as detected here can in general not be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
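One way to judge whether the overlap between a predicted region and an annotated site is "better than random" is a hypergeometric test on residue sets. The sketch below is illustrative only: the abstract does not specify the overlap measure or the statistical test that was used, and the function name and inputs are hypothetical.

```python
from math import comb


def overlap_vs_random(region, site, n_residues):
    """Compare the observed overlap of two residue sets with chance.

    region, site: iterables of residue identifiers from the same protein.
    n_residues: total number of residues in the protein.
    Returns (observed_overlap, expected_overlap, p_value), where p_value is
    the hypergeometric probability of an overlap at least as large as the
    observed one if the region residues were drawn at random.
    """
    region, site = set(region), set(site)
    observed = len(region & site)
    expected = len(region) * len(site) / n_residues
    total = comb(n_residues, len(region))
    tail = sum(
        comb(len(site), i) * comb(n_residues - len(site), len(region) - i)
        for i in range(observed, min(len(region), len(site)) + 1)
    )
    return observed, expected, tail / total
```

A small p-value for a given protein indicates that the destabilizing region and the binding site coincide more often than a random choice of the same number of residues would.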
Genetic analysis of a large Indian family with an autosomal dominant cataract phenotype allowed us to identify a novel cataract gene, CRYBA4. After a genomewide screen, linkage analysis identified a maximum LOD score of 3.20 (recombination fraction [theta] 0.001) with marker D22S1167 of the beta-crystallin gene cluster on chromosome 22. Until now, CRYBA4 was the only gene in this cluster not associated with either human or murine cataracts. A pathogenic mutation was identified in exon 4 that segregated with the disease status. The c.317T>C sequence change is predicted to replace the highly conserved hydrophobic amino acid phenylalanine 94 with the hydrophilic amino acid serine. Modeling suggests that this substitution would significantly reduce the intrinsic stability of the crystallin monomer, which would impair its ability to form the association modes critical for lens transparency. Considering that CRYBA4 associates with CRYBB2 and that the latter protein has been implicated in microphthalmia, mutational analysis of CRYBA4 was performed in 32 patients affected with microphthalmia (small eye). We identified a c.242T>C (Leu69Pro) sequence change in exon 4 in one patient, which is predicted here to disrupt the beta-sheet structure in CRYBA4. Protein folding would consequently be impaired, most probably leading to a structure with reduced stability in the mutant. This is the first report linking mutations in CRYBA4 to cataractogenesis and microphthalmia.
The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional (3D) structure is reassessed by evaluating blind predictions, performed during 2003-2004 as part of Rounds 3-5 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Ten newly determined structures of protein-protein complexes were used as targets for these rounds. They comprised 2 enzyme-inhibitor complexes, 2 antigen-antibody complexes, 2 complexes involved in cellular signaling, 2 homo-oligomers, and a complex between 2 components of the bacterial cellulosome. For most targets, the predictors were given the experimental structures of 1 unbound and 1 bound component, with the latter in a random orientation. For some, the structure of the free component was derived from that of a related protein, requiring the use of homology modeling. In some of the targets, significant differences in conformation were displayed between the bound and unbound components, representing a major challenge for the docking procedures. For 1 target, predictions could not go to completion. In total, 1866 predictions submitted by 30 groups were evaluated. Over one-third of these groups applied completely novel docking algorithms and scoring functions, with several of them specifically addressing the challenge of dealing with side-chain and backbone flexibility. The quality of the predicted interactions was evaluated by comparison to the experimental structures of the targets, made available for the evaluation, using the well-agreed-upon criteria used previously. Twenty-four groups, which for the first time included an automatic Web server, produced predictions ranging from acceptable to highly accurate for all targets, including those where the structures of the bound and unbound forms differed substantially. 
These results and a brief survey of the methods used by participants of CAPRI Rounds 3-5 suggest that genuine progress in the performance of docking methods is being achieved, with CAPRI acting as the catalyst.
Increasingly complex schemes for representing solvent effects in an implicit fashion are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Five implicit solvation models, a surface area-based empirical model, two models that approximate the generalized Born treatment and a finite difference Poisson-Boltzmann method are challenged in situations differing from those where these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select sequences, which stabilize a given protein 3D structure, from a large number of alternatives. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures belonging, respectively, to decoys built with random sequences and to native protein crystal structures. In addition we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. It is concluded that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
Given the increasing interest in protein-protein interactions, the prediction of these interactions from sequence and structural information has become a booming activity. CAPRI, the community-wide experiment for assessing blind predictions of protein-protein interactions, is playing an important role in fostering progress in docking procedures. At the same time, novel methods are being derived for predicting regions of a protein that are likely to interact and for characterizing putative intermolecular contacts from sequence and structural data. Together with docking procedures, these methods provide an integrated computational approach that should be a valuable complement to genome-scale experimental studies of protein-protein interactions.
MALECON is a progressive combinatorial procedure for multiple alignments of protein structures. It searches a library of pairwise alignments for all three-protein alignments in which a specified number of residues is consistently aligned. These alignments are progressively expanded to include additional proteins and more spatially equivalent residues, subject to certain criteria. This action involves superimposing the aligned proteins by their hitherto equivalent residues and searching for additional Calpha atoms that lie close in space. The performance of MALECON is illustrated and compared with several extant multiple structure alignment methods by using as test the globin homologous superfamily, the OB and the Jellyrolls folds. MALECON gives better definitions of the common structural features in the structurally more diverse proteins of the OB and Jellyrolls folds, but it yields comparable results for the more similar globins. When no consistent multiple alignments can be derived for all members of a protein group, our procedure is still capable of automatically generating consistent alignments and common core definitions for subgroups of the members. This finding is illustrated for proteins of the OB fold and SH3 domains, believed to share common structural features, and should be very instrumental in homology modeling and investigations of protein evolution. Copyright 2004 Wiley-Liss, Inc.
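The expansion step described above (superimpose on the current equivalences, then look for additional Calpha atoms that lie close in space) can be illustrated for a pair of structures as follows. This is only a sketch under stated assumptions and not the MALECON implementation: the coordinates are assumed to be already superimposed, the 3.0 A cutoff and the mutual-nearest-neighbour rule are hypothetical choices, and the sequential-order and consistency criteria of a real multiple alignment are not checked.

```python
import math


def extend_equivalences(ca_a, ca_b, aligned, cutoff=3.0):
    """Add spatially close, still unaligned Calpha pairs to an alignment.

    ca_a, ca_b: lists of (x, y, z) Calpha coordinates, already superimposed.
    aligned: set of (i, j) index pairs that are currently equivalent.
    A new pair (i, j) is added when i and j are unaligned, are each
    other's nearest unaligned neighbour, and lie within `cutoff` A.
    """
    used_a = {i for i, _ in aligned}
    used_b = {j for _, j in aligned}
    free_a = [i for i in range(len(ca_a)) if i not in used_a]
    free_b = [j for j in range(len(ca_b)) if j not in used_b]
    extended = set(aligned)
    if not free_a or not free_b:
        return extended
    for i in free_a:
        # Nearest unaligned residue of B to residue i of A.
        j = min(free_b, key=lambda jj: math.dist(ca_a[i], ca_b[jj]))
        if math.dist(ca_a[i], ca_b[j]) > cutoff:
            continue
        # Require the relation to be mutual so that pairs stay one-to-one.
        back = min(free_a, key=lambda ii: math.dist(ca_a[ii], ca_b[j]))
        if back == i:
            extended.add((i, j))
    return extended
```

In an iterative scheme, the structures would be superimposed again on the extended set and the search repeated until no new pairs are found.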
CCR5 is a G protein-coupled receptor responding to four natural agonists, the chemokines RANTES (regulated on activation normal T cell expressed and secreted), macrophage inflammatory protein (MIP)-1 alpha, MIP-1 beta, and monocyte chemotactic protein (MCP)-2, and is the main co-receptor for the macrophage-tropic human immunodeficiency virus strains. We have previously identified a structural motif in the second transmembrane helix of CCR5, which plays a crucial role in the mechanism of receptor activation. We now report the specific role of aromatic residues in helices 2 and 3 of CCR5 in this mechanism. Using site-directed mutagenesis and molecular modeling in a combined approach, we demonstrate that a cluster of aromatic residues at the extracellular border of these two helices is involved in chemokine-induced activation. These aromatic residues are involved in interhelical interactions that are key for the conformation of the helices and govern the functional response to chemokines in a ligand-specific manner. We therefore suggest that transmembrane helices 2 and 3 contain important structural elements for the activation mechanism of chemokine receptors, and possibly other related receptors as well.
CAPRI is a communitywide experiment to assess the capacity of protein-docking methods to predict protein-protein interactions. Nineteen groups participated in rounds 1 and 2 of CAPRI and submitted blind structure predictions for seven protein-protein complexes based on the known structure of the component proteins. The predictions were compared to the unpublished X-ray structures of the complexes. We describe here the motivations for launching CAPRI, the rules that we applied to select targets and run the experiment, and some conclusions that can already be drawn. The results stress the need for new scoring functions and for methods handling the conformation changes that were observed in some of the target systems. CAPRI has already been a powerful drive for the community of computational biologists who develop docking algorithms. We hope that this issue of Proteins will also be of interest to the community of structural biologists, which we call upon to provide new targets for future rounds of CAPRI, and to all molecular biologists who view protein-protein recognition as an essential process. Copyright 2003 Wiley-Liss, Inc.
The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional structure is assessed from a first major evaluation of blind predictions. This evaluation was performed as part of a communitywide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Seven newly determined structures of protein-protein complexes were available as targets for this experiment. These were the complexes between a kinase and its protein substrate, between a T-cell receptor beta-chain and a superantigen, and five antigen-antibody complexes. For each target, the predictors were given the experimental structures of the free components, or of one free and one bound component in a random orientation. The structure of the complex was revealed only at the time of the evaluation. A total of 465 predictions submitted by 19 groups were evaluated. These groups used a wide range of algorithms and scoring functions, some of which were completely novel. The quality of the predicted interactions was evaluated by comparing residue-residue contacts and interface residues to those in the X-ray structures and by analyzing the fit of the ligand molecules (the smaller of the two proteins in the complex) or of interface residues only, in the predicted versus target complexes. A total of 14 groups produced predictions ranging from acceptable to highly accurate for five of the seven targets. The use of available biochemical and biological information, and in one instance structural information, played a key role in achieving this result. It was essential for identifying the native binding modes for the five correctly predicted targets, including the kinase-substrate complex where the enzyme changes conformation on association. But it was also the cause for missing the correct solution for the two remaining unpredicted targets, which involve unexpected antigen-antibody binding modes. 
Overall, this analysis reveals genuine progress in docking procedures but also illustrates the remaining serious limitations and points out the need for better scoring functions and more effective ways for handling conformational flexibility. Copyright 2003 Wiley-Liss, Inc.
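The comparison of residue-residue contacts between a predicted and a target complex can be sketched as the fraction of target contacts that a model reproduces. This is a minimal sketch, assuming a 5.0 A atom-atom distance cutoff to define a contact; the abstract does not state the cutoff or the accuracy thresholds used in the evaluation, and the function names and data layout are hypothetical.

```python
import math


def interface_contacts(receptor, ligand, cutoff=5.0):
    """Residue-residue contacts across an interface.

    receptor, ligand: dicts mapping a residue identifier to a list of
    (x, y, z) atom coordinates. Two residues are in contact when any pair
    of their atoms lies within `cutoff` A (assumed value).
    Returns a set of (receptor_residue, ligand_residue) pairs.
    """
    contacts = set()
    for rec_id, rec_atoms in receptor.items():
        for lig_id, lig_atoms in ligand.items():
            if any(math.dist(a, b) <= cutoff for a in rec_atoms for b in lig_atoms):
                contacts.add((rec_id, lig_id))
    return contacts


def fraction_native_contacts(native_contacts, predicted_contacts):
    """Fraction of the target's contacts that are present in the prediction."""
    if not native_contacts:
        raise ValueError("the target complex has no contacts")
    return len(native_contacts & predicted_contacts) / len(native_contacts)
```

A contact-based measure of this kind complements the geometric fit of the ligand, because a model can place the ligand roughly correctly while still missing many specific residue pairings.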
Homology modeling in combination with transmembrane topology predictions is used to build the atomic model of Neurospora crassa plasma membrane H+-ATPase, using as template the 2.6 Å crystal structure of rabbit sarcoplasmic reticulum Ca2+-ATPase [Toyoshima, C., Nakasako, M., Nomura, H. & Ogawa, H. (2000) Nature 405, 647-655]. Comparison of the two calcium-binding sites in the crystal structure of Ca2+-ATPase with the equivalent region in the H+-ATPase model shows that the latter is devoid of most of the negatively charged groups required to bind the cations, suggesting a different role for this region. Using the built model, a pathway for proton transport is then proposed from computed locations of internal polar cavities, large enough to contain at least one water molecule. As a control, the same approach is applied to the high-resolution crystal structures of halorhodopsin and the proton pump bacteriorhodopsin. This revealed a striking correspondence between the positions of internal polar cavities, those of crystallographic water molecules and, in the case of bacteriorhodopsin, the residues mediating proton translocation. In our H+-ATPase model, most of these cavities are in contact with residues previously shown to affect coupling of proton translocation to ATP hydrolysis. A string of six polar cavities identified in the cytoplasmic domain, the most accurate part of the model, suggests a proton entry path starting close to the phosphorylation site. Strikingly, members of the haloacid dehalogenase superfamily, which are close structural homologs of this domain but do not share the same function, display only one polar cavity in the vicinity of the conserved catalytic Asp residue.
An automatic protein design procedure was used to compute amino acid sequences of peptides likely to bind the HLA-A2 major histocompatibility complex (MHC) class I allele. The only information used by the procedure is a structural template, a rotamer library, and a well-established classical empirical force field. The calculations are performed on six different templates from X-ray structures of HLA-A0201-peptide complexes. Each template consists of the bound peptide backbone and the full atomic coordinates of the MHC protein. Sequences within 2 kcal/mol of the minimum energy sequence are computed for each template, and the sequences from all the templates are combined and ranked by their energies. The five lowest energy peptide sequences and five other low energy sequences re-ranked on the basis of their similarity to peptides known to bind the same MHC allele are chemically synthesized and tested for their ability to bind and form stable complexes with the HLA-A2 molecule. The most efficient binders are also tested for inhibition of the T cell receptor recognition of two known CD8(+) T effectors. Results show that all 10 peptides bind the expected MHC protein. The six strongest binders also form stable HLA-A2-peptide complexes, albeit to varying degrees, and three peptides display significant inhibition of CD8(+) T cell recognition. These results are rationalized in light of our knowledge of the three-dimensional structures of the HLA-A2-peptide and HLA-A2-peptide-T cell receptor complexes.
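The selection step described above (keep sequences within 2 kcal/mol of each template's minimum, then pool and rank by energy) can be sketched as follows. The data layout, the function name, and the rule of keeping the lowest energy when a sequence occurs for several templates are assumptions made for illustration; the abstract does not say how duplicates were handled.

```python
def pool_low_energy_sequences(per_template, window=2.0):
    """Pool and rank near-optimal sequences from several templates.

    per_template: dict mapping a template name to a list of
    (sequence, energy_kcal_per_mol) pairs.
    For each template, sequences within `window` kcal/mol of that
    template's minimum energy are kept. If a sequence is kept for several
    templates, its lowest energy is retained (an assumption).
    Returns a list of (sequence, energy, template) sorted by energy.
    """
    pooled = {}
    for template, scored in per_template.items():
        if not scored:
            continue
        e_min = min(energy for _, energy in scored)
        for sequence, energy in scored:
            if energy - e_min > window:
                continue
            if sequence not in pooled or energy < pooled[sequence][0]:
                pooled[sequence] = (energy, template)
    return sorted(
        ((seq, energy, template) for seq, (energy, template) in pooled.items()),
        key=lambda item: item[1],
    )
```

Ranking pooled sequences by raw energy presumes that energies computed on different templates are directly comparable, which is what the combined ranking in the text implies.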
In recent years a large body of data has been obtained from Nuclear Magnetic Resonance and Circular Dichroism experiments on the influence of the amino acid sequence and various other parameters on the conformational state of peptides in solution. Interpreting the experimental data in terms of the conformational populations of the peptides remains a key problem, for which current solutions leave appreciable room for improvement. Considering that making this body of data available for surveys and analysis should be instrumental in tackling the problem, we undertook the development of Pescador: The 'PEptides in Solution ConformAtion Database: Online Resource'. Pescador contains data from NMR and CD spectroscopy on peptides in solution as well as information on the structural parameters derived from these data. It also features specialized Web-based tools for data deposition, and means for readily accessing the stored information for analysis purposes. To illustrate the use of the database in deriving information for the conformational analysis of peptides, we show how the alpha proton delta-values stored in Pescador and measured by NMR for different peptides in different laboratories can be used to derive a new set of 'random coil' chemical shift values. Firstly, we show these values to be very similar to those obtained experimentally for model peptides in water, and their variation with increasing trifluoroethanol (TFE) concentration is similar to that reported for model peptides. We show, furthermore, that the chemical shift data in Pescador can be used to derive correction factors that take into account effects of neighboring residues. These correction factors compare favorably with those recently derived from a series of model GGXGG peptides (Schwarzinger et al., 2001). 
These encouraging results suggest that, as the quantity of NMR data on peptides deposited in Pescador increases, surveys of these data should be a valuable means of deriving key parameters for the analysis of peptide conformation.
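The idea of deriving 'random coil' values and neighbor corrections from a survey of stored shifts can be illustrated with simple averages. This is a sketch under strong assumptions: it takes the per-residue mean as the reference value and the mean deviation grouped by preceding residue as the correction, whereas the actual derivation in the study may involve data filtering or a different statistical treatment that the abstract does not describe. All names and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def random_coil_shifts(records):
    """Average alpha-proton shift (ppm) per residue type.

    records: iterable of (residue_type, shift_ppm) pairs.
    """
    by_residue = defaultdict(list)
    for residue, shift in records:
        by_residue[residue].append(shift)
    return {residue: mean(shifts) for residue, shifts in by_residue.items()}


def preceding_residue_corrections(records, coil):
    """Mean deviation from the reference value, grouped by preceding residue.

    records: iterable of (preceding_residue, residue, shift_ppm) triples.
    coil: reference values as returned by random_coil_shifts.
    """
    deviations = defaultdict(list)
    for preceding, residue, shift in records:
        deviations[preceding].append(shift - coil[residue])
    return {prev: mean(values) for prev, values in deviations.items()}
```

With more deposited data, the same survey could be stratified by solvent composition, for example by TFE concentration, before averaging.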
A set of conserved water positions making direct contacts with the alpha1 and alpha2 domains of the MHC class-I protein was identified by a cluster analysis in 12 high-resolution crystal structures of proteins from different allele types and different species, comprising human, mouse and rat. The analysis revealed a total of 63 clusters, corresponding to water molecules, whose positions are conserved in half or more of the analyzed structures. Analysis of these clusters shows that the most conserved water positions (those appearing in the largest fraction of the structures) were also the most accurately defined, as measured by their normalized crystallographic B-factor. Not too surprisingly, these positions displayed better overlap and formed more H-bonds with the protein. In a second part of this work, a detailed analysis is presented of three of the most conserved water positions and their putative structural and functional roles are discussed. The most highly conserved of the three appears to play an important role in stabilizing the conformation of a twisted beta-turn between residues 118 and 122 (numbering of HLA-B3501, PDB code 1A1N). An equivalent water molecule was found to be associated with a similar beta-turn in 43 unrelated structures surveyed in the PDB, leading to the suggestion that this water molecule plays an important structural role in this type of turn. The second water molecule makes hydrogen bonds with residues lining pocket B in the peptide-binding groove and is suggested to play a role in modulating peptide recognition. The third highly conserved water molecule is located at the first kink of the alpha2 helix, possibly playing a role in determining the position of the N-terminal segment of that helix, which also carries side chains in contact with the bound peptide. 
This information on conserved water positions in MHC class-I molecules should be helpful in modeling interactions with bound peptide antigens and in designing new peptides with tailor-made affinities.
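A cluster analysis of water positions across superimposed structures can be sketched with a simple greedy scheme. This is a minimal sketch, not the clustering method of the study, which the abstract does not specify: the 1.0 A cutoff, the leader-style assignment to the first sufficiently close centroid, and the function name are assumptions. Only the "half or more of the structures" criterion comes from the text.

```python
import math


def conserved_water_clusters(waters, n_structures, cutoff=1.0, min_fraction=0.5):
    """Cluster water positions and keep the conserved clusters.

    waters: list of (structure_id, (x, y, z)) entries, with all structures
    already superimposed onto a common frame.
    Each water joins the first cluster whose centroid lies within `cutoff` A,
    otherwise it starts a new cluster. A cluster is kept when it contains
    waters from at least `min_fraction` of the structures.
    """
    clusters = []
    for structure_id, position in waters:
        for cluster in clusters:
            if math.dist(position, cluster["centroid"]) <= cutoff:
                cluster["members"].append((structure_id, position))
                n = len(cluster["members"])
                # Recompute the centroid from all current members.
                cluster["centroid"] = tuple(
                    sum(p[axis] for _, p in cluster["members"]) / n
                    for axis in range(3)
                )
                break
        else:
            clusters.append(
                {"members": [(structure_id, position)], "centroid": tuple(position)}
            )
    return [
        c for c in clusters
        if len({sid for sid, _ in c["members"]}) / n_structures >= min_fraction
    ]
```

The result of a greedy scheme depends on the input order; a hierarchical method would avoid this at a higher computational cost.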
MOTIVATION: Comparing the 3D structures of two proteins or analyzing the structural changes undergone by a protein upon ligand binding or when it crystallizes under different conditions can be both tricky and tedious, especially when the two proteins are distantly related, or when the structural changes are complex. Readily accessible tools for performing these tasks automatically and reliably should therefore be welcome. RESULTS: We describe a web interface to several automatic procedures for performing pairwise structure superposition in a flexible manner, for detailed analyses of conformational changes and for displaying the results in a pictorial fashion. AVAILABILITY: This interface can be accessed at the Brussels and Cuba Web sites, respectively: http://www.ucmb.ulb.ac.be/SCMBB/Tools.html and http://bio.cigb.edu.cu.
The program SFCHECK [Vaguine et al. (1999), Acta Cryst. D55, 191-205] is used to survey the quality of the structure-factor data and the agreement of those data with the atomic coordinates in 105 nucleic acid crystal structures for which structure-factor amplitudes have been deposited in the Nucleic Acid Database [NDB; Berman et al. (1992), Biophys. J. 63, 751-759]. Nucleic acid structures present a particular challenge for structure-quality evaluations. The majority of these structures, and DNA molecules in particular, have been solved by molecular replacement of the double-helical motif, whose high degree of symmetry can lead to problems in positioning the molecule in the unit cell. In this paper, the overall quality of each structure was evaluated using parameters such as the R factor, the correlation coefficient and various atomic error estimates. In addition, each structure is characterized by the average values of several local quality indicators, which include the atomic displacement, the density correlation, the B factor and the density index. The latter parameter measures the relative electron-density level at the atomic position. In order to assess the quality of the model in specific regions, the same local quality indicators are also surveyed for individual groups of atoms in each structure. Several of the global quality indicators are found to vary linearly with resolution and fewer than a dozen structures are found to exhibit values significantly different from the mean for these indicators, showing that the quality of the nucleic acid structures tends to be rather uniform. Analysis of the mutual dependence of the values of different local quality indicators, computed for individual residues and atom groups, reveals that these indicators essentially complement each other and are not redundant with the B factor. 
Using several of these indicators, it was found that the atomic coordinates of the nucleic acid bases tend to be better defined than those of the backbone. One of the local indicators, the density index, is particularly useful in spotting regions of the model that fit poorly in the electron density. Using this parameter, the quality of crystallographic water positions in the analyzed structures was surveyed and it was found that a sizable fraction of these positions have poorly defined electron density and may therefore not be reliable. The possibility that cases of poorly positioned water molecules are symptomatic of more widespread problems with the structure as a whole is also raised.
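Two of the global indicators mentioned above can be illustrated with their textbook definitions: the crystallographic R factor, R = sum(|Fobs - k*Fcalc|) / sum(Fobs), and the linear correlation coefficient between observed and calculated amplitudes. This sketch is not SFCHECK's implementation; the program's scaling, resolution binning and outlier handling are not described in the abstract, and the single scale factor `scale` is an assumed simplification.

```python
import math


def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R factor between observed and calculated amplitudes.

    f_obs, f_calc: equal-length sequences of non-negative amplitudes.
    scale: factor applied to f_calc before comparison (assumed given).
    """
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def correlation_coefficient(f_obs, f_calc):
    """Pearson correlation between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((fo - mean_o) * (fc - mean_c) for fo, fc in zip(f_obs, f_calc))
    var_o = sum((fo - mean_o) ** 2 for fo in f_obs)
    var_c = sum((fc - mean_c) ** 2 for fc in f_calc)
    return cov / math.sqrt(var_o * var_c)
```

Lower R factors and correlation coefficients closer to 1 both indicate better agreement between the model and the data, and both tend to degrade as resolution worsens.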
This review describes computational procedures for deriving the amino acid sequences that are compatible with a given protein backbone structure. Such procedures can be used to gain insight into the constraints imposed by the 3D structure on the protein sequence, or to design proteins that are likely to adopt a given backbone conformation. We start by presenting a short overview of the various types of approaches to protein design developed over more than a decade. This is followed by a more detailed presentation of a recently developed sequence selection procedure, DESIGNER. This latter presentation illustrates the basic principles underlying this type of procedure, describes what such procedures may teach us when applied to small proteins, and highlights issues that need to be addressed in order to go forward.
The thyrotropin (TSH) receptor is an interesting model to study G protein-coupled receptor activation as many point mutations can significantly increase its basal activity. Here, we identified a molecular interaction between Asp(633) in transmembrane helix 6 (TM6) and Asn(674) in TM7 of the TSHr that is crucial to maintain the inactive state through conformational constraint of the Asn. We show that these residues are perfectly conserved in the glycohormone receptor family, except in one case, where they are exchanged, suggesting a direct interaction. Molecular modeling of the TSHr, based on the high resolution structure of rhodopsin, strongly favors this hypothesis. Our approach combining site-directed mutagenesis with molecular modeling shows that mutations disrupting this interaction, like the D633A mutation in TM6, lead to high constitutive activation. The strongly activating N674D (TM7) mutation, which in our modeling breaks the TM6-TM7 link, is reverted to wild type-like behavior by an additional D633N mutation (TM6), which would restore this link. Moreover, we show that the Asn of TM7 (conserved in most G protein-coupled receptors) is mandatory for ligand-induced cAMP accumulation, suggesting an active role of this residue in activation. In the TSHr, the conformation of this Asn residue of TM7 would be constrained, in the inactive state, by its Asp partner in TM6.
Standard volumes for atoms in double-stranded B-DNA are derived using high resolution crystal structures from the Nucleic Acid Database (NDB) and compared with corresponding values derived from crystal structures of small organic compounds in the Cambridge Structural Database (CSD). Two different methods are used to compute these volumes: the classical Voronoi method, which does not depend on the size of atoms, and the related Radical Planes method, which does. Results show that atomic groups buried in the interior of double-stranded DNA are, on average, more tightly packed than in related small molecules in the CSD. The packing efficiency of DNA atoms at the interfaces of 25 high resolution protein-DNA complexes is determined by computing the ratios between the volumes of interfacial DNA atoms and the corresponding standard volumes. These ratios are found to be close to unity, indicating that the DNA atoms at protein-DNA interfaces are as closely packed as in crystals of B-DNA. Analogous volume ratios, computed for buried protein atoms, are also near unity, confirming our earlier conclusions that the packing efficiency of these atoms is similar to that in the protein interior. In addition, we examine the number, volume and solvent occupation of cavities located at the protein-DNA interfaces and compare them with those in the protein interior. Cavities are found to be ubiquitous in the interfaces as well as inside the protein moieties. The frequency of solvent occupation of cavities is, however, higher in the interfaces, indicating that those are more hydrated than protein interiors. Lastly, we compare our results with those obtained using two different measures of shape complementarity of the analysed interfaces, and find that the correlation between our volume ratios and these measures, as well as between the measures themselves, is weak. 
Our results indicate that a tightly packed environment made up of DNA, protein and solvent atoms plays a significant role in protein-DNA recognition.
CCR5 is a G-protein-coupled receptor activated by the chemokines RANTES (regulated on activation normal T cell expressed and secreted), macrophage inflammatory protein 1alpha and 1beta, and monocyte chemotactic protein 2 and is the main co-receptor for the macrophage-tropic human immunodeficiency virus strains. We have identified a sequence motif (TXP) in the second transmembrane helix of chemokine receptors and investigated its role by theoretical and experimental approaches. Molecular dynamics simulations of model alpha-helices in a nonpolar environment were used to show that a TXP motif strongly bends these helices, due to the coordinated action of the proline, which kinks the helix, and of the threonine, which further accentuates this structural deformation. Site-directed mutagenesis of the corresponding Pro and Thr residues in CCR5 allowed us to probe the consequences of these structural findings in the context of the whole receptor. The P84A mutation leads to a decreased binding affinity for chemokines and nearly abolishes the functional response of the receptor. In contrast, mutation of Thr-82(2.56) into Val, Ala, Cys, or Ser does not affect chemokine binding. However, the functional response was found to depend strongly on the nature of the substituted side chain. The rank order of impairment of receptor activation is P84A > T82V > T82A > T82C > T82S. This ranking of impairment parallels the bending of the alpha-helix observed in the molecular simulation study.
The most abundant alpha-amylase inhibitor (AAI) present in the seeds of Amaranthus hypochondriacus, a variety of the Mexican crop plant amaranth, is the smallest polypeptide (32 residues) known to inhibit alpha-amylase activity of insect larvae while leaving that of mammals unaffected. In solution, 1H NMR reveals that AAI isolated from amaranth seeds adopts a major trans (70%) and minor cis (30%) conformation, resulting from slow cis-trans isomerization of the Val15-Pro16 peptide bond. Both solution structures have been determined using 2D 1H-NMR spectroscopy and XPLOR followed by restrained energy refinement in the consistent-valence force field. For the major isomer, a total of 563 distance restraints, including 55 medium-range and 173 long-range ones, were available from the NOESY spectra. This rather large number of constraints from a protein of such a small size results from a compact fold, imposed through three disulfide bridges arranged in a cysteine-knot motif. The structure of the minor cis isomer has also been determined using a smaller constraint set. It reveals a different backbone conformation in the Pro10-Pro20 segment, while preserving the overall global fold. The energy-refined ensemble of the major isomer, consisting of 20 low-energy conformers with an average backbone rmsd of 0.29 +/- 0.19 A and no violations larger than 0.4 A, represents a considerable improvement in precision over a previously reported and independently performed calculation on AAI obtained through solid-phase synthesis, which was determined with only half the number of medium-range and long-range restraints reported here, and featured the trans isomer only. The resulting differences in ensemble precision have been quantified locally and globally, indicating that, for regions of the backbone and a good fraction of the side chains, the conformation is better defined in the new solution structure. 
Structural comparison of the solution structure with the X-ray structure of the inhibitor when bound to its alpha-amylase target in Tenebrio molitor shows that the backbone conformation is only slightly adjusted on complexation, while that of the side chains involved in protein-protein contacts is similar to that present in solution. Therefore, the overall conformation of AAI appears to be predisposed to binding to its target alpha-amylase, confirming the view that it acts as a lid on top of the alpha-amylase active site.
A fully automatic procedure for predicting the amino acid sequences compatible with a given target structure is described. It is based on the CHARMM package, and uses an all atom force-field and rotamer libraries to describe and evaluate side-chain types and conformations. Sequences are ranked by a quantity akin to the free energy of folding, which incorporates hydration effects. Exact (Branch and Bound) and heuristic optimisation procedures are used to identify highly scoring sequences from an astronomical number of possibilities. These sequences include the minimum free energy sequence, as well as all amino acid sequences whose free energy lies within a specified window from the minimum. Several applications of our procedure are illustrated. Prediction of side-chain conformations for a set of ten proteins yields results comparable to those of established side-chain placement programs. Applications to sequence optimisation comprise the re-design of the protein cores of c-Crk SH3 domain, the B1 domain of protein G and Ubiquitin, and of surface residues of the SH3 domain. In all calculations, no restrictions are imposed on the amino acid composition and identical parameter settings are used for core and surface residues. The best scoring sequences for the protein cores are virtually identical to wild-type. They feature no more than one to three mutations in a total of 11-16 variable positions. Tests suggest that this is due to the balance between various contributions in the force-field rather than to overwhelming influence from packing constraints. The effectiveness of our force-field is further supported by the sequence predictions for surface residues of the SH3 domain. More mutations are predicted than in the core, seemingly in order to optimise the network of complementary interactions between polar and charged groups. 
This appears to be an important energetic requirement in the absence of the partner molecules with which the SH3 domain interacts, which were not included in the calculations. Finally, a detailed comparison between the sequences generated by the heuristic and exact optimisation algorithms calls for a note of caution concerning the efficiency of heuristic procedures in exploring sequence space. Copyright 2000 Academic Press.
The clearance of seven different ligands from the deeply buried active-site of Torpedo californica acetylcholinesterase is investigated by combining multiple copy sampling molecular dynamics simulations with the analysis of protein-ligand interactions, protein motion and the electrostatic potential sampled by the ligand copies along their journey outwards. The considered ligands are the cations ammonium, methylammonium, and tetramethylammonium, the hydrophobic methane and neopentane, and the anionic product acetate and its neutral form, acetic acid. We find that the pathways explored by the different ligands vary with ligand size and chemical properties. Very small ligands, such as ammonium and methane, exit through several routes. One involves the main exit through the mouth of the enzyme gorge, another is through the so-called back door near Trp84, and a third uses a side door at a direction of approximately 45 degrees to the main exit. The larger polar ligands, methylammonium and acetic acid, leave through the main exit, but the bulkiest, tetramethylammonium and neopentane, as well as the smaller acetate ion, remain trapped in the enzyme gorge during the time of the simulations. The pattern of protein-ligand contacts during the diffusion process is highly non-random and differs for different ligands. A majority is made with aromatic side-chains, but classical H-bonds are also formed. In the case of acetate, but not acetic acid, the anionic and neutral form, respectively, of one of the reaction products, specific electrostatic interactions with protein groups seem to slow ligand motion and interfere with protein flexibility; protonation of the acetate ion is therefore suggested to facilitate clearance. The Poisson-Boltzmann formalism is used to compute the electrostatic potential of the thermally fluctuating acetylcholinesterase protein at positions actually visited by the diffusing ligand copies. 
Ligands of different charge and size are shown to sample somewhat different electrostatic potentials during their migration, because they explore different microscopic routes. The potential along the clearance route of a cation such as methylammonium displays two clear minima at the active and peripheral anionic site. We find moreover that the electrostatic energy barrier that the cation needs to overcome when moving between these two sites is small in both directions, being of the order of the ligand kinetic energy. The peripheral site thus appears to play a role in trapping inbound cationic ligands as well as in cation clearance, and hence in product release. Copyright 2000 Academic Press.
Torpedo acetylcholinesterase is irreversibly inactivated by modifying a buried free cysteine, Cys231, with sulfhydryl reagents. The stability of the enzyme, as monitored by measuring the rate of inactivation, was reduced by mutating a leucine, Leu282, to a smaller amino acid residue. Leu282 is located within the "peripheral" anionic site, at the entrance to the active-site gorge. Thus, loss of activity was due to the increased reactivity of Cys231. This was paralleled by an increased susceptibility to thermal denaturation, which was shown to be due to a large decrease in the activation enthalpy. Similar results were obtained when either of two other residues in contact with Leu282 in Torpedo acetylcholinesterase, Trp279 and Ser291, was replaced by an amino acid with a smaller side chain. We studied the effects of various ligands specific for either the active or peripheral sites on both thermal inactivation and on inactivation by 4,4'-dithiodipyridine. The wild-type and mutated enzymes could be either protected or sensitized. In some cases, opposite effects of the same ligand were observed for chemical modification and thermal denaturation. The mutated residues are within a conserved loop, W279-S291, at the top of the active-site gorge, that contributes to the peripheral anionic site. Theoretical analysis showed that Torpedo acetylcholinesterase consists of two structural domains, each comprising one contiguous polypeptide segment. The W279-S291 loop, located in the first domain, makes multiple contacts with the second domain across the active-site gorge. We postulate that the mutations to residues with smaller side chains destabilize the conserved loop, thus disrupting cross-gorge interactions and, ultimately, the entire structure.
A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets which display minimum interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria, representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective; it approximates closely the exact solution provided by a branch and bound algorithm for a number of test proteins. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison to domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;9: 256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.
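The partitioning step can be illustrated with a minimal sketch of the classical Kernighan-Lin bisection heuristic. This is not the STRUDL implementation: the contact-weight matrix below is a made-up toy example rather than weights derived from a Voronoi diagram, the function names are hypothetical, and the classical heuristic shown here keeps the two set sizes fixed, which the published procedure need not do.

```python
def cut_weight(w, side):
    """Total weight of contacts between the two residue sets."""
    n = len(w)
    return sum(w[i][j] for i in range(n) for j in range(i + 1, n)
               if side[i] != side[j])


def kernighan_lin(w, side):
    """Refine a two-way partition (list of 0/1 labels) of a weighted graph.

    w is a symmetric matrix of non-negative contact weights (toy values here).
    Each pass tentatively swaps pairs of nodes, then keeps only the prefix of
    swaps that gives the largest total reduction of the cut weight.
    """
    n = len(w)
    side = list(side)
    while True:
        work = list(side)
        locked = [False] * n
        gains, swaps = [], []
        for _ in range(n // 2):
            # D value of a node: external minus internal contact weight.
            d = [sum(w[i][j] if work[j] != work[i] else -w[i][j]
                     for j in range(n) if j != i) for i in range(n)]
            best = None
            for a in range(n):
                if locked[a] or work[a] != 0:
                    continue
                for b in range(n):
                    if locked[b] or work[b] != 1:
                        continue
                    gain = d[a] + d[b] - 2 * w[a][b]
                    if best is None or gain > best[0]:
                        best = (gain, a, b)
            if best is None:
                break
            gain, a, b = best
            work[a], work[b] = 1, 0          # tentative swap
            locked[a] = locked[b] = True
            gains.append(gain)
            swaps.append((a, b))
        # Find the prefix of tentative swaps with the largest cumulative gain.
        total, best_total, best_k = 0, 0, 0
        for k, gain in enumerate(gains, start=1):
            total += gain
            if total > best_total:
                best_total, best_k = total, k
        if best_k == 0:
            return side                      # no improving prefix: converged
        for a, b in swaps[:best_k]:
            side[a], side[b] = 1, 0


# Toy example: two tightly connected triplets joined by one weak contact.
W = [[0, 5, 5, 0, 0, 0],
     [5, 0, 5, 0, 0, 0],
     [5, 5, 0, 1, 0, 0],
     [0, 0, 1, 0, 5, 5],
     [0, 0, 0, 5, 0, 5],
     [0, 0, 0, 5, 5, 0]]
start = [0, 1, 0, 1, 0, 1]
result = kernighan_lin(W, start)
print(cut_weight(W, start), "->", cut_weight(W, result))
```

In this toy case the heuristic separates the two triplets, leaving only the weak contact across the boundary. Being a heuristic, it is not guaranteed to find the minimum cut in general, which is why the abstract compares it against an exact branch and bound solution.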
We analyzed the atomic models of 75 X-ray structures of protein-nucleic acid complexes with the aim of uncovering common properties. The interface area measured the extent of contact between the protein and nucleic acid. It was found to vary between 1120 and 5800 A2. Despite this wide variation, the interfaces in complexes of transcription factors with double-stranded DNA could be broken up into recognition modules where 12 +/- 3 nucleotides on the DNA side contact 24 +/- 6 amino acids on the protein side, with interface areas in the range 1600 +/- 400 A2. For enzymes acting on DNA, the recognition module is on average 600 A2 larger, due to the requirement of making an active site. As judged by its chemical and amino acid composition, the average protein surface in contact with the DNA is more polar than the solvent accessible surface or the typical protein-protein interface. The protein side is rich in positively charged groups from lysine and arginine side chains; on the DNA side the negative charges from phosphate groups dominate. Hydrogen bonding patterns were also analyzed, and we found one intermolecular hydrogen bond per 125 A2 of interface area in high-resolution structures. An equivalent number of polar interactions involved water molecules, which are generally abundant at protein-DNA interfaces. Calculations of Voronoi atomic volumes, performed in the presence and absence of water molecules, showed that protein atoms buried at the interface with DNA are on average as closely packed as in the protein interior. Water molecules contribute to the close packing, thereby mediating shape complementarity. Finally, conformational changes accompanying association were analyzed in 24 of the complexes for which the structure of the free protein was also available. On the DNA side the extent of deformation showed some correlation with the size of the interface area. On the protein side the type and size of the structural changes spanned a wide spectrum. 
Disorder-to-order transitions, domain movements, quaternary and tertiary changes were observed, and the largest changes occurred in complexes with large interfaces.
In this paper we present SFCHECK, a stand-alone software package that features a unified set of procedures for evaluating the structure-factor data obtained from X-ray diffraction experiments and for assessing the agreement of the atomic coordinates with these data. The evaluation is performed completely automatically, and produces a concise PostScript pictorial output similar to that of PROCHECK [Laskowski, MacArthur, Moss & Thornton (1993). J. Appl. Cryst. 26, 283-291], greatly facilitating visual inspection of the results. The required inputs are the structure-factor amplitudes and the atomic coordinates. Having those, the program summarizes relevant information on the deposited structure factors and evaluates their quality using criteria such as data completeness, structure-factor uncertainty and the optical resolution computed from the Patterson origin peak. The dependence of various parameters on the nominal resolution (d spacing) is also given. To evaluate the global agreement of the atomic model with the experimental data, the program recomputes the R factor, the correlation coefficient between observed and calculated structure-factor amplitudes and Rfree (when appropriate). In addition, it gives several estimates of the average error in the atomic coordinates. The local agreement between the model and the electron-density map is evaluated on a per-residue basis, considering separately the macromolecule backbone and side-chain atoms, as well as solvent atoms and heterogroups. Among the criteria are the normalized average atomic displacement, the local density correlation coefficient and the polymer chain connectivity. The possibility of computing these criteria using the omit-map procedure is also provided. The described software should be a valuable tool in monitoring the refinement procedure and in assessing structures deposited in databases.
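The two global agreement measures named above can be sketched as follows, assuming the conventional textbook definitions: the R factor as the summed absolute difference between observed and calculated amplitudes divided by the summed observed amplitudes, and the correlation coefficient as the Pearson correlation of the two amplitude lists. The function names and the sample amplitudes are illustrative only; the sketch omits the scaling and resolution handling that a real program such as SFCHECK would apply.

```python
import math


def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(|Fobs - Fcalc|) / sum(Fobs).

    Assumes both lists hold amplitudes already placed on a common scale.
    """
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def amplitude_correlation(f_obs, f_calc):
    """Pearson correlation coefficient between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(f_obs, f_calc))
    var_o = sum((o - mean_o) ** 2 for o in f_obs)
    var_c = sum((c - mean_c) ** 2 for c in f_calc)
    return cov / math.sqrt(var_o * var_c)


# Made-up amplitudes for four reflections.
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [12.0, 18.0, 33.0, 37.0]
print(r_factor(observed, calculated))
print(amplitude_correlation(observed, calculated))
```

Rfree is computed with the same R-factor expression, restricted to the reflections that were set aside and not used in refinement.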
Barnase, an extracellular endoribonuclease from Bacillus amyloliquefaciens, hydrolyses single-stranded RNA. Its very low catalytic activity toward GpN dinucleotides, where N stands for any nucleoside, is markedly increased when a phosphate is added to the 3'-end, as in GpNp. Here we investigate the conformational properties of GpA and GpAp in solution, in order to determine whether differences in these properties may be related to the changes in enzymatic activity. Two independent 1.3 ns molecular dynamics trajectories are generated for each dinucleotide in the presence of explicit water molecules and counter ions. These trajectories are analysed by monitoring molecular properties, such as the solvent accessible surface area, the distance and orientation between the bases, the behaviour of torsion angles and formation of intramolecular H-bonds. To identify relevant correlations between these parameters, statistical techniques comprising multiple regression, clustering and discriminant analysis are used. Results show that GpA has a significant propensity to form folded conformations (approximately 50%), fostered by a small number of intramolecular H-bonds, whereas GpAp remains essentially extended. The latter behaviour seems to be due to an H-bond between the terminal phosphate and adenosine ribose group, which restricts rotation about the adenine Agamma angle. We also find that GpA folding is induced by a concerted motion of specific torsion angles, which is closely coupled to the formation of a network of flexible hydrogen bonds. Finally, on the basis of an expression for barnase KM, which incorporates the folded/extended conformational equilibria of the dinucleotide substrates, it is argued that our findings on the differences between these equilibria can qualitatively rationalize the experimentally measured differences in enzymatic properties. Copyright 1998 Academic Press.
The geometrical properties of zinc binding sites in a data set of high quality protein crystal structures deposited in the Protein Data Bank have been examined to identify important differences between zinc sites that are directly involved in catalysis and those that play a structural role. Coordination angles in the zinc primary coordination sphere are compared with ideal values for each coordination geometry, and zinc coordination distances are compared with those in small zinc complexes from the Cambridge Structural Database as a guide of expected trends. We find that distances and angles in the primary coordination sphere are in general close to the expected (or ideal) values. Deviations occur primarily for oxygen coordinating atoms and are found to be mainly due to H-bonding of the oxygen coordinating ligand to protein residues, bidentate binding arrangements, and multi-zinc sites. We find that H-bonding of oxygen containing residues (or water) to zinc bound histidines is almost universal in our data set and defines the elec-His-Zn motif. Analysis of the stereochemistry shows that carboxyl elec-His-Zn motifs are geometrically rigid, while water elec-His-Zn motifs show the most geometrical variation. As catalytic motifs have a higher proportion of carboxyl elec atoms than structural motifs, they provide a more rigid framework for zinc binding. This is understood biologically, as a small distortion in the zinc position in an enzyme can have serious consequences on the enzymatic reaction. We also analyze the sequence pattern of the zinc ligands and residues that provide elecs, and identify conserved hydrophobic residues in the endopeptidases that also appear to contribute to stabilizing the catalytic zinc site. A zinc binding template in protein crystal structures is derived from these observations.
A fully automatic classification procedure of short protein fragments is applied to identify connections between alpha-helices and beta-strands in a data set of 141 protein chains. It yields 15 structural families of alphabeta turns and 15 families of betaalpha turns with at least five members. The sequence and structural features of these turn motifs are analysed with the focus on the local interactions located at alpha-helix and beta-strand ends. This analysis reveals specific interaction patterns that occur frequently among the members of many of the identified turn motifs. For the beta-strands, novel patterns are identified at the strands' entry and exit; they involve side chain/side chain contacts and beta-turns, generally of type I or II. For the alpha-helices, the interaction patterns consist of several backbone/backbone or backbone/side chain hydrogen bonds and of hydrophobic contacts; they generalize the well known N-terminal capping and C-terminal Schellman motifs. The interaction patterns at both ends of alpha-helices and beta-strands are found to constitute favourable structure motifs with low amino acid sequence specificity; their possible stabilizing role is discussed. Finally, the robustness of our classification procedure and of the description of N- and C-cap interaction patterns is validated by repeating our analysis on a larger data set of 381 protein chains and showing that the results are maintained.
BACKGROUND: The classical picture of the hydrophobic stabilization of proteins invokes a resemblance between the protein interior and nonpolar solvents, but the extent to which this is the case has often been questioned. The protein interior is believed to be at least as tightly packed as organic crystals, and was shown to have very low compressibility. There is also evidence that these properties are not uniform throughout the protein, and conflicting views exist on the nature of sidechain packing and on its influence on the properties of the protein. RESULTS: In order to probe the physical properties of the protein, the free energy associated with the formation of empty cavities has been evaluated for two proteins: barnase and T4 lysozyme. To this end, the likelihood of encountering such cavities was computed from room temperature molecular dynamics trajectories of these proteins in water. The free energy was evaluated in each protein taken as a whole and in submolecular regions. The computed free energies yielded information on the manner in which empty space is distributed in the system, while the latter undergoes thermal motion, a property hitherto not analyzed in heterogeneous media such as proteins. Our results showed that the free energy of cavity formation is higher in proteins than in both water and hexane, providing direct evidence that the native protein medium differs in fundamental ways from the two liquids. Furthermore, although the packing density was found to be higher in nonpolar regions of the protein than in polar ones, the free energy cost of forming atomic size cavities is significantly lower in nonpolar regions, implying that these regions contain larger chunks of empty space, thereby increasing the likelihood of containing atomic size packing defects. These larger empty spaces occur preferentially where buried hydrophobic sidechains belonging to secondary structures meet one another. 
These particular locations also appear to be more compressible than other parts of the core or surface of the protein. CONCLUSIONS: The cavity free energy calculations described here provide a much more detailed physical picture of the protein matrix than volume and packing calculations. According to this picture, the packing of hydrophobic sidechains is tight in the interior of the protein, but far from uniform. In particular, the packing is tighter in regions where the backbone forms less regular hydrogen-bonding interactions than at interfaces between secondary structure elements, where such interactions are fully developed. This may have important implications on the role of sidechain packing in protein folding and stability.
Standard ranges of atomic and residue volumes are computed in 64 highly resolved and well-refined protein crystal structures using the classical Voronoi procedure. Deviations of the atomic volumes from the standard values, evaluated as the volume Z-scores, are used to assess the quality of protein crystal structures. To score a structure globally, we compute the volume Z-score root mean square deviation (Z-score rms), which measures the average magnitude of the volume irregularities in the structure. We find that the Z-score rms decreases as the resolution and R-factor improve, consistent with the fact that these improvements generally reflect more accurate models. From the Z-score rms distribution in structures with a given resolution or R-factor, we determine the normal limits in Z-score rms values for structures solved at that resolution or R-factor. Structures whose Z-score rms exceeds these limits are considered as outliers. Such structures also exhibit unusual stereochemistry, as revealed by other analyses. Absolute Z-scores of individual atoms are used to identify problems in specific regions within a protein model. These Z-scores correlate fairly well with the atomic B-factors, and atoms having absolute Z-scores > 3 occur at or near regions in the model where programs such as PROCHECK identify unusual stereochemistry. Atomic volumes, themselves not directly restrained in crystallographic refinement, can thus provide an independent, rather sensitive, measure of the quality of a protein structure. The volume-based structure validation procedures are implemented in the program PROVE (PROtein Volume Evaluation), which is accessible through the World Wide Web.
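The scoring scheme described above can be sketched in a few lines, assuming that each atom's Z-score is its volume deviation from the standard mean for its atom type divided by the standard deviation for that type, and that the Z-score rms is the root mean square of those values. The reference table, atom-type labels and volumes below are invented for illustration and are not the PROVE standard values.

```python
import math

# Hypothetical reference table: atom type -> (mean volume, standard deviation),
# in cubic angstroms. Real standard values would come from a reference data set.
STANDARD_VOLUMES = {
    "C_alpha": (13.0, 1.5),
    "N_backbone": (14.0, 2.0),
}


def volume_z_scores(atoms, standard=STANDARD_VOLUMES):
    """Z-score of each (atom_type, volume) pair against its standard range."""
    return [(volume - standard[kind][0]) / standard[kind][1]
            for kind, volume in atoms]


def z_score_rms(z_scores):
    """Root mean square of the Z-scores: a single global score for a structure."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


def flag_atoms(z_scores, threshold=3.0):
    """Indices of atoms whose absolute Z-score exceeds the threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Made-up atomic volumes for three buried atoms.
atoms = [("C_alpha", 14.5), ("C_alpha", 11.5), ("N_backbone", 22.0)]
z = volume_z_scores(atoms)
print(z, z_score_rms(z), flag_atoms(z))
```

The threshold of 3 follows the abstract; the normal limits on the global Z-score rms depend on resolution or R-factor and would have to be taken from the observed distribution.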
The current status and future outlook of macromolecular structure databases and information handling, with particular reference to European databases, are reviewed. Issues concerning the efficiency with which data are represented, validated, archived and accessed are discussed in view of the fast growing body of information on structures of biological macromolecules.
A thermodynamic cycle is used to describe barnase catalysis, which considers explicitly the presence of different ionic states of the catalytic residues Glu-73 and His-102 in barnase during the enzyme-substrate recognition process. Reinterpretation of published experimental data using rate equations derived from this cycle provides estimates of the ionization constants of these catalytic side chains, in the free enzyme and in the barnase-GpA complex. In addition, the electrostatic properties of the barnase-d(CGAC) crystal complex and of a barnase-5'3'(AAGAAp)-O-methyl ester modeled complex are investigated by means of a continuum approach to account for solvent polarization effects. Taking GpA as a reference substrate, it is shown that increasing the length of the bound nucleotide induces pKa shifts in the catalytic side chains, which modulate the fraction of enzyme in the correct ionic form for achieving the transesterification reaction. The computed results are in good agreement with the experimental variation of the optimum pH of barnase activity. The present analysis underscores the influence of pH effects on the kcat and KM kinetic constants of barnase and provides the basic formalism for linking the effective kinetic parameters, which usually depend on the pH, to the theoretical estimates of the true kinetic constants.
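How pKa shifts modulate the fraction of enzyme in the productive ionic form can be illustrated with a simplified sketch. It assumes, as a simplification that is not taken from the paper, that the two catalytic groups ionize independently, that the productive form has the general base (Glu-73) deprotonated and the general acid (His-102) protonated, and that each fraction follows the Henderson-Hasselbalch relation. The pKa values used are placeholders, not the estimates derived in the study.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of a titratable group in its deprotonated state at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_active(ph, pka_base, pka_acid):
    """Fraction of enzyme with the base deprotonated and the acid protonated.

    Assumes the two groups titrate independently (a simplification).
    """
    base_ready = fraction_deprotonated(ph, pka_base)
    acid_ready = 1.0 - fraction_deprotonated(ph, pka_acid)
    return base_ready * acid_ready


# Placeholder pKa values, chosen only to show the bell-shaped pH dependence.
PKA_BASE, PKA_ACID = 5.0, 7.0
for ph in (4.0, 5.0, 6.0, 7.0, 8.0):
    print(ph, round(fraction_active(ph, PKA_BASE, PKA_ACID), 4))
```

Under these assumptions the active fraction peaks midway between the two pKa values, so shifting either pKa moves the optimum pH, which is the qualitative effect the abstract attributes to nucleotide length.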
An automatic procedure for the classification of short protein fragments, representing turn motifs between two consecutive secondary structures, is presented. This procedure has two steps. Fragments of given length are first grouped on the basis of their backbone dihedral angle values, and then clustered as a function of the root-mean-square deviation of their superimposed backbone atoms. The classification procedure identifies 63 families of turn motifs with at least five members, in a data set of 141 proteins. A detailed analysis is presented of the ten identified alpha alpha-turn families, of which four correspond to novel motifs. The sequence and structure features that characterize these families are described. It is found that some features are conserved within the fragments belonging to the same family, but their environment in the parent protein varies considerably. N-capping interactions and helix stop signals are encountered in a number of families, where they seem to stabilize the motif conformation. In two families, one with three residues in the loop, and one with four, an appreciable fraction of the members displays both types of characteristic helix end interactions in the same motif. Interestingly, contrary to most other alpha alpha-turns, the relative frequency of these two motifs is much higher than that of short protein segments with the same loop conformation. Furthermore, the family with three residues in the loop includes the helix-turn-helix motif known to bind DNA. It seems to be the only one among the ten identified families that can be related to biological function.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure.
An automatic algorithm is presented for analyzing protein conformational changes such as those occurring upon substrate binding or in different crystal forms of the same protein. Using, as sole information, the atomic coordinates of a pair of protein structures, the procedure first generates structure alignments, which optimize the root-mean-square deviation of the backbone atoms. To this end, equivalent secondary structures and/or loops from both proteins are combined by a multiple linkage hierarchic clustering algorithm, which generates several intertwined clustering trees. Automatic analysis of these clustering trees is used to dissect the mechanism of the conformational change. It allows the identification of the static core, representing the collection of secondary structures which undergo no structural changes, as well as other entities which move like rigid bodies. It also permits the description of the movement of secondary structures or loops relative to this core or entities. Using this information, it can be inferred whether a particular conformational change involves shear or hinge motion, or components of both. The algorithm is applied to the analysis of the conformational changes of citrate synthase, lactate dehydrogenase, lactoferrin and beta-glucosyltransferase, representing typical examples of shear- and hinge-type mechanisms, and a varied range in movement size. The results are shown to be in excellent agreement with previous analyses, and to provide additional information which gives a more complete and objective picture of the conformational change. Using our automatic algorithm, we find that any conformational change may be viewed as having components of both shear- and hinge-type motion. Determining which of these is most appropriate requires the combination of the information provided by our procedure with detailed knowledge of the protein tertiary structures.
Database-derived potentials, compiled from frequencies of sequence and structure features, are often used for scoring the compatibility of protein sequences and conformations. It is often believed that these scores correspond to differences in free energy with, in addition, a term containing the partition function of the system. Since this function does not depend on the conformation, the potentials are considered to be valid for scoring the compatibility of different conformations with a given sequence ('forward folding'), but not of sequences with a given structure ('inverted folding'). This interpretation is questioned here. It is argued that when many-body effects, which dominate frequencies compiled from the protein database, are corrected for, the potentials approximate a physically meaningful free energy difference from which the partition function term cancels out. It is the difference between the free energy of a given sequence in a specific conformation and that of the same sequence in a denatured-like state. Two examples of denatured-like states are discussed. Depending on the considered state, the free energy difference reduces to the commonly used scoring scheme, or contains additional terms that depend on the sequence. In both cases, all the terms can be derived from sequence-structure frequencies in the database. Such a free energy difference, commonly defined as the folding free energy, is a measure of protein stability and can be used for scoring both forward and inverted protein folding. The implications for the use of knowledge-based potentials in protein structure prediction are described. Finally, the difficulty of designing tests that could validate the proposed approach, and the inherent limitations of such tests, are discussed.
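For readers unfamiliar with how such potentials are compiled, the following sketch shows one widely used form, the inverse-Boltzmann log-odds score -kT ln[P(c|s)/P(c)], computed from joint counts of a conformational state c and a sequence feature s. This is assumed here to stand in for the commonly used scoring scheme; it does not include the many-body corrections or the sequence-dependent terms argued for in the abstract, and the counts in the example are made up.

```python
import math
from collections import Counter


def statistical_potential(pair_counts, kT=1.0):
    """Compute -kT * ln(P(c|s) / P(c)) for each (conformation, sequence) pair.

    `pair_counts` maps (conformation, sequence_feature) to an observed count.
    Pairs with a zero count are skipped because the logarithm is undefined.
    """
    total = sum(pair_counts.values())
    conf_totals = Counter()
    seq_totals = Counter()
    for (conf, seq), n in pair_counts.items():
        conf_totals[conf] += n
        seq_totals[seq] += n

    potential = {}
    for (conf, seq), n in pair_counts.items():
        if n == 0:
            continue
        # P(c|s) / P(c) = N(c,s) * N / (N(c) * N(s))
        ratio = n * total / (conf_totals[conf] * seq_totals[seq])
        potential[(conf, seq)] = -kT * math.log(ratio)
    return potential


if __name__ == "__main__":
    # Made-up counts: helix (H) or strand (E) state versus residue type A or G.
    counts = {("H", "A"): 30, ("H", "G"): 10, ("E", "A"): 10, ("E", "G"): 30}
    for key, value in sorted(statistical_potential(counts).items()):
        print(key, round(value, 3))
```

Negative values mark sequence-conformation pairings seen more often than expected from the marginal frequencies, positive values those seen less often.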
BACKGROUND: Leucine-rich repeats (LRRs) are present in proteins with diverse functions. The horseshoe-shaped structure of a ribonuclease inhibitor (RI), with a parallel beta sheet lining the inner circumference of the horseshoe and alpha helices flanking its outer circumference, is the only X-ray structure containing these repeats to be determined. Despite the fact that the lengths and sequences of the RI repeats differ from those of the most commonly occurring LRRs, it was deemed worthwhile to derive a three-dimensional structural framework of these more typical LRR proteins, using the RI structure as a template. RESULTS: Sequence alignments of 569 LRRs from 68 proteins were obtained by a profile search and used in a comparative sequence analysis to distinguish between residues with a probable structural role and those which seemed essential for function. This knowledge, along with the known atomic structure of RI, was used to model the three-dimensional structure of the most common LRR units. These modeled units were then used to build the three-dimensional structure of the extracellular domain of the thyrotropin receptor (TSHR)--a 'typical' LRR protein. CONCLUSIONS: The modeled TSHR structure adopts a non-globular arrangement, similar to that in RI. The beta regions of this typical LRR protein are the same as in the RI structure, whereas the alpha helices are shorter and the conformations of the alpha beta and beta alpha connections are different. As a result of these differences it was not possible to pack together typical LRR units using repeats such as those found in RI. This mutually exclusive relationship is supported by sequence analysis. The predicted structure of the typical LRRs obtained here can be used to build models for any of the known LRR proteins and the approach used for the prediction could be applied to other proteins containing internal repeats.
A fully automatic procedure for aligning two protein structures is presented. It uses as sole structural similarity measure the root mean square (r.m.s.) deviation of superimposed backbone atoms (N, C alpha, C and O) and is designed to yield optimal solutions with respect to this measure. In a first step, the procedure identifies protein segments with similar conformations in both proteins. In a second step, a novel multiple linkage clustering algorithm is used to identify segment combinations which yield optimal global structure alignments. Several structure alignments can usually be obtained for a given pair of proteins, which are exploited here to define automatically the common structural core of a protein family. Furthermore, an automatic analysis of the clustering trees is described which enables detection of rigid-body movements between structure elements. To illustrate the performance of our procedure, we apply it to families of distantly related proteins. One groups the three alpha + beta proteins ubiquitin, ferredoxin and the B1-domain of protein G. Their common structure motif consists of four beta-strands and the only alpha-helix, with one strand and the helix being displaced as a rigid body relative to the remaining three beta-strands. The other family consists of beta-proteins from the Greek key group, in particular actinoxanthin, the immunoglobulin variable domain and plastocyanin. Their consensus motif, composed of five beta-strands and a turn, is identified, mostly intact, in all Greek key proteins except the trypsins, and interestingly also in three other beta-protein families, the lipocalins, the neuraminidases and the lectins. This result provides new insights into the evolutionary relationships in the very diverse group of all beta-proteins.
Molecular dynamics simulations are used to investigate the unfolding reaction of an isolated beta-hairpin formed by residues 85 to 102 of barnase, a ribonuclease from Bacillus amyloliquefaciens. This peptide was considered following evidence from experimental studies that it may act as an initiation site for barnase folding by adopting a native-like conformation early during the folding process. Three successive molecular dynamics simulations of about 300 ps each were carried out for an all-atom model of the hairpin in water at 300 K, 450 K, and 600 K, respectively. A detailed analysis of all three simulations is presented. In particular we investigate the behavior of the backbone hydrogen bonds, and of hydrophobic interactions between side-chains, where distinction is made between contributions from native and non-native contacts, respectively. Furthermore, we investigate peptide water interactions and monitor the presence and size of empty cavities. The behavior of the hairpin in the three simulations, when considered sequentially, describes a process whereby a native-like conformation evolves to an unfolded state. Unfolding starts at the beginning of the 450 K simulation with the loss of two hydrogen bonds at the free hairpin extremities. At about the same time, the centrally located H-bonds are weakened and exchange more frequently with water, but the turn tightens up as the beta-sheet extends into the turn region. All this is accompanied by a volume expansion and the formation of a large hydrophobic side-chain cluster promoted by both native and highly fluctuating non-native apolar contacts involving residues 87 to 90 and 95 to 99. This collapsed but more loosely packed state, essentially stabilized by hydrophobic interactions, is stable throughout the entire 450 K simulation and for about 150 ps at 600 K, after which point it proceeds rapidly to completely denatured conformations. 
This behavior presents clear analogies with known features of the unfolding reaction of complete proteins. It may indicate that this beta-hairpin has a well-defined conformation on its own, which would be in agreement with its role as an initiation site for folding.
The solution structure of chlorotoxin, a small toxin purified from the venom of the Leiurus quinquestriatus scorpion, has been determined using 2D 1H NMR spectroscopy. Analysis of the NMR data shows that the structure consists of a small three-stranded antiparallel beta-sheet packed against an alpha-helix, thereby adopting the same fold as charybdotoxin and other members of the short scorpion toxin family [Arseniev et al. (1984) FEBS Lett. 165, 57-62; Martins et al. (1990) FEBS Lett. 260, 249-253; Bontems et al. (1991) Science 254, 1521-1523]. Three disulfide bonds of chlorotoxin (Cys5-Cys28, Cys16-Cys33, and Cys20-Cys35), cross-linking the alpha-helix to the beta-sheet, follow the common pattern found in the other short scorpion toxins. The fourth disulfide bridge (Cys2-Cys19) links the small N-terminal beta strand to the rest of the molecule, in contrast to charybdotoxin where this disulfide bridge is absent and the first strand interacts with the rest of the molecule by several contacts between hydrophobic residues. Another structural difference between chlorotoxin and charybdotoxin is observed at the level of the alpha-beta turn. This difference is accompanied by a change in the electrostatic potential surface, which is largely positive at the level of this turn in chlorotoxin, whereas no such positive potential surface can be found at the same position in charybdotoxin. In the latter protein, the positive surface is formed by different charged residues situated on the solvent-exposed site of the C-terminal beta-sheet.(ABSTRACT TRUNCATED AT 250 WORDS)
Four peptides corresponding to alpha-helical regions delimited by residues 63-73 and 97-112 of cytochrome c2 (Rhodospirillum) and residues 24-36 and 45-55 of bovine calcium binding protein are predicted to be alpha-helical by a recently developed method [Rooman, M., Kocher, J.P., & Wodak, S.J. (1991) J. Mol. Biol. 221, 961-979], synthesized by solid phase methods, and purified by HPLC, and their solution conformations are determined by NMR and CD. The observed conformational properties of these peptides in solution confirmed prediction results: in water/TFE (60/40, v/v) at room temperature, these peptides adopt an alpha-helical conformation, as shown by an extended pattern of strong, sequential dNN(i,i + 1) NOE cross-peaks, d alpha N(i,i + 1) NOEs of reduced intensity, several medium-range [d alpha N(i,i + 3), d alpha N(i,i + 4), d alpha beta-(i,i + 3)] NOE connectivities, small 3JH alpha N values, and more upfield alpha-proton chemical shifts. CD studies at different TFE concentrations and at room temperature provide further evidence of the propensity of these peptides to adopt an alpha-helical conformation in solution, as determined by the ellipticity values at 222 nm, and by deconvolution of the CD spectra. According to the method used, helicities in the range 34-50% and 55-75% are found for the 63-73 and 97-112 fragments of cytochrome c2, respectively, and in the range 53-80% and 42-65% for the fragments 24-36 and 45-55 of calcium binding protein in water/TFE (60/40, v/v) at 298 K. In addition, the experiments and predictions agree for those residues that are more flexible. Finally, the relevance of our results for the protein folding pathways is discussed.
The globular domain of chicken histone H1 (GH1) has been studied by 1H homonuclear and 1H-15N heteronuclear 2D NMR spectroscopy. After the full assignment of the proton and 15N resonances, the tertiary structure of GH1 was determined by an iterative procedure using distance geometry and restrained simulated annealing. The secondary structure elements of GH1, three helices (S5-A16, S24-A34, N42-K56) followed by a beta-hairpin (L59-L73), are folded in a manner very similar to the corresponding parts of the globular domain of chicken histone H5 (GH5) [Clore et al. (1987) EMBO J. 6, 1833-1842; Ramakrishnan et al. (1993) Nature 362, 219-223]. However, subtle differences are detected between the two structures and between the electrostatic potentials surrounding the molecules. The most important differences are located in the loop between the second and third helices, a region that could be responsible for the different affinity for DNA. The most positively charged regions are not found in exactly the same position in GH1 and GH5. Nevertheless, their location seems to agree with the model where nucleosome binding takes place through contact points located at one DNA terminus and close to the dyad axis of the nucleosome [Schwabe & Travers (1993) Curr. Biol. 3, 628-630].
The interactions between HIV-1 protease and its bound inhibitors have been investigated by molecular mechanics calculations and by analysis of crystal structures of the complexes in order to determine general rules for inhibitor and substrate binding to the protease. Fifteen crystal structures of HIV-1 protease with different peptidomimetic inhibitors showed conservation of hydrogen bond interactions between the main chain C = O and NH groups of the inhibitors and the C = O and NH groups of the protease extending from P3 C = O to P3' NH. The mean length of