In recent years a large body of data has been obtained from Nuclear Magnetic Resonance (NMR) and Circular Dichroism (CD) experiments on the influence of the amino acid sequence and various other parameters on the conformational state of peptides in solution. Interpreting the experimental data in terms of the conformational populations of the peptides remains a key problem, for which current solutions leave appreciable room for improvement. Considering that making this body of data available for surveys and analysis should be instrumental in tackling the problem, we undertook the development of Pescador: the 'PEptides in Solution ConformAtion Database: Online Resource'. Pescador contains data from NMR and CD spectroscopy on peptides in solution as well as information on the structural parameters derived from these data. It also features specialized Web-based tools for data deposition and means for readily accessing the stored information for analysis purposes. To illustrate the use of the database in deriving information for the conformational analysis of peptides, we show how the alpha-proton chemical shift (delta) values stored in Pescador, measured by NMR for different peptides in different laboratories, can be used to derive a new set of 'random coil' chemical shift values. First, we show that these values are very similar to those obtained experimentally for model peptides in water, and that their variation with increasing trifluoroethanol (TFE) concentration is similar to that reported for model peptides. Furthermore, we show that the chemical shift data in Pescador can be used to derive correction factors that take into account the effects of neighboring residues. These correction factors compare favorably with those recently derived from a series of model GGXGG peptides (Schwarzinger et al., 2001). 
These encouraging results suggest that, as the quantity of NMR data on peptides deposited in Pescador increases, surveys of these data should be a valuable means of deriving key parameters for the analysis of peptide conformation.
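The averaging idea behind database-derived 'random coil' values can be sketched as follows. This is a minimal illustration under stated assumptions, not Pescador's schema or the authors' exact procedure, which may, for example, filter out structured peptides first. It assumes each record is a hypothetical tuple (residue, preceding residue, following residue, Hα shift in ppm). The random coil value for a residue type is taken as the mean of its observations. A neighbor correction is taken as the mean deviation from that value, grouped by the preceding residue. The numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def random_coil_shifts(records):
    """Mean H-alpha shift per residue type; records are (res, prev, next, shift_ppm)."""
    by_res = defaultdict(list)
    for res, _prev, _next, shift in records:
        by_res[res].append(shift)
    return {res: mean(vals) for res, vals in by_res.items()}

def preceding_residue_corrections(records, coil):
    """Mean deviation from the random-coil value, grouped by the preceding residue."""
    dev = defaultdict(list)
    for res, prev, _next, shift in records:
        dev[prev].append(shift - coil[res])
    return {prev: mean(vals) for prev, vals in dev.items()}

# Hypothetical observations (ppm), for illustration only
records = [
    ("A", "G", "G", 4.30), ("A", "P", "G", 4.50),
    ("S", "G", "G", 4.45), ("S", "P", "G", 4.65),
]
coil = random_coil_shifts(records)                    # per-residue averages
corr = preceding_residue_corrections(records, coil)   # effect of the preceding residue
```

With real data, the same grouping could be repeated for the following residue, or for both neighbors, if enough observations are available.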
A set of conserved water positions making direct contacts with the alpha1 and alpha2 domains of the MHC class-I protein was identified by a cluster analysis of 12 high-resolution crystal structures of proteins from different allele types and different species, comprising human, mouse and rat. The analysis revealed a total of 63 clusters, corresponding to water molecules whose positions are conserved in half or more of the analyzed structures. Analysis of these clusters shows that the most conserved water positions (those appearing in the largest fraction of the structures) were also the most accurately defined, as measured by their normalized crystallographic B-factor. Not surprisingly, these positions displayed better overlap and formed more H-bonds with the protein. In the second part of this work, three of the most conserved water positions are analyzed in detail and their putative structural and functional roles are discussed. The most highly conserved of the three appears to play an important role in stabilizing the conformation of a twisted beta-turn between residues 118 and 122 (numbering of HLA-B3501, PDB code 1A1N). An equivalent water molecule was found to be associated with a similar beta-turn in 43 unrelated structures surveyed in the PDB, suggesting that this water molecule plays an important structural role in this type of turn. The second water molecule makes hydrogen bonds with residues lining pocket B in the peptide-binding groove and is suggested to play a role in modulating peptide recognition. The third highly conserved water molecule is located at the first kink of the alpha2 helix, possibly playing a role in determining the position of the N-terminal segment of that helix, which also carries side chains in contact with the bound peptide. 
This information on conserved water positions in MHC class-I molecules should be helpful in modeling interactions with bound peptide antigens and in designing new peptides with tailor-made affinities.
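The notion of a water position "conserved in half or more of the analyzed structures" can be illustrated with a simple greedy clustering. This sketch assumes the water coordinates have already been superposed onto a common reference frame. It uses a hypothetical 1.5 Å cutoff. Each cluster is scored by the fraction of distinct structures contributing a water to it. It is not the clustering procedure used in the study.

```python
import math

def cluster_waters(waters, n_structures, cutoff=1.5, min_fraction=0.5):
    """Greedy clustering of superposed water positions.

    waters: list of (structure_id, (x, y, z)) in a common reference frame.
    Returns (centroid, fraction) for clusters found in >= min_fraction of structures.
    """
    clusters = []  # each cluster: {"members": [(sid, pos), ...], "centroid": (x, y, z)}
    for sid, pos in waters:
        for c in clusters:
            if math.dist(pos, c["centroid"]) <= cutoff:
                c["members"].append((sid, pos))
                pts = [p for _, p in c["members"]]
                # recompute the centroid from all member positions
                c["centroid"] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
                break
        else:
            clusters.append({"members": [(sid, pos)], "centroid": pos})
    conserved = []
    for c in clusters:
        fraction = len({sid for sid, _ in c["members"]}) / n_structures
        if fraction >= min_fraction:
            conserved.append((c["centroid"], fraction))
    return conserved

# Hypothetical waters from 4 superposed structures
waters = [(0, (0.0, 0.0, 0.0)), (1, (0.3, 0.0, 0.0)),
          (2, (0.0, 0.2, 0.0)), (0, (10.0, 10.0, 10.0))]
result = cluster_waters(waters, n_structures=4)
```

Greedy assignment depends on input order. A production analysis would more likely use hierarchical clustering with a fixed linkage criterion.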
MOTIVATION: Comparing the 3D structures of two proteins, or analyzing the structural changes undergone by a protein upon ligand binding or when it crystallizes under different conditions, can be both tricky and tedious, especially when the two proteins are distantly related or when the structural changes are complex. Readily accessible tools for performing these tasks automatically and reliably should therefore be welcome. RESULTS: We describe a web interface to several automatic procedures for performing pairwise structure superposition in a flexible manner, for detailed analyses of conformational changes and for displaying the results in a pictorial fashion. AVAILABILITY: This interface can be accessed at the Brussels and Cuba Web sites, respectively: http://www.ucmb.ulb.ac.be/SCMBB/Tools.html and http://bio.cigb.edu.cu.
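The core operation behind pairwise structure superposition is the least-squares fit of two sets of equivalent atoms. The sketch below uses the standard Kabsch (SVD) method. It assumes the atom correspondence is already known, which is the hard part for distantly related proteins. It is shown as an illustration only; the procedures behind the web interface may use a different superposition method.

```python
import numpy as np

def kabsch_superpose(P, Q):
    """Least-squares superposition of P onto Q (N x 3 arrays, same atom order).

    Returns rotation R, translation t (so that P @ R.T + t ~ Q) and the RMSD.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                  # 3x3 covariance of centered coordinates
    U, _, Vt = np.linalg.svd(H)
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude reflections
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = q0 - p0 @ R.T
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return R, t, rmsd

# Example: Q is P rotated 90 degrees about z and translated
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([5.0, 5.0, 5.0])
R, t, rmsd = kabsch_superpose(P, Q)
```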
Standard volumes for atoms in double-stranded B-DNA are derived using high resolution crystal structures from the Nucleic Acid Database (NDB) and compared with corresponding values derived from crystal structures of small organic compounds in the Cambridge Structural Database (CSD). Two different methods are used to compute these volumes: the classical Voronoi method, which does not depend on the size of atoms, and the related Radical Planes method, which does. Results show that atomic groups buried in the interior of double-stranded DNA are, on average, more tightly packed than in related small molecules in the CSD. The packing efficiency of DNA atoms at the interfaces of 25 high resolution protein-DNA complexes is determined by computing the ratios between the volumes of interfacial DNA atoms and the corresponding standard volumes. These ratios are found to be close to unity, indicating that the DNA atoms at protein-DNA interfaces are as closely packed as in crystals of B-DNA. Analogous volume ratios, computed for buried protein atoms, are also near unity, confirming our earlier conclusion that the packing efficiency of these atoms is similar to that in the protein interior. In addition, we examine the number, volume and solvent occupation of cavities located at the protein-DNA interfaces and compare them with those in the protein interior. Cavities are found to be ubiquitous in the interfaces as well as inside the protein moieties. The frequency of solvent occupation of cavities is, however, higher in the interfaces, indicating that these are more hydrated than protein interiors. Lastly, we compare our results with those obtained using two different measures of shape complementarity of the analysed interfaces, and find that the correlation between our volume ratios and these measures, as well as between the measures themselves, is weak. 
Our results indicate that a tightly packed environment made up of DNA, protein and solvent atoms plays a significant role in protein-DNA recognition.
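The difference between the two volume methods lies in where the plane separating two neighboring atoms is placed. In the classical Voronoi construction it bisects the interatomic vector, whatever the atom sizes. For atoms with radii r1 and r2 at distance d, the radical plane lies at distance (d² + r1² − r2²)/(2d) from atom 1. This is the locus of equal power distance to the two spheres. The short sketch below computes both positions and the volume ratio used to assess packing. The function names are illustrative only.

```python
def plane_position(d, r1=None, r2=None):
    """Distance from atom 1 to the plane separating atoms 1 and 2.

    Classical Voronoi (radii ignored): the bisecting plane at d/2.
    Radical plane (radii r1, r2): (d**2 + r1**2 - r2**2) / (2*d).
    """
    if r1 is None or r2 is None:
        return d / 2.0
    return (d * d + r1 * r1 - r2 * r2) / (2.0 * d)

def volume_ratio(observed_volume, standard_volume):
    """Ratio near 1: packing as in the reference structures; below 1: tighter."""
    return observed_volume / standard_volume
```

For equal radii the two constructions coincide. A larger atom 1 pushes the radical plane toward atom 2, which gives the larger atom a larger share of the shared space.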
A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets that display minimal interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective: it closely approximates the exact solution provided by a branch and bound algorithm for a number of test proteins. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison with domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;19:256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.
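The Kernighan-Lin heuristic mentioned above refines a two-way partition by swapping pairs of nodes. It keeps the prefix of swaps with the largest cumulative reduction in cut weight. Below is a generic sketch of one such bisection. It is not STRUDL's implementation, which works from weighted Voronoi contacts and handles several domains plus acceptance criteria. The toy graph and its weights are hypothetical stand-ins for inter-residue interactions.

```python
def kl_bisect(weights, part_a, part_b, max_passes=10):
    """Kernighan-Lin refinement of a two-way partition of a weighted graph.

    weights: dict node -> {neighbor: interaction weight}, assumed symmetric.
    Returns two sets whose cut weight is locally minimal under pair swaps.
    """
    A, B = set(part_a), set(part_b)

    def w(u, v):
        return weights.get(u, {}).get(v, 0.0)

    def d(v, own, other):
        # external minus internal interaction of node v
        return sum(w(v, x) for x in other) - sum(w(v, x) for x in own if x != v)

    for _ in range(max_passes):
        a, b = set(A), set(B)
        locked, gains, swaps = set(), [], []
        for _ in range(min(len(a), len(b))):
            best = None
            for x in a - locked:
                for y in b - locked:
                    g = d(x, a, b) + d(y, b, a) - 2 * w(x, y)
                    if best is None or g > best[0]:
                        best = (g, x, y)
            g, x, y = best
            # tentatively swap the best pair, then lock both nodes
            a.remove(x); b.remove(y); a.add(y); b.add(x)
            locked.update((x, y))
            gains.append(g)
            swaps.append((x, y))
        # keep the prefix of swaps with the largest cumulative gain
        best_k, best_total, total = 0, 0.0, 0.0
        for k, g in enumerate(gains, start=1):
            total += g
            if total > best_total:
                best_k, best_total = k, total
        if best_k == 0:
            break  # no improving sequence of swaps: local optimum
        for x, y in swaps[:best_k]:
            A.remove(x); B.remove(y); A.add(y); B.add(x)
    return A, B

def cut_weight(weights, A, B):
    return sum(weights.get(u, {}).get(v, 0.0) for u in A for v in B)

# Two tightly connected "domains" joined by one weak contact (hypothetical weights)
edges = {(1, 2): 5, (1, 3): 5, (2, 3): 5, (4, 5): 5, (4, 6): 5, (5, 6): 5, (3, 4): 1}
weights = {}
for (u, v), wt in edges.items():
    weights.setdefault(u, {})[v] = wt
    weights.setdefault(v, {})[u] = wt
A, B = kl_bisect(weights, {1, 2, 4}, {3, 5, 6})
```

Starting from a poor split, a single pass moves residues 3 and 4 to their natural sides. A second pass finds no improving swap sequence and stops.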
We analyzed the atomic models of 75 X-ray structures of protein-nucleic acid complexes with the aim of uncovering common properties. The interface area, which measures the extent of contact between the protein and the nucleic acid, was found to vary between 1120 and 5800 Å². Despite this wide variation, the interfaces in complexes of transcription factors with double-stranded DNA could be broken up into recognition modules in which 12 ± 3 nucleotides on the DNA side contact 24 ± 6 amino acids on the protein side, with interface areas in the range 1600 ± 400 Å². For enzymes acting on DNA, the recognition module is on average 600 Å² larger, owing to the requirement of forming an active site. As judged by its chemical and amino acid composition, the average protein surface in contact with the DNA is more polar than the solvent accessible surface or the typical protein-protein interface. The protein side is rich in positively charged groups from lysine and arginine side chains; on the DNA side, the negative charges from phosphate groups dominate. Hydrogen bonding patterns were also analyzed, and we found one intermolecular hydrogen bond per 125 Å² of interface area in high-resolution structures. An equivalent number of polar interactions involved water molecules, which are generally abundant at protein-DNA interfaces. Calculations of Voronoi atomic volumes, performed in the presence and absence of water molecules, showed that protein atoms buried at the interface with DNA are on average as closely packed as in the protein interior. Water molecules contribute to the close packing, thereby mediating shape complementarity. Finally, conformational changes accompanying association were analyzed in 24 of the complexes for which the structure of the free protein was also available. On the DNA side, the extent of deformation showed some correlation with the size of the interface area. On the protein side, the type and size of the structural changes spanned a wide spectrum. 
Disorder-to-order transitions, domain movements, quaternary and tertiary changes were observed, and the largest changes occurred in complexes with large interfaces.
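The figure of one intermolecular hydrogen bond per 125 Å² gives a quick estimate of how many H-bonds to expect at an interface of a given size. The sketch below assumes that the interface area is the total solvent-accessible surface area buried on both partners upon association. This is a common convention, though some authors report half of this value. The ASA values are hypothetical.

```python
def interface_area(asa_protein, asa_dna, asa_complex):
    """Total solvent-accessible area (in square angstroms) buried on both sides."""
    return asa_protein + asa_dna - asa_complex

def expected_hbonds(area, area_per_hbond=125.0):
    """Rough intermolecular H-bond count at one H-bond per 125 square angstroms."""
    return area / area_per_hbond

# Hypothetical accessible surface areas (square angstroms)
area = interface_area(10000.0, 6000.0, 14000.0)   # 2000.0
n_hb = expected_hbonds(area)                      # 16.0
```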
Barnase, an extracellular endoribonuclease from Bacillus amyloliquefaciens, hydrolyses single-stranded RNA. Its very low catalytic activity toward GpN dinucleotides, where N stands for any nucleoside, is markedly increased when a phosphate is added to the 3'-end, as in GpNp. Here we investigate the conformational properties of GpA and GpAp in solution, in order to determine whether differences in these properties may be related to the changes in enzymatic activity. Two independent 1.3 ns molecular dynamics trajectories are generated for each dinucleotide in the presence of explicit water molecules and counter ions. These trajectories are analysed by monitoring molecular properties such as the solvent accessible surface area, the distance and orientation between the bases, the behaviour of torsion angles and the formation of intramolecular H-bonds. To identify relevant correlations between these parameters, statistical techniques comprising multiple regression, clustering and discriminant analysis are used. Results show that GpA has a significant propensity to form folded conformations (approximately 50%), fostered by a small number of intramolecular H-bonds, whereas GpAp remains essentially extended. The latter behaviour seems to be due to an H-bond between the terminal phosphate and the adenosine ribose group, which restricts rotation about the adenine Agamma angle. We also find that GpA folding is induced by a concerted motion of specific torsion angles, which is closely coupled to the formation of a network of flexible hydrogen bonds. Finally, on the basis of an expression for the barnase KM that incorporates the folded/extended conformational equilibria of the dinucleotide substrates, it is argued that our findings on the differences between these equilibria can qualitatively rationalize the experimentally measured differences in enzymatic properties. Copyright 1998 Academic Press.
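To see how a folded/extended equilibrium can enter KM, consider a simplified model. It is an assumption here, not necessarily the expression used in the paper. Only the extended conformer binds, and folding equilibrates rapidly. With K = [folded]/[extended], the total substrate concentration is [extended](1 + K), so the apparent KM becomes KM_extended × (1 + K). On this simplified view, a substrate that is about 50% folded (K = 1) shows roughly twice the apparent KM of one that stays extended.

```python
def apparent_km(km_extended, fraction_folded):
    """Apparent KM when only the extended conformer binds (rapid equilibrium).

    With K = [folded]/[extended], total substrate = [extended] * (1 + K),
    so KM_app = KM_extended * (1 + K).
    """
    if not 0.0 <= fraction_folded < 1.0:
        raise ValueError("fraction_folded must be in [0, 1)")
    k_fold = fraction_folded / (1.0 - fraction_folded)
    return km_extended * (1.0 + k_fold)

km_folding_prone = apparent_km(1.0, 0.5)   # ~50% folded, as reported for GpA
km_extended_only = apparent_km(1.0, 0.0)   # essentially extended, as for GpAp
```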
The geometrical properties of zinc binding sites in a data set of high quality protein crystal structures deposited in the Protein Data Bank have been examined to identify important differences between zinc sites that are directly involved in catalysis and those that play a structural role. Coordination angles in the zinc primary coordination sphere are compared with ideal values for each coordination geometry, and zinc coordination distances are compared with those in small zinc complexes from the Cambridge Structural Database as a guide to expected trends. We find that distances and angles in the primary coordination sphere are in general close to the expected (or ideal) values. Deviations occur primarily for oxygen coordinating atoms and are found to be mainly due to H-bonding of the oxygen coordinating ligand to protein residues, bidentate binding arrangements, and multi-zinc sites. We find that H-bonding of oxygen-containing residues (or water) to zinc-bound histidines is almost universal in our data set and defines the elec-His-Zn motif. Analysis of the stereochemistry shows that carboxyl elec-His-Zn motifs are geometrically rigid, while water elec-His-Zn motifs show the most geometrical variation. As catalytic motifs have a higher proportion of carboxyl elec atoms than structural motifs, they provide a more rigid framework for zinc binding. This makes biological sense, as a small distortion in the zinc position in an enzyme can have serious consequences for the enzymatic reaction. We also analyze the sequence pattern of the zinc ligands and of the residues that provide elecs, and identify conserved hydrophobic residues in the endopeptidases that also appear to contribute to stabilizing the catalytic zinc site. A zinc binding template in protein crystal structures is derived from these observations.
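Comparing coordination angles with ideal values amounts to measuring every ligand-Zn-ligand angle and its deviation from the ideal geometry. The sketch below does this for a tetrahedral site, whose ideal angle is arccos(−1/3) ≈ 109.47°. It summarizes the result as an RMS deviation. That summary statistic is a choice made here for illustration, not necessarily the one used in the study.

```python
import math
from itertools import combinations

TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # ideal angle, about 109.47 degrees

def angle(center, a, b):
    """Angle a-center-b in degrees."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def rms_angle_deviation(zn, ligands, ideal=TETRAHEDRAL):
    """RMS deviation (degrees) of all ligand-Zn-ligand angles from the ideal value."""
    devs = [angle(zn, a, b) - ideal for a, b in combinations(ligands, 2)]
    return math.sqrt(sum(d * d for d in devs) / len(devs))

# A perfect tetrahedron around the origin gives a deviation of ~0
ideal_site = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
dev = rms_angle_deviation((0, 0, 0), ideal_site)
```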
A fully automatic classification procedure of short protein fragments is applied to identify connections between alpha-helices and beta-strands in a data set of 141 protein chains. It yields 15 structural families of alpha beta turns and 15 families of beta alpha turns with at least five members. The sequence and structural features of these turn motifs are analysed with the focus on the local interactions located at alpha-helix and beta-strand ends. This analysis reveals specific interaction patterns that occur frequently among the members of many of the identified turn motifs. For the beta-strands, novel patterns are identified at the strands' entry and exit; they involve side chain/side chain contacts and beta-turns, generally of type I or II. For the alpha-helices, the interaction patterns consist of several backbone/backbone or backbone/side chain hydrogen bonds and of hydrophobic contacts; they generalize the well known N-terminal capping and C-terminal Schellman motifs. The interaction patterns at both ends of alpha-helices and beta-strands are found to constitute favourable structure motifs with low amino acid sequence specificity; their possible stabilizing role is discussed. Finally, the robustness of our classification procedure and of the description of N- and C-cap interaction patterns is validated by repeating our analysis on a larger data set of 381 protein chains and showing that the results are maintained.
BACKGROUND: The classical picture of the hydrophobic stabilization of proteins invokes a resemblance between the protein interior and nonpolar solvents, but the extent to which this is the case has often been questioned. The protein interior is believed to be at least as tightly packed as organic crystals, and was shown to have very low compressibility. There is also evidence that these properties are not uniform throughout the protein, and conflicting views exist on the nature of sidechain packing and on its influence on the properties of the protein. RESULTS: In order to probe the physical properties of the protein, the free energy associated with the formation of empty cavities has been evaluated for two proteins: barnase and T4 lysozyme. To this end, the likelihood of encountering such cavities was computed from room temperature molecular dynamics trajectories of these proteins in water. The free energy was evaluated in each protein taken as a whole and in submolecular regions. The computed free energies yielded information on the manner in which empty space is distributed in the system, while the latter undergoes thermal motion, a property hitherto not analyzed in heterogeneous media such as proteins. Our results showed that the free energy of cavity formation is higher in proteins than in both water and hexane, providing direct evidence that the native protein medium differs in fundamental ways from the two liquids. Furthermore, although the packing density was found to be higher in nonpolar regions of the protein than in polar ones, the free energy cost of forming atomic size cavities is significantly lower in nonpolar regions, implying that these regions contain larger chunks of empty space, thereby increasing the likelihood of containing atomic size packing defects. These larger empty spaces occur preferentially where buried hydrophobic sidechains belonging to secondary structures meet one another. 
These particular locations also appear to be more compressible than other parts of the core or surface of the protein. CONCLUSIONS: The cavity free energy calculations described here provide a much more detailed physical picture of the protein matrix than volume and packing calculations. According to this picture, the packing of hydrophobic sidechains is tight in the interior of the protein, but far from uniform. In particular, the packing is tighter in regions where the backbone forms less regular hydrogen-bonding interactions than at interfaces between secondary structure elements, where such interactions are fully developed. This may have important implications for the role of sidechain packing in protein folding and stability.
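The link between "the likelihood of encountering such cavities" and their free energy is the standard relation ΔG = −kT ln p0. Here p0 is the probability that a probe region of the chosen size is found empty. The sketch below assumes p0 is estimated by counting empty trial insertion points sampled over trajectory frames. That estimator is an assumption made here; the paper's exact sampling scheme is not reproduced.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole (gas constant R), kcal/(mol K)

def cavity_free_energy(n_empty, n_trials, temperature=300.0):
    """Free energy (kcal/mol) of forming a cavity: -kT ln p0.

    p0 is estimated as the fraction of trial points (over trajectory frames)
    at which a probe sphere of the chosen radius overlaps no atom.
    """
    if n_empty == 0:
        raise ValueError("no empty cavities observed; p0 cannot be estimated")
    p0 = n_empty / n_trials
    return -K_B * temperature * math.log(p0)

dg_always_empty = cavity_free_energy(10, 10)   # p0 = 1 -> 0 kcal/mol
dg_rare = cavity_free_energy(1, 10)            # p0 = 0.1 -> about 1.37 kcal/mol at 300 K
```

A lower free energy, as reported for the nonpolar regions, corresponds to a higher probability of finding atom-sized empty space there.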
Standard ranges of atomic and residue volumes are computed in 64 highly resolved and well-refined protein crystal structures using the classical Voronoi procedure. Deviations of the atomic volumes from the standard values, evaluated as volume Z-scores, are used to assess the quality of protein crystal structures. To score a structure globally, we compute the volume Z-score root mean square deviation (Z-score rms), which measures the average magnitude of the volume irregularities in the structure. We find that the Z-score rms decreases as the resolution and R-factor improve, consistent with the fact that these improvements generally reflect more accurate models. From the Z-score rms distribution in structures with a given resolution or R-factor, we determine the normal limits in Z-score rms values for structures solved at that resolution or R-factor. Structures whose Z-score rms exceeds these limits are considered outliers. Such structures also exhibit unusual stereochemistry, as revealed by other analyses. Absolute Z-scores of individual atoms are used to identify problems in specific regions within a protein model. These Z-scores correlate fairly well with the atomic B-factors, and atoms having absolute Z-scores > 3 occur at or near regions in the model where programs such as PROCHECK identify unusual stereochemistry. Atomic volumes, themselves not directly restrained in crystallographic refinement, can thus provide an independent and rather sensitive measure of the quality of a protein structure. The volume-based structure validation procedures are implemented in the program PROVE (PROtein Volume Evaluation), which is accessible through the World Wide Web.
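The volume Z-score and its rms can be written out directly. For each atom, Z = (V − mean)/sd, using the standard volume statistics for that atom type. The global score is the root mean square of the atomic Z-scores, and atoms with |Z| > 3 are flagged. The sketch below follows that description. The atom types and standard values are hypothetical placeholders, not PROVE's actual tables.

```python
import math

def volume_z_scores(volumes, standards):
    """volumes: list of (atom_type, volume); standards: atom_type -> (mean, sd)."""
    return [(v - standards[t][0]) / standards[t][1] for t, v in volumes]

def z_score_rms(z_scores):
    """Root mean square of the atomic volume Z-scores (global quality score)."""
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))

def flag_atoms(volumes, standards, threshold=3.0):
    """Indices of atoms whose absolute Z-score exceeds the threshold."""
    zs = volume_z_scores(volumes, standards)
    return [i for i, z in enumerate(zs) if abs(z) > threshold]

# Hypothetical standard volumes (mean, sd) and observed atomic volumes
standards = {"C": (10.0, 1.0), "O": (20.0, 2.0)}
volumes = [("C", 11.0), ("O", 18.0), ("C", 14.0)]
zs = volume_z_scores(volumes, standards)      # [1.0, -1.0, 4.0]
rms = z_score_rms(zs)                         # sqrt(6), about 2.45
flagged = flag_atoms(volumes, standards)      # [2]
```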
The current status and future outlook of macromolecular structure databases and information handling, with particular reference to European databases, are reviewed. Issues concerning the efficiency with which data are represented, validated, archived and accessed are discussed in view of the fast growing body of information on structures of biological macromolecules.
An automatic procedure for the classification of short protein fragments, representing turn motifs between two consecutive secondary structures, is presented. This procedure has two steps. Fragments of given length are first grouped on the basis of their backbone dihedral angle values, and then clustered as a function of the root-mean-square deviation of their superimposed backbone atoms. The classification procedure identifies 63 families of turn motifs with at least five members, in a data set of 141 proteins. A detailed analysis is presented of the ten identified alpha alpha-turn families, of which four correspond to novel motifs. The sequence and structure features that characterize these families are described. It is found that some features are conserved within the fragments belonging to the same family, but their environment in the parent protein varies considerably. N-capping interactions and helix stop signals are encountered in a number of families, where they seem to stabilize the motif conformation. In two families, one with three residues in the loop, and one with four, an appreciable fraction of the members displays both types of characteristic helix end interactions in the same motif. Interestingly, contrary to most other alpha alpha-turns, the relative frequency of these two motifs is much higher than that of short protein segments with the same loop conformation. Furthermore, the family with three residues in the loop includes the helix-turn-helix motif known to bind DNA. It seems to be the only one among the ten identified families that can be related to biological function.
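The first step described above, grouping fragments by their backbone dihedral angles, must account for the periodicity of angles: −170° and 170° are only 20° apart. The sketch below groups same-length fragments greedily when every φ and ψ agrees within a tolerance. The 30° tolerance and the greedy assignment are assumptions for illustration. The second, RMSD-based clustering step is not shown.

```python
def angle_diff(a, b):
    """Smallest signed difference between two angles in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def same_dihedral_group(frag1, frag2, tol=30.0):
    """frag: list of (phi, psi) pairs; True if every angle agrees within tol."""
    if len(frag1) != len(frag2):
        return False
    return all(abs(angle_diff(p1, p2)) <= tol and abs(angle_diff(s1, s2)) <= tol
               for (p1, s1), (p2, s2) in zip(frag1, frag2))

def group_fragments(fragments, tol=30.0):
    """Greedy grouping: each fragment joins the first group whose founder matches."""
    groups = []
    for frag in fragments:
        for g in groups:
            if same_dihedral_group(frag, g[0], tol):
                g.append(frag)
                break
        else:
            groups.append([frag])
    return groups

# Hypothetical two-residue fragments (phi, psi in degrees)
f1 = [(-60.0, -45.0), (-90.0, 0.0)]
f2 = [(-65.0, -40.0), (-85.0, 10.0)]
f3 = [(-120.0, 130.0), (-60.0, -45.0)]
groups = group_fragments([f1, f2, f3])   # [[f1, f2], [f3]]
```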
An automatic algorithm is presented for analyzing protein conformational changes such as those occurring upon substrate binding or in different crystal forms of the same protein. Using, as sole information, the atomic coordinates of a pair of protein structures, the procedure first generates structure alignments that minimize the root-mean-square deviation of the backbone atoms. To this end, equivalent secondary structures and/or loops from both proteins are combined by a multiple linkage hierarchic clustering algorithm, which generates several intertwined clustering trees. Automatic analysis of these clustering trees is used to dissect the mechanism of the conformational change. It allows the identification of the static core, representing the collection of secondary structures that undergo no structural changes, as well as other entities that move like rigid bodies. It also permits the description of the movement of secondary structures or loops relative to this core or entities. Using this information, it can be inferred whether a particular conformational change involves shear or hinge motion, or components of both. The algorithm is applied to the analysis of the conformational changes of citrate synthase, lactate dehydrogenase, lactoferrin and beta-glucosyltransferase, representing typical examples of shear- and hinge-type mechanisms and a varied range in movement size. The results are shown to be in excellent agreement with previous analyses, and to provide additional information that gives a more complete and objective picture of the conformational change. Using our automatic algorithm, we find that any conformational change may be viewed as having components of both shear- and hinge-type motion. Determining which of these is most appropriate requires combining the information provided by our procedure with detailed knowledge of the protein tertiary structures.
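The idea of collecting secondary structures that move together into rigid bodies can be illustrated with a much simpler scheme than the multiple-linkage trees described above. This is single-linkage grouping at one cutoff, implemented with union-find. It assumes a hypothetical input: for each pair of elements, the RMSD obtained when the two are superposed jointly between the two conformations. A small value means the pair moves as a unit. The largest group is then a candidate static core.

```python
def rigid_groups(elements, pair_rmsd, cutoff=1.0):
    """Single-linkage grouping: i and j join if pair_rmsd[(i, j)] <= cutoff.

    Returns groups sorted by size; the largest is a candidate static core.
    """
    parent = {e: e for e in elements}

    def find(e):
        # union-find root lookup with path halving
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for (i, j), r in pair_rmsd.items():
        if r <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for e in elements:
        groups.setdefault(find(e), []).append(e)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical joint-superposition RMSDs (angstroms) between element pairs
elements = ["H1", "H2", "S1", "H3"]
pair_rmsd = {("H1", "H2"): 0.4, ("H2", "S1"): 0.6,
             ("S1", "H3"): 3.2, ("H1", "H3"): 3.5}
groups = rigid_groups(elements, pair_rmsd)   # [["H1", "H2", "S1"], ["H3"]]
```

Single linkage at a fixed cutoff can chain together elements that move differently. This is one reason a hierarchy of trees, as in the method above, is more informative.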
BACKGROUND: Leucine-rich repeats (LRRs) are present in proteins with diverse functions. The horseshoe-shaped structure of a ribonuclease inhibitor (RI), with a parallel beta sheet lining the inner circumference of the horseshoe and alpha helices flanking its outer circumference, is the only X-ray structure containing these repeats determined so far. Although the lengths and sequences of the RI repeats differ from those of the most commonly occurring LRRs, it was deemed worthwhile to derive a three-dimensional structural framework for these more typical LRR proteins, using the RI structure as a template. RESULTS: Sequence alignments of 569 LRRs from 68 proteins were obtained by a profile search and used in a comparative sequence analysis to distinguish between residues with a probable structural role and those that seemed essential for function. This knowledge, along with the known atomic structure of RI, was used to model the three-dimensional structure of the most common LRR units. These modeled units were then used to build the three-dimensional structure of the extracellular domain of the thyrotropin receptor (TSHR), a 'typical' LRR protein. CONCLUSIONS: The modeled TSHR structure adopts a non-globular arrangement, similar to that in RI. The beta regions of this typical LRR protein are the same as in the RI structure, whereas the alpha helices are shorter and the conformations of the alpha beta and beta alpha connections are different. As a result of these differences, it was not possible to pack typical LRR units together with repeats such as those found in RI. This mutually exclusive relationship is supported by sequence analysis. The predicted structure of the typical LRRs obtained here can be used to build models for any of the known LRR proteins, and the approach used for the prediction could be applied to other proteins containing internal repeats.
Four peptides corresponding to alpha-helical regions delimited by residues 63-73 and 97-112 of cytochrome c2 (Rhodospirillum) and residues 24-36 and 45-55 of bovine calcium binding protein, predicted to be alpha-helical by a recently developed method [Rooman, M., Kocher, J.P., & Wodak, S.J. (1991) J. Mol. Biol. 221, 961-979], are synthesized by solid phase methods and purified by HPLC, and their solution conformations are determined by NMR and CD. The observed conformational properties of these peptides in solution confirmed the prediction results. In water/TFE (60/40, v/v) at room temperature, these peptides adopt an alpha-helical conformation. This is shown by an extended pattern of strong sequential dNN(i,i+1) NOE cross-peaks, dαN(i,i+1) NOEs of reduced intensity, several medium-range [dαN(i,i+3), dαN(i,i+4), dαβ(i,i+3)] NOE connectivities, small ³JHαN values, and more upfield alpha-proton chemical shifts. CD studies at different TFE concentrations at room temperature provide further evidence of the propensity of these peptides to adopt an alpha-helical conformation in solution, as determined from the ellipticity values at 222 nm and by deconvolution of the CD spectra. Depending on the method used, helicities in the range 34-50% and 55-75% are found for the 63-73 and 97-112 fragments of cytochrome c2, respectively, and in the range 53-80% and 42-65% for fragments 24-36 and 45-55 of calcium binding protein in water/TFE (60/40, v/v) at 298 K. In addition, the experiments and predictions agree for those residues that are more flexible. Finally, the relevance of our results for protein folding pathways is discussed.
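Estimating helicity from the ellipticity at 222 nm is often done with a two-state linear interpolation between a fully helical and a fully unfolded reference value. The sketch below shows that calculation. The reference ellipticities are user-supplied placeholders, not values from this study. The spectral deconvolution also mentioned above is not reproduced.

```python
def helix_fraction(theta_222, theta_helix, theta_coil=0.0):
    """Two-state estimate of fractional helicity from mean residue ellipticity at 222 nm.

    theta_helix, theta_coil: reference ellipticities (deg cm^2 dmol^-1) for fully
    helical and fully unfolded peptide; these must be chosen by the user.
    The result is clipped to [0, 1].
    """
    f = (theta_222 - theta_coil) / (theta_helix - theta_coil)
    return min(1.0, max(0.0, f))

# Placeholder reference value, for illustration only
half = helix_fraction(-15000.0, theta_helix=-30000.0)   # 0.5
```

Because the result depends directly on the chosen reference values, helicities obtained this way are usually reported as ranges, much like the ranges quoted above.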
A systematic survey of seven parallel alpha/beta barrel protein domains, based on exhaustive structural comparisons, reveals that a sizable proportion of the alpha beta loops in these proteins (20 out of a total of 49) belong to one of two loop types previously described by Thornton and co-workers. Six loops are of the alpha beta 1 type, with one residue between the alpha-helix and beta-strand, and 13 are of the alpha beta 3 type, with three residues between the helix and the strand. Protein fragments embedding the identified loops, termed alpha beta connections since they contain parts of the flanking helix and strand, have been analyzed in detail, revealing that each type of connection has a distinct set of conserved structural features. The orientation of the beta-strand relative to the helix and loop portions is different owing to a very localized difference in backbone conformation. In alpha beta 1 connections, the chain enters the beta-strand via a residue adopting an extended conformation, while in alpha beta 3 it does so via a residue in a near alpha-helical conformation. Other conserved structural features include distinct patterns of side chain orientation relative to the beta-sheet surface and of main chain H-bonds in the loop and the beta-strand moieties. Significant differences also occur in packing interactions of conserved hydrophobic residues situated in the last turn of the helix. Yet the alpha-helix surface of both types of connections adopts similar orientations relative to the barrel sheet surface. Our results furthermore suggest that conserved hydrophobic residues along the sequence of the connections may be correlated more with specific patterns of interactions made with neighboring helices and sheet strands than with helix/strand packing within the connection itself. A number of intriguing observations are also made on the distribution of the identified alpha beta 1 and alpha beta 3 loops within the alpha/beta-barrel motifs. 
They often occur adjacent to each other; alpha beta 3 loops invariably involve even-numbered beta-strands, while alpha beta 1 loops preferentially involve odd-numbered beta-strands; and all the analyzed proteins contain at least one alpha beta 3 loop in the first half of the eightfold alpha/beta barrel. Possible origins of all these observations, and their relevance to the stability and folding of parallel alpha/beta barrel motifs, are discussed.
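Classifying alpha beta loops by the number of residues between the helix and the strand can be sketched on a one-letter secondary structure string. The example assumes a string reduced to H (helix), E (strand) and other characters, for instance from a DSSP assignment. The loop length depends on exactly how helix and strand ends are assigned, so real counts may differ from those obtained by structural comparison.

```python
import re

def helix_strand_loops(ss):
    """(start index, length) of each loop that joins a helix (H) to a strand (E)."""
    return [(m.start(1), len(m.group(1)))
            for m in re.finditer(r"H([^HE]+)(?=E)", ss)]

def loop_type(length):
    """Map a loop length onto the two loop types discussed above."""
    return {1: "alpha beta 1", 3: "alpha beta 3"}.get(length, "other")

# Hypothetical assignment: one 1-residue and one 3-residue helix-to-strand loop
ss = "EEEECCHHHHHCEEEECHHHHCCCEEE"
loops = helix_strand_loops(ss)                   # [(11, 1), (21, 3)]
types = [loop_type(n) for _, n in loops]         # ["alpha beta 1", "alpha beta 3"]
```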
Basic design features of the beta-sheet portion in parallel alpha beta barrels in known protein structures are analysed in the context of a model of a regular hyperboloid. A formal description of the relationships between beta-sheet twist, number of strands in the sheet and barrel dimensions is derived, and the underlying physical principles are rationalized. Results suggest that the major constraints on the geometry of the beta-sheet portion of the barrel come from the requirements to have optimal H-bonding interactions between beta-strands and to closely pack amino acid side-chains in the barrel interior so as to exclude bulk water. In addition, we show how the hyperboloid model and the ensuing formalism can serve to derive useful geometric and graphic tools for computer-aided de novo protein design. We then illustrate how these tools are used to determine that the requirement to have a closed regular eight-stranded beta-sheet surface imposes no particular constraints on the geometry (phi, psi angles) of the polypeptide backbone. Understanding the role of the amino acid sequence in determining the observed structures remains a major challenge. Detailed comparisons of known alpha beta-barrel structures (and amino acid sequences) with each other, and with polypeptide fragments from other protein crystal structures, reveal only a limited number of common sequence-structure motifs. These belong to the characteristic alpha beta 1 and alpha beta 3 loop families previously described in alpha beta proteins, and occur at least once in nearly all the alpha beta-barrel structures examined.
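For a feel of how strand number and stagger relate to barrel dimensions, the sketch below uses the widely used cylinder approximation of a beta-barrel rather than the hyperboloid formalism described above. It applies tan α = S·a/(n·b) and R = b/(2 sin(π/n) cos α). Here n is the number of strands, S the shear number, α the strand tilt relative to the barrel axis, and R the barrel radius. The spacings a = 3.3 Å (rise per residue along a strand) and b = 4.4 Å (interstrand distance) are typical values assumed for illustration. With n = 8 and S = 8, values commonly quoted for the alpha/beta barrel sheet, this gives a tilt of about 37° and a radius of about 7.2 Å.

```python
import math

def barrel_geometry(n_strands, shear, a=3.3, b=4.4):
    """Cylinder approximation of a beta-barrel (not the hyperboloid model).

    a: rise per residue along a strand (angstroms); b: interstrand distance (angstroms).
    Returns (strand tilt to the barrel axis in degrees, barrel radius in angstroms).
    """
    alpha = math.atan((shear * a) / (n_strands * b))
    radius = b / (2.0 * math.sin(math.pi / n_strands) * math.cos(alpha))
    return math.degrees(alpha), radius

tilt, radius = barrel_geometry(8, 8)   # about 36.9 degrees and 7.2 angstroms
```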
Artemia has a complex extracellular hemoglobin of Mr 260,000 comprising two globin chains (Mr 130,000), each of which is a polymer of eight covalently linked domains of Mr 16,000. The primary structure of this polymeric globin was studied to understand how globin-fold domains are ordered within a globin chain and, in turn, how these chains associate into a functional hemoglobin molecule. Here we report the amino acid sequence of a second domain, E7 (Mr 16,081, excluding the heme), and interpret the sequence data by computer-assisted alignment and modeling. This analysis clearly shows that, as with domain E1 (Moens, L., Van Hauwaert, M.-L., De Smet, K., Geelen, D., Verpooten, G., Van Beeumen, J., Wodak, S., Alard, P., & Trotman, C. (1988) J. Biol. Chem. 263, 4679-4685), domain E7 is compatible with a globin folded structure of the beta-type chain. Several specific differences of domains E7 and E1 from the classic globins are identified. These can possibly be interpreted in terms of specific requirements for a double octameric functional molecule.
The 8-fold parallel alpha/beta-barrel topology is encountered in proteins that display an impressive variety of functions, suggesting that this topology may be a rather nonspecific and stable folding motif. Consequently, this motif can be considered an interesting framework for designing novel proteins. It has been shown that the shape of the beta-sheet portion of the barrel can be approximated by a hyperboloid. This geometric object may therefore be used as a scaffold to construct an idealized eight-stranded beta-barrel. To facilitate the de novo design of such structures, a collection of modeling tools has been developed. These tools allow secondary structure elements to be mapped onto the scaffold surface, and rotation and translation operations to be performed about user-defined axes while their contribution to the conformational energy of the system is evaluated. The tools have been applied in a systematic study assessing the phi, psi requirements for designing symmetric eight-stranded beta-barrels with optimal hydrogen bonding between adjacent beta-strands. It is observed that: (a) the beta-sheet structure can be closed without introducing irregular stagger between beta-strands, and (b) the region of phi, psi dihedral angle space compatible with the formation of regular symmetric eight-stranded beta-barrels coincides with the phi, psi region corresponding to average beta-strands in known protein structures, suggesting that barrel closure does not impose gross constraints on beta-strand geometry.
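The modeling tools described here rotate a secondary structure element about a user-defined axis. That step can be sketched with Rodrigues' rotation formula. The functions below are an illustration under our own assumptions: coordinates are plain tuples, and a caller-supplied `energy` function stands in for the conformational energy evaluation. They are not the original toolkit.

```python
import math

def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def rotate_about_axis(points, axis_point, axis_dir, angle_deg):
    """Rotate 3D points by angle_deg about the line through axis_point along
    axis_dir, using Rodrigues' rotation formula."""
    k = _unit(axis_dir)
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    rotated = []
    for p in points:
        v = tuple(pi - ai for pi, ai in zip(p, axis_point))  # put the axis through the origin
        kxv = (k[1] * v[2] - k[2] * v[1],
               k[2] * v[0] - k[0] * v[2],
               k[0] * v[1] - k[1] * v[0])
        kdv = sum(ki * vi for ki, vi in zip(k, v))
        r = [v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1 - cos_t) for i in range(3)]
        rotated.append(tuple(r[i] + axis_point[i] for i in range(3)))  # shift back
    return rotated

def translate_along_axis(points, axis_dir, distance):
    """Shift points by `distance` along the normalized axis direction."""
    k = _unit(axis_dir)
    return [tuple(p[i] + distance * k[i] for i in range(3)) for p in points]

def best_rotation(points, axis_point, axis_dir, angles_deg, energy):
    """Scan rotation angles and return (angle, energy) with the lowest energy.
    `energy` is a hypothetical caller-supplied function of the moved coordinates."""
    return min(((a, energy(rotate_about_axis(points, axis_point, axis_dir, a)))
                for a in angles_deg), key=lambda pair: pair[1])
```

A real design tool would rotate a whole strand or helix and score it against the rest of the scaffold. Here the energy function is left entirely to the caller.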
An automatic procedure for defining recurrent folding motifs in proteins of known structure is described. These motifs are formed by short polypeptide fragments of equal size containing between four and seven residues. The method applies a classical clustering algorithm that operates on distances between selected backbone atoms. In one application, we use it to cluster all protein fragments into only four structural classes. This classification is rough considering the observed diversity of local structures, but comparable in homogeneity to the four classes of secondary structure (alpha-helix, beta-strand, turn and coil). Yet, it discriminates between extended and curved coil and distinguishes beta-bulges from beta-strands. In a second application, the clustering procedure is combined with assignment of backbone dihedral angles to allowed regions in the Ramachandran map. This produces an exhaustive repertoire of highly homogeneous families of structural motifs that contains all the beta-hairpins, beta alpha- and alpha beta-loops previously defined by manual procedures, and new structural families of which two examples, a beta alpha-loop and an alpha-helix beginning, are analyzed in detail. The described automatic procedures should be useful in categorizing structure information in proteins, thereby increasing our ability to analyze relations between structure and sequence.
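The abstract does not spell out the clustering details, so the following only sketches one common way to cluster fragments on inter-atomic distances. Each fragment is described by its intra-fragment C-alpha distances, a signature that needs no superposition. Fragments are compared by the RMS difference of these signatures, and average-linkage agglomerative clustering merges them into a chosen number of classes. The choice of C-alpha atoms, the RMS measure and the linkage rule are all assumptions.

```python
import itertools
import math

def distance_signature(ca_coords):
    """Intra-fragment C-alpha distances d(i, j), i < j; invariant to rigid-body moves."""
    return [math.dist(p, q) for p, q in itertools.combinations(ca_coords, 2)]

def dissimilarity(sig1, sig2):
    """RMS difference between two equal-length distance signatures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig1, sig2)) / len(sig1))

def average_linkage(fragments, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise dissimilarity until n_clusters remain."""
    sigs = [distance_signature(f) for f in fragments]
    clusters = [[i] for i in range(len(fragments))]

    def link(c1, c2):
        return sum(dissimilarity(sigs[i], sigs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ab: link(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() yields a < b
    return clusters
```

Setting n_clusters to 4 would mirror the coarse four-class partition described above. The classes obtained depend strongly on fragment length and on the linkage rule.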
Amino acid sequence patterns suggested to characterize specific recurrent turn conformations in proteins are tested for their predictive power in a database containing 75 proteins of known structure. Many of these patterns are found to be associated with local structures that differ from the motifs originally used to derive them. It is therefore concluded that, while they could be useful for improving predictions made by other methods, their stand-alone predictive power is poor. The issue of deriving and validating consensus sequence patterns for use in protein structure prediction is raised.
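Assessing the predictive power of a sequence pattern amounts to scanning a structure database for matches and asking how often the matched segment actually adopts the target conformation. The sketch below computes that precision for a regular-expression pattern. The one-letter structure encoding (for example 'T' for the turn motif) and the toy entries in the test are hypothetical.

```python
import re

def pattern_precision(pattern, entries, target="T"):
    """Scan sequences for a regex pattern and return (hits, precision).

    entries: list of (sequence, structure) strings of equal length, one
    structure letter per residue (hypothetical encoding). A hit is counted
    as correct only if every matched residue carries the `target` letter.
    """
    regex = re.compile(pattern)
    hits = correct = 0
    for seq, ss in entries:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must align")
        for pos in range(len(seq)):  # try every start, so overlapping hits count
            m = regex.match(seq, pos)
            if m and m.end() > m.start():
                hits += 1
                if all(c == target for c in ss[m.start():m.end()]):
                    correct += 1
    return hits, (correct / hits if hits else float("nan"))
```

A low precision over many hits is the quantitative form of the conclusion that such patterns are weak predictors on their own.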
The structure of Xylose isomerase (X.I.) from Actinoplanes missouriensis has been solved to 2.8 Angstroms resolution. Phases were determined from a single Eu3+ derivative and from the noncrystallographic 222 symmetry of the tetrameric molecule. An atomic model was built and subjected to restrained crystallographic refinement. The resulting model is shown to be closely similar to the recently reported X.I. structures from three other bacterial sources. Each monomer is found to be composed of an eight-stranded alpha/beta "T.I.M." barrel forming an N-terminal domain of 328 residues, followed by a large loop of 66 residues embracing an adjacent subunit. Analysis of intersubunit packing shows that the X.I. tetramer is an assembly of two tight dimers. The beta barrel fits a simple hyperboloid model, as other T.I.M. barrels do. The active site, identified as the binding site for the inhibitor xylitol, is located at the carboxyl end of the beta strands in the barrel next to a pair of binding sites for Eu3+ ions, which are assumed to be sites for the divalent ions involved in catalysis. Active sites in the tetramer are oriented towards the interface between dimers. It is suggested that subunit interfaces might stabilize the active site region, which might explain the oligomeric nature of other alpha/beta barrel enzymes.
Eight-stranded beta-sheets in nine protein structures containing "TIM (triose phosphate isomerase) barrels" are shown to be fitted satisfactorily by hyperboloids, the generating lines of which pass through the beta-strands. Simple parameterizations of the hyperboloid model are then used to determine the constraints that govern key parameters, such as the number of strands in the barrel, and to rationalize the remarkable conservation of strand number, observed to be eight, in nearly all the known examples of parallel beta-barrels. It is shown that the requirement to exclude solvent from the barrel interior, while at the same time keeping an upper limit on strand twist and interstrand distance so as to foster extensive hydrogen bonding interactions within the sheet, imposes strong constraints on barrel geometry. A formal description of the relationships between beta-sheet twist, strand number, and barrel dimensions is given here. It could have important implications for studies of protein folding and design.
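Once the barrel axis is known, fitting a hyperboloid to barrel coordinates becomes a linear least-squares problem in the two unknowns A = a² and K = (a/c)². The sketch below assumes the coordinates have already been rotated and translated so that the barrel axis is z and the waist sits at z = 0. A full fit would also refine the axis position and orientation, which is omitted here. The function returns the waist radius and the tilt of the generating lines relative to the axis.

```python
import math

def fit_hyperboloid_z(points):
    """Linear least-squares fit of x^2 + y^2 = A + K z^2 to 3D points.

    Assumes the barrel axis is the z axis and the waist lies at z = 0.
    Returns (waist radius sqrt(A), tilt angle atan(sqrt(K)) in degrees,
    RMS residual of x^2 + y^2, which is in squared length units, not a distance).
    """
    n = len(points)
    u = [z * z for _, _, z in points]          # regressor
    w = [x * x + y * y for x, y, _ in points]  # response
    su, sw = sum(u), sum(w)
    suu = sum(t * t for t in u)
    suw = sum(t * s for t, s in zip(u, w))
    det = n * suu - su * su
    if det == 0:
        raise ValueError("points must be spread along the axis")
    K = (n * suw - su * sw) / det   # slope of the normal equations
    A = (sw - K * su) / n           # intercept
    rms = math.sqrt(sum((wi - A - K * ui) ** 2 for ui, wi in zip(u, w)) / n)
    return math.sqrt(A), math.degrees(math.atan(math.sqrt(max(K, 0.0)))), rms
```

For real C-alpha coordinates the residual would not be zero, and its size gives a rough measure of how "satisfactorily" the hyperboloid fits.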
Using the newly available refined co-ordinates of deoxy and oxyhaemoglobin, we have re-examined and compared the interfaces between the dimers alpha 1 beta 1 and alpha 2 beta 2. The most extensive monomer-monomer contacts are between alpha 1 and beta 2, and, symmetrically, alpha 2 and beta 1. In oxyhaemoglobin these interfaces bury 700 Å² less protein surface than in deoxyhaemoglobin. The alpha 1 alpha 2 interface involves similar salt bridges in both forms, but in oxyhaemoglobin buries 240 Å² more surface than in deoxyhaemoglobin. There is a loosely packed beta 1 beta 2 interface burying 320 Å² of surface in oxyhaemoglobin; there is no beta 1 beta 2 interface in deoxyhaemoglobin. The greater stability of the deoxy form, in the absence of ligands, can be attributed to a combination of hydrophobic, van der Waals' and electrostatic interactions.
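Buried surface at an interface is conventionally taken as the accessible area of the separated parts minus that of the complex. The sketch below uses the Shrake-Rupley dot-counting method, swapped in here for illustration because the abstract does not describe the paper's own surface calculation. It assumes a 1.4 Å probe and atoms given as ((x, y, z), radius) pairs.

```python
import math

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(i * golden), r * math.sin(i * golden), z))
    return pts

def sasa(atoms, probe=1.4, n_points=400):
    """Shrake-Rupley solvent-accessible surface area.
    atoms: list of ((x, y, z), van der Waals radius), lengths in Angstrom."""
    unit = sphere_points(n_points)
    total = 0.0
    for i, (ci, ri) in enumerate(atoms):
        Ri = ri + probe
        # only atoms whose expanded spheres overlap sphere i can occlude it
        near = [(cj, rj + probe) for j, (cj, rj) in enumerate(atoms)
                if j != i and math.dist(ci, cj) < Ri + rj + probe]
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + Ri * ux, ci[1] + Ri * uy, ci[2] + Ri * uz)
            if all(math.dist(p, cj) >= Rj for cj, Rj in near):
                exposed += 1
        total += 4.0 * math.pi * Ri * Ri * exposed / n_points
    return total

def buried_area(part1, part2, probe=1.4, n_points=400):
    """Surface buried in the contact: ASA(part1) + ASA(part2) - ASA(complex)."""
    return (sasa(part1, probe, n_points) + sasa(part2, probe, n_points)
            - sasa(part1 + part2, probe, n_points))
```

Comparing buried_area for the same pair of subunits in two quaternary states gives differences of the kind quoted above. The dot count sets the precision, so the number of points is a trade-off between accuracy and speed.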
We use surface area measurements based on atomic positions to give a quantitative definition of structural domains in proteins. Segments of the polypeptide chain making a minimum of interactions with the rest of the protein structure are identified on interface area scans, where the area B of the interface between an N-terminal segment of i residues and the complementary C-terminal segment is plotted as a function of i. Domain boundaries appear as minima of B in the scans. The procedure may be iterated to build a hierarchy of subdomains. It detects only continuous domains, made of a single stretch of polypeptide chain, but may be extended to detect such domains in the presence of discontinuous ones. Domains defined from interface area scans fit very well with globular structural regions identified by inspection of protein models [Wetlaufer, D. B. (1973) Proc. Natl. Acad. Sci U.S.A. 70, 697-701]. They do not in general correspond to the repeated structural units observed in some proteins by superposition studies. In hemoglobin and hen lysozyme, the domains do not correspond to the coding sequences separated by introns in the genes.
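An interface area scan can be written generically. For each cut position i, compute the accessible area of the N-terminal and C-terminal segments and subtract that of the intact chain. In the sketch below, `area` would normally be a coordinate-based accessible surface calculation. To keep the example self-contained, it is replaced by a hypothetical contact-counting stand-in, `toy_area`. Minima at the chain termini are ignored as a simplifying assumption.

```python
def interface_scan(n_residues, area):
    """B(i) = A(N-segment 0..i-1) + A(C-segment i..n-1) - A(whole chain)
    for every cut position i = 1..n-1. `area` maps a list of residue
    indices to an accessible surface area."""
    whole = area(list(range(n_residues)))
    return {i: area(list(range(i))) + area(list(range(i, n_residues))) - whole
            for i in range(1, n_residues)}

def interior_minima(scan):
    """Cut positions where B is strictly lower than at both neighbouring cuts."""
    cuts = sorted(scan)
    return [cuts[k] for k in range(1, len(cuts) - 1)
            if scan[cuts[k]] < scan[cuts[k - 1]] and scan[cuts[k]] < scan[cuts[k + 1]]]

def toy_area(contacts, per_residue=100.0, buried_per_contact=20.0):
    """Hypothetical stand-in for a coordinate-based ASA calculation: every residue
    exposes `per_residue`, and each internal contact buries `buried_per_contact`."""
    def area(residues):
        s = set(residues)
        return per_residue * len(s) - buried_per_contact * sum(
            1 for i, j in contacts if i in s and j in s)
    return area
```

With two compact halves joined by a single contact, the scan dips at the boundary between them, which is the signature of a domain boundary in this method. Iterating the scan on each half would give the hierarchy of subdomains.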
We propose an analytical substitute for the geometrical construction commonly used to calculate the protein surface area accessible to the solvent. A statistical approach leads to an expression of accessible surface areas as a function of distances between pairs of atoms or of residues in the protein structure, assuming only that these atoms or residues are randomly distributed in space but do not penetrate each other. This function gives good estimates of the accessible surface area and of the area buried in subunit contacts for a number of proteins. Its evaluation is very fast, and the function can be differentiated, which opens the way to new applications of accessibility measurements in the study of proteins. As an example, we show that the presence of domains is easily detected by an automatic procedure based on surface areas only.
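A pairwise approximation of this kind can be sketched as follows. Each atom's expanded sphere (van der Waals radius plus probe) loses the spherical cap covered by each neighbour. The exposed fraction is then taken as the product of the per-neighbour survival fractions, which is what independent random placement of neighbours implies. This sketch does not reproduce the exact functional form or any empirical correction parameters of the published method. The cap-area expression is standard sphere geometry.

```python
import math

def cap_area(Ri, Rj, d):
    """Area of the surface of sphere i (radius Ri) lying inside sphere j
    (radius Rj) whose centre is at distance d."""
    if d >= Ri + Rj:
        return 0.0                      # no overlap
    if d <= abs(Ri - Rj):               # one sphere lies inside the other
        return 4.0 * math.pi * Ri * Ri if Rj > Ri else 0.0
    return math.pi * Ri * (Ri + Rj - d) * (1.0 + (Rj - Ri) / d)

def approx_asa(atoms, probe=1.4):
    """Per-atom accessible area A_i = S_i * prod_j (1 - b_ij / S_i), where
    S_i is the area of the expanded sphere and b_ij the cap buried by atom j.
    atoms: list of ((x, y, z), van der Waals radius)."""
    areas = []
    for i, (ci, ri) in enumerate(atoms):
        Ri = ri + probe
        Si = 4.0 * math.pi * Ri * Ri
        exposed_fraction = 1.0
        for j, (cj, rj) in enumerate(atoms):
            if j != i:
                exposed_fraction *= 1.0 - cap_area(Ri, rj + probe, math.dist(ci, cj)) / Si
        areas.append(Si * exposed_fraction)
    return areas
```

For an isolated pair of atoms the product has a single factor and the result is exact. With many neighbours the caps overlap, and the independence assumption turns the result into an estimate. Because each term is a closed-form function of an interatomic distance, derivatives with respect to the coordinates can be written down analytically.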
We calculate the surface area buried in subunit interfaces of human deoxyhemoglobin and of horse methemoglobin. A larger surface area is buried in deoxy- than in methemoglobin as a result of tertiary and quaternary structure changes. In both molecules the dimer-dimer interface is close-packed. This implies that hydrophobicity stabilizes the deoxy structure: the free energy spent in keeping the subunits in a low-affinity conformation is compensated by hydrophobic free energy due to the smaller surface area accessible to solvent.
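The hydrophobic argument can be made concrete with a linear surface-area model. In this model, the free energy gained on removing surface from solvent is proportional to the area removed. The coefficient used below, about 25 cal/mol per Å², is a commonly quoted order of magnitude and is an assumption here. The abstract gives neither the coefficient nor the areas, so any numbers passed to these functions are hypothetical.

```python
def hydrophobic_free_energy(buried_area_A2, gamma_cal_per_mol_A2=25.0):
    """Favorable free energy (kcal/mol) assumed proportional to the area
    removed from solvent: dG = -gamma * dA. The default gamma is an assumed
    order of magnitude, not a value from this study."""
    return -gamma_cal_per_mol_A2 * buried_area_A2 / 1000.0

def state_preference(buried_a_A2, buried_b_A2, gamma_cal_per_mol_A2=25.0):
    """Hydrophobic contribution to dG(b) - dG(a) in kcal/mol; a negative
    value means the state burying more surface (b) is favored."""
    return (hydrophobic_free_energy(buried_b_A2, gamma_cal_per_mol_A2)
            - hydrophobic_free_energy(buried_a_A2, gamma_cal_per_mol_A2))
```

Under this model, a few hundred Å² of extra buried surface corresponds to several kcal/mol. That is the scale needed to offset the cost of holding subunits in a low-affinity conformation.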
Computerized molecular model building has been used to deduce the arrangement of sickle cell hemoglobin molecules (Hb-S) in the tubular fibers which form within sickling cells and in concentrated cell-free solutions of deoxygenated Hb-S. A "best" solution has been found which satisfies all of the reported properties of these fibers. In the proposed arrangement the contact between adjacent Hb-S molecules in the direction parallel to the fiber axis is primarily hydrophobic and in addition contains two salt bridges between the molecules. This contact would be disrupted by the Glu of Hb-A at the beta6 position instead of the Val of Hb-S, and the arrangement would not yield a long fiber with oxygenated Hb-S. Residues in the A helix and the GH corner of the beta2 chain of one molecule are in contact with residues of the A, B, and E helices and the GH corner of the alpha1 chain of its neighbor. The intermolecular contact in the direction perpendicular to the fiber axis is mainly between the end of the E helix and the EF corner of the beta1 chain on the first molecule and the F helix and FG corner of the alpha2 chain of its neighbor. Some of the implications of these contacts are reported here, and others will be presented in subsequent papers.