linked to PubMed where applicable.
Genetic analysis of a large Indian family with an autosomal dominant cataract phenotype allowed us to identify a novel cataract gene, CRYBA4. After a genomewide screen, linkage analysis identified a maximum LOD score of 3.20 (recombination fraction [theta] 0.001) with marker D22S1167 of the beta-crystallin gene cluster on chromosome 22. Until now, CRYBA4 was the only gene in this cluster not associated with either human or murine cataracts. A pathogenic mutation was identified in exon 4 that segregated with the disease status. The c.317T>C sequence change is predicted to replace the highly conserved hydrophobic amino acid phenylalanine 94 with the hydrophilic amino acid serine. Modeling suggests that this substitution would significantly reduce the intrinsic stability of the crystallin monomer, which would impair its ability to form the association modes critical for lens transparency. Considering that CRYBA4 associates with CRYBB2 and that the latter protein has been implicated in microphthalmia, mutational analysis of CRYBA4 was performed in 32 patients affected with microphthalmia (small eye). We identified a c.242T>C (Leu69Pro) sequence change in exon 4 in one patient, which is predicted here to disrupt the beta-sheet structure in CRYBA4. Protein folding would consequently be impaired, most probably leading to a structure with reduced stability in the mutant. This is the first report linking mutations in CRYBA4 to cataractogenesis and microphthalmia.
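For readers unfamiliar with the LOD score quoted in the abstract above, the following is a minimal sketch of the textbook two-point formula for the simplest case, in which every meiosis is fully informative and phase-known. It is not the pedigree analysis used in the study (real linkage programs also handle unknown phase, penetrance, and allele frequencies); the function name and arguments are illustrative assumptions.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10[L(theta) / L(0.5)] for phase-known, fully informative meioses.

    L(theta) = theta**r * (1 - theta)**(n - r), where r is the number of
    recombinant meioses and n the total number of meioses scored.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Work in log10 space to avoid underflow for large pedigrees.
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)  # free recombination (no linkage)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call; not data from the study.
    print(round(two_point_lod(recombinants=0, meioses=10, theta=0.001), 2))
```

A LOD score of 3 or more is the conventional threshold for declaring linkage, which is why the value of 3.20 reported above is treated as significant.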
Increasingly complex schemes for representing solvent effects in an implicit fashion are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Five implicit solvation models, a surface area-based empirical model, two models that approximate the generalized Born treatment and a finite difference Poisson-Boltzmann method are challenged in situations differing from those where these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select sequences, which stabilize a given protein 3D structure, from a large number of alternatives. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures belonging, respectively, to decoys built with random sequences and to native protein crystal structures. In addition we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. It is concluded that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
An automatic protein design procedure was used to compute amino acid sequences of peptides likely to bind the HLA-A2 major histocompatibility complex (MHC) class I allele. The only information used by the procedure is a structural template, a rotamer library, and a well-established classical empirical force field. The calculations are performed on six different templates from x-ray structures of HLA-A0201-peptide complexes. Each template consists of the bound peptide backbone and the full atomic coordinates of the MHC protein. Sequences within 2 kcal/mol of the minimum energy sequence are computed for each template, and the sequences from all the templates are combined and ranked by their energies. The five lowest energy peptide sequences and five other low energy sequences re-ranked on the basis of their similarity to peptides known to bind the same MHC allele are chemically synthesized and tested for their ability to bind and form stable complexes with the HLA-A2 molecule. The most efficient binders are also tested for inhibition of the T cell receptor recognition of two known CD8(+) T effectors. Results show that all 10 peptides bind the expected MHC protein. The six strongest binders also form stable HLA-A2-peptide complexes, albeit to varying degrees, and three peptides display significant inhibition of CD8(+) T cell recognition. These results are rationalized in light of our knowledge of the three-dimensional structures of the HLA-A2-peptide and HLA-A2-peptide-T cell receptor complexes.
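The selection step described in the abstract above (keep sequences within 2 kcal/mol of each template's minimum, then pool and rank) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary layout, and the choice to keep a sequence once per template in which it qualifies are all hypothetical, and the energies themselves would come from the design calculation, which is not reproduced here.

```python
def pool_low_energy_sequences(per_template, window=2.0):
    """Pool sequences lying within `window` kcal/mol of each template's minimum.

    per_template maps a template name to {sequence: energy}. The result is a
    list of (sequence, energy, template) tuples sorted by increasing energy.
    """
    pooled = []
    for template, energies in per_template.items():
        if not energies:
            continue  # nothing computed for this template
        e_min = min(energies.values())
        for sequence, energy in energies.items():
            # The window is applied relative to this template's own minimum.
            if energy - e_min <= window:
                pooled.append((sequence, energy, template))
    # Combine all templates into one ranking by energy.
    pooled.sort(key=lambda item: item[1])
    return pooled


if __name__ == "__main__":
    # Invented peptide sequences and energies, for demonstration only.
    toy = {
        "template_1": {"LLFGYPVYV": -41.0, "GLFGYPVYV": -39.5, "ALFGYPVAV": -30.0},
        "template_2": {"LLFGYPVYV": -44.0, "KLFGYPVYV": -43.1},
    }
    for sequence, energy, template in pool_low_energy_sequences(toy):
        print(f"{sequence}  {energy:7.2f}  {template}")
```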
This review describes computational procedures for deriving the amino acid sequences that are compatible with a given protein backbone structure. Such procedures can be used to gain insight into the constraints imposed by the 3D structure on the protein sequence, or to design proteins that are likely to adopt a given backbone conformation. We start by presenting a short overview of the various types of approaches to protein design developed over more than a decade. This is followed by a more detailed presentation of a recently developed sequence selection procedure, DESIGNER. This latter presentation illustrates the basic principles underlying this type of procedure, describes what such procedures may teach us when applied to small proteins, and highlights issues that need to be addressed in order to go forward.
A fully automatic procedure for predicting the amino acid sequences compatible with a given target structure is described. It is based on the CHARMM package, and uses an all-atom force-field and rotamer libraries to describe and evaluate side-chain types and conformations. Sequences are ranked by a quantity akin to the free energy of folding, which incorporates hydration effects. Exact (Branch and Bound) and heuristic optimisation procedures are used to identify high-scoring sequences from an astronomical number of possibilities. These sequences include the minimum free energy sequence, as well as all amino acid sequences whose free energy lies within a specified window from the minimum. Several applications of our procedure are illustrated. Prediction of side-chain conformations for a set of ten proteins yields results comparable to those of established side-chain placement programs. Applications to sequence optimisation comprise the re-design of the protein cores of c-Crk SH3 domain, the B1 domain of protein G and Ubiquitin, and of surface residues of the SH3 domain. In all calculations, no restrictions are imposed on the amino acid composition and identical parameter settings are used for core and surface residues. The best scoring sequences for the protein cores are virtually identical to wild-type. They feature no more than one to three mutations in a total of 11-16 variable positions. Tests suggest that this is due to the balance between various contributions in the force-field rather than to overwhelming influence from packing constraints. The effectiveness of our force-field is further supported by the sequence predictions for surface residues of the SH3 domain. More mutations are predicted than in the core, seemingly in order to optimise the network of complementary interactions between polar and charged groups. 
This appears to be an important energetic requirement in the absence of the partner molecules with which the SH3 domain interacts, which were not included in the calculations. Finally, a detailed comparison between the sequences generated by the heuristic and exact optimisation algorithms calls for a note of caution concerning the efficiency of heuristic procedures in exploring sequence space. Copyright 2000 Academic Press.
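The idea of an exact search that returns the minimum-energy sequence together with every sequence inside an energy window can be illustrated with a small branch-and-bound sketch. This is not the CHARMM-based procedure of the abstract above: it assumes a hypothetical energy that decomposes into per-position terms (`self_e`) and pairwise terms (`pair_e`), ignores rotamers, and uses a simple lower bound chosen for clarity rather than efficiency.

```python
def sequences_within_window(choices, self_e, pair_e, window):
    """Return all (sequence, energy) pairs within `window` of the minimum energy.

    choices[i] is an iterable of residue letters allowed at position i.
    self_e(i, a) and pair_e(j, b, i, a) (with j < i) are user-supplied energies.
    """
    n = len(choices)
    # Optimistic contribution of position i, valid whatever precedes it.
    lower = []
    for i in range(n):
        lower.append(min(
            self_e(i, a)
            + sum(min(pair_e(j, b, i, a) for b in choices[j]) for j in range(i))
            for a in choices[i]
        ))
    # suffix[i] bounds the energy still to be added by positions i..n-1.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + lower[i]

    best = float("inf")
    found = []

    def extend(prefix, energy):
        nonlocal best
        i = len(prefix)
        if i == n:
            best = min(best, energy)
            found.append(("".join(prefix), energy))
            return
        for a in choices[i]:
            e = energy + self_e(i, a) + sum(
                pair_e(j, prefix[j], i, a) for j in range(i))
            # Prune branches that cannot reach the current window.
            if e + suffix[i + 1] <= best + window:
                extend(prefix + [a], e)

    extend([], 0.0)
    # `best` only decreases during the search, so filter once more at the end.
    kept = [(s, e) for s, e in found if e <= best + window]
    return sorted(kept, key=lambda item: item[1])
```

Because the pruning threshold only tightens as better sequences are found, no sequence inside the final window is ever discarded, which is the property that distinguishes an exact search from the heuristic procedures the abstract cautions about.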
Aldose-ketose isomerization by xylose isomerase requires bivalent cations such as Mg2+, Mn2+, or Co2+. The active site of the enzyme from Actinoplanes missouriensis contains two metal ions that are involved in substrate binding and in catalyzing a hydride shift between the C1 and C2 substrate atoms. Glu 186 is a conserved residue located near the active site but not in contact with the substrate, and it is not a metal ligand. The E186D and E186Q mutant enzymes were prepared. Both are active, and their metal specificity is different from that of the wild type. The E186Q enzyme is most active with Mn2+ and has a drastically shifted pH optimum. The X-ray analysis of E186Q was performed in the presence of xylose and either Mn2+ or Mg2+. The Mn2+ structure is essentially identical to that of the wild type. In the presence of Mg2+, the carboxylate group of residue Asp 255, which is part of metal site 2 and a metal ligand, turns toward Gln 186 and hydrogen bonds to its side-chain amide. Mg2+ is not bound at metal site 2, explaining the low activity of the mutant with this cation. Movements of Asp 255 also occur in the wild-type enzyme. We propose that they play a role in the O1 to O2 proton relay accompanying the hydride shift.
Site-directed mutagenesis in the active site of xylose isomerase derived from Actinoplanes missouriensis is used to investigate the structural and functional role of specific residues. The mutagenesis work together with the crystallographic studies presented in detail in two accompanying papers adds significantly to the understanding of the catalytic mechanism of this enzyme. Changes caused by introduced mutations emphasize the correlation between substrate specificity and cation preference. Mutations in both His 220 and His 54 mainly affect the catalytic rate constant, with catalysis being severely reduced but not abolished, suggesting that both histidines are important, but not essential, for catalysis. Our results thus challenge the hypothesis that His 54 acts as an obligatory catalytic base for ring opening; this residue appears instead to be implicated in governing the anomeric specificity. With none of the active site histidines acting as a catalytic base, the role of the cations in catalyzing proton transfer is confirmed. In addition, Lys 183 appears to play a crucial part in the isomerization step, by assisting the proton shuttle. Other residues also are important but to a lesser extent. The conserved Lys 294 is indirectly involved in binding the activating cations. Among the active site aromatic residues, the tryptophans (16 and 137) play a role in maintaining the general architecture of the substrate binding site while the role of Phe 26 seems to be purely structural.
The structure and function of the xylose (glucose) isomerase from Actinoplanes missouriensis have been analyzed by X-ray crystallography and site-directed mutagenesis after cloning and overexpression in Escherichia coli. The crystal structure of wild-type enzyme has been refined to an R factor of 15.2% against diffraction data to 2.2-Å resolution. The structures of a number of binary and ternary complexes involving wild-type and mutant enzymes, the divalent cations Mg2+, Co2+, or Mn2+, and either the substrate xylose or substrate analogs have also been determined and refined to comparable R factors. Two metal sites are identified. Metal site 1 is four-coordinated and tetrahedral in the absence of substrate and is six-coordinated and octahedral in its presence; the O2 and O4 atoms of linear inhibitors and substrate bind to metal 1. Metal site 2 is octahedral in all cases; its position changes by 0.7 Å when it binds O1 of the substrate and by more than 1 Å when it also binds O2; these bonds replace bonds to carboxylate ligands from the protein. Side chains involved in metal binding have been substituted by site-directed mutagenesis. The biochemical properties of the mutant enzymes are presented. Together with structural data, they demonstrate that the two metal ions play an essential part in binding substrates, in stabilizing their open form, and in catalyzing hydride transfer between the C1 and C2 positions.
Molecular dynamics simulations have been used to compute the difference in the unfolding free energy between wild-type barnase and the mutant in which Ile-96 is replaced by alanine. The simulations yield results (-3.42 and -5.21 kcal/mol) that compare favorably with experimental values (-3.3 and -4.0 kcal/mol). The major contributions to the free energy difference arise from bonding terms involving degrees of freedom of the mutated side chain and from nonbonded interactions of that side chain with its environment in the folded protein. By comparison with simulations of an extended peptide in the absence of solvent, used as a reference state, hydration effects are shown to play a minor role in the overall free energy balance for the Ile→Ala transformation. The implications of these results for our understanding of the hydrophobic effect and its contribution to protein stability are discussed.
A system is described that provides ways of integrating data on protein structure, sequence, and survey results, with molecular graphics and molecular mechanics software. Its major component is the relational database SESAM, presently implemented under the commercial package SYBASE. By design, the database allows full integration--within the same data organization--of raw data on protein structure, sequence, ligands, and heterogroups, obtained from the Brookhaven Protein Databank, with pure sequence information available from other databanks such as SWISS-PROT. It contains in addition higher level descriptions of structural and topological properties, as well as survey results, obtained by executing specialized computer programs. Aside from the very useful attribute of closely combining structural and nonstructural information, other important features distinguish it from analogous systems developed elsewhere. It includes a molecular dictionary with complete description of geometric properties and energy parameters used in modeling and conformational energy calculations. Using this dictionary, structural data are validated by checking for localized inconsistencies in atomic coordinates, atomic symbols, chirality definitions, and flagging errors and incomplete entries. Because of both the dictionary and the validation procedures, SESAM can be readily interfaced with conventional molecular graphics and mechanics software packages, or with other specialized application programs. With the aid of appropriate interfaces, data access is sufficiently fast for SESAM to be interrogated interactively. Prototypes of user interfaces, as well as an interface with the molecular graphics package BRUGEL, are described and the power of the system is illustrated in applications such as homology-based protein modeling, computer-aided protein design, protein structure predictions, analysis of local structure motifs, and of relationships between protein sequence and structure.
The 8-fold parallel alpha/beta-barrel topology is encountered in proteins that display an impressive variety of functions, suggesting that this topology may be a rather nonspecific and stable folding motif. Consequently, this motif can be considered as an interesting framework to design novel proteins. It has been shown that the shape of the beta-sheet portion of the barrel can be approximated by a hyperboloid. This geometric object may therefore be used as a scaffold to construct an idealized eight-stranded beta-barrel. To facilitate the de novo design of such structures, a collection of modeling tools has been developed allowing secondary structure elements to be mapped onto the scaffold surface and rotation and translation operations to be performed about user defined axes while evaluating their contribution to the conformational energy of the system. These tools have been applied in a systematic study assessing the phi, psi requirements to design symmetric eight stranded beta barrels with optimal hydrogen bonding between adjacent beta-strands. It is observed that: (a) the beta-sheet structure can be closed without introducing irregular stagger between beta-strands and (b) the region of phi, psi dihedral angle space compatible with the formation of regular symmetric eight stranded beta-barrels coincides with the phi, psi region corresponding to average beta-strands in known protein structures, suggesting that barrel closure does not impose gross constraints on beta-strand geometry.
The relation between amino acid sequence and local structure in proteins is investigated. The local structures considered are either the four classes of secondary structure (H, E, T and C) or four classes of local conformations defined using measures of conformational similarity based on distances between C alpha atoms. The classes are obtained by applying an automatic clustering procedure to short polypeptide fragments of uniform length from a database of 75 known protein structures. The thrust of our investigation consists of systematically searching the database for simple amino acid patterns of the type Gly-X-Ala-X-X-Val, where X denotes an arbitrary residue. Patterns that are nearly always associated with the same structure are retained. Finding many such associations, we then evaluate by a statistical approach how many among them are non-random and compare the results for different definitions of local structure. A similar comparison is made for the predictive value of retained associations, which is assessed using an internal test based on dividing the database into "learning" and "test" subsets. While we find that local structures defined by conformational similarity are not superior to secondary structure for prediction purposes, they help us gain insight into the factors that influence the predictive value of derived associations. A major conclusion is that the number of retained associations is in large excess over the number expected from a random correlation between sequence and structure, irrespective of how local conformation is defined. However, only a very small number of these associations can be earmarked as reliable using statistical criteria, due to the limited size of the database. We find, for instance, that the pattern Ala-Ala-X-X-Lys reliably characterizes helix, and the pattern Val-X-Val-X-X-X-Ala reliably characterizes extended structure and beta-strand. 
The possibility is discussed that these and other reliable associations correspond to regions of the polypeptide chain whose conformations are locally determined and that these regions may play a role in folding.
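The pattern search described in the abstract above can be sketched as a scan of sequences for gapped patterns such as Ala-Ala-X-X-Lys, followed by a tally of the local structure class seen at each occurrence. This is a minimal illustration, not the published procedure: the input layout (one-letter sequence paired with an equally long string of H/E/T/C labels) and the rule of labelling an occurrence by the most frequent class in its window are assumptions made here.

```python
import re
from collections import Counter

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def pattern_associations(pattern, entries):
    """Tally the dominant structure class for each occurrence of `pattern`.

    pattern: e.g. "Gly-X-Ala-X-X-Val", where X matches any residue.
    entries: iterable of (sequence, structure) strings of equal length.
    """
    tokens = pattern.split("-")
    body = "".join("." if t == "X" else THREE_TO_ONE[t] for t in tokens)
    # A lookahead lets overlapping occurrences be reported as well.
    regex = re.compile("(?=(" + body + "))")
    tally = Counter()
    for sequence, structure in entries:
        if len(sequence) != len(structure):
            raise ValueError("sequence and structure must have equal length")
        for match in regex.finditer(sequence):
            window = structure[match.start():match.start() + len(tokens)]
            # Assumption: label the occurrence by its most frequent class.
            tally[Counter(window).most_common(1)[0][0]] += 1
    return tally


if __name__ == "__main__":
    # Invented sequence and labels, for demonstration only.
    toy = [("AAGGKAAPPK", "HHHHHHHCCC")]
    print(pattern_associations("Ala-Ala-X-X-Lys", toy))
```

A pattern would then be retained when one class accounts for nearly all of its occurrences; judging whether such an association is non-random requires the statistical treatment and the learning/test split described in the abstract.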
The perturbations of the conformation of human deoxyhemoglobin induced by the covalent attachment of glutathione at cysteine beta 93 have been investigated by computer simulation in conjunction with molecular graphics. In the first phase of the analysis, a systematic search was carried out of the conformational space of glutathione attached to deoxyhemoglobin. In this search, the conformation of the hemoglobin molecule was held constant, while the relative energies of a series of 186,624 glutathione conformations involving systematic variation of six dihedral angles were calculated. From this search, the most favorable conformation was selected as the starting conformation for energy minimization of the glutathionyl hemoglobin molecule as a function of all Cartesian coordinates. In order to provide a reference state, an independent minimization by the same procedures was carried out for deoxyhemoglobin in the absence of glutathione. Comparison of the minimized structures with and without glutathione attached revealed a number of significant differences. The most conspicuous difference in the protein moiety concerned the salt bridge between aspartate beta 94 and histidine beta 146 which is destabilized upon minimization of the glutathionyl-hemoglobin complex due to interactions of the aspartate residue with the glycyl NH group of glutathione. Other observed differences in the minimized structures are located at the alpha 1-beta 2 interface and include displacement of the carboxyl group of aspartate beta 99. In the minimized complex, the glutathione portion assumes a quasi-cyclic conformation stabilized through interactions between the free (gamma-glutamyl) amino and (glycyl) carboxyl ends of the tripeptide and between this carboxyl end and the epsilon amino group of lysine alpha 40. In a parallel conformational study of glutathione alone, a similar structure was found as the lowest energy form. 
These quasi-cyclic conformations contrast with the extended structures reported by Wright (Wright, W.B. (1955) Acta Crystallogr. 11, 632-642) for crystals of glutathione where interactions between molecules play a major role. The conclusions of our analysis are in agreement with the experimental investigations reported in the two preceding papers and permit, moreover, a coherent interpretation of the observed functional and structural changes in deoxyhemoglobin induced by glutathione.
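The systematic search mentioned in the abstract above (every combination of six dihedral angles evaluated with the protein held fixed) amounts to an exhaustive grid search. The sketch below shows the general technique only: the abstract does not state the step sizes used, so the grid, the function names, and the toy energy are hypothetical, and a real calculation would evaluate a molecular mechanics energy for each conformation.

```python
import itertools
import math


def grid_search(steps_per_angle, energy):
    """Exhaustively evaluate `energy` on a regular grid of dihedral angles.

    steps_per_angle[i] is the number of evenly spaced values (in degrees,
    starting at -180) sampled for dihedral i. Returns the number of
    conformations examined, the best set of angles, and its energy.
    """
    grids = [
        [360.0 * k / n - 180.0 for k in range(n)]
        for n in steps_per_angle
    ]
    n_conformations = math.prod(len(g) for g in grids)
    # itertools.product enumerates every combination lazily.
    best = min(itertools.product(*grids), key=energy)
    return n_conformations, best, energy(best)


def toy_energy(angles):
    """Hypothetical threefold torsional term, standing in for a force field."""
    return sum(1.0 + math.cos(math.radians(3.0 * a)) for a in angles)


if __name__ == "__main__":
    # Three angles at 30-degree steps; an illustrative grid, not the published one.
    count, angles, value = grid_search([12, 12, 12], toy_energy)
    print(count, angles, round(value, 6))
```

The number of conformations is the product of the grid sizes, which is why such searches are limited to a handful of rotatable bonds and are followed by energy minimization from the best grid point, as described in the abstract.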