Reliable information on the physical and functional interactions between gene products is an important prerequisite for deriving meaningful system-level descriptions of cellular processes. The available information about protein interactions in Saccharomyces cerevisiae has recently been vastly increased by two comprehensive tandem affinity purification/mass spectrometry (TAP/MS) studies. However, because they used somewhat different approaches, these studies produced diverging descriptions of the yeast interactome, clearly illustrating that converting purification data into accurate sets of protein-protein interactions and complexes remains a major challenge. Here, we review the major analytical steps involved in this process, with special focus on the task of deriving complexes from the network of binary interactions. Applying the Markov Cluster procedure to an alternative yeast interaction network, recently derived by combining the data from the two latest TAP/MS studies, we produce a new description of yeast protein complexes. Several objective criteria suggest that this new description is more accurate and meaningful than those previously published. The same criteria are also used to gauge the influence that different methods for deriving binary interactions and complexes may have on the results. Lastly, we show that employing identical procedures to process the latest purification data sets significantly improves the convergence between the resulting interactome descriptions.
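To make the clustering step concrete, here is a minimal pure-Python sketch of the textbook Markov Cluster (MCL) iteration: expansion by matrix powering, then inflation by elementwise powering and column renormalization, repeated until the matrix stops changing. It assumes an unweighted, symmetric adjacency matrix given as nested lists. The function names and the parameter values (expansion 2, inflation 2.0, self-loop weight 1.0, the 1e-6 attractor threshold) are illustrative assumptions, not the settings used in the study.

```python
from typing import FrozenSet, List, Sequence

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency: Sequence[Sequence[float]], expansion: int = 2,
        inflation: float = 2.0, self_loop: float = 1.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[FrozenSet[int]]:
    """Return clusters of node indices found by the Markov Cluster iteration."""
    n = len(adjacency)
    # Add self-loops, then turn the graph into a random-walk (column-stochastic) matrix.
    m = normalize_columns([[float(adjacency[i][j]) + (self_loop if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: raise the matrix to the given power (longer random walks).
        expanded = m
        for _ in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: raise entries to a power and renormalize, boosting strong flows.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow (an attractor) defines one cluster.
    clusters: List[FrozenSet[int]] = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On a toy network of two triangles joined by a single edge, the flow across the bridge shrinks at every inflation step, so the sketch returns the two triangles as separate clusters. Inflation is the main knob: higher values give smaller, tighter clusters.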
Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
A comprehensive study is performed on the condition-dependent expression of genes coding for the components of hand-curated multi-protein complexes of the yeast Saccharomyces cerevisiae, in order to identify coherent transcriptional modules within these complexes. Such modules are defined as groups of genes within complexes whose expression profiles under a common set of experimental conditions allow us to discriminate them from random sets of genes. Our analysis reveals that complexes such as the cytoplasmic ribosome, the proteasome and the respiratory chain complexes, previously characterized as "stable" or "permanent", represent transcriptional modules that are coherently up- or down-regulated in many different conditions. Overall, however, some level of coherent expression is detected in only 71 of the 113 complexes with at least five different protein components that could be reliably analyzed. Of these, 26 behave as coherently expressed transcriptional modules encompassing all the components of the complex. In another 15, at least half of the components make up such modules, and in ten, few or no modules are detected. In the remaining 20 complexes, coherent expression is detected, but in too few conditions to enable reliable module detection. Interestingly, the transcriptional modules, when detected, often correspond to one or more known sub-complexes with specific functions. Furthermore, detected modules are generally consistent with transcriptional modules identified on the basis of predicted cis-regulatory sequence motifs. Also, groups of genes shared between complexes that carry out related functions tend to be part of overlapping transcriptional modules identified in these complexes. Together, these findings suggest that transcriptional modules may represent basic functional and evolutionary building blocks of protein complexes.
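The abstract defines a module by whether its expression profiles can be told apart from random gene sets, but does not state the statistic. The sketch below assumes one common choice: the mean pairwise Pearson correlation of the candidate genes across conditions, compared against random gene sets of the same size drawn from all profiled genes, giving an empirical p-value. The function names and the number of random draws are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_pairwise_correlation(profiles: Dict[str, Sequence[float]],
                              genes: Sequence[str]) -> float:
    """Average correlation over all gene pairs: a simple coherence score."""
    if len(genes) < 2:
        raise ValueError("need at least two genes")
    pairs = list(combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def coherence_pvalue(profiles: Dict[str, Sequence[float]], module: Sequence[str],
                     n_random: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare a module's coherence with random gene sets of the same size."""
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(profiles, module)
    universe = sorted(profiles)
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(universe, len(module))
        if mean_pairwise_correlation(profiles, sample) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)
```

A module is then called coherent when its p-value falls below a chosen threshold; running the test condition by condition, or on subsets of a complex, is one way to separate whole-complex modules from partial ones.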
The Comprehensive Yeast Genome Database (CYGD) is a data resource on the cellular functions of the yeast Saccharomyces cerevisiae, the best-understood eukaryotic model organism, and of related species. The database serves as a common resource generated by a European consortium and goes beyond the provision of sequence information and functional annotations on individual genes and proteins. In addition, it provides information on the physical and functional interactions among proteins and other genetic elements. These cellular networks include metabolic and regulatory pathways, signal transduction and transport processes, as well as co-regulated gene clusters. As more yeast genomes are published, using S. cerevisiae as a reference greatly facilitates their annotation. CYGD supports the exploration of related genomes, using the S. cerevisiae genome as a backbone together with SIMAP, the Similarity Matrix of Proteins. The resource is available at http://mips.gsf.de/genre/proj/yeast/.
MOTIVATION: Several pattern discovery methods have been proposed to detect over-represented motifs in the upstream sequences of co-regulated genes; they are used, for example, to predict cis-acting elements from clusters of co-expressed genes. The clusters to be analyzed are often noisy, containing a mixture of co-regulated and non-co-regulated genes. We propose a method to discriminate co-regulated from non-co-regulated genes on the basis of counts of pattern occurrences in their non-coding sequences. METHODS: String-based pattern discovery is combined with discriminant analysis to classify genes on the basis of putative regulatory motifs. RESULTS: The approach is evaluated by comparing the significance of patterns detected in annotated regulons (positive control), random gene selections (negative control) and high-throughput regulons (noisy data) from the yeast Saccharomyces cerevisiae. The classification is evaluated on the annotated regulons, and its robustness and rejection power are assessed with mixtures of co-regulated and random genes.
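The two stages named in METHODS can be sketched as follows, under stated assumptions. Pattern discovery is reduced to counting overlapping occurrences of a word (oligonucleotide) on one strand and scoring over-representation with a binomial upper tail, using the word's frequency in background sequences (with a pseudocount) as the expected probability. Discriminant analysis is replaced by a simplified diagonal-covariance linear discriminant with equal priors. All function names are hypothetical, and both strands, overlapping matches and background models would need more care in a real analysis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def count_word(seq: str, word: str) -> int:
    """Count overlapping occurrences of word in seq (one strand only)."""
    w = len(word)
    return sum(1 for i in range(len(seq) - w + 1) if seq[i:i + w] == word)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1)
                          - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def word_significance(cluster: Sequence[str], background: Sequence[str], word: str) -> float:
    """P-value of observing at least the cluster's count of word, given background frequency."""
    w = len(word)
    bg_positions = sum(max(len(s) - w + 1, 0) for s in background)
    bg_count = sum(count_word(s, word) for s in background)
    p = (bg_count + 1) / (bg_positions + 1)  # pseudocount keeps p > 0
    n = sum(max(len(s) - w + 1, 0) for s in cluster)
    k = sum(count_word(s, word) for s in cluster)
    return binomial_upper_tail(k, n, p)


@dataclass
class DiagonalLDA:
    mu_pos: List[float]
    mu_neg: List[float]
    var: List[float]


def fit_diagonal_lda(pos: Sequence[Sequence[float]], neg: Sequence[Sequence[float]]) -> DiagonalLDA:
    """Class means plus pooled per-feature variance (features = motif counts per gene)."""
    d = len(pos[0])
    mu_pos = [sum(r[j] for r in pos) / len(pos) for j in range(d)]
    mu_neg = [sum(r[j] for r in neg) / len(neg) for j in range(d)]
    dof = len(pos) + len(neg) - 2
    var = [(sum((r[j] - mu_pos[j]) ** 2 for r in pos)
            + sum((r[j] - mu_neg[j]) ** 2 for r in neg)) / dof + 1e-9 for j in range(d)]
    return DiagonalLDA(mu_pos, mu_neg, var)


def classify(model: DiagonalLDA, x: Sequence[float]) -> str:
    """Assign x to the class whose mean is closer in variance-scaled distance."""
    def score(mu: Sequence[float]) -> float:
        return -sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mu, model.var))
    return "co-regulated" if score(model.mu_pos) >= score(model.mu_neg) else "other"
```

In use, significant words found in a cluster supply the features (their counts per upstream sequence), and the discriminant then decides, gene by gene, whether each member looks co-regulated or is likely noise.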
BACKGROUND: Multiprotein complexes play an essential role in many cellular processes, but our knowledge of the mechanisms of their formation, regulation and lifetimes is very limited. We investigated transcriptional regulation of protein complexes in yeast using two approaches. First, known regulons, manually curated or identified by genome-wide screens, were mapped onto the components of multiprotein complexes. The complexes comprised manually curated ones and those characterized by high-throughput analyses. Second, putative regulatory sequence motifs were identified in the upstream regions of the genes involved in individual complexes, and regulons were predicted on the basis of these motifs. RESULTS: Only a very small fraction of the analyzed complexes (5-6%) have subsets of their components mapping onto known regulons. Likewise, regulatory motifs are detected in only about 8-15% of the complexes, and in those, about half of the components are on average part of predicted regulons. In the manually curated complexes, the so-called 'permanent' assemblies have a larger fraction of their components belonging to putative regulons than 'transient' complexes. For the noisier set of complexes identified by high-throughput screens, the analysis yields valuable insights into the function and regulation of individual genes. CONCLUSIONS: A small fraction of the known multiprotein complexes in yeast seems to have at least a subset of their components co-regulated at the transcriptional level. Preliminary analysis of the regulatory motifs for these components suggests that the corresponding genes are likely to be co-regulated either together or in smaller subgroups, indicating that transcriptionally regulated modules might exist within complexes.
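The first approach, mapping regulons onto complex components, amounts to set intersections. The sketch below assumes complexes and regulons are given as sets of gene names, and that a complex "maps" onto a regulon when at least `min_shared` of its components are regulon targets; the threshold of two, the function names and the gene names in the example are hypothetical, and a real analysis would also test whether each overlap exceeds chance (for instance with a hypergeometric test).

```python
from typing import Dict, Set, Tuple


def map_regulons_to_complexes(
    complexes: Dict[str, Set[str]],
    regulons: Dict[str, Set[str]],
    min_shared: int = 2,
) -> Dict[str, Tuple[float, Dict[str, Set[str]]]]:
    """For each complex, return the fraction of components covered by regulons
    and the regulon -> shared-components map (only complexes with a hit)."""
    result: Dict[str, Tuple[float, Dict[str, Set[str]]]] = {}
    for cname, members in complexes.items():
        hits: Dict[str, Set[str]] = {}
        for rname, targets in regulons.items():
            shared = members & targets
            if len(shared) >= min_shared:
                hits[rname] = shared
        if hits:
            covered = set().union(*hits.values())
            result[cname] = (len(covered) / len(members), hits)
    return result


def fraction_of_complexes_mapped(complexes: Dict[str, Set[str]],
                                 regulons: Dict[str, Set[str]],
                                 min_shared: int = 2) -> float:
    """Share of complexes with at least one component subset in some regulon."""
    mapped = map_regulons_to_complexes(complexes, regulons, min_shared)
    return len(mapped) / len(complexes) if complexes else 0.0
```

With complexes `{"C1": {"g1", "g2", "g3", "g4"}, "C2": {"g5", "g6"}}` and regulons `{"R1": {"g1", "g2", "g9"}, "R2": {"g5"}}`, only C1 maps (half of its components fall in R1), so the mapped fraction is 0.5; the same function applies unchanged to regulons predicted from motifs.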
The aMAZE LightBench (http://www.amaze.ulb.ac.be/) is a web interface to the aMAZE relational database, which contains information on gene expression, catalysed chemical reactions, regulatory interactions and protein assembly, as well as on metabolic and signal transduction pathways. It allows the user to browse the information intuitively, in a way that also reflects the underlying data model. Moreover, links are provided to literature references and, whenever appropriate, to external databases.
This paper describes how biological function can be represented in terms of molecular activities and processes. It presents several key features of a data model that is based on a conceptual description of the network of interactions between molecular entities within the cell and between cells. This model is implemented in the aMAZE database that presently deals with information on metabolic pathways, gene regulation, sub- or supracellular locations, and transport. It is shown that this model constitutes a useful generalisation of data representations currently implemented in metabolic pathway databases, and that it can furthermore include multiple schemes for categorising and classifying molecular entities, activities, processes and localisations. In particular, we highlight the flexibility offered by our system in representing multiple molecular activities and their control, in viewing biological function at different levels of resolution and in updating this view as our knowledge evolves.
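The abstract describes the data model only conceptually. As a hypothetical illustration of the ideas it names (molecular entities, activities and their control, multiple classification schemes, and processes viewed at different levels of resolution), the sketch below uses Python dataclasses. All class names, fields and methods (`Entity`, `Activity`, `Control`, `Process`, `net_conversion`) are assumptions for illustration, not the aMAZE schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A molecular entity (compound, gene, protein, complex)."""
    name: str
    # Hypothetical (scheme, category) pairs so several classification schemes coexist.
    categories: Tuple[Tuple[str, str], ...] = ()


@dataclass
class Activity:
    """One molecular activity, e.g. a catalysed reaction, a transport or gene expression."""
    name: str
    kind: str
    inputs: List[Entity] = field(default_factory=list)
    outputs: List[Entity] = field(default_factory=list)
    location: str = "unspecified"


@dataclass
class Control:
    """An entity modulating an activity (e.g. 'activation' or 'inhibition')."""
    controller: Entity
    target: Activity
    effect: str


@dataclass
class Process:
    """Groups activities and sub-processes, so it can be viewed at several resolutions."""
    name: str
    steps: List[Union[Activity, Process]] = field(default_factory=list)
    controls: List[Control] = field(default_factory=list)

    def activities(self) -> List[Activity]:
        """Finest resolution: all activities, expanding nested sub-processes."""
        result: List[Activity] = []
        for step in self.steps:
            if isinstance(step, Process):
                result.extend(step.activities())
            else:
                result.append(step)
        return result

    def net_conversion(self) -> Tuple[Set[str], Set[str]]:
        """Coarse resolution: entities consumed but not produced inside, and vice versa."""
        consumed = {e.name for a in self.activities() for e in a.inputs}
        produced = {e.name for a in self.activities() for e in a.outputs}
        return consumed - produced, produced - consumed
```

Because a `Process` may contain other processes, the same data can be read either as a detailed list of activities or collapsed into a single net conversion; for two chained steps A to B and B to C, `net_conversion()` reports A as the net input and C as the net output.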