Proteins tend to associate with one another, often forming large edifices that act as complex molecular machines. Although this has long been recognized, the prevalence of such interactions and complexes in living cells became apparent only in recent years, thanks to technological advances enabling large-scale studies of protein-protein interactions and complexes in organisms such as yeast, bacteria, worm, fly and, more recently, humans. In parallel, methods have been developed for inferring physical and functional interactions from information on genome and protein sequences, as well as from structural information stored in the Protein Data Bank. These various efforts have produced data typically describing thousands of interactions that can be grouped into hundreds of protein complexes. Processing and analyzing these data and extracting biologically meaningful information from them is an exciting but challenging undertaking. The data tend to be noisy and their coverage is still far from comprehensive, as many biologically important interactions (for instance, transient interactions and those involving membrane proteins) are not readily detected by current methods.
Our laboratory has been involved in analyzing genome-wide protein-protein interaction data for several years. Together with colleagues and former students in Brussels (J. van Helden, N. Simonis, D. Gonze), we investigated the transcriptional regulation of protein complexes in the yeast S. cerevisiae. We used computational approaches to identify regulatory motifs in proteins belonging to the same complex and analyzed the overlap between genes coding for components of protein complexes and groups of genes known to be co-regulated. In addition, we investigated the extent to which components of protein complexes are coherently expressed, as judged by their mRNA expression profiles.
More recently, we collaborated with the teams of Jack Greenblatt and Andrew Emili at the University of Toronto, analyzing one of the latest comprehensive datasets on protein-protein interactions in yeast, which they produced. Our main role was in generating meaningful protein complexes from the binary interaction data, and in developing the necessary software and methodology for analyzing and validating these complexes. In another study, we investigated the influence that the computational methodology can have on the final description of protein complexes derived from high-throughput TAP-tag/MS data. Currently, we are examining ways of improving the ranking of interactions derived from the TAP-tag/MS data so that fewer true interactions are discarded, while maintaining a low false-positive rate. This is achieved by using various machine learning techniques to incorporate additional biological evidence into the calculation of the confidence score that is associated with each interaction in the dataset.
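The trade-off described above, between keeping true interactions and limiting false positives, can be illustrated with a minimal Python sketch. It is not the method used in the studies mentioned here. The protein names, the confidence scores, the reference sets and the function names (pair, evaluate_threshold) are all hypothetical, chosen only to show how a score cutoff could be assessed against a set of known true and known false interactions.

```python
from typing import Dict, FrozenSet, Set, Tuple

# An interaction is an unordered pair of protein identifiers.
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Build an unordered pair so that (a, b) and (b, a) are the same key."""
    return frozenset((a, b))


def evaluate_threshold(
    scores: Dict[Pair, float],
    positives: Set[Pair],
    negatives: Set[Pair],
    threshold: float,
) -> Tuple[float, float]:
    """Return (fraction of reference true interactions kept,
    fraction of reference false interactions kept) for a score cutoff."""
    kept = {p for p, s in scores.items() if s >= threshold}
    true_kept = len(kept & positives)
    false_kept = len(kept & negatives)
    recall = true_kept / len(positives) if positives else 0.0
    false_positive_rate = false_kept / len(negatives) if negatives else 0.0
    return recall, false_positive_rate


# Hypothetical confidence scores for five interactions (illustrative only).
scores = {
    pair("A", "B"): 0.9,
    pair("A", "C"): 0.8,
    pair("B", "D"): 0.4,
    pair("C", "D"): 0.3,
    pair("E", "F"): 0.2,
}
# Hypothetical reference sets of known true and known false interactions.
positives = {pair("A", "B"), pair("A", "C"), pair("C", "D")}
negatives = {pair("B", "D"), pair("E", "F")}

for cutoff in (0.5, 0.3):
    recall, fpr = evaluate_threshold(scores, positives, negatives, cutoff)
    print(f"cutoff={cutoff}: true kept={recall:.2f}, false kept={fpr:.2f}")
```

In this toy example, lowering the cutoff from 0.5 to 0.3 recovers the third true interaction but also admits one of the two false ones. A better ranking would place the true interaction C-D above the false interaction B-D, so that it could be kept without that cost.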
As part of this work, we are developing software for visualizing and analyzing protein interaction networks and complexes. Our software tools are developed as plug-ins to the Cytoscape platform. See, for example, GenePro, which allows flexible display and analysis of genome-scale protein-protein interaction networks and complexes. Much of the supplementary material of the Nature paper of Krogan et al. was presented using a web-based version of GenePro, which allowed readers to examine the network and complexes interactively. This was the first time that such interactive software was used in this context.
For further details see: