from C. elegans, D. melanogaster, H. sapiens, M. musculus, and Strongylocentrotus purpuratus (purple sea urchin). This dataset includes ~300 complexes composed entirely of ancient proteins (evolutionarily conserved from lower-order organisms), and ~500 complexes composed largely of ancient proteins conserved ubiquitously among eukaryotes. Drew et al. [2017] present a comprehensive catalog of >4,600 computationally predicted human protein complexes covering >7,700 proteins and >56,000 interactions, obtained by analyzing data from >9,000 published mass spectrometry experiments. Vinayagam et al. [2013] present COMPLEAT (http://www.flyrnai.org/compleat/), a database of 3,077, 3,636, and 2,173 literature-curated protein complexes from D. melanogaster, H. sapiens, and S. cerevisiae, respectively. Ori et al. [2016] combined mammalian complexes from CORUM and COMPLEAT to generate a dataset of 279 protein complexes from mammals.
Table 1.3 Publicly available databases for protein complexes a
a. No. of complexes as of 2016.
b. COMPLEAT includes protein complexes from D. melanogaster, H. sapiens, and S. cerevisiae. The EMBL-EBI portal includes protein complexes from 18 different species, among them C. elegans (16 complexes), H. sapiens (441), M. musculus (404), S. cerevisiae (399), and S. pombe (16). CORUM includes mammalian protein complexes, mainly from H. sapiens (64%), M. musculus (house mouse) (15%), and R. norvegicus (Norwegian rat) (12%).
c. Includes mainly complexes conserved among the metazoans C. elegans, D. melanogaster, H. sapiens, M. musculus, and Strongylocentrotus purpuratus (purple sea urchin), consisting of 344 complexes composed entirely of ancient proteins and 490 complexes composed largely of ancient proteins conserved ubiquitously among eukaryotes.
1.3 Organization of the Rest of the Book
The rest of this book is organized as follows. Chapter 2 discusses important concepts underlying PPI networks and presents prerequisites for understanding subsequent chapters. We discuss different high-throughput experimental techniques employed to infer PPIs (including the Y2H and AP/MS techniques mentioned earlier), explaining briefly the biological and biochemical concepts underlying these techniques and highlighting their strengths and weaknesses. We explain computational approaches that denoise (PPI weighting) and integrate data from multiple experiments to construct reliable PPI networks. We also discuss topological properties of PPI networks, theoretical models for PPI networks, and the various databases and software tools that catalog and visualize PPI networks. Chapter 3 forms the crux of this book, as it introduces and discusses in depth the algorithmic underpinnings of some of the classical (seminal) computational methods to identify protein complexes from PPI networks. While some of these methods work solely on the topology of the PPI network, others incorporate additional biological information (e.g., in the form of functional annotations) with PPI network topology to improve their predictions. Chapter 4 presents a comprehensive empirical evaluation of six widely used protein complex prediction methods available in the literature, using unweighted and weighted PPI networks from yeast and human. Taking a known human protein complex as an example, we discuss how the methods fare in recovering this complex from the PPI network. Based on this evaluation, we explain in Chapter 5 the shortcomings of current methods in detecting certain kinds of protein complexes, e.g., protein complexes that are sparse or that overlap with other complexes. Through this, we highlight the open challenges that need to be tackled to improve the coverage and accuracy of protein complex prediction.
We discuss some recently proposed methods that attempt to tackle these open challenges and to what extent these methods have been successful. Chapter 6 is dedicated to an important class of protein complexes that are dynamic in their protein composition and assembly. While some of these protein complexes are temporal in nature (i.e., they assemble at a specific timepoint and dissociate after that), others are structurally variable (e.g., they change their 3D structure and/or composition) based on the cellular context. Clearly, it is not possible to detect dynamic complexes solely by analyzing the PPI network; methods that integrate gene or protein expression and 3D structural information are required. These more sophisticated methods are covered here. Chapter 7 discusses methods to identify protein complexes that are conserved between organisms or species; these evolutionarily conserved complexes provide important insights into the conservation of cellular processes through evolution. Finally, in today’s era of systems biology where biological systems are studied as a complex interplay of multiple (biomolecular) entities, we explain how protein complex prediction methods are playing a crucial role in shaping the field; these applications are covered in Chapter 8. We discuss the application of these methods for predicting dysregulated or dysfunctional protein complexes, identifying rewiring of interactions within complexes, and discovering new disease genes and drug targets. We conclude the book in Chapter 9 by reiterating the diverse applications of protein complex prediction methods and thereby the importance of computational methods in driving this exciting field of research.
1. http://www.nobelprize.org/nobel_prizes/medicine/laureates/1993/
2 Constructing Reliable Protein-Protein Interaction (PPI) Networks
No molecule arising naturally (MAN) is an island, entire of itself.
—John Donne (1573–1631), English poet and cleric (modified [Dunn 2010] from original quote, “No man is an island, entire of itself.”)
The identification of PPIs yields insights into functional relationships between proteins. Over the years, a number of different experimental techniques have been developed to infer PPIs. This inference of PPIs is orthogonal, but also complementary, to experiments inferring genetic interactions; both provide lists of candidate interactions and implicate functional relationships between proteins [Morris et al. 2014].
2.1 High-Throughput Experimental Systems to Infer PPIs
Physical interactions between proteins are inferred using different biochemical, biophysical, and genetic techniques (summarized in Table 2.1). Yeast two-hybrid (Y2H; less commonly, YTH) [Ito et al. 2000, Uetz et al. 2000, Ho et al. 2002] and protein-fragment complementation assays [Michnick 2003, Remy and Michnick 2004, Remy et al. 2007] enable identification of direct binary physical interactions between the proteins, whereas co-immunoprecipitation or affinity purification (AP) assays [Golemis and Adams 2002, Rigaut et al. 1999, Köcher and Superti-Furga 2007, Dunham et al. 2012] enable pull-down of whole protein complexes from which the binary interactions are inferred. Protein-fragment complementation assay (PCA) coupled with bimolecular fluorescence complementation (BiFC) [Grinberg et al. 2004] enables mapping of interaction surfaces of proteins, and is thus a good tool to confirm protein binding. Membrane YTH and mammalian membrane YTH (MaMTH) [Lalonde et al. 2008, Kittanakom et al. 2009, Lalonde et al. 2010, Petschnigg et al. 2014, Yao et al. 2017] enable identification of interactions involving membrane or membrane-bound proteins, which are typically difficult to identify using traditional Y2H and AP techniques. Techniques inferring genetic interactions [Brown et al. 2006] enable detection of functional associations or genetic relationships between the proteins (genes), but these associations do not always correspond to physical interactions. Here, we present only an overview of each of the experimental techniques; for