Computational Prediction of Protein Complexes from Protein Interaction Networks. Sriganesh Srihari. Читать онлайн. Hotlib. HOTLIB.NET

Computational Prediction of Protein Complexes from Protein Interaction Networks

determined protein interactions with a “core” subset of interactions that have passed quality assessment (for example, based on literature verification). The Centre for Cancer Systems Biology (CCSB) Interactome Database at Harvard [Rolland et al. 2014, Yu et al. 2008, Yu et al. 2011] contains yeast, plant, virus, and human interactions. STRING [Von Mering et al. 2003, Szklarczyk et al. 2011] catalogs physical and functional interactions inferred from experimental and computational techniques. MIPS Comprehensive Yeast Genome Database (CYGD) [Güldener et al. 2005] and MIPS Mammalian Protein-Protein Interaction Database (MPPI) [Pagel et al. 2005] catalog protein interactions and also expert-curated protein complexes from yeast and mammals. The Human Protein Reference Database (HPRD) [Peri et al. 2004, Prasad et al. 2009] mainly contains experimentally identified human interactions. IRefIndex [Razick et al. 2008, Turner et al. 2010] and Integrated Interaction Database (IID) [Brown and Jurisica 2005] integrate experimental and computationally predicted interactions for human and several other species.

Table 2.2 Public and proprietary databases for protein-protein, protein-small molecule, and protein-DNA interactions

PPI Database	Source	Reference
BioGrid	http://thebiogrid.org	[Stark et al. 2011]
BIND	http://bind.ca	[Bader et al. 2003]
CCSB	http://interactome.dfci.harvard.edu/	[Rolland et al. 2014, Yu et al. 2008, Yu et al. 2011]
CYGD	http://mips.helmholtz-muenchen.de/genre/proj/yeast/	[Güldener et al. 2005]
DIP	http://dip.doe-mbi.ucla.edu	[Xenarios et al. 2002, Salwinski et al. 2004]
EMBL-EBI IntAct	http://www.ebi.ac.uk/intact/	[Hermjakob et al. 2004, Kerrien et al. 2012]
HAPPI	http://discern.uits.iu.edu:8340/HAPPI/	[Chen et al. 2009]
HPRD	http://www.hprd.org/	[Peri et al. 2004, Prasad et al. 2009]
OPHID/IID	http://ophid.utoronto.ca/	[Brown and Jurisica 2005]
InnateDB	http://www.innatedb.com/	[Lynn et al. 2008]
iRefIndex	http://irefindex.org/wiki/index.php?title=iRefIndex	[Razick et al. 2008, Turner et al. 2010]
MINT/HomoMINT	http://mint.bio.uniroma2.it/mint/	[Zanzoni et al. 2002, Chatr-Aryamontri et al. 2007, Persico et al. 2005]
MIPS	http://mips.helmholtz-muenchen.de/proj/ppi/	[Mewes et al. 2008]
MPPI	http://mips.helmholtz-muenchen.de/proj/ppi/	[Pagel et al. 2005]
STRING	http://string-db.org/	[Von Mering et al. 2003, Szklarczyk et al. 2011]

2.3 Topological Properties of PPI Networks

A simple yet effective way to represent interaction data is in the form of an undirected network called a protein-protein interaction network or simply PPI network, given as G = 〈V, E〉, where V is the set of proteins and E is the set of physical interactions between the proteins. Such a network presents a global or “systems” view of the entire set of proteins and their interactions, and provides a topological (mathematical) framework to interrogate the interactions. In the definitions throughout this book, we also use V (G) and E(G) to refer to the set of proteins and interactions of a (sub)network of G. For a protein v ∈ V, the set N(v) or Nv includes all immediate neighbors of v, and deg(v) = |N(v)| is the degree of v. These neighbors together with their interactions, Ev = E(v) = {(v, u) : u ∈ N(v)} ∪ {(u, w) : u ∈ N(v), w ∈ N(v) ∩ N(u)}, constitute the local (immediate) neighborhood subnetwork of v.

PPI networks, like most real-world networks, have characteristic topological properties which are distinct from that of random networks. But, to understand this distinction we need to first understand what are random networks. Traditionally, random networks have been described using the Erdös-Rényi (ER) model, in which G(n, p) is a random network with |V| = n nodes and each possible edge connecting pairs of these nodes has probability p of existing [Erdös 1960, Bollobás 1985]. The expected number of edges in the network is , and the expected mean degree is np. Alternatively, a random network is defined as a network chosen uniformly at random from the collection of all possible networks with n nodes and m edges. If p is the probability for the existence of an edge, the probability for each network in this collection is,

where the closer p gets to 1, the more skewed the collection is toward networks with higher number of edges. When p = 1/2, the collection contains all possible networks with equal probability, .

A marked feature of PPI networks that makes them non-random is the extremely broad distribution of degrees of proteins: While a majority of proteins have small number of immediate binding partners, there exist some proteins, referred to as “hubs,” with unusually large number of binding partners. Moreover, the degrees of most proteins are larger than the average node degree of the network. This degree distribution P(k) can be approximated by a power law P(k) ∼ k^−γ, where k is the node degree and γ is a small constant. Such networks are called scale-free networks [Barabási 1999, Albert and Barabási 2002].

Proteins within PPI networks exhibit an inherent tendency to cluster, which can be quantified by the local clustering coefficient for these proteins [Watts and Strogatz 1998]. For a selected protein v that is connected to |N(v)| other nodes, if these immediate neighbors were part of a clique (a complete subgraph), there will be interactions between them. The ratio of the actual number of interactions that exist between these nodes and the total number of possible interactions gives the local clustering coefficient CC(v) for the protein v:

The average clustering coefficient of the entire network is therefore given by

This clustering property gives rise to groups (subnetworks) of proteins in the PPI network that exhibit dense interactions within the groups, but sparse or less-dense interactions between the groups. The interactions between the groups occur via a few “central” proteins through which pass most paths in the network. The average shortest path length is given by

Скачать книгу