Sriganesh Srihari

Computational Prediction of Protein Complexes from Protein Interaction Networks



as $N_i(G)/T(G)$, where $N_i(G)$ is the number of graphlets of type $i \in \{1, \ldots, 29\}$, and $T(G) = \sum_{i=1}^{29} N_i(G)$ is the total number of graphlets of G. The same is defined for a generated geometric random network H. The relative graphlet frequency distance D(G, H) between the two networks G and H is measured as

      Figure 2.3 The 29 graphlets of 3–5 nodes defined by Pržulj et al. [2004].

$$D(G, H) = \sum_{i=1}^{29} \left| F_i(G) - F_i(H) \right|,$$

      where $F_i(G) = -\log(N_i(G)/T(G))$; the logarithm is used because graphlet frequencies can differ by several orders of magnitude. The authors generated geometric random networks with the same number of nodes as there are proteins in the S. cerevisiae and D. melanogaster PPI networks obtained from high-throughput experiments. They found that, although the degree distributions of these PPI networks were closer to those of scale-free random networks, other topological parameters matched closely with those of geometric random networks. Specifically, the diameter, the local and whole-network clustering coefficients, and the relative graphlet frequencies, computed as above, showed that PPI networks were closer to geometric random networks than to scale-free networks. Furthermore, the authors suggest that as the quality and quantity of PPI data improve, the geometric random network may become better suited than scale-free and other networks for modeling PPI networks.
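The distance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graphlet counts are taken as already computed (counting graphlets is not shown), the natural logarithm is used because the text does not fix a base, and zero counts are rejected because the logarithm is undefined for them. The function name and the toy counts are hypothetical.

```python
import math


def relative_graphlet_frequency_distance(counts_g, counts_h):
    """Return D(G, H) = sum_i |F_i(G) - F_i(H)|, with F_i = -log(N_i / T)."""
    if len(counts_g) != len(counts_h):
        raise ValueError("both networks need counts for the same graphlet types")
    if min(counts_g) <= 0 or min(counts_h) <= 0:
        # Assumption of this sketch: log(0) is undefined, so zero counts are rejected.
        raise ValueError("every graphlet count must be positive")

    total_g = sum(counts_g)  # T(G), total number of graphlets in G
    total_h = sum(counts_h)  # T(H), total number of graphlets in H

    distance = 0.0
    for n_g, n_h in zip(counts_g, counts_h):
        f_g = -math.log(n_g / total_g)  # F_i(G); natural log is an assumption
        f_h = -math.log(n_h / total_h)  # F_i(H)
        distance += abs(f_g - f_h)
    return distance


# Hypothetical counts for two graphlet types only (the text uses 29 types).
toy_counts_g = [10, 90]
toy_counts_h = [50, 50]
print(relative_graphlet_frequency_distance(toy_counts_g, toy_counts_h))
```

A network compared with itself gives a distance of zero, and the distance grows as the relative frequencies of the two networks diverge.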

      Visualization is an important component of the analysis of PPI networks. In its simplest form, a PPI network is drawn as dots and lines, where the dots (or other shapes) represent proteins and the lines connecting them represent interactions between the proteins. Such a visualization enables quick exploration of the topological properties of PPI networks, for example counting the neighbors of a selected protein, counting the connected components of the network, or spotting its dense and sparse regions. However, the ease of this exploration and of the subsequent analysis depends on how effective the visualization method or tool used to render the PPI network is. Rendering is the concern of the field of graph (or network) layout, in which layout algorithms draw the network in a 2D space by appropriately positioning the dots and lines. A good layout should visually bring out the topological properties of the PPI network, and this has been a subject of research in graph visualization for several years. Here we briefly introduce some of the commonly used algorithms for PPI network layout; for details, readers are referred to the excellent reviews of Morris et al. [2014], Agapito et al. [2013], and Doncheva et al. [2012].

      Random layout arranges the nodes (dots) and edges (lines) randomly in the 2D space. Its advantage is simplicity, but it produces a large number of criss-crossing edges and does not necessarily use the available space well, especially for large networks. Circular layout arranges the nodes in succession, one after the other, on a circle. While this algorithm also suffers from criss-crossing edges, it is widely used to visualize small (sub)networks such as protein complexes and pathways. Tree layout arranges the network as a tree with a hierarchical organization of the nodes, and is therefore more suitable for visualizing trees than networks with cycles. Often, the layout is “ballooned out” by placing the children of each node in the tree on a circle surrounding the node, resulting in several concentric circles. Force-directed layout places the nodes according to a system of forces based on physical concepts from spring mechanics. Typically, the system combines attractive forces between adjacent nodes with repulsive forces between all pairs of nodes to seek a layout in which the overall edge lengths are small while the nodes are well separated.
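To make the circular and force-directed ideas concrete, the following is a minimal Python sketch, not the algorithm of any particular tool. The force-directed part follows the general style of Fruchterman and Reingold (repulsion proportional to k²/d between all pairs, attraction proportional to d²/k along edges); the constant k, the linear cooling schedule, the iteration count, and the toy network are all assumptions chosen for illustration.

```python
import math
import random


def circular_layout(nodes, radius=1.0):
    """Place the nodes one after the other on a circle."""
    step = 2 * math.pi / len(nodes)
    return {v: (radius * math.cos(i * step), radius * math.sin(i * step))
            for i, v in enumerate(nodes)}


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Spring-style layout: all pairs repel, adjacent nodes attract."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred edge length (assumed heuristic)

    for step in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive forces between all pairs of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attractive forces between adjacent nodes.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node, capped by a "temperature" that cools linearly.
        temperature = 0.1 * (1 - step / iterations)
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            move = min(length, temperature)
            pos[v][0] += disp[v][0] / length * move
            pos[v][1] += disp[v][1] / length * move

    return {v: (p[0], p[1]) for v, p in pos.items()}


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
print(circular_layout(toy_nodes))
print(force_directed_layout(toy_nodes, toy_edges))
```

The all-pairs repulsion loop makes each iteration quadratic in the number of nodes, which is acceptable for small subnetworks but would need an approximation for networks with thousands of proteins.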

      Figure 2.4 PPI network visualization using the Cytoscape 3.4.0 tool [Shannon et al. 2003, Smoot et al. 2010]. A portion of the human PPI network (1,977 proteins and 5,679 interactions) downloaded from BioGrid [Stark et al. 2011] is visualized here using force-directed layout. Basic statistics—average number of neighbors, network diameter, etc.—are displayed for the network. Proteins (e.g., BRCA1), protein complexes (e.g., eukaryotic initiation factor 4F and nuclear pore complexes), pathways (e.g., Fanconi anaemia pathway), and cellular processes (e.g., DNA-damage repair and chromatin remodeling) are “pulled-out” and highlighted. Cytoscape provides “link-out” to external databases and tools—e.g., KEGG [Kanehisa and Goto 2000]—to enable further analysis.

      Once the PPI network is laid out, a good visualization tool should allow at least some basic visual analysis of the network. The following aspects are important here (see Figure 2.4). Ease of navigation through the PPI network to explore individual proteins and interactions is of prime importance. In particular, the tool should be able to load large networks and keep them navigable. Next is the provision to annotate the network using internal information (e.g., labeling nodes by serial numbers or by their network properties) or external information (see below). The tool should also be able to compute basic topological properties of the network, for example node degree, shortest path lengths, and clustering, closeness, and betweenness coefficients. These statistics give users at least a preliminary idea of the network. Another valuable feature of a good tool is link-out to external databases, for example PubMed literature (http://www.ncbi.nlm.nih.gov/pubmed), National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/), UniProt or SwissProt (http://www.uniprot.org/) [Bairoch and Apweiler 1996, UniProt 2015], BioGrid (http://thebiogrid.org/) [Stark et al. 2011], Gene Ontology (GO) (http://www.geneontology.org/) [Ashburner et al. 2000], and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/pathway.html) [Kanehisa and Goto 2000]. These enable functional annotation of proteins and interactions. Finally, the tool should ideally support advanced analyses such as clustering of the network, comparison between networks (based on topological characteristics, for example), and enrichment analysis, e.g., using GO terms. Table 2.3 lists some of the popular tools available for PPI network visualization and (visual) analysis. OMICS Tools (http://omictools.com/network-visualization-category) maintains an exhaustive list of visualization tools for PPI and other biomolecular network analysis.
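Some of the basic topological properties mentioned above (node degree, local clustering coefficient, shortest path lengths, and connected components) can be computed with a short Python sketch such as the one below. It assumes an undirected, unweighted network without self-loops, given as a list of interaction pairs; the function names and the toy proteins A to F are hypothetical, and closeness and betweenness are left out to keep the example short.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than 2 neighbors
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """Group nodes into sets that are mutually reachable."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(shortest_path_lengths(adj, node))
            seen |= component
            components.append(component)
    return components


# Hypothetical toy network: a triangle A-B-C, a tail C-D, and a separate pair E-F.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]
adj = build_adjacency(toy_edges)
print({node: len(nbrs) for node, nbrs in adj.items()})  # node degrees
print(clustering_coefficient(adj, "C"))
print(shortest_path_lengths(adj, "A"))
print(len(connected_components(adj)))
```

Running the breadth-first search from every node gives all-pairs shortest path lengths, from which the network diameter can be read off as the largest finite value.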

Visualization Tool   Source                                             Reference
Arena3D              http://arena3d.org/                                [Pavlopoulos et al. 2011]
AVIS                 http://actin.pharm.mssm.edu/AVIS2/                 [Seth et al. 2007]
BioLayout            http://www.biolayout.org/                          [Theocharidis et al. 2009]
Cytoscape            http://www.cytoscape.org/download.html             [Shannon et al. 2003, Smoot et al. 2010]
Medusa               http://coot.embl.de/medusa/                        [Hooper and Bork 2005]
NAViGaTOR            http://ophid.utoronto.ca/navigator/download.html   [Brown et al. 2009]
ONDEX