|InterJournal Complex Systems, 584
|Manuscript Number: |
Submission Date: 20621
|Iterative Vector Diffusion for the Detection of Modularity in Large Networks|
Subject(s): CX.11, CX.30
Category: Brief Article
Systems biology offers the prospect of new insights into the emergent properties of complex biological systems such as cells, tissues, whole organisms and ecosystems. However, the data which has been collected to date is incomplete, raising concerns that attempts to model biological systems at a systems level are premature. We contend that incomplete data sets are not necessarily a problem if the data which does exist is organized into structural modules characterized by relatively high connectivity within the nodule and lower connectivity between modules. Modularity appears to be widespread in biological systems ranging from subcellular networks to ecosystems, and is important to both the functionality and the evolution of the system concerned. The ability to identify modules within largely uncharacterized biological networks would be valuable in several ways. It would assist with the characterization of uninvestigated nodes based upon their module membership and it would permit assessment of the extent to which analysis of the already characterized nodes of an incompletely studied network is feasible. If the well understood nodes form a largely independent module within the network, the fact that much of the rest of the network is uncharacterized becomes less relevant. In this paper we describe an algorithm for the objective identification of modules within a network. Each node of the network is assigned a binary vector n bits long, where n is the number of nodes in the network. The initial vector for node i consists of a 1 in position i and 0 in every other position. The system then undergoes an iterative process of vector modification, as follows: ·At each time step an edge in the graph is selected at random. ·The vectors representing the nodes at each end of the edge are compared ·At any position i in the vector at which the entry is not equal to 0 an amount delta is added to the larger of the two entries and subtracted from the smaller This process is iterated until each node in the network has been modified on average p times, where p is a tunable parameter of the system. The final set of vectors is then subjected to a standard clustering method (we have used Kohonen's Self-Organizing Map) in order to assign cluster membership to each node in the network. Each cluster can be interpreted as a structural module within the original network. Using artificially generated networks composed of modules of known degree of clustering, the iterative vector diffusion algorithm performs robustly. We have also applied the algorithm to the network of protein-protein interactions in the yeast Saccharomyces cerevisiae. In this network the nodes are proteins and the links are protein-protein interactions detected using a yeast two-hybrid screen. Details of the nature, function and subcellular location of the protein are unknown for more than half of the proteins in S. cerevisiae. The characterization of the modules detected, and the potential utility of the IVD algorithm for the detection of functional modules in a network such as this is discussed.
|Submit referee report/comment|