Dr. Konstantin Kuzmin, a Researcher in the Center advised by Prof. Boleslaw Szymanski, today defended his Ph.D. thesis, titled "New Approaches to Efficient Structural Analysis of Social and Biological Networks".

The thesis first discusses the limitations of the multithreaded solution and proposes a parallel community detection method that uses the Message Passing Interface (MPI). This approach scales better under parallel processing than the multithreaded solution does. We also present evidence that scalability is limited by the properties of the base SLPA algorithm, as described by Amdahl's Law. Our previous work on extending the SLPA algorithm led to the development of SpeakEasy, a robust community detection algorithm that combines top-down and bottom-up approaches with the label propagation process and performs multiple runs of consensus clustering. We showed that SpeakEasy can surpass SLPA in the quality of the communities it discovers for a number of representative real-world and synthetic networks. At the same time, because SpeakEasy is a more sophisticated extension of SLPA, its base sequential version does not provide the efficiency needed to analyze billion-scale graphs. In this work, we developed a parallel SpeakEasy algorithm that can efficiently perform community detection on both shared-memory and distributed-memory machines. SpeakEasy requires that certain global data (e.g., the global label histogram) be maintained and made available to all processors. We show that by carefully selecting data structures and communication patterns, and by optimizing the algorithm to take advantage of both specific MPI library features and certain capabilities of the underlying hardware platforms, parallel SpeakEasy can achieve the expected degree of parallel efficiency.
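The scalability limit attributed to Amdahl's Law above can be made concrete with a short calculation. The sketch below is only an illustration: the function name `amdahl_speedup` and the 95% parallel fraction are assumptions chosen for the example, not figures from the thesis.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Amdahl's Law: the serial part is not divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical case: 95% of the run time is parallelizable.
    for workers in (1, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

With any fixed serial fraction, the bound approaches the reciprocal of that fraction as workers are added (20x in this hypothetical case), which is why the sequential portion of the base algorithm caps the achievable speedup regardless of processor count.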
Finally, we describe the Synergy Landscapes project, which combines data from different domains (e.g., molecules and publications in biology) with multilayer graph representation and analysis algorithms provided by network science. As part of the project, we created MoleClue, an application that implements the Synergy principles in a multilayer network comprising molecular, publication, and author graphs, together with a set of algorithms for performing nontrivial searches and ranking the results. Our experiments show that the potential collaborators recommended by several ranking methods implemented in MoleClue, based on several molecules commonly associated with Alzheimer's Disease, are highly correlated with each other. To further verify the validity of our method, we consider authors who frequently coauthor publications and compute the proximity of the molecules that such authors have in common. We then compare those values with the proximity of random molecules. The results indicate that potential collaborators suggested by our algorithm are at least an order of magnitude more likely to appear than by random chance.
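The validation step described above, comparing the proximity of molecules shared by frequent coauthors against that of random molecules, might be sketched as follows. Everything here is hypothetical: the announcement does not define the proximity measure, so breadth-first hop distance on a toy molecule graph stands in for it, and the graph, node names, and function names are invented for illustration.

```python
import random
from collections import deque
from itertools import combinations
from statistics import mean

# Hypothetical toy molecule layer: an undirected path A-B-C-D-E-F.
MOLECULE_GRAPH = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}


def hop_distance(graph, source, target):
    """BFS hop count; an assumed stand-in for the unspecified proximity measure."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError(f"{target!r} is unreachable from {source!r}")


def mean_pairwise_distance(graph, molecules):
    """Average hop distance over all unordered pairs in the group."""
    return mean(hop_distance(graph, a, b) for a, b in combinations(molecules, 2))


def random_baseline(graph, group_size, trials, rng):
    """Average the same statistic over randomly drawn molecule groups."""
    nodes = sorted(graph)
    return mean(
        mean_pairwise_distance(graph, rng.sample(nodes, group_size))
        for _ in range(trials)
    )


if __name__ == "__main__":
    shared = ["A", "B", "C"]  # made-up molecules two frequent coauthors share
    observed = mean_pairwise_distance(MOLECULE_GRAPH, shared)
    baseline = random_baseline(MOLECULE_GRAPH, len(shared), 1000, random.Random(0))
    print(f"observed={observed:.2f} baseline={baseline:.2f}")
```

A smaller observed value than the random baseline would indicate that the shared molecules sit closer together than chance would suggest. The real multilayer network and ranking methods in MoleClue are not reproduced here.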