Dr. Xiaoyan Lu supervised by Prof. Bolesaw K. Szymanski defended the Ph.D. thesis "Community Detection and its Applications in Understanding Dynamics of Social Networks" and will join Google, Inc.

Dr. Xiaoyan Lu supervised by Prof. Bolesaw K. Szymanski defended the Ph.D. thesis "Community Detection and its Applications in Understanding Dynamics of Social Networks" in March 2019 and graduated in May 2019. Dr. Lu will join Google, Inc. in New York, NY. His thesis focuses on community detection and its applications. Modularity is perhaps the most widely used quality metric to evaluate the partition of network nodes, yet modularity maximization suffers from the resolution limit problem. In this thesis, we uncover importance of this dependence using random graph theories to explain the resolution limit of modularity. We establish the asymptotic theoretical upper and lower bounds on the resolution parameter of generalized modularity for the modularity maximization to recover community structure correctly, which is the fi rst work connecting the resolution limit of modularity with the random graph models.

While most community detection algorithms take unweighted graph as input by default, they can be extended to accept the edge weights. In this thesis, we also show that the
appropriately assigned edge weights can improve the quality of the detected communities. We propose an edge weighting scheme to avoid the bias of modularity maximization towards
merging well-formed small communities into large ones. Our proposed edge weighting scheme works in a semi-supervised fashion: a regression model penalized by the merging of small
ground truth communities is trained to convert the local edge features into edge weights. Experimental results show that, in addition to modularity maximization, the different state-of-the-art approaches are also signi ficantly improved by the edge weights produced by our model.

Besides the maximization of modularity and its generalized version, an alternative approach to detect communities is the statistical inference to t the generative graph model to the observed network data. The degree-corrected stochastic block model is one such random graph model, generating different network partitions, ranging from traditional assortative
communities to disassortative structures. It does not impose any constraints on the mixing pattern of the resulting block assignments, thus the return of the traditional assortative community structures is not guaranteed. On top of the degree-corrected stochastic block model, we propose a generative random graph model which puts a constraint on nodes' internal degree ratio. This model stabilizes the inference of block model, avoiding inference algorithms like Markov chain Monte Carlo to get trapped in the local optima of the log-likelihood. Unlike the modularity maximization algorithm which always attempts to fi nd traditional assortative communities, in this regularized model, one single regularization parameter controls the mixing patterns discovered from the given network. In the context of information propagation, the community structures play an essential role in facilitating the local spread of information because the community members are more likely to accept inputs from each other than from the outsiders. On the other hand, the community structures slow down global information diffusion due to trapping the messages in dense regions and thus preventing global penetration. For this reason, we formulate a virality prediction framework for global online news using the community structure of online news media. The goal is to classify the viral news at their early stage of spreading. Our virality prediction framework considers every news a cascade in the network formed by online media sites. When an online media site reports a piece of news, it gets infected by the corresponding cascade. Based on the statistical survival analysis, our model de fines the probability of infection along the network edges given the propagation delay. Since the news generally spreads among the media sites of similar cultural backgrounds, our model assumes each media site is affiated with some communities, and the news cascade reaches each community with a certain probability. The mathematical model, after simpli fication, becomes a graph representation learning model which finds the node embbeddings from the news cascades.

Finally, we study the patterns of the polarization evolution by analyzing millions of roll-call votes in the legislative branches of the United States. Community structure of the members in these legislative branches is a critical factor for the development of political polarization. We proposed a social dynamical model explaining the formation of polarization observed in real legislative voting data and derived the tipping points of the dynamical system which provides early warning signals when the system is close to the bifurcation. Our dynamical model successfully explains the directions of polarization change in 28 out of the past 30 U.S. Congresses. The hidden variable in the dynamical model, called the polarization utility, is shown to correlate well with critical historical events such as the civil rights movement and Super PAC