Joint Colloquium: The NeST Center and Computer Science on Finding Sources in Spreading Processes was given by Prof. Krzysztof Suchecki of Warsaw University of Technology, Poland, on Nov 14, 2017.

The phenomenon of spreading is very broad: it can mean the spreading of electromagnetic waves, wind-blown seeds, diseases, or information. Spreading can also happen in various environments, from simple spatial ones (like waves on water) to complicated networks (like information in society). Quite often it is evident that something has spread, but it is not directly known where it came from. For physical phenomena that spread through space at constant velocity, simple triangulation may be enough to pinpoint the source. But if the process takes place on a complex network and is itself so complex that it can only be described stochastically (as with epidemics or information spreading), then finding the source becomes a considerably harder problem.

Both epidemics and information spreading share a common property: no total mass, energy, count, or volume is conserved, as it is, for example, in spreading waves (energy) or diffusing substances (mass/volume). Because of that, both can be modeled in a similar way, for example by the basic SI (Susceptible-Infected) model. The presentation will describe some existing methods for finding the source of spread in such processes, focusing on the maximum-likelihood method introduced by Pinto et al. and on derivative methods that offer much better performance as well as improved accuracy.
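As a point of reference, the sketch below simulates a discrete-time SI spread on a network; it is only an illustration of the model mentioned above, and the network, source node, and infection probability `beta` are illustrative assumptions rather than parameters from the talk.

```python
# A minimal sketch of discrete-time SI (Susceptible-Infected) spreading on a
# network, assuming the networkx library. Node 0 as source and beta = 0.3 are
# illustrative choices only.
import random
import networkx as nx

def simulate_si(graph, source, beta=0.3, max_steps=50, seed=None):
    """Return a dict mapping each infected node to its infection time."""
    rng = random.Random(seed)
    infected_at = {source: 0}              # the source is infected at t = 0
    for t in range(1, max_steps + 1):
        newly_infected = set()
        for node in infected_at:
            for neighbour in graph.neighbors(node):
                # Each susceptible neighbour is infected with probability beta
                if neighbour not in infected_at and rng.random() < beta:
                    newly_infected.add(neighbour)
        if not newly_infected:
            break
        for node in newly_infected:
            infected_at[node] = t
    return infected_at

if __name__ == "__main__":
    g = nx.barabasi_albert_graph(200, 3, seed=1)
    arrival = simulate_si(g, source=0, beta=0.3, seed=1)
    print(f"{len(arrival)} of {g.number_of_nodes()} nodes infected")
```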

Assume we have a network through which information or an epidemic has spread, and that we only know when it arrived at some specific points in the network (which we call observers). We further assume that spreading always happens along shortest paths from the source to any node, and that the delays on links can be approximated by a normal distribution. It is then possible, for each potential source in the network (in general, every node), to calculate the expected arrival times at the observers, as well as the variances and covariances of these times. For each node we therefore have a multivariate normal distribution of arrival times at all observers. Comparing these distributions with the actual observations, we can find the source whose distribution best fits the observed arrival times.
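The sketch below illustrates this likelihood comparison in a deliberately simplified form: expected arrival times are taken proportional to shortest-path distances from each candidate source, link delays are treated as independent Gaussians (so the covariances between observers used in the full multivariate-normal treatment are neglected), and the spread is assumed to start at a known time t = 0. The delay parameters `mu` and `sigma` are illustrative assumptions.

```python
# A simplified sketch of maximum-likelihood source localization: score every
# candidate node by how well its predicted arrival times match the observed
# ones, under independent Gaussian link delays (a simplification of the full
# multivariate normal model described above).
import math
import networkx as nx

def estimate_source(graph, observed, mu=1.0, sigma=0.4):
    """observed: dict {observer node: observed arrival time}."""
    best_node, best_loglik = None, -math.inf
    for candidate in graph.nodes():
        # Shortest-path hop counts from the candidate to every node.
        dist = nx.single_source_shortest_path_length(graph, candidate)
        if not all(obs in dist for obs in observed):
            continue                      # candidate cannot reach all observers
        loglik = 0.0
        for obs, t_obs in observed.items():
            d = dist[obs]
            mean = mu * d                 # expected arrival time at this observer
            var = sigma ** 2 * max(d, 1)  # variance grows with path length
            loglik += -0.5 * ((t_obs - mean) ** 2 / var + math.log(2 * math.pi * var))
        if loglik > best_loglik:
            best_node, best_loglik = candidate, loglik
    return best_node
```

With the SI simulation above, one could pick a handful of infected nodes as observers, pass their infection times as `observed`, and compare the returned node with the true source.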

One drawback of this method is that for a large number N of nodes in the network, the computational cost becomes prohibitive, with complexity up to the fourth power of N. We have proposed a derivative method that limits the set of observers considered and makes a smart choice of suspected nodes. We only take into account a small number of observers that are closest to the real source (those with the earliest observed infection times), greatly decreasing the computational complexity. Instead of calculating the distribution for every node, we start with the closest observer and follow a gradient of increasing likelihood. These changes not only greatly improve performance (with worst-case complexity on the order of N squared times log N) but also increase accuracy on scale-free network topologies.
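The following rough sketch captures the gradient-style idea under the same simplified likelihood as before: only the `k` observers with the earliest arrival times are kept, the search starts at the earliest observer, and it greedily moves to the neighbouring node with the highest likelihood until no neighbour improves on the current one. The choice of `k`, `mu`, and `sigma` is an illustrative assumption, and this is not the exact procedure from the talk.

```python
# A rough sketch of a gradient-following source search that uses only the
# earliest-reporting observers, under the same simplified independent-Gaussian
# delay model as the previous sketch.
import math
import networkx as nx

def loglik_of(graph, candidate, observed, mu=1.0, sigma=0.4):
    dist = nx.single_source_shortest_path_length(graph, candidate)
    if not all(o in dist for o in observed):
        return -math.inf
    total = 0.0
    for obs, t_obs in observed.items():
        d = dist[obs]
        mean, var = mu * d, sigma ** 2 * max(d, 1)
        total += -0.5 * ((t_obs - mean) ** 2 / var + math.log(2 * math.pi * var))
    return total

def gradient_source_search(graph, observed, k=5):
    # Keep only the k observers with the earliest observed arrival times.
    nearest = dict(sorted(observed.items(), key=lambda kv: kv[1])[:k])
    current = min(nearest, key=nearest.get)       # start at the earliest observer
    current_score = loglik_of(graph, current, nearest)
    while True:
        neighbours = list(graph.neighbors(current))
        if not neighbours:
            return current
        # Move to the best-scoring neighbour, if it improves the likelihood.
        scores = {n: loglik_of(graph, n, nearest) for n in neighbours}
        best = max(scores, key=scores.get)
        if scores[best] <= current_score:
            return current
        current, current_score = best, scores[best]
```

Because each step only evaluates the neighbours of the current node, the number of likelihood evaluations stays far below the all-nodes scan of the basic estimator.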