Submodular Inference of Diffusion Networks from Multiple Trees

Diffusion and propagation of information, influence and diseases take place over increasingly larger networks. We observe when a node copies information, makes a decision or becomes infected but networks are often hidden or unobserved. Since networks are highly dynamic, changing and growing rapidly, we only observe a relatively small set of cascades before a network changes significantly. Scalable network inference based on a small cascade set is then necessary for understanding the rapidly evolving dynamics that govern diffusion.

We have developed a scalable approximation algorithm with provable near-optimal performance based on submodular maximization which achieves a high accuracy in such scenario.

Below, you can find some extra information:

About

We consider that on a fixed hypothetical network, diffusion processes propagate as directed trees through the network. Since we only observe the times when nodes are reached by a diffusion process, there are many possible propagation trees that explain a set of cascades. Naive computation of the model takes exponential time since there is a combinatorially large number of propagation trees. It has been shown that computations over this super-exponential set of trees can be performed in cubic time. Efficient optimization of the model has been an open question to date. Our work shows that computation over the super-exponential set of trees can indeed be performed in quadratic time and surprisingly, we show that the resulting objective function is submodular, allowing for a greedy efficient algorithm with provable guarantees.

Considering all possible propagation trees enables us to learn a network from fewer observed cascades. This is important since social networks are highly dynamic, changing and growing rapidly, and we can only expect to record a small number of cascades over a fixed network. For more details read our paper:

M. Gomez-Rodriguez, B. Schölkopf. Submodular Inference of Diffusion Networks from Multiple Trees.The 29th International Conference on Machine Learning (ICML), 2012.

There have been previous attempts to infer diffusion networks. You may like to check out NETRATE, NETINF or CONNIE!

Download the code

We provide a simple implementation of our algorithm in a comprehensive C++ package (Linux/MACOS/Windows). The implementation uses the library SNAP, developed by Jure Leskovec at Stanford University, which we include in the package. If you have suggestions, comments, or you discover any bug in the the code, please contact manuelgr[at]stanford.edu.

Cascade Input format: The input file, with information about the cascades, should have two blocks separated by a blank line. Each line in the first block contains the id and name of a site:

<id>,<name>

Each line in the second block contains information about one cascade:

<id>,<timestamp>,<id>,<timestamp>,<id>,<timestamp>...

Example of a valid input file:

0,nyt.com 1,latimes.com 2,huffingtonpost.com 3,salon.com 4,gizmodo.com 5,gawker.com 0,0,1,1,2,2,3,3 4,0,3,1,0,3 5,0,4,3,2,8 0,0,5,3,3,6 3,0,4,2,5,8

Within the package, you can find additional information in ReadMe.txt. We also provide code for generating synthetic networks and cascades, and sample input data as a toy.

Papers

To learn more about our algorithm, you can download our paper:

Moreover, we build on previous work:


Imprint / Data Protection