Edge-Copying Generative Model and Link Prediction via Stochastic Expectation Maximization in Large Hypergraphs
This is the GitHub repository accompanying the [paper by Xie He, Phil Chodrow, and Peter Mucha](To be submitted). The paper is currently in preparation for submission.
Please cite the paper when using the data or code. See License Information for more details on usage.
To reproduce all results from our experiments, you will need:
- Python 3.10.13
- xgi 0.8.2
- patsy 0.5.6
- statsmodels 0.14.2

You can check the Python and xgi versions, respectively, with:
$ python --version
$ python -c "import xgi; print(xgi.__version__)"

Note:
- If you wish to recreate the degree distribution plot, you need to install the code accompanying the paper The simpliciality of higher-order networks, because the plot depends on its measures of edit simpliciality and face edit simpliciality. Please also cite the original paper if used.
- The data folder does NOT contain the xgi datasets (which can be obtained from the xgi package directly). It contains the three bio-medical datasets used in both Logical Hypergraph Link Prediction and Neural Hypergraph Link Prediction. We did NOT create these datasets; we uploaded them so that readers can access them conveniently without looking them up elsewhere. Please see the original papers and cite the original sources and datasets when used.
pip install xgi==0.8.2 patsy==0.5.6 statsmodels==0.14.2
(Python 3.10.13 itself must be installed separately, since pip cannot install the interpreter.)

This environment has been tested to build and to run all of the following experiments successfully on both Windows and Linux. If you need support for Mac, please report it in Issues.
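The pinned versions listed above can also be checked in one pass. The following is a minimal sketch, not part of the repository: `EXPECTED` and `check_versions` are hypothetical names introduced here for illustration, and the check relies only on the standard-library `importlib.metadata` module.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Package versions copied from the requirements list above.
EXPECTED = {"xgi": "0.8.2", "patsy": "0.5.6", "statsmodels": "0.14.2"}


def check_versions(expected):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            installed = None
        report[name] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(expected 3.10.13)")
    for name, (installed, wanted, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={wanted} [{status}]")
```

A mismatch does not necessarily mean the experiments will fail; it only flags a difference from the tested environment.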
Please see model.py if you are only interested in using the model. A detailed usage example for generating a synthetic hypergraph can be found in sem-demo-example-usage.ipynb.
You can make the figures with the makefile:
make fig2 fig3 fig4 fig5

Or you can see below for detailed instructions.
- Figure 1 is a toy model drawn with Google Drawings.
- Figure 2 is generated by running: a) Fig2_calculate_properties_cluster_part.py, b) Fig2_Hypergraph_Properties_plotting_part.py. Feel free to check Fig2-Hypergraph Properties- plotting part.ipynb to view the figure.
- Figure 3 is generated by running: a) REAL_WORLD_SEM.py (note the comments on how generation differs between the real-world xgi datasets and the bio-medical datasets; if it runs correctly, you should see a folder named sem_results_new with a results file for each xgi dataset), b) Fig3-degree-and-size-distributions.py.
- Figure 4 is generated by running: a) REAL_WORLD_SEM.py, b) simulation_real_world.py, c) simulation_real_world_recovered.py, d) get_results_real_world.py, e) Fig4_edge_intersection_comparison_colorscheme.py. (Fig4_edge_intersection_comparison.ipynb was used to generate the colorful version of the plots, and Fig4_edge_intersection_comparison_colorscheme.ipynb was actually used to generate the plot with the monochromatic color scheme.) Important: Fig4_edge_intersection_comparison.py will produce an error message for the datasets on which we have not run simulation_real_world_recovered.py.
- Figure 5 and Table S1 (with recovered parameters) are generated by running: a) REAL_WORLD_SEM.py, b) Fig5-ScatterPlot-Table1and2.py. Use Fig5-ScatterPlot-Table1and2.ipynb to view the printed table and figure.
- To generate AUC scores for all of the xgi datasets and bio-medical datasets, run LP-REAL-WORLD.py (note the comments on how generation differs between the real-world xgi datasets and the bio-medical datasets). After the runs have completed for both the temporally dependent and the non-temporal datasets, run Fig5-ScatterPlot-Table1and2.py to print the link prediction results.
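The script order for each figure can be captured in a small driver. This is a minimal sketch under stated assumptions, not a file in the repository: it assumes the scripts are run from the repository root and need no command-line arguments (which the instructions above do not confirm), and `PIPELINES`, `build_commands`, and `run_pipeline` are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# Script order per figure, copied from the instructions above.
PIPELINES = {
    "fig2": [
        "Fig2_calculate_properties_cluster_part.py",
        "Fig2_Hypergraph_Properties_plotting_part.py",
    ],
    "fig3": [
        "REAL_WORLD_SEM.py",
        "Fig3-degree-and-size-distributions.py",
    ],
    "fig4": [
        "REAL_WORLD_SEM.py",
        "simulation_real_world.py",
        "simulation_real_world_recovered.py",
        "get_results_real_world.py",
        "Fig4_edge_intersection_comparison_colorscheme.py",
    ],
    "fig5": [
        "REAL_WORLD_SEM.py",
        "Fig5-ScatterPlot-Table1and2.py",
    ],
}


def build_commands(figure):
    """Return the ordered list of commands for one figure."""
    return [[sys.executable, script] for script in PIPELINES[figure]]


def run_pipeline(figure, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(figure):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for Figure 4.
    run_pipeline("fig4", dry_run=True)
```

Stopping at the first failure matters here because the later scripts read results written by the earlier ones, such as the sem_results_new folder produced by REAL_WORLD_SEM.py.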
See stochastic-em-demo_TOPK.ipynb for how to generate a synthetic hypergraph and get the recovered parameters. Specific comments are in the file.