ParGNN has been accepted by DAC 2025.

ParGNN is an efficient full-batch training system for GNNs. It adopts a profiler-guided adaptive load-balancing partition method (PGALB) and a subgraph pipeline algorithm to overlap communication and computation.
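The communication/computation overlap idea can be sketched as a simple two-stage pipeline: while the current subgraph is being computed, the next subgraph's remote features are fetched in the background. This is an illustrative, self-contained sketch, not ParGNN's actual implementation; `fetch_halo` and `compute` are hypothetical stand-ins for remote feature communication and the local GNN computation.

```python
# Illustrative sketch (not ParGNN's code) of overlapping per-subgraph
# communication with computation in a two-stage pipeline.
from concurrent.futures import ThreadPoolExecutor

def fetch_halo(sg):            # stand-in for remote halo-feature communication
    return [x * 2 for x in sg]

def compute(sg, halo):         # stand-in for the local GNN layer computation
    return sum(sg) + sum(halo)

def pipelined_train_step(subgraphs):
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        fut = comm.submit(fetch_halo, subgraphs[0])   # prefetch the first halo
        for i, sg in enumerate(subgraphs):
            halo = fut.result()                       # wait for this subgraph's halo
            if i + 1 < len(subgraphs):
                fut = comm.submit(fetch_halo, subgraphs[i + 1])  # overlap next fetch
            results.append(compute(sg, halo))         # compute while next fetch runs
    return results

print(pipelined_train_step([[1, 2], [3, 4]]))  # → [9, 21]
```

With real communication, the background fetch hides transfer latency behind the compute of the previous subgraph, which is the effect the subgraph pipeline algorithm targets.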
```shell
cd pgalb                             ## go into the pgalb directory
python setup.py build_ext --inplace  ## install pgalb
python test_adpat.py                 ## check that the C extension installed correctly
```

- ParGNN uses a two-stage partition to address load imbalance and provides three functions:
- `graph_partition_dgl_metis`: loads the graph from a local path (or downloads it automatically), creates a DGL-format graph, and then runs the initial partition with METIS.
- `graph_eval`: profiles the subgraphs produced by the initial partition.
- `mapping`: maps the subgraphs to a cluster so that they run on the same GPUs.
- We provide two scripts to test the partition and repartition process; just run:

  ```shell
  python graph_partition.py
  python repart.py
  ```
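The profile-then-map idea above can be illustrated as a greedy load-balancing assignment. This is a minimal sketch assuming per-subgraph runtimes such as those measured by `graph_eval`; it uses a standard longest-processing-time heuristic, not the actual PGALB mapping algorithm.

```python
# Hypothetical sketch of the "mapping" stage: assign profiled subgraphs to
# GPUs so per-GPU load stays balanced (greedy longest-processing-time).
import heapq

def map_subgraphs_to_gpus(profiled_times, num_gpus):
    """profiled_times: {subgraph_id: runtime}; returns {gpu_id: [subgraph_ids]}."""
    heap = [(0.0, g) for g in range(num_gpus)]    # min-heap of (load, gpu_id)
    heapq.heapify(heap)
    assignment = {g: [] for g in range(num_gpus)}
    # Place the heaviest subgraphs first, each on the least-loaded GPU so far.
    for sg, t in sorted(profiled_times.items(), key=lambda kv: -kv[1]):
        load, g = heapq.heappop(heap)
        assignment[g].append(sg)
        heapq.heappush(heap, (load + t, g))
    return assignment

times = {0: 4.0, 1: 3.5, 2: 3.0, 3: 2.0, 4: 1.5}
print(map_subgraphs_to_gpus(times, 2))  # → {0: [0, 3, 4], 1: [1, 2]}
```

Here GPU 0 receives 7.5 units of work and GPU 1 receives 6.5, versus a worst-case 14/0 split without balancing; the repartition step can then refine the assignment using the measured runtimes.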
- Datasets used in the paper's evaluation (the scripts download them automatically when needed):
  - OGB graph datasets: ogbn-products and ogbn-proteins from https://ogb.stanford.edu/docs/nodeprop/
  - Yelp dataset: https://www.dgl.ai/dgl_docs/generated/dgl.data.YelpDataset.html#dgl.data.YelpDataset
  - Reddit dataset: https://www.dgl.ai/dgl_docs/generated/dgl.data.RedditDataset.html
- The scripts directory contains example scripts to run ParGNN:

  ```shell
  cd scripts
  sh train_all.sh
  ```
To cite this project, you can use the following BibTeX entries.
```bibtex
@inproceedings{11133102,
  author    = {Gu, Junyu and Li, Shunde and Cao, Rongqiang and Wang, Jue and Wang, Zijian and Liang, Zhiqiang and Liu, Fang and Li, Shigang and Zhou, Chunbao and Wang, Yangang and Chi, Xuebin},
  booktitle = {2025 62nd ACM/IEEE Design Automation Conference (DAC)},
  title     = {ParGNN: A Scalable Graph Neural Network Training Framework on multi-GPUs},
  year      = {2025},
  pages     = {1--7},
  keywords  = {Training;Accuracy;Design automation;Pipelines;Graphics processing units;Load management;Graph neural networks;Partitioning algorithms;Faces;Convergence;Graph neural network;Full-batch distributed training;Load balancing;Computation and communication overlapping},
  doi       = {10.1109/DAC63849.2025.11133102}
}

@inproceedings{10.1145/3627535.3638488,
  author    = {Li, Shunde and Gu, Junyu and Wang, Jue and Yao, Tiechui and Liang, Zhiqiang and Shi, Yumeng and Li, Shigang and Xi, Weiting and Li, Shushen and Zhou, Chunbao and Wang, Yangang and Chi, Xuebin},
  title     = {POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters},
  year      = {2024},
  isbn      = {9798400704352},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3627535.3638488},
  doi       = {10.1145/3627535.3638488},
  pages     = {469--471},
  numpages  = {3},
  keywords  = {graph neural network, load balancing, data transfer hiding, distributed training},
  series    = {PPoPP '24},
  location  = {Edinburgh, United Kingdom}
}
```