Welcome to the GitHub repository for my bachelor's thesis titled A Theoretical and Empirical Investigation into the Equivalence of Graph Neural Networks and the Weisfeiler-Leman Algorithm.
Here you will find all the code used to conduct the experiments, as well as the LaTeX source for the written version of the thesis. Please note that we used Weights & Biases to record our experimental results; they are available at wandb.ai/eric-bill/BachelorThesisExperiments.

All the code used for the experiments can be found in the Code folder. First, we list the requirements needed to run the code, and then we explain how to test the classification and regression datasets separately.
- Python 3.10
- numpy
- pandas
- scipy
- sklearn
- torch 1.13.x
- torch-geometric 2.3.x
- wandb
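Assuming a working Python 3.10 environment with pip, the dependencies above could be installed roughly as follows (exact pins may differ; `sklearn` is distributed as `scikit-learn`, and torch-geometric may need extra wheels depending on your platform):

```shell
# Install the dependencies listed above (illustrative; adjust pins to your setup)
pip install numpy pandas scipy scikit-learn "torch==1.13.*" "torch_geometric==2.3.*" wandb
```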
If you want to try out a 1-WL+NN or GNN setup with any classification dataset from the TUDataset library, simply run `python Code/main.py` and provide the following arguments:
- `--dataset`: Name of the dataset to be tested
- `--max_epochs`: Maximum number of epochs a model should be trained for
- `--batch_size`: Number of samples per batch
- `--lr`: Initial learning rate
- `--k_fold`: Number of folds for k-fold cross-validation
- `--seed`: Random seed for initializing all random samplers used
- `--k_wl`: Number of Weisfeiler-Leman iterations, or -1 to run until convergence
- `--model`: Model to use. Options are "1WL+NN:Embedding-{Sum,Max,Mean}", "1WL+NN:{Sum,Max,Mean}", "1WL+NN:{GAT,GCN,GIN}:{Sum,Max,Mean}", or "{GAT,GCN,GIN}:{Sum,Max,Mean}"
- `--wl_convergence`: {True,False} Whether to use the convergence criterion for the Weisfeiler-Leman algorithm
- `--tags`: Tags to be added to the recording of the run on wandb.ai
- `--num_repition`: Number of repetitions
- `--transformer_kwargs`: Arguments for the transformer. For example, for the OneHotDegree transformer, the argument is the maximum degree
- `--encoding_kwargs`: Arguments for the encoding function. For example, for Embedding, the argument is the embedding dimension with the key "embedding_dim"
- `--mlp_kwargs`: Arguments for the MLP. For example, the number of hidden layers with the key "num_layers"
- `--gnn_kwargs`: Arguments for the GNN. For example, for GIN, the number of MLP layers with the key "num_layers"
- `--use_one_hot`: {True,False} Whether to use one-hot encoding for the node features. Only for 1-WL+NN:GNN models
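Putting the arguments together, a hypothetical invocation might look like the following (the dataset name and hyperparameter values here are illustrative choices, not recommendations from the thesis):

```shell
# Example run: 1-WL+NN with sum pooling on a TUDataset classification dataset
# (all values below are illustrative)
python Code/main.py \
    --dataset MUTAG \
    --max_epochs 100 \
    --batch_size 32 \
    --lr 0.01 \
    --k_fold 10 \
    --seed 42 \
    --k_wl -1 \
    --model "1WL+NN:Embedding-Sum" \
    --wl_convergence True
```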
To ensure consistent splits for the regression datasets, we created a separate Python script for each fixed split. To test any of these datasets, follow these steps.
Run `cd Code/` and then `python <filename>`, where `<filename>` is one of the following:
- `gnn_alchemy_{10K, full}`: tests GNN configurations on ALCHEMY
- `gnn_zinc_{10K, full}`: tests GNN configurations on ZINC
- `main_alchemy_{10K, full}`: tests 1-WL+NN configurations on ALCHEMY
- `main_zinc_{10K, full}`: tests 1-WL+NN configurations on ZINC
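For example, testing the GNN configurations on the 10K subset of ZINC might look like this (assuming the scripts follow the naming pattern above with a `.py` extension):

```shell
# Run the fixed-split regression script for GNNs on ZINC-10K
cd Code/
python gnn_zinc_10K.py
```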
You can locate all of our LaTeX code in the LaTeX folder. To make it easier to write the thesis, we have separated subparts of it into individual .tex files. The file that brings all of these together is called main.tex. To compile the thesis, you only need to compile this file.
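One way to compile main.tex, assuming a standard TeX distribution with latexmk available (which reruns LaTeX and any bibliography tool as needed):

```shell
# Build the thesis PDF from the top-level file
cd LaTeX/
latexmk -pdf main.tex
```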