Our pipeline uses a multimodal constraint autoencoder (scHCAE) to integrate multi-omics data during clustering, and a matrix factorization-based model (scMF) to predict the target genes regulated by a transcription factor (TF).
- Python 3.8 or higher
- Required packages:
torch, sklearn, scipy, scanpy, h5py, numpy, pandas
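
Before cloning, it can help to confirm that the environment meets the requirements above. The following is a minimal sketch, not part of the repository: `missing_packages` and `python_ok` are hypothetical helper names, and the list assumes the import names shown (for example, `sklearn` is the import name of scikit-learn).

```python
import importlib.util
import sys

# Import names of the required packages (assumed from the list above).
REQUIRED = ["torch", "sklearn", "scipy", "scanpy", "h5py", "numpy", "pandas"]


def missing_packages(names=REQUIRED):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8 or higher is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that each package can be found; it does not verify versions or that the import succeeds on your hardware.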
git clone https://github.com/xianglin226/Multi-SC.git

To run scHCAE for clustering with a specified number of clusters, use the following command:
python -u run_scMultiCluster3.py \
--n_clusters 6 \
--data_file GSE178707_neatseq_lane1.h5

To predict target genes regulated by TFs using scMF, run:
python -u run_scMF.py \
--data_file processedinput_scMF_lane1.h5

The example data can be accessed here.
GSE178707_neatseq_lane1.h5
GSE178707_neatseq_lane2.h5
GSM5123951_TEAseq_well1.h5
- X1: Gene expression data (RNA)
- X2: Protein expression data (ADT)
- X3: Chromatin accessibility data (ATAC)
- X4: Chromatin accessibility data (ATAC) mapped to gene features
- Genes: Gene features (rows of X1)
- ADT: Surface protein features (rows of X2)
- Peaks: Peak features (rows of X3)
- GeneFromPeaks: Gene features (rows of X4)
- Barcode: Cell barcodes
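
The keys described above can be checked against a file before running scHCAE. The sketch below is an illustration under assumptions: it supposes the keys sit at the top level of the HDF5 file, which is not confirmed here, and `summarize_datasets` and `inspect_file` are hypothetical helper names rather than functions from the repository.

```python
# Keys expected in the scHCAE input file, taken from the list above.
EXPECTED_KEYS = [
    "X1", "X2", "X3", "X4",
    "Genes", "ADT", "Peaks", "GeneFromPeaks", "Barcode",
]


def summarize_datasets(h5_like, expected=EXPECTED_KEYS):
    """Map each expected key to its shape, or to None if it is absent.

    `h5_like` is any mapping-style object (such as an open h5py.File)
    that supports `in` and `[]`. Entries without a `shape` attribute,
    such as HDF5 groups, are also reported as None.
    """
    summary = {}
    for key in expected:
        if key in h5_like:
            shape = getattr(h5_like[key], "shape", None)
            summary[key] = tuple(shape) if shape is not None else None
        else:
            summary[key] = None
    return summary


def inspect_file(path):
    """Open `path` read-only with h5py and summarize the expected keys."""
    import h5py  # imported here so the helper above works without h5py

    with h5py.File(path, "r") as f:
        return summarize_datasets(f)
```

For example, `inspect_file("GSE178707_neatseq_lane1.h5")` would return a dictionary of shapes. A `None` value means a key is missing or is stored differently from what this sketch assumes.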
processedinput_scMF_lane1.h5
- B: ADT-to-cell matrix
- W: Gene-to-ADT matrix
- X: Cell-to-gene matrix

