diff --git a/appyters/chea_kg_ts/README.md b/appyters/chea_kg_ts/README.md new file mode 100644 index 00000000..c79cdea1 --- /dev/null +++ b/appyters/chea_kg_ts/README.md @@ -0,0 +1,16 @@ +# ChEA-KG-TS Appyter + +Given either raw time series RNA-seq data or pre-computed DEGs at each time point, the ChEA-KG-TS Appyter visualizes on a regulatory subnetwork how the enriched TFs at one time point regulate the enriched TFs at the subsequent time point, therefore enabling users to understand how the TF landscape governing a biological process evolves over time. The Appyter also determines which modules of TFs may be targeting similar genes at each time point by visualizing how the enriched TFs cluster on a UMAP plot. The Appyter also outputs the average rank of each enriched TF within the ChEA3 gene set libraries, enabling users to see which TFs may play a more substantial role in dictating the gene expression changes at each time point. + +## Launching the appyter +```bash +appyter --profile=biojupies chea_kg_ts_appyter.ipynb +``` + +As long as the terminal is open, the appyter should be available at https://localhost:5000/ + +## Developing the appyter +The appyter can be developed with any jupyter notebook compatible editor. +```bash +jupyter chea_kg_ts_appyter.ipynb +``` diff --git a/appyters/chea_kg_ts/appyter.json b/appyters/chea_kg_ts/appyter.json new file mode 100644 index 00000000..52ddf12e --- /dev/null +++ b/appyters/chea_kg_ts/appyter.json @@ -0,0 +1,35 @@ +{ + "$schema": "https://raw.githubusercontent.com/MaayanLab/appyter-catalog/main/schema/appyter-validator.json", + "name": "chea_kg_ts", + "title": "ChEA-KG-TS", + "version": "0.0.1", + "description": "An Appyter that visualizes the enriched TFs at each time point from time series RNA-seq data using a regulatory subnetwork and UMAP plot.", + "image": "TODO", + "authors": [ + { + "name": "Andrew Van Dusen", + "email": "avandusen30@gmail.com" + } + ], + "url": "https://github.com/MaayanLab/chea_kg_ts", + "tags": [ + "Enrichment Analysis", + "Transcription Factors", + "Gene Regulatory Network", + "Time Series", + "RNA-seq", + "Differential Expression Analysis", + "ChEA-KG" + ], + "license": "CC-BY-NC-SA-4.0", + "appyter": { + "file": "chea_kg_ts_appyter.ipynb", + "profile": "biojupies", + "extras": [ + "ipywidgets", + "toggle-code", + "hide-code" + ] + }, + "public": true +} diff --git a/appyters/chea_kg_ts/chea_kg_ts_appyter.ipynb b/appyters/chea_kg_ts/chea_kg_ts_appyter.ipynb new file mode 100644 index 00000000..b17602de --- /dev/null +++ b/appyters/chea_kg_ts/chea_kg_ts_appyter.ipynb @@ -0,0 +1,2019 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "098e6ee1", + "metadata": {}, + "outputs": [], + "source": [ + "#%%appyter init\n", + "from appyter import magic\n", + "magic.init(lambda _=globals: _())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "318f5c72", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter hide_code\n", + "\n", + "{% do SectionField(\n", + " name= 'appyter_intro',\n", + " title= 'ChEA-KG-TS Appyter',\n", + " img= 'time series.png'\n", + ") %}\n", + "\n", + "{% do DescriptionField(\n", + " name= 'appyter_description',\n", + " text= 'The ChEA-KG Time Series Appyter visualizes the enriched transcription factor landscape from time series RNA-seq data, helping to inform hypothesis about the regulatory mechanisms governing biological processes.',\n", + " section= 'appyter_intro'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'title_section',\n", + " title= 'Provide a title for your study.'\n", + ") %}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c5147c1", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter code_eval\n", + "\n", + "{% set study_title = TextField(\n", + " name= 'user_inputted_study_title',\n", + " label= 'Title',\n", + " default= '',\n", + " section = 'title_section',\n", + ") %}\n", + "\n", + "study_title = {{study_title}}" + ] + }, + { + "cell_type": "markdown", + "id": "0c671de6", + "metadata": {}, + "source": [ + "# ChEA-KG-TS Appyter Report: Temporal Transitions of Transcription Factor Regulatory Modules\n", + "\n", + "Given time series RNA-seq data from either humans or mice, this Appyter first determines which transcription factors are enriched for the differentially expressed genes (DEGs) found at each time point. DEGs are computed using [DESeq2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) (Love et al., 2014) or [Characteristic Direction](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-79) (Clark et al., 2014) by comparing gene expression at adjacent time points (*t0 vs t1, t1 vs t2, t2 vs t3, ... , tn-1 vs tn*) or comparing to the initial time point (*t0 vs t1, t0 vs t2, t0 vs t3, ... , t0 vs tn*). Thus, there are 4 different options for computing DEGs. Next, the enriched transcription factors for each set of DEGs are determined using [ChEA3](https://maayanlab.cloud/chea3/) (Keenan et al., 2019). \n", + "\n", + "Afterwards, the Appyter plots a regulatory subnetwork using [ChEA-KG](https://chea-kg.maayanlab.cloud/) (project led by Anna Byrd) of the enriched TFs at each time point, enabling users to determine how the enriched TFs at one time point regulate the TFs at the subsequent time point. A UMAP plot of the enriched TFs is also generated, providing users with visualization of enriched TFs that act in modules based on their shared target genes. \n", + "\n", + "If users have pre-computed the up- and downregulated DEGs at each time point, they can opt to upload a GMT file of those DEGs to *directly* visualize the enriched TFs. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f72b2957", + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import display, Markdown\n", + "display(Markdown(f\"# __Study title:__ {study_title}\"))" + ] + }, + { + "cell_type": "markdown", + "id": "6695dcf3", + "metadata": {}, + "source": [ + "## Loading the necessary packages" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aaad10ef", + "metadata": {}, + "outputs": [], + "source": [ + "# general\n", + "import pandas as pd\n", + "import numpy as np\n", + "import json\n", + "import requests\n", + "import urllib\n", + "from IPython.display import display, FileLink, HTML, Markdown\n", + "import time\n", + "import os\n", + "\n", + "# differential gene expression\n", + "from maayanlab_bioinformatics.dge import deseq2_differential_expression\n", + "from maayanlab_bioinformatics.dge import characteristic_direction\n", + "from maayanlab_bioinformatics.dge import up_down_from_characteristic_direction\n", + "\n", + "# time series enriched TF subnetwork visualization\n", + "from ipycytoscape import CytoscapeWidget\n", + "from dash import html\n", + "\n", + "# bar chart\n", + "import plotly.graph_objects as go\n", + "import plotly.io as pio\n", + "import uuid\n", + "import html\n", + "from pathlib import Path\n", + "\n", + "# enriched TF UMAP visualization\n", + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "import scanpy as sc\n", + "import anndata\n", + "from copy import deepcopy\n", + "from collections import OrderedDict\n", + "from bokeh.io import output_notebook, show\n", + "from bokeh.io.export import export_png\n", + "from bokeh.plotting import figure\n", + "from bokeh.models import ColumnDataSource, HoverTool, Slider, CustomJS, Title, Label\n", + "from bokeh.layouts import column\n", + "from PIL import Image\n", + "output_notebook()" + ] + }, + { + "cell_type": "markdown", + "id": "e6bfe09b", + "metadata": {}, + "source": [ + "## Step 1. Computing DEGs for the raw gene counts matrix\n", + "\n", + "DEGs are computed using [DESeq2](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) (Love et al., 2014) or [Characteristic Direction](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-79) (Clark et al., 2014) by comparing gene expression at adjacent time points (*t0 vs t1, t1 vs t2, t2 vs t3, ... , tn-1 vs tn*) or comparing to the initial time point (*t0 vs t1, t0 vs t2, t0 vs t3, ... , t0 vs tn*). All in all, there are 4 different methods of computing DEGs." + ] + }, + { + "cell_type": "markdown", + "id": "e4ceb21b", + "metadata": {}, + "source": [ + "Upload fields are for (1) a raw gene counts matrix and (2) experiment metadata from a time series RNA-seq experiment. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70226435", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter hide_code\n", + "\n", + "{% do SectionField(\n", + " name= 'section0',\n", + " title= 'Upload files to this section ONLY if you are uploading raw time series RNA-seq data.',\n", + " img= 'matrix.png'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'section1',\n", + " title= '1. Upload raw gene counts matrix as a CSV file.',\n", + " img= 'mRNA.png'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'section2',\n", + " title= '2. Upload experiment metadata as a CSV file.',\n", + " img= 'experiment.png'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'section3',\n", + " title= '3. Choose whether you would like to calculate differentially expressed genes using DESeq2, CD, or both.'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'section4',\n", + " title= '4. Choose the number of top enriched TFs to include in the visualizations.'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'section5',\n", + " title= '5. Provide a short description of your study (optional).'\n", + ") %}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "aa272ef5", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter code_eval\n", + "\n", + "{% do DescriptionField(\n", + " name= 'matrix_description',\n", + " text= 'Ensure that the first column is titled \"gene_name\" or \"gene_id\" and contains the gene names. The example file has {id}_{name}, which is also an acceptable notation.',\n", + " section= 'section1'\n", + ") %}\n", + "\n", + "{% set dataset1 = FileField(\n", + " name= 'dataset1',\n", + " label= 'Raw gene counts matrix',\n", + " default= '',\n", + " examples= {'Raw gene counts from TRAIL-treated TNBC cell line': 'https://minio.dev.maayanlab.cloud/chea-kg-timeseries/GSE271120_RawCountFile_rsemgenes.CCBR1062.csv'},\n", + " section= 'section1'\n", + ") %}\n", + "\n", + "ds1 = {{dataset1}}\n", + "\n", + "{% do DescriptionField(\n", + " name= 'metadata_description',\n", + " text= 'Ensure that the columns are labeled exactly as \"sample_name\" and \"time_pt_annotation\". See example for how to format the experiment metadata file.',\n", + " section= 'section2'\n", + ") %}\n", + "\n", + "{% set dataset2 = FileField(\n", + " name= 'dataset2',\n", + " label= 'Experiment metadata',\n", + " default= '',\n", + " examples= {'Samples taken from TRAIL-treated TNBC cell line': 'https://minio.dev.maayanlab.cloud/chea-kg-timeseries/sample_metadata.csv'},\n", + " section= 'section2'\n", + ") %}\n", + "\n", + "ds2 = {{dataset2}}\n", + "\n", + "{% set chosen_method = ChoiceField(\n", + " name= 'chosen_dge_method',\n", + " label= 'Differential gene expression method(s)',\n", + " default= 'DESeq2',\n", + " choices= {'DESeq2': '1', 'Characteristic Direction': '2', 'Both': '3'},\n", + " section= 'section3'\n", + ") %}\n", + "\n", + "dge_method = {{chosen_method}}\n", + "dge_method = str(dge_method)\n", + "\n", + "{% set chosen_num = ChoiceField(\n", + " name= 'chosen_num_tfs',\n", + " label= 'Number of top enriched TFs',\n", + " default= '5',\n", + " choices= {'5': '1', '10': '2'},\n", + " section= 'section4'\n", + ") %}\n", + "\n", + "num_tfs = {{chosen_num}}\n", + "num_tfs = str(num_tfs)\n", + "\n", + "{% set user_description = TextField(\n", + " name= 'user_description',\n", + " label= 'Description',\n", + " default= '',\n", + " section = 'section5',\n", + ") %}\n", + "\n", + "umap_desc = {{user_description}}\n", + "\n", + "compute_degs = False\n", + "if ds1 != '' and ds2 != '':\n", + " compute_degs = True\n", + "\n", + "if num_tfs == '1':\n", + " num_tfs = 5\n", + "if num_tfs == '2':\n", + " num_tfs = 10" + ] + }, + { + "cell_type": "markdown", + "id": "f56cbfd2", + "metadata": {}, + "source": [ + "### Differential Expression Method Option 1: DESeq2\n", + "[PyDESeq2](https://pydeseq2.readthedocs.io/en/stable/), the Python implementation of DESeq2 (Love et al., 2014), is used to compute the DEGs of time series RNA-seq data given a user-inputted raw gene counts matrix and sample metadata file describing the time points. \n", + "\n", + "DEGs are computed by comparing gene expression at adjacent time points (*t0 vs t1, t1 vs t2, t2 vs t3, ... , tn-1 vs tn*) or with the initial time point (*t0 vs t1, t0 vs t2, t0 vs t3, ... , t0 vs tn*)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de7d6b9e", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs:\n", + " raw_counts = pd.read_csv(ds1)\n", + " sample_metadata = pd.read_csv(ds2)\n", + "\n", + " time_pt_list = []\n", + " for time_pt in sample_metadata[\"time_pt_annotation\"].tolist():\n", + " if time_pt not in time_pt_list:\n", + " time_pt_list.append(time_pt)\n", + "\n", + " time_pt_dict = {}\n", + " for i, time_pt in enumerate(time_pt_list):\n", + " samples_at_time_pt = sample_metadata.loc[sample_metadata[\"time_pt_annotation\"] == time_pt, \"sample_name\"].astype(str).tolist()\n", + " if raw_counts.columns[0] == \"gene_id\":\n", + " subset_counts = raw_counts[[\"gene_id\"] + samples_at_time_pt]\n", + " rev_subset_counts = subset_counts.set_index(\"gene_id\")\n", + " time_pt_dict[i] = (time_pt, rev_subset_counts)\n", + " elif raw_counts.columns[0] == \"gene_name\":\n", + " subset_counts = raw_counts[[\"gene_name\"] + samples_at_time_pt]\n", + " rev_subset_counts = subset_counts.set_index(\"gene_name\")\n", + " time_pt_dict[i] = (time_pt, rev_subset_counts)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "219cdcb4", + "metadata": {}, + "outputs": [], + "source": [ + "### [DESeq2] finding the DEGs from adjacent time pt comparisons\n", + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " adj_time_pt_comparisons = []\n", + " adj_time_pt_degs = []\n", + " for i in range(len(time_pt_list) - 1):\n", + " controls, cases = time_pt_dict[i][1], time_pt_dict[i+1][1]\n", + " results_df = deseq2_differential_expression(controls, cases)\n", + "\n", + " p_vals = [0.05] + [10**(-i) for i in range(2, 21, 1)]\n", + " significant_genes = results_df[results_df[\"padj\"] < p_vals[0]]\n", + " up_count = (significant_genes[\"log2FoldChange\"] > 0).sum()\n", + " down_count = (significant_genes[\"log2FoldChange\"] < 0).sum()\n", + "\n", + " idx = 1\n", + " while (up_count + down_count) > 2000:\n", + " significant_genes = results_df[results_df[\"padj\"] < p_vals[idx]]\n", + " up_count = (significant_genes[\"log2FoldChange\"] > 0).sum()\n", + " down_count = (significant_genes[\"log2FoldChange\"] < 0).sum()\n", + " idx += 1\n", + "\n", + " ctrl_time_pt, case_time_pt = time_pt_dict[i][0], time_pt_dict[i+1][0]\n", + " adj_time_pt_comparisons.append(f\"{ctrl_time_pt} v {case_time_pt}\")\n", + "\n", + " file = f\"deseq2_{case_time_pt}_v_{ctrl_time_pt}.csv\"\n", + " significant_genes.to_csv(file)\n", + " adj_time_pt_degs.append(file)\n", + "\n", + " print(f\"time point comparison: {ctrl_time_pt}_v_{case_time_pt}\")\n", + " print(f\"total DEGs: {up_count + down_count}, up genes: {up_count}, down genes: {down_count}, padj: {p_vals[idx-1]}\")\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d5be5a8", + "metadata": {}, + "outputs": [], + "source": [ + "### [DESeq2] finding the DEGs from time pt 0 comparisons\n", + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " time_pt_0_comparisons = []\n", + " time_pt_0_degs = []\n", + " for i in range(1, len(time_pt_list)):\n", + " controls, cases = time_pt_dict[0][1], time_pt_dict[i][1]\n", + " results_df = deseq2_differential_expression(controls, cases)\n", + "\n", + " p_vals = [0.05] + [10**(-i) for i in range(2, 21, 1)]\n", + " significant_genes = results_df[results_df[\"padj\"] < p_vals[0]]\n", + " up_count = (significant_genes[\"log2FoldChange\"] > 0).sum()\n", + " down_count = (significant_genes[\"log2FoldChange\"] < 0).sum()\n", + "\n", + " idx = 1\n", + " while (up_count + down_count) > 2000:\n", + " significant_genes = results_df[results_df[\"padj\"] < p_vals[idx]]\n", + " up_count = (significant_genes[\"log2FoldChange\"] > 0).sum()\n", + " down_count = (significant_genes[\"log2FoldChange\"] < 0).sum()\n", + " idx += 1\n", + "\n", + " ctrl_time_pt, case_time_pt = time_pt_dict[0][0], time_pt_dict[i][0]\n", + " time_pt_0_comparisons.append(f\"{ctrl_time_pt} v {case_time_pt}\")\n", + "\n", + " file = f\"deseq2_{case_time_pt}_v_{ctrl_time_pt}.csv\"\n", + " significant_genes.to_csv(file)\n", + " time_pt_0_degs.append(file)\n", + "\n", + " print(f\"time point comparison: {ctrl_time_pt}_v_{case_time_pt}\")\n", + " print(f\"total DEGs: {up_count + down_count}, up genes: {up_count}, down genes: {down_count}, padj: {p_vals[idx-1]}\")\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "903a7430", + "metadata": {}, + "outputs": [], + "source": [ + "def up_gene_list(input_csv, filename=None):\n", + " \"\"\"\n", + " Outputs list of upregulated DEGs from DESeq2 results CSV.\n", + " \"\"\"\n", + " df = pd.read_csv(input_csv)\n", + " row_filter = df[\"log2FoldChange\"] > 0\n", + " filtered = df.loc[row_filter, df.columns[0]]\n", + " gene_ids = list(filtered)\n", + "\n", + " up_list = []\n", + " for gene in gene_ids:\n", + " if \"_\" in gene:\n", + " up_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " up_list.append(gene)\n", + "\n", + " return up_list\n", + "\n", + "\n", + "def down_gene_list(input_csv, filename=None):\n", + " \"\"\"\n", + " Outputs list of downregulated DEGs from DESeq2 results CSV.\n", + " \"\"\"\n", + " df = pd.read_csv(input_csv)\n", + " row_filter = df[\"log2FoldChange\"] < 0\n", + " filtered = df.loc[row_filter, df.columns[0]]\n", + " gene_ids = list(filtered)\n", + "\n", + " down_list = []\n", + " for gene in gene_ids:\n", + " if \"_\" in gene:\n", + " down_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " down_list.append(gene)\n", + "\n", + " return down_list\n", + "\n", + "\n", + "def csv_to_gmt(input_csv_list, comparisons, filename):\n", + " gmt_dict = {}\n", + " for i, file in enumerate(input_csv_list):\n", + " up_genes = up_gene_list(file)\n", + " down_genes = down_gene_list(file)\n", + " print(f\"up genes: {len(up_genes)}, down genes: {len(down_genes)}\")\n", + " gmt_dict[f\"{comparisons[i]} up genes\"] = up_genes\n", + " gmt_dict[f\"{comparisons[i]} down genes\"] = down_genes\n", + "\n", + " with open(filename, \"w\") as file:\n", + " for s,t in gmt_dict.items():\n", + " file.write(str(s) + \"\\t\\t\" + \"\\t\".join(t) + \"\\n\")\n", + " print(\"Finished creating the GMT file containing the PyDESeq2 DEGs at each time point.\")\n", + " return filename\n", + "\n", + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " deseq2_degs_gmt_1 = csv_to_gmt(adj_time_pt_degs, adj_time_pt_comparisons, \"appyter_deseq2_adj_time_pt_degs.gmt\")\n", + " deseq2_degs_gmt_2 = csv_to_gmt(time_pt_0_degs, time_pt_0_comparisons, \"appyter_deseq2_compare_w_time_pt_0_degs.gmt\")" + ] + }, + { + "cell_type": "markdown", + "id": "16b27a30", + "metadata": {}, + "source": [ + "### Differential Expression Method Option 2: Characteristic Direction\n", + "\n", + "The [Characteristic Direction](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-79) method (Clark et al., 2014) is used to compute the DEGs of time series RNA-seq data given a user-inputted raw gene counts matrix and sample metadata file describing the time points. \n", + "\n", + "DEGs are computed by comparing gene expression at adjacent time points (*t0 vs t1, t1 vs t2, t2 vs t3, ... , tn-1 vs tn*) or with the initial time point (*t0 vs t1, t0 vs t2, t0 vs t3, ... , t0 vs tn*)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8468437f", + "metadata": {}, + "outputs": [], + "source": [ + "### [CD] finding the DEGs from adjacent time pt comparisons\n", + "def run_chea_kg(gene_list, num_tfs):\n", + " \"\"\"\n", + " Outputs JSON of TF subnetwork best corresponding to input gene list.\n", + " \"\"\"\n", + " CHEA_KG = 'https://chea-kg.maayanlab.cloud/api/enrichment'\n", + "\n", + " description = \"insert description here\"\n", + " payload = {\n", + " 'list': (None, \"\\n\".join(gene_list)),\n", + " 'description': (None, description)\n", + " }\n", + " response=requests.post(f\"{CHEA_KG}/addList\", files=payload)\n", + " time.sleep(0.2)\n", + " data = json.loads(response.text)\n", + "\n", + " q = {\n", + " 'min_lib': 3, # minimum number of libraries that a TF must be ranked in\n", + " 'libraries': [\n", + " {'library': \"Integrated--meanRank\", 'term_limit': num_tfs} # edit term_limit to change number of top-ranked TFs\n", + " ],\n", + " 'limit':50, # controls number of edges returned - may cause issues with visualization if too large\n", + " 'userListId': data['userListId']\n", + " }\n", + " query_json=json.dumps(q)\n", + " res = requests.post(CHEA_KG, data=query_json)\n", + " if res.ok:\n", + " data = json.loads(res.text)\n", + " return data\n", + " else:\n", + " data = None\n", + " return res.text\n", + "\n", + "\n", + "def top_tfs(gene_list, num_tfs=5):\n", + " \"\"\"\n", + " Returns a list of the top N most enriched TFs corresponding to an input gene list.\n", + " \"\"\"\n", + " enriched_tfs = run_chea_kg(gene_list, num_tfs)\n", + " tfs_list = []\n", + " for node in enriched_tfs[\"nodes\"]:\n", + " tfs_list.append(node[\"data\"][\"label\"])\n", + " return tfs_list\n", + "\n", + "\n", + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " cd_adj_time_pt_comparisons = []\n", + " cd_degs_1 = []\n", + " cd_tf_time_dict_1 = {}\n", + " for i in range(len(time_pt_list) - 1):\n", + " ctrl_time_pt, case_time_pt = time_pt_dict[i][0], time_pt_dict[i+1][0]\n", + " cd_adj_time_pt_comparisons.append(f\"{ctrl_time_pt} v {case_time_pt}\")\n", + "\n", + " controls, cases = time_pt_dict[i][1], time_pt_dict[i+1][1]\n", + " results_df = characteristic_direction(controls, cases)\n", + "\n", + " up_genes = up_down_from_characteristic_direction(results_df).up\n", + " up_list = []\n", + " for gene in up_genes:\n", + " if \"_\" in gene:\n", + " up_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " up_list.append(gene)\n", + "\n", + " down_genes = up_down_from_characteristic_direction(results_df).down\n", + " down_list = []\n", + " for gene in down_genes:\n", + " if \"_\" in gene:\n", + " down_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " down_list.append(gene)\n", + "\n", + " cd_degs_1.append((up_list, down_list))\n", + " cd_tf_time_dict_1[i] = (top_tfs(up_list, num_tfs), top_tfs(down_list, num_tfs))\n", + "\n", + " print(f\"time point comparison: {ctrl_time_pt}_v_{case_time_pt}\")\n", + " print(f\"total DEGs: {len(up_list) + len(down_list)}, up genes: {len(up_list)}, down genes: {len(down_list)}\")\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f86251ef", + "metadata": {}, + "outputs": [], + "source": [ + "### [CD] finding the DEGs from time pt 0 comparisons\n", + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " cd_time_pt_0_comparisons = []\n", + " cd_degs_2 = []\n", + " cd_tf_time_dict_2 = {}\n", + " for i in range(1, len(time_pt_list)):\n", + " ctrl_time_pt, case_time_pt = time_pt_dict[0][0], time_pt_dict[i][0]\n", + " cd_time_pt_0_comparisons.append(f\"{ctrl_time_pt} v {case_time_pt}\")\n", + "\n", + " controls, cases = time_pt_dict[0][1], time_pt_dict[i][1]\n", + " results_df = characteristic_direction(controls, cases)\n", + "\n", + " up_genes = up_down_from_characteristic_direction(results_df).up\n", + " up_list = []\n", + " for gene in up_genes:\n", + " if \"_\" in gene:\n", + " up_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " up_list.append(gene)\n", + "\n", + " down_genes = up_down_from_characteristic_direction(results_df).down\n", + " down_list = []\n", + " for gene in down_genes:\n", + " if \"_\" in gene:\n", + " down_list.append(gene.split(\"_\", 1)[1])\n", + " else:\n", + " down_list.append(gene)\n", + "\n", + " cd_degs_2.append((up_list, down_list))\n", + " cd_tf_time_dict_2[i-1] = (top_tfs(up_list, num_tfs), top_tfs(down_list, num_tfs))\n", + "\n", + " print(f\"time point comparison: {ctrl_time_pt}_v_{case_time_pt}\")\n", + " print(f\"total DEGs: {len(up_list) + len(down_list)}, up genes: {len(up_list)}, down genes: {len(down_list)}\")\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "f5424dc0", + "metadata": {}, + "source": [ + "## Step 2. Performing TF enrichment analysis\n", + "\n", + "The DEGs computed in Step 1 are submitted to [ChEA3](https://maayanlab.cloud/chea3/) (Keenan et al., 2019) for TF enrichment analysis. The output from this step is a set of enriched TFs for each up and down gene set at each time point. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0175641", + "metadata": {}, + "outputs": [], + "source": [ + "def run_chea_kg(gene_list, num_tfs):\n", + " \"\"\"\n", + " Outputs JSON of TF subnetwork best corresponding to input gene list.\n", + " \"\"\"\n", + " CHEA_KG = 'https://chea-kg.maayanlab.cloud/api/enrichment'\n", + "\n", + " description = \"searching for enriched TFs\"\n", + " payload = {\n", + " 'list': (None, \"\\n\".join(gene_list)),\n", + " 'description': (None, description)\n", + " }\n", + " response=requests.post(f\"{CHEA_KG}/addList\", files=payload)\n", + " time.sleep(0.2)\n", + " data = json.loads(response.text)\n", + "\n", + " q = {\n", + " 'min_lib': 3, # minimum number of libraries that a TF must be ranked in\n", + " 'libraries': [\n", + " {'library': \"Integrated--meanRank\", 'term_limit': num_tfs} # edit term_limit to change number of top-ranked TFs\n", + " ],\n", + " 'limit':50, # controls number of edges returned - may cause issues with visualization if too large\n", + " 'userListId': data['userListId']\n", + " }\n", + " query_json=json.dumps(q)\n", + "\n", + " res = requests.post(CHEA_KG, data=query_json)\n", + " if res.ok:\n", + " data = json.loads(res.text)\n", + " return data\n", + " else:\n", + " data = None\n", + " return res.text\n", + "\n", + "\n", + "def top_tfs(gene_list, num_tfs=5):\n", + " \"\"\"\n", + " Returns a list of the top N most enriched TFs corresponding to an input gene list.\n", + " \"\"\"\n", + " enriched_tfs = run_chea_kg(gene_list, num_tfs)\n", + " tfs_list = []\n", + " for node in enriched_tfs[\"nodes\"]:\n", + " tfs_list.append(node[\"data\"][\"label\"])\n", + " return tfs_list" + ] + }, + { + "cell_type": "markdown", + "id": "a7aa8d98", + "metadata": {}, + "source": [ + "## Step 3. Constructing the regulatory subnetwork\n", + "\n", + "In step 3, a network that connects the enriched TFs at each time point is constructed. A JSON file describing the regulatory subnetwork of enriched TFs is created. Edges between enriched TF nodes are determined using [ChEA-KG](https://chea-kg.maayanlab.cloud/) (project led by Anna Byrd). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28ddb535", + "metadata": {}, + "outputs": [], + "source": [ + "def fetch_chea_kg_data(start_tf, end_tf):\n", + " \"\"\"\n", + " Outputs JSON data for shortest path connecting two TFs.\n", + " \"\"\"\n", + " base_url = \"https://chea-kg.maayanlab.cloud/api/knowledge_graph\"\n", + "\n", + " query_filter = {\n", + " \"start\": \"Transcription Factor\",\n", + " \"start_field\": \"label\",\n", + " \"start_term\": start_tf,\n", + " \"end\": \"Transcription Factor\",\n", + " \"end_field\": \"label\",\n", + " \"end_term\": end_tf\n", + " }\n", + "\n", + " encoded_filter = urllib.parse.quote(str(query_filter).replace(\"'\", '\"'))\n", + " full_url = f\"{base_url}?filter={encoded_filter}\"\n", + "\n", + " response = requests.get(full_url)\n", + " response.raise_for_status()\n", + " return response.json()\n", + "\n", + "\n", + "def get_tf_node_info(tf_label):\n", + " \"\"\"\n", + " Gets node information associated with single TF.\n", + " \"\"\"\n", + " base_url = \"https://chea-kg.maayanlab.cloud/api/knowledge_graph\"\n", + "\n", + " query_filter = {\n", + " \"start\": \"Transcription Factor\",\n", + " \"start_field\": \"label\",\n", + " \"start_term\": tf_label\n", + " }\n", + " encoded_filter = urllib.parse.quote(str(query_filter).replace(\"'\", '\"'))\n", + " full_url = f\"{base_url}?filter={encoded_filter}\"\n", + "\n", + " response = requests.get(full_url)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " for node in data[\"nodes\"]:\n", + " if node[\"data\"][\"label\"] == tf_label:\n", + " return node\n", + " raise ValueError(f\"Node for TF '{tf_label}' not found in response.\")\n", + "\n", + "\n", + "def get_tf_edge_info(tf_label):\n", + " \"\"\"\n", + " Returns edge info if TF is autoregulatory.\n", + " \"\"\"\n", + " base_url = \"https://chea-kg.maayanlab.cloud/api/knowledge_graph\"\n", + "\n", + " query_filter = {\n", + " \"start\": \"Transcription Factor\",\n", + " \"start_field\": \"label\",\n", + " \"start_term\": tf_label\n", + " }\n", + " encoded_filter = urllib.parse.quote(str(query_filter).replace(\"'\", '\"'))\n", + " full_url = f\"{base_url}?filter={encoded_filter}\"\n", + "\n", + " response = requests.get(full_url)\n", + " response.raise_for_status()\n", + " data = response.json()\n", + "\n", + " for edge in data[\"edges\"]:\n", + " if edge[\"data\"][\"source_label\"] == tf_label and edge[\"data\"][\"target_label\"] == tf_label:\n", + " return edge\n", + " raise ValueError(f\"Self-edge for TF '{tf_label}' not found in response.\")\n", + "\n", + "\n", + "def create_tf_time_series_graph(tf_time_dict, num_comparisons, filename):\n", + " \"\"\"\n", + " Creates network of TFs given differentially expressed genes at each time point.\n", + " \"\"\"\n", + " subnetwork = {\"nodes\": [], \"edges\": []}\n", + " for time in tf_time_dict.keys():\n", + " up_tfs = tf_time_dict[time][0]\n", + " down_tfs = tf_time_dict[time][1]\n", + "\n", + " for tf in up_tfs:\n", + " node_info = get_tf_node_info(tf)\n", + " node_info[\"data\"][\"color\"] = \"#80eaff\"\n", + " node_info[\"data\"][\"id\"] = f\"{node_info['data']['label']}_up_{time}\"\n", + " subnetwork[\"nodes\"].append(node_info)\n", + "\n", + " for tf in down_tfs:\n", + " node_info = get_tf_node_info(tf)\n", + " node_info[\"data\"][\"color\"] = \"#ff8a80\"\n", + " node_info[\"data\"][\"id\"] = f\"{node_info['data']['label']}_down_{time}\"\n", + " subnetwork[\"nodes\"].append(node_info)\n", + "\n", + " print(\"Creating time series TF subnetwork data structure...\")\n", + " print(\"Case 1 = connecting two different TFs at adjacent time points\")\n", + " print(\"Case 2 = connecting the same TF at adjacent time points\")\n", + " for i in range(num_comparisons-1):\n", + " for j, source_tf_list in enumerate(tf_time_dict[i]):\n", + " for k, target_tf_list in enumerate(tf_time_dict[i+1]):\n", + " for source_tf in source_tf_list:\n", + " for target_tf in target_tf_list:\n", + " if source_tf != target_tf:\n", + " data = fetch_chea_kg_data(source_tf, target_tf)\n", + " if len(data[\"nodes\"]) == 2 and len(data[\"edges\"]) == 1:\n", + " if data[\"edges\"][0][\"data\"][\"source_label\"] == source_tf and \\\n", + " data[\"edges\"][0][\"data\"][\"target_label\"] == target_tf:\n", + " if j == 0 and k == 0:\n", + " data[\"edges\"][0][\"data\"][\"source\"] = f\"{source_tf}_up_{i}\"\n", + " data[\"edges\"][0][\"data\"][\"target\"] = f\"{target_tf}_up_{i+1}\"\n", + " elif j == 0 and k == 1:\n", + " data[\"edges\"][0][\"data\"][\"source\"] = f\"{source_tf}_up_{i}\"\n", + " data[\"edges\"][0][\"data\"][\"target\"] = f\"{target_tf}_down_{i+1}\"\n", + " elif j == 1 and k == 0:\n", + " data[\"edges\"][0][\"data\"][\"source\"] = f\"{source_tf}_down_{i}\"\n", + " data[\"edges\"][0][\"data\"][\"target\"] = f\"{target_tf}_up_{i+1}\"\n", + " elif j == 1 and k == 1:\n", + " data[\"edges\"][0][\"data\"][\"source\"] = f\"{source_tf}_down_{i}\"\n", + " data[\"edges\"][0][\"data\"][\"target\"] = f\"{target_tf}_down_{i+1}\"\n", + " subnetwork[\"edges\"].append(data[\"edges\"][0])\n", + " print(\"Edge added to subnetwork (case 1)\")\n", + " else:\n", + " if len(source_tf_list) != 0:\n", + " try:\n", + " edge = get_tf_edge_info(source_tf)\n", + " if j == 0 and k == 0:\n", + " edge[\"data\"][\"source\"] = f\"{source_tf}_up_{i}\"\n", + " edge[\"data\"][\"target\"] = f\"{target_tf}_up_{i+1}\"\n", + " elif j == 0 and k == 1:\n", + " edge[\"data\"][\"source\"] = f\"{source_tf}_up_{i}\"\n", + " edge[\"data\"][\"target\"] = f\"{target_tf}_down_{i+1}\"\n", + " elif j == 1 and k == 0:\n", + " edge[\"data\"][\"source\"] = f\"{source_tf}_down_{i}\"\n", + " edge[\"data\"][\"target\"] = f\"{target_tf}_up_{i+1}\"\n", + " elif j == 1 and k == 1:\n", + " edge[\"data\"][\"source\"] = f\"{source_tf}_down_{i}\"\n", + " edge[\"data\"][\"target\"] = f\"{target_tf}_down_{i+1}\"\n", + " subnetwork[\"edges\"].append(edge)\n", + " print(\"Edge added to subnetwork (case 2)\")\n", + " except ValueError:\n", + " continue\n", + " output_path = f\"{filename}.json\"\n", + " with open(output_path, \"w\") as outfile:\n", + " json.dump(subnetwork, outfile, indent=4)\n", + " print(\"Finished creating the time series TF subnetwork graph.\")\n", + " print()\n", + " return output_path\n", + "\n", + "\n", + "def gmt_to_tf_time_dict(gmt_file, num_tfs):\n", + " \"\"\"\n", + " Converts GMT file containing DEGs to tf_time_dict.\n", + " \"\"\"\n", + " print(\"Processing GMT file of DEGs...\")\n", + " with open(gmt_file, 'r') as f:\n", + " lines = f.readlines()\n", + "\n", + " temp_dict = {}\n", + " degs_list = []\n", + " for line in lines:\n", + " tokens = line.split(\"\\t\\t\")\n", + " term = tokens[0]\n", + " genes = [x.split(',')[0].strip() for x in tokens[1].split('\\t')]\n", + " degs_list.append(genes)\n", + " temp_dict[term] = top_tfs(genes, num_tfs) # edit this step to also find the top 10 TFs\n", + "\n", + " new_degs_list = []\n", + " for i in range(0, len(degs_list), 2):\n", + " new_degs_list.append((degs_list[i], degs_list[i+1]))\n", + "\n", + " comparisons = list(temp_dict.keys())\n", + " new_comparisons = []\n", + " for item in comparisons:\n", + " comp = item.rsplit(' ', 2)[0] # extracts \"Hour 1 vs Hour 0\" from \"Hour 1 vs Hour 0 up genes\" in GMT file\n", + " if comp not in new_comparisons:\n", + " new_comparisons.append(comp)\n", + "\n", + " j = 0\n", + " tf_time_dict = {}\n", + " for i in range(len(comparisons) // 2):\n", + " tf_time_dict[i] = (temp_dict[comparisons[j]], temp_dict[comparisons[j+1]])\n", + " j += 2\n", + " return tf_time_dict, new_comparisons, new_degs_list" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea6a3d5d", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter hide_code\n", + "\n", + "{% do SectionField(\n", + " name= 'gmt_section_header',\n", + " title= 'Upload files to this section ONLY if you pre-computed up and down DEGs and saved them in a GMT file.',\n", + " img= 'gene.png'\n", + ")%}\n", + "\n", + "{% do SectionField(\n", + " name= 'gmt_data',\n", + " title= '1. Upload differentially expressed genes as a GMT file.'\n", + ")%}\n", + "\n", + "{% do SectionField(\n", + " name= 'tf_section',\n", + " title= '2. Choose the number of top enriched TFs to include in the visualizations.'\n", + ") %}\n", + "\n", + "{% do SectionField(\n", + " name= 'final_section',\n", + " title= '3. Provide a short description of your study (optional).'\n", + ") %}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a693db24", + "metadata": {}, + "outputs": [], + "source": [ + "%%appyter code_eval\n", + "\n", + "{% do DescriptionField(\n", + " name= 'description_2',\n", + " text= 'Each row should consist of either the \"up\" or \"down\" differentially expressed genes at each time point comparison. The first element of each row should be formatted exactly as \"{time point comparison} up genes\" or \"{time point comparison} down genes\". See example files for how to format the input GMT file.',\n", + " section= 'gmt_data'\n", + ") %}\n", + "\n", + "{% set dataset3 = FileField(\n", + " name= 'dataset3',\n", + " label= 'GMT file containing DEGs from ADJACENT TIME POINT COMPARISONS',\n", + " default= '',\n", + " examples= {'First set of DEGs from TRAIL-treated TNBC cell line': 'https://minio.dev.maayanlab.cloud/chea-kg-timeseries/deseq2_adj_time_pts_degs.gmt'},\n", + " section= 'gmt_data'\n", + ") %}\n", + "\n", + "ds3 = {{dataset3}}\n", + "\n", + "{% set dataset4 = FileField(\n", + " name= 'dataset4',\n", + " label= 'GMT file containing DEGs from COMPARISONS WITH TIME POINT 0',\n", + " default= '',\n", + " examples= {'Second set of DEGS from TRAIL-treated TNBC cell line': 'https://minio.dev.maayanlab.cloud/chea-kg-timeseries/deseq2_compare_w_time_pt_0_degs.gmt'},\n", + " section= 'gmt_data'\n", + ") %}\n", + "\n", + "ds4 = {{dataset4}}\n", + "\n", + "{% set chosen_num = ChoiceField(\n", + " name= 'chosen_num_tfs_2',\n", + " label= 'Number of top enriched TFs',\n", + " default= '5',\n", + " choices= {'5': '1', '10': '2'},\n", + " section= 'tf_section'\n", + ") %}\n", + "\n", + "if not compute_degs:\n", + " num_tfs = {{chosen_num}}\n", + " num_tfs = str(num_tfs)\n", + "\n", + " if num_tfs == '1':\n", + " num_tfs = 5\n", + " if num_tfs == '2':\n", + " num_tfs = 10\n", + "\n", + "{% set user_description_2 = TextField(\n", + " name= 'user_description_2',\n", + " label= 'Description',\n", + " default= '',\n", + " section = 'final_section',\n", + ") %}\n", + "\n", + "if not compute_degs:\n", + " umap_desc = {{user_description_2}}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cde58a65", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " deseq2_tf_time_dict_1, deseq2_adj_time_pt_comparisons, deseq2_degs_1 = gmt_to_tf_time_dict(deseq2_degs_gmt_1, num_tfs)\n", + " deseq2_subnetwork_data_1 = create_tf_time_series_graph(deseq2_tf_time_dict_1, len(deseq2_adj_time_pt_comparisons), \"appyter_deseq2_adj_time_pts_top_5_tfs\")\n", + "\n", + " deseq2_tf_time_dict_2, deseq2_time_pt_0_comparisons, deseq2_degs_2 = gmt_to_tf_time_dict(deseq2_degs_gmt_2, num_tfs)\n", + " deseq2_subnetwork_data_2 = create_tf_time_series_graph(deseq2_tf_time_dict_2, len(deseq2_time_pt_0_comparisons), \"appyter_deseq2_compare_w_time_pt_0_top_5_tfs\")\n", + "\n", + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " cd_subnetwork_data_1 = create_tf_time_series_graph(cd_tf_time_dict_1, len(cd_adj_time_pt_comparisons), \"appyter_cd_adj_time_pts_top_5_tfs\")\n", + " cd_subnetwork_data_2 = create_tf_time_series_graph(cd_tf_time_dict_2, len(cd_time_pt_0_comparisons), \"appyter_cd_compare_w_time_pt_0_top_5_tfs\")\n", + "\n", + "if not compute_degs:\n", + " if ds3:\n", + " tf_time_dict_1, comparisons_1, deseq2_degs_1 = gmt_to_tf_time_dict(ds3, num_tfs)\n", + " subnetwork_data_1 = create_tf_time_series_graph(tf_time_dict_1, len(comparisons_1), \"appyter_deseq2_adj_time_pts_top_5_tfs\")\n", + "\n", + " if ds4:\n", + " tf_time_dict_2, comparisons_2, deseq2_degs_2 = gmt_to_tf_time_dict(ds4, num_tfs)\n", + " subnetwork_data_2 = create_tf_time_series_graph(tf_time_dict_2, len(comparisons_2), \"appyter_deseq2_compare_w_time_pt_0_top_5_tfs\")" + ] + }, + { + "cell_type": "markdown", + "id": "b3140530", + "metadata": {}, + "source": [ + "## Step 4. Visualizing the enriched TFs within a time series regulatory subnetwork\n", + "\n", + "The JSON file created in step 3 is used to visualize the regulatory network using ball-and-stick diagrams. Blue and red nodes correspond to TFs enriched for upregulated and downregulated gene sets at the given time point, respectively. Activation (green) or inhibition (red) arrows are drawn between TFs at adjacent time points (as determined by [ChEA-KG](https://chea-kg.maayanlab.cloud/)), therefore forming a regulatory subnetwork of all the enriched TFs. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d715c5cb", + "metadata": {}, + "outputs": [], + "source": [ + "def export_cytoscape_json(data, filename):\n", + " output_dir = os.environ.get(\"APP_STATIC\", \"/app/static\")\n", + " os.makedirs(output_dir, exist_ok=True)\n", + " json_output_path = os.path.join(output_dir, filename)\n", + "\n", + " with open(json_output_path, \"w\") as f:\n", + " json.dump(data, f, indent=2)\n", + " display(FileLink(json_output_path))\n", + "\n", + "\n", + "def visualize_network_2(data, filename):\n", + " row_col_dict = {}\n", + " for node in data[\"nodes\"]:\n", + " if \"position\" in node:\n", + " continue\n", + " node_id = node[\"data\"][\"id\"]\n", + " row_index = int(node_id.split(\"_\")[-1])\n", + " col_index = row_col_dict.get(row_index, 1)\n", + " row_col_dict[row_index] = col_index + 1 # sets the col_index for the next node in that row\n", + " node[\"position\"] = {\"x\": col_index * 150, \"y\": row_index * 150}\n", + "\n", + " cyto_widget = CytoscapeWidget()\n", + " cyto_widget.graph.add_graph_from_json(data)\n", + " cyto_widget.set_layout(name='preset')\n", + "\n", + " cyto_widget.set_style([{\n", + " 'selector': 'node',\n", + " 'style': {\n", + " 'label': 'data(label)',\n", + " 'background-color': 'data(color)',\n", + " 'border-width': 'data(borderWidth)',\n", + " 'border-color': 'data(borderColor)',\n", + " 'width': 50,\n", + " 'height': 50,\n", + " 'font-size': '16px',\n", + " 'font-weight': 'bold',\n", + " 'text-valign': 'center',\n", + " 'text-halign': 'center'\n", + " }\n", + " }, {\n", + " 'selector': '.row-label',\n", + " 'style': {\n", + " 'label': 'data(label)',\n", + " 'background-color': '#ffffff',\n", + " 'color': '#000000',\n", + " 'font-size': '20px',\n", + " 'width': 5,\n", + " 'height': 5,\n", + " 'text-valign': 'center',\n", + " 'text-halign': 'center'\n", + " }\n", + " }, {\n", + " 'selector': 'edge',\n", + " 'style': {\n", + " 'width': 3,\n", + " 'line-color': 'data(lineColor)',\n", + " 'target-arrow-color': 'data(lineColor)',\n", + " 'target-arrow-shape': 'data(directed)',\n", + " 'curve-style': 'bezier',\n", + " 'arrow-scale': 1.5\n", + " }\n", + " }])\n", + "\n", + " display(cyto_widget)\n", + " json_filename = f\"{filename}.json\"\n", + " export_cytoscape_json(data, json_filename)\n", + "\n", + "\n", + "def run_visualization(input_json, comparisons, filename):\n", + " if isinstance(input_json, dict):\n", + " data = input_json\n", + " else:\n", + " with open(input_json, \"r\") as input_file:\n", + " data = json.load(input_file)\n", + "\n", + " nodes = data[\"nodes\"]\n", + " for i, comparison in enumerate(comparisons):\n", + " label_node = {\n", + " 'data': {\n", + " 'id': f'row_label_{i}',\n", + " 'label': comparison,\n", + " 'color': '#ffffff',\n", + " 'borderColor': '#ffffff',\n", + " 'borderWidth': 0\n", + " },\n", + " 'classes': 'row-label',\n", + " 'position': {\"x\": 0, \"y\": i * 150}\n", + " }\n", + " nodes.append(label_node)\n", + "\n", + " visualize_network_2(data, filename)\n", + " return \"VISUALIZATION HAS RUN\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e94259d", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " for i, comparison in enumerate(deseq2_adj_time_pt_comparisons):\n", + " all_enriched_tfs = list(deseq2_tf_time_dict_1[i][0]) + list(deseq2_tf_time_dict_1[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(deseq2_adj_time_pt_comparisons) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {deseq2_adj_time_pt_comparisons[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(deseq2_adj_time_pt_comparisons) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {deseq2_adj_time_pt_comparisons[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(deseq2_subnetwork_data_1, deseq2_adj_time_pt_comparisons, \"deseq2_adj_time_pt_visualization\")\n", + " display(Markdown(f\"##### *__Figure 1.1__: Subnetwork of enriched TFs determined from comparing gene expression at adjacent time points (DEGs computed using PyDESeq2). Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "eeae420a", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " for i, comparison in enumerate(deseq2_time_pt_0_comparisons):\n", + " all_enriched_tfs = list(deseq2_tf_time_dict_2[i][0]) + list(deseq2_tf_time_dict_2[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(deseq2_time_pt_0_comparisons) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {deseq2_time_pt_0_comparisons[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(deseq2_time_pt_0_comparisons) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {deseq2_time_pt_0_comparisons[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(deseq2_subnetwork_data_2, deseq2_time_pt_0_comparisons, \"deseq2_compare_w_time_pt_0_visualization\")\n", + " display(Markdown(f\"##### *__Figure 1.2__: Subnetwork of enriched TFs determined from comparing gene expression at each time point to time point 0 (DEGs computed using PyDESeq2). Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f2af5a30", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and dge_method == '2':\n", + " fig_nums = ('1.1', '1.2')\n", + "\n", + "if compute_degs and dge_method == '3':\n", + " fig_nums = ('1.3', '1.4')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d2c4abdc", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " for i, comparison in enumerate(cd_adj_time_pt_comparisons):\n", + " all_enriched_tfs = list(cd_tf_time_dict_1[i][0]) + list(cd_tf_time_dict_1[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_1[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(cd_adj_time_pt_comparisons) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_1[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {cd_adj_time_pt_comparisons[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_1[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(cd_adj_time_pt_comparisons) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_1[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {cd_adj_time_pt_comparisons[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(cd_subnetwork_data_1, cd_adj_time_pt_comparisons, \"cd_adj_time_pt_visualization\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[0]}__: Subnetwork of enriched TFs determined from comparing gene expression at adjacent time points (DEGs computed using CD). Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "375df57a", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " for i, comparison in enumerate(cd_time_pt_0_comparisons):\n", + " all_enriched_tfs = list(cd_tf_time_dict_2[i][0]) + list(cd_tf_time_dict_2[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_2[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(cd_time_pt_0_comparisons) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_2[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {cd_time_pt_0_comparisons[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_2[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(cd_time_pt_0_comparisons) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in cd_degs_2[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {cd_time_pt_0_comparisons[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(cd_subnetwork_data_2, cd_time_pt_0_comparisons, \"cd_compare_w_time_pt_0_visualization\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[1]}__: Subnetwork of enriched TFs determined from comparing gene expression at each time point to time point 0 (DEGs computed using CD). Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64243b0a", + "metadata": {}, + "outputs": [], + "source": [ + "if not compute_degs:\n", + " if ds3:\n", + " for i, comparison in enumerate(comparisons_1):\n", + " all_enriched_tfs = list(tf_time_dict_1[i][0]) + list(tf_time_dict_1[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(comparisons_1) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparisons_1[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(comparisons_1) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_1[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparisons_1[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(subnetwork_data_1, comparisons_1, \"adj_time_pt_visualization\")\n", + " display(Markdown(f\"##### *__Figure 1.1__: Subnetwork of enriched TFs determined from comparing gene expression at adjacent time points. Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))\n", + "\n", + " if ds4:\n", + " for i, comparison in enumerate(comparisons_2):\n", + " all_enriched_tfs = list(tf_time_dict_2[i][0]) + list(tf_time_dict_2[i][1])\n", + " # print(all_enriched_tfs)\n", + "\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} up genes: {up_enriched_tfs}\")\n", + "\n", + " if i < (len(comparisons_2) - 1):\n", + " up_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i+1][0]:\n", + " up_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparisons_2[i+1]} up genes: {up_enriched_tfs}\")\n", + "\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparison} down genes: {down_enriched_tfs}\")\n", + "\n", + " if i < (len(comparisons_2) - 1):\n", + " down_enriched_tfs = []\n", + " for tf in all_enriched_tfs:\n", + " if tf in deseq2_degs_2[i+1][1]:\n", + " down_enriched_tfs.append(tf)\n", + " print(f\"{comparison} enriched TFs found in {comparisons_2[i+1]} down genes: {down_enriched_tfs}\")\n", + "\n", + " run_visualization(subnetwork_data_2, comparisons_2, \"compare_w_time_pt_0_visualization\")\n", + " display(Markdown(f\"##### *__Figure 1.2__: Subnetwork of enriched TFs determined from comparing gene expression at each time point to time point 0. Blue TFs are enriched for up genes while red TFs are enriched for down genes. Activation and inhibition arrows depict the potential regulatory effects of a TF in an earlier time point on a TF in the subsequent time point.*\"))" + ] + }, + { + "cell_type": "markdown", + "id": "92ffcff8", + "metadata": {}, + "source": [ + "## Step 5. Bar graph representation of enriched TF ranks\n", + "\n", + "The bar graphs below depict the ranks of the top enriched TFs across the TF-target gene set libraries in ChEA3 (Keenan et al., 2019). Bar graphs are shown for the enriched TFs for the \"up\" and \"down\" gene sets at each time point. \n", + "\n", + "Here, a TF's rank within a given library refers to how well the TF's target genes within that library overlap with the input gene set (e.g. up genes at the \"hour 3 vs hour 1\" comparison) compared to other TFs within that library. The top enriched TFs are computed by finding the TFs with the lowest average rank across all six ChEA3 libraries (referred to as the MeanRank method). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8a6c577", + "metadata": {}, + "outputs": [], + "source": [ + "def get_chea3_results(gene_set, query_name):\n", + " ADDLIST_URL = 'https://maayanlab.cloud/chea3/api/enrich/'\n", + " payload = {\n", + " 'gene_set': gene_set,\n", + " 'query_name': query_name\n", + " }\n", + " response = requests.post(ADDLIST_URL, data=json.dumps(payload))\n", + " if not response.ok:\n", + " # r.ok (where r is the object) returns whether the call to the url was successful\n", + " raise Exception('Error analyzing gene list')\n", + " time.sleep(1)\n", + " return json.loads(response.text) # .text returns the content of response in\n", + "\n", + "\n", + "def indexfinder(lib_score_list, value):\n", + " index = 1\n", + " for num in lib_score_list:\n", + " if num == value:\n", + " return index\n", + " elif num != 0:\n", + " index += 1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a531fe3e", + "metadata": {}, + "outputs": [], + "source": [ + "def get_bar_graph(gene_set, query_name, num_tfs, title):\n", + " term_limit = num_tfs\n", + "\n", + " c_lib_palette = {'ARCHS4 Coexpression':'rgb(196, 8, 8)',\n", + " 'ENCODE ChIP-seq':'rgb(244, 109, 67)',\n", + " 'Enrichr Queries':'rgb(242, 172, 68)',\n", + " 'GTEx Coexpression':'rgb(236, 252, 68)',\n", + " 'Literature ChIP-seq':'rgb(165, 242, 162)',\n", + " 'ReMap ChIP-seq':'rgb(92, 217, 78)'}\n", + "\n", + " c_lib_means = {'ARCHS4 Coexpression': [0] * term_limit, 'ENCODE ChIP-seq': [0] * term_limit,\n", + " 'Enrichr Queries': [0] * term_limit, 'GTEx Coexpression': [0] * term_limit,\n", + " 'Literature ChIP-seq': [0] * term_limit, 'ReMap ChIP-seq': [0] * term_limit}\n", + " # creates a dictionary where each library is a key, and the values are empty lists with as\n", + " # many indices/spaces as the user has requested transcription factors (ex: if the user requests\n", + " # 15 TFs to be returned, the lists will have 15 spaces)\n", + "\n", + " libs_sorted = ['ARCHS4 Coexpression','ENCODE ChIP-seq','Enrichr Queries',\n", + " 'GTEx Coexpression','Literature ChIP-seq','ReMap ChIP-seq']\n", + "\n", + " results = get_chea3_results(gene_set, query_name)\n", + " mr_results = results['Integrated--meanRank']\n", + " # for MeanRank, the TFs are already ranked by 'Score' within mr_results\n", + "\n", + " for i in range(len(mr_results)):\n", + " for lib in libs_sorted:\n", + " mr_results[i].update({lib:0})\n", + "\n", + " for i in range(len(mr_results)):\n", + " thing = mr_results[i]['Library'].split(';')\n", + " for a in range(len(thing)):\n", + " library, value = thing[a].split(',')\n", + " mr_results[i].update({library:int(value)})\n", + "\n", + " sortedARCHS4 = sorted(mr_results, key = lambda k: k['ARCHS4 Coexpression'])\n", + " sortedGTEx = sorted(mr_results, key = lambda k: k['GTEx Coexpression'])\n", + " sortedEnrichr = sorted(mr_results, key = lambda k: k['Enrichr Queries'])\n", + " sortedENCODE = sorted(mr_results, key = lambda k: k['ENCODE ChIP-seq'])\n", + " sortedReMap = sorted(mr_results, key = lambda k: k['ReMap ChIP-seq'])\n", + " sortedLit = sorted(mr_results, key = lambda k: k['Literature ChIP-seq'])\n", + "\n", + " rankedARCHS4 = [entry['ARCHS4 Coexpression'] for entry in sortedARCHS4]\n", + " rankedENCODE = [entry['ENCODE ChIP-seq'] for entry in sortedENCODE]\n", + " rankedEnrichr = [entry['Enrichr Queries'] for entry in sortedEnrichr]\n", + " rankedGTEx = [entry['GTEx Coexpression'] for entry in sortedGTEx]\n", + " rankedLit = [entry['Literature ChIP-seq'] for entry in sortedLit]\n", + " rankedReMap = [entry['ReMap ChIP-seq'] for entry in sortedReMap]\n", + "\n", + " ranking_dict = {'ARCHS4 Coexpression':rankedARCHS4,\n", + " 'ENCODE ChIP-seq':rankedENCODE,\n", + " 'Enrichr Queries':rankedEnrichr,\n", + " 'GTEx Coexpression':rankedGTEx,\n", + " 'Literature ChIP-seq':rankedLit,\n", + " 'ReMap ChIP-seq':rankedReMap}\n", + "\n", + " # Computing MeanRank\n", + " for tfentry in mr_results:\n", + " tfentry.update( [('SumRank', 0), ('AvgRank', 0) ])\n", + " library_scores = tfentry['Library'].split(';')\n", + " lib_counter = 0\n", + " for a in library_scores:\n", + " l, v = a.split(',')\n", + " v = int(v)\n", + " #scorerank = ranking_dict[l].index(v) + 1\n", + " scorerank = indexfinder(ranking_dict[l], int(v))\n", + " tfentry['SumRank'] += int(scorerank)\n", + " lib_counter += 1\n", + " tfentry['AvgRank'] = (tfentry['SumRank'] / lib_counter)\n", + " sorted_results = sorted(mr_results, key = lambda k: k['AvgRank']) # rank by AvgRank (aka MeanRank method)\n", + "\n", + " sorted_top_results = [] # only adds num_tfs # of TFs\n", + " threshold = 3\n", + " index = 0\n", + " while (len(sorted_top_results) < term_limit) and (index < len(sorted_results)):\n", + " if len(sorted_results[index]['Library'].split(';')) >= threshold: # makes sure there are enough libraries\n", + " sorted_top_results.append(sorted_results[index])\n", + " index += 1\n", + "\n", + " sorted_top_results = sorted_top_results[::-1]\n", + "\n", + " sorted_tfs = []\n", + " for i in range(0, len(sorted_top_results)):\n", + " sorted_tfs.append(sorted_top_results[i].get('TF'))\n", + "\n", + " # Defining bar length\n", + " for i, tfentry in enumerate(sorted_top_results):\n", + " libscores = tfentry['Library'].split(';')\n", + " for a in libscores:\n", + " lib, value = a.split(',')\n", + " rank = indexfinder(ranking_dict[lib], int(value))\n", + " avg = tfentry['AvgRank']\n", + " tot = tfentry['SumRank']\n", + " bar_length = (rank*avg)/tot\n", + " c_lib_means[lib][i] = float(bar_length)\n", + "\n", + " # Plotting the actual bar chart\n", + " fig = go.Figure(data = [go.Bar(name = c_lib,\n", + " x = c_lib_means[c_lib],\n", + " y = sorted_tfs,\n", + " marker = go.bar.Marker(color = c_lib_palette[c_lib]),\n", + " orientation = 'h')\n", + " for c_lib in libs_sorted])\n", + " h = 400 if term_limit <=10 else 400+10*term_limit\n", + " fig.update_layout(barmode = 'stack')\n", + " '''fig.update_layout(\n", + " title = {\n", + " 'text': 'Stacked Bar Chart of Average Ranks in Different Libraries',\n", + " 'y': 0.67,\n", + " 'x': 0.5,\n", + " 'xanchor': 'center',\n", + " 'yanchor': 'top',\n", + " }\n", + " )'''\n", + " fig.update_layout(\n", + "\n", + " xaxis_title = 'Average of Ranks Across All Libraries',\n", + " yaxis_title = title,\n", + " font = dict(\n", + " size = 12,\n", + " color = 'black'\n", + " ),\n", + " width=900,\n", + " height=h\n", + " )\n", + " # uid = uuid.uuid4().hex[:8]\n", + " # folder = Path(\"chea3_graph_exports\")\n", + " # folder.mkdir(exist_ok=True)\n", + "\n", + " # svg_path = folder / f\"bar_plot_{uid}.svg\"\n", + " # png_path = folder / f\"bar_plot_{uid}.png\"\n", + " # jpg_path = folder / f\"bar_plot_{uid}.jpg\"\n", + "\n", + " # pio.write_image(fig, str(svg_path), format=\"svg\")\n", + " # pio.write_image(fig, str(png_path), format=\"png\")\n", + " # pio.write_image(fig, str(jpg_path), format=\"jpg\")\n", + "\n", + " # html_str = fig.to_html(include_plotlyjs='cdn')\n", + " # escaped_html = html.escape(html_str)\n", + " # iframe = f\"\"\"\n", + " # \n", + " # \"\"\"\n", + "\n", + " # download_html = f\"\"\"\n", + " #
\n", + " # Download plot:
\n", + " # Download as PNG
\n", + " #
\n", + " # \"\"\"\n", + "\n", + " # display(HTML(iframe + download_html))\n", + "\n", + " html_str = fig.to_html(include_plotlyjs='cdn')\n", + " escaped_html = html.escape(html_str)\n", + "\n", + " iframe = f\"\"\"\n", + " \n", + " \"\"\"\n", + "\n", + " display(HTML(iframe))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "361423c6", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " j = 1\n", + " for i, pair in enumerate(deseq2_degs_1):\n", + " up, down = pair\n", + " get_bar_graph(up, deseq2_adj_time_pt_comparisons[i], num_tfs, f\"Enriched TFs at {deseq2_adj_time_pt_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.1.{j}__: Ranks of enriched TFs for UP genes at {deseq2_adj_time_pt_comparisons[i]} (computed using DESeq2) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, deseq2_adj_time_pt_comparisons[i], num_tfs, f\"Enriched TFs at {deseq2_adj_time_pt_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.1.{j+1}__: Ranks of enriched TFs for DOWN genes at {deseq2_adj_time_pt_comparisons[i]} (computed using DESeq2) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3e9a0066", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " j = 1\n", + " for i, pair in enumerate(deseq2_degs_2):\n", + " up, down = pair\n", + " get_bar_graph(up, deseq2_time_pt_0_comparisons[i], num_tfs, f\"Enriched TFs at {deseq2_time_pt_0_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.2.{j}__: Ranks of enriched TFs for UP genes at {deseq2_time_pt_0_comparisons[i]} (computed using DESeq2) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, deseq2_time_pt_0_comparisons[i], num_tfs, f\"Enriched TFs at {deseq2_time_pt_0_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.2.{j+1}__: Ranks of enriched TFs for DOWN genes at {deseq2_time_pt_0_comparisons[i]} (computed using DESeq2) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f4dd8c65", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and dge_method == '2':\n", + " fig_nums = ('2.1', '2.2')\n", + "\n", + "if compute_degs and dge_method == '3':\n", + " fig_nums = ('2.3', '2.4')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9da4fad", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " j = 1\n", + " for i, pair in enumerate(cd_degs_1):\n", + " up, down = pair\n", + " get_bar_graph(up, cd_adj_time_pt_comparisons[i], num_tfs, f\"Enriched TFs at {cd_adj_time_pt_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[0]}.{j}__: Ranks of enriched TFs for UP genes at {cd_adj_time_pt_comparisons[i]} (computed using CD) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, cd_adj_time_pt_comparisons[i], num_tfs, f\"Enriched TFs at {cd_adj_time_pt_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[0]}.{j+1}__: Ranks of enriched TFs for DOWN genes at {cd_adj_time_pt_comparisons[i]} (computed using CD) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22ed63a6", + "metadata": {}, + "outputs": [], + "source": [ + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " j = 1\n", + " for i, pair in enumerate(cd_degs_2):\n", + " up, down = pair\n", + " get_bar_graph(up, cd_time_pt_0_comparisons[i], num_tfs, f\"Enriched TFs at {cd_time_pt_0_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[1]}.{j}__: Ranks of enriched TFs for UP genes at {cd_time_pt_0_comparisons[i]} (computed using CD) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, cd_time_pt_0_comparisons[i], num_tfs, f\"Enriched TFs at {cd_time_pt_0_comparisons[i]}\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[1]}.{j+1}__: Ranks of enriched TFs for DOWN genes at {cd_time_pt_0_comparisons[i]} (computed using CD) in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2b485ae4", + "metadata": {}, + "outputs": [], + "source": [ + "if not compute_degs:\n", + " if ds3:\n", + " j = 1\n", + " for i, pair in enumerate(deseq2_degs_1):\n", + " up, down = pair\n", + " get_bar_graph(up, comparisons_1[i], num_tfs, f\"Enriched TFs at {comparisons_1[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.1.{j}__: Ranks of enriched TFs for UP genes at {comparisons_1[i]} in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, comparisons_1[i], num_tfs, f\"Enriched TFs at {comparisons_1[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.1.{j+1}__: Ranks of enriched TFs for DOWN genes at {comparisons_1[i]} in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2\n", + "\n", + " if ds4:\n", + " j = 1\n", + " for i, pair in enumerate(deseq2_degs_2):\n", + " up, down = pair\n", + " get_bar_graph(up, comparisons_2[i], num_tfs, f\"Enriched TFs at {comparisons_2[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.2.{j}__: Ranks of enriched TFs for UP genes at {comparisons_2[i]} in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " get_bar_graph(down, comparisons_2[i], num_tfs, f\"Enriched TFs at {comparisons_2[i]}\")\n", + " display(Markdown(f\"##### *__Figure 2.2.{j+1}__: Ranks of enriched TFs for DOWN genes at {comparisons_2[i]} in their respective ChEA3 libraries. Bar length is proportional to rank (a narrower bar means a better rank).*\"))\n", + " j += 2" + ] + }, + { + "cell_type": "markdown", + "id": "9c947143", + "metadata": {}, + "source": [ + "## Step 6. Generating a UMAP visualization of the enriched TFs\n", + "\n", + "The enriched transcription factors from each time point are colored on a UMAP plot of 700 TFs identified by ChEA-KG to be \"source TFs\" (i.e. they exert regulatory effects on other TFs). The UMAP algorithm was performed using the TF-IDF scores of the TFs' target genes, meaning that TFs are placed in the same cluster if they generally regulate similar genes. \n", + "\n", + "Hovering over each data point reveals the gene identity of the TF, the cluster the TF belongs to, and if the TF is enriched at that time point. \n", + "\n", + "Use the slider to navigate between time points or click on the corresponding GIF link." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "90965bc8", + "metadata": {}, + "outputs": [], + "source": [ + "def process_tf_data(libdict, nneighbors=30, mindist=0.1, spread=1.0, maxdf=1.0, mindf=1):\n", + " # print(\"\\tTF-IDF vectorizing gene set data...\")\n", + " vec = TfidfVectorizer(max_df=maxdf, min_df=mindf)\n", + " X = vec.fit_transform(libdict.values())\n", + " adata = anndata.AnnData(X)\n", + " adata.obs.index = libdict.keys()\n", + "\n", + " # print(\"\\tPerforming Leiden clustering...\")\n", + " sc.pp.neighbors(adata, n_neighbors=nneighbors) # added use_rep='X'\n", + " sc.tl.leiden(adata, resolution=1.0)\n", + " sc.tl.umap(adata, min_dist=mindist, spread=spread, random_state=42)\n", + "\n", + " new_order = adata.obs.sort_values(by='leiden').index.tolist()\n", + " adata = adata[new_order, :] # added .copy()\n", + " adata.obs['leiden'] = 'Cluster ' + adata.obs['leiden'].astype('object')\n", + "\n", + " df = pd.DataFrame(adata.obsm['X_umap'])\n", + " df.columns = ['x', 'y']\n", + "\n", + " df['cluster'] = adata.obs['leiden'].values\n", + " df['term'] = adata.obs.index\n", + " df['genes'] = [libdict[l] for l in df['term']]\n", + "\n", + " return df\n", + "\n", + "\n", + "def get_scatter_colors(df):\n", + " clusters = pd.unique(df['cluster']).tolist()\n", + " n_clusters = len(clusters)\n", + " gray_shades = [f'#{int(v):02x}{int(v):02x}{int(v):02x}' for v in np.linspace(50, 230, n_clusters)]\n", + " color_mapper = {clusters[i]: gray_shades[i] for i in range(n_clusters)}\n", + " return color_mapper\n", + "\n", + "\n", + "def generate_df_for_comparison(base_df, tf_pair, comparison_label):\n", + " \"\"\"\n", + " Generates a new df for each time point.\n", + " \"\"\"\n", + " up_tfs, down_tfs = tf_pair[0], tf_pair[1]\n", + " df = base_df.copy()\n", + " color_mapper = get_scatter_colors(df)\n", + " df['color'] = df['cluster'].apply(lambda x: color_mapper[x])\n", + " df['size'] = 6\n", + " df['time_point'] = \"Not enriched\"\n", + "\n", + " for idx, term in df['term'].items():\n", + " if (term in up_tfs) and (term not in down_tfs):\n", + " df.at[idx, 'color'] = \"#1595f0\"\n", + " df.at[idx, 'size'] = 12\n", + " df.at[idx, 'time_point'] = comparison_label\n", + " if (term in down_tfs) and (term not in up_tfs):\n", + " df.at[idx, 'color'] = \"#f30a1a\"\n", + " df.at[idx, 'size'] = 12\n", + " df.at[idx, 'time_point'] = comparison_label\n", + " if (term in up_tfs) and (term in down_tfs):\n", + " df.at[idx, 'color'] = \"#26e411\"\n", + " df.at[idx, 'size'] = 12\n", + " df.at[idx, 'time_point'] = comparison_label\n", + " return df\n", + "\n", + "\n", + "def generate_legend_descriptions(df, tf_time_dict, comparisons, user_desc):\n", + " descriptions = []\n", + " for i in range(len(comparisons)):\n", + " up_tfs, down_tfs = tf_time_dict[i]\n", + " up_only = sorted([tf for tf in df['term'].tolist() if (tf in up_tfs and tf not in down_tfs)])\n", + " down_only = sorted([tf for tf in df['term'].tolist() if (tf not in up_tfs and tf in down_tfs)])\n", + " both = sorted([tf for tf in df['term'].tolist() if (tf in up_tfs and tf in down_tfs)])\n", + "\n", + " parts = []\n", + " if up_only:\n", + " parts.append(\"Enriched transcription factors for up genes: \" + \", \".join(up_only))\n", + " if down_only:\n", + " parts.append(\"Enriched transcription factors for down genes: \" + \", \".join(down_only))\n", + " if both:\n", + " parts.append(\"Enriched transcription factors for up AND down genes: \" + \", \".join(both))\n", + "\n", + " description = str(comparisons[i] + \"\\n\")\n", + " description += \"\\n\".join(parts)\n", + " description += \"\\n\\n\" + user_desc.strip()\n", + " descriptions.append(description)\n", + " return descriptions\n", + "\n", + "\n", + "def get_scatterplot(scatterdf, tf_time_dict=None, comparisons=None, image_dir=None, gif_filename=None):\n", + " \"\"\"\n", + " Generates images navigable via a slider, as well as all the images separately.\n", + " \"\"\"\n", + " df = scatterdf.copy()\n", + " df['cluster_number'] = df['cluster'].apply(lambda x: int(x.split(\" \")[-1]))\n", + " df.sort_values(by=['cluster_number'], inplace=True)\n", + " df.drop(columns = ['cluster_number'], inplace=True)\n", + "\n", + " sources = []\n", + " for i, label in enumerate(comparisons):\n", + " df_comp = generate_df_for_comparison(df, tf_time_dict[i], label)\n", + " source = ColumnDataSource(data=dict(x = df_comp['x'], y = df_comp['y'],\n", + " gene_set = df_comp['term'], colors = df_comp['color'],\n", + " label = df_comp['cluster'], size = df_comp['size'],\n", + " time_point = df_comp['time_point']))\n", + " sources.append(source)\n", + "\n", + " # source = sources[0]\n", + " source = ColumnDataSource(data=deepcopy(sources[0].data))\n", + " tooltips = [\n", + " (\"Gene Set\", \"@gene_set\"),\n", + " (\"Cluster\", \"@label\"),\n", + " (\"Time point\", \"@time_point\")\n", + " ]\n", + "\n", + " hover_emb = HoverTool(tooltips=tooltips)\n", + " tools_emb = [hover_emb, 'pan', 'wheel_zoom', 'reset', 'save']\n", + "\n", + " plot_emb = figure(\n", + " width=500*2,\n", + " height=400*2,\n", + " tools=tools_emb,\n", + " output_backend='canvas'\n", + " )\n", + "\n", + " plot_emb.scatter(\n", + " 'x',\n", + " 'y',\n", + " size = 'size',\n", + " source = source,\n", + " marker = 'circle',\n", + " fill_color = 'colors',\n", + " color = 'colors'\n", + " )\n", + "\n", + " color_mapper = get_scatter_colors(df)\n", + " for cluster, gray_color in color_mapper.items():\n", + " dummy_source = ColumnDataSource(data=dict(x=[None], y=[None]))\n", + " plot_emb.scatter(\n", + " 'x',\n", + " 'y',\n", + " source=dummy_source,\n", + " fill_color=gray_color,\n", + " color=gray_color,\n", + " marker='circle',\n", + " size=10,\n", + " legend_label=cluster\n", + " )\n", + "\n", + " # hide axis labels and grid lines\n", + " # plot_title = Title(text=comparisons[0], align='center')\n", + " # plot_title.text_font_size = '20pt'\n", + " # plot_title.text_font_style = 'bold'\n", + " # plot_emb.add_layout(plot_title, 'above')\n", + "\n", + " plot_emb.xaxis.major_tick_line_color = None\n", + " plot_emb.xaxis.minor_tick_line_color = None\n", + " plot_emb.yaxis.major_tick_line_color = None\n", + " plot_emb.yaxis.minor_tick_line_color = None\n", + " plot_emb.grid.grid_line_color = None\n", + " plot_emb.xaxis.major_label_text_font_size = '0pt'\n", + " plot_emb.yaxis.major_label_text_font_size = '0pt'\n", + "\n", + " plot_emb.xaxis.axis_label = \"UMAP-1\"\n", + " plot_emb.yaxis.axis_label = \"UMAP-2\"\n", + " plot_emb.xaxis.axis_label_text_font_size = '20pt'\n", + " plot_emb.yaxis.axis_label_text_font_size = '20pt'\n", + " plot_emb.xaxis.axis_label_text_font_style = \"normal\"\n", + " plot_emb.yaxis.axis_label_text_font_style = \"normal\"\n", + "\n", + " plot_emb.legend.label_text_font_size = '18pt'\n", + " plot_emb.legend.glyph_height = 20\n", + " plot_emb.legend.glyph_width = 20\n", + "\n", + " plot_emb.add_layout(plot_emb.legend[0], 'right')\n", + "\n", + " plot_emb.min_border_bottom = 168\n", + "\n", + " legend_descriptions = generate_legend_descriptions(df, tf_time_dict, comparisons, umap_desc)\n", + " description_source = ColumnDataSource(data=dict(descriptions=legend_descriptions))\n", + " description_label = Label(x=0, y=-7, x_units='screen', y_units='screen',\n", + " text=legend_descriptions[0],\n", + " text_font_size='10pt', text_align='left', name='dynamic_description')\n", + " plot_emb.add_layout(description_label, 'below')\n", + "\n", + " ### adding a slider ###\n", + " slider = Slider(start=0, end=len(sources) - 1, value=0, step=1, title=\"Comparison\")\n", + " comparison_source = ColumnDataSource(data=dict(comparisons=[str(c) for c in comparisons]))\n", + " callback = CustomJS(args=dict(source=source, slider=slider, sources=sources, plot=plot_emb,\n", + " comparison_source=comparison_source, description_source=description_source,\n", + " description_label=description_label), code=\"\"\"\n", + " const i = slider.value;\n", + " const new_data = sources[i].data;\n", + " const copied_data = {};\n", + " for (const key in new_data) {\n", + " copied_data[key] = [...new_data[key]];\n", + " }\n", + " source.data = copied_data;\n", + "\n", + " // Update legend description\n", + " const descriptions = description_source.data['descriptions'];\n", + " description_label.text = descriptions[i];\n", + "\n", + " source.change.emit();\n", + " \"\"\")\n", + " slider.js_on_change('value', callback)\n", + " show(column(slider, plot_emb))\n", + "\n", + " ### can either show or save the plot (cannot do both)\n", + " # output_file(\"top_10_tfs_deseq2_adjacent_time_pts_umap_plot.html\")\n", + " # save(column(slider, plot_emb))\n", + "\n", + " ### for isolated individual time point images ###\n", + " frame_dir = os.path.join(os.getcwd(), image_dir)\n", + " os.makedirs(frame_dir, exist_ok=True)\n", + " for i, label in enumerate(comparisons):\n", + " source.data = dict(sources[i].data)\n", + " description_label.text = legend_descriptions[i]\n", + " # plot_title.text = label\n", + " export_png(plot_emb, filename=os.path.join(frame_dir, f\"frame_{i:02d}_{label}.png\"))\n", + "\n", + " frame_paths = sorted([os.path.join(frame_dir, f) for f in os.listdir(frame_dir) if f.endswith(\".png\")])\n", + " images = [Image.open(frame) for frame in frame_paths]\n", + "\n", + " output_dir = os.environ.get(\"APP_STATIC\", \"/app/static\")\n", + " os.makedirs(output_dir, exist_ok=True)\n", + " new_gif_filename = os.path.join(output_dir, gif_filename)\n", + "\n", + " images[0].save(new_gif_filename, save_all=True, append_images=images[1:], duration=1500, loop=0)\n", + " display(FileLink(new_gif_filename, result_html_prefix=\"Click here to download: \"))\n", + " print(\"GIF of UMAP plot was successfully created.\")\n", + "\n", + " return plot_emb, source" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04ed8470", + "metadata": {}, + "outputs": [], + "source": [ + "r = requests.get(\"https://minio.dev.maayanlab.cloud/hgrn-chear/network_target_sets.gmt\")\n", + "file = r.text.split(\"\\n\")\n", + "\n", + "lib_dict = OrderedDict()\n", + "for line in file[:-1]:\n", + " tokens = line.split(\"\\t\\t\")\n", + " term = tokens[0]\n", + " genes = [x.split(',')[0].strip() for x in tokens[1].split('\\t')]\n", + " lib_dict[term] = ' '.join(genes)\n", + "\n", + "## defaults: nneighbors=30, mindist=0.1, spread=1.0, maxdf=1.0, mindf=1\n", + "scatter_df = process_tf_data(\n", + " lib_dict,\n", + " nneighbors=20,\n", + " mindist=0.15,\n", + ")\n", + "\n", + "legend_description = (\"Blue dots are TFs enriched for upregulated DEGs.\\n\"\n", + " \"Red dots are TFs enriched for downregulated DEGs.\\n\"\n", + " \"Green dots are TFs enriched for both up- and downregulated DEGs.\\n\")\n", + "legend_description += umap_desc\n", + "\n", + "if compute_degs and (dge_method == '1' or dge_method == '3'):\n", + " deseq2_plot_emb_1, deseq2_source_1 = get_scatterplot(scatter_df, deseq2_tf_time_dict_1, deseq2_adj_time_pt_comparisons, \"umap_png_frames_deseq2_adjacent_time_pts\", f\"top_{num_tfs}_tfs_deseq2_adjacent_time_pts_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure 3.1__: UMAP of the enriched TFs determined from comparing gene expression at adjacent time points (DEGs computed using PyDESeq2). The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))\n", + " deseq2_plot_emb_2, deseq2_source_2 = get_scatterplot(scatter_df, deseq2_tf_time_dict_2, deseq2_time_pt_0_comparisons, \"umap_png_frames_deseq2_compare_w_time_pt_0\", f\"top_{num_tfs}_tfs_deseq2_compare_w_time_pt_0_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure 3.2__: UMAP of the enriched TFs determined from comparing gene expression at each time point to time point 0 (DEGs computed using PyDESeq2). The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))\n", + "\n", + "if compute_degs and dge_method == '2':\n", + " fig_nums = ('3.1', '3.2')\n", + "\n", + "if compute_degs and dge_method == '3':\n", + " fig_nums = ('3.3', '3.4')\n", + "\n", + "if compute_degs and (dge_method == '2' or dge_method == '3'):\n", + " cd_plot_emb_1, cd_source_1 = get_scatterplot(scatter_df, cd_tf_time_dict_1, cd_adj_time_pt_comparisons, \"umap_png_frames_cd_adjacent_time_pts\", f\"top_{num_tfs}_tfs_cd_adjacent_time_pts_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[0]}__: UMAP of the enriched TFs determined from comparing gene expression at adjacent time points (DEGs computed using CD). The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))\n", + " cd_plot_emb_2, cd_source_2 = get_scatterplot(scatter_df, cd_tf_time_dict_2, cd_time_pt_0_comparisons, \"umap_png_frames_cd_compare_w_time_pt_0\", f\"top_{num_tfs}_tfs_cd_compare_w_time_pt_0_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure {fig_nums[1]}__: UMAP of the enriched TFs determined from comparing gene expression at each time point to time point 0 (DEGs computed using CD). The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))\n", + "\n", + "if not compute_degs:\n", + " if ds3:\n", + " plot_emb_1, source_1 = get_scatterplot(scatter_df, tf_time_dict_1, comparisons_1, \"umap_png_frames_deseq2_adjacent_time_pts\", f\"top_{num_tfs}_tfs_deseq2_adjacent_time_pts_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure 3.1__: UMAP of the enriched TFs determined from comparing gene expression at adjacent time points. The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))\n", + "\n", + " if ds4:\n", + " plot_emb_2, source_2 = get_scatterplot(scatter_df, tf_time_dict_2, comparisons_2, \"umap_png_frames_deseq2_compare_w_time_pt_0\", f\"top_{num_tfs}_tfs_deseq2_compare_w_time_pt_0_umap.gif\")\n", + " display(Markdown(f\"##### *__Figure 3.2__: UMAP ofthe enriched TFs determined from comparing gene expression at each time point to time point 0. The top {num_tfs} TFs for the up- and downregulated DEGs are shown in blue and red, respectively, on the UMAP plot. Green TFs (if any) are enriched in both the up and down gene sets.*\"))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "appyter_310", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.18" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/appyters/chea_kg_ts/requirements.txt b/appyters/chea_kg_ts/requirements.txt new file mode 100644 index 00000000..d756bfcd --- /dev/null +++ b/appyters/chea_kg_ts/requirements.txt @@ -0,0 +1,25 @@ +# python dependencies should be collected and kept in this file +# one line for each dependency. this file can be used to install +# dependencies with `pip install -r requirements.txt` +appyter +python-dotenv +flask +gunicorn +numpy +pandas +ipython +requests +ipycytoscape +dash +dash_cytoscape +plotly +scikit-learn +igraph +leidenalg +scanpy +anndata +bokeh +selenium +Pillow +maayanlab-bioinformatics @ git+https://github.com/Maayanlab/maayanlab-bioinformatics +pydeseq2