diff --git a/manuals/example_1.md b/manuals/example_1.md
index 03462392..3020a628 100644
--- a/manuals/example_1.md
+++ b/manuals/example_1.md
@@ -1,10 +1,12 @@
-# Example1
+# AlphaPulldown manual:
+
+# Example 1
 # Aim: Find proteins involving human translation pathway that might interact with eIF4G2
 ## 1st step: compute multiple sequence alignment (MSA) and template features (run on CPUs)
-For the purpose of this manual, the expected file is already provided here: [`example_1_sequences.fasta`](../example_data/example_1_sequences.fasta). If you want to run a smaller test, you can use [`example_1_sequences_shorter.fasta`](../example_data/example_1_sequences_shorter.fasta) instead.
+For the purpose of this manual, the expected file is already provided here: [`example_1_sequences.fasta`](./example_data/example_1_sequences.fasta). If you want to run a smaller test, you can use [`example_1_sequences_shorter.fasta`](./example_data/example_1_sequences_shorter.fasta) instead.
 :memo: *The example file was generated by downloading all 294 proteins that belong to human translation pathway from: [Reactome](https://reactome.org/PathwayBrowser/#/R-HSA-72766&DTAB=MT). eIF4G2 sequence was downloaded from (Uniprot:[P78344](https://www.uniprot.org/uniprot/P78344)).*
@@ -32,7 +34,7 @@ MMSeqs2 and ColabFold allow for much quicker calculation of MSAs than the defaul
 ### Expected output
-```create_individual_features.py``` will compute necessary features each protein in [`example_1_sequences.fasta`](../example_data/example_1_sequences.fasta) and store them in the ```output_dir```. Please be aware that everything after ```>``` will be
+```create_individual_features.py``` will compute necessary features each protein in [`example_1_sequences.fasta`](./example_data/example_1_sequences.fasta) and store them in the ```output_dir```. Please be aware that everything after ```>``` will be
 taken as the description of the protein and **please be aware** that any special symbol, such as ```| : ; #```, after ```>``` will be replaced with ```_```.
 The name of the pickles will be the same as the descriptions of the sequences in fasta files (e.g. ">protein_A" in the fasta file will yield "protein_A.pkl")
@@ -160,9 +162,9 @@ different number if you wish to run an array of jobs in parallel then the progra
 #### **Run in pulldown mode**
-Inspired by pull-down assays, one can specify one or more proteins as "bait" and another list of proteins as "candidates". Then the programme will use AlphafoldMultimerV2 to predict interactions between baits (as in [`example_data/baits.txt`](../example_data/baits.txt)) and candidates (as in [`example_data/candidates.txt`](../example_data/candidates.txt)).
+Inspired by pull-down assays, one can specify one or more proteins as "bait" and another list of proteins as "candidates". Then the programme will use AlphafoldMultimerV2 to predict interactions between baits (as in [`example_data/baits.txt`](./example_data/baits.txt)) and candidates (as in [`example_data/candidates.txt`](./example_data/candidates.txt)).
-**Note** If you want to save time and run fewer jobs, you can use [`example_data/candidates_shorter.txt`](../example_data/candidates_shorter.txt) instead of [`example_data/candidates.txt`](../example_data/candidates.txt)
+**Note** If you want to save time and run fewer jobs, you can use [`example_data/candidates_shorter.txt`](./example_data/candidates_shorter.txt) instead of [`example_data/candidates.txt`](./example_data/candidates.txt)
 In this example, we selected pulldown mode and made eIF4G2 (Uniprot:[P78344](https://www.uniprot.org/uniprot/P78344)) as a bait while the other 294 proteins as candidates. Thus, in total, there will be 1 * 294 = 294 predictions.
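Two rules stated in the `example_1.md` hunks above lend themselves to a short illustration: special symbols in a FASTA description are replaced with `_`, and pulldown mode pairs every bait with every candidate. The following is a minimal sketch, not AlphaPulldown's own code. The function names, the placeholder candidate names, and the assumption that only the four symbols named in the manual (`| : ; #`) are replaced are all hypothetical; the real scripts may handle a larger set.

```python
from itertools import product

# Assumption: only the four symbols named in the manual are replaced here.
# The manual says "such as", so the real set may be larger.
SPECIAL_SYMBOLS = "|:;#"


def sanitize_description(header_line: str) -> str:
    """Map a FASTA header line to the description used for the pickle name."""
    description = header_line.lstrip(">").strip()
    return "".join("_" if ch in SPECIAL_SYMBOLS else ch for ch in description)


def pulldown_pairs(baits, candidates):
    """Pair every bait with every candidate (len(baits) * len(candidates) jobs)."""
    return list(product(baits, candidates))


# One bait against 294 candidates, as in the manual's example.
baits = ["P78344"]
candidates = [f"candidate_{i}" for i in range(1, 295)]  # hypothetical placeholder names
pairs = pulldown_pairs(baits, candidates)

print(len(pairs))
print(sanitize_description(">protein|A:1") + ".pkl")
```

With more than one bait the job count grows multiplicatively, which is why the manual suggests the shorter candidate list for a quick test.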
@@ -184,7 +186,7 @@ run_multimer_jobs.py --mode=pulldown \
 --remove_result_pickles=True
 ```
-:memo: To reproduce the results of Lassa virus Z protein vs L protein fragments written in our paper, simply use [`baits_Z_protein.txt`](../example_data/baits_Z_protein.txt) and [`L_protein_fragments.txt`](../example_data/L_protein_fragments.txt) as the ```--protein_lists```inputs. This example shows also how to run the interaction screen for fragments of proteins, keeping the original full-length residue numbering in the output!
+:memo: To reproduce the results of Lassa virus Z protein vs L protein fragments written in our paper, simply use [`baits_Z_protein.txt`](./example_data/baits_Z_protein.txt) and [`L_protein_fragments.txt`](./example_data/L_protein_fragments.txt) as the ```--protein_lists```inputs. This example shows also how to run the interaction screen for fragments of proteins, keeping the original full-length residue numbering in the output!
 ✨ **New Features**
 Now AlphaPulldown supports integrative structural modelling if the user has experimental cross-link data. Please refer to [this manual](run_with_AlphaLink2.md) if you'd like to model your protein complexes with cross-link MS data as extra input.
@@ -346,7 +348,7 @@ By default, you will have a csv file named ```predictions_with_good_interpae.csv
 ## Appendix: Instructions on running in `all_vs_all` mode
-As the name suggest, all_vs_all means predict all possible pairwise comparisons within a single input file. The input can be either full-length proteins or regions of a protein, as illustrated in the [`example_all_vs_all_list.txt`](../example_data/example_all_vs_all_list.txt) and the figure below:
+As the name suggest, all_vs_all means predict all possible pairwise comparisons within a single input file. The input can be either full-length proteins or regions of a protein, as illustrated in the [`example_all_vs_all_list.txt`](./example_data/example_all_vs_all_list.txt) and the figure below:
 ![plot](./all_vs_all_demo.png)
 The corresponding command is:
diff --git a/manuals/example_2.md b/manuals/example_2.md
index 7a1031c4..bf49efd2 100644
--- a/manuals/example_2.md
+++ b/manuals/example_2.md
@@ -1,12 +1,12 @@
 # AlphaPulldown manual:
-# Example2
+# Example 2
 # Aims: Model interactions between Lassa virus L protein and Z matrix protein; Determine the oligomer state of _E.coli_ Single-stranded DNA-binding protein (SSB)
 ## 1st step: compute multiple sequence alignment (MSA) and template features (run on CPUs)
-Firstly, download sequences of L (Uniprot: [O09705](https://www.uniprot.org/uniprotkb/O09705/entry)) and Z(uniprot:[O73557](https://www.uniprot.org/uniprotkb/O73557/entry)) proteins. The result is [`example_2_sequences.fasta`](../example_data/example_2_sequences.fasta)
+Firstly, download sequences of L (Uniprot: [O09705](https://www.uniprot.org/uniprotkb/O09705/entry)) and Z(uniprot:[O73557](https://www.uniprot.org/uniprotkb/O73557/entry)) proteins. The result is [`example_2_sequences.fasta`](./example_data/example_2_sequences.fasta)
 Now run:
@@ -27,17 +27,17 @@ taken as the description of the protein and **please be aware** that any specia
 ------------------------
-## 1.1 Explanation about the parameters
+## Explanation about the parameters
-See [Example 1](https://github.com/KosinskiLab/AlphaPulldown/blob/main/example_1.md#11-explanation-about-the-parameters)
+See [Example 1](./example_1.md#explanation-about-the-parameters)
 ## 2nd step: Predict structures (run on GPU)
 #### **Task 1**
-We want to predict the structure of full-length L protein together with Z protein. However, as the L protein is very long, many users would not have a GPU card with sufficient memory. Moreover, when attempting modeling the full L-Z, the resulting model does not match the known cryo-EM structure. In [Example 1](https://github.com/KosinskiLab/AlphaPulldown/blob/main/example_1.md), we showed how to use AlphaPulldown to find the interaction site by screening fragments using the ```pullldown``` mode. Here, to demonstrate the ```custom``` mode, we will assume the we know the interaction site and model the fragment using this mode, as demonstrated in the figure below ![custom_demo_2.png](./custom_demo_2.png):
+We want to predict the structure of full-length L protein together with Z protein. However, as the L protein is very long, many users would not have a GPU card with sufficient memory. Moreover, when attempting modeling the full L-Z, the resulting model does not match the known cryo-EM structure. In [Example 1](./example_1.md), we showed how to use AlphaPulldown to find the interaction site by screening fragments using the ```pullldown``` mode. Here, to demonstrate the ```custom``` mode, we will assume the we know the interaction site and model the fragment using this mode, as demonstrated in the figure below ![custom_demo_2.png](./custom_demo_2.png):
-Different proteins are seperated by ```;```. If a particular region is wanted from one protein, simply add ```,``` after that protein and followed by the region. Region comes in the format of ```number1-number2```. An example input file is: [`custom_mode.txt`](../example_data/custom_mode.txt)
+Different proteins are seperated by ```;```. If a particular region is wanted from one protein, simply add ```,``` after that protein and followed by the region. Region comes in the format of ```number1-number2```. An example input file is: [`custom_mode.txt`](./example_data/custom_mode.txt)
 The command line interface for using custom mode will then become:
@@ -125,7 +125,7 @@ or
 ```
 #### **Task 2**
-This taks is to determine the oligomer state of SSB protein [(Uniprot:P0AGE0)](https://www.uniprot.org/uniprotkb/P0AGE0/entry#function) by modelling its monomeric, homodimeric, homotrimeric, and homoquatrameric structures. Thus, homo-oligomer mode is needed. An oligomer state file will tell the programme the number of units. An example is: [`example_oligomer_state_file.txt`](../example_data/example_oligomer_state_file.txt)
+This taks is to determine the oligomer state of SSB protein [(Uniprot:P0AGE0)](https://www.uniprot.org/uniprotkb/P0AGE0/entry#function) by modelling its monomeric, homodimeric, homotrimeric, and homoquatrameric structures. Thus, homo-oligomer mode is needed. An oligomer state file will tell the programme the number of units. An example is: [`example_oligomer_state_file.txt`](./example_data/example_oligomer_state_file.txt)
 In the file, oligomeric states of the corresponding proteins should be separated by ```,``` e.g. ```protein_A,3```means a homotrimer for protein_A
 ![homo-oligomer_demo](./homooligomer_demo.png)
@@ -293,7 +293,7 @@ By default, you will have a csv file named ```predictions_with_good_interpae.csv
 ## Appendix: Instructions on running in `all_vs_all` mode
-As the name suggest, all_vs_all means predict all possible combinations within a single input file. The input can be either full-length proteins or regions of a protein, as illustrated in the [`example_all_vs_all_list.txt`](../example_data/example_all_vs_all_list.txt) and the figure below:
+As the name suggest, all_vs_all means predict all possible combinations within a single input file. The input can be either full-length proteins or regions of a protein, as illustrated in the [`example_all_vs_all_list.txt`](./example_data/example_all_vs_all_list.txt) and the figure below:
 ![plot](./all_vs_all_demo.png)
 The corresponding command is:
diff --git a/manuals/example_3.md b/manuals/example_3.md
index 7a997689..83a5988a 100644
--- a/manuals/example_3.md
+++ b/manuals/example_3.md
@@ -1,6 +1,6 @@
 # AlphaPulldown manual:
-# Example3
+# Example 3
 # Aims: Model activation of phosphoinositide 3-kinase by the influenza A virus NS1 protein (PDB: 3L4Q)
 ## 1st step: compute multiple sequence alignment (MSA) and template features using provided pbd templates (run on CPU)
@@ -43,14 +43,14 @@ It is also possible to combine all your fasta files into a single fasta file.
 ------------------------
-## 1.1 Explanation about the parameters
+## Explanation about the parameters
-See [Example 1](https://github.com/KosinskiLab/AlphaPulldown/blob/main/manuals/example_1.md#11-explanation-about-the-parameters)
+See [Example 1](./example_1.md#explanation-about-the-parameters)
 ## 2nd step: Predict structures (run on GPU)
 #### **Task 1**
-To predict structure we can use the usual ```run_multimer_jobs.py``` in custom mode (See [Example 2](https://github.com/KosinskiLab/AlphaPulldown/blob/main/manuals/example_2.md#2nd-step-predict-structures-run-on-gpu)) with an extra ```--multimeric_mode=True``` flag, that deactivates per-chain multimeric binary mask.
+To predict structure we can use the usual ```run_multimer_jobs.py``` in custom mode (See [Example 2](./example_2.md#2nd-step-predict-structures-run-on-gpu)) with an extra ```--multimeric_mode=True``` flag, that deactivates per-chain multimeric binary mask.
 The user can also specify the depth of the MSA that is taken for modelling to increase the influence of the template on the predicted model. This can be done by using the flag ```--msa_depth```. It's always recommended running with all 5 AlphaFold Multimer settings but if you want to save time, you could specify the model name(s) you want to run, use the following flag: ```--model_names=model_1_multimer_v3,model_2_multimer_v3``` (for models 1 and 2). If you do not know the exact MSA depth, there is another flag ```--gradient_msa_depth=True``` for exploring the desired MSA depth. This flag generates a set of logarithmically distributed points (denser at lower end) with the number of points equal to the number of predictions. The MSA depth (```num_msa```) starts from 16 and ends with the maximum value taken from the model config file. The ```extra_num_msa``` is always calculated as ```4*num_msa```.
 The command line interface for using custom mode will then become:
@@ -187,4 +187,4 @@ or
 --models_to_relax=all
 ```
-After the successful run one can evaluate and visualise the results in a usual manner (see e.g. [Example 2](https://github.com/KosinskiLab/AlphaPulldown/blob/main/manuals/example_2.md#2nd-step-predict-structures-run-on-gpu))
+After the successful run one can evaluate and visualise the results in a usual manner (see e.g. [Example 2](./example_2.md#2nd-step-predict-structures-run-on-gpu))
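The `--gradient_msa_depth=True` behaviour described in the `example_3.md` hunk above (logarithmically spaced `num_msa` values starting at 16, one per prediction, with `extra_num_msa = 4*num_msa`) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code used by `run_multimer_jobs.py`: the function name, the rounding to the nearest integer, the single-prediction fallback, and the maximum of 512 are hypothetical, since the manual only says the maximum is taken from the model config file.

```python
def gradient_msa_depths(num_predictions: int, max_msa: int, min_msa: int = 16):
    """Return (num_msa, extra_num_msa) pairs spaced geometrically from min_msa to max_msa.

    Geometric spacing places the points evenly on a log scale, so they are
    denser at the lower end, as the manual describes.
    """
    if num_predictions < 1:
        raise ValueError("num_predictions must be at least 1")
    if num_predictions == 1:
        # Assumption: with a single prediction, use the smallest depth.
        depths = [min_msa]
    else:
        ratio = (max_msa / min_msa) ** (1 / (num_predictions - 1))
        # Assumption: depths are rounded to the nearest integer.
        depths = [round(min_msa * ratio**i) for i in range(num_predictions)]
    # extra_num_msa is always four times num_msa, per the manual.
    return [(depth, 4 * depth) for depth in depths]


# Hypothetical maximum of 512; the real value comes from the model config file.
schedule = gradient_msa_depths(num_predictions=6, max_msa=512)
for num_msa, extra_num_msa in schedule:
    print(num_msa, extra_num_msa)
```

Because the spacing is geometric, half of the six points in this example fall at or below 64 sequences, which is where the template has the most influence on the model.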