A tool for identifying the bacterial hosts of phages from the genomic sequences of bacteriophages and bacteria. Host4Phage uses the bacterial CRISPR-Cas system for this purpose. The tool supports multithreading.
Host4Phage uses other available tools:
- PILER-CR
- CRT
- MinCED
- CRISPRDetect
- Kmer-db

All of the above tools are called from the tool/bin folder.
- To run host4phage.py you'll need Python 3.8.8 or greater.
- Python dependencies: tqdm (`pip install tqdm`) and joblib (`pip install joblib`). See the tqdm and joblib project pages for details.
- The CRT and MinCED tools require the Java Runtime Environment.
- The CRISPRDetect tool requires the following tools: clustalw, water, seqret, RNAfold, cd-hit-est, and blastn. See the CRISPRDetect documentation for details.
- Input files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`).

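Before a first run it can help to confirm that the external programs listed above are reachable. The following is a minimal sketch, not part of Host4Phage: it assumes the programs are expected on `PATH` under exactly these executable names, and that `java` is the launcher for the Java Runtime Environment. `REQUIRED_TOOLS` and `find_missing_tools` are hypothetical names introduced for illustration.

```python
import shutil

# Executable names taken from the requirements above (assumed to be on PATH);
# "java" stands in for the Java Runtime Environment needed by CRT and MinCED.
REQUIRED_TOOLS = ["java", "clustalw", "water", "seqret", "RNAfold", "cd-hit-est", "blastn"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be located on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.
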
The tool uses two subcommands: spacers and compare.
- The `spacers` subcommand identifies and extracts spacers.
- The `compare` subcommand finds sequences shared by hosts and bacteriophages using k-mers.

Host4Phage with the spacers subcommand can be called from the command line as follows (quick usage):
python tool/host4phage.py spacers -i host_20_test -o output_spacers/piler -m piler
Host4Phage with the compare subcommand can be called from the command line as follows (quick usage):
python tool/host4phage.py compare -s output_spacers -v virus_20_test -o output_compare
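
The two quick-usage commands can also be chained from a Python script. This is a rough sketch that only rebuilds the command lines shown above and passes them to `subprocess.run`. The helper names (`spacers_command`, `compare_command`, `run_pipeline`) are hypothetical, and the output directory names are simply those used in the examples.

```python
import subprocess
import sys

SCRIPT = "tool/host4phage.py"  # script path as used in the quick-usage examples


def spacers_command(hosts_dir, out_dir, method):
    """Build the argument list for the spacers subcommand."""
    return [sys.executable, SCRIPT, "spacers", "-i", hosts_dir, "-o", out_dir, "-m", method]


def compare_command(spacers_dir, virus_dir, out_dir):
    """Build the argument list for the compare subcommand."""
    return [sys.executable, SCRIPT, "compare", "-s", spacers_dir, "-v", virus_dir, "-o", out_dir]


def run_pipeline(hosts_dir, virus_dir, method="piler"):
    """Extract spacers with one method, then compare them against phage genomes."""
    # check=True raises CalledProcessError if either step exits with an error.
    subprocess.run(spacers_command(hosts_dir, f"output_spacers/{method}", method), check=True)
    subprocess.run(compare_command("output_spacers", virus_dir, "output_compare"), check=True)
```

Calling `run_pipeline("host_20_test", "virus_20_test")` from the repository root would issue the same two commands as the quick-usage examples.
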
Parameters - spacers subcommand:
| Name | Requiredness | Description |
|---|---|---|
| -input/-i | obligatory | Path to the directory with bacterial genomes. Files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
| -method/-m | obligatory | Method for CRISPR sequence identification: piler, crt, minced, or crisprdetect. |
| -threads/-t | optional | Number of threads. Defaults to the number of processor threads available on the user's computer. |
| -output/-o | optional | Path to the directory where two subdirectories will be created: output, containing the result files of the selected method, and fasta, containing the extracted spacers. Defaults to a directory named spacers. |

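
To check which files in an input directory would qualify as FASTA input, and what the default thread count is likely to be on a given machine, something like the sketch below can be used. It is an illustration under assumptions, not the tool's own code: extension matching is assumed to be case-sensitive and non-recursive, and `os.cpu_count()` is assumed to correspond to "the number of processor threads". The names `list_fasta_files` and `default_threads` are hypothetical.

```python
import os
from pathlib import Path

# Extensions named in the parameter description.
FASTA_EXTENSIONS = {".fasta", ".fna", ".fa"}


def list_fasta_files(directory):
    """Return the FASTA files found directly inside `directory`, sorted by path."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in FASTA_EXTENSIONS
    )


def default_threads():
    """Return the logical CPU count, falling back to 1 if it cannot be determined."""
    return os.cpu_count() or 1
```
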
Parameters - compare subcommand:
| Name | Requiredness | Description |
|---|---|---|
| -spacers/-s | obligatory | Path to the directory with extracted spacers. Results from all CRISPR identification methods can be combined in two ways: (1) pass a directory whose subdirectories hold the result files (e.g., if the output_spacers directory contains subdirectories with spacers for all methods, -s output_spacers is enough), or (2) pass the paths to the results of each method separately in a single command. Spacer files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
| -virus/-v | obligatory | Path to the directory with bacteriophage genomes. Files must have a FASTA extension (`*.fasta`, `*.fna`, `*.fa`). |
| -k | optional | Length of k-mers. Viral genomes and CRISPR spacers found in hosts are split into sequences of this length. Defaults to k = 18. |
| -threads/-t | optional | Number of threads. Defaults to the number of processor threads available on the user's computer. |
| -output/-o | optional | Path to the directory where a .CSV file will be created. Defaults to a directory named comparison. The file contains the number of common k-mers for each bacterial and bacteriophage species. |

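
To make the idea of "common k-mers" concrete, the sketch below counts the distinct k-mers shared between a set of spacers and a phage genome using plain Python sets. It is a simplified stand-in: Host4Phage relies on Kmer-db for this step, and the sketch ignores reverse complements and any Kmer-db specifics. The function names `kmers` and `count_common_kmers` are hypothetical.

```python
def kmers(sequence, k=18):
    """Return the set of all overlapping substrings of length k in `sequence`."""
    sequence = sequence.upper()
    # range() is empty when the sequence is shorter than k, giving an empty set.
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def count_common_kmers(spacers, phage_genome, k=18):
    """Count distinct k-mers present in both the spacers and the phage genome."""
    spacer_kmers = set()
    for spacer in spacers:
        spacer_kmers |= kmers(spacer, k)
    return len(spacer_kmers & kmers(phage_genome, k))
```

With a small k for readability, `count_common_kmers(["ACGTAC"], "TTACGTT", k=3)` counts the shared 3-mers ACG, CGT, and TAC.
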
You can also find the description of the parameters by using python tool/host4phage.py --help.