nuc_process

nuc_process command options

The nuc_process command creates Hi-C chromatin contact data in NCC format from input paired-read FASTQ files and a genome index or sequence.

Usage

nuc_process [-h] [-g GENOME_FILE] [-g2 GENOME_FILE_2] [-cn CHROM_NAME_FILE] [-cn2 CHROM_NAME_FILE_2] [-re1 ENZYME] [-re2 ENZYME] [-s SIZE_RANGE] [-n CPU_COUNT] [-r COUNT] [-o NCC_FILE] [-pdf PDF_FILE] [-b EXE_FILE] [-q SCHEME] [-qm MIN_QUALITY] [-m] [-p] [-pt PAIRED_READ_TAGS PAIRED_READ_TAGS] [-x] [-f FASTA_FILES [FASTA_FILES ...]] [-f2 FASTA_FILES_2 [FASTA_FILES_2 ...]] [-a] [-k] [-sam] [-l SEQUENCE] [-z] [-v] [-u] [-cc GENOME_COPIES] [-lim MAX_READS] [-5 CLIP_BP] [-3 CLIP_BP] [-ad [ADAPTER_SEQ [ADAPTER_SEQ ...]]] FASTQ_FILE [FASTQ_FILE ...]

Command line arguments

-h, --help Show command line options and exit

Positional arguments:

FASTQ_FILE [FASTQ_FILE ...] Input paired-read FASTQ files to process. Wildcards that match the paired files are accepted. If more than two files are input, processing runs in batch mode, with each pair processed using the same parameters.

Optional arguments:

-g GENOME_FILE Location of the genome index files to map sequence reads to, given without any file extensions such as ".1.bt2". If the index is missing and genome FASTA files are specified, a new index will be created with this name
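Because -g takes an index prefix rather than a file name, it can help to see how the prefix relates to the files bowtie2-build writes (six files ending in .bt2, or .bt2l for large indexes). This is a minimal sketch with hypothetical helper names; it is not nuc_process's own code.

```python
from pathlib import Path

# Files written by bowtie2-build for an index prefix; large indexes use ".bt2l".
BT2_SUFFIXES = (".1", ".2", ".3", ".4", ".rev.1", ".rev.2")
BT2_EXTENSIONS = (".bt2", ".bt2l")


def strip_bt2_extension(path: str) -> str:
    """Return the bare index prefix if a full index file name was given."""
    # Check longer suffixes first so "x.rev.1.bt2" is not read as "x.rev" + ".1.bt2".
    for suffix in sorted(BT2_SUFFIXES, key=len, reverse=True):
        for ext in BT2_EXTENSIONS:
            if path.endswith(suffix + ext):
                return path[: -len(suffix + ext)]
    return path


def bt2_index_exists(prefix: str) -> bool:
    """True if all six index files exist for the prefix (small or large index)."""
    return any(
        all(Path(prefix + suffix + ext).is_file() for suffix in BT2_SUFFIXES)
        for ext in BT2_EXTENSIONS
    )


if __name__ == "__main__":
    prefix = strip_bt2_extension("genomes/mm10.1.bt2")  # hypothetical path
    print(prefix, bt2_index_exists(prefix))
```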

-g2 GENOME_FILE_2 Location of the secondary genome index files for hybrid genomes. If the index is missing and genome FASTA files are specified, a new index will be created with this name

-cn CHROM_NAME_FILE Location of a file containing chromosome names for the genome build: tab-separated lines mapping sequence/contig names (as they appear at the start of genome FASTA headers) to the desired (human-readable) chromosome names. This file is not mandatory if the primary restriction enzyme (-re1) is specified (i.e. not "None") and a corresponding RE1 mapping file has already been created for the genome. The naming file may be built automatically from NCBI genome FASTA files using the supplied "nuc_sequence_names" program
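The naming file format described above can be illustrated with a small parser. This is a minimal sketch, assuming two tab-separated columns per line; skipping blank lines and "#" comments is an assumption, and the function names are hypothetical rather than part of nuc_process.

```python
import io
from typing import Dict, TextIO


def read_chrom_names(handle: TextIO) -> Dict[str, str]:
    """Map contig names (first column) to chromosome names (second column)."""
    names: Dict[str, str] = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line or line.startswith("#"):  # skipping comments is an assumption
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected two tab-separated columns")
        names[fields[0].strip()] = fields[1].strip()
    return names


def contig_from_header(header: str) -> str:
    """Return the sequence name at the start of a FASTA header line."""
    return header.lstrip(">").split()[0]


if __name__ == "__main__":
    name_file = io.StringIO("NC_000001.11\tchr1\nNC_000002.12\tchr2\n")
    names = read_chrom_names(name_file)
    header = ">NC_000001.11 Homo sapiens chromosome 1"
    print(names[contig_from_header(header)])  # chr1
```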

-cn2 CHROM_NAME_FILE_2 Location of a file containing chromosome names for a second hybrid genome build. This file is only mandatory if an RE1 mapping file has not already been created for the genome. The names in this file must exactly match those for the other genome build where chromosomes are homologous. The file may be built automatically from NCBI genome FASTA files using the supplied "nuc_sequence_names" program

-re1 ENZYME Primary restriction enzyme (for ligation junctions). May be set to "None" for Micro-C etc., where digestion is not sequence specific. Options marked with "*" denote promiscuous/secondary cleavage activity at star sites. Default: MboI. Available: AluI, BglII, DpnII, DpnII*, HindIII, HindIII*, MboI, None

-re2 ENZYME Secondary restriction enzyme (if used). Available: AluI, BglII, DpnII, DpnII*, HindIII, HindIII*, MboI, None

-s SIZE_RANGE Allowed range of sequenced molecule sizes, e.g. "150-1000", "100,800" or "200" (no maximum)
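The three accepted -s forms can be parsed as follows. This is a minimal sketch, assuming a single number is the minimum size and that both bounds are inclusive; the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_size_range(text: str) -> Tuple[int, Optional[int]]:
    """Parse '150-1000', '100,800' or '200' into (min_size, max_size or None)."""
    parts = [p for p in re.split(r"[-,]", text.strip()) if p.strip()]
    if len(parts) == 1:
        return int(parts[0]), None  # single value: minimum only, no maximum
    if len(parts) == 2:
        lo, hi = int(parts[0]), int(parts[1])
        if lo > hi:
            raise ValueError(f"minimum {lo} exceeds maximum {hi}")
        return lo, hi
    raise ValueError(f"cannot parse size range: {text!r}")


def size_in_range(size: int, size_range: Tuple[int, Optional[int]]) -> bool:
    """Inclusive check of a molecule size against a parsed range."""
    lo, hi = size_range
    return size >= lo and (hi is None or size <= hi)
```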

-n CPU_COUNT Number of CPU cores to use in parallel

-r COUNT Minimum number of sequencing repeats required to support a contact
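One way to picture the -r threshold is to count how many read pairs report the same contact and keep only contacts seen at least COUNT times. This sketch assumes contacts are identified by exact end positions, with both end orders treated as equal; a real pipeline may group ends differently (for example by restriction fragment), and the names here are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical contact representation: (chrom_a, pos_a, chrom_b, pos_b).
Contact = Tuple[str, int, str, int]


def canonical(contact: Contact) -> Contact:
    """Order the two ends so that A-B and B-A count as the same contact."""
    chrom_a, pos_a, chrom_b, pos_b = contact
    if (chrom_a, pos_a) <= (chrom_b, pos_b):
        return contact
    return (chrom_b, pos_b, chrom_a, pos_a)


def filter_by_repeats(contacts: Iterable[Contact], min_repeats: int) -> List[Tuple[Contact, int]]:
    """Count identical contacts and keep those supported by >= min_repeats read pairs."""
    counts = Counter(canonical(c) for c in contacts)
    return sorted((c, n) for c, n in counts.items() if n >= min_repeats)
```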

-o NCC_FILE Optional output name for the NCC format chromosome contact file. This option is ignored if more than two paired FASTQ files are input (i.e. in batch mode); automated naming is used instead. If the -a option is used, this file will contain ambiguous as well as unambiguous contacts.

-pdf PDF_FILE Optional output name for the PDF format report file. This option is ignored if more than two paired FASTQ files are input (i.e. in batch mode); automated naming is used instead.

-b EXE_FILE Path to bowtie2 (read aligner) executable (will be searched for if not specified)

-q SCHEME Use a specific FASTQ quality scheme (normally not set and deduced automatically). Available: phred33, phred64, solexa
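Automatic deduction of the quality scheme is typically done from the range of ASCII quality characters. The sketch below uses a common heuristic based on the lowest character observed; it is an illustration, not nuc_process's actual detection logic, and it can misclassify Phred+33 data whose qualities are all high.

```python
from typing import Iterable


def guess_quality_scheme(quality_lines: Iterable[str]) -> str:
    """Guess 'phred33', 'solexa' or 'phred64' from the lowest quality character.

    Phred+33 starts at ASCII 33 ('!'), Solexa+64 can go down to ASCII 59
    (';', score -5) and Phred+64 starts at ASCII 64 ('@').
    """
    lowest = min((min(map(ord, q)) for q in quality_lines if q), default=None)
    if lowest is None:
        raise ValueError("no quality strings supplied")
    if lowest < 59:
        return "phred33"
    if lowest < 64:
        return "solexa"
    return "phred64"
```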

-qm MIN_QUALITY Minimum acceptable FASTQ quality score in range 0-40 for clipping end of reads. Default: 10
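Clipping read ends by quality can be sketched as trimming bases from the 3' end while their score is below the threshold. This simple trailing trim is an assumption for illustration; nuc_process may use a different clipping rule. The offset would be 33 for phred33 and 64 for phred64 (Solexa scores would need conversion, which is not shown).

```python
from typing import Tuple


def clip_low_quality_end(seq: str, qual: str, min_quality: int = 10,
                         offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while their quality score is below min_quality."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    return seq[:end], qual[:end]
```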

-m Force a re-mapping of genome restriction enzyme sites (otherwise cached values will be used if present)
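Mapping restriction sites amounts to locating every recognition site in each chromosome sequence and splitting it into fragments at the cut positions. A minimal sketch with hypothetical names, assuming a palindromic site (so scanning one strand is enough) and no caching; for MboI the site is GATC with the cut before the G (offset 0).

```python
import re
from typing import List, Tuple


def find_sites(sequence: str, site: str) -> List[int]:
    """0-based start of every (possibly overlapping) occurrence of a recognition site."""
    pattern = re.compile(f"(?={re.escape(site.upper())})")  # lookahead allows overlaps
    return [m.start() for m in pattern.finditer(sequence.upper())]


def fragments(seq_len: int, site_starts: List[int], cut_offset: int) -> List[Tuple[int, int]]:
    """Half-open (start, end) restriction fragments, cutting cut_offset bp into each site."""
    cuts = [start + cut_offset for start in site_starts]
    bounds = [0] + cuts + [seq_len]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
```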

-p Specifies that the input data is multi-cell/population Hi-C, so single-cell processing steps are skipped

-pt PAIRED_READ_TAG PAIRED_READ_TAG When more than two FASTQ files are input (batch mode), the substrings/tags that differ between paired FASTQ file paths. Default: r_1 r_2
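Batch-mode pairing by tags can be illustrated by swapping the first tag for the second in each first-read path and checking that the mate exists. This sketch assumes the rightmost occurrence of the tag is the one that differs; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def pair_fastq_paths(paths: Sequence[str],
                     tags: Tuple[str, str] = ("r_1", "r_2")) -> List[Tuple[str, str]]:
    """Pair each path containing tags[0] with the path where that tag becomes tags[1]."""
    tag1, tag2 = tags
    available = set(paths)
    pairs = []
    for path in sorted(paths):
        idx = path.rfind(tag1)
        if idx == -1:
            continue  # not a first-read file
        mate = path[:idx] + tag2 + path[idx + len(tag1):]
        if mate not in available:
            raise ValueError(f"no paired file found for {path}")
        pairs.append((path, mate))
    return pairs
```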

-x, --reindex Force a re-indexing of the genome (given appropriate FASTA files)

-f FASTA_FILES [FASTA_FILES ...] Specify genome FASTA files for genome index building (accepts wildcards)

-f2 FASTA_FILES_2 [FASTA_FILES_2 ...] A second set of genome FASTA files for building a second genome index when using hybrid strain cells (accepts wildcards)

-a Report ambiguously mapped contacts

-k Keep any intermediate files (e.g. clipped FASTQ etc.)

-sam Write paired contact files in SAM format

-l SEQUENCE Seek a specific ligation junction sequence (otherwise this is guessed from the primary restriction enzyme)
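For enzymes that leave a 5' overhang or a blunt end, the Hi-C ligation junction after end fill-in and blunt ligation can be derived from the recognition site and cut position. This sketch shows that derivation for a few of the listed enzymes; the table and function are hypothetical, star-site variants are not modelled, and nuc_process's own junction table may differ.

```python
from typing import Dict, Tuple

# Recognition site and top-strand cut offset, e.g. MboI ^GATC, HindIII A^AGCTT, AluI AG^CT.
ENZYMES: Dict[str, Tuple[str, int]] = {
    "MboI": ("GATC", 0),
    "DpnII": ("GATC", 0),
    "HindIII": ("AAGCTT", 1),
    "BglII": ("AGATCT", 1),
    "AluI": ("AGCT", 2),
}


def ligation_junction(site: str, cut: int) -> str:
    """Junction formed when two filled-in ends of a palindromic site are ligated.

    For a palindrome the bottom strand is cut at len(site) - cut, so after
    fill-in the left fragment ends with site[:len(site) - cut] and the right
    fragment starts with site[cut:].
    """
    return site[: len(site) - cut] + site[cut:]
```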

-z GZIP compress any output FASTQ files

-v, --verbose Display verbose messages to report progress

-u Only accept uniquely mapping genome positions; do not attempt to resolve certain classes of ambiguous mapping where a single perfect match is found.

-cc GENOME_COPIES Number of whole-genome (and hence chromosome) copies, e.g. for G2 phase. Default: 1 for a single genome index, or 2 for hybrid samples where a second genome index is specified

-lim MAX_READS Limit the number of input reads considered: useful for testing population Hi-C data prior to a lengthy full run

-5 CLIP_BP Number of base pairs to trim from the 5' start of all input reads. Default: 0

-3 CLIP_BP Number of base pairs to trim from the 3' end of all input reads. Default: 0

-ad [ADAPTER_SEQ [ADAPTER_SEQ ...]] Adapter sequences at which to truncate reads (leave blank for none), e.g. Nextera: CTGTCTCTTATA, Illumina universal: AGATCGGAAGAGC. Default: AGATCGGAAGAGC (Illumina universal)
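The -5, -3 and -ad options together describe a simple read-trimming step. This sketch assumes fixed clipping happens before adapter truncation and uses exact adapter matches only (dedicated trimmers also handle mismatches and partial adapters at read ends); the function name is hypothetical.

```python
from typing import Sequence, Tuple

ILLUMINA_UNIVERSAL = "AGATCGGAAGAGC"


def trim_read(seq: str, qual: str, clip5: int = 0, clip3: int = 0,
              adapters: Sequence[str] = (ILLUMINA_UNIVERSAL,)) -> Tuple[str, str]:
    """Clip fixed lengths from both ends, then truncate at the first adapter match."""
    end = max(clip5, len(seq) - clip3)  # never let the end precede the start
    seq, qual = seq[clip5:end], qual[clip5:end]
    cut = len(seq)
    for adapter in adapters:  # an empty sequence of adapters disables truncation
        pos = seq.find(adapter)
        if pos != -1:
            cut = min(cut, pos)
    return seq[:cut], qual[:cut]
```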
