
split_fastq_barcodes

Tim Stevens edited this page Apr 6, 2018 · 1 revision

split_fastq_barcodes command options

A Python script to split FASTQ files, either paired or single-end, into separate files according to their 5' and 3' barcode sequences.

Usage

split_fastq_barcodes [-h] [-b BARCODE_FILE] [-a ANALYSIS_FILE] [-o OUT_DIR] [-m NUM_MISMATCHES] [-s BARCODE_SIZE] [-u] [-d] [-nb] [-ih] FASTQ_FILE(S) [FASTQ_FILE(S) ...]

Command line arguments

positional arguments:

FASTQ_FILE(S)
One or two input FASTQ files to process. If two files are given, they are assumed to be paired reads with matching rows. Accepts wildcards that match two files.
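
To illustrate what "paired reads with matching rows" implies, here is a minimal sketch that walks two FASTQ streams in lockstep, treating record N of the first file and record N of the second as the two ends of one fragment. The function names read_fastq and read_pairs are hypothetical and are not taken from the script itself; the sketch also assumes plain four-line FASTQ records.

```python
import io
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of file
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = record
        yield header, seq, qual


def read_pairs(handle1, handle2):
    """Yield (read1, read2) tuples, assuming matching row order in both files."""
    for rec1, rec2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if rec1 is None or rec2 is None:
            raise ValueError("paired FASTQ files contain different numbers of reads")
        yield rec1, rec2


# Small in-memory example with made-up reads
r1 = io.StringIO("@a/1\nACGT\n+\nIIII\n@b/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@a/2\nGGCC\n+\nIIII\n@b/2\nAAAA\n+\nIIII\n")
pairs = list(read_pairs(r1, r2))
```

If the two files fall out of step (for example, one was filtered and the other was not), pairing by row silently mismatches reads, which is why the sketch at least checks that both files end together.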

optional arguments:

-h, --help
Show command line options and exit

-b BARCODE_FILE
Optional TSV file specifying expected/valid barcodes. Contains single barcodes or pairs joined with "-", one per row with optional sample names. An existing analysis output file (see -a option) may be used. If not specified, only barcode analysis will be performed; split FASTQ files will not be created.
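
As a rough illustration of the barcode file layout described above, the sketch below parses tab-separated rows into barcode keys and optional sample names. It assumes the barcode is in the first column and the sample name, when present, is in the second; that column order, the None placeholder for a missing second barcode, and the function name parse_barcode_rows are assumptions made for this example, not details confirmed by the script.

```python
import csv
import io


def parse_barcode_rows(handle):
    """Map (barcode1, barcode2 or None) to a sample name, or None if unnamed."""
    barcodes = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank rows
        code = row[0].strip().upper()
        # Pairs are written "BC1-BC2"; a lone barcode has no "-" separator
        first, sep, second = code.partition("-")
        key = (first, second if sep else None)
        # Column order is an assumption: sample name in the second column
        name = row[1].strip() if len(row) > 1 and row[1].strip() else None
        barcodes[key] = name
    return barcodes


# Made-up example content
example = "ACGT-TTGA\tsample_A\nGGCC\n\nTTAA\tsample_C\n"
parsed = parse_barcode_rows(io.StringIO(example))
```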

-a ANALYSIS_FILE
Optional TSV file name for output of barcode analysis. Defaults to a name derived from the input FASTQ files, tagged with "bc_report".

-o OUT_DIR
Output directory for the results. Defaults to the directory for the first input FASTQ file.

-m NUM_MISMATCHES, --max-mismatches NUM_MISMATCHES
Maximum number of basepair mismatches tolerated in a barcode sequence, unless the mismatch is ambiguous (does not distinguish barcodes). Default: 0
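
One way to picture the mismatch rule is as a Hamming-distance lookup that refuses to assign a read when two valid barcodes are equally close. The sketch below is an illustration under that reading of "ambiguous"; how the script itself resolves ties is not stated here, and the names hamming and match_barcode are hypothetical.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, valid_barcodes, max_mismatches=0):
    """Return the unique closest valid barcode within the limit, else None."""
    best = None
    best_dist = max_mismatches + 1  # anything at or beyond this is rejected
    tied = False
    for barcode in valid_barcodes:
        if len(barcode) != len(observed):
            continue  # only compare barcodes of the same length
        dist = hamming(observed, barcode)
        if dist < best_dist:
            best, best_dist, tied = barcode, dist, False
        elif dist == best_dist and best is not None:
            tied = True  # two candidates equally close: ambiguous
    return None if tied else best


# Made-up barcodes for illustration
valid = ["ACGT", "ACGA", "TTTT"]
exact = match_barcode("ACGT", valid, 0)      # exact match
ambiguous = match_barcode("ACGC", valid, 1)  # one mismatch from two barcodes
rescued = match_barcode("TTTA", valid, 1)    # one mismatch from a single barcode
```

Raising the mismatch limit recovers more reads, but it also increases the chance that an observed sequence sits equally close to two barcodes, in which case this sketch discards it rather than guessing.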

-s BARCODE_SIZE, --barcode-size BARCODE_SIZE
Barcode length in basepairs for analysis. Only required when not using -b or -ih options.

-u, --write-unmached
Write out FASTQ data which cannot be matched to a barcode (i.e. 'lost' reads).

-d, --different-ends
Specifies that potentially different barcodes are used for the 5' and 3' ends of the same read pair. Ignored if using -b option.

-nb, --barcode-file-names
Suppress the use of sample names in output FASTQ files and name files with barcode sequences instead.

-ih, --illimina-head-barcodes
Take barcode sequences from Illumina format FASTQ header lines, rather than the sequence line. Assumes ":{barcode1}+{barcode2}" at end of the header.
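
The header convention mentioned for -ih can be sketched with a regular expression that looks for ":{barcode1}+{barcode2}" at the very end of the header line. The pattern below, the restriction to the letters A, C, G, T and N, and the function name header_barcodes are assumptions for illustration; the example header is made up.

```python
import re

# ":" then barcode1, "+", barcode2, anchored to the end of the header line
HEADER_BARCODES = re.compile(r":([ACGTN]+)\+([ACGTN]+)\s*$")


def header_barcodes(header):
    """Return (barcode1, barcode2) from an Illumina-style header, or None."""
    match = HEADER_BARCODES.search(header)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up header in the usual Illumina layout
dual = header_barcodes("@INSTR:45:FLOWCELL:1:1101:15589:1332 1:N:0:ATCACG+TTAGGC")
single = header_barcodes("@INSTR:45:FLOWCELL:1:1101:15589:1332 1:N:0:ATCACG")
```

Headers carrying only a single index have no "+" separator, so this sketch returns None for them rather than guessing a second barcode.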
