split_fastq_barcodes
A Python script to split FASTQ files, either paired or single-end, into separate files according to their 5' and 3' barcode sequences
split_fastq_barcodes [-h] [-b BARCODE_FILE] [-a ANALYSIS_FILE] [-o OUT_DIR] [-m NUM_MISMATCHES] [-s BARCODE_SIZE] [-u] [-d] [-nb] [-ih] FASTQ_FILE(S) [FASTQ_FILE(S) ...]
FASTQ_FILE(S)
One or two input FASTQ files to process. If two files
are input, they are assumed to be paired reads with
matching rows. Accepts wildcards that match two files.
-h, --help
Show command line options and exit
-b BARCODE_FILE
Optional TSV file specifying expected/valid barcodes.
Contains single barcodes or pairs joined with "-", one
per row with optional sample names. An existing
analysis output file (see -a option) may be used. If
not specified only barcode analysis will be performed;
split FASTQ files will not be created.
-a ANALYSIS_FILE
Optional TSV file name for output of barcode analysis.
Defaults to using a name derived from the input FASTQ
files tagged with "bc_report".
-o OUT_DIR
Output directory for the results. Defaults to the
directory for the first input FASTQ file.
-m NUM_MISMATCHES, --max-mismatches NUM_MISMATCHES
Maximum number of basepair mismatches tolerated in a
barcode sequence, unless the mismatch is ambiguous
(does not distinguish barcodes). Default: 0
-s BARCODE_SIZE, --barcode-size BARCODE_SIZE
Barcode length in basepairs for analysis. Only
required when not using -b or -ih options.
-u, --write-unmached
Write out FASTQ data which cannot be matched to a
barcode (i.e. 'lost' reads).
-d, --different-ends
Specifies that potentially different barcodes are used
for the 5' and 3' ends of the same read pair. Ignored
if using -b option.
-nb, --barcode-file-names
Suppress the use of sample names in output FASTQ files
and name files with barcode sequences instead.
-ih, --illimina-head-barcodes
Take barcode sequences from Illumina format FASTQ
header lines, rather than the sequence line. Assumes
":{barcode1}+{barcode2}" at end of the header.
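
As an illustration of how the -ih and -m options could fit together, here is a minimal Python sketch. It is not the script's actual code. The function names (header_barcodes, hamming, match_barcode) and the regular expression are hypothetical, and the handling of ambiguity (rejecting a read when two expected barcodes are equally close) is only one reading of the -m description above.

```python
import re

# Hypothetical pattern for a ":{barcode1}+{barcode2}" suffix on a header line.
HEADER_BARCODES = re.compile(r":([ACGTN]+)\+([ACGTN]+)$")


def header_barcodes(header_line):
    """Return (barcode1, barcode2) from a FASTQ header line, or None."""
    match = HEADER_BARCODES.search(header_line.strip())
    return match.groups() if match else None


def hamming(seq_a, seq_b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def match_barcode(observed, expected, max_mismatches=0):
    """Return the closest expected barcode within max_mismatches.

    Returns None if nothing is close enough, or if two or more expected
    barcodes tie for closest (assumed here to be the 'ambiguous' case).
    """
    scored = [(hamming(observed, bc), bc)
              for bc in expected if len(bc) == len(observed)]
    if not scored:
        return None
    best = min(dist for dist, _ in scored)
    if best > max_mismatches:
        return None
    closest = [bc for dist, bc in scored if dist == best]
    return closest[0] if len(closest) == 1 else None


# Example with made-up data: read the barcodes from a header, then match
# the first one against a small set of expected barcodes.
header = "@M00123:45:000000000-ABCDE:1:1101:15589:1332 1:N:0:ACGTAA+TTGCAA"
pair = header_barcodes(header)
assigned = match_barcode(pair[0], ["ACGTAC", "TTGCAA"], max_mismatches=1)
```

With one mismatch tolerated, the observed barcode ACGTAA in this made-up header is assigned to ACGTAC. With the default tolerance of 0 it would be left unmatched, which is the kind of read the -u option writes out.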