diff --git a/docs/04-genomic-file-formats/02-FASTQ-files.md b/docs/04-genomic-file-formats/02-FASTQ-files.md
index ed7d77a..464858d 100644
--- a/docs/04-genomic-file-formats/02-FASTQ-files.md
+++ b/docs/04-genomic-file-formats/02-FASTQ-files.md
@@ -34,9 +34,9 @@
 by newlines. Random access within FASTQ files is not typical—generally, FASTQs
 are used solely as input to some alignment process, which then produces a BAM
 file—so they are gzipped (not bgzipped) to save space. Conventionally, FASTQ file names
-indicate which read the files contains (e.g. Sample.fastq.gz for single-end
-sequencing or Sample_R1.fastq.gz and Sample_R2.fastq.gz in paired-end sequencing
-where _R1 stands for "read one(s)" and _R2 stands for "read two(s)").
+indicate which read the files contains (e.g. `Sample.fastq.gz` for single-end
+sequencing or `Sample_R1.fastq.gz` and `Sample_R2.fastq.gz` in paired-end sequencing
+where `_R1` stands for "read one(s)" and `_R2` stands for "read two(s)").
 Note that in the case of paired-end sequencing, it is crucial that each of the FASTQ files list their reads in same order. If even one read is deleted from either file, the entire read pairing will be off, which will have disastrous results during the alignment phase.
 
 To catch common formatting errors in single-end or paired-end FASTQ files, consider using [fqlib](https://github.com/stjude/fqlib) (specifically, the lint subcommand).
@@ -82,4 +82,4 @@
 zcat Sample_R2.fastq.gz | head -n 7 | gzip -c > Sample_R2.bad.fastq.gz
 fq lint Sample_R1.fastq.gz Sample_R2.bad.fastq.gz
 # Sample_R2.bad.fastq.gz:8:1: [S004] CompleteValidator: Incomplete record: quality is empty
-```
\ No newline at end of file
+```
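
The read-order requirement described in the changed paragraph can be illustrated with a minimal Python sketch. It is not part of this change and is not how fqlib works internally; the function names `read_names` and `first_mismatch` are hypothetical, and the sketch assumes plain four-line FASTQ records in gzipped files, with mates sharing a read name apart from an optional `/1` or `/2` suffix.

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line record in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # skip sequence, "+" separator, and quality lines
            # Drop the leading "@" and anything after the first whitespace.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            # Assumption: older naming schemes mark mates with "/1" and "/2".
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (zero-based record index, R1 name, R2 name) for the first
    record where the two files disagree, or None if they stay in sync.
    A name of None means that file ran out of records first."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

Calling `first_mismatch("Sample_R1.fastq.gz", "Sample_R2.fastq.gz")` would return `None` for files that list their reads in the same order. The sketch compares read names only, so it does not replace a dedicated linter.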