
MInimap2 Contig ClassifieR (MICCR)

MICCR is a taxonomic classifier for contigs that leverages minimap2 to map sequences to a reference genome database. Based on the mapping results, MICCR annotates the mapped regions of a contig by analyzing the LCA (lowest common ancestor) taxonomy of the best hits.
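
To make the idea of an LCA over several hits concrete, here is a minimal sketch that finds the deepest taxon shared by a set of lineages. It is not MICCR's implementation: the function name `lowest_common_ancestor` is hypothetical, and the lineages are simplified and hand-written for illustration.

```python
from typing import Optional, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the deepest taxon shared by every root-to-leaf lineage."""
    if not lineages:
        return None
    lca = None
    # Walk all lineages in parallel from the root and stop at the
    # first depth where they disagree.
    for taxa_at_depth in zip(*lineages):
        if len(set(taxa_at_depth)) != 1:
            break
        lca = taxa_at_depth[0]
    return lca


# Simplified, hand-written lineages (illustration only, ranks omitted).
hits = [
    ["root", "Bacteria", "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    ["root", "Bacteria", "Enterobacteriaceae", "Salmonella", "Salmonella enterica"],
]
print(lowest_common_ancestor(hits))
```

When two hits to the same region belong to different genera of one family, as in this example, the region would be reported at the family level rather than assigned to either species.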

REQUIREMENTS

python >= 3.0
pandas >= 0.23.0
minimap2
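
A quick way to confirm these requirements before running MICCR is sketched below. The helper `check_requirements` is hypothetical and not part of MICCR. It only checks that pandas can be imported and that a `minimap2` executable is on `PATH`; it does not verify the pandas version.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report whether each requirement listed above appears to be available."""
    return {
        "python>=3.0": sys.version_info >= (3, 0),
        # True if a module named pandas can be located (version not checked).
        "pandas": importlib.util.find_spec("pandas") is not None,
        # True if an executable called minimap2 is found on PATH.
        "minimap2": shutil.which("minimap2") is not None,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```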

RESULTS

MICCR writes two TSV files, PREFIX.ctg.tsv and PREFIX.lca_ctg.tsv. The '.ctg.tsv' file contains all 11 columns below and lists the taxonomic annotation region by region. The '.lca_ctg.tsv' file contains the first 10 columns below and reports the LCA result of the qualified regions in each contig.

COLUMN	NAME	DESCRIPTION
1	CONTIG	Name of the contig
2	LENGTH	Length of the contig
3	START	Start position of the taxonomically annotated region (0-based)
4	END	End position of the taxonomically annotated region
5	LCA_TAXID	LCA taxonomy ID of all qualified alignments mapped to this region
6	LCA_RANK	LCA taxonomy rank of all qualified alignments mapped to this region
7	LCA_NAME	LCA taxonomy name of all qualified alignments mapped to this region
8	HIT_COUNT	Number of alignments taken into account
9	AGG_LENGTH	Additional annotated length this mapped region can provide
10	AVG_IDENTITY	Approximate mapping identity based on minimap2
11	AGG_REGION	Start and end positions of additional annotated length
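
The column layout above can be read with Python's standard `csv` module, as in the sketch below. It assumes the files are tab-separated with the columns in the order listed. Whether the files start with a header line is not stated here, so the hypothetical helper `read_miccr_tsv` takes a `has_header` switch. The example row is made up, including the formats of its values, and is not real MICCR output.

```python
import csv
import io

# Column names taken from the table above, in order.
COLUMNS = [
    "CONTIG", "LENGTH", "START", "END", "LCA_TAXID", "LCA_RANK",
    "LCA_NAME", "HIT_COUNT", "AGG_LENGTH", "AVG_IDENTITY", "AGG_REGION",
]


def read_miccr_tsv(handle, has_header=False):
    """Read tab-separated rows into dicts keyed by the column names above.

    Rows with only 10 fields (the '.lca_ctg.tsv' layout) simply lack AGG_REGION.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: at most one header line
    return [dict(zip(COLUMNS, row)) for row in reader if row]


# Made-up 10-column row in the '.lca_ctg.tsv' layout (values are invented).
example = io.StringIO(
    "contig_1\t5000\t0\t4800\t562\tspecies\tEscherichia coli\t3\t4800\t0.98\n"
)
rows = read_miccr_tsv(example)
print(rows[0]["CONTIG"], rows[0]["LCA_NAME"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("PREFIX.lca_ctg.tsv", newline="")`. All values are returned as strings, so convert numeric columns as needed.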

USAGE

usage: ./miccr.py [-h] (-i [FASTA] | -f [PAF]) [-d [FASTA/MMI]] [-dp [PATH]]
                  [-x {asm5,asm10,map-pb,map-ont}] [-t <INT>] [-c] [-o [DIR]]
                  [-p <STR>] [-sc <FLOAT>] [-mp <FLOAT>] [-if <FLOAT>]
                  [--silent] [-v]

MInimap2 Contig ClassifieR (MICCR) 0.0.2

optional arguments:
  -h, --help            show this help message and exit
  -i [FASTA], --input [FASTA]
                        Input one or multiple contig files in FASTA format.
                        Use space to separate multiple input files.
  -f [PAF], --paf [PAF]
                        Input a PAF alignment file.
  -d [FASTA/MMI], --database [FASTA/MMI]
                        Name/path of readmapper's index [default: None]
  -dp [PATH], --dbPath [PATH]
                        Path of databases. If dbPath isn't specified but a
                        path is provided in "--database" option, this path of
                        database will also be used in dbPath. Otherwise, the
                        program will search "database/" in program directory.
                        [default: database/]
  -x {asm5,asm10,map-pb,map-ont}, --platform {asm5,asm10,map-pb,map-ont}
                        You can specify one of the following platform:
                        "asm5"    : Long assembly to reference mapping (avg divergence < 5%);
                        "asm10"   : Long assembly to reference mapping (avg divergence < 10%);
                        "asm20"   : Long assembly to reference mapping (avg divergence < 20%);
                        "map-pb"  : PacBio/Oxford Nanopore read to reference mapping;
                        "map-ont" : Slightly more sensitive for Oxford Nanopore to reference mapping;
                        [default: 'asm10']
  -t <INT>, --numthreads <INT>
                        Number of cpus [default: 1]
  -c, --stdout          Output to STDOUT [default: False]
  -o [DIR], --outdir [DIR]
                        Output directory [default: .]
  -p <STR>, --prefix <STR>
                        Prefix of the output file [default:
                        <INPUT_FILE_PREFIX>]
  -sc <FLOAT>, --SkipCumsumLcaProp <FLOAT>
                        [LCA options] For each contig, consider a segment
                        qualified for LCA when the cumulative AGG_LENGTH <
                        specified proportion of contig length
  -mp <FLOAT>, --minLcaProp <FLOAT>
                        [LCA options] For each contig, consider a segment
                        qualified for LCA if AGG_LENGTH > specified proportion
                        of contig length
  -if <FLOAT>, --iqrfactor <FLOAT>
                        [LCA options] Specify a factor (f). Classify qualified
                        segments whose AGG_LENGTH > Q1+f*IQR, where
                        Q1/3=first/third quartile of AGG_LENGTH and
                        IQR=(Q3-Q1). [default: 0.5]
  --silent              Disable all messages.
  -v, --verbose         Provide verbose running messages.
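
As an illustration of the options above, the sketch below assembles a MICCR command line from Python. It uses only flags that appear in the help text. The helper `build_miccr_command` and the file names (`contigs.fasta`, `refseq.mmi`, `results`, `sample1`) are hypothetical placeholders.

```python
import subprocess


def build_miccr_command(contigs, database, outdir=".", prefix=None,
                        preset="asm10", threads=1):
    """Assemble an argument list using only flags shown in the help text."""
    cmd = ["./miccr.py", "-i", contigs, "-d", database,
           "-x", preset, "-t", str(threads), "-o", outdir]
    if prefix is not None:
        cmd += ["-p", prefix]
    return cmd


# Hypothetical file names; replace them with your own paths.
cmd = build_miccr_command("contigs.fasta", "refseq.mmi",
                          outdir="results", prefix="sample1", threads=4)
print(" ".join(cmd))

# To actually run it (requires miccr.py and minimap2 to be available):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.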

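The `-if` option describes a cutoff of Q1 + f*IQR on AGG_LENGTH. The sketch below works through that arithmetic on made-up numbers. It is not MICCR's code: the function names are hypothetical, and the quartile method (linear interpolation via `method="inclusive"`) is an assumption, since the method MICCR uses is not documented here. How this cutoff combines with `-sc` and `-mp` is also not covered.

```python
import statistics


def iqr_threshold(agg_lengths, f=0.5):
    """Return Q1 + f * IQR for a list of AGG_LENGTH values."""
    # Assumption: quartiles by linear interpolation between data points.
    q1, _median, q3 = statistics.quantiles(agg_lengths, n=4, method="inclusive")
    return q1 + f * (q3 - q1)


def qualified_by_iqr(agg_lengths, f=0.5):
    """Keep values strictly greater than the Q1 + f * IQR cutoff."""
    cutoff = iqr_threshold(agg_lengths, f)
    return [x for x in agg_lengths if x > cutoff]


agg_lengths = [100, 200, 300, 400, 500]  # made-up values
print(iqr_threshold(agg_lengths))
print(qualified_by_iqr(agg_lengths))
```

With these values Q1 is 200 and Q3 is 400, so the default f of 0.5 gives a cutoff of 300 and only the segments with AGG_LENGTH 400 and 500 pass. A larger f raises the cutoff and keeps fewer segments.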