Releases: ChaissonLab/danbing-tk
*conda test*
v1.3.2.5 Missed commit. Respect CXX option
danbing-tk v1.3.2
Major changes:
- Automated bias correction using danbing-tk-pred
Resources:
- ikmer.meta: required by danbing-tk-pred
- ikmer.meta.txt: human readable version of ikmer.meta.txt with format documented in Wiki
- Example trkmers.meta.txt: required by danbing-tk-pred
Next release (v1.3.3):
- Automated dosage computation for motifs and TR loci
danbing-tk v1.3.1 (manuscript)
This version is associated with the manuscript: "The motif composition of variable-number tandem repeats impacts gene expression"
Major changes:
- Updated preferred usage of danbing-tk by turning on kmer filter: -kf 4 1
- Reduces *.tr.kmers output size by saving only counts, and uses index file to reconstruct locus name and kmer names
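The counts-only idea can be illustrated with a minimal Python sketch. The file layouts here are assumptions made for illustration, not taken from the danbing-tk documentation: the index is assumed to list each locus as a header line starting with ">" followed by one kmer per line, and the counts file is assumed to hold one integer per line in the same kmer order. The function name attach_counts is hypothetical.

```python
def attach_counts(index_lines, count_lines):
    """Pair each count with the (locus, kmer) at the same position in the index.

    Assumed layout (hypothetical): index has '>locus' headers followed by
    kmer lines; counts has one integer per line, in index order.
    """
    counts = [int(c) for c in count_lines if c.strip()]
    table = {}
    locus = None
    i = 0
    for raw in index_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New locus header: start an empty kmer -> count mapping.
            locus = line[1:].strip()
            table[locus] = {}
            continue
        if locus is None:
            raise ValueError("kmer line appears before any locus header")
        if i >= len(counts):
            raise ValueError("fewer counts than kmers in the index")
        # Only the first field is used as the kmer name.
        table[locus][line.split()[0]] = counts[i]
        i += 1
    if i != len(counts):
        raise ValueError("more counts than kmers in the index")
    return table


# Toy data, made up for this example.
index_lines = [">locus0", "ACGTA", "CGTAC", ">locus1", "TTGAC"]
count_lines = ["12", "7", "30"]
table = attach_counts(index_lines, count_lines)
```

Because names live only in the shared index, each per-sample output needs to store just the integers, which is where the size reduction would come from under these assumptions.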
Resources in Assets:
- tr.good.bed: VNTR set for building RPGG
Additional resource on Zenodo:
- VNTR statistics and annotations on 35 HGSVC assemblies
- RPGG built from the annotations
- GTEx gene-level eVNTR discoveries
- GTEx gene-level eMotif discoveries
- GTEx fine-mapping results using susieR
- Bias matrices for HGSVC, HPRC, GTEx, and Geuvadis samples used in bias correction
- GTEx bias-corrected kmer dosage table
- Geuvadis bias-corrected kmer dosage table
Additional analysis scripts for bias correction, eQTL mapping, and fine-mapping are available in this repo.
danbing-tk v1.3
Improvements:
- Significantly improved the time/mem usage of danbing-tk
  - benchmark setting
    - 31x HG00731 SRS sample from 1000 Genomes Project
    - two-consortium RPGG, 81045 loci
    - 16 cores xeon-2665, avx
    samtools fasta -@2 -n $bam | danbing-tk -a -kf 4 1 -gc 80 -k 21 -qs pan -fa /dev/stdin -o $out -p 16 -cth 45 | gzip >$aln
  - Sample was genotyped in ~43 min using 31.4 Gb mem
  - 24x speedup, 37% reduction in mem usage
  - Output file size: 1.3 Gb
- danbing-tk now takes binary graph/index as input
  - ktools serialize was added to convert *kmers to *.graph.umap, *.kmerDBi.umap and *.kmerDBi.vv
- bam2pe is now merged with danbing-tk
  - use -fa option for non-interleaved fasta e.g. samtools fasta -@2 -n $bam
  - use -fai for interleaved fasta
Resources
- New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo
danbing-tk v1.2
Improvements:
- Improved indel handling in graph threading.
- Improved the memory scalability of multiple-boundary-alignment.
Resources:
- New RPGG and VNTR coordinates on 35 HGSVC genomes are available at Zenodo
manuscript-1
Latest version of the code and resources associated with the manuscript "Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs". Released to create a DOI with Zenodo.
danbing-tk v1.1
Improvements:
- Faster danbing-tk align: 2.6x speedup on HG00096 when genotyping 32,138 loci
- More flexible use of danbing-tk build: generating RPGG without SRS data by skipping graph pruning
- More informative aln-r2: fixed zero r2 when no variation in assembly kmer count by adding a dummy point at (0,0)
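The aln-r2 fix can be illustrated with a small Python sketch. It assumes r2 is the squared Pearson correlation between assembly kmer counts and read kmer counts, which is an assumption about the metric, not a description of the danbing-tk implementation. The function names and the toy counts are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation; 0.0 when either variable has no variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # Correlation is undefined without variance; report zero.
        return 0.0
    return sxy * sxy / (sxx * syy)


def r_squared_with_origin(xs, ys):
    """Same statistic after prepending a dummy point at (0, 0)."""
    return r_squared([0, *xs], [0, *ys])


# Toy locus (made-up numbers): every kmer occurs twice in the assembly,
# so the assembly counts have no variation.
assembly_counts = [2, 2, 2, 2]
read_counts = [58, 61, 60, 63]

plain = r_squared(assembly_counts, read_counts)
anchored = r_squared_with_origin(assembly_counts, read_counts)
```

In this toy case the plain statistic is zero because the assembly counts are constant, while the anchored version is close to one: the dummy point supplies the variance needed for the read counts to be compared against the assembly counts.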
danbing-tk v1.0
Improvements:
- Improved length estimation accuracy using multi-boundary expansion, due to more accurate orthology mapping of VNTRs across haplotypes.
- More stringent QC on VNTR size, number of supporting haplotypes, consistency of liftover coordinates, etc.
- Slightly expanded VNTR set from 29,111 to 32,138 loci.
- Added more user-friendly length estimation script.
- Added option for alignment output by using -a with danbing-tk align
- DOI created using Zenodo
Additional resources:
- Repeat-pangenome graph encoded as pan.tr.kmers, pan.ntr.kmers and pan.graph.kmers in RPGG.tar.gz
- 84,411 raw VNTR coordinates: tr.84411.bed
- 32,138 raw VNTR coordinates (high-confidence genotypable set): tr.good.bed
- 397 non-VNTR regions: ctrl.bed
- Locus-specific biases of VNTR and non-VNTR regions: LSB.tsv
- Summary of eGene discoveries: Alltissue.egenes.tsv
- Comprehensive VNTR statistics: vntr.statistics.tsv, vntr.statistics.README
- 13 PacBio CLR assemblies (26 haplotypes): *.h?.fasta.gz
- 32,138 boundary-expanded VNTR coordinates in the 26 haplotypes: pan.tr.mbe.no_CCS.bed and pan.tr.mbe.no_CCS.README
- 73,582 boundary-expanded VNTR coordinates: pan.tr.73582.mbe.no_CCS.bed
Example analyses:
- QC of multi-boundary expansion: 202011.MultiBoundaryExpansion.QC.ipynb
- Measuring length prediction accuracy: 202012.Acc.pan.ipynb
- Contrasting the most informative kmer between populations: 202012.mikmer.ipynb
- eQTL mapping: 202012.eQTL.32138.ipynb
- Sample QC on locus-specific bias: LSB_analysis.ipynb
- Heritability analysis of SNP vs. SNP+VNTR models: 202011.sg.joint.ipynb
- Miscellaneous analyses in the original manuscript: 202012.revision.supp.ipynb
v0.0
Version 0 of genotypable VNTRs, RPGG and precomputed LSB are out! These files should be the same as the ones used for the analysis in the original paper.