Arianna I. Krinos, Margaret Mars Brisbin, Sarah K. Hu, Natalie R. Cohen, Tatiana A. Rynearson, Michael J. Follows, Frederik Schulz, Harriet Alexander
Taxonomic annotation is a critical problem in environmental microbial meta-omics. In protists (single-celled microbial eukaryotes) in particular, complex genomes and incomplete databases threaten accurate interpretation. We analyzed protistan meta-omic datasets to quantify the extent of this problem. We also propose a two-stage approach for estimating uncertainty in microbial meta-omics more accurately.
This work would not have been possible without many very useful software tools, including but not limited to
And a couple of our own tools
These workflows are deployed on the cluster for the more computationally demanding parts of this analysis. Their outputs are often used in the analysis notebooks.
- 01-scale-genus_eukulele: run EUKulele against the Phaeocystis databases (stored on Zenodo) for Scale 1 of the paper as written on bioRxiv
- 01-scale-genus_functional: run eggnog-mapper to functionally annotate Phaeocystis sequences from the Tara Oceans metagenomes
- 01-scale-genus_tree: run alignment and phylogenetic tree tools for the Phaeocystis references
- 02-scale-family_eukulele: run EUKulele against the sequences from Narragansett Bay, as appears in Figure 3 of the paper
- 03-scale-phylum_deepclust: run DIAMOND DeepClust against the sequences from the BATS dataset, including/excluding the sequences from phylum Retaria as described in the paper
- 03-scale-phylum_eukulele: run EUKulele against the sequences from the BATS dataset, including/excluding the sequences from phylum Retaria as described in the paper
- XX-scale-all_deepclust: run all scales of analysis through DIAMOND DeepClust to provide input to the tax-aliquots steps
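The workflow names listed above appear to follow a `<scale>-scale-<level>_<step>` pattern. As a rough sketch of how that pattern could be used to group workflows by taxonomic level, the snippet below parses the names with a regular expression. The pattern itself, the `group_by_level` helper, and the field names are assumptions inferred from the names listed here; they are not part of the repository.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the workflow names listed in this README:
# a numeric scale (or the literal "XX"), "-scale-", a taxonomic level, "_", a step name.
WORKFLOW_PATTERN = re.compile(r"^(?P<scale>\d+|XX)-scale-(?P<level>[a-z]+)_(?P<step>\w+)$")

# Workflow names exactly as listed in this README.
WORKFLOWS = [
    "01-scale-genus_eukulele",
    "01-scale-genus_functional",
    "01-scale-genus_tree",
    "02-scale-family_eukulele",
    "03-scale-phylum_deepclust",
    "03-scale-phylum_eukulele",
    "XX-scale-all_deepclust",
]


def group_by_level(names):
    """Group workflow step names by taxonomic level (hypothetical helper)."""
    grouped = defaultdict(list)
    for name in names:
        match = WORKFLOW_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected workflow name: {name!r}")
        grouped[match.group("level")].append(match.group("step"))
    return dict(grouped)


if __name__ == "__main__":
    for level, steps in group_by_level(WORKFLOWS).items():
        print(level, steps)
```

A grouping like this may be handy when deciding which workflows to run for a single scale of the analysis.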
Each notebook is connected to one of the main text and/or supplemental figures in the final paper. Data needed to run these notebooks can be generated by downloading source datasets and running the Snakemake workflows from the section above.
Notebooks are named according to the convention:
XXFIG_<descriptor>.ipynb
where the "XX" prefix is the number of the main text figure the notebook is connected to, or the literal "XX" if the notebook is strictly supplemental. "FIG" marks the file as a figure notebook, and the descriptor provides more detail about the notebook's objective(s).
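The naming convention can be checked programmatically. The sketch below is a minimal example that assumes the figure prefix is written as one or more digits; `parse_notebook_name`, `NotebookName`, and the example filenames are hypothetical and do not refer to actual files in the repository.

```python
import re
from typing import NamedTuple, Optional

# Assumption: main text figure notebooks start with a number, supplemental ones with "XX".
NOTEBOOK_PATTERN = re.compile(r"^(?P<figure>\d+|XX)FIG_(?P<descriptor>.+)\.ipynb$")


class NotebookName(NamedTuple):
    figure: Optional[int]  # None means the notebook is strictly supplemental
    descriptor: str


def parse_notebook_name(filename: str) -> NotebookName:
    """Split a notebook filename into its figure number and descriptor."""
    match = NOTEBOOK_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow XXFIG_<descriptor>.ipynb")
    prefix = match.group("figure")
    figure = None if prefix == "XX" else int(prefix)
    return NotebookName(figure, match.group("descriptor"))


if __name__ == "__main__":
    # Hypothetical filenames, used only to illustrate the convention.
    print(parse_notebook_name("03FIG_example-descriptor.ipynb"))
    print(parse_notebook_name("XXFIG_supplemental-example.ipynb"))
```

Returning `None` for supplemental notebooks keeps the figure number an integer wherever it exists, so main text notebooks can be sorted by figure.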