Skip to content

bge-barcoding/bold-data-fun

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

62 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BOLD DATA FUN

A place to organise and share scripts used to manipulate BOLD data into other formats for downstream processing.

Scripts

Advanced choropleth map generator that creates professional geographic visualizations from CSV data. Features smart label placement, multiple color schemes, custom boundaries, and publication-quality output in PNG/SVG formats.

Extracts taxonomy information from BOLD project TSV files based on plate IDs. Automatically handles multiple BOLD data formats and provides robust error handling. Helpful when partners don't use Plate_Well as their sample ID.

Two-stage TSV merging tools for BOLD data exports. First script merges multiple TSV files from individual datasets, second script combines multiple merged datasets into a unified file. Handles UUID field mapping and complex BOLD data structures.

Creates horizontal stacked bar charts showing pass/fail percentages for various criteria. Automatically sorts by pass rate, includes smart label placement (hiding labels for small values and specific criteria), and generates publication-quality PNG output with customizable colors.

Creates hierarchical sunburst charts from CSV data with support for 3-5 levels of hierarchy. Features intelligent color inheritance, smart labeling, slice aggregation for small categories, and multiple output formats (PNG, SVG, PDF).

Identifies BOLD records from specified projects that meet filtering criteria related to species gaps, BAGS grades, and UK representation in BINs. Processes species lists with synonyms and generates detailed results with family-level summaries.

Performs comprehensive gap analysis for DNA barcode library curation by comparing target species lists against BOLD (Barcode of Life Data) database records and BAGS (Barcode, Audit & Grade System) assessments output from BOLDetective. Features traffic light status system (Green/Amber/Red/Blue/Black), synonym handling with BIN concordance checking, intelligent categorisation of species records, BAGS grade E analysis for BIN-sharing species, and taxonomy inference for missing species from congeneric records.

Filters BOLD bioscan analysis data to select representative specimens for expert analysis and genomic processing. Organizes specimens by taxonomic family and expert assignment, filtering for plates currently at NHM and gap analysis targets. Generates protected XLSX workbooks with editable fields for lab processing workflow (FluidX tube tracking, morphological verification notes) while maintaining data integrity for specimen identifiers and taxonomic information.

Automates taxonomic name decisions when reconciling specimen names against GBIF's backbone taxonomy. Uses a configurable rules matrix to determine whether to use original names (e.g., for type specimens) or GBIF-accepted names based on match status, type, and name differences. Generates separate outputs for taxonomy requests and ENA/NCBI verification.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •