[FEATURE] Store annotation info in database instead of .mat files #6

@anderstorstensson

Feature Description

Currently, annotation info is stored in and loaded from .mat files. This requires Python (via scipy), a dependency that most users do not otherwise need, and it complicates the workflow for R users who want to work with IFCB annotation data but do not plan to use it in MATLAB.

Use Case

IFCB researchers may want to store and access annotation information directly in a database or a native R format, so that they can work with the data without installing Python dependencies or dealing with MATLAB file-format conversions.

Current workflow pain points:

  • Users must install Python and scipy to write annotation files
  • Slower I/O, since an entire .mat file is loaded even when only a few annotations are needed
  • Compatibility issues between MATLAB file format versions (for example, scipy.io.loadmat does not support v7.3 files)

Proposed Solution

Replace .mat file storage with a SQLite database backend for annotations:

Database schema with tables for:

  • Annotations (roi_id, class_label, annotator, sample, timestamp, etc.)
  • Classification metadata (classifier versions, validation status)
  • Annotation provenance (manual vs automated, confidence scores)
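A minimal sketch of what the schema above could look like, assuming SQLite. All table and column names here are hypothetical, chosen from the fields listed in this issue rather than taken from an existing ClassiPyR schema. For brevity, the provenance fields (manual vs automated, confidence) are folded into the annotations table instead of a separate table. Python's sqlite3 module is used only so the sketch runs as-is; the same SQL would work from R through RSQLite.

```python
import sqlite3

# Hypothetical schema: names are assumptions based on the fields listed in
# this issue, not an existing ClassiPyR schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS classifiers (
    classifier_id     INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    version           TEXT NOT NULL,
    validation_status TEXT,
    UNIQUE (name, version)
);

CREATE TABLE IF NOT EXISTS annotations (
    annotation_id INTEGER PRIMARY KEY,
    sample        TEXT    NOT NULL,  -- IFCB sample (bin) identifier
    roi_id        INTEGER NOT NULL,  -- ROI number within the sample
    class_label   TEXT    NOT NULL,
    annotator     TEXT,              -- NULL for automated labels
    source        TEXT    NOT NULL CHECK (source IN ('manual', 'automated')),
    confidence    REAL,              -- classifier score; NULL for manual labels
    classifier_id INTEGER REFERENCES classifiers (classifier_id),
    timestamp     TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Speeds up the common "one sample, one class" lookup.
CREATE INDEX IF NOT EXISTS idx_annotations_sample_class
    ON annotations (sample, class_label);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) an annotation database and apply the schema."""
    con = sqlite3.connect(path)
    con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    con.executescript(SCHEMA)
    return con


if __name__ == "__main__":
    con = create_database()
    print([row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
    # ['annotations', 'classifiers']
```

The CHECK constraint and the foreign key let the database itself reject malformed provenance rows, instead of relying on every client to validate them.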

R-native access using RSQLite or similar packages

  • Simple queries: get_annotations(sample_id, class = "Dinophysis")
  • Batch operations for multi-sample analyses
  • Easy filtering and aggregation
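The query, batch, and aggregation ideas above could be sketched as follows. `get_annotations` and `count_by_class` are hypothetical helpers that mirror the proposed R call `get_annotations(sample_id, class = "Dinophysis")`. The sample names and annotators in the demo are toy data, and the minimal table assumes only the columns the queries need.

```python
import sqlite3
from typing import Optional, Sequence


def get_annotations(con: sqlite3.Connection, sample: str,
                    class_label: Optional[str] = None) -> list:
    """Rows (roi_id, class_label, annotator) for one sample, optionally one class.

    Hypothetical helper mirroring the proposed R-side get_annotations().
    """
    sql = "SELECT roi_id, class_label, annotator FROM annotations WHERE sample = ?"
    params = [sample]
    if class_label is not None:
        sql += " AND class_label = ?"
        params.append(class_label)
    return con.execute(sql + " ORDER BY roi_id", params).fetchall()


def count_by_class(con: sqlite3.Connection, samples: Sequence[str]) -> list:
    """(sample, class_label, n) for several samples in a single query."""
    if not samples:
        return []
    # One "?" per sample so the values are bound as parameters, not pasted in.
    placeholders = ", ".join("?" for _ in samples)
    sql = ("SELECT sample, class_label, COUNT(*) FROM annotations "
           f"WHERE sample IN ({placeholders}) "
           "GROUP BY sample, class_label ORDER BY sample, class_label")
    return con.execute(sql, list(samples)).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE annotations "
                "(sample TEXT, roi_id INTEGER, class_label TEXT, annotator TEXT)")
    con.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?)", [
        ("sample_1", 1, "Dinophysis", "annotator_a"),
        ("sample_1", 2, "Ciliate", "annotator_a"),
        ("sample_1", 3, "Dinophysis", "annotator_b"),
        ("sample_2", 1, "Dinophysis", "annotator_a"),
    ])
    print(get_annotations(con, "sample_1", "Dinophysis"))
    # [(1, 'Dinophysis', 'annotator_a'), (3, 'Dinophysis', 'annotator_b')]
    print(count_by_class(con, ["sample_1", "sample_2"]))
    # [('sample_1', 'Ciliate', 1), ('sample_1', 'Dinophysis', 2), ('sample_2', 'Dinophysis', 1)]
```

From R, the same parameterized SQL could be sent over an RSQLite connection with `DBI::dbGetQuery()` and its `params` argument, so the R wrapper would be little more than a query string.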

Migration path:

  • One-time conversion utility to import existing .mat files
  • Maintain .mat export option for MATLAB users
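One way the one-time conversion could stay safe to re-run is sketched below, assuming each sample's labels have already been read out of its legacy .mat file (for example with scipy.io.loadmat) into a plain ROI-to-class mapping. The .mat layout itself is not modelled, and `import_sample` and the table definition are hypothetical. A UNIQUE constraint plus `INSERT OR IGNORE` means an interrupted or repeated import does not create duplicate rows.

```python
import sqlite3
from typing import Mapping

# Hypothetical minimal table. NOT NULL on annotator matters: SQLite treats
# NULLs as distinct, so a NULL would defeat the UNIQUE constraint below.
DDL = """
CREATE TABLE IF NOT EXISTS annotations (
    sample      TEXT    NOT NULL,
    roi_id      INTEGER NOT NULL,
    class_label TEXT    NOT NULL,
    annotator   TEXT    NOT NULL,
    UNIQUE (sample, roi_id, annotator)
);
"""


def import_sample(con: sqlite3.Connection, sample: str,
                  labels: Mapping[int, str], annotator: str) -> int:
    """Insert one sample's ROI -> class mapping; return the number of new rows.

    `labels` is assumed to have been extracted from a legacy .mat file
    beforehand; reading the .mat file is outside this sketch.
    """
    rows = [(sample, roi, cls, annotator) for roi, cls in sorted(labels.items())]
    before = con.total_changes
    with con:  # one transaction per sample: commit on success, roll back on error
        con.executemany(
            "INSERT OR IGNORE INTO annotations (sample, roi_id, class_label, annotator) "
            "VALUES (?, ?, ?, ?)", rows)
    # Rows skipped by OR IGNORE (already imported) do not count as changes.
    return con.total_changes - before


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    labels = {1: "Dinophysis", 2: "Ciliate"}
    print(import_sample(con, "sample_1", labels, "annotator_a"))  # 2
    print(import_sample(con, "sample_1", labels, "annotator_a"))  # 0, safe to re-run
```

Committing per sample keeps a failed conversion from leaving a half-imported sample behind, while already finished samples stay in place.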

Alternatives Considered

RDS files: Native R format, but:

  • Less efficient for querying subsets
  • No standardized schema across installations
  • Harder to access from other languages if needed

CSV files: Simple but:

  • Poor performance with large annotation sets
  • No relational structure for metadata
  • Manual handling of data types

Keep .mat files but improve R support:

  • The R.matlab package exists but is less actively maintained
  • Still ties annotation storage to the MATLAB file format

Impact

  • Reduced barriers to entry: New users don't need to set up Python environments
  • Better performance: Indexed database queries can return a subset of annotations without loading an entire .mat file
  • Easier multi-sample analysis: SQL queries naturally handle cross-sample operations
  • Simplified deployment: Fewer dependencies mean easier ClassiPyR installation
  • Better data provenance: Database structure naturally supports tracking annotation history
  • Cross-platform compatibility: SQLite database files are portable across Windows, macOS, and Linux
  • Future-proofing: Standard database format is more maintainable long-term than MATLAB-specific files
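The provenance and multi-sample points can be illustrated with an append-only design, sketched here under stated assumptions. A relabel inserts a new row instead of overwriting the old one, so the full history of an ROI is kept and the "current" label is simply the latest row. The helpers `relabel`, `history`, and `current_labels`, the table layout, and the demo data are all hypothetical. A correlated subquery is used instead of a window function so it works on older SQLite versions too.

```python
import sqlite3

# Hypothetical append-only table; AUTOINCREMENT guarantees annotation_id
# only ever grows, so a larger id always means a later label.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotations (
    annotation_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sample        TEXT    NOT NULL,
    roi_id        INTEGER NOT NULL,
    class_label   TEXT    NOT NULL,
    annotator     TEXT    NOT NULL,
    created_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def relabel(con, sample, roi_id, class_label, annotator):
    """Record a new label without overwriting earlier ones."""
    with con:
        con.execute(
            "INSERT INTO annotations (sample, roi_id, class_label, annotator) "
            "VALUES (?, ?, ?, ?)", (sample, roi_id, class_label, annotator))


def history(con, sample, roi_id):
    """Every label ever assigned to one ROI, oldest first."""
    return con.execute(
        "SELECT class_label, annotator FROM annotations "
        "WHERE sample = ? AND roi_id = ? ORDER BY annotation_id",
        (sample, roi_id)).fetchall()


def current_labels(con):
    """Latest label for every ROI, across all samples, in one query."""
    return con.execute(
        "SELECT a.sample, a.roi_id, a.class_label FROM annotations AS a "
        "WHERE a.annotation_id = ("
        "  SELECT MAX(b.annotation_id) FROM annotations AS b "
        "  WHERE b.sample = a.sample AND b.roi_id = a.roi_id) "
        "ORDER BY a.sample, a.roi_id").fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    relabel(con, "sample_1", 1, "Ciliate", "annotator_a")
    relabel(con, "sample_1", 1, "Dinophysis", "annotator_b")  # correction
    relabel(con, "sample_2", 4, "Diatom", "annotator_a")
    print(history(con, "sample_1", 1))
    # [('Ciliate', 'annotator_a'), ('Dinophysis', 'annotator_b')]
    print(current_labels(con))
    # [('sample_1', 1, 'Dinophysis'), ('sample_2', 4, 'Diatom')]
```

The trade-off is that reads need the "latest row" filter, but nothing is ever lost, which is exactly what a .mat file overwritten on save cannot offer.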

Metadata

Labels

enhancement (New feature or request)
