Feature Description
Annotation data is currently stored in and loaded from MATLAB .mat files. Most users do not need this format, and it requires Python (via scipy). This adds a dependency and complicates the workflow for R users who want to work with IFCB annotation data but do not plan to use the data in MATLAB.
Use Case
IFCB researchers may want to store annotation information in a database or native R format, and read it back from there, so that they can work with the data without installing Python dependencies or dealing with MATLAB file format conversions.
Current workflow pain points:
- Users must install Python and scipy to write annotation files
- Slower file I/O compared to native database queries
- Compatibility issues between MATLAB file format versions
Proposed Solution
Replace .mat file storage with a SQLite database backend for annotations:
Database schema with tables for:
- Annotations (roi_id, class_label, annotator, sample, timestamp, etc.)
- Classification metadata (classifier versions, validation status)
- Annotation provenance (manual vs automated, confidence scores)
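A minimal sketch of what such a schema could look like, assuming one table per bullet above. The table names, column types, and the `(sample, roi_id)` key are illustrative assumptions, not an agreed design. Python's built-in `sqlite3` module is used here only to run the SQL; the same statements could be sent from R with `DBI::dbExecute()` over an RSQLite connection.

```python
import sqlite3

# Hypothetical schema: names and types are assumptions for discussion.
SCHEMA = """
CREATE TABLE annotations (
    sample      TEXT    NOT NULL,
    roi_id      INTEGER NOT NULL,
    class_label TEXT    NOT NULL,
    annotator   TEXT,
    timestamp   TEXT,               -- ISO 8601 string
    PRIMARY KEY (sample, roi_id)
);
CREATE TABLE classification_metadata (
    sample             TEXT PRIMARY KEY,
    classifier_version TEXT,
    validation_status  TEXT
);
CREATE TABLE annotation_provenance (
    sample     TEXT    NOT NULL,
    roi_id     INTEGER NOT NULL,
    source     TEXT    NOT NULL CHECK (source IN ('manual', 'automated')),
    confidence REAL,
    FOREIGN KEY (sample, roi_id) REFERENCES annotations (sample, roi_id)
);
-- Index to speed up filtering by class across samples
CREATE INDEX idx_annotations_class ON annotations (class_label);
"""

def create_schema(conn):
    """Create all tables and indexes in one call."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")  # in-memory database for the example
create_schema(conn)

# List the tables that were created
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Keeping provenance in its own table leaves room for more than one provenance record per ROI, which may or may not be wanted.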
R-native access using RSQLite or similar packages:
- Simple queries: get_annotations(sample_id, class = "Dinophysis")
- Batch operations for multi-sample analyses
- Easy filtering and aggregation
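The query helpers listed above could reduce to a few parameterized SQL statements. The sketch below is an assumption about how that might look: the `annotations` table layout, the `count_by_class` helper, and the sample names are hypothetical. It is written with Python's built-in `sqlite3` module purely to show the SQL; an R version would pass the same statements to `DBI::dbGetQuery()`.

```python
import sqlite3

def get_annotations(conn, sample_id, class_label=None):
    """Return (roi_id, class_label, annotator) rows for one sample,
    optionally restricted to a single class."""
    sql = "SELECT roi_id, class_label, annotator FROM annotations WHERE sample = ?"
    params = [sample_id]
    if class_label is not None:
        sql += " AND class_label = ?"
        params.append(class_label)
    sql += " ORDER BY roi_id"
    return conn.execute(sql, params).fetchall()

def count_by_class(conn):
    """Aggregate across all samples: number of ROIs per sample and class."""
    sql = (
        "SELECT sample, class_label, COUNT(*) FROM annotations "
        "GROUP BY sample, class_label ORDER BY sample, class_label"
    )
    return conn.execute(sql).fetchall()

# Hypothetical data in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations "
    "(sample TEXT, roi_id INTEGER, class_label TEXT, annotator TEXT)"
)
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?, ?)",
    [
        ("sample_A", 1, "Dinophysis", "alice"),
        ("sample_A", 2, "unclassified", "alice"),
        ("sample_B", 1, "Dinophysis", "bob"),
    ],
)

dino = get_annotations(conn, "sample_A", class_label="Dinophysis")
counts = count_by_class(conn)
```

Placeholders (`?`) keep user-supplied sample IDs and class names out of the SQL string itself.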
Migration path:
- One-time conversion utility to import existing .mat files
- Maintain .mat export option for MATLAB users
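One property worth having in the conversion utility is that it can be re-run safely. The sketch below shows only the database side, assuming the .mat contents have already been parsed into `(sample, roi_id, class_label)` tuples by some other step; the `import_records` helper, the table layout, and the sample data are hypothetical. It uses Python's built-in `sqlite3` module to illustrate the SQL.

```python
import sqlite3

def import_records(conn, records):
    """Insert (sample, roi_id, class_label) rows.
    Re-running with the same records replaces rows instead of duplicating them."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO annotations (sample, roi_id, class_label) "
            "VALUES (?, ?, ?)",
            records,
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations ("
    "sample TEXT NOT NULL, roi_id INTEGER NOT NULL, class_label TEXT NOT NULL, "
    "PRIMARY KEY (sample, roi_id))"
)

# Hypothetical records standing in for parsed .mat content
records = [("sample_A", 1, "Dinophysis"), ("sample_A", 2, "unclassified")]
import_records(conn, records)
import_records(conn, records)  # second run must not create duplicates

n_rows = conn.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]
```

The primary key on `(sample, roi_id)` is what makes `INSERT OR REPLACE` idempotent here.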
Alternatives Considered
RDS files: Native R format, but:
- Less efficient for querying subsets
- No standardized schema across installations
- Harder to access from other languages if needed
CSV files: Simple but:
- Poor performance with large annotation sets
- No relational structure for metadata
- Manual handling of data types
Keep .mat files but improve R support:
- The R.matlab package exists but is less actively maintained
- Still keeps the unnecessary dependency on the MATLAB file format
Impact
- Reduced barriers to entry: New users don't need to set up Python environments
- Better performance: Database queries can fetch only the rows needed instead of loading entire .mat files
- Easier multi-sample analysis: SQL queries naturally handle cross-sample operations
- Simplified deployment: Fewer dependencies means easier ClassiPyR installation
- Better data provenance: Database structure naturally supports tracking annotation history
- Cross-platform compatibility: SQLite database files are portable across Windows, macOS, and Linux
- Future-proofing: Standard database format is more maintainable long-term than MATLAB-specific files