Add archive data auto-loading on database initialization #16

Merged

cchwala merged 5 commits into main from data_archive_parser on Feb 12, 2026

Conversation


cchwala (Member) commented Feb 11, 2026

Closes #8

Summary of changes in this PR:

  • Implement parse_netcdf_archive.py to load historical CML data from NetCDF files into PostgreSQL using efficient COPY FROM operations, with configurable time range limiting (see the sketch after this list)
  • Add generate_archive.py script to create compressed archive data files (metadata + time series) for demo setup and database initialization (the file format is sketched further below)
  • Enhance the Grafana dashboard with Interval (Auto/1min/5min/15min/1h/6h/1d) and Aggregation (Mean/Raw/Min/Max/Median/StdDev) dropdown controls (needed because the longer archive means we now display far more data)
  • Refactor the dashboard queries to a UNION ALL pattern that separates the raw and aggregated data paths, with safe interval casting to support auto-scaling (related to the point above)
  • Add unit tests for archive scripts covering database truncation, file creation, and error handling (0.5s runtime)
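
The loading path is deliberately simple: flatten the NetCDF archive to long-format rows and stream them into PostgreSQL via COPY FROM STDIN. Below is a minimal sketch of that approach, assuming the NetCDF variables are named rsl/tsl with cml_id/time dimensions and the target table is cml_data — these names and the batch size are illustrative, not lifted from parse_netcdf_archive.py:

```python
import io
import os

import numpy as np
import pandas as pd
import psycopg2
import xarray as xr

# Configurable time range limiting via env var (default: 7 days).
MAX_DAYS = int(os.environ.get("ARCHIVE_MAX_DAYS", "7"))


def load_netcdf_archive(nc_path: str, dsn: str, batch_rows: int = 500_000) -> None:
    """Bulk-load a CML NetCDF archive into PostgreSQL via COPY FROM STDIN."""
    ds = xr.open_dataset(nc_path)

    # Keep only the most recent MAX_DAYS of the archive.
    t_end = ds["time"].values.max()
    ds = ds.sel(time=slice(t_end - np.timedelta64(MAX_DAYS, "D"), t_end))

    # Flatten to long format: one row per (cml_id, time) pair.
    df = ds[["rsl", "tsl"]].to_dataframe().reset_index()
    df = df[["cml_id", "time", "rsl", "tsl"]]  # match the COPY column order

    # Timestamp shifting: slide the historical window so it ends "now",
    # which makes the demo dashboard show recent-looking data.
    df["time"] = df["time"] + (pd.Timestamp.now() - df["time"].max())

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Stream batches through COPY instead of issuing row-wise INSERTs.
        for start in range(0, len(df), batch_rows):
            buf = io.StringIO()
            df.iloc[start:start + batch_rows].to_csv(buf, index=False, header=False)
            buf.seek(0)
            cur.copy_expert(
                "COPY cml_data (cml_id, time, rsl, tsl) FROM STDIN WITH (FORMAT csv)",
                buf,
            )
```

COPY avoids per-row INSERT overhead, which is what makes throughput in the ~155K rows/sec range (see the commit notes below) plausible on a local database.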

- Create archive generation script using real NetCDF data with synthetic timestamps (7 days, 1.5M rows)
- Add init script to auto-load gzip-compressed archive data on first database startup (~3 seconds)
- Include archive CSV files in repo (7.6 MB total, small enough for version control)
- Update database Dockerfile for proper init script execution order (01-init-schema.sql, 99-load-archive.sh)
- Configure docker-compose to mount archive data directory and init script
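
For the checked-in archive itself, the format is gzip-compressed CSV: one metadata file plus one time-series file. A rough sketch of writing such files, with hypothetical file names and the assumption that both tables arrive as pandas DataFrames:

```python
import gzip

import pandas as pd


def write_archive(metadata: pd.DataFrame, timeseries: pd.DataFrame, out_dir: str) -> None:
    """Write the two gzip-compressed CSV archive files (names are hypothetical)."""
    # Metadata: one row per CML (id, site coordinates, frequency, ...).
    with gzip.open(f"{out_dir}/cml_metadata.csv.gz", "wt", newline="") as f:
        metadata.to_csv(f, index=False)

    # Time series: the bulk of the ~7.6 MB archive; gzipped CSV stays small
    # enough to keep under version control.
    with gzip.open(f"{out_dir}/cml_timeseries.csv.gz", "wt", newline="") as f:
        timeseries.to_csv(f, index=False)
```

On first database startup, an init script such as 99-load-archive.sh can then decompress these files and feed them to the server (for example, gunzip -c piped into psql's \copy); the exact commands live in the script.
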
codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 89.70100% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.72%. Comparing base (841b78a) to head (784536e).
⚠️ Report is 1 commit behind head on main.

Files with missing lines          Patch %   Lines
parser/parse_netcdf_archive.py    84.10%    31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   64.64%   68.72%   +4.08%     
==========================================
  Files          19       22       +3     
  Lines        1547     1848     +301     
==========================================
+ Hits         1000     1270     +270     
- Misses        547      578      +31     
Flag            Coverage Δ
mno_simulator   87.87% <100.00%> (+2.05%) ⬆️
parser          80.56% <86.75%> (+2.20%) ⬆️
webserver       29.63% <ø> (ø)

- Add parse_netcdf_archive.py for direct NetCDF-to-DB loading with PostgreSQL COPY
- Support configurable time window via ARCHIVE_MAX_DAYS env var (default: 7 days)
- Auto-download 3-month NetCDF dataset (~209 MB) on first run
- Achieve ~155K rows/sec throughput with batched processing and timestamp shifting
- Update README with dual archive loading methods (CSV default vs NetCDF high-resolution)
- Add Interval (Auto/1min/5min/15min/1h/6h/1d) and Aggregation (Mean/Raw/Min/Max/Median/StdDev) dropdown variables
- Refactor RSL and TSL queries to UNION ALL pattern separating raw and aggregated paths
- Support auto interval via $__interval_ms with safe ::interval casting outside the CASE expression (a concrete query sketch follows below)
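
To make that pattern concrete, here is a sketch of what one panel's SQL could look like, held in a Python string to match the repo's tooling. It assumes PostgreSQL 14+ (for date_bin), a cml_data table with cml_id/time/rsl columns, and dashboard variables named $aggregation and $agg_interval — the actual dashboard JSON in this PR may differ:

```python
# Hypothetical RSL panel query illustrating the raw/aggregated UNION ALL split.
# '$aggregation' and '$agg_interval' stand in for the dashboard variables;
# $__timeFilter(...) and $__interval_ms are Grafana macros expanded before the
# query reaches PostgreSQL. Only the Raw and Mean branches are shown.
RSL_PANEL_QUERY = """
SELECT time AS "time", cml_id, rsl
FROM cml_data
WHERE '$aggregation' = 'Raw' AND $__timeFilter(time)

UNION ALL

SELECT
    -- The ::interval cast is applied once, outside the CASE, so both the
    -- Auto branch (built from $__interval_ms) and the fixed choices
    -- ('1min', '5min', ...) are cast safely in one place.
    date_bin(
        (CASE WHEN '$agg_interval' = 'Auto'
              THEN $__interval_ms || ' ms'
              ELSE '$agg_interval'
         END)::interval,
        time,
        TIMESTAMP '2000-01-01'
    ) AS "time",
    cml_id,
    avg(rsl) AS rsl
FROM cml_data
WHERE '$aggregation' = 'Mean' AND $__timeFilter(time)
GROUP BY 1, 2

ORDER BY 1
"""
```

Because each SELECT filters on the current '$aggregation' value, only one arm of the UNION ALL returns rows, so Grafana receives a single coherent series either way.
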
cchwala merged commit 9f3bd73 into main Feb 12, 2026
7 checks passed
cchwala deleted the data_archive_parser branch February 12, 2026 12:20

Development

Successfully merging this pull request may close issue #8: Parse large existing open CML data to database as fast as possible.
