Automated daily pipeline pulling Sentinel-2 data from Microsoft Planetary Computer (free, no account needed) to keep agricultural commodity spectral metrics live, with monthly statistical analysis, automatic yield forecast updates, and a machine learning layer that generates LONG/SHORT/NEUTRAL trading signals from satellite data.
```
quantag/
├── .github/
│   └── workflows/
│       ├── quantagri_daily.yml          ← Runs every day at 06:00 UTC
│       ├── quantagri_historical.yml     ← One-time 10-year historical run (manual)
│       └── quantagri_analysis.yml       ← Runs 1st of each month at 08:00 UTC
│
├── quantagri/
│   ├── quantagri_spectral_velocity_pc.py   ← Sentinel-2 STAC query, cloud mask, NDVI/LSWI composites
│   ├── quantagri_commodity_config.py       ← 12 growing season configs
│   ├── quantagri_metrics_engine_pc.py      ← NDVI/LSWI/velocity/SAR metrics engine
│   ├── quantagri_batch_runner_pc.py        ← Historical batch runner
│   ├── quantagri_live_monitor.py           ← Daily live monitor
│   ├── quantagri_historical_monthly.py     ← 10-year historical monthly aggregator
│   ├── quantagri_backtest_aggregator.py    ← Excel workbook builder
│   ├── quantagri_yields_updater.py         ← Auto-updates official_yields.csv from USDA/CONAB APIs
│   ├── quantagri_monthly_analysis.py       ← Monthly aggregation + statistical analysis
│   │
│   └── ml/                                 ← ML signal engine
│       ├── __init__.py
│       ├── features.py      ← Season-level feature engineering
│       ├── models.py        ← LightGBM + Ridge yield prediction ensemble
│       ├── anomaly.py       ← Isolation Forest anomaly detector
│       ├── phenology.py     ← Crop stage / changepoint extraction
│       ├── signals.py       ← Yield surprise classifier (LONG/SHORT/NEUTRAL)
│       ├── score.py         ← Signal scorer: combines ML + z-scores into conviction score
│       └── train.py         ← CLI trainer: run via workflow or manually
│
├── quantagri_live/                         ← Created by daily monitor
│   ├── quantagri_live_results.csv          ← Rolling live results (one row per region per day)
│   ├── quantagri_live_alerts.csv           ← Alert rows only
│   ├── quantagri_signal_scorecard.csv      ← Conviction scores per commodity/region
│   └── quantagri_monitor.log               ← Run log
│
├── quantagri_historical/                   ← Created by historical workflow
│   ├── quantagri_monthly_soy_mato_grosso_br.csv
│   ├── quantagri_monthly_wheat_kansas_us.csv
│   ├── quantagri_monthly_wheat_rostov_ru.csv
│   ├── quantagri_monthly_corn_iowa_us.csv
│   ├── quantagri_monthly_corn_illinois_us.csv
│   ├── quantagri_monthly_cotton_*.csv
│   └── quantagri_monthly_ALL.csv           ← All commodities combined
│
├── quantagri_analysis/                     ← Created by monthly analysis workflow
│   ├── quantagri_monthly_summary.csv       ← Daily results aggregated to monthly
│   ├── quantagri_anomaly_report.csv        ← Rows flagged as anomalous (|z| >= 1.5σ)
│   ├── quantagri_correlations.csv          ← R² by month for each commodity/region
│   └── quantagri_stats_report.txt          ← Human-readable summary report
│
├── ml_models/                              ← Persisted trained models (auto-updated daily)
│   └── .gitkeep
│
├── official_yields.csv                     ← USDA/CONAB yield forecasts
└── requirements.txt                        ← Python dependencies
```

| Commodity | Region | Season Window |
|---|---|---|
| Soy | Mato Grosso, BR | Oct → Mar |
| Wheat | Kansas, US | Mar → Jun |
| Wheat | Rostov, RU | Mar → Jul |
| Corn | Iowa, US | May → Oct |
| Corn | Illinois, US | May → Oct |
| Cotton | South Texas, US | Apr → Oct |
| Cotton | North Texas, US | Apr → Oct |
| Cotton | Xinjiang, CN | Apr → Oct |
The daily monitor automatically detects which seasons are active and only runs those.
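A minimal sketch of how such a check could work, assuming month-level windows taken from the table above. The real logic in `quantagri_live_monitor.py` and `quantagri_commodity_config.py` may differ, and the `SEASONS` dict is hypothetical (cotton is left out because its region IDs aren't listed here). The only subtle case is Mato Grosso soy, whose window wraps across the new year.

```python
from datetime import date

# Hypothetical config: (commodity, region_id) -> (start_month, end_month), inclusive.
SEASONS = {
    ("soy", "mato_grosso_br"): (10, 3),
    ("wheat", "kansas_us"): (3, 6),
    ("wheat", "rostov_ru"): (3, 7),
    ("corn", "iowa_us"): (5, 10),
    ("corn", "illinois_us"): (5, 10),
}


def season_is_active(start_month, end_month, today):
    """True if today's month falls inside the window; handles Oct -> Mar wrap-around."""
    m = today.month
    if start_month <= end_month:
        return start_month <= m <= end_month
    return m >= start_month or m <= end_month


def active_seasons(today):
    return [key for key, (s, e) in SEASONS.items() if season_is_active(s, e, today)]


print(active_seasons(date(2026, 3, 6)))
# → [('soy', 'mato_grosso_br'), ('wheat', 'kansas_us'), ('wheat', 'rostov_ru')]
```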
Your repo layout must be:

```
quantagri/                                  ← all .py files go here (including ml/ subfolder)
official_yields.csv                         ← repo root
requirements.txt                            ← repo root
.github/workflows/quantagri_daily.yml
.github/workflows/quantagri_historical.yml
.github/workflows/quantagri_analysis.yml
```

Upload the `.py` files via the GitHub UI:
- Go to https://github.com/rmkenv/quantag/tree/main/quantagri
- Click Add file → Upload files
- Upload all `.py` files. For the `ml/` subfolder, create it first via Add file → Create new file, type `ml/__init__.py`, then upload the rest.

Then give the workflows write access so they can commit results back to the repo:
- Go to Settings → Actions → General
- Scroll to Workflow permissions
- Select "Read and write permissions"
- Click Save

The ML models and z-score analysis need historical data to train on. Run the historical workflow before expecting ML signals to populate.
- Click Actions → QuantAgri Historical Monthly → Run workflow
- Pick `wheat` first (fastest), then repeat for `corn`, `cotton`, `soy`
- Runtime: ~30–90 min per commodity
- Recommended order: wheat → corn → cotton → soy

To trigger a daily run manually:
- Click Actions → QuantAgri Daily Monitor
- Click "Run workflow" → mode: `daily` → Run workflow
- You will see 5 parallel jobs: soy, wheat, corn, cotton, commit-results
- Runtime: ~10–30 min depending on which seasons are active

```
06:00 UTC
    ↓
4 commodity jobs spin up in parallel (soy, wheat, corn, cotton)
    ↓
Each job:
    Installs Python dependencies
    Restores quantagri_live/ from cache
    Checks which seasons are active today
    Fetches new Sentinel-2 scenes from Planetary Computer
    Builds season-to-date NDVI/LSWI/velocity composites
    Computes yield surprise vs USDA/CONAB forecast
    Writes row to quantagri_live_results.csv
    ↓
commit-results job:
    Downloads all 4 commodity artifacts
    Merges and deduplicates CSVs (by commodity + region + date, keeps latest)
    Installs ML dependencies (lightgbm, scikit-learn, shap, ruptures)
    Retrains ML models on updated data → saves to ml_models/
    Generates signal scorecard → saves to quantagri_live/quantagri_signal_scorecard.csv
    Commits quantagri_live/ and ml_models/ back to repo
    Uploads merged artifact (30-day retention)
    ↓
If strong signal (conviction ±3 or ±4) → opens GitHub Issue (triggers email)
If alert thresholds breached → opens GitHub Issue (triggers email)
```
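
The merge-and-deduplicate step in `commit-results` can be pictured with a small stdlib sketch. It is not the workflow's actual code: it assumes each artifact is a CSV with `commodity`, `region_id` and `as_of_date` columns, and that artifacts are passed oldest-first, so "keeps latest" means the last row seen for a key wins. The sample rows are illustrative.

```python
import csv
import io


def merge_dedupe(csv_texts):
    """Merge CSV texts; keep the last row seen for each (commodity, region_id, as_of_date)."""
    fieldnames = []
    merged = {}
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)
        for row in reader:
            key = (row["commodity"], row["region_id"], row["as_of_date"])
            merged[key] = row  # a later artifact overwrites an earlier duplicate
    return fieldnames, list(merged.values())


# Illustrative input: the same corn/iowa_us date appears in two artifacts.
older = "commodity,region_id,as_of_date,current_ndvi\ncorn,iowa_us,2026-06-01,0.50\n"
newer = ("commodity,region_id,as_of_date,current_ndvi\n"
         "corn,iowa_us,2026-06-01,0.52\ncorn,iowa_us,2026-06-02,0.53\n")
fields, rows = merge_dedupe([older, newer])
print(len(rows), rows[0]["current_ndvi"])  # → 2 0.52
```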

The historical workflow builds a full monthly NDVI/LSWI history for 2016–2025. Run it once per commodity; it is resumable if it times out.
- Actions → QuantAgri Historical Monthly → Run workflow
- Pick a commodity (start with `wheat`, the fastest)
- Leave start/end year as 2016/2025
- Runtime: ~30–90 min per commodity

Output: monthly `ndvi_mean`, `ndvi_max`, `lswi_mean`, `velocity_mean` per region per year, covering 10 years × growing-season months.
This data is required before the ML models and z-score columns will populate.
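
Resumability could work roughly like this: before fetching anything, compare the full list of season-years against what the output CSV already contains. The `season_year` column name and both helper functions are hypothetical; the historical script's actual bookkeeping isn't documented here.

```python
import csv
from pathlib import Path


def completed_season_years(csv_path):
    """(region_id, season_year) pairs already written to an output CSV."""
    path = Path(csv_path)
    if not path.exists():
        return set()  # first run: nothing completed yet
    with path.open(newline="") as f:
        return {(row["region_id"], int(row["season_year"])) for row in csv.DictReader(f)}


def pending_season_years(region_ids, start_year, end_year, completed):
    """Season-years still to process, in chronological order per region."""
    return [
        (region, year)
        for region in region_ids
        for year in range(start_year, end_year + 1)
        if (region, year) not in completed
    ]


done = completed_season_years("quantagri_historical/quantagri_monthly_wheat_kansas_us.csv")
todo = pending_season_years(["kansas_us"], 2016, 2025, done)
print(f"{len(todo)} season-years left to process")
```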

The monthly analysis workflow runs automatically on the 1st of every month at 08:00 UTC. It can also be triggered manually.

Step 1 — Yield updater
Queries the USDA PSD and CONAB APIs and updates `official_yields.csv` automatically. Regions without a free API (Xinjiang cotton, Indian sugar, Rostov wheat) are flagged for manual update.

Step 2 — Monthly aggregation
Groups daily live results by commodity + region + year + month:
- End-of-month NDVI/LSWI/velocity
- Intra-month mean, std, min, max
- Season context (tercile means, peak NDVI)
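
A stdlib sketch of the Step 2 grouping for the NDVI column alone (LSWI and velocity would be handled the same way). The output field names are hypothetical; `quantagri_monthly_analysis.py` may name and compute them differently.

```python
import statistics
from collections import defaultdict


def monthly_summary(rows):
    """Aggregate daily rows to commodity + region + year + month.

    Each row needs commodity, region_id, as_of_date (YYYY-MM-DD) and current_ndvi.
    """
    groups = defaultdict(list)
    for row in rows:
        year, month = int(row["as_of_date"][:4]), int(row["as_of_date"][5:7])
        groups[(row["commodity"], row["region_id"], year, month)].append(row)

    summary = []
    for (commodity, region_id, year, month), grp in sorted(groups.items(), key=lambda kv: kv[0]):
        grp.sort(key=lambda r: r["as_of_date"])  # ISO dates sort chronologically
        ndvi = [float(r["current_ndvi"]) for r in grp]
        summary.append({
            "commodity": commodity,
            "region_id": region_id,
            "year": year,
            "month": month,
            "ndvi_end_of_month": ndvi[-1],
            "ndvi_mean": statistics.fmean(ndvi),
            "ndvi_std": statistics.stdev(ndvi) if len(ndvi) > 1 else None,  # undefined for 1 obs
            "ndvi_min": min(ndvi),
            "ndvi_max": max(ndvi),
        })
    return summary
```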
Step 3 — Statistical analysis
| Output | What it shows |
|---|---|
| Z-score | Is this month's NDVI high/low vs same month historically |
| Percentile rank | Where does this month sit in the 10-year distribution |
| R² by month | Which month of the season best predicts final yield |
| Mann-Kendall test | Is NDVI velocity trending up or down this season |
| Anomaly flags | Anything beyond ±1.5σ flagged with severity (MODERATE/SIGNIFICANT/EXTREME) |
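
The statistics in this table are standard and can be sketched with the standard library. The z-score and percentile rank compare one month's value with the same calendar month in prior years; the Mann-Kendall test counts increasing minus decreasing pairs in a season's velocity series and uses the normal approximation without tie correction. Only the 1.5σ flag level comes from this README: the SIGNIFICANT and EXTREME cutoffs below are placeholder assumptions.

```python
import math
import statistics
from statistics import NormalDist


def zscore(value, history):
    """How many standard deviations `value` sits from the same month in past years."""
    return (value - statistics.fmean(history)) / statistics.stdev(history)


def percentile_rank(value, history):
    """Percent of historical values at or below `value`."""
    return 100.0 * sum(h <= value for h in history) / len(history)


def severity(z, moderate=1.5, significant=2.0, extreme=2.5):
    """Anomaly tier; only the 1.5-sigma flag level is documented, other cutoffs are assumed."""
    a = abs(z)
    if a >= extreme:
        return "EXTREME"
    if a >= significant:
        return "SIGNIFICANT"
    if a >= moderate:
        return "MODERATE"
    return None


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction. Returns (S, Z, two-sided p)."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s == 0:
        z = 0.0
    else:
        z = (s - 1 if s > 0 else s + 1) / math.sqrt(var_s)  # continuity correction
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p
```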
Sample stats report output:

```
[3] NDVI → YIELD R² BY MONTH

  wheat / kansas_us
  Month   R²      r       p       n   OLS slope
  ----------------------------------------------------
  Mar     0.412   0.642   0.045   9   28.4141*
  Apr     0.631   0.794   0.011   9   41.2203*  ← BEST
  May     0.589   0.767   0.016   9   38.8901*
  Jun     0.521   0.722   0.028   9   34.1122*
```

The month marked ← BEST has the highest R² and is your highest-signal timing window.
The OLS slope is the change in yield (bushels/acre) per 0.1 change in NDVI.
The quantagri/ml/ layer runs automatically after each daily data merge and produces three types of output, combined into a single conviction score in quantagri_signal_scorecard.csv.
Each commodity/region gets a score from -4 to +4 built from four components:
| Component | Max contribution | Source |
|---|---|---|
| ML signal direction (LONG/SHORT) | ±2 | YieldSurpriseClassifier |
| ML confidence (high/medium) | ±1 | Classifier probability |
| Anomaly flag | +1 | IsolationForest score < -0.05 |
| Z-score alignment | ±1 | Historical monthly analysis |
| Score | Conviction | Action |
|---|---|---|
| ±4 | Maximum | Act with full size |
| ±3 | Strong | Act with normal size |
| ±2 | Moderate | Small position or wait |
| ±1 | Weak lean | Watch only |
| 0 | Neutral | Stay flat |
A GitHub Issue is opened (email sent) whenever any commodity hits ±3 or ±4.
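
A sketch of how the four components could add up, assuming the weights in the component table and clamping the total to the documented -4..+4 range (the maximum contributions sum to 5, so some clamping or gating must exist). How `score.py` signs the anomaly and z-score components isn't stated: here the anomaly flag is assumed to reinforce the ML direction, a z-score beyond a hypothetical 1.5σ cutoff adds its own sign, and "high" or "medium" confidence each add one point. All function names are illustrative.

```python
LEVELS = {4: "MAXIMUM", 3: "STRONG", 2: "MODERATE", 1: "WEAK LEAN"}


def conviction_score(ml_signal, ml_confidence, anomaly_score, z_score=None,
                     anomaly_threshold=-0.05, z_cutoff=1.5):
    """Hypothetical combination of the four scorecard components."""
    direction = {"LONG": 1, "SHORT": -1}.get(ml_signal, 0)
    score = 2 * direction                      # ML signal direction: ±2
    if direction and ml_confidence in ("high", "medium"):
        score += direction                     # ML confidence: ±1
    if direction and anomaly_score < anomaly_threshold:
        score += direction                     # anomaly flag: +1 toward the ML call (assumed)
    if z_score is not None and abs(z_score) >= z_cutoff:
        score += 1 if z_score > 0 else -1      # z-score: ±1, lowers the score if it disagrees
    return max(-4, min(4, score))              # clamp to the documented range


def conviction_label(score):
    if score == 0:
        return "NEUTRAL"
    return f"{LEVELS[abs(score)]} {'LONG' if score > 0 else 'SHORT'}"


def opens_issue(score):
    return abs(score) >= 3                     # ±3 or ±4 triggers a GitHub Issue
```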
1. YieldModel → predicted yield
A number in the crop's native unit (bu/ac for US corn/soy/wheat, bag/ha for Brazil soy, t/ha for Rostov wheat). Compare against the current USDA WASDE or CONAB forecast — the gap is your surprise estimate.

```
corn/iowa_us    → predicted: 174.1 bu/ac | USDA: 178.0 → -3.9 bearish
soy/illinois_us → predicted:  53.4 bu/ac | USDA:  52.0 → +1.4 bullish
```

Uses a LightGBM + Ridge ensemble with time-series cross-validation. LightGBM gets 60% weight because it captures non-linear stress interactions (e.g. high NDVI but low LSWI = green but water-stressed) that Ridge misses.
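
The blending and validation ideas can be illustrated without the models themselves. This sketch assumes the ensemble is a fixed 60/40 weighted average of the two models' predictions and that "time-series cross-validation" means walk-forward (expanding-window) folds over season-years; the 3-season minimum mirrors the "Needs 3+ seasons" note in Troubleshooting. The LightGBM and Ridge fitting in `models.py` is not shown, and the helper names are hypothetical.

```python
def blend(lgbm_pred, ridge_pred, lgbm_weight=0.6):
    """Weighted ensemble: LightGBM gets 60% of the weight, Ridge the rest."""
    return lgbm_weight * lgbm_pred + (1.0 - lgbm_weight) * ridge_pred


def yield_surprise(predicted, official):
    """Positive = prediction above the USDA/CONAB forecast."""
    return predicted - official


def walk_forward_splits(season_years, min_train=3):
    """Expanding-window folds: train on all earlier seasons, test on the next one."""
    years = sorted(season_years)
    for i in range(min_train, len(years)):
        yield years[:i], years[i]


print(round(yield_surprise(174.1, 178.0), 1))  # → -3.9
for train, test in walk_forward_splits(range(2016, 2021)):
    print(train, "->", test)
```

Walk-forward folds never let a model see a season later than the one it is scored on, which is what makes the cross-validated error a fair proxy for live use.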
2. AnomalyDetector → anomaly score
A score between roughly -0.3 and +0.2. Below -0.05 means the current NDVI/LSWI/velocity pattern is outside the historical envelope — something unusual is happening. Fires 4–6 weeks before yield impacts appear in official crop condition reports. Does not tell you direction — pair with the yield model for that.

```
wheat/kansas_us → anomaly_score: -0.18 ⚠️ FLAG
soy/illinois_us → anomaly_score: +0.04 ✓ normal
```

3. YieldSurpriseClassifier → LONG / SHORT / NEUTRAL
The most directly actionable output. Predicts whether the crop will beat or miss the official consensus forecast, with a probability and confidence tier.

```
corn/illinois_us   → LONG    | prob: 0.71 | confidence: medium
soy/mato_grosso_br → SHORT   | prob: 0.28 | confidence: medium
wheat/kansas_us    → NEUTRAL | prob: 0.51 | confidence: low
```

Confidence tiers:
- high — probability ≥ 0.75 or ≤ 0.25 → consider full position
- medium — 0.65–0.75 or 0.25–0.35 → consider half position
- low / NEUTRAL — 0.35–0.65 → no edge, stay flat
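
The tier boundaries above translate directly into a lookup on the classifier's beat probability. A sketch with a hypothetical `classify` helper; where the listed ranges touch (0.25, 0.35, 0.65, 0.75), this version gives the boundary to the stronger tier, which is an assumption.

```python
POSITION = {"high": "full position", "medium": "half position", "low": "stay flat"}


def classify(beat_prob):
    """Map P(beat consensus) to (signal, confidence) using the tiers above."""
    if beat_prob >= 0.75:
        return "LONG", "high"
    if beat_prob <= 0.25:
        return "SHORT", "high"
    if beat_prob >= 0.65:
        return "LONG", "medium"
    if beat_prob <= 0.35:
        return "SHORT", "medium"
    return "NEUTRAL", "low"


signal, tier = classify(0.28)
print(signal, tier, POSITION[tier])  # → SHORT medium half position
```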
The highest-conviction setups are when multiple models agree:
| Scenario | Interpretation | Action |
|---|---|---|
| Anomaly flagged + model predicts miss + SHORT | Strong bearish consensus | High conviction short |
| No anomaly + model predicts beat + LONG | Strong bullish consensus | High conviction long |
| Anomaly flagged + NEUTRAL signal | Unusual pattern, direction unclear | Watch, wait |
| Yield miss predicted, no anomaly | Gradual underperformance | Mild bearish lean |
| Models disagree | Conflicting signals | Stay flat |
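
One rough way to encode this table, using the -0.05 anomaly cutoff from the AnomalyDetector section and treating any combination the table doesn't list as "models disagree". The function and its argument names are illustrative, not part of `quantagri.ml`.

```python
def interpret(anomaly_score, ml_signal, predicted_surprise, anomaly_cutoff=-0.05):
    """Map the three model outputs to an action from the scenario table.

    predicted_surprise = YieldModel prediction minus the official forecast.
    """
    flagged = anomaly_score < anomaly_cutoff
    predicts_miss = predicted_surprise < 0
    predicts_beat = predicted_surprise > 0
    if flagged and predicts_miss and ml_signal == "SHORT":
        return "High conviction short"
    if not flagged and predicts_beat and ml_signal == "LONG":
        return "High conviction long"
    if flagged and ml_signal == "NEUTRAL":
        return "Watch, wait"
    if predicts_miss and not flagged and ml_signal != "LONG":
        return "Mild bearish lean"
    return "Stay flat"  # models disagree
```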
The edge is sharpest mid-season when you have enough satellite composites to be confident but USDA hasn't yet revised its numbers:
| Crop | Best signal window |
|---|---|
| US corn / soy | June – August |
| US wheat (Kansas) | April – May |
| Rostov wheat | April – May |
| Brazil soy | November – January |
To retrain the models manually:

```bash
python3 quantagri/ml/train.py \
  --sat_csv quantagri_live/quantagri_live_results.csv \
  --historical_csv quantagri_historical/quantagri_monthly_ALL.csv \
  --yield_csv official_yields.csv \
  --model_dir ml_models
```

To see which features drive a classifier's call:

```python
import pickle
import pandas as pd

# pickle needs this class importable to load the saved model
from quantagri.ml.signals import YieldSurpriseClassifier  # noqa: F401

with open("ml_models/corn_iowa_us_classifier.pkl", "rb") as f:
    clf = pickle.load(f)

feat_row = pd.Series({...})  # one row of season features (fill in the model's feature columns)
print(clf.explain(feat_row))
# → {'ndvi_max_peak': 0.42, 'lswi_mean_avg': 0.18, 'vel_mean_avg': -0.09, ...}
```

One row per active region per day in `quantagri_live_results.csv`:

| Column | Example | Notes |
|---|---|---|
| `as_of_date` | 2026-03-06 | Date of latest satellite data |
| `commodity` | corn | Crop |
| `region_id` | iowa_us | Region |
| `current_ndvi` | 0.407 | Latest composite value |
| `current_ndvi_velocity` | -0.0086 | dNDVI/day (rate of change) |
| `peak_ndvi` | 0.497 | Season high |
| `peak_ndvi_date` | 2026-01-05 | Date peak was reached |
| `tercile_mean_early/mid/late` | 0.372 / 0.480 / 0.454 | Season thirds |
| `yield_surprise` | +1.2 | bpa vs official forecast |
| `surprise_pct` | +2.8% | Relative surprise |
| `calibration_r2` | 0.78 | Historical NDVI→yield fit |

Note: tercile_mean_*, velocity_std, yield_surprise, and calibration_r2 are blank early in the season — they need multiple composites to compute and fill in naturally over 3–4 weeks.
One row per commodity/region in quantagri_live/quantagri_signal_scorecard.csv:

| Column | Example | Notes |
|---|---|---|
| `commodity` | corn | Crop |
| `region_id` | iowa_us | Region |
| `conviction_score` | +3 | -4 to +4; the bottom line |
| `conviction` | STRONG LONG 🟢 | Human-readable label |
| `ml_signal` | LONG | LONG / SHORT / NEUTRAL |
| `ml_beat_prob` | 0.76 | Probability of beating consensus |
| `ml_confidence` | high | high / medium / low |
| `anomaly_score` | -0.14 | Below -0.05 = flagged |
| `yield_surprise` | +2.1 | bpa vs USDA |

Option A — GitHub UI
Open `quantagri_live/quantagri_live_results.csv` in the repo; GitHub renders it as a table.
Option B — Raw CSV (paste into Excel or Google Sheets)
https://raw.githubusercontent.com/rmkenv/quantag/main/quantagri_live/quantagri_live_results.csv
https://raw.githubusercontent.com/rmkenv/quantag/main/quantagri_live/quantagri_signal_scorecard.csv
Option C — Download artifact
Actions → latest run → Artifacts → quantagri-live-merged-N
Option D — pandas

```python
import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/rmkenv/quantag/main/quantagri_live/quantagri_live_results.csv"
)
print(df.tail(10).to_string())
```

Thresholds in `quantagri/quantagri_live_monitor.py`:

```python
SURPRISE_ALERT_BPS = 1.5   # bpa absolute value
VELOCITY_ALERT = 0.015     # dNDVI/day
```

When breached, GitHub opens an Issue and you receive an email notification.
Strong ML signals (conviction ±3 or ±4) open a separate Issue with the full scorecard table.
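
A sketch of the threshold check these constants imply, using a hypothetical `alert_reasons` helper. Two details are assumptions: the velocity threshold is applied to the absolute value (like the surprise threshold), and blank values, which are common early in the season, never trigger an alert.

```python
SURPRISE_ALERT_BPS = 1.5   # bpa absolute value
VELOCITY_ALERT = 0.015     # dNDVI/day


def alert_reasons(row):
    """Return human-readable reasons a live-results row should raise an alert."""
    reasons = []
    surprise = row.get("yield_surprise")
    velocity = row.get("current_ndvi_velocity")
    if surprise is not None and abs(surprise) >= SURPRISE_ALERT_BPS:
        reasons.append(f"yield_surprise {surprise:+.1f} exceeds ±{SURPRISE_ALERT_BPS}")
    if velocity is not None and abs(velocity) >= VELOCITY_ALERT:
        reasons.append(f"NDVI velocity {velocity:+.4f}/day exceeds ±{VELOCITY_ALERT}")
    return reasons


print(alert_reasons({"yield_surprise": 1.2, "current_ndvi_velocity": -0.0086}))  # → []
```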
The monthly analysis workflow updates yields automatically from USDA/CONAB APIs. For manual updates after a major WASDE revision:
- Edit `official_yields.csv` in the repo root
- Commit with a message such as "Update yields — WASDE March 2026"

Key report dates to watch (WASDE, plus CONAB for Brazil):

| Month | What changes |
|---|---|
| March | Winter wheat baseline (Kansas) |
| February / March | Brazilian soy CONAB monthly |
| May | First corn/cotton new-crop forecast |
| August | Most important — first field-survey corn/soy estimate |
| November | Near-final corn/soy/cotton |
Regions without API coverage (Xinjiang cotton, Indian sugar, Rostov wheat) must be updated manually from IGC or local ministry reports.
Processing resolution is set automatically per commodity; no manual configuration is needed.

| Commodity | Resolution | Reason |
|---|---|---|
| Soy (Mato Grosso) | 500m | ~1.1M km² ROI; runs out of memory at finer resolution |
| Sugar | 300m | Two large tropical regions |
| Corn, Wheat, Cotton | 200m | Medium ROIs |
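
The memory pressure is easy to see with back-of-envelope arithmetic. Assuming the ~1.1M km² Mato Grosso ROI is rasterized as a single float32 band (4 bytes per pixel), the array size shrinks with the square of the pixel size: roughly 44 GB per band at Sentinel-2's native 10 m for the red/NIR bands, versus about 18 MB at 500 m. A real composite multiplies this by the number of bands and scenes stacked. The helper names are illustrative.

```python
def pixel_count(area_km2, resolution_m):
    """Pixels needed to cover an area at a given square pixel size."""
    return area_km2 * 1_000_000 // (resolution_m * resolution_m)


def band_megabytes(area_km2, resolution_m, bytes_per_pixel=4):
    """Size of one band in MB (4 bytes/pixel = float32 by default)."""
    return pixel_count(area_km2, resolution_m) * bytes_per_pixel / 1_000_000


MATO_GROSSO_ROI_KM2 = 1_100_000  # "~1.1M km²" from the table above

for res in (10, 200, 500):
    print(f"{res:>4} m: {pixel_count(MATO_GROSSO_ROI_KM2, res):>14,} px, "
          f"{band_megabytes(MATO_GROSSO_ROI_KM2, res):>9,.1f} MB per band")
```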

GitHub Actions free-tier budget:

| Item | Free limit | QuantAgri usage |
|---|---|---|
| Minutes/month | 2,000 | ~30 min/day × 30 + ~30 min/month analysis = ~930 min ✅ |
| Storage | 500MB | CSVs + model files ~5MB/month ✅ |
| Concurrent jobs | 20 | 5 parallel (daily) ✅ |
| Problem | Fix |
|---|---|
| `Permission denied` on git push | Settings → Actions → General → Workflow permissions → Read and write |
| `No active seasons` in log | Normal outside growing season. Corn/cotton start April/May. |
| `No S2 scenes found` | Cloud cover >80% that day; the monitor picks up the next clear day automatically |
| Job killed after ~3 min (exit 143) | Out of memory; resolution overrides in the `COMMODITY_RESOLUTION` dict handle this automatically |
| `RasterioIOError: not a supported format` | SAS token expiry; fixed by the `pc.sign()` re-sign in `quantagri_spectral_velocity_pc.py` |
| Duplicate rows in CSV | Fixed: the merge step deduplicates by commodity + region + date |
| `Failed to queue workflow run` | YAML syntax error; validate with `python3 -c "import yaml; yaml.safe_load(open('file.yml'))"` |
| Historical job times out | Re-run; it resumes from where it left off, skipping completed season-years |
| Z-scores all blank in analysis | Run the historical workflow first to build the baseline |
| `yield_surprise` blank | Add current season year rows to `official_yields.csv` with forecast values |
| ML models all show NEUTRAL | Needs 3+ seasons per commodity; run the historical workflow first |
| `lightgbm` not found in ML step | Added to the workflow install step; re-run the daily workflow to pick it up |
| `ImportError: attempted relative import with no known parent package` | `PYTHONPATH` must be set to `${{ github.workspace }}/quantagri/ml` (not just `quantagri`) in the Train and Scorecard workflow steps |
| Workflow not running at scheduled time | GitHub delays scheduled runs up to 30 min under load |
| `fatal: pathspec 'ml_models/' did not match any files` | Create `ml_models/.gitkeep` in the repo root; the commit step now handles this automatically |