Beauhurst/rvc
RVC

RVC is an experimental, vibe-coded reimagining of DVC. DVC is excellent - this project exists to explore what a data pipeline tool might look like if rebuilt from scratch with different trade-offs.

This is not production software. It is a playground for trying things out.

Overview

A pipeline is defined in a single rvc.yaml file. RVC resolves dependencies between steps, determines which steps are stale, caches outputs by content hash, and executes what needs running - in parallel where the dependency graph allows. Scripts receive all their paths and parameters as environment variables, requiring no awareness of RVC itself.

$ rvc status
⏺ download  up to date
○ analyze   deps changed (analyze.py)
◒ split     up to date
○ train     upstream changed (analyze)
◒ evaluate  up to date

$ rvc run
Running analyze

$ rvc metrics analyze:STATS
analyze:STATS
  average_age: 29.700000
  rows: 891
  survival_rate: 0.383800
  survived: 342

Design choices

DVC had a great insight: managing ML workflows alongside git. Because git already tracks code, you can track data versions too - the lockfile acts as a signature for the state of the pipeline at any commit. RVC leans into that.

Experiments are branches. DVC's experiment management system (dvc exp) is too complex for what I need. RVC assumes each experiment is a branch. You change code, data, or parameters in a branch, run the pipeline, and the lockfile + metrics get committed alongside everything else. Instead of focusing on tracking inputs, RVC focuses on being able to track and compare metrics across branches. It understands three metric shapes natively - flat key-values, timeseries, and histograms - and can diff and plot them in detail across any set of git refs.

Simpler step communication. DVC's insight of tracking DAG dependencies via inputs and outputs is the right idea, but the stage description is too verbose and communication with steps is broken - you end up writing argument-parsing boilerplate in every script. RVC makes writing steps more comfortable: variables can be interpolated anywhere in the config, and inputs, outputs, params, metrics, and dependencies are all named and passed to steps as environment variables. Your Python script just reads os.environ["MODEL"]. You can use pydantic-settings or any other env-based config pattern with zero friction. All of this is language-agnostic.

CI and automation as first-class consumers. DVC's idea of running pipelines in CI-like environments is great - it keeps a tight relationship between code versions and artifact versions, enables automatic retraining when data or libraries change, and keeps something that can get very complex relatively simple to manage. RVC takes this further: every command can produce machine-readable output (--json, --yaml). You can filter by step, by artifact kind, or by specific artifact name, and produce structured documents of pipeline state or performance that can be passed to other systems - experiment dashboards, deployment pipelines, monitoring, whatever.

Everything is in two files. rvc.yaml for configuration, rvc.lock for resolved state. No .rvc/config, no Python params files, no .dvc files scattered around the repo.

Small CLI, edit config yourself. dvc exp never worked for me. Editing pipelines through the CLI always felt cumbersome. In RVC you just edit rvc.yaml directly. The CLI has a handful of entry points (run, status, dag, metrics, plots, cache), and all structured output follows defined JSON schemas - so there is no need for a Python API or SDK.

Install

git clone <repository-url>
cd rvc
cargo build --release
cp target/release/rvc /usr/local/bin/

Requires Rust 1.70+.

Quick Start

Create rvc.yaml:

steps:
  prepare:
    cmd: python prepare.py
    deps:
      - prepare.py
    inputs:
      RAW: data/raw.csv
    outputs:
      CLEAN: data/clean.csv

  train:
    cmd: python train.py
    deps:
      - train.py
    params:
      EPOCHS: '100'
    inputs:
      DATA: data/clean.csv
    outputs:
      MODEL: models/model.pkl
    metrics:
      STATS: metrics/train.json

Run it:

rvc status   # see what needs to run
rvc run      # execute the pipeline
rvc metrics  # view metrics

Scripts access paths and parameters through environment variables:

import os

data_path = os.environ["DATA"]      # "data/clean.csv"
model_path = os.environ["MODEL"]    # "models/model.pkl"
epochs = os.environ["EPOCHS"]       # "100"

Works with pydantic-settings, argparse defaults from env, or a plain os.environ call. Language-agnostic.
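As an illustration of the env-based pattern, here is one way a script might collect its RVC-provided environment into a typed config object. This is a sketch using only the standard library; the field names mirror the Quick Start example, and the helper itself is hypothetical, not part of RVC:

```python
import os
from dataclasses import dataclass

@dataclass
class TrainConfig:
    data: str
    model: str
    epochs: int

    @classmethod
    def from_env(cls, env=None):
        # names match the keys declared under inputs/outputs/params in rvc.yaml
        env = os.environ if env is None else env
        return cls(
            data=env["DATA"],
            model=env["MODEL"],
            epochs=int(env.get("EPOCHS", "100")),
        )
```

Because every value arrives as a plain environment variable, the same approach works with pydantic-settings or any other env-based config loader.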

Configuration

Everything lives in rvc.yaml. Runtime state lives in rvc.lock. You edit one YAML file directly - no stage-editing CLI commands, no .rvc/config, no external params files.

Here is the Titanic example, which exercises most features - variables, multiple step types, persistent outputs, three metric shapes, and parameter passing:

settings:
  algorithm: blake2b

vars:
  data_dir: data
  out_dir: out
  model_dir: models
  paths:
    data:
      TITANIC: ${data_dir}/titanic.csv
      TRAIN: ${data_dir}/train.csv
      VAL: ${data_dir}/val.csv
    train:
      MODEL: ${model_dir}/survival_net.pt
      TRAINING_LOG: ${out_dir}/training_log.json
      FINAL_METRICS: ${out_dir}/train_metrics.json
      EVAL_METRICS: ${out_dir}/eval_metrics.json

steps:
  download:
    cmd: curl -L -o $TITANIC https://csvbase.com/lyuehh/titanic
    outputs:
      TITANIC: ${paths.data.TITANIC}

  analyze:
    cmd: uv run analyze.py
    deps:
      - analyze.py
    params:
      AGE_BIN: '10'
    inputs:
      TITANIC: ${paths.data.TITANIC}
    outputs:
      REPORT:
        path: ${out_dir}/report.txt
        persist: true
    metrics:
      STATS: ${out_dir}/metrics.json
      AGE_SURVIVAL: ${out_dir}/age_survival.csv

  split:
    cmd: uv run split.py
    deps:
      - split.py
    params:
      VAL_RATIO: 0.2
      SPLIT_SEED: 42
    inputs:
      TITANIC: ${paths.data.TITANIC}
    outputs:
      TRAIN: ${paths.data.TRAIN}
      VAL: ${paths.data.VAL}

  train:
    cmd: uv run train.py
    deps:
      - train.py
    params:
      EPOCHS: 40
      LEARNING_RATE: 0.001
    inputs:
      TRAIN: ${paths.data.TRAIN}
    outputs:
      MODEL: ${paths.train.MODEL}
    metrics:
      TRAINING_LOG: ${paths.train.TRAINING_LOG}
      FINAL_METRICS: ${paths.train.FINAL_METRICS}

  evaluate:
    cmd: uv run evaluate.py
    deps:
      - evaluate.py
    inputs:
      VAL: ${paths.data.VAL}
      MODEL: ${paths.train.MODEL}
    metrics:
      EVAL_METRICS: ${paths.train.EVAL_METRICS}

Step fields

Field                  Effect
cmd                    Shell command to execute (required)
wdir                   Working directory for the command (default: directory containing rvc.yaml)
deps                   Source files to track for changes (scripts, configs)
inputs                 Named data files the step reads
outputs                Named data files the step produces
params                 Named string values passed as env vars
metrics                Named output files that RVC can parse and compare
frozen: true           Skip this step during rvc run
always_changed: true   Always re-run regardless of dependency state
sequential: true       Run in isolation, not parallel with other steps at the same DAG level

Output forms

Outputs support a short form (just a path) and a long form (with options):

outputs:
  MODEL: models/model.pkl # short: cached, replaced with reflink/symlink

  REPORT:
    path: out/report.txt
    persist: true # long: tracked but NOT cached; stays as a regular file

Persistent outputs are useful for files you want to commit to git (reports, READMEs, etc.). They are still hashed for change detection, but RVC leaves them in place instead of moving them to the cache.

What Gets Tracked

Steps declare five kinds of artifacts, each with different behaviour:

Kind      Hashed   Cached   Env var   Triggers re-run on change
deps      ✓        -        -         ✓
params    -        -        ✓         ✓
inputs    ✓        ✓        ✓         ✓
outputs   ✓        ✓        ✓         ✓
metrics   ✓        -        ✓         ✓
  • deps - source files (scripts, configs). Hashed for change detection. Not cached, not exposed as env vars.
  • params - string values exposed as env vars. A change in value triggers a re-run. The value itself is stored in the lockfile.
  • inputs - data files the step reads. Hashed, cached, and exposed as env vars.
  • outputs - data files the step produces. Hashed, cached (unless persist: true), and exposed as env vars.
  • metrics - output files that RVC can parse and compare. Hashed and exposed as env vars, but not cached (they are typically small text files).

All five kinds trigger a re-run when they change. The distinction is in what else happens: deps are silent trackers, params carry values, inputs and outputs are cached, and metrics are additionally parseable for display and diffing.
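The change detection described above boils down to hashing tracked files and comparing against the lockfile. A minimal sketch of that idea, assuming the standard hashlib algorithm names (blake2b, sha256, md5) mentioned in the settings section; these helpers are illustrative, not RVC's actual code:

```python
import hashlib

def file_hash(path, algorithm="blake2b"):
    # stream the file through the configured hash algorithm
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def step_is_stale(locked, current):
    # a step is stale if any tracked hash or param value differs,
    # or if an artifact was added or removed
    keys = set(locked) | set(current)
    return any(locked.get(k) != current.get(k) for k in keys)
```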

Status

rvc status checks every step against the lockfile and reports what is stale and why:

$ rvc status
⏺ download  up to date
◒ analyze   up to date
○ split     params changed (VAL_RATIO)
○ train     upstream changed (split)
◒ evaluate  up to date

Status symbols (plain output):

Symbol   Meaning
⏺        Step is up to date and pushed
◒        Step is up to date but not pushed
○        Step is not up to date

The reason text is coloured to match the symbol.

Specific steps can be checked:

rvc status train evaluate

Changes propagate: if analyze is stale, downstream steps like train will show ○ upstream changed (analyze).

Structured output:

$ rvc status --json
{
  "steps": {
    "download": { "status": "up_to_date", "pushed": true },
    "analyze": { "status": "up_to_date", "pushed": false },
    "split": { "status": "not_up_to_date", "reason": "params changed (VAL_RATIO)" },
    "train": { "status": "not_up_to_date", "reason": "upstream changed (split)" },
    "evaluate": { "status": "up_to_date", "pushed": false }
  }
}

Also available as --yaml.
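The structured output is easy to consume from automation. A sketch of a gating script that extracts stale steps from the JSON shape shown above (the helper name is hypothetical):

```python
import json

# shape as emitted by `rvc status --json`, per the example above
status_json = """
{
  "steps": {
    "download": { "status": "up_to_date", "pushed": true },
    "split": { "status": "not_up_to_date", "reason": "params changed (VAL_RATIO)" }
  }
}
"""

def stale_steps(doc):
    return [name for name, info in doc["steps"].items()
            if info["status"] != "up_to_date"]

print(stale_steps(json.loads(status_json)))  # → ['split']
```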

Running the Pipeline

rvc run executes all steps that need updating, in dependency order, with independent steps running in parallel:

$ rvc run
▶ download
  ✓ completed in 0.8s
▶ analyze
  ✓ completed in 1.2s
▶ split
  ✓ completed in 0.4s
▶ train
  ✓ completed in 3.4s
▶ evaluate
  ✓ completed in 0.6s

If everything is already up to date, RVC says so and exits:

$ rvc run
Pipeline is up to date, nothing to run

Options:

rvc run -f                      # force re-run all steps regardless of status
rvc run <target>                # run target and ensure upstream dependencies are up to date
rvc run <target> --single       # run only target, skip dependency freshness checks
rvc run <target> --force        # force target and all downstream dependents
rvc run -n                      # dry run - show what would execute without running
rvc run --no-parallel           # disable parallel execution (run steps one at a time)

Dry run with verbose (-n -v) shows the full execution plan - commands, working directories, and the environment each step receives:

$ rvc run -n -f -v
Dry run: download
  Command: curl -L -o $TITANIC https://csvbase.com/lyuehh/titanic
  Working dir: .
  Environment:
    TITANIC=data/titanic.csv
Dry run: analyze
  Command: uv run analyze.py
  Working dir: .
  Environment:
    TITANIC=data/titanic.csv
    REPORT=out/report.txt
    AGE_BIN=10
    STATS=out/metrics.json
    AGE_SURVIVAL=out/age_survival.csv
...

Failure handling

When a step finishes, RVC checks that all declared outputs and metrics actually exist on disk. If any are missing, the step is marked as failed. This catches a common mistake: declaring an output path that the script doesn't actually write to.

If a step fails, the pipeline stops. Steps that already succeeded are recorded in the lockfile, so a subsequent rvc run will only retry from the failed step onward.

Parallel execution

Steps at the same DAG level (no dependencies between them) run concurrently by default, up to the number of available CPU cores. Steps marked sequential: true run alone in their own batch.

--no-parallel forces all steps to run one at a time, which can be useful for debugging or when steps compete for shared resources.
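The batching can be pictured as grouping the DAG into topological levels, where every step in a level depends only on earlier levels. A sketch of that grouping (illustrative, not RVC's scheduler):

```python
from collections import defaultdict

def topo_levels(nodes, edges):
    # group steps into batches; each batch depends only on earlier batches
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for upstream, downstream in edges:
        children[upstream].append(downstream)
        indegree[downstream] += 1
    levels = []
    current = sorted(n for n in nodes if indegree[n] == 0)
    while current:
        levels.append(current)
        ready = []
        for n in current:
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        current = sorted(ready)
    return levels
```

Applied to the Titanic pipeline this yields the same levels shown by rvc dag --plain: download, then analyze and split together, then train, then evaluate.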

Variables

Variables reduce repetition in rvc.yaml. They support nesting and can reference each other:

vars:
  data_dir: data
  out_dir: out
  paths:
    data:
      TITANIC: ${data_dir}/titanic.csv
      TRAIN: ${data_dir}/train.csv

steps:
  analyze:
    inputs:
      TITANIC: ${paths.data.TITANIC} # resolves to data/titanic.csv

Variable substitution is plain string replacement applied to: cmd, wdir, deps, inputs, outputs, metrics, and params. Variables can reference other variables; references are resolved iteratively until stable.
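The resolve-until-stable behaviour can be sketched in a few lines. This is an illustrative reimplementation, assuming only the ${name} and dotted-path syntax described above:

```python
import re

_VAR = re.compile(r"\$\{([^}]+)\}")

def interpolate(text, variables):
    def lookup(dotted):
        # walk nested vars via dot notation, e.g. paths.data.TITANIC
        node = variables
        for part in dotted.split("."):
            node = node[part]
        return str(node)
    # substitute repeatedly until nothing changes, so variables
    # may reference other variables
    previous = None
    while previous != text:
        previous = text
        text = _VAR.sub(lambda m: lookup(m.group(1)), text)
    return text
```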

Caching

When a step completes successfully, RVC hashes its outputs and stores copies in .rvc/cache/, organised by hash prefix. The original files are then replaced with reflinks or symlinks pointing to the cache.

On subsequent runs, if the inputs and deps haven't changed, the step is skipped - its outputs are already in the cache and linked into the working tree. If the step does need to re-run, existing outputs are backed up to the cache first (so you don't lose them if the new run fails), then removed before execution.

Reflinks vs symlinks: On filesystems that support copy-on-write (APFS on macOS, btrfs and XFS on Linux), RVC uses reflinks. These are instant, use no extra disk space, and behave like regular files. On other filesystems, symlinks are used as a fallback.

Persistent outputs: Outputs declared with persist: true are tracked (hashed for change detection) but not cached or replaced with links. They stay as regular files. Use this for files you want to commit to git.

The hash algorithm is configurable:

settings:
  algorithm: blake2b # default: md5. Also: sha256

To clean cache safely by lock/git refs, use rvc cache clean. Manual wipe is still possible with rm -rf .rvc/cache.
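The cache layout groups objects under a two-character hash prefix, as visible in the cache status example paths (.rvc/cache/9c/9c2e...). A sketch of that mapping (hypothetical helper):

```python
import os

def cache_path(hash_hex, cache_dir=".rvc/cache"):
    # content-addressed layout: <cache>/<first two hex chars>/<full hash>
    return os.path.join(cache_dir, hash_hex[:2], hash_hex)
```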

Remote Cache

RVC can sync cached artifacts to/from S3-compatible storage for collaboration and CI workflows.

Cache commands are lock-driven:

  • they operate on hashes referenced by rvc.lock
  • they use persist: true information stored in rvc.lock
  • persisted outputs are excluded from cache sync/restore

Remote configuration is read from rvc.yaml, with AWS configuration used as fallback for region/endpoint resolution:

settings:
  remote: s3://my-bucket/project-prefix
  region: eu-west-1

settings.region and settings.endpoint are optional when AWS configuration already resolves them. For custom endpoints, set settings.region explicitly.

There is no per-command remote override flag.

rvc cache push

Upload cached artifacts to remote storage:

rvc cache push                   # push all cached artifacts in lock
rvc cache push analyze           # push only analyze step artifacts
rvc cache push analyze.output    # push only analyze outputs
rvc cache push analyze.output.REPORT  # push specific output

Push tracks synced objects via ETag sidecars in cache (.rvc/cache/... .etag).

  • regular files: <cache_rel>.etag
  • manifests: <manifest_rel>.etag
  • manifest members: each member file has its own .etag

rvc cache push uploads only what is missing for the selected artifacts (unless forced).

Options:

rvc cache push --force           # re-upload even if ETag exists
rvc cache push --dry-run         # show what would be uploaded
rvc cache push --verbose         # detailed transfer progress
rvc cache push --jobs N          # concurrent transfers (default: CPU cores)

Persisted outputs are excluded from push (driven by persist: true in the lock).

rvc cache pull

Download artifacts from remote storage and restore them to the working tree:

rvc cache pull                   # pull all missing artifacts in lock
rvc cache pull train             # pull only train step artifacts
rvc cache pull train.input.DATA  # pull specific input

Pull only considers artifacts referenced by the current lock, and excludes persisted outputs.

Normal pull downloads artifacts that are missing from local cache. --force re-downloads selected artifacts from remote.

Options:

rvc cache pull --force           # re-download selected artifacts
rvc cache pull --dry-run         # show what would be downloaded/restored
rvc cache pull --verbose         # detailed transfer progress
rvc cache pull --jobs N          # concurrent transfers (default: CPU cores)

Artifacts are downloaded to .rvc/cache, then restored to the workspace. For directory outputs, pull handles both the manifest and the manifest member files. Persisted outputs are never restored from cache.

rvc cache status

Show cache presence and push status for locked artifacts:

rvc cache status                 # all artifacts in lock
rvc cache status train           # train step only
rvc cache status train.output.MODEL  # specific artifact

Plain output uses symbols to indicate state:

Symbol   State
⏺        cached + pushed (green)
◒        cached but not fully pushed (yellow)
🞊        persisted output, ignored by sync (gray)
○        not cached, expected (red)
For manifest artifacts in plain output, ⏺ is shown only when the manifest and all member files are pushed (all relevant .etag sidecars exist).

Example:

$ rvc cache status
⏺ data/clean.csv
◒ models/model.pkl
🞊 out/report.txt
○ metrics/train.json

Structured output:

$ rvc cache status --json
[
  {
    "path": "data/clean.csv",
    "cache": ".rvc/cache/9c/9c2e...",
    "remote": "s3://my-bucket/prefix/9c/9c2e...",
    "roles": ["prepare.output.CLEAN", "train.input.DATA"]
  },
  {
    "path": "models/model.pkl",
    "cache": ".rvc/cache/f1/f1b0...",
    "roles": ["train.output.MODEL"]
  },
  {
    "path": "out/report.txt",
    "ignored": true,
    "roles": ["analyze.output.REPORT"]
  }
]

Also available as --yaml.

rvc cache clean

Remove cache entries not referenced by any git branch or tag:

rvc cache clean                  # scan all branches/tags, delete unreferenced
rvc cache clean --ignore-branches  # only scan tags
rvc cache clean --ignore-tags    # only scan branches

Clean always keeps:

  1. Artifacts from current working tree lock
  2. Artifacts from locks in scanned git refs
  3. Manifest member files for kept manifests

ETag sidecars are deleted alongside their cache objects.

Options:

rvc cache clean --dry-run        # show what would be deleted
rvc cache clean --verbose        # list all refs scanned and entries deleted

Default output:

$ rvc cache clean
cache clean: scanned 8 refs, deleted 23 files, 142.3MB, remaining 1.8GB

Verbose output includes full ref list and deleted entries:

$ rvc cache clean --verbose --dry-run
refs:
  - (working tree)
  - refs/heads/main
  - refs/heads/feature-a
  - refs/tags/v1.0
  - refs/tags/v1.1
delete:
  - ab/abcd1234... (12.4KB)
  - cd/cdef5678... (8.1MB)
  - ef/ef901234... (134.2MB)
cache clean: scanned 5 refs, would delete 3 files, 142.3MB, remaining 1.8GB

Selectors

All cache commands support fine-grained selectors:

<step>                           # all inputs/outputs for step
<step>.input                     # all inputs for step
<step>.output                    # all outputs for step
<step>.input.<NAME>              # specific input
<step>.output.<NAME>             # specific output

Multiple selectors can be combined:

rvc cache push analyze.output train.output.MODEL
rvc cache status train.input evaluate.input

If a selector matches nothing, the command errors clearly:

$ rvc cache push nonexistent
Error: No artifacts match selector(s): nonexistent
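Matching a selector against an artifact is straightforward given the step.kind.NAME grammar above. A sketch (hypothetical helpers, not RVC's parser):

```python
def parse_selector(selector):
    # <step>[.<kind>[.<NAME>]]  e.g. "analyze.output.REPORT"
    parts = selector.split(".")
    step = parts[0]
    kind = parts[1] if len(parts) > 1 else None
    name = parts[2] if len(parts) > 2 else None
    return step, kind, name

def matches(selector, step, kind, name):
    # omitted components act as wildcards
    s_step, s_kind, s_name = parse_selector(selector)
    return (s_step == step
            and (s_kind is None or s_kind == kind)
            and (s_name is None or s_name == name))
```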

Working with CI

Common CI patterns:

Before pipeline execution - pull only what's needed:

rvc cache pull
rvc run

After successful run - push new artifacts:

rvc run
rvc cache push

Selective cache - for large pipelines, cache only expensive steps:

rvc cache pull train
rvc run
rvc cache push train evaluate

Cache validation:

rvc cache status --json > cache-status.json

The Lockfile

After each successful run, RVC writes rvc.lock - a YAML file recording the state of each step: command, hashes, params, and output semantics.

version: '1.0'
algorithm: blake2b
steps:
  train:
    cmd: uv run train.py
    deps:
      train.py:
        path: train.py
        hash: 3a7f...
        size: 1024
    inputs:
      TRAIN:
        path: data/train.csv
        hash: 9c2e...
        size: 51200
    outputs:
      MODEL:
        path: models/survival_net.pt
        hash: f1b0...
        size: 204800
    params:
      EPOCHS: '40'
      LEARNING_RATE: '0.001'
    metrics:
      TRAINING_LOG:
        path: out/training_log.json
        hash: 7d3a...
        size: 4096
  analyze:
    cmd: uv run analyze.py
    outputs:
      REPORT:
        path: out/report.txt
        hash: ab12...
        size: 108
        persist: true

rvc status, rvc run, and all rvc cache commands use the lockfile to determine what is stale, what to sync, and where metrics live.

Commit both rvc.lock and your metric files to git. When diffing, RVC reads the lockfile from a git ref to find which metrics exist and where they live, then reads the metric file contents from that same ref. If either is missing from git history, the diff has nothing to compare against.
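Reading a committed file at an arbitrary ref is the mechanism underlying diffing. A sketch of the idea using git's plumbing (an illustrative helper, not RVC's internals):

```python
import subprocess

def read_at_ref(ref, path):
    # `git show REF:PATH` prints the file content as committed at that ref;
    # RVC-style diffing would read rvc.lock this way, then the metric files
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```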

Gitignore Management

After each run, RVC updates .gitignore in the directory containing rvc.yaml. It maintains a clearly marked section:

# BEGIN RVC managed
/.rvc
/data/clean.csv
/models/model.pkl
# END RVC managed

This section lists the .rvc cache directory and all non-persistent output paths. RVC only touches lines between the markers; everything else in your .gitignore is preserved.

Persistent outputs (persist: true) are not added to .gitignore, since the intent is for those files to be committed.
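The marker-delimited update can be sketched as a pure text transformation, assuming the BEGIN/END markers shown above (the function itself is hypothetical):

```python
BEGIN = "# BEGIN RVC managed"
END = "# END RVC managed"

def update_managed_section(gitignore_text, entries):
    # replace the managed block if present, otherwise append it;
    # everything outside the markers is left untouched
    block = [BEGIN, *entries, END]
    lines = gitignore_text.splitlines()
    if BEGIN in lines and END in lines:
        start, stop = lines.index(BEGIN), lines.index(END)
        lines[start:stop + 1] = block
    else:
        if lines and lines[-1] != "":
            lines.append("")
        lines.extend(block)
    return "\n".join(lines) + "\n"
```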

The DAG

RVC infers the dependency graph from step inputs and outputs. If step A produces data/clean.csv and step B declares it as an input (or dep), B depends on A. No need to declare dependencies between steps explicitly.
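The inference reduces to matching output paths against input paths. A sketch over a simplified step structure (illustrative only):

```python
def infer_edges(steps):
    # map each declared output path to the step that produces it
    producer = {path: name
                for name, spec in steps.items()
                for path in spec.get("outputs", {}).values()}
    edges = set()
    for name, spec in steps.items():
        for path in spec.get("inputs", {}).values():
            if path in producer and producer[path] != name:
                edges.add((producer[path], name))
    return edges
```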

rvc dag supports two output formats:

  • mermaid (default)
  • plain (terminal-friendly topological levels + edge list)

Mermaid output (default):

rvc dag
flowchart TD
    accTitle: Pipeline DAG
    download[download]
    analyze[analyze]
    split[split]
    train[train]
    evaluate[evaluate]
    download --> analyze
    download --> split
    split --> train
    split --> evaluate
    train --> evaluate

Plain output:

rvc dag --plain
DAG (topological levels)

[0] ○ download
[1] ○ analyze, ○ split
[2] ○ train
[3] ○ evaluate

Edges
download -> analyze
download -> split
split    -> train
split    -> evaluate
train    -> evaluate

The Euro FX example shows a fan-out / fan-in pattern - three independent filter steps that converge on a summary:

cd example/eurofx && rvc dag
flowchart TD
    accTitle: Pipeline DAG
    download[download]
    filter_majors[filter_majors]
    monthly_avg[monthly_avg]
    extremes[extremes]
    summary[summary]
    download --> filter_majors
    download --> monthly_avg
    download --> extremes
    extremes --> summary
    filter_majors --> summary
    monthly_avg --> summary

Options:

rvc dag                          # Mermaid flowchart output (default)
rvc dag --plain                  # plain-text levels + edges for terminal viewing
rvc dag --direction left-right   # Mermaid layout direction (also: bottom-up, right-left)
rvc dag > pipeline.md            # write to file via shell redirection

Frozen steps render as hexagons with ❄, always-changed steps as circles with ↻ in Mermaid output.

In plain output, ⏺ means up to date (green), ○ means needs running (red), and · means unknown (yellow).


Metrics

RVC parses metric files, understands their structure, and renders them. It recognises three shapes - flat, timeseries, and histogram - and auto-detects both the file format (JSON, NDJSON, YAML, CSV) and the metric shape from the content.

Flat Metrics

Key-value pairs. The most common shape for summary statistics.

A JSON file like this:

{ "rows": 891, "survived": 342, "survival_rate": 0.3838, "average_age": 29.7 }

Renders as:

$ rvc metrics analyze:STATS
analyze:STATS
  average_age: 29.700000
  rows: 891
  survival_rate: 0.383800
  survived: 342

Nested JSON objects are flattened with dot notation: {"train": {"loss": 0.1}} becomes train.loss: 0.100000.

YAML key-value mappings and two-column CSV files (metric,value) are also recognised as flat metrics.
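The dot-notation flattening mentioned above is easy to sketch (an illustrative helper):

```python
def flatten(obj, prefix=""):
    # {"train": {"loss": 0.1}} -> {"train.loss": 0.1}
    out = {}
    for key, value in obj.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            out.update(flatten(value, dotted))
        else:
            out[dotted] = value
    return out
```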

Timeseries

Sequences of timestamped data points. The expected use case is training logs - epoch-by-epoch loss and accuracy values streamed during model training.

An NDJSON file (one JSON object per line) like this:

{"timestamp": "2026-02-07T15:49:37.360054+00:00", "epoch": 1, "train_loss": 0.6742, "train_acc": 0.6087}
{"timestamp": "2026-02-07T15:49:37.370471+00:00", "epoch": 2, "train_loss": 0.6489, "train_acc": 0.6101}
{"timestamp": "2026-02-07T15:49:37.381126+00:00", "epoch": 3, "train_loss": 0.619, "train_acc": 0.6199}
...40 lines total...

Renders as a sampled summary. RVC picks points from the head, middle, and tail of the series, with ... indicating skipped regions. The default sample count is 12:

$ rvc metrics train:TRAINING_LOG
train:TRAINING_LOG (timeseries)
points: 40
start: 2026-02-07T15:49:37Z
end: 2026-02-07T15:49:37Z
duration: +289.084ms
                                Δt        epoch    train_acc   train_loss
  -------------------------------- ------------ ------------ ------------
                              +0µs            1     0.608700     0.674200
                         +10.417ms            2     0.610100     0.648900
                         +21.072ms            3     0.619900     0.619000
                               ...
                         +60.375ms            8     0.792400     0.478400
                               ...
                        +243.122ms           33     0.809300     0.428200
                               ...
                        +282.293ms           39     0.820500     0.425100
                        +289.084ms           40     0.823300     0.421800

Control the number of sample points:

rvc metrics train:TRAINING_LOG --points 5    # fewer samples
rvc metrics train:TRAINING_LOG --all         # every point, no sampling
rvc metrics train:TRAINING_LOG --timestamps  # absolute timestamps instead of Δt

RVC detects timeseries by looking for a timestamp, time, datetime, date, ts, or t field in the first record. JSON arrays, NDJSON, and CSV files with a matching column header are all supported. Headerless CSV where the first column looks like a timestamp is also handled.
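The detection rule described above can be expressed directly; this sketch uses exactly the field names listed and is not RVC's actual detector:

```python
TIME_FIELDS = ("timestamp", "time", "datetime", "date", "ts", "t")

def detect_timeseries(records):
    # a list of records counts as a timeseries when the first record
    # carries one of the recognised time fields
    if not records:
        return False
    return any(field in records[0] for field in TIME_FIELDS)
```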

Histograms

Binned data. Detected by the presence of bin or bin_start/bin_end keys/columns.

A JSON array like this:

[
  { "bin_start": 0, "bin_end": 10, "count": 62, "survival_rate": 0.6129 },
  { "bin_start": 10, "bin_end": 20, "count": 102, "survival_rate": 0.402 },
  { "bin_start": 20, "bin_end": 30, "count": 220, "survival_rate": 0.35 }
]

Renders as:

$ rvc metrics analyze:AGE_HIST_JSON
analyze:AGE_HIST_JSON (histogram)
   bin      count   survival_rate
   0..10    62      0.612903
   10..20   102     0.401961
   20..30   220     0.350000
   30..40   167     0.437126
   40..50   89      0.382022
   50..60   48      0.416667
   60..70   19      0.315789
   70..80   6       0.000000
   80..90   1       1.000000

If both bin (a label) and bin_start/bin_end (range boundaries) are present, bin is used as the display label. CSV with a bin column using range syntax (0..10) is also recognised.

Selectors

Metrics can be narrowed using the step:metric:field selector pattern:

rvc metrics analyze                            # all metrics from the analyze step
rvc metrics analyze:STATS                      # one specific metric
rvc metrics train:TRAINING_LOG:train_acc       # one field within a timeseries
rvc metrics analyze:AGE_HIST_JSON:survival_rate  # one value column in a histogram

For timeseries, field filtering reduces the table to a single value column:

$ rvc metrics train:TRAINING_LOG:train_acc
train:TRAINING_LOG (timeseries)
points: 40
start: 2026-02-07T15:49:37Z
end: 2026-02-07T15:49:37Z
duration: +289.084ms
                                Δt    train_acc
  -------------------------------- ------------
                              +0µs     0.608700
                         +10.417ms     0.610100
                         +21.072ms     0.619900
                               ...
                        +282.293ms     0.820500
                        +289.084ms     0.823300

For histograms, it reduces to the selected value column while preserving bin labels. For flat metrics, it shows only the matching keys.

Output Formats

All metrics render in four formats:

Plain text (default):

$ rvc metrics analyze:STATS
analyze:STATS
  average_age: 29.700000
  rows: 891
  survival_rate: 0.383800
  survived: 342

JSON (--json):

$ rvc metrics analyze:STATS --json
{
  "steps": {
    "analyze": {
      "STATS": {
        "average_age": 29.7,
        "rows": 891,
        "survival_rate": 0.3838,
        "survived": 342
      }
    }
  }
}

Flat metrics emit plain key-value objects. Timeseries and histogram metrics include "kind" and "values" fields. JSON schemas for both metric output and diff output are in schema/.

YAML (--yaml):

$ rvc metrics analyze:STATS --yaml
steps:
  analyze:
    STATS:
      average_age: 29.7
      rows: 891
      survival_rate: 0.3838
      survived: 342

Markdown (--markdown):

$ rvc metrics analyze:STATS --markdown
### analyze:STATS

| metric        | value     |
|---------------|-----------|
| average_age   | 29.700000 |
| rows          | 891       |
| survival_rate | 0.383800  |
| survived      | 342       |

Only one format flag can be used at a time; combining them (e.g. --json --yaml) is an error.


Diffing

rvc metrics --diff compares metrics across git refs. It reads metric files from each ref (using the lockfile from that ref to locate them), computes a diff, and renders the result.

Auto-detection

With no --ref arguments, RVC picks sensible defaults based on git state:

Git state                      Comparison
Uncommitted changes exist      working tree vs HEAD
On a feature branch, clean     current branch vs main/master
On the default branch, clean   HEAD vs HEAD~1

rvc metrics --diff                             # auto-detect
rvc metrics --diff --ref main                  # working tree vs main
rvc metrics --diff --ref v1 --ref v2           # two specific refs
rvc metrics --diff --ref v1 --ref v2 --ref v3  # multi-way (v1 is base)

When one --ref is given, it becomes the base and the working tree is the target. When two or more are given, the first is the base and the rest are targets.

2-way diffs

With two refs, RVC produces a side-by-side comparison with delta columns showing the difference from base to target:

$ rvc metrics --diff --ref HEAD~1 --ref HEAD evaluate:EVAL_METRICS
Comparing HEAD~1 against HEAD

evaluate:EVAL_METRICS
    metric   HEAD~1     HEAD        Δ
  -------- -------- -------- --------
        fn       26       25       -1
        fp       11       11       +0
        tn      103      103       +0
        tp       38       39       +1
   val_acc 0.792100 0.797800  +0.0057
  val_loss 0.436900 0.429900  -0.0070
  val_rows      178      178       +0

For timeseries, RVC aligns both series to a shared duration axis using nearest-neighbour resampling. A summary table shows metadata (start, end, duration, point count) followed by the aligned value table. When one series is shorter than the other, ∅ marks points with no corresponding data:

$ rvc metrics --diff --ref HEAD~2 --ref HEAD train:TRAINING_LOG:train_acc --points 5
Comparing HEAD~2 against HEAD

train:TRAINING_LOG:train_acc
     field               HEAD~2                 HEAD          Δ
  -------- -------------------- -------------------- ----------
     start 2026-02-06T23:10:26Z 2026-02-07T15:49:37Z   +16.654h
       end 2026-02-06T23:10:26Z 2026-02-07T15:49:37Z   +16.654h
  duration           +304.508ms           +289.084ms  -15.424ms
    points                   40                   40         +0

          Δt HEAD~2.train_acc HEAD.train_acc  Δ(train_acc)
  ---------- ---------------- -------------- -------------
        +0µs         0.610100       0.608700       -0.0014
   +76.127ms         0.789600       0.793800       +0.0042
  +152.254ms         0.791000       0.805000       +0.0140
  +228.381ms         0.789600       0.807900       +0.0183
  +304.508ms         0.823300              ∅

Histograms are aligned by bin label. Bins that appear in one ref but not the other show ∅, with deltas computed per value column.
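A sketch of that bin alignment, assuming each histogram is a dict mapping bin label to count (illustrative only, not RVC's code):

```python
def align_hist(base, target):
    """Align two histograms by bin label, preserving the base's bin order
    and appending target-only bins. Missing bins map to None (rendered ∅)."""
    labels = list(base) + [l for l in target if l not in base]
    return [(l, base.get(l), target.get(l)) for l in labels]
```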

Multi-way diffs

With three or more refs, the first ref is the base and each subsequent ref is compared against it. Delta columns are hidden by default in multi-way mode to keep tables readable:

$ rvc metrics --diff --ref HEAD~2 --ref HEAD~1 --ref HEAD train:FINAL_METRICS
Comparing snapshots: base (HEAD~2) + HEAD~1, HEAD

train:FINAL_METRICS
            metric   HEAD~2   HEAD~1     HEAD
  ---------------- -------- -------- --------
    best_train_acc 0.823300 0.813500 0.823300
            epochs       40       40       40
   final_train_acc 0.823300 0.800800 0.823300
  final_train_loss 0.431800 0.432400 0.421800

--deltas forces delta columns on; --no-deltas forces them off:

$ rvc metrics --diff --ref HEAD~2 --ref HEAD~1 --ref HEAD train:FINAL_METRICS --deltas
Comparing HEAD~2 against HEAD~1, HEAD

train:FINAL_METRICS
            metric   HEAD~2   HEAD~1 HEAD~1.Δ     HEAD   HEAD.Δ
  ---------------- -------- -------- -------- -------- ---------
    best_train_acc 0.823300 0.813500  -0.0098 0.823300   +0.0000
            epochs       40       40       +0       40        +0
   final_train_acc 0.823300 0.800800  -0.0225 0.823300   +0.0000
  final_train_loss 0.431800 0.432400  +0.0006 0.421800   -0.0100

In 2-way diffs, delta headers are shortened to Δ or Δ(field). In multi-way diffs, they include the ref name: HEAD.Δ or HEAD.Δ(field).

Mode    Default deltas  Override
2-way   shown           --no-deltas to hide
3+ way  hidden          --deltas to show

Mixed-kind fallback

When a metric changes shape between refs (e.g. flat in v1, timeseries in v2), RVC cannot produce a structured comparison table. Instead, it shows each ref's snapshot independently.

Structured diff output

Diff output supports all four formats. JSON and YAML include "base" and "targets" at the top level.

Flat diffs are ref-keyed objects (no "kind" field). Non-flat diffs include "kind" + "values". Timeseries diffs also include a "summary" object with start/end/duration/points per ref:

$ rvc metrics --diff --ref HEAD~1 --ref HEAD evaluate:EVAL_METRICS --json
{
  "base": "HEAD~1",
  "targets": ["HEAD"],
  "steps": {
    "evaluate": {
      "EVAL_METRICS": {
        "HEAD~1": {
          "fn": "26",
          "fp": "11",
          "val_acc": "0.792100"
        },
        "HEAD": {
          "fn": "25",
          "Δ(fn)": "-1",
          "fp": "11",
          "Δ(fp)": "+0",
          "val_acc": "0.797800",
          "Δ(val_acc)": "+0.0057"
        }
      }
    }
  }
}
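Because the JSON shape is stable, it scripts well. A hypothetical CI guard might extract a delta and fail on regression (a sketch written against the structure shown above; not an RVC feature, and the step/metric names are from this example):

```python
import json
import subprocess

def val_acc_delta(diff: dict) -> float:
    """Pull Δ(val_acc) for the first target ref out of the diff JSON above."""
    target = diff["targets"][0]
    return float(diff["steps"]["evaluate"]["EVAL_METRICS"][target]["Δ(val_acc)"])

# Usage (uncomment to run against a real repo):
# out = subprocess.run(["rvc", "metrics", "--diff", "--ref", "main", "--json"],
#                      capture_output=True, text=True, check=True)
# if val_acc_delta(json.loads(out.stdout)) < 0:
#     raise SystemExit("val_acc regressed")
```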

Markdown (--markdown):

$ rvc metrics --diff --ref HEAD~1 --ref HEAD evaluate:EVAL_METRICS --markdown
### evaluate:EVAL_METRICS

| metric   | HEAD~1   | HEAD     | Δ        |
|----------|----------|----------|----------|
| fn       | 26       | 25       | -1       |
| fp       | 11       | 11       | +0       |
| tn       | 103      | 103      | +0       |
| tp       | 38       | 39       | +1       |
| val_acc  | 0.792100 | 0.797800 | +0.0057  |
| val_loss | 0.436900 | 0.429900 | -0.0070  |
| val_rows | 178      | 178      | +0       |

An example of YAML output (--yaml) for timeseries and histogram diffs:

base: HEAD~1
targets:
  - HEAD
steps:
  analyze:
    AGE_HIST_JSON:
      kind: histogram
      values:
        - bin_start: '0'
          bin_end: '10'
          HEAD~1:
            count: '62'
            survival_rate: '0.612903'
          HEAD:
            count: '62'
            Δ(count): '+0'
            survival_rate: '0.612903'
            Δ(survival_rate): '+0.0000'
  train:
    TRAINING_LOG:
      kind: timeseries
      summary:
        HEAD~1:
          start: '2026-02-06T23:10:26Z'
          end: '2026-02-06T23:10:26Z'
          duration: '+304.508ms'
          points: '40'
        HEAD:
          start: '2026-02-07T15:49:37Z'
          Δ(start): '+16.654h'
          end: '2026-02-07T15:49:37Z'
          Δ(end): '+16.654h'
          duration: '+289.084ms'
          Δ(duration): '-15.424ms'
          points: '40'
          Δ(points): '+0'
      values:
        - Δt: '+0µs'
          HEAD~1:
            train_acc: '0.610100'
          HEAD:
            train_acc: '0.608700'
            Δ(train_acc): '-0.0014'

Composability

Selectors, output formats, and diff all compose freely:

# specific field + multi-way diff + markdown
rvc metrics train:TRAINING_LOG:train_acc --diff --ref HEAD~2 --ref HEAD~1 --ref HEAD --markdown

# all metrics as structured JSON diff
rvc metrics --diff --ref main --ref experiment --json

# histogram diff against working tree
rvc metrics analyze:AGE_HIST_JSON --diff

# timeseries diff with explicit sample density
rvc metrics train:TRAINING_LOG --diff --ref main --points 30

# alternate config file
rvc -d projects/titanic/rvc.yaml metrics --diff --ref main

Plots

rvc plots generates an interactive HTML page with charts for all metrics in the pipeline. It renders directly in the browser using Vega-Lite - no server needed.

Chart types

Each metric shape maps to a chart type:

  • Flat metrics → horizontal bar grid with delta labels, one panel per key, 3 columns
  • Timeseries → line chart per value column, refs as coloured lines, cross-ref tooltip on hover at each x-step, summary stats table below
  • Histogram → grouped bars by bin (one bar per ref), missing bins filled with 0, cross-ref tooltip on hover per bin

Basic usage

rvc plots                  # generate charts, open in browser
rvc plots --no-open        # generate without opening
rvc plots train eval       # only chart specific steps

By default, rvc plots diffs metrics against git (same ref logic as rvc metrics --diff). Use --no-diff to chart only the working tree:

rvc plots --no-diff        # no diff, just current metrics

Diff mode

Ref selection works the same as rvc metrics --diff:

rvc plots                          # auto-detect (working vs HEAD, branch vs main, etc.)
rvc plots --ref main               # working tree vs main
rvc plots --ref v1 --ref v2        # two specific refs

When diffing, all refs appear as separate series in timeseries/histogram charts, and as grouped bars in flat charts. The first ref is the base for delta computation.

Output files

rvc plots writes to a fixed plots/ directory next to rvc.yaml (and auto-adds /plots to the RVC-managed .gitignore block):

<dir-with-rvc.yaml>/plots/
  index.html          # self-contained interactive page
  specs/              # individual Vega-Lite specs (copy-paste into editors)
    eval-METRICS.vl.json
    train-LOG-loss.vl.json
    train-LOG-acc.vl.json

Spec files are standalone Vega-Lite specs with inline data - open them in the Vega Editor, embed in notebooks, or use in other tools.

Live mode

rvc run --live opens a browser page before pipeline execution and updates charts as metric files change on disk:

rvc run --live             # run pipeline with live chart updates
rvc run --live --force     # force re-run with live charts
rvc run --live train       # live charts filtered to train + its dependencies
rvc run --live --ref main  # live comparison baseline/targets (same ref rules as plots/metrics)

The page shows a pulsing ● LIVE badge during execution, then switches to ✓ DONE or ✗ FAILED when the pipeline finishes. The browser polls a companion JS file for updates - no WebSocket server needed.

When targeting specific steps (rvc run --live train), live charts are automatically filtered to only the steps in scope (the target plus its dependency chain). This keeps the page focused on what's actually executing.

Page layout

The HTML page includes:

  • Step sections - charts grouped by pipeline step, ordered by execution (topological) order
  • Pipeline DAG - a Mermaid diagram of the dependency graph, rendered below the charts

Charts are generated with a fixed, shared Vega-Lite base width across chart types so exported SVGs are consistent, then scale responsively in the page. The page uses Tailwind CSS via CDN for styling and works from file:// URLs - no server required.

Architecture

Rust handles data assembly: parsing metrics, computing deltas, aligning histogram bins, enriching data points with display labels. The results are Vega-Lite specs with inline data, plus optional structured summary tables.

The browser handles all rendering: Vega-Lite renders charts, JS renders summary tables from structured data, Mermaid renders the DAG. No HTML is generated in Rust.

Vega-Lite spec templates live in src/templates/ as standalone .vl.json files (valid Vega-Lite with empty data). Rust parses them as JSON and fills in data.values, title, and axis titles - no string-based template substitution.
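The fill step amounts to plain JSON manipulation. In Python it would look roughly like this (illustrative sketch only; the actual code is Rust):

```python
import json

def fill_template(path: str, values: list[dict], title: str) -> dict:
    """Load a standalone Vega-Lite template (a valid spec with empty data),
    then inject inline data and a title as JSON fields -- no string-based
    template substitution anywhere."""
    with open(path) as f:
        spec = json.load(f)
    spec["data"] = {"values": values}
    spec["title"] = title
    return spec
```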

Metric loading, ref resolution, and step filtering are shared between rvc metrics and rvc plots via app/shared.rs - the same code paths serve both commands.


Examples

The example/ directory contains three complete pipelines that can be run directly. They are independent of the test suite.

example/basic/

A minimal pipeline: uppercase text, count words, generate random numbers, split into files, produce a report. Useful for understanding the mechanics without external dependencies.

$ cd example/basic && rvc run && rvc metrics report:STATS
report:STATS
  files.count: 30
  files.first_name: a.txt
  word_count: 32

example/titanic/

Download the Titanic dataset, analyse survival rates, split train/val, train a neural network, evaluate. Exercises all three metric shapes - flat statistics, a timeseries training log, and survival-by-age histograms. Requires Python with uv.

cd example/titanic && rvc run && rvc metrics

example/eurofx/

Euro exchange rate analysis: download historical FX data, run three parallel filter steps (majors, monthly averages, extremes), then produce a summary. Demonstrates parallel execution and fan-out / fan-in DAG structure.

cd example/eurofx && rvc run && rvc dag

Global Options

Flag                  Effect
-d path/to/rvc.yaml   Use an alternate config file
-v                    Verbose output
-n                    Dry run (show plan, don't execute)

These are global and can be combined with any subcommand.


TODO

Possible directions. This is a playground - these are ideas, not commitments.

Near-term

  • Metric thresholds - declare assertions like accuracy > 0.9 in rvc.yaml. Fail the run if metrics regress past a threshold.
  • Terminal sparklines - ASCII charts for timeseries metrics directly in the terminal.
  • Step-level timing - record execution duration in the lockfile. Surface trends across runs.

Medium-term

  • Parameterised sweeps - rvc run --sweep LEARNING_RATE=0.001,0.01,0.1 to fan out a step across parameter values and collect metrics from each.
  • rvc bisect - binary search across commits for the point where a metric regressed, similar to git bisect but driven by pipeline metrics.
  • CI reporting - a rvc report command that produces a self-contained markdown summary for PR comments: diff against the base branch, show what changed.
  • Partial restarts - if step 3 of 5 fails, resume from step 3 without re-validating earlier steps.
  • Config validation - a rvc check command to validate rvc.yaml without running anything: catch typos, missing deps, and circular references.

Exploratory

  • Notebook integration - metric collection from Jupyter notebooks, possibly via a lightweight Python package that writes NDJSON.
  • Distributed execution - run steps on remote machines or clusters.
  • Plugin system - custom step executors (Docker, Slurm, etc.) without hard-coding them into RVC.
  • Metric annotations - attach notes to specific runs ("tried larger batch size", "new augmentation") that appear in diff output.

Out of scope

  • Web UI / server. rvc plots generates static HTML opened locally. No hosted dashboard or server component.
  • Language-specific SDKs. The env-var approach is language-agnostic by design.
  • DVC compatibility. RVC is a separate project with different trade-offs.

Inspired By

DVC - the tool that solved these problems first. For production use, DVC is the right choice.

License

MIT
