Add management command to compare DVC and database-backed datasets #236

Open
tituomin wants to merge 1 commit into main from feature/dataset-comparison

@tituomin
Contributor

Description

Add a management command, compare_dataset, which can be used to:

  • find datasets that have data in both DVC and the database
  • examine in detail, one dataset at a time, how the DVC data and the database data differ
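
To illustrate the kind of comparison described above, here is a minimal, hypothetical sketch in plain Python. It is not the code in nodes/dataset_diff.py; the function names (index_rows, diff_datasets), the row-as-dict representation, and the sample columns (year, sector, value) are all assumptions made for illustration. It reports rows that exist on only one side and, for rows present on both sides, the columns whose values differ.

```python
from typing import Any

Row = dict[str, Any]


def index_rows(rows: list[Row], key_columns: list[str]) -> dict[tuple, Row]:
    """Index rows by a tuple of their key-column values (hypothetical helper)."""
    return {tuple(row[col] for col in key_columns): row for row in rows}


def diff_datasets(dvc_rows: list[Row], db_rows: list[Row], key_columns: list[str]) -> dict[str, Any]:
    """Compare two row sets that share the same key columns."""
    dvc = index_rows(dvc_rows, key_columns)
    db = index_rows(db_rows, key_columns)

    changed: dict[tuple, dict[str, tuple]] = {}
    for key in sorted(dvc.keys() & db.keys()):
        columns = sorted(set(dvc[key]) | set(db[key]))
        # Record (dvc_value, db_value) for every column that differs.
        diffs = {
            col: (dvc[key].get(col), db[key].get(col))
            for col in columns
            if dvc[key].get(col) != db[key].get(col)
        }
        if diffs:
            changed[key] = diffs

    return {
        "only_in_dvc": sorted(dvc.keys() - db.keys()),
        "only_in_db": sorted(db.keys() - dvc.keys()),
        "changed": changed,
    }


# Made-up sample data, for illustration only.
dvc_rows = [
    {"year": 2020, "sector": "transport", "value": 1.0},
    {"year": 2021, "sector": "transport", "value": 1.5},
]
db_rows = [
    {"year": 2020, "sector": "transport", "value": 1.0},
    {"year": 2021, "sector": "transport", "value": 1.7},
    {"year": 2022, "sector": "transport", "value": 2.0},
]

result = diff_datasets(dvc_rows, db_rows, key_columns=["year", "sector"])
print(result)
```

With this sample data, the 2022 row is reported as present only in the database, and the 2021 row is reported as changed in its value column. The actual command may compare metadata, units, or dimensions as well; see its --help output for what it really does.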

Screenshots/Videos (if applicable)

N/A

Related issue

https://app.asana.com/1/1201243246741462/project/1206017643443542/task/1212662157746701?focus=true

Requirements, dependencies and related PRs

N/A

Additional Notes

This is an internal, read-only tool and is not critical.


✅ Pre-Merge Checklist

Type of Change

  • Set the PR's label to match the nature of this change

Testing

  • Built Unit tests (unit tests added/updated)
  • Built E2E tests (if applicable; E2E tests added/updated)
  • Authorization is tested (permissions and access controls verified)
  • Manually tested locally (functionality verified)
    Manual testing instructions: run ./manage.py compare_dataset --help for usage information.

Internationalization & Accessibility

  • New strings are translatable (all user-facing text uses i18n)
  • Accessibility standards met (WCAG compliance, screen reader support)

Dependencies

  • Dependencies are merged (if applicable, i.e. if the change depends on other PRs, e.g. kausal_common)

Additional Notes

N/A

@tituomin added the New feature label (non-breaking change which adds functionality) on Feb 19, 2026
@tituomin force-pushed the feature/dataset-comparison branch from 84032a8 to e8ecb36 on February 19, 2026 15:14
@kausal-code-coverage
kausal-code-coverage bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 87.00000% with 13 lines in your changes missing coverage. Please review.

Files with missing lines   Patch %   Lines
nodes/datasets.py          12.50%    7 Missing ⚠️
nodes/dataset_diff.py      93.47%    3 Missing and 3 partials ⚠️

@@            Coverage Diff             @@
##             main     #236      +/-   ##
==========================================
+ Coverage   41.21%   41.34%   +0.13%     
==========================================
  Files         287      288       +1     
  Lines       34471    34570      +99     
  Branches     5079     5095      +16     
==========================================
+ Hits        14206    14292      +86     
- Misses      19114    19124      +10     
- Partials     1151     1154       +3     
Flag        Coverage Δ
e2e-tests   12.78% <0.00%> (-0.04%) ⬇️
unittests   34.14% <87.00%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines   Coverage Δ
nodes/dataset_diff.py      93.47% <93.47%> (ø)
nodes/datasets.py          39.29% <12.50%> (-0.43%) ⬇️

@jtuomist
Contributor

I tested this and it works just fine. I cannot think of any improvements; ready to merge.

Labels

New feature (non-breaking change which adds functionality)
