Update trainset sampling #143

Merged
nadahlberg merged 13 commits into main from update-trainset-sampling
Mar 3, 2026

Conversation

@nadahlberg
Member

@nadahlberg nadahlberg commented Mar 3, 2026

This PR makes several changes to how the trainset is sampled:

  1. Each sampling method now tracks its own status, so we no longer need to resample whenever anything changes. Sampling from an annotation decision or a querystring sampler happens only once.
  2. Heuristic bucket sampling is now idempotent. This is also needed so that we can preserve the trainset across updates.
  3. Querystring samplers: Users can now anchor on a querystring search and save it as a "sampler", so that at least N examples that satisfy the query will be added to the trainset.
  4. This PR also replaces the DSPy predictor with a simpler few-shot annotation agent.
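
For reviewers, here is a rough sketch of the sampling behavior described in items 1 to 3. It is not the code in this PR: `Trainset`, `run_once`, `top_up`, `bucket_sample`, and the hash-based ordering are all hypothetical names and choices made for illustration, assuming documents are identified by string ids.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Trainset:
    # Hypothetical container; the real model in this repo may look different.
    doc_ids: set = field(default_factory=set)
    completed: set = field(default_factory=set)  # names of samplers that already ran


def stable_rank(doc_id, seed):
    """Deterministic sort key: the same (seed, doc_id) always ranks the same."""
    return hashlib.sha256(f"{seed}:{doc_id}".encode()).hexdigest()


def top_up(trainset, candidates, target, seed):
    """Add candidates until `target` of them are in the trainset. Never removes."""
    have = sum(1 for d in candidates if d in trainset.doc_ids)
    need = target - have
    if need <= 0:
        return []
    missing = sorted(
        (d for d in candidates if d not in trainset.doc_ids),
        key=lambda d: stable_rank(d, seed),
    )
    added = missing[:need]
    trainset.doc_ids.update(added)
    return added


def bucket_sample(trainset, buckets, per_bucket, seed="buckets"):
    """Idempotent bucket sampling: a second call with the same inputs adds nothing."""
    for name, ids in buckets.items():
        top_up(trainset, ids, per_bucket, f"{seed}:{name}")


def run_once(trainset, name, sample_fn):
    """Run a sampler only if its status is not yet 'completed'. Returns True if it ran."""
    if name in trainset.completed:
        return False
    sample_fn(trainset)
    trainset.completed.add(name)
    return True
```

A querystring sampler would then be something like `run_once(ts, "query:contract", lambda t: top_up(t, matching_ids, n, "query:contract"))`, where `matching_ids` stands in for whatever the search returns. Because existing members are never dropped and each sampler records its own completion, rerunning the pipeline leaves the trainset unchanged.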

@nadahlberg
Member Author

Fixes: #142 #141 #140

@nadahlberg nadahlberg merged commit 35a6dcc into main Mar 3, 2026
3 checks passed