
🚀 Feat: Model selection + e2e tests #61

Closed
LedjoLleshaj wants to merge 18 commits into nicofretti:main from LedjoLleshaj:feature-model-selection

Conversation

@LedjoLleshaj (Contributor) commented Jan 31, 2026

🚀 Feat: Model selection + e2e tests

Description

  • Added an option to select and switch between models in the Settings view.
  • Implemented default reassignment: deleting the current default model automatically promotes another model to default.
  • Added integration tests covering auto-default and reassignment scenarios.

Related Issue

Checklist

  • Code follows project style guidelines
  • Comments explain "why" not "what"
  • Documentation updated (if needed)
  • No debug code or console statements
  • make format passes
  • make pre-merge passes
  • PR update from develop branch
  • Copilot review run and addressed

Summary by CodeRabbit

  • New Features

    • API endpoints to mark default LLM and embedding models.
    • Settings UI: clickable model cards with default badge and inline actions.
    • Pipeline editor & block config: JSON↔template toggle and model dropdowns preserving custom names.
    • New data-augmentation template and three built-in blocks for sampling, semantic infill, and duplicate removal.
    • Generator UI: improved Markdown vs JSON mode handling.
  • Documentation

    • Docs and guides updated for defaults, blocks, templates, and e2e testing.
  • Tests

    • Extensive unit, integration, and e2e test additions.
  • Chores

    • Added e2e Make targets and CI/test helpers.

✏️ Tip: You can customize this high-level summary in your review settings.

nicofretti and others added 9 commits January 20, 2026 23:37
…abbit (nicofretti#57)

- Adds a data augmentation pipeline template via three new blocks (StructureSampler, SemanticInfiller, DuplicateRemover)
- Enables JSON-or-template field configuration across blocks
- Implements E2E testing infrastructure with Playwright
- Updates frontend UI for model selection and template editing
- Integrates CodeRabbit for automated code review.
Fixed frontend display bug where models named "default" incorrectly overrode explicit default settings.
Updated lib/storage.py to strictly enforce a single default model constraint.
Implemented auto-default logic: the first model created is automatically set as default.
Implemented default reassignment: deleting the current default model automatically promotes another model to default.
Added integration tests covering auto-default and reassignment scenarios.
@coderabbitai bot commented Jan 31, 2026

Walkthrough

Adds default-model selection end-to-end: DB + storage changes to track a single default, LLMConfigManager methods and API PUT endpoints to set defaults, frontend Settings UI to mark defaults, updated lookup fallbacks to prefer is_default, pipeline metadata tweak (first_block_type), multiple new blocks/utilities, and extensive tests and e2e tooling.
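
For orientation, a minimal client-side sketch of the new default-model endpoint described above; only the /llm-models/{name}/default path comes from this PR, while the base URL, port, and absence of an API prefix are assumptions.

import json
import urllib.request

# assumption: the backend is reachable locally on port 8000 with no extra API prefix;
# only the /llm-models/{name}/default path is taken from this PR
BASE_URL = "http://localhost:8000"


def set_default_llm_model(name: str) -> dict:
    """mark the named LLM model as the default via the new PUT endpoint"""
    req = urllib.request.Request(f"{BASE_URL}/llm-models/{name}/default", method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # "my-model" is a placeholder name, not a model from the repository
    print(set_default_llm_model("my-model"))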

Changes

Cohort / File(s) Summary
API & App
app.py
Added PUT endpoints PUT /llm-models/{name}/default and PUT /embedding-models/{name}/default; exposes first_block_type in pipeline output; imports LLMConfigError/not-found handling.
Manager / Logic
lib/llm_config.py, lib/entities/llm_config.py
Added is_default on model entities; lookup fallback updated to prefer is_default=True then legacy "default" name; added set_default_llm_model() and set_default_embedding_model() on manager.
Storage & Migrations
lib/storage.py, migrations...
Schema adds is_default columns; transactional upsert/delete flows enforce single default; added set_default_* methods; reads/writes populate is_default.
Frontend — Settings & API client
frontend/src/pages/Settings.tsx, frontend/src/services/llmConfigApi.ts, frontend/src/types/index.ts
Settings UI: clickable cards to mark defaults, visual badge/state; added client methods setDefaultLLMModel/setDefaultEmbeddingModel; types include optional is_default.
New Blocks & Template Utilities
lib/blocks/builtin/..., lib/blocks/commons/template_utils.py, lib/blocks/config.py, lib/template_renderer.py
Added StructureSampler, SemanticInfiller, DuplicateRemover and multiple block config enhancements; introduced json-or-template helpers, robust LLM JSON parsing, and _config_formats plumbing.
Block changes (validation/mapping/metrics/etc.)
lib/blocks/builtin/field_mapper.py, json_validator.py, ragas_metrics.py, structured_generator.py, validator.py
Multiple blocks now accept json-or-template inputs, validate rendered JSON, and expose template-based config fields; signatures updated accordingly.
Pipeline runtime
lib/workflow.py, lib/templates/data_augmentation.yaml
Filter outputs to remove initial input metadata from saved/returned results; added data_augmentation template wiring new blocks.
Tests — unit/integration/e2e
tests/blocks/*, tests/integration/*, tests/e2e/*, tests/test_api.py
Extensive new tests for default-model behavior, new blocks, template utils, and multiple e2e suites; API tests for default endpoints.
Docs, tooling & CI
llm/state-*.md, docs/*, .claude/*, .coderabbit.yaml, Makefile, pyproject.toml, .gitignore
Documentation updates for new blocks and endpoints, new skills/docs, Makefile targets for e2e, dependency updates, and .gitignore adjustments.
Dev scripts & helpers
scripts/with_server.py, scripts/inspect_db_configs.py, tests e2e helpers
New server lifecycle and DB inspection scripts; e2e helpers and orchestration scripts added.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Manager
    participant Storage

    User->>Frontend: click "Set default" on model card
    Frontend->>API: PUT /llm-models/{name}/default
    API->>Manager: set_default_llm_model(name)
    Manager->>Storage: set_default_llm_model(name)
    Storage->>Storage: BEGIN TRANSACTION
    Storage->>Storage: UPDATE llm_models SET is_default=0
    Storage->>Storage: UPDATE llm_models SET is_default=1 WHERE name={name}
    Storage->>Storage: COMMIT
    Storage-->>Manager: OK
    Manager-->>API: success
    API-->>Frontend: 200 OK (message)
    Frontend->>API: GET /llm-models
    API->>Manager: list_llm_models()
    Manager->>Storage: list_llm_models()
    Storage-->>Manager: rows (with is_default)
    Manager-->>API: model list
    API-->>Frontend: updated list (default badge)
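
The storage steps in the diagram boil down to one transaction that clears the old default and sets the new one. A minimal sketch using stdlib sqlite3 (the real lib/storage.py is async and may differ; table and column names are taken from the diagram):

import sqlite3


def set_default_llm_model(db_path: str, name: str) -> None:
    """atomically clear any existing default and mark `name` as the default model"""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # implicit BEGIN ... COMMIT, with rollback if an exception escapes
            conn.execute("UPDATE llm_models SET is_default = 0")
            cur = conn.execute(
                "UPDATE llm_models SET is_default = 1 WHERE name = ?", (name,)
            )
            if cur.rowcount == 0:
                # nothing matched: unknown model, abort the whole transaction
                raise ValueError(f"llm model not found: {name}")
    finally:
        conn.close()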

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
  • Out of Scope Changes check (⚠️ Warning): Scope creep detected: PR includes major new blocks (StructureSampler, SemanticInfiller, DuplicateRemover), data augmentation template, extensive documentation/CLI utilities, and schema changes beyond model selection. Resolution: Split PR into focused commits: (1) model selection core, (2) new blocks & data augmentation, (3) testing infrastructure. Align scope with issue #60 requirements.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.93%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): Title clearly summarizes the main changes: adding model selection UI/API and end-to-end tests.
  • Description check (✅ Passed): Description covers main objectives and references the linked issue, but checklist items are incomplete and should be marked as done or removed.
  • Linked Issues check (✅ Passed): All objectives from issue #60 are addressed: UI for model selection in Settings, backend endpoints for default-model switching, auto-default reassignment, and integration tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@app.py`:
- Around line 700-710: Add integration tests using TestClient that cover the new
endpoints PUT /llm-models/{name}/default and PUT
/embedding-models/{name}/default for both success and failure paths: call the
app endpoints (following patterns in test_api.py) and assert 200 success
responses on normal behavior; mock llm_config_manager.set_default_llm_model and
embedding_config_manager.set_default_embedding_model to raise
LLMConfigNotFoundError and assert the HTTP response is 404 with the error
detail; also mock them to raise a generic Exception and assert the response is
400 and contains the exception message; ensure tests reference the handler
functions by endpoint paths and use the same import/mocking approach used
elsewhere in tests. A hedged test sketch follows at the end of this list.
- Around line 700-710: Replace the broad except Exception in the
set_default_llm_model endpoint with a specific except LLMConfigError handler
(LLMConfigError is the base for LLMConfigNotFoundError) so you catch
storage/manager errors without swallowing unrelated exceptions; keep logging via
logger.exception(f"failed to set default llm model {name}") and raise
HTTPException(status_code=400, detail=e.message) from e to preserve the message
format; make the analogous change in the embedding default endpoint (the
function that sets embedding models default) using the same LLMConfigError
pattern and message handling; finally add unit tests that exercise both
/llm-models/{name}/default and /embedding-models/{name}/default for success,
not-found (LLMConfigNotFoundError), and generic manager/storage error
(LLMConfigError) to ensure response codes and detail strings match expectations.

In `@frontend/src/pages/Settings.tsx`:
- Around line 396-397: The embedding list uses a legacy fallback causing
inconsistent default detection: inside the embeddingModels.map callback (where
you set const isDefault = model.is_default || model.name === "default";) remove
the legacy check so it matches the LLM logic (use only model.is_default). Update
that line to compute isDefault solely from model.is_default to ensure
backend-controlled defaults are honored.

In `@lib/llm_config.py`:
- Around line 135-141: Remove the duplicated "fallback chain:" line in the
docstring (lines 135-136) so the docstring reads the fallback sequence only
once; mirror the same fix applied in get_llm_model’s docstring to keep wording
consistent and ensure the list (1.–4.) follows a single "fallback chain:"
header.
- Around line 42-48: Remove the duplicated docstring line describing the
fallback chain in lib/llm_config.py so the description only appears once; locate
the module-level docstring (the block explaining the fallback chain: "uses
fallback chain to ensure blocks always have a model available:") and delete the
repeated occurrence so the list of fallback steps remains correct and not
duplicated.

In `@tests/integration/test_default_model_selection_integration.py`:
- Around line 143-150: Replace the broad Exception in the
test_set_nonexistent_default_raises_error with the specific
LLMConfigNotFoundError: update the pytest.raises(...) to
pytest.raises(LLMConfigNotFoundError) and add/import LLMConfigNotFoundError at
the top of the test module so the assertion targets the exact error raised by
llm_config_manager.set_default_llm_model("non_existent_model").
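
Re the first app.py item in the list above: a minimal pytest sketch of the requested endpoint tests. It assumes the FastAPI instance is app.app, that llm_config_manager is a module-level attribute of app, that the route is mounted without a prefix, and that LLMConfigNotFoundError accepts a message string; only the endpoint path and exception name come from this PR.

from unittest.mock import AsyncMock, patch

from fastapi.testclient import TestClient

import app  # assumption: FastAPI instance exposed as app.app
from lib.llm_config import LLMConfigNotFoundError  # assumption: import path

client = TestClient(app.app)


def test_set_default_llm_model_success():
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock()) as mocked:
        response = client.put("/llm-models/my-model/default")
    assert response.status_code == 200
    mocked.assert_awaited_once_with("my-model")


def test_set_default_llm_model_not_found_returns_404():
    error = LLMConfigNotFoundError("llm model not found: missing")
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock(side_effect=error)):
        response = client.put("/llm-models/missing/default")
    assert response.status_code == 404


def test_set_default_llm_model_generic_error_returns_400():
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock(side_effect=RuntimeError("boom"))):
        response = client.put("/llm-models/my-model/default")
    assert response.status_code == 400
    assert "boom" in response.json()["detail"]
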
🧹 Nitpick comments (2)
tests/integration/test_default_model_selection_integration.py (1)

31-38: Consider adding test for legacy "default" name fallback.

The docstring mentions "Fallback to 'default' named model (legacy support)" but the test doesn't explicitly verify this behavior. The current tests only check is_default=True fallback.

lib/storage.py (1)

707-709: Consider adding logging before rollback for debugging.

The transaction rollback blocks re-raise exceptions but don't log them at the storage layer. While the caller may log, adding logger.exception() here would preserve context closer to the failure point.

♻️ Example for save_llm_model (apply similar pattern to others)
             except Exception:
+                logger.exception("transaction failed during save_llm_model")
                 await db.execute("ROLLBACK")
                 raise

As per coding guidelines: "Use logger.exception() in backend — preserves stack traces in error logs."

Also applies to: 736-738, 759-761, 848-850, 876-878, 901-903

@coderabbitai bot left a comment

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/blocks/builtin/ragas_metrics.py (1)

110-149: ⚠️ Potential issue | 🟠 Major

avoid mutating self.metrics inside execute.

Line 110 mutates instance state per-run; if block instances are reused concurrently, metrics can leak across executions. Keep parsed metrics local and pass them to helpers.

🔧 suggested fix
-        # store parsed metrics for use in other methods
-        self.metrics = metrics
+        metrics_list = metrics

@@
-            return {"ragas_scores": self._empty_scores()}
+            return {"ragas_scores": self._empty_scores(metrics_list)}

@@
-            metric_instances = self._build_metrics(llm, embeddings)
+            metric_instances = self._build_metrics(llm, embeddings, metrics_list)
-    def _empty_scores(self) -> dict[str, Any]:
+    def _empty_scores(self, metrics: list[str]) -> dict[str, Any]:
         """return empty scores with passed=False"""
-        return {metric: 0.0 for metric in self.metrics} | {"passed": False}
+        return {metric: 0.0 for metric in metrics} | {"passed": False}

-    def _build_metrics(self, llm: Any, embeddings: Any) -> dict[str, Any]:
+    def _build_metrics(
+        self, llm: Any, embeddings: Any, metrics: list[str]
+    ) -> dict[str, Any]:
         """build metric instances"""
@@
-        return {k: v for k, v in available.items() if k in self.metrics}
+        return {k: v for k, v in available.items() if k in metrics}
frontend/src/pages/Generator.tsx (1)

614-616: ⚠️ Potential issue | 🟡 Minor

Empty catch block for metadata parsing.

While this handles malformed JSON gracefully by returning null, adding a console.error would help debug production issues. As per coding guidelines, empty catch blocks should include logging.

🛡️ Add error logging
             } catch {
+              console.error("Failed to parse job metadata");
               return null;
             }
🤖 Fix all issues with AI agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py:
- Around line 23-31: The get_repo() function currently exits silently when the
gh repo view subprocess fails; update it to mirror fetch_comments() error
handling by checking result.returncode and printing or logging the subprocess
error output (result.stderr) and/or stdout before exiting with non-zero status.
Locate get_repo() and change the failure branch to emit a clear error message
including result.stderr (and result.stdout if helpful) and then call sys.exit(1)
so failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.

In @.claude/skills/implementing-datagenflow-blocks/SKILL.md:
- Line 42: Update the category enumeration assigned to the variable `category`
in SKILL.md so it matches the canonical list used by the state backend: replace
the current value "generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.
- Line 386: The private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.

In @.claude/skills/webapp-testing/examples/static_html_automation.py:
- Around line 13-35: Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.

In `@app.py`:
- Around line 701-711: Add documentation for the new default-model endpoints to
llm/state-backend.md: describe the PUT /llm-models/{name}/default endpoint
implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.

In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 85-119: The component BlockConfigPanel currently contains inline
fetchModels inside its useEffect and directly calls
setLlmModels/setEmbeddingModels; refactor by moving the fetch logic into a
service (e.g., add modelsService.getLlmModels and
modelsService.getEmbeddingModels or a single modelsService.fetchAllModels that
handles AbortController) and implement a custom hook (e.g., useModels or
useAvailableModels) that returns { llmModels, embeddingModels, loading, error,
refresh }. Replace the inline useEffect/fetchModels in BlockConfigPanel with
that hook, wire loading to the UI (spinner/disabled states) and surface errors
via your app's toast/notification mechanism when hook.error is set, and ensure
the hook supports aborting on unmount and exposes a refresh function for
retries.
- Around line 97-113: Replace the unsafe "any" usages in BlockConfigPanel: add a
type guard function (e.g., function isModel(obj: unknown): obj is { name: string
}) and use it to filter llmData/embeddingData before mapping to names passed to
setLlmModels and setEmbeddingModels (e.g., Array.isArray(llmData) ?
llmData.filter(isModel).map(m => m.name) : []). Also change the catch error
handling to use unknown and narrow it safely (e.g., catch (error: unknown) { if
(!(error instanceof DOMException && error.name === 'AbortError'))
console.error('Failed to fetch models:', error instanceof Error ? error :
String(error)); }) so you remove "m: any" and "error as any" while preserving
the AbortError check and logging.

In `@frontend/src/components/pipeline-editor/BlockNode.tsx`:
- Around line 108-117: The catch block in BlockNode.tsx that parses
config["fields_to_generate"] is currently empty and hides JSON parse errors;
update the catch to provide a fallback and record the failure by setting
displayValue to a clear fallback (e.g., "[invalid JSON]" or treat as template
string) and/or logging the error via the component logger or console, so
failures aren’t silent; specifically modify the try/catch around
JSON.parse(config[key]) in the fields_to_generate handling to catch the error,
call console.error or the component logging utility with the error and
key/context, and set displayValue to the chosen fallback (instead of leaving it
unchanged).

In `@lib/blocks/builtin/duplicate_remover.py`:
- Around line 217-253: The code filters out empty entries from sample_texts but
not from samples, breaking alignment and causing index errors in
_compute_similarities; update the logic after creating sample_texts so that you
filter samples in tandem (e.g., build pairs of (sample, text) via a
comprehension or zip, then split back into filtered_samples and sample_texts)
and pass filtered_samples and their corresponding sample_embeddings to
_compute_similarities instead of the original samples so indices remain aligned
(refer to _extract_text, sample_texts, samples, and _compute_similarities). A sketch of this tandem filtering follows at the end of this list.

In `@lib/blocks/builtin/ragas_metrics.py`:
- Around line 86-109: After parsing metrics_template into metrics_list in the
method that uses render_template and assigns metrics, validate every entry
against the allowed set (e.g., a module-level or class constant like
ALLOWED_METRICS or self.allowed_metrics); if any items are not in that set,
raise BlockExecutionError with a clear message and detail that includes the
invalid metrics and the rendered value (similar to the existing JSON error
handling). Insert this check after the isinstance checks on metrics_list and
before assigning metrics = metrics_list so unknown metric names cause an
immediate BlockExecutionError instead of being silently ignored.

In `@lib/blocks/builtin/structure_sampler.py`:
- Around line 348-369: When parsing dependencies (the dependencies variable
populated via render_and_parse_json from self.dependencies_template), validate
that every dependency key references a declared categorical field in
self.categorical_fields before proceeding to _topological_sort; if any key is
missing, raise BlockExecutionError with a clear message and include the full
dependencies and categorical_fields in the detail so callers can see the
mismatch. Also consider validating that each listed parent in the dependency
values is a declared categorical field and raise similarly if not, to avoid
misleading circular-dependency errors or sampling with missing parents.

In `@lib/blocks/commons/template_utils.py`:
- Around line 66-96: The parse_llm_json_response function currently swallows
JSONDecodeError in three empty except blocks; capture each exception (e.g.,
"except json.JSONDecodeError as e") and either log it at debug level via the
module logger or store the last exception and include its message in the
BlockExecutionError detail; update the blocks around markdown_match and
json_match and the initial direct-parse try to reference the captured exception
(and optionally logger) so failures aren't silent, and ensure the final raise
includes the last exception info (or call logger.debug with field_name and
exception) to aid debugging.

In `@lib/workflow.py`:
- Around line 275-284: Document that multiplier pipelines run
_filter_output_data and remove any keys present in initial_data_keys (used to
strip template/config/seed fields) by updating llm/state-backend.md; explicitly
note this behavior is currently applied only for multiplier pipelines and either
justify the asymmetry or state whether the filter should also apply to normal
pipelines. Also add guidance recommending moving from the current deny-list
approach to an explicit allow-list of preserved output fields (or provide a
configurable whitelist) to avoid accidental removal of legitimate block outputs
(reference _filter_output_data and places where multiplier pipelines invoke it)
and include migration notes for blocks that emit keys like "samples".

In `@Makefile`:
- Line 1: The .PHONY declaration currently lists many targets but omits "all",
which can cause a file named "all" to shadow the make target; update the .PHONY
line (the .PHONY declaration containing targets like check-deps, install, dev,
etc.) to include the "all" target so Make treats "all" as phony and avoids file
conflicts.

In `@pyproject.toml`:
- Line 85: The CI is skipping E2E tests because pyproject.toml's addopts ignores
tests/e2e and the pre-merge CI job only runs the Makefile's test target; update
CI to run E2E by adding the existing Makefile target test-e2e (which calls
./tests/e2e/run_all_tests.sh) to the pre-merge workflow or add an explicit step
in .github/workflows/pre-merge.yml to run `make test-e2e` so E2E coverage is
executed on PRs.

In `@tests/e2e/README.md`:
- Around line 204-205: In the "Writing New Tests" section replace the term
"webapp-testing" with the preferred phrase "web app testing" (or "web-app
testing") to correct terminology; locate occurrences of the exact token
"webapp-testing" in the README (e.g., the line under the "Writing New Tests"
heading) and update them to "web app testing" consistently throughout the
document.

In `@tests/integration/test_data_augmentation.py`:
- Around line 295-297: The test currently double-serializes the pipeline by
calling storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the raw dict (pipeline_def) to storage.save_pipeline so it isn't serialized
twice, keeping the subsequent Pipeline("test_no_embedding",
pipeline_def["blocks"]) as-is; update the use of storage.save_pipeline(...)
accordingly.
- Around line 192-194: The test is double-serializing pipeline_def because it
calls storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the dict directly (pipeline_def) so save_pipeline can perform the single
json.dumps internally; update the line that constructs the Pipeline instance
remains as Pipeline("test_sampler", pipeline_def["blocks"]) and ensure you only
remove the extra json.dumps wrapper around pipeline_def when invoking
save_pipeline.
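
Re the duplicate_remover item above: one way to keep samples and their extracted texts index-aligned while dropping empty entries is to filter them in tandem. The snippet is standalone and illustrative; only the variable names mirror the comment.

def filter_samples_with_texts(
    samples: list[dict], sample_texts: list[str]
) -> tuple[list[dict], list[str]]:
    """drop entries whose extracted text is empty, keeping samples and texts aligned"""
    pairs = [(sample, text) for sample, text in zip(samples, sample_texts) if text.strip()]
    if not pairs:
        return [], []
    filtered_samples, filtered_texts = (list(items) for items in zip(*pairs))
    return filtered_samples, filtered_texts


# usage sketch: embed filtered_texts, then pass filtered_samples (not the original
# samples) alongside the embeddings to _compute_similarities so indices stay aligned
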
🧹 Nitpick comments (23)
scripts/inspect_db_configs.py (3)

50-50: Return value from main() is discarded.

asyncio.run(main()) at line 56 doesn't capture the returned tuple. Either remove the return statement or capture/use the result if needed for testing.

♻️ Proposed fix
-        return llm_models, embedding_models
     finally:
         await storage.close()

Or if the return is intentional for testing:

 if __name__ == "__main__":
-    asyncio.run(main())
+    llm_models, embedding_models = asyncio.run(main())
+    print(f"\nTotal: {len(llm_models)} LLM models, {len(embedding_models)} embedding models")

Also applies to: 55-56


17-17: Hardcoded database path.

Consider accepting the path as a CLI argument for flexibility, or document that data/qa_records.db is the expected location.

♻️ Proposed enhancement
+import argparse
+
 async def main():
-    storage = Storage("data/qa_records.db")
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--db", default="data/qa_records.db", help="path to database")
+    args = parser.parse_args()
+    storage = Storage(args.db)

30-34: Use public Storage methods instead of private API.

Storage exposes list_llm_models() and list_embedding_models() public methods that handle the database queries. Replace the private _execute_with_connection() calls (lines 30 and 44) with these public methods, which return properly typed objects instead of raw dicts.

Also remove the unused return statement at line 50—asyncio.run(main()) at line 56 discards the result.

.claude/skills/debugging-pipelines/SKILL.md (1)

90-92: Hardcoded line numbers may drift.

References like lib/workflow.py:85-224 and lib/workflow.py:305-449 will become stale as code evolves. Consider using function/method names or comments to anchor references instead.

tests/integration/conftest.py (1)

14-15: Consider fixture scope for e2e efficiency.

The fixture defaults to scope="function", creating a new DB per test. For e2e tests that don't mutate shared state, scope="module" or scope="session" could reduce overhead.

Example if appropriate
-@pytest_asyncio.fixture
+@pytest_asyncio.fixture(scope="module")
 async def e2e_storage():
.claude/skills/webapp-testing/scripts/with_server.py (2)

76-77: Add strict=True to zip for defense in depth.

While line 71-73 validates count match, adding strict=True prevents silent truncation if that check is ever bypassed or refactored.

Proposed fix
-    for cmd, port in zip(args.servers, args.ports):
+    for cmd, port in zip(args.servers, args.ports, strict=True):

28-32: Use OSError instead of socket.error.

socket.error is an alias for OSError since Python 3.3. Using OSError directly is more idiomatic.

Proposed fix
-        except (socket.error, ConnectionRefusedError):
+        except OSError:

Note: ConnectionRefusedError is a subclass of OSError, so catching OSError alone suffices.

scripts/with_server.py (1)

46-63: Fail fast if a server process exits during readiness polling.
Right now a crashed process just burns the full timeout; checking proc.poll() gives faster feedback and clearer failures.

💡 Suggested change
             while time.time() - start_time < self.max_wait:
+                if proc.poll() is not None:
+                    print(f"  server on port {port} exited (code {proc.returncode})")
+                    self.cleanup()
+                    sys.exit(1)
                 try:
                     with urllib.request.urlopen(url, timeout=2) as response:
                         if response.status == 200:
                             print(f"  server on port {port} is ready")
lib/workflow.py (1)

334-435: Consider aligning output filtering across pipeline types.
Filtering is now applied in the multiplier seed path, but normal pipelines still return full accumulated_state. If a consistent response shape is desired, consider applying the same filter in the non-multiplier path.

tests/test_template_renderer.py (3)

8-33: split conditionals test and rename for convention.

test_render_template_with_conditionals exercises two scenarios in one test. Prefer one behavior per test and use the test_<method>_<scenario>_<expected> naming pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.


35-97: split multi-scenario tojson tests and align naming.

test_tojson_filter_nested_in_complex_template mixes defined/undefined cases; split into separate tests and rename to the standard pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.


99-133: separate truncate scenarios and rename error tests.

test_truncate_filter covers short + long inputs in one test; split and rename to the required convention. Consider renaming the error tests (test_undefined_variable_without_filter, test_template_syntax_error) to match the same pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.

lib/blocks/builtin/semantic_infiller.py (5)

174-182: Use exception chaining for better traceability.

When re-raising exceptions, chain with from e to preserve the original traceback.

Proposed fix
         try:
             response = await litellm.acompletion(**llm_params)
         except Exception as e:
             raise BlockExecutionError(
-                f"LLM call failed: {str(e)}",
+                f"LLM call failed: {e!s}",
                 detail={
                     "skeleton": skeleton,
                     "prompt_preview": prompt[:200],
                     "error": str(e),
                 },
-            )
+            ) from e

247-252: Add strict=True to zip for defensive iteration.

If embeddings and samples somehow get out of sync, this would silently truncate. Adding strict=True ensures length mismatches raise immediately.

Proposed fix
-        for emb, sample in zip(embeddings, samples):
+        for emb, sample in zip(embeddings, samples, strict=True):
             sim = self._cosine_similarity(target_embedding, emb)
             similarities.append((sim, sample))

268-270: Rename unused loop variable.

sim is unpacked but not used in the loop body.

Proposed fix
-        for sim, sample in similar_samples:
+        for _sim, sample in similar_samples:
             fields_str = json.dumps({f: sample.get(f, "") for f in fields_to_generate})
             negative_lines.append(f"  - {fields_str}")

332-338: Use exception chaining here as well.

Same issue as the other LLM call - chain with from e.

Proposed fix
             try:
                 response = await litellm.acompletion(**llm_params)
             except Exception as e:
                 raise BlockExecutionError(
-                    f"LLM call failed: {str(e)}",
+                    f"LLM call failed: {e!s}",
                     detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
-                )
+                ) from e

389-390: Unreachable code after exhausted for-loop.

These lines execute only if the loop completes all iterations without returning. However, the loop always returns on the last iteration (attempt == max_diversity_retries) after exhausting retries. The variables result and embedding will be defined from the last iteration, so this is technically safe but misleading.

Consider adding a comment or restructuring to clarify intent.

tests/integration/test_data_augmentation_pipeline.py (1)

28-32: Monkey-patching global state works but consider fixture-based approach.

The manual save/restore of app.llm_config_manager is functional but fragile. Consider using unittest.mock.patch as a context manager or decorator for cleaner lifecycle management:

with patch.object(app, 'llm_config_manager', llm_config_manager):
    # test body

This eliminates the need for try/finally cleanup.

tests/blocks/test_semantic_infiller.py (1)

386-417: Unused mock_embedding parameter - add assertion or remove.

The mock_embedding mock is patched but never used or asserted. If the intent is to verify embeddings are NOT called when unavailable, add an assertion:

♻️ Option 1: Add assertion
         # should still work, just without diversity check
         assert "samples" in result
         assert len(result["samples"]) == 1
+        # embedding should not be called when model unavailable
+        mock_embedding.assert_not_called()

Alternatively, remove the mock from the decorator if not needed.

tests/e2e/test_review_e2e.py (2)

17-37: Missing setup/teardown fixture - inconsistent with other e2e tests.

test_pipelines_e2e.py and test_generator_e2e.py use a module-scoped _e2e_setup_teardown fixture that waits for server and cleans database. This file lacks that pattern, which could lead to test pollution or failures if server isn't ready.

♻️ Add fixture for consistency
+@pytest.fixture(scope="module", autouse=True)
+def _e2e_setup_teardown():
+    """setup and teardown for e2e tests"""
+    from test_helpers import cleanup_database, wait_for_server
+    if not wait_for_server():
+        pytest.skip("server not ready for e2e tests")
+    cleanup_database()
+    yield
+    cleanup_database()
+
+
 def test_review_page_loads():

40-68: Tests may pass vacuously when UI elements are missing.

Several tests (like test_select_job) check if len(selectors) > 0 but don't assert or skip when elements aren't found. The test passes without verifying anything meaningful.

Consider using pytest.skip() when preconditions aren't met:

♻️ Make preconditions explicit
         selectors = page.locator('select, [role="combobox"]').all()
 
-        if len(selectors) > 0:
+        if len(selectors) == 0:
+            browser.close()
+            pytest.skip("no job selector found on review page")
+
+        # click first selector
+        selectors[0].click()
tests/e2e/test_generator_e2e.py (2)

56-74: Inconsistent screenshot path handling vs test_pipelines_e2e.py.

This file uses hardcoded /tmp/ paths while test_pipelines_e2e.py uses pytest's tmp_path fixture. While acceptable per project guidelines, consider aligning patterns across e2e tests for maintainability.


115-163: Seed file cleanup could fail silently if test errors before os.remove.

If an assertion fails before line 160 (os.remove(seed_path)), the temp file persists. Consider using a context manager or fixture for cleanup.

♻️ Use try/finally for cleanup
         seed_path = "/tmp/test_seed.json"
-        with open(seed_path, "w") as f:
-            json.dump(seed_data, f)
-
-        # go to generator page
-        ...
-
-        # cleanup
-        os.remove(seed_path)
+        try:
+            with open(seed_path, "w") as f:
+                json.dump(seed_data, f)
+
+            # go to generator page
+            ...
+        finally:
+            if os.path.exists(seed_path):
+                os.remove(seed_path)

Comment on lines +23 to +31
def get_repo() -> str:
    result = subprocess.run(
        ["gh", "repo", "view", "--json", "owner,name", "-q", '.owner.login + "/" + .name'],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        sys.exit(1)
    return result.stdout.strip()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | head -40

Repository: nicofretti/DataGenFlow

Length of output: 1534


🏁 Script executed:

# Check for callers of get_repo() and other error handling patterns
rg "get_repo" .claude/skills/address-pr-review/scripts/ -B 2 -A 2

Repository: nicofretti/DataGenFlow

Length of output: 1008


🏁 Script executed:

# Check the broader context of the file and other functions
wc -l .claude/skills/address-pr-review/scripts/fetch_comments.py

Repository: nicofretti/DataGenFlow

Length of output: 128


🏁 Script executed:

# Check other error handling in the file to see patterns and consistency
cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | sed -n '34,50p'

Repository: nicofretti/DataGenFlow

Length of output: 817


🏁 Script executed:

# Check if there are other subprocess calls and how they handle errors
rg "subprocess.run|sys.exit" .claude/skills/address-pr-review/scripts/fetch_comments.py -B 1 -A 1

Repository: nicofretti/DataGenFlow

Length of output: 1106


Add error output to get_repo() when gh repo view fails.

The function exits silently on error (line 29-30). The fetch_comments() function below already implements the correct pattern (line 42); apply the same error handling here for consistency.

🔧 Suggested fix
     if result.returncode != 0:
-        sys.exit(1)
+        print(f"failed to resolve repo: {result.stderr.strip()}", file=sys.stderr)
+        sys.exit(1)
🧰 Tools
🪛 Ruff (0.14.14)

[error] 25-25: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py around lines 23 -
31, The get_repo() function currently exits silently when the gh repo view
subprocess fails; update it to mirror fetch_comments() error handling by
checking result.returncode and printing or logging the subprocess error output
(result.stderr) and/or stdout before exiting with non-zero status. Locate
get_repo() and change the failure branch to emit a clear error message including
result.stderr (and result.stdout if helpful) and then call sys.exit(1) so
failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.

class MyBlock(BaseBlock):
    name = "My Block"
    description = "Short description of what this block does"
    category = "generators"  # generators|transformers|validators|utilities

⚠️ Potential issue | 🟡 Minor

Category list inconsistent with state-backend.md.

Line 42 shows generators|transformers|validators|utilities but llm/state-backend.md (line 384) lists: generators, validators, metrics, seeders, multipliers, observability, utilities.

Missing: metrics, seeders, multipliers, observability
Extra: transformers

📝 Proposed fix
-    category = "generators"  # generators|transformers|validators|utilities
+    category = "generators"  # generators|validators|metrics|seeders|multipliers|observability|utilities
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
category = "generators" # generators|transformers|validators|utilities
category = "generators" # generators|validators|metrics|seeders|multipliers|observability|utilities
🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 42, Update
the category enumeration assigned to the variable `category` in SKILL.md so it
matches the canonical list used by the state backend: replace the current value
"generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.

)

# prepare embedding call
embedding_params = llm_config_manager._prepare_embedding_call(

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's verify the method exists and check for public alternatives
ast-grep --pattern 'def _prepare_embedding_call($$$)'
ast-grep --pattern 'def prepare_embedding_call($$$)'

Repository: nicofretti/DataGenFlow

Length of output: 2657


🏁 Script executed:

#!/bin/bash
# Check for prepare_llm_call (public method mentioned in docstring)
ast-grep --pattern 'def prepare_llm_call($$$)'

# Find all usages of _prepare_embedding_call
rg '_prepare_embedding_call' --type py -B 2 -A 2

# Check the LLMConfigManager class structure
rg 'class.*[Cc]onfig' lib/llm_config.py --type py -A 5

Repository: nicofretti/DataGenFlow

Length of output: 7400


Make _prepare_embedding_call public or refactor internal usage.

The _prepare_embedding_call() method is private (underscore prefix) but is actively used from multiple modules (duplicate_remover.py, semantic_infiller.py, ragas_metrics.py). This violates encapsulation. Either expose it as a public method prepare_embedding_call() (matching the pattern of prepare_llm_call()), or refactor the modules to use a public API.

🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 386, The
private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.
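
A minimal sketch of the public wrapper this comment asks for, assuming an LLMConfigManager class that already defines _prepare_embedding_call; the exact signature is not shown in this excerpt, so the wrapper simply forwards its arguments.

class LLMConfigManager:
    # ... existing manager code elided ...

    def prepare_embedding_call(self, *args, **kwargs):
        """public entry point mirroring prepare_llm_call: a thin wrapper so callers
        outside the manager (duplicate_remover, semantic_infiller, ragas_metrics)
        no longer reach into the private helper directly"""
        return self._prepare_embedding_call(*args, **kwargs)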

Comment on lines +13 to +35
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1920, "height": 1080})

    # Navigate to local HTML file
    page.goto(file_url)

    # Take screenshot
    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)

    # Interact with elements
    page.click("text=Click Me")
    page.fill("#name", "John Doe")
    page.fill("#email", "john@example.com")

    # Submit form
    page.click('button[type="submit"]')
    page.wait_for_timeout(500)

    # Take final screenshot
    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)

    browser.close()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "static_html_automation.py" -type f

Repository: nicofretti/DataGenFlow

Length of output: 132


🏁 Script executed:

fd "static_html_automation" --type f

Repository: nicofretti/DataGenFlow

Length of output: 48


🏁 Script executed:

cat -n .claude/skills/webapp-testing/examples/static_html_automation.py

Repository: nicofretti/DataGenFlow

Length of output: 1371


🏁 Script executed:

# Check if there's any documentation or tests showing how playwright context manager handles exceptions
rg "sync_playwright" --type py -A 10 -B 2 | head -60

Repository: nicofretti/DataGenFlow

Length of output: 3925


🏁 Script executed:

# Search for other sync_playwright usages with try/finally or exception handling patterns
rg "with sync_playwright" -A 20 | grep -E "(try|except|finally)" | head -20

Repository: nicofretti/DataGenFlow

Length of output: 48


🏁 Script executed:

# Check playwright documentation patterns in comments or existing code
rg "browser.close\(\)" --type py -B 3 -A 1

Repository: nicofretti/DataGenFlow

Length of output: 6605


🌐 Web query:

Playwright Python sync_playwright context manager exception handling browser cleanup

💡 Result:

In Playwright Python (sync), with sync_playwright() as p: ensures the Playwright driver is stopped when the block exits, even if an exception is raised—but you should still explicitly close any BrowserContext and Browser you created using try/finally so browsers shut down cleanly and artifacts (videos/HAR, etc.) flush properly. (playwright.dev)

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    try:
        page = context.new_page()
        page.goto("https://example.com")
        # ... your actions/assertions ...
    finally:
        # Close context first (graceful; flushes artifacts), then browser
        context.close()
        browser.close()

Notes:

  • Playwright docs recommend explicitly closing contexts you created via browser.new_context() before calling browser.close(). (playwright.dev)
  • If you don’t use the context manager (sync_playwright().start()), you must call playwright.stop() yourself (typically in a finally). (playwright.dev)



Wrap browser operations in try/finally to ensure cleanup on exceptions.
If page navigation or interactions fail, browser.close() is never reached, risking orphaned processes. While sync_playwright() provides cleanup, explicit resource handling via try/finally is a best practice for proper shutdown and artifact flushing.

Proposed fix
 with sync_playwright() as p:
     browser = p.chromium.launch(headless=True)
-    page = browser.new_page(viewport={"width": 1920, "height": 1080})
-
-    # Navigate to local HTML file
-    page.goto(file_url)
-
-    # Take screenshot
-    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
-
-    # Interact with elements
-    page.click("text=Click Me")
-    page.fill("#name", "John Doe")
-    page.fill("#email", "john@example.com")
-
-    # Submit form
-    page.click('button[type="submit"]')
-    page.wait_for_timeout(500)
-
-    # Take final screenshot
-    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
-
-    browser.close()
+    try:
+        page = browser.new_page(viewport={"width": 1920, "height": 1080})
+
+        # Navigate to local HTML file
+        page.goto(file_url)
+
+        # Take screenshot
+        page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
+
+        # Interact with elements
+        page.click("text=Click Me")
+        page.fill("#name", "John Doe")
+        page.fill("#email", "john@example.com")
+
+        # Submit form
+        page.click('button[type="submit"]')
+        page.wait_for_timeout(500)
+
+        # Take final screenshot
+        page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
+    finally:
+        browser.close()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-with sync_playwright() as p:
-    browser = p.chromium.launch(headless=True)
-    page = browser.new_page(viewport={"width": 1920, "height": 1080})
-
-    # Navigate to local HTML file
-    page.goto(file_url)
-
-    # Take screenshot
-    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
-
-    # Interact with elements
-    page.click("text=Click Me")
-    page.fill("#name", "John Doe")
-    page.fill("#email", "john@example.com")
-
-    # Submit form
-    page.click('button[type="submit"]')
-    page.wait_for_timeout(500)
-
-    # Take final screenshot
-    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
-
-    browser.close()
+with sync_playwright() as p:
+    browser = p.chromium.launch(headless=True)
+    try:
+        page = browser.new_page(viewport={"width": 1920, "height": 1080})
+
+        # Navigate to local HTML file
+        page.goto(file_url)
+
+        # Take screenshot
+        page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
+
+        # Interact with elements
+        page.click("text=Click Me")
+        page.fill("#name", "John Doe")
+        page.fill("#email", "john@example.com")
+
+        # Submit form
+        page.click('button[type="submit"]')
+        page.wait_for_timeout(500)
+
+        # Take final screenshot
+        page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
+    finally:
+        browser.close()
🤖 Prompt for AI Agents
In @.claude/skills/webapp-testing/examples/static_html_automation.py around
lines 13 - 35, Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.

app.py Outdated
Comment on lines 701 to 711
@api_router.put("/llm-models/{name}/default")
async def set_default_llm_model(name: str) -> dict[str, str]:
    """set default llm model"""
    try:
        await llm_config_manager.set_default_llm_model(name)
        return {"message": "llm model set as default successfully"}
    except LLMConfigNotFoundError as e:
        raise HTTPException(status_code=404, detail=e.message)
    except Exception as e:
        logger.exception(f"failed to set default llm model {name}")
        raise HTTPException(status_code=400, detail=str(e))

⚠️ Potential issue | 🟡 Minor

Document the new default-model endpoints in backend state docs.
Please update llm/state-backend.md to capture these endpoints and default-model behavior.

As per coding guidelines (**/*.py): apply the backend code review checklist from llm/rules-backend.md; identify which llm/*.md files need updates (new API endpoints → update llm/state-backend.md) and whether the docs need updates.

Also applies to: 769-779

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 706-706: Consider moving this statement to an else block

(TRY300)


[warning] 708-708: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 709-709: Do not catch blind exception: Exception

(BLE001)


[warning] 711-711: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In `@app.py` around lines 701 - 711, Add documentation for the new default-model
endpoints to llm/state-backend.md: describe the PUT /llm-models/{name}/default
endpoint implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.
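
Combining the earlier review suggestion to catch LLMConfigError with the Ruff hints above (TRY300, B904, BLE001), the handler quoted at lines 701-711 could be narrowed roughly as follows. This sketch reuses api_router, llm_config_manager, and logger from the quoted code; the assumption that LLMConfigError is the base class of LLMConfigNotFoundError and carries a .message comes from the review comment, not from code shown here.

@api_router.put("/llm-models/{name}/default")
async def set_default_llm_model(name: str) -> dict[str, str]:
    """set default llm model"""
    try:
        await llm_config_manager.set_default_llm_model(name)
    except LLMConfigNotFoundError as e:
        raise HTTPException(status_code=404, detail=e.message) from e
    except LLMConfigError as e:
        # storage/manager failures only; unrelated exceptions propagate to FastAPI
        logger.exception(f"failed to set default llm model {name}")
        raise HTTPException(status_code=400, detail=e.message) from e
    return {"message": "llm model set as default successfully"}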

@@ -1,4 +1,4 @@
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup

⚠️ Potential issue | 🟡 Minor

Add all to .PHONY to avoid file conflicts.
Ensures a file named all doesn’t shadow the target.

Suggested fix
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all
🧰 Tools
🪛 checkmake (0.2.2)

[warning] 1-1: Missing required phony target "all"

(minphony)

🤖 Prompt for AI Agents
In `@Makefile` at line 1, The .PHONY declaration currently lists many targets but
omits "all", which can cause a file named "all" to shadow the make target;
update the .PHONY line (the .PHONY declaration containing targets like
check-deps, install, dev, etc.) to include the "all" target so Make treats "all"
as phony and avoids file conflicts.


 [tool.pytest.ini_options]
-addopts = "-m 'not integration' --tb=short --ignore=scripts/"
+addopts = "-m 'not integration' --tb=short --ignore=scripts/ --ignore=tests/e2e/"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for CI workflows that might run e2e tests
find .github/workflows -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -20

🏁 Script executed:

#!/bin/bash
# Search for e2e test execution in CI files
rg -i "e2e|end.*to.*end|playwright|tests/e2e" --type yaml --type yml

🏁 Script executed:

#!/bin/bash
# Search for test-related scripts
rg -i "e2e|test.*all|integration" --type sh --type py scripts/ 2>/dev/null | head -30

🏁 Script executed:

#!/bin/bash
# Check the current state of pyproject.toml at line 85
sed -n '80,90p' pyproject.toml

🏁 Script executed:

#!/bin/bash
# Read the pre-merge CI workflow
cat .github/workflows/pre-merge.yml

🏁 Script executed:

#!/bin/bash
# Search for pytest invocations that might override or run e2e tests specifically
rg "pytest|test" --type yaml . | grep -i "e2e\|override\|pytest.*tests"

🏁 Script executed:

#!/bin/bash
# Check if with_server.py is called by any CI or test script
rg "with_server|test_mock_providers" --type yaml --type py

🏁 Script executed:

#!/bin/bash
# Find and read Makefile
find . -maxdepth 2 -name "Makefile" -o -name "makefile" | head -5

🏁 Script executed:

#!/bin/bash
# Check Makefile content - specifically the test target
rg "^test:" -A 10 Makefile

🏁 Script executed:

#!/bin/bash
# Look for all test-related make targets
rg "^[a-z-]+.*test" Makefile


E2E tests are skipped in CI — add make test-e2e to pre-merge checks.

The pytest addopts now ignores tests/e2e/, and the CI workflow only runs make test (which uses those options). Although make test-e2e exists and calls ./tests/e2e/run_all_tests.sh, it is not invoked by the pre-merge CI job. This means e2e test failures won't be caught on PRs.

Add test-e2e to the pre-merge target in the Makefile or call it separately in .github/workflows/pre-merge.yml.

🤖 Prompt for AI Agents
In `@pyproject.toml` at line 85, The CI is skipping E2E tests because
pyproject.toml's addopts ignores tests/e2e and the pre-merge CI job only runs
the Makefile's test target; update CI to run E2E by adding the existing Makefile
target test-e2e (which calls ./tests/e2e/run_all_tests.sh) to the pre-merge
workflow or add an explicit step in .github/workflows/pre-merge.yml to run `make
test-e2e` so E2E coverage is executed on PRs.

Comment on lines +204 to +205
follow the webapp-testing skill patterns:


⚠️ Potential issue | 🟡 Minor

Fix minor terminology in “Writing New Tests”.
Consider “web app testing” (or “web-app testing”) instead of “webapp-testing.”

Suggested edit
-follow the webapp-testing skill patterns:
+follow the web app testing skill patterns:
🧰 Tools
🪛 LanguageTool

[grammar] ~204-~204: Ensure spelling is correct
Context: ...s ``` ## Writing New Tests follow the webapp-testing skill patterns: 1. **wait for n...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@tests/e2e/README.md` around lines 204 - 205, In the "Writing New Tests"
section replace the term "webapp-testing" with the preferred phrase "web app
testing" (or "web-app testing") to correct terminology; locate occurrences of
the exact token "webapp-testing" in the README (e.g., the line under the
"Writing New Tests" heading) and update them to "web app testing" consistently
throughout the document.

Comment on lines +192 to +194
pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
assert pipeline_id > 0
pipeline = Pipeline("test_sampler", pipeline_def["blocks"])

⚠️ Potential issue | 🟠 Major

Double JSON serialization bug.

save_pipeline internally calls json.dumps(definition) (see lib/storage.py:459-469). Passing json.dumps(pipeline_def) here will double-serialize, storing escaped JSON strings instead of proper JSON objects.

🐛 Fix: Pass dict directly
-        pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
+        pipeline_id = await storage.save_pipeline("test_sampler", pipeline_def)
🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 192 - 194, The test
is double-serializing pipeline_def because it calls storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the dict directly
(pipeline_def) so save_pipeline can perform the single json.dumps internally.
The line that constructs the Pipeline instance remains as
Pipeline("test_sampler", pipeline_def["blocks"]); only remove the extra
json.dumps wrapper around pipeline_def when invoking save_pipeline.
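
The core of the bug can be demonstrated in isolation. The sketch below is illustrative only, assuming save_pipeline serializes whatever it receives with json.dumps (as described for lib/storage.py above); serialize_for_storage is a hypothetical stand-in, not the project's API:

import json

def serialize_for_storage(definition) -> str:
    # hypothetical stand-in for the storage layer's own json.dumps call
    return json.dumps(definition)

pipeline_def = {"blocks": [{"type": "example_block"}]}

# correct: pass the dict, so the storage layer serializes exactly once
stored_ok = serialize_for_storage(pipeline_def)
print(type(json.loads(stored_ok)))   # <class 'dict'>

# buggy: pre-serializing stores an escaped JSON string, not a JSON object
stored_bad = serialize_for_storage(json.dumps(pipeline_def))
print(type(json.loads(stored_bad)))  # <class 'str'>

Reading the double-serialized value back yields a string rather than a dict, which is why both tests should pass pipeline_def directly.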

Comment on lines +295 to +297
pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
assert pipeline_id > 0
pipeline = Pipeline("test_no_embedding", pipeline_def["blocks"])

⚠️ Potential issue | 🟠 Major

Same double-serialization issue.

Apply the same fix here.

🐛 Fix
-        pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
+        pipeline_id = await storage.save_pipeline("test_no_embedding", pipeline_def)
🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 295 - 297, The test
currently double-serializes the pipeline by calling storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the raw dict (pipeline_def) to
storage.save_pipeline so it isn't serialized twice, keeping the subsequent
Pipeline("test_no_embedding", pipeline_def["blocks"]) as-is.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 10-11: The phrasing "first configured model" in the "Default model
selection" changelog entry is ambiguous; update the sentence to explicitly
define the ordering rule (e.g., "the earliest created model" or "the first model
in the Settings UI list as displayed") and state which attribute is used
(creation timestamp or list position). Reference the changelog entry title
"Default model selection" and the `is_default` flag so readers know the fallback
deterministically uses that defined ordering.

In `@frontend/src/pages/Settings.tsx`:
- Around line 236-252: The model card Box (key={model.name}) is clickable but
not keyboard-accessible; make it focusable and operable by adding tabIndex={0}
(so it can receive focus), role="button" and an onKeyDown handler that calls
handleSetDefaultLlm(model.name) when Enter or Space is pressed (respecting the
isDefault guard so default cards remain inert). Also add appropriate ARIA state
attributes such as aria-pressed or aria-current (or aria-disabled when
isDefault) to convey selection to assistive tech; apply the same changes to the
other card block around lines 399-415.
- Around line 132-152: Update the llm/state-frontend.md documentation to
describe the new Settings flow: explain the ability to select default models and
the new API integration patterns used by the frontend (reference the handler
functions handleSetDefaultLlm and handleSetDefaultEmbedding and the llmConfigApi
methods setDefaultLLMModel and setDefaultEmbeddingModel), and document the state
update calls loadLlmModels and loadEmbeddingModels that refresh UI state after
API calls; include example UX steps for selecting defaults and note expected
success/error handling (toast notifications) so the docs match the updated
Settings.tsx behavior.
- Around line 132-151: In the catch blocks of handleSetDefaultLlm and
handleSetDefaultEmbedding, log the caught error to the console in addition to
showing the toast; capture the error variable (use the existing error) and call
console.error with a descriptive message and the error object (e.g., "Failed to
set default LLM model" / "Failed to set default embedding model") so that
llmConfigApi.setDefaultLLMModel and llmConfigApi.setDefaultEmbeddingModel
failures are recorded for debugging.

Comment on lines +10 to +11
### Added
- Default model selection: Users can designate a default LLM and embedding model via the Settings UI. Both model types support `is_default` flag with automatic fallback to the first configured model.

⚠️ Potential issue | 🟡 Minor

Clarify “first configured model” ordering.
“First configured” is ambiguous (creation time? list order? name sort?). Please specify the ordering criteria to avoid user confusion.

🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 10 - 11, The phrasing "first configured model" in
the "Default model selection" changelog entry is ambiguous; update the sentence
to explicitly define the ordering rule (e.g., "the earliest created model" or
"the first model in the Settings UI list as displayed") and state which
attribute is used (creation timestamp or list position). Reference the changelog
entry title "Default model selection" and the `is_default` flag so readers know
the fallback deterministically uses that defined ordering.
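
For illustration, one deterministic interpretation of the fallback is sketched below in Python, assuming each model record carries a creation timestamp; this is an assumption about the data model, not the repository's actual implementation:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ModelEntry:
    name: str
    created_at: datetime
    is_default: bool = False

def resolve_default(models: list[ModelEntry]) -> ModelEntry | None:
    # prefer the explicit is_default flag, then fall back deterministically
    if not models:
        return None
    for model in models:
        if model.is_default:
            return model
    # "first configured" defined here as the earliest-created model
    return min(models, key=lambda m: m.created_at)

Whatever rule the code actually uses (creation order, list position, or name sort), the changelog entry should state it in those terms.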

Comment on lines +132 to +152
const handleSetDefaultLlm = async (name: string) => {
try {
await llmConfigApi.setDefaultLLMModel(name);
toast.success("Default LLM model updated");
loadLlmModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default LLM model: ${message}`);
}
};

const handleSetDefaultEmbedding = async (name: string) => {
try {
await llmConfigApi.setDefaultEmbeddingModel(name);
toast.success("Default embedding model updated");
loadEmbeddingModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default embedding model: ${message}`);
}
};

⚠️ Potential issue | 🟡 Minor

Update frontend state docs for the new Settings flow.

The Settings page now supports default-model selection and new API integration patterns. Please reflect this in llm/state-frontend.md.

Based on learnings (applies to llm/state-frontend.md): Update llm/state-frontend.md when API integration patterns change.
Based on learnings (applies to llm/state-frontend.md): Update llm/state-frontend.md when UI flow changes.

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 152, Update the
llm/state-frontend.md documentation to describe the new Settings flow: explain
the ability to select default models and the new API integration patterns used
by the frontend (reference the handler functions handleSetDefaultLlm and
handleSetDefaultEmbedding and the llmConfigApi methods setDefaultLLMModel and
setDefaultEmbeddingModel), and document the state update calls loadLlmModels and
loadEmbeddingModels that refresh UI state after API calls; include example UX
steps for selecting defaults and note expected success/error handling (toast
notifications) so the docs match the updated Settings.tsx behavior.

Comment on lines +132 to +151
const handleSetDefaultLlm = async (name: string) => {
try {
await llmConfigApi.setDefaultLLMModel(name);
toast.success("Default LLM model updated");
loadLlmModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default LLM model: ${message}`);
}
};

const handleSetDefaultEmbedding = async (name: string) => {
try {
await llmConfigApi.setDefaultEmbeddingModel(name);
toast.success("Default embedding model updated");
loadEmbeddingModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default embedding model: ${message}`);
}

⚠️ Potential issue | 🟡 Minor

Log default-setting failures to console.

The toast is user-facing, but the error isn’t logged, which makes debugging harder. Add a console.error alongside the toast.

Proposed fix
   const handleSetDefaultLlm = async (name: string) => {
     try {
       await llmConfigApi.setDefaultLLMModel(name);
       toast.success("Default LLM model updated");
       loadLlmModels();
     } catch (error) {
       const message = error instanceof Error ? error.message : "Unknown error";
+      console.error("Failed to set default LLM model:", error);
       toast.error(`Failed to set default LLM model: ${message}`);
     }
   };

   const handleSetDefaultEmbedding = async (name: string) => {
     try {
       await llmConfigApi.setDefaultEmbeddingModel(name);
       toast.success("Default embedding model updated");
       loadEmbeddingModels();
     } catch (error) {
       const message = error instanceof Error ? error.message : "Unknown error";
+      console.error("Failed to set default embedding model:", error);
       toast.error(`Failed to set default embedding model: ${message}`);
     }
   };

As per coding guidelines: Log errors to console in frontend - ensure errors are logged for debugging.

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 151, In the catch blocks
of handleSetDefaultLlm and handleSetDefaultEmbedding, log the caught error to
the console in addition to showing the toast; capture the error variable (use
the existing error) and call console.error with a descriptive message and the
error object (e.g., "Failed to set default LLM model" / "Failed to set default
embedding model") so that llmConfigApi.setDefaultLLMModel and
llmConfigApi.setDefaultEmbeddingModel failures are recorded for debugging.

Comment on lines +236 to +252
<Box
key={model.name}
onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
sx={{
p: 3,
border: "1px solid",
borderColor: isDefault ? "success.emphasis" : "border.default",
borderRadius: 2,
bg: isDefault ? "success.subtle" : "canvas.subtle",
cursor: isDefault ? "default" : "pointer",
transition: "all 0.2s",
"&:hover": {
borderColor: isDefault ? "success.emphasis" : "accent.emphasis",
transform: isDefault ? "none" : "translateY(-2px)",
boxShadow: isDefault ? "none" : "shadow.medium",
},
}}

⚠️ Potential issue | 🟠 Major

Make default-selection cards keyboard accessible.

The cards are clickable but not focusable or operable via keyboard, which blocks non-mouse users from setting defaults.

Proposed fix
                 <Box
                   key={model.name}
                   onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
+                  role="button"
+                  tabIndex={isDefault ? -1 : 0}
+                  aria-pressed={isDefault}
+                  onKeyDown={(e: React.KeyboardEvent) => {
+                    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+                      e.preventDefault();
+                      handleSetDefaultLlm(model.name);
+                    }
+                  }}
                   sx={{
                     p: 3,
                     border: "1px solid",
                     borderColor: isDefault ? "success.emphasis" : "border.default",
                 <Box
                   key={model.name}
                   onClick={() => !isDefault && handleSetDefaultEmbedding(model.name)}
+                  role="button"
+                  tabIndex={isDefault ? -1 : 0}
+                  aria-pressed={isDefault}
+                  onKeyDown={(e: React.KeyboardEvent) => {
+                    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+                      e.preventDefault();
+                      handleSetDefaultEmbedding(model.name);
+                    }
+                  }}
                   sx={{
                     p: 3,
                     border: "1px solid",
                     borderColor: isDefault ? "success.emphasis" : "border.default",

Also applies to: 399-415

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 236 - 252, The model card Box
(key={model.name}) is clickable but not keyboard-accessible; make it focusable
and operable by adding tabIndex={0} (so it can receive focus), role="button" and
an onKeyDown handler that calls handleSetDefaultLlm(model.name) when Enter or
Space is pressed (respecting the isDefault guard so default cards remain inert).
Also add appropriate ARIA state attributes such as aria-pressed or aria-current
(or aria-disabled when isDefault) to convey selection to assistive tech; apply
the same changes to the other card block around lines 399-415.

