
🚀 Feat: Model selection + e2e tests #61

Closed
LedjoLleshaj wants to merge 18 commits into nicofretti:main from LedjoLleshaj:feature-model-selection

Conversation

@LedjoLleshaj (Contributor) commented Jan 31, 2026

🚀 Feat: Model selection + e2e tests

Description

  • Added an option to select and switch between models in the Settings view.
  • Implemented default reassignment: deleting the current default model automatically promotes another model to default.
  • Added integration tests covering auto-default and reassignment scenarios.

Related Issue

Checklist

  • Code follows project style guidelines
  • Comments explain "why" not "what"
  • Documentation updated (if needed)
  • No debug code or console statements
  • make format passes
  • make pre-merge passes
  • PR update from develop branch
  • Copilot review run and addressed

Summary by CodeRabbit

  • New Features

    • API endpoints to mark default LLM and embedding models.
    • Settings UI: clickable model cards with default badge and inline actions.
    • Pipeline editor & block config: JSON↔template toggle and model dropdowns preserving custom names.
    • New data-augmentation template and three built-in blocks for sampling, semantic infill, and duplicate removal.
    • Generator UI: improved Markdown vs JSON mode handling.
  • Documentation

    • Docs and guides updated for defaults, blocks, templates, and e2e testing.
  • Tests

    • Extensive unit, integration, and e2e test additions.
  • Chores

    • Added e2e Make targets and CI/test helpers.

✏️ Tip: You can customize this high-level summary in your review settings.

nicofretti and others added 9 commits January 20, 2026 23:37
…abbit (nicofretti#57)

- Adds a data augmentation pipeline template via three new blocks (StructureSampler, SemanticInfiller, DuplicateRemover)
- Enables JSON-or-template field configuration across blocks
- Implements E2E testing infrastructure with Playwright
- Updates frontend UI for model selection and template editing
- Integrates CodeRabbit for automated code review.
Fixed frontend display bug where models named "default" incorrectly overrode explicit default settings.
Updated lib/storage.py to strictly enforce a single default model constraint.
Implemented auto-default logic: the first model created is automatically set as default.
Implemented default reassignment: deleting the current default model automatically promotes another model to default.
Added integration tests covering auto-default and reassignment scenarios.
@coderabbitai bot commented Jan 31, 2026

Walkthrough

Adds default-model selection end-to-end: DB + storage changes to track a single default, LLMConfigManager methods and API PUT endpoints to set defaults, frontend Settings UI to mark defaults, updated lookup fallbacks to prefer is_default, pipeline metadata tweak (first_block_type), multiple new blocks/utilities, and extensive tests and e2e tooling.
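
For orientation, a minimal client-side sketch of the new default-model endpoint described above; only the /llm-models/{name}/default path comes from this PR, while the base URL, port, and absence of an API prefix are assumptions.

import json
import urllib.request

# assumption: the backend is reachable locally on port 8000 with no extra API prefix;
# only the /llm-models/{name}/default path is taken from this PR
BASE_URL = "http://localhost:8000"


def set_default_llm_model(name: str) -> dict:
    """mark the named LLM model as the default via the new PUT endpoint"""
    req = urllib.request.Request(f"{BASE_URL}/llm-models/{name}/default", method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # "my-model" is a placeholder name, not a model from the repository
    print(set_default_llm_model("my-model"))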

Changes

Cohort / File(s) Summary
API & App
app.py
Added PUT endpoints PUT /llm-models/{name}/default and PUT /embedding-models/{name}/default; exposes first_block_type in pipeline output; imports LLMConfigError/not-found handling.
Manager / Logic
lib/llm_config.py, lib/entities/llm_config.py
Added is_default on model entities; lookup fallback updated to prefer is_default=True then legacy "default" name; added set_default_llm_model() and set_default_embedding_model() on manager.
Storage & Migrations
lib/storage.py, migrations...
Schema adds is_default columns; transactional upsert/delete flows enforce single default; added set_default_* methods; reads/writes populate is_default.
Frontend — Settings & API client
frontend/src/pages/Settings.tsx, frontend/src/services/llmConfigApi.ts, frontend/src/types/index.ts
Settings UI: clickable cards to mark defaults, visual badge/state; added client methods setDefaultLLMModel/setDefaultEmbeddingModel; types include optional is_default.
New Blocks & Template Utilities
lib/blocks/builtin/..., lib/blocks/commons/template_utils.py, lib/blocks/config.py, lib/template_renderer.py
Added StructureSampler, SemanticInfiller, DuplicateRemover and multiple block config enhancements; introduced json-or-template helpers, robust LLM JSON parsing, and _config_formats plumbing.
Block changes (validation/mapping/metrics/etc.)
lib/blocks/builtin/field_mapper.py, json_validator.py, ragas_metrics.py, structured_generator.py, validator.py
Multiple blocks now accept json-or-template inputs, validate rendered JSON, and expose template-based config fields; signatures updated accordingly.
Pipeline runtime
lib/workflow.py, lib/templates/data_augmentation.yaml
Filter outputs to remove initial input metadata from saved/returned results; added data_augmentation template wiring new blocks.
Tests — unit/integration/e2e
tests/blocks/*, tests/integration/*, tests/e2e/*, tests/test_api.py
Extensive new tests for default-model behavior, new blocks, template utils, and multiple e2e suites; API tests for default endpoints.
Docs, tooling & CI
llm/state-*.md, docs/*, .claude/*, .coderabbit.yaml, Makefile, pyproject.toml, .gitignore
Documentation updates for new blocks and endpoints, new skills/docs, Makefile targets for e2e, dependency updates, and .gitignore adjustments.
Dev scripts & helpers
scripts/with_server.py, scripts/inspect_db_configs.py, tests e2e helpers
New server lifecycle and DB inspection scripts; e2e helpers and orchestration scripts added.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend
    participant API
    participant Manager
    participant Storage

    User->>Frontend: click "Set default" on model card
    Frontend->>API: PUT /llm-models/{name}/default
    API->>Manager: set_default_llm_model(name)
    Manager->>Storage: set_default_llm_model(name)
    Storage->>Storage: BEGIN TRANSACTION
    Storage->>Storage: UPDATE llm_models SET is_default=0
    Storage->>Storage: UPDATE llm_models SET is_default=1 WHERE name={name}
    Storage->>Storage: COMMIT
    Storage-->>Manager: OK
    Manager-->>API: success
    API-->>Frontend: 200 OK (message)
    Frontend->>API: GET /llm-models
    API->>Manager: list_llm_models()
    Manager->>Storage: list_llm_models()
    Storage-->>Manager: rows (with is_default)
    Manager-->>API: model list
    API-->>Frontend: updated list (default badge)
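
The storage steps in the diagram boil down to one transaction that clears the old default and sets the new one. A minimal sketch using stdlib sqlite3 (the real lib/storage.py is async and may differ; table and column names are taken from the diagram):

import sqlite3


def set_default_llm_model(db_path: str, name: str) -> None:
    """atomically clear any existing default and mark `name` as the default model"""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # implicit BEGIN ... COMMIT, with rollback if an exception escapes
            conn.execute("UPDATE llm_models SET is_default = 0")
            cur = conn.execute(
                "UPDATE llm_models SET is_default = 1 WHERE name = ?", (name,)
            )
            if cur.rowcount == 0:
                # nothing matched: unknown model, abort the whole transaction
                raise ValueError(f"llm model not found: {name}")
    finally:
        conn.close()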

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
  • Out of Scope Changes check (⚠️ Warning): Scope creep detected: PR includes major new blocks (StructureSampler, SemanticInfiller, DuplicateRemover), data augmentation template, extensive documentation/CLI utilities, and schema changes beyond model selection. Resolution: Split PR into focused commits: (1) model selection core, (2) new blocks & data augmentation, (3) testing infrastructure. Align scope with issue #60 requirements.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.93%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check (✅ Passed): Title clearly summarizes the main changes: adding model selection UI/API and end-to-end tests.
  • Description check (✅ Passed): Description covers main objectives and references the linked issue, but checklist items are incomplete and should be marked as done or removed.
  • Linked Issues check (✅ Passed): All objectives from issue #60 are addressed: UI for model selection in Settings, backend endpoints for default-model switching, auto-default reassignment, and integration tests.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@app.py`:
- Around line 700-710: Add integration tests using TestClient that cover the new
endpoints PUT /llm-models/{name}/default and PUT
/embedding-models/{name}/default for both success and failure paths: call the
app endpoints (following patterns in test_api.py) and assert 200 success
responses on normal behavior; mock llm_config_manager.set_default_llm_model and
embedding_config_manager.set_default_embedding_model to raise
LLMConfigNotFoundError and assert the HTTP response is 404 with the error
detail; also mock them to raise a generic Exception and assert the response is
400 and contains the exception message; ensure tests reference the handler
functions by endpoint paths and use the same import/mocking approach used
elsewhere in tests. A hedged test sketch follows at the end of this list.
- Around line 700-710: Replace the broad except Exception in the
set_default_llm_model endpoint with a specific except LLMConfigError handler
(LLMConfigError is the base for LLMConfigNotFoundError) so you catch
storage/manager errors without swallowing unrelated exceptions; keep logging via
logger.exception(f"failed to set default llm model {name}") and raise
HTTPException(status_code=400, detail=e.message) from e to preserve the message
format; make the analogous change in the embedding default endpoint (the
function that sets embedding models default) using the same LLMConfigError
pattern and message handling; finally add unit tests that exercise both
/llm-models/{name}/default and /embedding-models/{name}/default for success,
not-found (LLMConfigNotFoundError), and generic manager/storage error
(LLMConfigError) to ensure response codes and detail strings match expectations.

In `@frontend/src/pages/Settings.tsx`:
- Around line 396-397: The embedding list uses a legacy fallback causing
inconsistent default detection: inside the embeddingModels.map callback (where
you set const isDefault = model.is_default || model.name === "default";) remove
the legacy check so it matches the LLM logic (use only model.is_default). Update
that line to compute isDefault solely from model.is_default to ensure
backend-controlled defaults are honored.

In `@lib/llm_config.py`:
- Around line 135-141: Remove the duplicated "fallback chain:" line in the
docstring (lines 135-136) so the docstring reads the fallback sequence only
once; mirror the same fix applied in get_llm_model’s docstring to keep wording
consistent and ensure the list (1.–4.) follows a single "fallback chain:"
header.
- Around line 42-48: Remove the duplicated docstring line describing the
fallback chain in lib/llm_config.py so the description only appears once; locate
the module-level docstring (the block explaining the fallback chain: "uses
fallback chain to ensure blocks always have a model available:") and delete the
repeated occurrence so the list of fallback steps remains correct and not
duplicated.

In `@tests/integration/test_default_model_selection_integration.py`:
- Around line 143-150: Replace the broad Exception in the
test_set_nonexistent_default_raises_error with the specific
LLMConfigNotFoundError: update the pytest.raises(...) to
pytest.raises(LLMConfigNotFoundError) and add/import LLMConfigNotFoundError at
the top of the test module so the assertion targets the exact error raised by
llm_config_manager.set_default_llm_model("non_existent_model").
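
Re the first app.py item in the list above: a minimal pytest sketch of the requested endpoint tests. It assumes the FastAPI instance is app.app, that llm_config_manager is a module-level attribute of app, that the route is mounted without a prefix, and that LLMConfigNotFoundError accepts a message string; only the endpoint path and exception name come from this PR.

from unittest.mock import AsyncMock, patch

from fastapi.testclient import TestClient

import app  # assumption: FastAPI instance exposed as app.app
from lib.llm_config import LLMConfigNotFoundError  # assumption: import path

client = TestClient(app.app)


def test_set_default_llm_model_success():
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock()) as mocked:
        response = client.put("/llm-models/my-model/default")
    assert response.status_code == 200
    mocked.assert_awaited_once_with("my-model")


def test_set_default_llm_model_not_found_returns_404():
    error = LLMConfigNotFoundError("llm model not found: missing")
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock(side_effect=error)):
        response = client.put("/llm-models/missing/default")
    assert response.status_code == 404


def test_set_default_llm_model_generic_error_returns_400():
    with patch.object(app.llm_config_manager, "set_default_llm_model", AsyncMock(side_effect=RuntimeError("boom"))):
        response = client.put("/llm-models/my-model/default")
    assert response.status_code == 400
    assert "boom" in response.json()["detail"]
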
🧹 Nitpick comments (2)
tests/integration/test_default_model_selection_integration.py (1)

31-38: Consider adding test for legacy "default" name fallback.

The docstring mentions "Fallback to 'default' named model (legacy support)" but the test doesn't explicitly verify this behavior. The current tests only check is_default=True fallback.

lib/storage.py (1)

707-709: Consider adding logging before rollback for debugging.

The transaction rollback blocks re-raise exceptions but don't log them at the storage layer. While the caller may log, adding logger.exception() here would preserve context closer to the failure point.

♻️ Example for save_llm_model (apply similar pattern to others)
             except Exception:
+                logger.exception("transaction failed during save_llm_model")
                 await db.execute("ROLLBACK")
                 raise

As per coding guidelines: "Use logger.exception() in backend — preserves stack traces in error logs."

Also applies to: 736-738, 759-761, 848-850, 876-878, 901-903

@coderabbitai bot left a comment

Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/blocks/builtin/ragas_metrics.py (1)

110-149: ⚠️ Potential issue | 🟠 Major

avoid mutating self.metrics inside execute.

Line 110 mutates instance state per-run; if block instances are reused concurrently, metrics can leak across executions. Keep parsed metrics local and pass them to helpers.

🔧 suggested fix
-        # store parsed metrics for use in other methods
-        self.metrics = metrics
+        metrics_list = metrics

@@
-            return {"ragas_scores": self._empty_scores()}
+            return {"ragas_scores": self._empty_scores(metrics_list)}

@@
-            metric_instances = self._build_metrics(llm, embeddings)
+            metric_instances = self._build_metrics(llm, embeddings, metrics_list)
-    def _empty_scores(self) -> dict[str, Any]:
+    def _empty_scores(self, metrics: list[str]) -> dict[str, Any]:
         """return empty scores with passed=False"""
-        return {metric: 0.0 for metric in self.metrics} | {"passed": False}
+        return {metric: 0.0 for metric in metrics} | {"passed": False}

-    def _build_metrics(self, llm: Any, embeddings: Any) -> dict[str, Any]:
+    def _build_metrics(
+        self, llm: Any, embeddings: Any, metrics: list[str]
+    ) -> dict[str, Any]:
         """build metric instances"""
@@
-        return {k: v for k, v in available.items() if k in self.metrics}
+        return {k: v for k, v in available.items() if k in metrics}
frontend/src/pages/Generator.tsx (1)

614-616: ⚠️ Potential issue | 🟡 Minor

Empty catch block for metadata parsing.

While this handles malformed JSON gracefully by returning null, adding a console.error would help debug production issues. As per coding guidelines, empty catch blocks should include logging.

🛡️ Add error logging
             } catch {
+              console.error("Failed to parse job metadata");
               return null;
             }
🤖 Fix all issues with AI agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py:
- Around line 23-31: The get_repo() function currently exits silently when the
gh repo view subprocess fails; update it to mirror fetch_comments() error
handling by checking result.returncode and printing or logging the subprocess
error output (result.stderr) and/or stdout before exiting with non-zero status.
Locate get_repo() and change the failure branch to emit a clear error message
including result.stderr (and result.stdout if helpful) and then call sys.exit(1)
so failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.

In @.claude/skills/implementing-datagenflow-blocks/SKILL.md:
- Line 42: Update the category enumeration assigned to the variable `category`
in SKILL.md so it matches the canonical list used by the state backend: replace
the current value "generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.
- Line 386: The private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.

In @.claude/skills/webapp-testing/examples/static_html_automation.py:
- Around line 13-35: Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.

In `@app.py`:
- Around line 701-711: Add documentation for the new default-model endpoints to
llm/state-backend.md: describe the PUT /llm-models/{name}/default endpoint
implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.

In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 85-119: The component BlockConfigPanel currently contains inline
fetchModels inside its useEffect and directly calls
setLlmModels/setEmbeddingModels; refactor by moving the fetch logic into a
service (e.g., add modelsService.getLlmModels and
modelsService.getEmbeddingModels or a single modelsService.fetchAllModels that
handles AbortController) and implement a custom hook (e.g., useModels or
useAvailableModels) that returns { llmModels, embeddingModels, loading, error,
refresh }. Replace the inline useEffect/fetchModels in BlockConfigPanel with
that hook, wire loading to the UI (spinner/disabled states) and surface errors
via your app's toast/notification mechanism when hook.error is set, and ensure
the hook supports aborting on unmount and exposes a refresh function for
retries.
- Around line 97-113: Replace the unsafe "any" usages in BlockConfigPanel: add a
type guard function (e.g., function isModel(obj: unknown): obj is { name: string
}) and use it to filter llmData/embeddingData before mapping to names passed to
setLlmModels and setEmbeddingModels (e.g., Array.isArray(llmData) ?
llmData.filter(isModel).map(m => m.name) : []). Also change the catch error
handling to use unknown and narrow it safely (e.g., catch (error: unknown) { if
(!(error instanceof DOMException && error.name === 'AbortError'))
console.error('Failed to fetch models:', error instanceof Error ? error :
String(error)); }) so you remove "m: any" and "error as any" while preserving
the AbortError check and logging.

In `@frontend/src/components/pipeline-editor/BlockNode.tsx`:
- Around line 108-117: The catch block in BlockNode.tsx that parses
config["fields_to_generate"] is currently empty and hides JSON parse errors;
update the catch to provide a fallback and record the failure by setting
displayValue to a clear fallback (e.g., "[invalid JSON]" or treat as template
string) and/or logging the error via the component logger or console, so
failures aren’t silent; specifically modify the try/catch around
JSON.parse(config[key]) in the fields_to_generate handling to catch the error,
call console.error or the component logging utility with the error and
key/context, and set displayValue to the chosen fallback (instead of leaving it
unchanged).

In `@lib/blocks/builtin/duplicate_remover.py`:
- Around line 217-253: The code filters out empty entries from sample_texts but
not from samples, breaking alignment and causing index errors in
_compute_similarities; update the logic after creating sample_texts so that you
filter samples in tandem (e.g., build pairs of (sample, text) via a
comprehension or zip, then split back into filtered_samples and sample_texts)
and pass filtered_samples and their corresponding sample_embeddings to
_compute_similarities instead of the original samples so indices remain aligned
(refer to _extract_text, sample_texts, samples, and _compute_similarities). A sketch of this tandem filtering follows at the end of this list.

In `@lib/blocks/builtin/ragas_metrics.py`:
- Around line 86-109: After parsing metrics_template into metrics_list in the
method that uses render_template and assigns metrics, validate every entry
against the allowed set (e.g., a module-level or class constant like
ALLOWED_METRICS or self.allowed_metrics); if any items are not in that set,
raise BlockExecutionError with a clear message and detail that includes the
invalid metrics and the rendered value (similar to the existing JSON error
handling). Insert this check after the isinstance checks on metrics_list and
before assigning metrics = metrics_list so unknown metric names cause an
immediate BlockExecutionError instead of being silently ignored.

In `@lib/blocks/builtin/structure_sampler.py`:
- Around line 348-369: When parsing dependencies (the dependencies variable
populated via render_and_parse_json from self.dependencies_template), validate
that every dependency key references a declared categorical field in
self.categorical_fields before proceeding to _topological_sort; if any key is
missing, raise BlockExecutionError with a clear message and include the full
dependencies and categorical_fields in the detail so callers can see the
mismatch. Also consider validating that each listed parent in the dependency
values is a declared categorical field and raise similarly if not, to avoid
misleading circular-dependency errors or sampling with missing parents.

In `@lib/blocks/commons/template_utils.py`:
- Around line 66-96: The parse_llm_json_response function currently swallows
JSONDecodeError in three empty except blocks; capture each exception (e.g.,
"except json.JSONDecodeError as e") and either log it at debug level via the
module logger or store the last exception and include its message in the
BlockExecutionError detail; update the blocks around markdown_match and
json_match and the initial direct-parse try to reference the captured exception
(and optionally logger) so failures aren't silent, and ensure the final raise
includes the last exception info (or call logger.debug with field_name and
exception) to aid debugging.

In `@lib/workflow.py`:
- Around line 275-284: Document that multiplier pipelines run
_filter_output_data and remove any keys present in initial_data_keys (used to
strip template/config/seed fields) by updating llm/state-backend.md; explicitly
note this behavior is currently applied only for multiplier pipelines and either
justify the asymmetry or state whether the filter should also apply to normal
pipelines. Also add guidance recommending moving from the current deny-list
approach to an explicit allow-list of preserved output fields (or provide a
configurable whitelist) to avoid accidental removal of legitimate block outputs
(reference _filter_output_data and places where multiplier pipelines invoke it)
and include migration notes for blocks that emit keys like "samples".

In `@Makefile`:
- Line 1: The .PHONY declaration currently lists many targets but omits "all",
which can cause a file named "all" to shadow the make target; update the .PHONY
line (the .PHONY declaration containing targets like check-deps, install, dev,
etc.) to include the "all" target so Make treats "all" as phony and avoids file
conflicts.

In `@pyproject.toml`:
- Line 85: The CI is skipping E2E tests because pyproject.toml's addopts ignores
tests/e2e and the pre-merge CI job only runs the Makefile's test target; update
CI to run E2E by adding the existing Makefile target test-e2e (which calls
./tests/e2e/run_all_tests.sh) to the pre-merge workflow or add an explicit step
in .github/workflows/pre-merge.yml to run `make test-e2e` so E2E coverage is
executed on PRs.

In `@tests/e2e/README.md`:
- Around line 204-205: In the "Writing New Tests" section replace the term
"webapp-testing" with the preferred phrase "web app testing" (or "web-app
testing") to correct terminology; locate occurrences of the exact token
"webapp-testing" in the README (e.g., the line under the "Writing New Tests"
heading) and update them to "web app testing" consistently throughout the
document.

In `@tests/integration/test_data_augmentation.py`:
- Around line 295-297: The test currently double-serializes the pipeline by
calling storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the raw dict (pipeline_def) to storage.save_pipeline so it isn't serialized
twice, keeping the subsequent Pipeline("test_no_embedding",
pipeline_def["blocks"]) as-is; update the use of storage.save_pipeline(...)
accordingly.
- Around line 192-194: The test is double-serializing pipeline_def because it
calls storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the dict directly (pipeline_def) so save_pipeline can perform the single
json.dumps internally; update the line that constructs the Pipeline instance
remains as Pipeline("test_sampler", pipeline_def["blocks"]) and ensure you only
remove the extra json.dumps wrapper around pipeline_def when invoking
save_pipeline.
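
Re the duplicate_remover item above: one way to keep samples and their extracted texts index-aligned while dropping empty entries is to filter them in tandem. The snippet is standalone and illustrative; only the variable names mirror the comment.

def filter_samples_with_texts(
    samples: list[dict], sample_texts: list[str]
) -> tuple[list[dict], list[str]]:
    """drop entries whose extracted text is empty, keeping samples and texts aligned"""
    pairs = [(sample, text) for sample, text in zip(samples, sample_texts) if text.strip()]
    if not pairs:
        return [], []
    filtered_samples, filtered_texts = (list(items) for items in zip(*pairs))
    return filtered_samples, filtered_texts


# usage sketch: embed filtered_texts, then pass filtered_samples (not the original
# samples) alongside the embeddings to _compute_similarities so indices stay aligned
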
🧹 Nitpick comments (23)
scripts/inspect_db_configs.py (3)

50-50: Return value from main() is discarded.

asyncio.run(main()) at line 56 doesn't capture the returned tuple. Either remove the return statement or capture/use the result if needed for testing.

♻️ Proposed fix
-        return llm_models, embedding_models
     finally:
         await storage.close()

Or if the return is intentional for testing:

 if __name__ == "__main__":
-    asyncio.run(main())
+    llm_models, embedding_models = asyncio.run(main())
+    print(f"\nTotal: {len(llm_models)} LLM models, {len(embedding_models)} embedding models")

Also applies to: 55-56


17-17: Hardcoded database path.

Consider accepting the path as a CLI argument for flexibility, or document that data/qa_records.db is the expected location.

♻️ Proposed enhancement
+import argparse
+
 async def main():
-    storage = Storage("data/qa_records.db")
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--db", default="data/qa_records.db", help="path to database")
+    args = parser.parse_args()
+    storage = Storage(args.db)

30-34: Use public Storage methods instead of private API.

Storage exposes list_llm_models() and list_embedding_models() public methods that handle the database queries. Replace the private _execute_with_connection() calls (lines 30 and 44) with these public methods, which return properly typed objects instead of raw dicts.

Also remove the unused return statement at line 50—asyncio.run(main()) at line 56 discards the result.

.claude/skills/debugging-pipelines/SKILL.md (1)

90-92: Hardcoded line numbers may drift.

References like lib/workflow.py:85-224 and lib/workflow.py:305-449 will become stale as code evolves. Consider using function/method names or comments to anchor references instead.

tests/integration/conftest.py (1)

14-15: Consider fixture scope for e2e efficiency.

The fixture defaults to scope="function", creating a new DB per test. For e2e tests that don't mutate shared state, scope="module" or scope="session" could reduce overhead.

Example if appropriate
-@pytest_asyncio.fixture
+@pytest_asyncio.fixture(scope="module")
 async def e2e_storage():
.claude/skills/webapp-testing/scripts/with_server.py (2)

76-77: Add strict=True to zip for defense in depth.

While line 71-73 validates count match, adding strict=True prevents silent truncation if that check is ever bypassed or refactored.

Proposed fix
-    for cmd, port in zip(args.servers, args.ports):
+    for cmd, port in zip(args.servers, args.ports, strict=True):

28-32: Use OSError instead of socket.error.

socket.error is an alias for OSError since Python 3.3. Using OSError directly is more idiomatic.

Proposed fix
-        except (socket.error, ConnectionRefusedError):
+        except OSError:

Note: ConnectionRefusedError is a subclass of OSError, so catching OSError alone suffices.

scripts/with_server.py (1)

46-63: Fail fast if a server process exits during readiness polling.
Right now a crashed process just burns the full timeout; checking proc.poll() gives faster feedback and clearer failures.

💡 Suggested change
             while time.time() - start_time < self.max_wait:
+                if proc.poll() is not None:
+                    print(f"  server on port {port} exited (code {proc.returncode})")
+                    self.cleanup()
+                    sys.exit(1)
                 try:
                     with urllib.request.urlopen(url, timeout=2) as response:
                         if response.status == 200:
                             print(f"  server on port {port} is ready")
lib/workflow.py (1)

334-435: Consider aligning output filtering across pipeline types.
Filtering is now applied in the multiplier seed path, but normal pipelines still return full accumulated_state. If a consistent response shape is desired, consider applying the same filter in the non-multiplier path.

tests/test_template_renderer.py (3)

8-33: split conditionals test and rename for convention.

test_render_template_with_conditionals exercises two scenarios in one test. Prefer one behavior per test and use the test_<method>_<scenario>_<expected> naming pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.


35-97: split multi-scenario tojson tests and align naming.

test_tojson_filter_nested_in_complex_template mixes defined/undefined cases; split into separate tests and rename to the standard pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.


99-133: separate truncate scenarios and rename error tests.

test_truncate_filter covers short + long inputs in one test; split and rename to the required convention. Consider renaming the error tests (test_undefined_variable_without_filter, test_template_syntax_error) to match the same pattern.

As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.

lib/blocks/builtin/semantic_infiller.py (5)

174-182: Use exception chaining for better traceability.

When re-raising exceptions, chain with from e to preserve the original traceback.

Proposed fix
         try:
             response = await litellm.acompletion(**llm_params)
         except Exception as e:
             raise BlockExecutionError(
-                f"LLM call failed: {str(e)}",
+                f"LLM call failed: {e!s}",
                 detail={
                     "skeleton": skeleton,
                     "prompt_preview": prompt[:200],
                     "error": str(e),
                 },
-            )
+            ) from e

247-252: Add strict=True to zip for defensive iteration.

If embeddings and samples somehow get out of sync, this would silently truncate. Adding strict=True ensures length mismatches raise immediately.

Proposed fix
-        for emb, sample in zip(embeddings, samples):
+        for emb, sample in zip(embeddings, samples, strict=True):
             sim = self._cosine_similarity(target_embedding, emb)
             similarities.append((sim, sample))

268-270: Rename unused loop variable.

sim is unpacked but not used in the loop body.

Proposed fix
-        for sim, sample in similar_samples:
+        for _sim, sample in similar_samples:
             fields_str = json.dumps({f: sample.get(f, "") for f in fields_to_generate})
             negative_lines.append(f"  - {fields_str}")

332-338: Use exception chaining here as well.

Same issue as the other LLM call - chain with from e.

Proposed fix
             try:
                 response = await litellm.acompletion(**llm_params)
             except Exception as e:
                 raise BlockExecutionError(
-                    f"LLM call failed: {str(e)}",
+                    f"LLM call failed: {e!s}",
                     detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
-                )
+                ) from e

389-390: Unreachable code after exhausted for-loop.

These lines execute only if the loop completes all iterations without returning. However, the loop always returns on the last iteration (attempt == max_diversity_retries) after exhausting retries. The variables result and embedding will be defined from the last iteration, so this is technically safe but misleading.

Consider adding a comment or restructuring to clarify intent.

tests/integration/test_data_augmentation_pipeline.py (1)

28-32: Monkey-patching global state works but consider fixture-based approach.

The manual save/restore of app.llm_config_manager is functional but fragile. Consider using unittest.mock.patch as a context manager or decorator for cleaner lifecycle management:

with patch.object(app, 'llm_config_manager', llm_config_manager):
    # test body

This eliminates the need for try/finally cleanup.

tests/blocks/test_semantic_infiller.py (1)

386-417: Unused mock_embedding parameter - add assertion or remove.

The mock_embedding mock is patched but never used or asserted. If the intent is to verify embeddings are NOT called when unavailable, add an assertion:

♻️ Option 1: Add assertion
         # should still work, just without diversity check
         assert "samples" in result
         assert len(result["samples"]) == 1
+        # embedding should not be called when model unavailable
+        mock_embedding.assert_not_called()

Alternatively, remove the mock from the decorator if not needed.

tests/e2e/test_review_e2e.py (2)

17-37: Missing setup/teardown fixture - inconsistent with other e2e tests.

test_pipelines_e2e.py and test_generator_e2e.py use a module-scoped _e2e_setup_teardown fixture that waits for server and cleans database. This file lacks that pattern, which could lead to test pollution or failures if server isn't ready.

♻️ Add fixture for consistency
+@pytest.fixture(scope="module", autouse=True)
+def _e2e_setup_teardown():
+    """setup and teardown for e2e tests"""
+    from test_helpers import cleanup_database, wait_for_server
+    if not wait_for_server():
+        pytest.skip("server not ready for e2e tests")
+    cleanup_database()
+    yield
+    cleanup_database()
+
+
 def test_review_page_loads():

40-68: Tests may pass vacuously when UI elements are missing.

Several tests (like test_select_job) check if len(selectors) > 0 but don't assert or skip when elements aren't found. The test passes without verifying anything meaningful.

Consider using pytest.skip() when preconditions aren't met:

♻️ Make preconditions explicit
         selectors = page.locator('select, [role="combobox"]').all()
 
-        if len(selectors) > 0:
+        if len(selectors) == 0:
+            browser.close()
+            pytest.skip("no job selector found on review page")
+
+        # click first selector
+        selectors[0].click()
tests/e2e/test_generator_e2e.py (2)

56-74: Inconsistent screenshot path handling vs test_pipelines_e2e.py.

This file uses hardcoded /tmp/ paths while test_pipelines_e2e.py uses pytest's tmp_path fixture. While acceptable per project guidelines, consider aligning patterns across e2e tests for maintainability.


115-163: Seed file cleanup could fail silently if test errors before os.remove.

If an assertion fails before line 160 (os.remove(seed_path)), the temp file persists. Consider using a context manager or fixture for cleanup.

♻️ Use try/finally for cleanup
         seed_path = "/tmp/test_seed.json"
-        with open(seed_path, "w") as f:
-            json.dump(seed_data, f)
-
-        # go to generator page
-        ...
-
-        # cleanup
-        os.remove(seed_path)
+        try:
+            with open(seed_path, "w") as f:
+                json.dump(seed_data, f)
+
+            # go to generator page
+            ...
+        finally:
+            if os.path.exists(seed_path):
+                os.remove(seed_path)

Comment on lines +23 to +31
def get_repo() -> str:
    result = subprocess.run(
        ["gh", "repo", "view", "--json", "owner,name", "-q", '.owner.login + "/" + .name'],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        sys.exit(1)
    return result.stdout.strip()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | head -40

Repository: nicofretti/DataGenFlow

Length of output: 1534


🏁 Script executed:

# Check for callers of get_repo() and other error handling patterns
rg "get_repo" .claude/skills/address-pr-review/scripts/ -B 2 -A 2

Repository: nicofretti/DataGenFlow

Length of output: 1008


🏁 Script executed:

# Check the broader context of the file and other functions
wc -l .claude/skills/address-pr-review/scripts/fetch_comments.py

Repository: nicofretti/DataGenFlow

Length of output: 128


🏁 Script executed:

# Check other error handling in the file to see patterns and consistency
cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | sed -n '34,50p'

Repository: nicofretti/DataGenFlow

Length of output: 817


🏁 Script executed:

# Check if there are other subprocess calls and how they handle errors
rg "subprocess.run|sys.exit" .claude/skills/address-pr-review/scripts/fetch_comments.py -B 1 -A 1

Repository: nicofretti/DataGenFlow

Length of output: 1106


Add error output to get_repo() when gh repo view fails.

The function exits silently on error (line 29-30). The fetch_comments() function below already implements the correct pattern (line 42); apply the same error handling here for consistency.

🔧 Suggested fix
     if result.returncode != 0:
-        sys.exit(1)
+        print(f"failed to resolve repo: {result.stderr.strip()}", file=sys.stderr)
+        sys.exit(1)
🧰 Tools
🪛 Ruff (0.14.14)

[error] 25-25: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py around lines 23 -
31, The get_repo() function currently exits silently when the gh repo view
subprocess fails; update it to mirror fetch_comments() error handling by
checking result.returncode and printing or logging the subprocess error output
(result.stderr) and/or stdout before exiting with non-zero status. Locate
get_repo() and change the failure branch to emit a clear error message including
result.stderr (and result.stdout if helpful) and then call sys.exit(1) so
failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.

class MyBlock(BaseBlock):
    name = "My Block"
    description = "Short description of what this block does"
    category = "generators"  # generators|transformers|validators|utilities

⚠️ Potential issue | 🟡 Minor

Category list inconsistent with state-backend.md.

Line 42 shows generators|transformers|validators|utilities but llm/state-backend.md (line 384) lists: generators, validators, metrics, seeders, multipliers, observability, utilities.

Missing: metrics, seeders, multipliers, observability
Extra: transformers

📝 Proposed fix
-    category = "generators"  # generators|transformers|validators|utilities
+    category = "generators"  # generators|validators|metrics|seeders|multipliers|observability|utilities
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
category = "generators" # generators|transformers|validators|utilities
category = "generators" # generators|validators|metrics|seeders|multipliers|observability|utilities
🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 42, Update
the category enumeration assigned to the variable `category` in SKILL.md so it
matches the canonical list used by the state backend: replace the current value
"generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.

)

# prepare embedding call
embedding_params = llm_config_manager._prepare_embedding_call(

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's verify the method exists and check for public alternatives
ast-grep --pattern 'def _prepare_embedding_call($$$)'
ast-grep --pattern 'def prepare_embedding_call($$$)'

Repository: nicofretti/DataGenFlow

Length of output: 2657


🏁 Script executed:

#!/bin/bash
# Check for prepare_llm_call (public method mentioned in docstring)
ast-grep --pattern 'def prepare_llm_call($$$)'

# Find all usages of _prepare_embedding_call
rg '_prepare_embedding_call' --type py -B 2 -A 2

# Check the LLMConfigManager class structure
rg 'class.*[Cc]onfig' lib/llm_config.py --type py -A 5

Repository: nicofretti/DataGenFlow

Length of output: 7400


Make _prepare_embedding_call public or refactor internal usage.

The _prepare_embedding_call() method is private (underscore prefix) but is actively used from multiple modules (duplicate_remover.py, semantic_infiller.py, ragas_metrics.py). This violates encapsulation. Either expose it as a public method prepare_embedding_call() (matching the pattern of prepare_llm_call()), or refactor the modules to use a public API.

🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 386, The
private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.
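
A minimal sketch of the public wrapper this comment asks for, assuming an LLMConfigManager class that already defines _prepare_embedding_call; the exact signature is not shown in this excerpt, so the wrapper simply forwards its arguments.

class LLMConfigManager:
    # ... existing manager code elided ...

    def prepare_embedding_call(self, *args, **kwargs):
        """public entry point mirroring prepare_llm_call: a thin wrapper so callers
        outside the manager (duplicate_remover, semantic_infiller, ragas_metrics)
        no longer reach into the private helper directly"""
        return self._prepare_embedding_call(*args, **kwargs)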

Comment on lines +13 to +35
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1920, "height": 1080})

    # Navigate to local HTML file
    page.goto(file_url)

    # Take screenshot
    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)

    # Interact with elements
    page.click("text=Click Me")
    page.fill("#name", "John Doe")
    page.fill("#email", "john@example.com")

    # Submit form
    page.click('button[type="submit"]')
    page.wait_for_timeout(500)

    # Take final screenshot
    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)

    browser.close()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "static_html_automation.py" -type f

Repository: nicofretti/DataGenFlow

Length of output: 132


🏁 Script executed:

fd "static_html_automation" --type f

Repository: nicofretti/DataGenFlow

Length of output: 48


🏁 Script executed:

cat -n .claude/skills/webapp-testing/examples/static_html_automation.py

Repository: nicofretti/DataGenFlow

Length of output: 1371


🏁 Script executed:

# Check if there's any documentation or tests showing how playwright context manager handles exceptions
rg "sync_playwright" --type py -A 10 -B 2 | head -60

Repository: nicofretti/DataGenFlow

Length of output: 3925


🏁 Script executed:

# Search for other sync_playwright usages with try/finally or exception handling patterns
rg "with sync_playwright" -A 20 | grep -E "(try|except|finally)" | head -20

Repository: nicofretti/DataGenFlow

Length of output: 48


🏁 Script executed:

# Check playwright documentation patterns in comments or existing code
rg "browser.close\(\)" --type py -B 3 -A 1

Repository: nicofretti/DataGenFlow

Length of output: 6605


🌐 Web query:

Playwright Python sync_playwright context manager exception handling browser cleanup

💡 Result:

In Playwright Python (sync), with sync_playwright() as p: ensures the Playwright driver is stopped when the block exits, even if an exception is raised—but you should still explicitly close any BrowserContext and Browser you created using try/finally so browsers shut down cleanly and artifacts (videos/HAR, etc.) flush properly. (playwright.dev)

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    context = browser.new_context()
    try:
        page = context.new_page()
        page.goto("https://example.com")
        # ... your actions/assertions ...
    finally:
        # Close context first (graceful; flushes artifacts), then browser
        context.close()
        browser.close()

Notes:

  • Playwright docs recommend explicitly closing contexts you created via browser.new_context() before calling browser.close(). (playwright.dev)
  • If you don’t use the context manager (sync_playwright().start()), you must call playwright.stop() yourself (typically in a finally). (playwright.dev)



Wrap browser operations in try/finally to ensure cleanup on exceptions.
If page navigation or interactions fail, browser.close() is never reached, risking orphaned processes. While sync_playwright() provides cleanup, explicit resource handling via try/finally is a best practice for proper shutdown and artifact flushing.

Proposed fix
 with sync_playwright() as p:
     browser = p.chromium.launch(headless=True)
-    page = browser.new_page(viewport={"width": 1920, "height": 1080})
-
-    # Navigate to local HTML file
-    page.goto(file_url)
-
-    # Take screenshot
-    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
-
-    # Interact with elements
-    page.click("text=Click Me")
-    page.fill("#name", "John Doe")
-    page.fill("#email", "john@example.com")
-
-    # Submit form
-    page.click('button[type="submit"]')
-    page.wait_for_timeout(500)
-
-    # Take final screenshot
-    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
-
-    browser.close()
+    try:
+        page = browser.new_page(viewport={"width": 1920, "height": 1080})
+
+        # Navigate to local HTML file
+        page.goto(file_url)
+
+        # Take screenshot
+        page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
+
+        # Interact with elements
+        page.click("text=Click Me")
+        page.fill("#name", "John Doe")
+        page.fill("#email", "john@example.com")
+
+        # Submit form
+        page.click('button[type="submit"]')
+        page.wait_for_timeout(500)
+
+        # Take final screenshot
+        page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
+    finally:
+        browser.close()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-with sync_playwright() as p:
-    browser = p.chromium.launch(headless=True)
-    page = browser.new_page(viewport={"width": 1920, "height": 1080})
-
-    # Navigate to local HTML file
-    page.goto(file_url)
-
-    # Take screenshot
-    page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
-
-    # Interact with elements
-    page.click("text=Click Me")
-    page.fill("#name", "John Doe")
-    page.fill("#email", "john@example.com")
-
-    # Submit form
-    page.click('button[type="submit"]')
-    page.wait_for_timeout(500)
-
-    # Take final screenshot
-    page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
-
-    browser.close()
+with sync_playwright() as p:
+    browser = p.chromium.launch(headless=True)
+    try:
+        page = browser.new_page(viewport={"width": 1920, "height": 1080})
+
+        # Navigate to local HTML file
+        page.goto(file_url)
+
+        # Take screenshot
+        page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
+
+        # Interact with elements
+        page.click("text=Click Me")
+        page.fill("#name", "John Doe")
+        page.fill("#email", "john@example.com")
+
+        # Submit form
+        page.click('button[type="submit"]')
+        page.wait_for_timeout(500)
+
+        # Take final screenshot
+        page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
+    finally:
+        browser.close()
🤖 Prompt for AI Agents
In @.claude/skills/webapp-testing/examples/static_html_automation.py around
lines 13 - 35, Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.

app.py Outdated
Comment on lines 701 to 711
@api_router.put("/llm-models/{name}/default")
async def set_default_llm_model(name: str) -> dict[str, str]:
    """set default llm model"""
    try:
        await llm_config_manager.set_default_llm_model(name)
        return {"message": "llm model set as default successfully"}
    except LLMConfigNotFoundError as e:
        raise HTTPException(status_code=404, detail=e.message)
    except Exception as e:
        logger.exception(f"failed to set default llm model {name}")
        raise HTTPException(status_code=400, detail=str(e))

⚠️ Potential issue | 🟡 Minor

Document the new default-model endpoints in backend state docs.
Please update llm/state-backend.md to capture these endpoints and default-model behavior.

As per coding guidelines (**/*.py): apply the backend code review checklist from llm/rules-backend.md; identify which llm/*.md files need updates (new API endpoints → update llm/state-backend.md) and whether the docs need updates.

Also applies to: 769-779

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 706-706: Consider moving this statement to an else block

(TRY300)


[warning] 708-708: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


[warning] 709-709: Do not catch blind exception: Exception

(BLE001)


[warning] 711-711: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In `@app.py` around lines 701 - 711, Add documentation for the new default-model
endpoints to llm/state-backend.md: describe the PUT /llm-models/{name}/default
endpoint implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.
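
Combining the earlier review suggestion to catch LLMConfigError with the Ruff hints above (TRY300, B904, BLE001), the handler quoted at lines 701-711 could be narrowed roughly as follows. This sketch reuses api_router, llm_config_manager, and logger from the quoted code; the assumption that LLMConfigError is the base class of LLMConfigNotFoundError and carries a .message comes from the review comment, not from code shown here.

@api_router.put("/llm-models/{name}/default")
async def set_default_llm_model(name: str) -> dict[str, str]:
    """set default llm model"""
    try:
        await llm_config_manager.set_default_llm_model(name)
    except LLMConfigNotFoundError as e:
        raise HTTPException(status_code=404, detail=e.message) from e
    except LLMConfigError as e:
        # storage/manager failures only; unrelated exceptions propagate to FastAPI
        logger.exception(f"failed to set default llm model {name}")
        raise HTTPException(status_code=400, detail=e.message) from e
    return {"message": "llm model set as default successfully"}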

@@ -1,4 +1,4 @@
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup

⚠️ Potential issue | 🟡 Minor

Add all to .PHONY to avoid file conflicts.
Ensures a file named all doesn’t shadow the target.

Suggested fix
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all
🧰 Tools
🪛 checkmake (0.2.2)

[warning] 1-1: Missing required phony target "all"

(minphony)

🤖 Prompt for AI Agents
In `@Makefile` at line 1, The .PHONY declaration currently lists many targets but
omits "all", which can cause a file named "all" to shadow the make target;
update the .PHONY line (the .PHONY declaration containing targets like
check-deps, install, dev, etc.) to include the "all" target so Make treats "all"
as phony and avoids file conflicts.


 [tool.pytest.ini_options]
-addopts = "-m 'not integration' --tb=short --ignore=scripts/"
+addopts = "-m 'not integration' --tb=short --ignore=scripts/ --ignore=tests/e2e/"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for CI workflows that might run e2e tests
find .github/workflows -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -20

🏁 Script executed:

#!/bin/bash
# Search for e2e test execution in CI files
rg -i "e2e|end.*to.*end|playwright|tests/e2e" --type yaml --type yml

🏁 Script executed:

#!/bin/bash
# Search for test-related scripts
rg -i "e2e|test.*all|integration" --type sh --type py scripts/ 2>/dev/null | head -30

🏁 Script executed:

#!/bin/bash
# Check the current state of pyproject.toml at line 85
sed -n '80,90p' pyproject.toml

🏁 Script executed:

#!/bin/bash
# Read the pre-merge CI workflow
cat .github/workflows/pre-merge.yml

🏁 Script executed:

#!/bin/bash
# Search for pytest invocations that might override or run e2e tests specifically
rg "pytest|test" --type yaml . | grep -i "e2e\|override\|pytest.*tests"

🏁 Script executed:

#!/bin/bash
# Check if with_server.py is called by any CI or test script
rg "with_server|test_mock_providers" --type yaml --type py

🏁 Script executed:

#!/bin/bash
# Find and read Makefile
find . -maxdepth 2 -name "Makefile" -o -name "makefile" | head -5

🏁 Script executed:

#!/bin/bash
# Check Makefile content - specifically the test target
rg "^test:" -A 10 Makefile

🏁 Script executed:

#!/bin/bash
# Look for all test-related make targets
rg "^[a-z-]+.*test" Makefile


E2E tests are skipped in CI — add make test-e2e to pre-merge checks.

The pytest addopts now ignores tests/e2e/, and the CI workflow only runs make test (which uses those options). Although make test-e2e exists and calls ./tests/e2e/run_all_tests.sh, it is not invoked by the pre-merge CI job. This means e2e test failures won't be caught on PRs.

Add test-e2e to the pre-merge target in the Makefile or call it separately in .github/workflows/pre-merge.yml.

🤖 Prompt for AI Agents
In `@pyproject.toml` at line 85, The CI is skipping E2E tests because
pyproject.toml's addopts ignores tests/e2e and the pre-merge CI job only runs
the Makefile's test target; update CI to run E2E by adding the existing Makefile
target test-e2e (which calls ./tests/e2e/run_all_tests.sh) to the pre-merge
workflow or add an explicit step in .github/workflows/pre-merge.yml to run `make
test-e2e` so E2E coverage is executed on PRs.

Comment on lines +204 to +205
follow the webapp-testing skill patterns:


⚠️ Potential issue | 🟡 Minor

Fix minor terminology in “Writing New Tests”.
Consider “web app testing” (or “web-app testing”) instead of “webapp-testing.”

Suggested edit
-follow the webapp-testing skill patterns:
+follow the web app testing skill patterns:
🧰 Tools
🪛 LanguageTool

[grammar] ~204-~204: Ensure spelling is correct
Context: ...s ``` ## Writing New Tests follow the webapp-testing skill patterns: 1. **wait for n...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
In `@tests/e2e/README.md` around lines 204 - 205, In the "Writing New Tests"
section replace the term "webapp-testing" with the preferred phrase "web app
testing" (or "web-app testing") to correct terminology; locate occurrences of
the exact token "webapp-testing" in the README (e.g., the line under the
"Writing New Tests" heading) and update them to "web app testing" consistently
throughout the document.

Comment on lines +192 to +194
pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
assert pipeline_id > 0
pipeline = Pipeline("test_sampler", pipeline_def["blocks"])

⚠️ Potential issue | 🟠 Major

Double JSON serialization bug.

save_pipeline internally calls json.dumps(definition) (see lib/storage.py:459-469). Passing json.dumps(pipeline_def) here will double-serialize, storing escaped JSON strings instead of proper JSON objects.

🐛 Fix: Pass dict directly
-        pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
+        pipeline_id = await storage.save_pipeline("test_sampler", pipeline_def)
🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 192 - 194, The test
is double-serializing pipeline_def because it calls storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the dict directly
(pipeline_def) so save_pipeline can perform the single json.dumps internally.
The line that constructs the Pipeline instance remains as
Pipeline("test_sampler", pipeline_def["blocks"]); only remove the extra
json.dumps wrapper around pipeline_def when invoking save_pipeline.
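
The core of the bug can be demonstrated in isolation. The sketch below is illustrative only, assuming save_pipeline serializes whatever it receives with json.dumps (as described for lib/storage.py above); serialize_for_storage is a hypothetical stand-in, not the project's API:

import json

def serialize_for_storage(definition) -> str:
    # hypothetical stand-in for the storage layer's own json.dumps call
    return json.dumps(definition)

pipeline_def = {"blocks": [{"type": "example_block"}]}

# correct: pass the dict, so the storage layer serializes exactly once
stored_ok = serialize_for_storage(pipeline_def)
print(type(json.loads(stored_ok)))   # <class 'dict'>

# buggy: pre-serializing stores an escaped JSON string, not a JSON object
stored_bad = serialize_for_storage(json.dumps(pipeline_def))
print(type(json.loads(stored_bad)))  # <class 'str'>

Reading the double-serialized value back yields a string rather than a dict, which is why both tests should pass pipeline_def directly.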

Comment on lines +295 to +297
pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
assert pipeline_id > 0
pipeline = Pipeline("test_no_embedding", pipeline_def["blocks"])

⚠️ Potential issue | 🟠 Major

Same double-serialization issue.

Apply the same fix here.

🐛 Fix
-        pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
+        pipeline_id = await storage.save_pipeline("test_no_embedding", pipeline_def)
🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 295 - 297, The test
currently double-serializes the pipeline by calling storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the raw dict (pipeline_def) to
storage.save_pipeline so it isn't serialized twice, keeping the subsequent
Pipeline("test_no_embedding", pipeline_def["blocks"]) as-is.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 10-11: The phrasing "first configured model" in the "Default model
selection" changelog entry is ambiguous; update the sentence to explicitly
define the ordering rule (e.g., "the earliest created model" or "the first model
in the Settings UI list as displayed") and state which attribute is used
(creation timestamp or list position). Reference the changelog entry title
"Default model selection" and the `is_default` flag so readers know the fallback
deterministically uses that defined ordering.

In `@frontend/src/pages/Settings.tsx`:
- Around line 236-252: The model card Box (key={model.name}) is clickable but
not keyboard-accessible; make it focusable and operable by adding tabIndex={0}
(so it can receive focus), role="button" and an onKeyDown handler that calls
handleSetDefaultLlm(model.name) when Enter or Space is pressed (respecting the
isDefault guard so default cards remain inert). Also add appropriate ARIA state
attributes such as aria-pressed or aria-current (or aria-disabled when
isDefault) to convey selection to assistive tech; apply the same changes to the
other card block around lines 399-415.
- Around line 132-152: Update the llm/state-frontend.md documentation to
describe the new Settings flow: explain the ability to select default models and
the new API integration patterns used by the frontend (reference the handler
functions handleSetDefaultLlm and handleSetDefaultEmbedding and the llmConfigApi
methods setDefaultLLMModel and setDefaultEmbeddingModel), and document the state
update calls loadLlmModels and loadEmbeddingModels that refresh UI state after
API calls; include example UX steps for selecting defaults and note expected
success/error handling (toast notifications) so the docs match the updated
Settings.tsx behavior.
- Around line 132-151: In the catch blocks of handleSetDefaultLlm and
handleSetDefaultEmbedding, log the caught error to the console in addition to
showing the toast; capture the error variable (use the existing error) and call
console.error with a descriptive message and the error object (e.g., "Failed to
set default LLM model" / "Failed to set default embedding model") so that
llmConfigApi.setDefaultLLMModel and llmConfigApi.setDefaultEmbeddingModel
failures are recorded for debugging.

Comment on lines +10 to +11
### Added
- Default model selection: Users can designate a default LLM and embedding model via the Settings UI. Both model types support `is_default` flag with automatic fallback to the first configured model.

⚠️ Potential issue | 🟡 Minor

Clarify “first configured model” ordering.
“First configured” is ambiguous (creation time? list order? name sort?). Please specify the ordering criteria to avoid user confusion.

🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 10 - 11, The phrasing "first configured model" in
the "Default model selection" changelog entry is ambiguous; update the sentence
to explicitly define the ordering rule (e.g., "the earliest created model" or
"the first model in the Settings UI list as displayed") and state which
attribute is used (creation timestamp or list position). Reference the changelog
entry title "Default model selection" and the `is_default` flag so readers know
the fallback deterministically uses that defined ordering.
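
For illustration, one deterministic interpretation of the fallback is sketched below in Python, assuming each model record carries a creation timestamp; this is an assumption about the data model, not the repository's actual implementation:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ModelEntry:
    name: str
    created_at: datetime
    is_default: bool = False

def resolve_default(models: list[ModelEntry]) -> ModelEntry | None:
    # prefer the explicit is_default flag, then fall back deterministically
    if not models:
        return None
    for model in models:
        if model.is_default:
            return model
    # "first configured" defined here as the earliest-created model
    return min(models, key=lambda m: m.created_at)

Whatever rule the code actually uses (creation order, list position, or name sort), the changelog entry should state it in those terms.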

Comment on lines +132 to +152
const handleSetDefaultLlm = async (name: string) => {
try {
await llmConfigApi.setDefaultLLMModel(name);
toast.success("Default LLM model updated");
loadLlmModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default LLM model: ${message}`);
}
};

const handleSetDefaultEmbedding = async (name: string) => {
try {
await llmConfigApi.setDefaultEmbeddingModel(name);
toast.success("Default embedding model updated");
loadEmbeddingModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default embedding model: ${message}`);
}
};

⚠️ Potential issue | 🟡 Minor

Update frontend state docs for the new Settings flow.

The Settings page now supports default-model selection and new API integration patterns. Please reflect this in llm/state-frontend.md.

Based on learnings (applies to llm/state-frontend.md): Update llm/state-frontend.md when API integration patterns change.
Based on learnings (applies to llm/state-frontend.md): Update llm/state-frontend.md when UI flow changes.

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 152, Update the
llm/state-frontend.md documentation to describe the new Settings flow: explain
the ability to select default models and the new API integration patterns used
by the frontend (reference the handler functions handleSetDefaultLlm and
handleSetDefaultEmbedding and the llmConfigApi methods setDefaultLLMModel and
setDefaultEmbeddingModel), and document the state update calls loadLlmModels and
loadEmbeddingModels that refresh UI state after API calls; include example UX
steps for selecting defaults and note expected success/error handling (toast
notifications) so the docs match the updated Settings.tsx behavior.

Comment on lines +132 to +151
const handleSetDefaultLlm = async (name: string) => {
try {
await llmConfigApi.setDefaultLLMModel(name);
toast.success("Default LLM model updated");
loadLlmModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default LLM model: ${message}`);
}
};

const handleSetDefaultEmbedding = async (name: string) => {
try {
await llmConfigApi.setDefaultEmbeddingModel(name);
toast.success("Default embedding model updated");
loadEmbeddingModels();
} catch (error) {
const message = error instanceof Error ? error.message : "Unknown error";
toast.error(`Failed to set default embedding model: ${message}`);
}

⚠️ Potential issue | 🟡 Minor

Log default-setting failures to console.

The toast is user-facing, but the error isn’t logged, which makes debugging harder. Add a console.error alongside the toast.

Proposed fix
   const handleSetDefaultLlm = async (name: string) => {
     try {
       await llmConfigApi.setDefaultLLMModel(name);
       toast.success("Default LLM model updated");
       loadLlmModels();
     } catch (error) {
       const message = error instanceof Error ? error.message : "Unknown error";
+      console.error("Failed to set default LLM model:", error);
       toast.error(`Failed to set default LLM model: ${message}`);
     }
   };

   const handleSetDefaultEmbedding = async (name: string) => {
     try {
       await llmConfigApi.setDefaultEmbeddingModel(name);
       toast.success("Default embedding model updated");
       loadEmbeddingModels();
     } catch (error) {
       const message = error instanceof Error ? error.message : "Unknown error";
+      console.error("Failed to set default embedding model:", error);
       toast.error(`Failed to set default embedding model: ${message}`);
     }
   };

As per coding guidelines: Log errors to console in frontend - ensure errors are logged for debugging.

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 151, In the catch blocks
of handleSetDefaultLlm and handleSetDefaultEmbedding, log the caught error to
the console in addition to showing the toast; capture the error variable (use
the existing error) and call console.error with a descriptive message and the
error object (e.g., "Failed to set default LLM model" / "Failed to set default
embedding model") so that llmConfigApi.setDefaultLLMModel and
llmConfigApi.setDefaultEmbeddingModel failures are recorded for debugging.

Comment on lines +236 to +252
<Box
key={model.name}
onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
sx={{
p: 3,
border: "1px solid",
borderColor: isDefault ? "success.emphasis" : "border.default",
borderRadius: 2,
bg: isDefault ? "success.subtle" : "canvas.subtle",
cursor: isDefault ? "default" : "pointer",
transition: "all 0.2s",
"&:hover": {
borderColor: isDefault ? "success.emphasis" : "accent.emphasis",
transform: isDefault ? "none" : "translateY(-2px)",
boxShadow: isDefault ? "none" : "shadow.medium",
},
}}

⚠️ Potential issue | 🟠 Major

Make default-selection cards keyboard accessible.

The cards are clickable but not focusable or operable via keyboard, which blocks non-mouse users from setting defaults.

Proposed fix
                 <Box
                   key={model.name}
                   onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
+                  role="button"
+                  tabIndex={isDefault ? -1 : 0}
+                  aria-pressed={isDefault}
+                  onKeyDown={(e: React.KeyboardEvent) => {
+                    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+                      e.preventDefault();
+                      handleSetDefaultLlm(model.name);
+                    }
+                  }}
                   sx={{
                     p: 3,
                     border: "1px solid",
                     borderColor: isDefault ? "success.emphasis" : "border.default",
                 <Box
                   key={model.name}
                   onClick={() => !isDefault && handleSetDefaultEmbedding(model.name)}
+                  role="button"
+                  tabIndex={isDefault ? -1 : 0}
+                  aria-pressed={isDefault}
+                  onKeyDown={(e: React.KeyboardEvent) => {
+                    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+                      e.preventDefault();
+                      handleSetDefaultEmbedding(model.name);
+                    }
+                  }}
                   sx={{
                     p: 3,
                     border: "1px solid",
                     borderColor: isDefault ? "success.emphasis" : "border.default",

Also applies to: 399-415

🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 236 - 252, The model card Box
(key={model.name}) is clickable but not keyboard-accessible; make it focusable
and operable by adding tabIndex={0} (so it can receive focus), role="button" and
an onKeyDown handler that calls handleSetDefaultLlm(model.name) when Enter or
Space is pressed (respecting the isDefault guard so default cards remain inert).
Also add appropriate ARIA state attributes such as aria-pressed or aria-current
(or aria-disabled when isDefault) to convey selection to assistive tech; apply
the same changes to the other card block around lines 399-415.

