🚀 Feat: Model selection + e2e tests#61
LedjoLleshaj wants to merge 18 commits into nicofretti:main from the feature-model-selection branch.
Conversation
…abbit (nicofretti#57)
- Adds a data augmentation pipeline template via three new blocks (StructureSampler, SemanticInfiller, DuplicateRemover)
- Enables JSON-or-template field configuration across blocks
- Implements E2E testing infrastructure with Playwright
- Updates frontend UI for model selection and template editing
- Integrates CodeRabbit for automated code review
Fixed frontend display bug where models named "default" incorrectly overrode explicit default settings. Updated lib/storage.py to strictly enforce a single default model constraint. Implemented auto-default logic: the first model created is automatically set as default. Implemented default reassignment: deleting the current default model automatically promotes another model to default. Added integration tests covering auto-default and reassignment scenarios.
Walkthrough

Adds default-model selection end-to-end: DB + storage changes to track a single default, LLMConfigManager methods and API PUT endpoints to set defaults, frontend Settings UI to mark defaults, and updated lookup fallbacks that prefer the default-flagged model.
Sequence Diagram(s)

sequenceDiagram
participant User
participant Frontend
participant API
participant Manager
participant Storage
User->>Frontend: click "Set default" on model card
Frontend->>API: PUT /llm-models/{name}/default
API->>Manager: set_default_llm_model(name)
Manager->>Storage: set_default_llm_model(name)
Storage->>Storage: BEGIN TRANSACTION
Storage->>Storage: UPDATE llm_models SET is_default=0
Storage->>Storage: UPDATE llm_models SET is_default=1 WHERE name={name}
Storage->>Storage: COMMIT
Storage-->>Manager: OK
Manager-->>API: success
API-->>Frontend: 200 OK (message)
Frontend->>API: GET /llm-models
API->>Manager: list_llm_models()
Manager->>Storage: list_llm_models()
Storage-->>Manager: rows (with is_default)
Manager-->>API: model list
API-->>Frontend: updated list (default badge)
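For orientation, a minimal sketch of the storage-level swap in the diagram might look like the following. It assumes an aiosqlite connection and reuses the table and column names shown above (llm_models, is_default, name); the actual lib/storage.py implementation, its connection handling, and its error types are not shown here and may differ.

```python
# hypothetical sketch of the single-default swap from the sequence diagram
import aiosqlite


async def set_default_llm_model(db_path: str, name: str) -> None:
    async with aiosqlite.connect(db_path) as db:
        await db.execute("BEGIN")
        try:
            # clear any existing default, then promote the requested model
            await db.execute("UPDATE llm_models SET is_default = 0")
            cursor = await db.execute(
                "UPDATE llm_models SET is_default = 1 WHERE name = ?", (name,)
            )
            if cursor.rowcount == 0:
                raise ValueError(f"model not found: {name}")  # real code raises a config error
            await db.execute("COMMIT")
        except Exception:
            await db.execute("ROLLBACK")
            raise
```

Running both UPDATEs inside one transaction is what keeps the single-default invariant: either the old default is cleared and the new one set, or neither change is visible.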
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@app.py`:
- Around line 700-710: Add integration tests using TestClient that cover the new
endpoints PUT /llm-models/{name}/default and PUT
/embedding-models/{name}/default for both success and failure paths: call the
app endpoints (following patterns in test_api.py) and assert 200 success
responses on normal behavior; mock llm_config_manager.set_default_llm_model and
embedding_config_manager.set_default_embedding_model to raise
LLMConfigNotFoundError and assert the HTTP response is 404 with the error
detail; also mock them to raise a generic Exception and assert the response is
400 and contains the exception message; ensure tests reference the handler
functions by endpoint paths and use the same import/mocking approach used
elsewhere in tests.
- Around line 700-710: Replace the broad except Exception in the
set_default_llm_model endpoint with a specific except LLMConfigError handler
(LLMConfigError is the base for LLMConfigNotFoundError) so you catch
storage/manager errors without swallowing unrelated exceptions; keep logging via
logger.exception(f"failed to set default llm model {name}") and raise
HTTPException(status_code=400, detail=e.message) from e to preserve the message
format; make the analogous change in the embedding default endpoint (the
function that sets embedding models default) using the same LLMConfigError
pattern and message handling; finally add unit tests that exercise both
/llm-models/{name}/default and /embedding-models/{name}/default for success,
not-found (LLMConfigNotFoundError), and generic manager/storage error
(LLMConfigError) to ensure response codes and detail strings match expectations.
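Taken together, the two prompts above amount to a handler shaped roughly like the sketch below, plus TestClient coverage of the 200/404/400 paths. The exception names and the manager call come from the review text; import paths, the FastAPI app object name, and any route prefix are assumptions.

```python
# hedged sketch of the handler shape; assumes api_router, llm_config_manager and
# logger already exist in app.py, and that LLMConfigError/LLMConfigNotFoundError
# live in lib.llm_config (import path is a guess)
from lib.llm_config import LLMConfigError, LLMConfigNotFoundError


@api_router.put("/llm-models/{name}/default")
async def set_default_llm_model(name: str) -> dict[str, str]:
    """set default llm model"""
    try:
        await llm_config_manager.set_default_llm_model(name)
    except LLMConfigNotFoundError as e:
        raise HTTPException(status_code=404, detail=e.message) from e
    except LLMConfigError as e:
        logger.exception(f"failed to set default llm model {name}")
        raise HTTPException(status_code=400, detail=e.message) from e
    return {"message": "llm model set as default successfully"}
```

A failure-path test could mock the manager and assert the status code; the `app.app` attribute and the exception constructor are likewise assumptions:

```python
from unittest.mock import AsyncMock, patch

from fastapi.testclient import TestClient

import app
from lib.llm_config import LLMConfigNotFoundError  # assumed import path


def test_set_default_llm_model_returns_404_when_model_missing():
    client = TestClient(app.app)
    with patch.object(
        app.llm_config_manager,
        "set_default_llm_model",
        AsyncMock(side_effect=LLMConfigNotFoundError("llm model not found: missing")),
    ):
        response = client.put("/llm-models/missing/default")
    assert response.status_code == 404
```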
In `@frontend/src/pages/Settings.tsx`:
- Around line 396-397: The embedding list uses a legacy fallback causing
inconsistent default detection: inside the embeddingModels.map callback (where
you set const isDefault = model.is_default || model.name === "default";) remove
the legacy check so it matches the LLM logic (use only model.is_default). Update
that line to compute isDefault solely from model.is_default to ensure
backend-controlled defaults are honored.
In `@lib/llm_config.py`:
- Around line 135-141: Remove the duplicated "fallback chain:" line in the
docstring (lines 135-136) so the docstring reads the fallback sequence only
once; mirror the same fix applied in get_llm_model’s docstring to keep wording
consistent and ensure the list (1.–4.) follows a single "fallback chain:"
header.
- Around line 42-48: Remove the duplicated docstring line describing the
fallback chain in lib/llm_config.py so the description only appears once; locate
the module-level docstring (the block explaining the fallback chain: "uses
fallback chain to ensure blocks always have a model available:") and delete the
repeated occurrence so the list of fallback steps remains correct and not
duplicated.
In `@tests/integration/test_default_model_selection_integration.py`:
- Around line 143-150: Replace the broad Exception in the
test_set_nonexistent_default_raises_error with the specific
LLMConfigNotFoundError: update the pytest.raises(...) to
pytest.raises(LLMConfigNotFoundError) and add/import LLMConfigNotFoundError at
the top of the test module so the assertion targets the exact error raised by
llm_config_manager.set_default_llm_model("non_existent_model").
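A hedged sketch of the narrowed assertion the prompt asks for; the import path for LLMConfigNotFoundError and the shape of the llm_config_manager fixture are assumptions:

```python
import pytest

from lib.llm_config import LLMConfigNotFoundError  # assumed import path


@pytest.mark.asyncio
async def test_set_default_llm_model_nonexistent_raises_not_found(llm_config_manager):
    with pytest.raises(LLMConfigNotFoundError):
        await llm_config_manager.set_default_llm_model("non_existent_model")
```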
🧹 Nitpick comments (2)
tests/integration/test_default_model_selection_integration.py (1)

31-38: Consider adding a test for the legacy "default" name fallback.
The docstring mentions "Fallback to 'default' named model (legacy support)" but the test doesn't explicitly verify this behavior. The current tests only check the is_default=True fallback.

lib/storage.py (1)

707-709: Consider adding logging before rollback for debugging.
The transaction rollback blocks re-raise exceptions but don't log them at the storage layer. While the caller may log, adding logger.exception() here would preserve context closer to the failure point.

♻️ Example for save_llm_model (apply similar pattern to others)

         except Exception:
    +        logger.exception("transaction failed during save_llm_model")
             await db.execute("ROLLBACK")
             raise

As per coding guidelines: "Use logger.exception() in backend — preserves stack traces in error logs."
Also applies to: 736-738, 759-761, 848-850, 876-878, 901-903
…shaj/DataGenFlow into feature-model-selection
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/blocks/builtin/ragas_metrics.py (1)

110-149: ⚠️ Potential issue | 🟠 Major
Avoid mutating self.metrics inside execute.
Line 110 mutates instance state per-run; if block instances are reused concurrently, metrics can leak across executions. Keep parsed metrics local and pass them to helpers.

🔧 Suggested fix

    -        # store parsed metrics for use in other methods
    -        self.metrics = metrics
    +        metrics_list = metrics
    @@
    -            return {"ragas_scores": self._empty_scores()}
    +            return {"ragas_scores": self._empty_scores(metrics_list)}
    @@
    -        metric_instances = self._build_metrics(llm, embeddings)
    +        metric_instances = self._build_metrics(llm, embeddings, metrics_list)

    -    def _empty_scores(self) -> dict[str, Any]:
    +    def _empty_scores(self, metrics: list[str]) -> dict[str, Any]:
             """return empty scores with passed=False"""
    -        return {metric: 0.0 for metric in self.metrics} | {"passed": False}
    +        return {metric: 0.0 for metric in metrics} | {"passed": False}

    -    def _build_metrics(self, llm: Any, embeddings: Any) -> dict[str, Any]:
    +    def _build_metrics(
    +        self, llm: Any, embeddings: Any, metrics: list[str]
    +    ) -> dict[str, Any]:
             """build metric instances"""
    @@
    -        return {k: v for k, v in available.items() if k in self.metrics}
    +        return {k: v for k, v in available.items() if k in metrics}

frontend/src/pages/Generator.tsx (1)

614-616: ⚠️ Potential issue | 🟡 Minor
Empty catch block for metadata parsing.
While this handles malformed JSON gracefully by returning null, adding a console.error would help debug production issues. As per coding guidelines, empty catch blocks should include logging.

🛡️ Add error logging

        } catch {
    +     console.error("Failed to parse job metadata");
          return null;
        }
🤖 Fix all issues with AI agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py:
- Around line 23-31: The get_repo() function currently exits silently when the
gh repo view subprocess fails; update it to mirror fetch_comments() error
handling by checking result.returncode and printing or logging the subprocess
error output (result.stderr) and/or stdout before exiting with non-zero status.
Locate get_repo() and change the failure branch to emit a clear error message
including result.stderr (and result.stdout if helpful) and then call sys.exit(1)
so failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md:
- Line 42: Update the category enumeration assigned to the variable `category`
in SKILL.md so it matches the canonical list used by the state backend: replace
the current value "generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.
- Line 386: The private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.
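One possible shape for that wrapper, assuming it lives on the same manager class; the signature is a guess since only the method names appear in this review:

```python
class LLMConfigManager:
    # ...existing methods, including prepare_llm_call and _prepare_embedding_call...

    def prepare_embedding_call(self, *args, **kwargs):
        """public alias for _prepare_embedding_call, mirroring prepare_llm_call"""
        return self._prepare_embedding_call(*args, **kwargs)
```

Callers such as duplicate_remover.py would then switch from `llm_config_manager._prepare_embedding_call(...)` to `llm_config_manager.prepare_embedding_call(...)` with no behavioral change.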
In @.claude/skills/webapp-testing/examples/static_html_automation.py:
- Around line 13-35: Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.
In `@app.py`:
- Around line 701-711: Add documentation for the new default-model endpoints to
llm/state-backend.md: describe the PUT /llm-models/{name}/default endpoint
implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.
In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 85-119: The component BlockConfigPanel currently contains inline
fetchModels inside its useEffect and directly calls
setLlmModels/setEmbeddingModels; refactor by moving the fetch logic into a
service (e.g., add modelsService.getLlmModels and
modelsService.getEmbeddingModels or a single modelsService.fetchAllModels that
handles AbortController) and implement a custom hook (e.g., useModels or
useAvailableModels) that returns { llmModels, embeddingModels, loading, error,
refresh }. Replace the inline useEffect/fetchModels in BlockConfigPanel with
that hook, wire loading to the UI (spinner/disabled states) and surface errors
via your app's toast/notification mechanism when hook.error is set, and ensure
the hook supports aborting on unmount and exposes a refresh function for
retries.
- Around line 97-113: Replace the unsafe "any" usages in BlockConfigPanel: add a
type guard function (e.g., function isModel(obj: unknown): obj is { name: string
}) and use it to filter llmData/embeddingData before mapping to names passed to
setLlmModels and setEmbeddingModels (e.g., Array.isArray(llmData) ?
llmData.filter(isModel).map(m => m.name) : []). Also change the catch error
handling to use unknown and narrow it safely (e.g., catch (error: unknown) { if
(!(error instanceof DOMException && error.name === 'AbortError'))
console.error('Failed to fetch models:', error instanceof Error ? error :
String(error)); }) so you remove "m: any" and "error as any" while preserving
the AbortError check and logging.
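Combining both comments, a hook plus type guard could look roughly like this; the endpoint paths, response shape, and service layout are assumptions rather than the project's actual API client:

```typescript
// hedged sketch of a useModels hook with abort support and a refresh function
import { useCallback, useEffect, useState } from "react";

function isModel(obj: unknown): obj is { name: string } {
  return typeof obj === "object" && obj !== null && typeof (obj as { name?: unknown }).name === "string";
}

async function fetchNames(url: string, signal: AbortSignal): Promise<string[]> {
  const res = await fetch(url, { signal });
  const data: unknown = await res.json();
  return Array.isArray(data) ? data.filter(isModel).map((m) => m.name) : [];
}

export function useModels() {
  const [llmModels, setLlmModels] = useState<string[]>([]);
  const [embeddingModels, setEmbeddingModels] = useState<string[]>([]);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<Error | null>(null);

  const load = useCallback(async (signal: AbortSignal) => {
    setLoading(true);
    setError(null);
    try {
      // endpoint paths are assumptions; reuse the project's models service if one exists
      const [llm, embedding] = await Promise.all([
        fetchNames("/api/llm-models", signal),
        fetchNames("/api/embedding-models", signal),
      ]);
      setLlmModels(llm);
      setEmbeddingModels(embedding);
    } catch (err: unknown) {
      if (!(err instanceof DOMException && err.name === "AbortError")) {
        setError(err instanceof Error ? err : new Error(String(err)));
      }
    } finally {
      setLoading(false);
    }
  }, []);

  useEffect(() => {
    const controller = new AbortController();
    void load(controller.signal);
    return () => controller.abort();
  }, [load]);

  const refresh = useCallback(() => {
    const controller = new AbortController();
    void load(controller.signal);
  }, [load]);

  return { llmModels, embeddingModels, loading, error, refresh };
}
```

BlockConfigPanel would then consume `const { llmModels, embeddingModels, loading, error, refresh } = useModels();` and surface `error` through the app's toast mechanism.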
In `@frontend/src/components/pipeline-editor/BlockNode.tsx`:
- Around line 108-117: The catch block in BlockNode.tsx that parses
config["fields_to_generate"] is currently empty and hides JSON parse errors;
update the catch to provide a fallback and record the failure by setting
displayValue to a clear fallback (e.g., "[invalid JSON]" or treat as template
string) and/or logging the error via the component logger or console, so
failures aren’t silent; specifically modify the try/catch around
JSON.parse(config[key]) in the fields_to_generate handling to catch the error,
call console.error or the component logging utility with the error and
key/context, and set displayValue to the chosen fallback (instead of leaving it
unchanged).
In `@lib/blocks/builtin/duplicate_remover.py`:
- Around line 217-253: The code filters out empty entries from sample_texts but
not from samples, breaking alignment and causing index errors in
_compute_similarities; update the logic after creating sample_texts so that you
filter samples in tandem (e.g., build pairs of (sample, text) via a
comprehension or zip, then split back into filtered_samples and sample_texts)
and pass filtered_samples and their corresponding sample_embeddings to
_compute_similarities instead of the original samples so indices remain aligned
(refer to _extract_text, sample_texts, samples, and _compute_similarities).
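A small helper is one way to keep the two lists aligned, as the comment suggests; the names below are illustrative and not taken from duplicate_remover.py:

```python
def drop_empty_samples(samples: list[dict], extract_text) -> tuple[list[dict], list[str]]:
    """drop samples whose extracted text is empty, keeping samples and texts aligned"""
    pairs = [(sample, extract_text(sample)) for sample in samples]
    kept = [(sample, text) for sample, text in pairs if text and text.strip()]
    filtered_samples = [sample for sample, _ in kept]
    texts = [text for _, text in kept]
    return filtered_samples, texts
```

Inside execute this would be called with `self._extract_text`, and `filtered_samples` (with their embeddings) passed to `_compute_similarities` so indices stay in step.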
In `@lib/blocks/builtin/ragas_metrics.py`:
- Around line 86-109: After parsing metrics_template into metrics_list in the
method that uses render_template and assigns metrics, validate every entry
against the allowed set (e.g., a module-level or class constant like
ALLOWED_METRICS or self.allowed_metrics); if any items are not in that set,
raise BlockExecutionError with a clear message and detail that includes the
invalid metrics and the rendered value (similar to the existing JSON error
handling). Insert this check after the isinstance checks on metrics_list and
before assigning metrics = metrics_list so unknown metric names cause an
immediate BlockExecutionError instead of being silently ignored.
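A hedged sketch of that validation step; the allowed-metric names below are placeholders, and BlockExecutionError is assumed to be imported the same way the other builtin blocks import it:

```python
# ALLOWED_METRICS values are hypothetical examples, not the block's real list
ALLOWED_METRICS = {"faithfulness", "answer_relevancy", "context_precision"}


def validate_metrics(metrics_list: list[str], rendered_value: str) -> list[str]:
    unknown = [m for m in metrics_list if m not in ALLOWED_METRICS]
    if unknown:
        raise BlockExecutionError(  # assumes BlockExecutionError is in scope
            f"unknown ragas metrics: {unknown}",
            detail={"invalid_metrics": unknown, "rendered_value": rendered_value},
        )
    return metrics_list
```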
In `@lib/blocks/builtin/structure_sampler.py`:
- Around line 348-369: When parsing dependencies (the dependencies variable
populated via render_and_parse_json from self.dependencies_template), validate
that every dependency key references a declared categorical field in
self.categorical_fields before proceeding to _topological_sort; if any key is
missing, raise BlockExecutionError with a clear message and include the full
dependencies and categorical_fields in the detail so callers can see the
mismatch. Also consider validating that each listed parent in the dependency
values is a declared categorical field and raise similarly if not, to avoid
misleading circular-dependency errors or sampling with missing parents.
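Sketched out, the validation could look like this; the exact shapes of `dependencies` and `categorical_fields` in structure_sampler.py are assumptions:

```python
def validate_dependencies(dependencies: dict[str, list[str]], categorical_fields: dict) -> None:
    """raise if any dependency key or parent is not a declared categorical field"""
    declared = set(categorical_fields)
    unknown_keys = [key for key in dependencies if key not in declared]
    unknown_parents = [
        parent
        for parents in dependencies.values()
        for parent in parents
        if parent not in declared
    ]
    if unknown_keys or unknown_parents:
        raise BlockExecutionError(  # assumes BlockExecutionError is in scope
            "dependencies reference undeclared categorical fields",
            detail={
                "unknown_keys": unknown_keys,
                "unknown_parents": unknown_parents,
                "dependencies": dependencies,
                "categorical_fields": sorted(declared),
            },
        )
```

Running this before `_topological_sort` turns a confusing circular-dependency error into a direct report of the mismatched field names.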
In `@lib/blocks/commons/template_utils.py`:
- Around line 66-96: The parse_llm_json_response function currently swallows
JSONDecodeError in three empty except blocks; capture each exception (e.g.,
"except json.JSONDecodeError as e") and either log it at debug level via the
module logger or store the last exception and include its message in the
BlockExecutionError detail; update the blocks around markdown_match and
json_match and the initial direct-parse try to reference the captured exception
(and optionally logger) so failures aren't silent, and ensure the final raise
includes the last exception info (or call logger.debug with field_name and
exception) to aid debugging.
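One way to keep the last parse failure visible, sketched against an assumed simplification of parse_llm_json_response (the real markdown/JSON regex fallbacks are elided):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_first_json(candidates: list[str], field_name: str) -> dict:
    """try each candidate string in order, remembering the last JSONDecodeError"""
    last_error: json.JSONDecodeError | None = None
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError as e:
            logger.debug("json parse failed for %s: %s", field_name, e)
            last_error = e
    raise BlockExecutionError(  # assumes BlockExecutionError is in scope
        f"could not parse LLM response for {field_name}",
        detail={"last_error": str(last_error)},
    )
```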
In `@lib/workflow.py`:
- Around line 275-284: Document that multiplier pipelines run
_filter_output_data and remove any keys present in initial_data_keys (used to
strip template/config/seed fields) by updating llm/state-backend.md; explicitly
note this behavior is currently applied only for multiplier pipelines and either
justify the asymmetry or state whether the filter should also apply to normal
pipelines. Also add guidance recommending moving from the current deny-list
approach to an explicit allow-list of preserved output fields (or provide a
configurable whitelist) to avoid accidental removal of legitimate block outputs
(reference _filter_output_data and places where multiplier pipelines invoke it)
and include migration notes for blocks that emit keys like "samples".
In `@Makefile`:
- Line 1: The .PHONY declaration currently lists many targets but omits "all",
which can cause a file named "all" to shadow the make target; update the .PHONY
line (the .PHONY declaration containing targets like check-deps, install, dev,
etc.) to include the "all" target so Make treats "all" as phony and avoids file
conflicts.
In `@pyproject.toml`:
- Line 85: The CI is skipping E2E tests because pyproject.toml's addopts ignores
tests/e2e and the pre-merge CI job only runs the Makefile's test target; update
CI to run E2E by adding the existing Makefile target test-e2e (which calls
./tests/e2e/run_all_tests.sh) to the pre-merge workflow or add an explicit step
in .github/workflows/pre-merge.yml to run `make test-e2e` so E2E coverage is
executed on PRs.
In `@tests/e2e/README.md`:
- Around line 204-205: In the "Writing New Tests" section replace the term
"webapp-testing" with the preferred phrase "web app testing" (or "web-app
testing") to correct terminology; locate occurrences of the exact token
"webapp-testing" in the README (e.g., the line under the "Writing New Tests"
heading) and update them to "web app testing" consistently throughout the
document.
In `@tests/integration/test_data_augmentation.py`:
- Around line 295-297: The test currently double-serializes the pipeline by
calling storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the raw dict (pipeline_def) to storage.save_pipeline so it isn't serialized
twice, keeping the subsequent Pipeline("test_no_embedding",
pipeline_def["blocks"]) as-is; update the use of storage.save_pipeline(...)
accordingly.
- Around line 192-194: The test is double-serializing pipeline_def because it
calls storage.save_pipeline with json.dumps(pipeline_def); change the call to
pass the dict directly (pipeline_def) so save_pipeline can perform the single
json.dumps internally; update the line that constructs the Pipeline instance
remains as Pipeline("test_sampler", pipeline_def["blocks"]) and ensure you only
remove the extra json.dumps wrapper around pipeline_def when invoking
save_pipeline.
🧹 Nitpick comments (23)
scripts/inspect_db_configs.py (3)

50-50: Return value from main() is discarded.
asyncio.run(main()) at line 56 doesn't capture the returned tuple. Either remove the return statement or capture/use the result if needed for testing.

♻️ Proposed fix

    -    return llm_models, embedding_models
         finally:
             await storage.close()

Or if the return is intentional for testing:

     if __name__ == "__main__":
    -    asyncio.run(main())
    +    llm_models, embedding_models = asyncio.run(main())
    +    print(f"\nTotal: {len(llm_models)} LLM models, {len(embedding_models)} embedding models")

Also applies to: 55-56

17-17: Hardcoded database path.
Consider accepting the path as a CLI argument for flexibility, or document that data/qa_records.db is the expected location.

♻️ Proposed enhancement

    +import argparse
    +
     async def main():
    -    storage = Storage("data/qa_records.db")
    +    parser = argparse.ArgumentParser(description=__doc__)
    +    parser.add_argument("--db", default="data/qa_records.db", help="path to database")
    +    args = parser.parse_args()
    +    storage = Storage(args.db)

30-34: Use public Storage methods instead of private API.
Storage exposes list_llm_models() and list_embedding_models() public methods that handle the database queries. Replace the private _execute_with_connection() calls (lines 30 and 44) with these public methods, which return properly typed objects instead of raw dicts. Also remove the unused return statement at line 50 — asyncio.run(main()) at line 56 discards the result.

.claude/skills/debugging-pipelines/SKILL.md (1)
90-92: Hardcoded line numbers may drift.
References like lib/workflow.py:85-224 and lib/workflow.py:305-449 will become stale as code evolves. Consider using function/method names or comments to anchor references instead.

tests/integration/conftest.py (1)

14-15: Consider fixture scope for e2e efficiency.
The fixture defaults to scope="function", creating a new DB per test. For e2e tests that don't mutate shared state, scope="module" or scope="session" could reduce overhead.

Example if appropriate

     @pytest_asyncio.fixture
    +async def e2e_storage(scope="module"):
    -async def e2e_storage():

.claude/skills/webapp-testing/scripts/with_server.py (2)
76-77: Add strict=True to zip for defense in depth.
While lines 71-73 validate the count match, adding strict=True prevents silent truncation if that check is ever bypassed or refactored.

Proposed fix

    -    for cmd, port in zip(args.servers, args.ports):
    +    for cmd, port in zip(args.servers, args.ports, strict=True):

28-32: Use OSError instead of socket.error.
socket.error is an alias for OSError since Python 3.3. Using OSError directly is more idiomatic.

Proposed fix

    -        except (socket.error, ConnectionRefusedError):
    +        except OSError:

Note: ConnectionRefusedError is a subclass of OSError, so catching OSError alone suffices.

scripts/with_server.py (1)
46-63: Fail fast if a server process exits during readiness polling.
Right now a crashed process just burns the full timeout; checking proc.poll() gives faster feedback and clearer failures.

💡 Suggested change

         while time.time() - start_time < self.max_wait:
    +        if proc.poll() is not None:
    +            print(f" server on port {port} exited (code {proc.returncode})")
    +            self.cleanup()
    +            sys.exit(1)
             try:
                 with urllib.request.urlopen(url, timeout=2) as response:
                     if response.status == 200:
                         print(f" server on port {port} is ready")

lib/workflow.py (1)
334-435: Consider aligning output filtering across pipeline types.
Filtering is now applied in the multiplier seed path, but normal pipelines still return the full accumulated_state. If a consistent response shape is desired, consider applying the same filter in the non-multiplier path.

tests/test_template_renderer.py (3)
8-33: Split the conditionals test and rename for convention.
test_render_template_with_conditionals exercises two scenarios in one test. Prefer one behavior per test and use the test_<method>_<scenario>_<expected> naming pattern.
As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.

35-97: Split the multi-scenario tojson tests and align naming.
test_tojson_filter_nested_in_complex_template mixes defined/undefined cases; split into separate tests and rename to the standard pattern.
As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.

99-133: Separate truncate scenarios and rename the error tests.
test_truncate_filter covers short + long inputs in one test; split and rename to the required convention. Consider renaming the error tests (test_undefined_variable_without_filter, test_template_syntax_error) to match the same pattern.
As per coding guidelines: one behavior per test; test names follow the test_<method>_<scenario>_<expected> pattern.
lib/blocks/builtin/semantic_infiller.py (5)

174-182: Use exception chaining for better traceability.
When re-raising exceptions, chain with from e to preserve the original traceback.

Proposed fix

         try:
             response = await litellm.acompletion(**llm_params)
         except Exception as e:
             raise BlockExecutionError(
    -            f"LLM call failed: {str(e)}",
    +            f"LLM call failed: {e!s}",
                 detail={
                     "skeleton": skeleton,
                     "prompt_preview": prompt[:200],
                     "error": str(e),
                 },
    -        )
    +        ) from e
247-252: Add strict=True to zip for defensive iteration.
If embeddings and samples somehow get out of sync, this would silently truncate. Adding strict=True ensures length mismatches raise immediately.

Proposed fix

    -        for emb, sample in zip(embeddings, samples):
    +        for emb, sample in zip(embeddings, samples, strict=True):
                 sim = self._cosine_similarity(target_embedding, emb)
                 similarities.append((sim, sample))
268-270: Rename unused loop variable.
sim is unpacked but not used in the loop body.

Proposed fix

    -        for sim, sample in similar_samples:
    +        for _sim, sample in similar_samples:
                 fields_str = json.dumps({f: sample.get(f, "") for f in fields_to_generate})
                 negative_lines.append(f"  - {fields_str}")
332-338: Use exception chaining here as well.
Same issue as the other LLM call - chain with from e.

Proposed fix

         try:
             response = await litellm.acompletion(**llm_params)
         except Exception as e:
             raise BlockExecutionError(
    -            f"LLM call failed: {str(e)}",
    +            f"LLM call failed: {e!s}",
                 detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
    -        )
    +        ) from e
389-390: Unreachable code after exhausted for-loop.
These lines execute only if the loop completes all iterations without returning. However, the loop always returns on the last iteration (attempt == max_diversity_retries) after exhausting retries. The variables result and embedding will be defined from the last iteration, so this is technically safe but misleading.
Consider adding a comment or restructuring to clarify intent.
tests/integration/test_data_augmentation_pipeline.py (1)

28-32: Monkey-patching global state works but consider a fixture-based approach.
The manual save/restore of app.llm_config_manager is functional but fragile. Consider using unittest.mock.patch as a context manager or decorator for cleaner lifecycle management:

    with patch.object(app, 'llm_config_manager', llm_config_manager):
        # test body

This eliminates the need for try/finally cleanup.
tests/blocks/test_semantic_infiller.py (1)

386-417: Unused mock_embedding parameter - add assertion or remove.
The mock_embedding mock is patched but never used or asserted. If the intent is to verify embeddings are NOT called when unavailable, add an assertion:

♻️ Option 1: Add assertion

         # should still work, just without diversity check
         assert "samples" in result
         assert len(result["samples"]) == 1
    +    # embedding should not be called when model unavailable
    +    mock_embedding.assert_not_called()

Alternatively, remove the mock from the decorator if not needed.
tests/e2e/test_review_e2e.py (2)

17-37: Missing setup/teardown fixture - inconsistent with other e2e tests.
test_pipelines_e2e.py and test_generator_e2e.py use a module-scoped _e2e_setup_teardown fixture that waits for the server and cleans the database. This file lacks that pattern, which could lead to test pollution or failures if the server isn't ready.

♻️ Add fixture for consistency

    +@pytest.fixture(scope="module", autouse=True)
    +def _e2e_setup_teardown():
    +    """setup and teardown for e2e tests"""
    +    from test_helpers import cleanup_database, wait_for_server
    +    if not wait_for_server():
    +        pytest.skip("server not ready for e2e tests")
    +    cleanup_database()
    +    yield
    +    cleanup_database()
    +
    +
     def test_review_page_loads():
40-68: Tests may pass vacuously when UI elements are missing.
Several tests (like test_select_job) check if len(selectors) > 0 but don't assert or skip when elements aren't found. The test passes without verifying anything meaningful.
Consider using pytest.skip() when preconditions aren't met:

♻️ Make preconditions explicit

         selectors = page.locator('select, [role="combobox"]').all()
    -    if len(selectors) > 0:
    +    if len(selectors) == 0:
    +        browser.close()
    +        pytest.skip("no job selector found on review page")
    +
    +    # click first selector
    +    selectors[0].click()

tests/e2e/test_generator_e2e.py (2)
56-74: Inconsistent screenshot path handling vs test_pipelines_e2e.py.
This file uses hardcoded /tmp/ paths while test_pipelines_e2e.py uses pytest's tmp_path fixture. While acceptable per project guidelines, consider aligning patterns across e2e tests for maintainability.
115-163: Seed file cleanup could fail silently if the test errors before os.remove.
If an assertion fails before line 160 (os.remove(seed_path)), the temp file persists. Consider using a context manager or fixture for cleanup.

♻️ Use try/finally for cleanup

         seed_path = "/tmp/test_seed.json"
    -    with open(seed_path, "w") as f:
    -        json.dump(seed_data, f)
    -
    -    # go to generator page
    -    ...
    -
    -    # cleanup
    -    os.remove(seed_path)
    +    try:
    +        with open(seed_path, "w") as f:
    +            json.dump(seed_data, f)
    +
    +        # go to generator page
    +        ...
    +    finally:
    +        if os.path.exists(seed_path):
    +            os.remove(seed_path)
    def get_repo() -> str:
        result = subprocess.run(
            ["gh", "repo", "view", "--json", "owner,name", "-q", '.owner.login + "/" + .name'],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            sys.exit(1)
        return result.stdout.strip()
🧩 Analysis chain
🏁 Script executed:
cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | head -40

Repository: nicofretti/DataGenFlow
Length of output: 1534
🏁 Script executed:
# Check for callers of get_repo() and other error handling patterns
rg "get_repo" .claude/skills/address-pr-review/scripts/ -B 2 -A 2

Repository: nicofretti/DataGenFlow
Length of output: 1008
🏁 Script executed:
# Check the broader context of the file and other functions
wc -l .claude/skills/address-pr-review/scripts/fetch_comments.py

Repository: nicofretti/DataGenFlow
Length of output: 128
🏁 Script executed:
# Check other error handling in the file to see patterns and consistency
cat -n .claude/skills/address-pr-review/scripts/fetch_comments.py | sed -n '34,50p'

Repository: nicofretti/DataGenFlow
Length of output: 817
🏁 Script executed:
# Check if there are other subprocess calls and how they handle errors
rg "subprocess.run|sys.exit" .claude/skills/address-pr-review/scripts/fetch_comments.py -B 1 -A 1

Repository: nicofretti/DataGenFlow
Length of output: 1106
Add error output to get_repo() when gh repo view fails.
The function exits silently on error (line 29-30). The fetch_comments() function below already implements the correct pattern (line 42); apply the same error handling here for consistency.
🔧 Suggested fix
     if result.returncode != 0:
    +    print(f"failed to resolve repo: {result.stderr.strip()}", file=sys.stderr)
         sys.exit(1)

🧰 Tools
+ sys.exit(1)🧰 Tools
🪛 Ruff (0.14.14)
[error] 25-25: Starting a process with a partial executable path
(S607)
🤖 Prompt for AI Agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py around lines 23 -
31, The get_repo() function currently exits silently when the gh repo view
subprocess fails; update it to mirror fetch_comments() error handling by
checking result.returncode and printing or logging the subprocess error output
(result.stderr) and/or stdout before exiting with non-zero status. Locate
get_repo() and change the failure branch to emit a clear error message including
result.stderr (and result.stdout if helpful) and then call sys.exit(1) so
failures are visible for debugging; use the same pattern used in
fetch_comments() around the subprocess.run call for consistency.
    class MyBlock(BaseBlock):
        name = "My Block"
        description = "Short description of what this block does"
        category = "generators"  # generators|transformers|validators|utilities
Category list inconsistent with state-backend.md.
Line 42 shows generators|transformers|validators|utilities but llm/state-backend.md (line 384) lists: generators, validators, metrics, seeders, multipliers, observability, utilities.
Missing: metrics, seeders, multipliers, observability
Extra: transformers
📝 Proposed fix

    -    category = "generators"  # generators|transformers|validators|utilities
    +    category = "generators"  # generators|validators|metrics|seeders|multipliers|observability|utilities

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    category = "generators"  # generators|validators|metrics|seeders|multipliers|observability|utilities
🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 42, Update
the category enumeration assigned to the variable `category` in SKILL.md so it
matches the canonical list used by the state backend: replace the current value
"generators|transformers|validators|utilities" with
"generators|validators|metrics|seeders|multipliers|observability|utilities"
(i.e., remove "transformers" and add "metrics", "seeders", "multipliers", and
"observability") so the SKILL `category` options align with the state-backend
definitions.
        )

        # prepare embedding call
        embedding_params = llm_config_manager._prepare_embedding_call(
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's verify the method exists and check for public alternatives
ast-grep --pattern 'def _prepare_embedding_call($$$)'
ast-grep --pattern 'def prepare_embedding_call($$$)'

Repository: nicofretti/DataGenFlow
Length of output: 2657
🏁 Script executed:
#!/bin/bash
# Check for prepare_llm_call (public method mentioned in docstring)
ast-grep --pattern 'def prepare_llm_call($$$)'
# Find all usages of _prepare_embedding_call
rg '_prepare_embedding_call' --type py -B 2 -A 2
# Check the LLMConfigManager class structure
rg 'class.*[Cc]onfig' lib/llm_config.py --type py -A 5

Repository: nicofretti/DataGenFlow
Length of output: 7400
Make _prepare_embedding_call public or refactor internal usage.
The _prepare_embedding_call() method is private (underscore prefix) but is actively used from multiple modules (duplicate_remover.py, semantic_infiller.py, ragas_metrics.py). This violates encapsulation. Either expose it as a public method prepare_embedding_call() (matching the pattern of prepare_llm_call()), or refactor the modules to use a public API.
🤖 Prompt for AI Agents
In @.claude/skills/implementing-datagenflow-blocks/SKILL.md at line 386, The
private method _prepare_embedding_call is used across modules
(duplicate_remover.py, semantic_infiller.py, ragas_metrics.py); make it public
by adding a prepare_embedding_call() wrapper on the LLM config manager that
delegates to the existing _prepare_embedding_call (or rename the method and keep
a short-lived alias for backward compatibility), update all callers to invoke
prepare_embedding_call instead of _prepare_embedding_call, and ensure any
docstring/visibility tests mirror the existing prepare_llm_call pattern so
behavior and signatures remain identical.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1920, "height": 1080})

        # Navigate to local HTML file
        page.goto(file_url)

        # Take screenshot
        page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)

        # Interact with elements
        page.click("text=Click Me")
        page.fill("#name", "John Doe")
        page.fill("#email", "john@example.com")

        # Submit form
        page.click('button[type="submit"]')
        page.wait_for_timeout(500)

        # Take final screenshot
        page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)

        browser.close()
🧩 Analysis chain
🏁 Script executed:
find . -name "static_html_automation.py" -type f

Repository: nicofretti/DataGenFlow
Length of output: 132
🏁 Script executed:
fd "static_html_automation" --type f

Repository: nicofretti/DataGenFlow
Length of output: 48
🏁 Script executed:
cat -n .claude/skills/webapp-testing/examples/static_html_automation.py

Repository: nicofretti/DataGenFlow
Length of output: 1371
🏁 Script executed:
# Check if there's any documentation or tests showing how playwright context manager handles exceptions
rg "sync_playwright" --type py -A 10 -B 2 | head -60

Repository: nicofretti/DataGenFlow
Length of output: 3925
🏁 Script executed:
# Search for other sync_playwright usages with try/finally or exception handling patterns
rg "with sync_playwright" -A 20 | grep -E "(try|except|finally)" | head -20

Repository: nicofretti/DataGenFlow
Length of output: 48
🏁 Script executed:
# Check playwright documentation patterns in comments or existing code
rg "browser.close\(\)" --type py -B 3 -A 1

Repository: nicofretti/DataGenFlow
Length of output: 6605
🌐 Web query:
Playwright Python sync_playwright context manager exception handling browser cleanup
💡 Result:
In Playwright Python (sync), with sync_playwright() as p: ensures the Playwright driver is stopped when the block exits, even if an exception is raised—but you should still explicitly close any BrowserContext and Browser you created using try/finally so browsers shut down cleanly and artifacts (videos/HAR, etc.) flush properly. (playwright.dev)
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context()
        try:
            page = context.new_page()
            page.goto("https://example.com")
            # ... your actions/assertions ...
        finally:
            # Close context first (graceful; flushes artifacts), then browser
            context.close()
            browser.close()

Notes:
- Playwright docs recommend explicitly closing contexts you created via browser.new_context() before calling browser.close(). (playwright.dev)
- If you don’t use the context manager (sync_playwright().start()), you must call playwright.stop() yourself (typically in a finally). (playwright.dev)
Citations:
- 1: https://playwright.dev/python/docs/api/class-browser?utm_source=openai
- 2: https://playwright.dev/python/docs/api/class-browser?utm_source=openai
- 3: https://playwright.dev/python/docs/api/class-playwright?utm_source=openai
Wrap browser operations in try/finally to ensure cleanup on exceptions.
If page navigation or interactions fail, browser.close() is never reached, risking orphaned processes. While sync_playwright() provides cleanup, explicit resource handling via try/finally is a best practice for proper shutdown and artifact flushing.
Proposed fix
with sync_playwright() as p:
browser = p.chromium.launch(headless=True)
- page = browser.new_page(viewport={"width": 1920, "height": 1080})
-
- # Navigate to local HTML file
- page.goto(file_url)
-
- # Take screenshot
- page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
-
- # Interact with elements
- page.click("text=Click Me")
- page.fill("#name", "John Doe")
- page.fill("#email", "john@example.com")
-
- # Submit form
- page.click('button[type="submit"]')
- page.wait_for_timeout(500)
-
- # Take final screenshot
- page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
-
- browser.close()
+ try:
+ page = browser.new_page(viewport={"width": 1920, "height": 1080})
+
+ # Navigate to local HTML file
+ page.goto(file_url)
+
+ # Take screenshot
+ page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)
+
+ # Interact with elements
+ page.click("text=Click Me")
+ page.fill("#name", "John Doe")
+ page.fill("#email", "john@example.com")
+
+ # Submit form
+ page.click('button[type="submit"]')
+ page.wait_for_timeout(500)
+
+ # Take final screenshot
+ page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
+ finally:
+ browser.close()

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page(viewport={"width": 1920, "height": 1080})

            # Navigate to local HTML file
            page.goto(file_url)

            # Take screenshot
            page.screenshot(path=str(output_dir / "static_page.png"), full_page=True)

            # Interact with elements
            page.click("text=Click Me")
            page.fill("#name", "John Doe")
            page.fill("#email", "john@example.com")

            # Submit form
            page.click('button[type="submit"]')
            page.wait_for_timeout(500)

            # Take final screenshot
            page.screenshot(path=str(output_dir / "after_submit.png"), full_page=True)
        finally:
            browser.close()
🤖 Prompt for AI Agents
In @.claude/skills/webapp-testing/examples/static_html_automation.py around
lines 13 - 35, Wrap the browser/page operations inside a try/finally so
browser.close() always runs: after calling sync_playwright() and creating
browser and page (symbols: sync_playwright, p.chromium.launch, browser, page),
put navigation/page interactions (page.goto, page.screenshot, page.click,
page.fill, page.wait_for_timeout) inside a try block and call browser.close() in
the finally block; ensure screenshots (output_dir) and any pending actions are
flushed before closing by keeping the final page.screenshot call in the try and
leaving only browser.close() in finally.
app.py (Outdated)
    @api_router.put("/llm-models/{name}/default")
    async def set_default_llm_model(name: str) -> dict[str, str]:
        """set default llm model"""
        try:
            await llm_config_manager.set_default_llm_model(name)
            return {"message": "llm model set as default successfully"}
        except LLMConfigNotFoundError as e:
            raise HTTPException(status_code=404, detail=e.message)
        except Exception as e:
            logger.exception(f"failed to set default llm model {name}")
            raise HTTPException(status_code=400, detail=str(e))
Document the new default-model endpoints in backend state docs.
Please update llm/state-backend.md to capture these endpoints and default-model behavior.
As per coding guidelines: **/.py: Apply backend code review checklist from llm/rules-backend.md: Identify which llm/.md files need updates: - New API endpoints → update llm/state-backend.md; Identify if the docs needs updates.
Also applies to: 769-779
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 706-706: Consider moving this statement to an else block
(TRY300)
[warning] 708-708: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
[warning] 709-709: Do not catch blind exception: Exception
(BLE001)
[warning] 711-711: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In `@app.py` around lines 701 - 711, Add documentation for the new default-model
endpoints to llm/state-backend.md: describe the PUT /llm-models/{name}/default
endpoint implemented by set_default_llm_model (which calls
llm_config_manager.set_default_llm_model), the expected request/response shape,
HTTP status codes (200 success, 404 LLMConfigNotFoundError, 400 other errors),
and the semantics of "default model" behavior; also scan nearby endpoints (e.g.,
the other default-model related handlers around lines 769-779) and update any
related llm/*.md sections to satisfy the backend code review checklist from
llm/rules-backend.md.
    @@ -1,4 +1,4 @@
    -.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration pre-merge setup
    +.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
Add all to .PHONY to avoid file conflicts.
Ensures a file named all doesn’t shadow the target.
Suggested fix
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    .PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup all
🧰 Tools
🪛 checkmake (0.2.2)
[warning] 1-1: Missing required phony target "all"
(minphony)
🤖 Prompt for AI Agents
In `@Makefile` at line 1, The .PHONY declaration currently lists many targets but
omits "all", which can cause a file named "all" to shadow the make target;
update the .PHONY line (the .PHONY declaration containing targets like
check-deps, install, dev, etc.) to include the "all" target so Make treats "all"
as phony and avoids file conflicts.
    [tool.pytest.ini_options]
    -addopts = "-m 'not integration' --tb=short --ignore=scripts/"
    +addopts = "-m 'not integration' --tb=short --ignore=scripts/ --ignore=tests/e2e/"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for CI workflows that might run e2e tests
find .github/workflows -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -20

Repository: nicofretti/DataGenFlow
Length of output: 97
🏁 Script executed:
#!/bin/bash
# Search for e2e test execution in CI files
rg -i "e2e|end.*to.*end|playwright|tests/e2e" --type yaml --type yml

Repository: nicofretti/DataGenFlow
Length of output: 93
🏁 Script executed:
#!/bin/bash
# Search for test-related scripts
rg -i "e2e|test.*all|integration" --type sh --type py scripts/ 2>/dev/null | head -30

Repository: nicofretti/DataGenFlow
Length of output: 249
🏁 Script executed:
#!/bin/bash
# Check the current state of pyproject.toml at line 85
sed -n '80,90p' pyproject.toml

Repository: nicofretti/DataGenFlow
Length of output: 433
🏁 Script executed:
#!/bin/bash
# Read the pre-merge CI workflow
cat .github/workflows/pre-merge.yml

Repository: nicofretti/DataGenFlow
Length of output: 1214
🏁 Script executed:
#!/bin/bash
# Search for pytest invocations that might override or run e2e tests specifically
rg "pytest|test" --type yaml . | grep -i "e2e\|override\|pytest.*tests"

Repository: nicofretti/DataGenFlow
Length of output: 48
🏁 Script executed:
#!/bin/bash
# Check if with_server.py is called by any CI or test script
rg "with_server|test_mock_providers" --type yaml --type py

Repository: nicofretti/DataGenFlow
Length of output: 401
🏁 Script executed:
#!/bin/bash
# Find and read Makefile
find . -maxdepth 2 -name "Makefile" -o -name "makefile" | head -5

Repository: nicofretti/DataGenFlow
Length of output: 76
🏁 Script executed:
#!/bin/bash
# Check Makefile content - specifically the test target
rg "^test:" -A 10 Makefile

Repository: nicofretti/DataGenFlow
Length of output: 228
🏁 Script executed:
#!/bin/bash
# Look for all test-related make targets
rg "^[a-z-]+.*test" Makefile

Repository: nicofretti/DataGenFlow
Length of output: 116
E2E tests are skipped in CI — add make test-e2e to pre-merge checks.
The pytest addopts now ignores tests/e2e/, and the CI workflow only runs make test (which uses those options). Although make test-e2e exists and calls ./tests/e2e/run_all_tests.sh, it is not invoked by the pre-merge CI job. This means e2e test failures won't be caught on PRs.
Add test-e2e to the pre-merge target in the Makefile or call it separately in .github/workflows/pre-merge.yml.
🤖 Prompt for AI Agents
In `@pyproject.toml` at line 85, The CI is skipping E2E tests because
pyproject.toml's addopts ignores tests/e2e and the pre-merge CI job only runs
the Makefile's test target; update CI to run E2E by adding the existing Makefile
target test-e2e (which calls ./tests/e2e/run_all_tests.sh) to the pre-merge
workflow or add an explicit step in .github/workflows/pre-merge.yml to run `make
test-e2e` so E2E coverage is executed on PRs.
    follow the webapp-testing skill patterns:
Fix minor terminology in “Writing New Tests”.
Consider “web app testing” (or “web-app testing”) instead of “webapp-testing.”
Suggested edit
-follow the webapp-testing skill patterns:
+follow the web app testing skill patterns:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    follow the web app testing skill patterns:
🧰 Tools
🪛 LanguageTool
[grammar] ~204-~204: Ensure spelling is correct
Context: ...s ``` ## Writing New Tests follow the webapp-testing skill patterns: 1. **wait for n...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
In `@tests/e2e/README.md` around lines 204 - 205, In the "Writing New Tests"
section replace the term "webapp-testing" with the preferred phrase "web app
testing" (or "web-app testing") to correct terminology; locate occurrences of
the exact token "webapp-testing" in the README (e.g., the line under the
"Writing New Tests" heading) and update them to "web app testing" consistently
throughout the document.
    pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
    assert pipeline_id > 0
    pipeline = Pipeline("test_sampler", pipeline_def["blocks"])
Double JSON serialization bug.
save_pipeline internally calls json.dumps(definition) (see lib/storage.py:459-469). Passing json.dumps(pipeline_def) here will double-serialize, storing escaped JSON strings instead of proper JSON objects.
🐛 Fix: Pass dict directly
- pipeline_id = await storage.save_pipeline("test_sampler", json.dumps(pipeline_def))
+ pipeline_id = await storage.save_pipeline("test_sampler", pipeline_def)

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    pipeline_id = await storage.save_pipeline("test_sampler", pipeline_def)
    assert pipeline_id > 0
    pipeline = Pipeline("test_sampler", pipeline_def["blocks"])
🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 192 - 194, The test
is double-serializing pipeline_def because it calls storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the dict directly
(pipeline_def) so save_pipeline can perform the single json.dumps internally;
update the line that constructs the Pipeline instance remains as
Pipeline("test_sampler", pipeline_def["blocks"]) and ensure you only remove the
extra json.dumps wrapper around pipeline_def when invoking save_pipeline.
    pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
    assert pipeline_id > 0
    pipeline = Pipeline("test_no_embedding", pipeline_def["blocks"])
Same double-serialization issue.
Apply the same fix here.
🐛 Fix
- pipeline_id = await storage.save_pipeline("test_no_embedding", json.dumps(pipeline_def))
+ pipeline_id = await storage.save_pipeline("test_no_embedding", pipeline_def)

🤖 Prompt for AI Agents
In `@tests/integration/test_data_augmentation.py` around lines 295 - 297, The test
currently double-serializes the pipeline by calling storage.save_pipeline with
json.dumps(pipeline_def); change the call to pass the raw dict (pipeline_def) to
storage.save_pipeline so it isn't serialized twice, keeping the subsequent
Pipeline("test_no_embedding", pipeline_def["blocks"]) as-is; update the use of
storage.save_pipeline(...) accordingly.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@CHANGELOG.md`:
- Around line 10-11: The phrasing "first configured model" in the "Default model
selection" changelog entry is ambiguous; update the sentence to explicitly
define the ordering rule (e.g., "the earliest created model" or "the first model
in the Settings UI list as displayed") and state which attribute is used
(creation timestamp or list position). Reference the changelog entry title
"Default model selection" and the `is_default` flag so readers know the fallback
deterministically uses that defined ordering.
In `@frontend/src/pages/Settings.tsx`:
- Around line 236-252: The model card Box (key={model.name}) is clickable but
not keyboard-accessible; make it focusable and operable by adding tabIndex={0}
(so it can receive focus), role="button" and an onKeyDown handler that calls
handleSetDefaultLlm(model.name) when Enter or Space is pressed (respecting the
isDefault guard so default cards remain inert). Also add appropriate ARIA state
attributes such as aria-pressed or aria-current (or aria-disabled when
isDefault) to convey selection to assistive tech; apply the same changes to the
other card block around lines 399-415.
- Around line 132-152: Update the llm/state-frontend.md documentation to
describe the new Settings flow: explain the ability to select default models and
the new API integration patterns used by the frontend (reference the handler
functions handleSetDefaultLlm and handleSetDefaultEmbedding and the llmConfigApi
methods setDefaultLLMModel and setDefaultEmbeddingModel), and document the state
update calls loadLlmModels and loadEmbeddingModels that refresh UI state after
API calls; include example UX steps for selecting defaults and note expected
success/error handling (toast notifications) so the docs match the updated
Settings.tsx behavior.
- Around line 132-151: In the catch blocks of handleSetDefaultLlm and
handleSetDefaultEmbedding, log the caught error to the console in addition to
showing the toast; capture the error variable (use the existing error) and call
console.error with a descriptive message and the error object (e.g., "Failed to
set default LLM model" / "Failed to set default embedding model") so that
llmConfigApi.setDefaultLLMModel and llmConfigApi.setDefaultEmbeddingModel
failures are recorded for debugging.
```markdown
### Added
- Default model selection: Users can designate a default LLM and embedding model via the Settings UI. Both model types support `is_default` flag with automatic fallback to the first configured model.
```
Clarify “first configured model” ordering.
“First configured” is ambiguous (creation time? list order? name sort?). Please specify the ordering criteria to avoid user confusion.
🤖 Prompt for AI Agents
In `@CHANGELOG.md` around lines 10 - 11, The phrasing "first configured model" in
the "Default model selection" changelog entry is ambiguous; update the sentence
to explicitly define the ordering rule (e.g., "the earliest created model" or
"the first model in the Settings UI list as displayed") and state which
attribute is used (creation timestamp or list position). Reference the changelog
entry title "Default model selection" and the `is_default` flag so readers know
the fallback deterministically uses that defined ordering.
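One way to make that ordering unambiguous is to pin the fallback to a single query in the storage layer. A minimal sketch, assuming an aiosqlite connection, an integer `is_default` column, and a `created_at` column on `llm_models` (all assumptions; the real schema in lib/storage.py may differ):

```python
import aiosqlite

async def get_default_llm_model(db: aiosqlite.Connection):
    """Return the default model row, falling back to the earliest-created one.

    Table and column names here are illustrative assumptions, not the
    project's actual schema.
    """
    db.row_factory = aiosqlite.Row
    async with db.execute(
        "SELECT name, model, is_default FROM llm_models "
        "ORDER BY is_default DESC, created_at ASC "  # flagged default first, then oldest
        "LIMIT 1"
    ) as cursor:
        row = await cursor.fetchone()
    return dict(row) if row is not None else None
```

With a fallback pinned like this, the changelog entry could simply say "falls back to the earliest-created model".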
```ts
const handleSetDefaultLlm = async (name: string) => {
  try {
    await llmConfigApi.setDefaultLLMModel(name);
    toast.success("Default LLM model updated");
    loadLlmModels();
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    toast.error(`Failed to set default LLM model: ${message}`);
  }
};

const handleSetDefaultEmbedding = async (name: string) => {
  try {
    await llmConfigApi.setDefaultEmbeddingModel(name);
    toast.success("Default embedding model updated");
    loadEmbeddingModels();
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    toast.error(`Failed to set default embedding model: ${message}`);
  }
};
```
Update frontend state docs for the new Settings flow.
The Settings page now supports default-model selection and new API integration patterns. Please reflect this in llm/state-frontend.md.
Based on learnings:
- Applies to llm/state-frontend.md: Update llm/state-frontend.md when API integration patterns change.
- Applies to llm/state-frontend.md: Update llm/state-frontend.md when UI flow changes.
🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 152, Update the
llm/state-frontend.md documentation to describe the new Settings flow: explain
the ability to select default models and the new API integration patterns used
by the frontend (reference the handler functions handleSetDefaultLlm and
handleSetDefaultEmbedding and the llmConfigApi methods setDefaultLLMModel and
setDefaultEmbeddingModel), and document the state update calls loadLlmModels and
loadEmbeddingModels that refresh UI state after API calls; include example UX
steps for selecting defaults and note expected success/error handling (toast
notifications) so the docs match the updated Settings.tsx behavior.
```ts
const handleSetDefaultLlm = async (name: string) => {
  try {
    await llmConfigApi.setDefaultLLMModel(name);
    toast.success("Default LLM model updated");
    loadLlmModels();
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    toast.error(`Failed to set default LLM model: ${message}`);
  }
};

const handleSetDefaultEmbedding = async (name: string) => {
  try {
    await llmConfigApi.setDefaultEmbeddingModel(name);
    toast.success("Default embedding model updated");
    loadEmbeddingModels();
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    toast.error(`Failed to set default embedding model: ${message}`);
  }
```
Log default-setting failures to console.
The toast is user-facing, but the error isn’t logged, which makes debugging harder. Add a console.error alongside the toast.
Proposed fix
```diff
 const handleSetDefaultLlm = async (name: string) => {
   try {
     await llmConfigApi.setDefaultLLMModel(name);
     toast.success("Default LLM model updated");
     loadLlmModels();
   } catch (error) {
     const message = error instanceof Error ? error.message : "Unknown error";
+    console.error("Failed to set default LLM model:", error);
     toast.error(`Failed to set default LLM model: ${message}`);
   }
 };

 const handleSetDefaultEmbedding = async (name: string) => {
   try {
     await llmConfigApi.setDefaultEmbeddingModel(name);
     toast.success("Default embedding model updated");
     loadEmbeddingModels();
   } catch (error) {
     const message = error instanceof Error ? error.message : "Unknown error";
+    console.error("Failed to set default embedding model:", error);
     toast.error(`Failed to set default embedding model: ${message}`);
   }
 };
```

As per coding guidelines: Log errors to console in frontend - ensure errors are logged for debugging.
🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 132 - 151, In the catch blocks
of handleSetDefaultLlm and handleSetDefaultEmbedding, log the caught error to
the console in addition to showing the toast; capture the error variable (use
the existing error) and call console.error with a descriptive message and the
error object (e.g., "Failed to set default LLM model" / "Failed to set default
embedding model") so that llmConfigApi.setDefaultLLMModel and
llmConfigApi.setDefaultEmbeddingModel failures are recorded for debugging.
```tsx
<Box
  key={model.name}
  onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
  sx={{
    p: 3,
    border: "1px solid",
    borderColor: isDefault ? "success.emphasis" : "border.default",
    borderRadius: 2,
    bg: isDefault ? "success.subtle" : "canvas.subtle",
    cursor: isDefault ? "default" : "pointer",
    transition: "all 0.2s",
    "&:hover": {
      borderColor: isDefault ? "success.emphasis" : "accent.emphasis",
      transform: isDefault ? "none" : "translateY(-2px)",
      boxShadow: isDefault ? "none" : "shadow.medium",
    },
  }}
```
Make default-selection cards keyboard accessible.
The cards are clickable but not focusable or operable via keyboard, which blocks non-mouse users from setting defaults.
Proposed fix
```diff
 <Box
   key={model.name}
   onClick={() => !isDefault && handleSetDefaultLlm(model.name)}
+  role="button"
+  tabIndex={isDefault ? -1 : 0}
+  aria-pressed={isDefault}
+  onKeyDown={(e: React.KeyboardEvent) => {
+    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+      e.preventDefault();
+      handleSetDefaultLlm(model.name);
+    }
+  }}
   sx={{
     p: 3,
     border: "1px solid",
     borderColor: isDefault ? "success.emphasis" : "border.default",
```

```diff
 <Box
   key={model.name}
   onClick={() => !isDefault && handleSetDefaultEmbedding(model.name)}
+  role="button"
+  tabIndex={isDefault ? -1 : 0}
+  aria-pressed={isDefault}
+  onKeyDown={(e: React.KeyboardEvent) => {
+    if (!isDefault && (e.key === "Enter" || e.key === " ")) {
+      e.preventDefault();
+      handleSetDefaultEmbedding(model.name);
+    }
+  }}
   sx={{
     p: 3,
     border: "1px solid",
     borderColor: isDefault ? "success.emphasis" : "border.default",
```

Also applies to: 399-415
🤖 Prompt for AI Agents
In `@frontend/src/pages/Settings.tsx` around lines 236 - 252, The model card Box
(key={model.name}) is clickable but not keyboard-accessible; make it focusable
and operable by adding tabIndex={0} (so it can receive focus), role="button" and
an onKeyDown handler that calls handleSetDefaultLlm(model.name) when Enter or
Space is pressed (respecting the isDefault guard so default cards remain inert).
Also add appropriate ARIA state attributes such as aria-pressed or aria-current
(or aria-disabled when isDefault) to convey selection to assistive tech; apply
the same changes to the other card block around lines 399-415.
🚀 Feat: Model selection + e2e tests
Description
Related Issue
Checklist
- make format passes
- make pre-merge passes
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.