🚀 Feat: template conversational data augmentation + e2e tests + CodeRabbit#57
nicofretti merged 31 commits into develop from
Conversation
Walkthrough

Adds a data-augmentation feature set (StructureSampler, SemanticInfiller, DuplicateRemover), JSON-or-template config support across blocks, frontend editor improvements (json-or-template toggle and model dropdowns), API/pipeline output changes, template utilities, E2E Playwright test infra + tests, various docs/skills, CI/build updates, and helper scripts for tests and PR comment inspection.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant BlockConfigPanel as BlockConfigPanel.tsx
participant API as /api/llm-models\n/api/embedding-models
participant Monaco as Monaco Editor
Client->>BlockConfigPanel: Select block node
BlockConfigPanel->>BlockConfigPanel: Reset jsonMode state
BlockConfigPanel->>API: GET models
API-->>BlockConfigPanel: model lists
BlockConfigPanel->>BlockConfigPanel: Store model names
Client->>BlockConfigPanel: Toggle json-or-template
alt JSON Mode
BlockConfigPanel->>Monaco: Init language=json
Client->>Monaco: Edit JSON
Monaco-->>BlockConfigPanel: JSON content
BlockConfigPanel->>BlockConfigPanel: Validate JSON on save
else Template Mode
BlockConfigPanel->>Monaco: Init language=jinja2
Client->>Monaco: Edit template
Monaco-->>BlockConfigPanel: Template content
end
Client->>BlockConfigPanel: Select model / Save
BlockConfigPanel->>BlockConfigPanel: Preserve custom model values
BlockConfigPanel->>API: Persist config
```

```mermaid
sequenceDiagram
participant Pipeline
participant StructureSampler as StructureSampler
participant SemanticInfiller as SemanticInfiller
participant DuplicateRemover as DuplicateRemover
participant LLM
participant Embedding as Embedding Model
Pipeline->>StructureSampler: execute(seed_data)
StructureSampler->>StructureSampler: Analyze distributions\n(topological dependencies)
StructureSampler-->>Pipeline: {skeletons, _seed_samples}
Pipeline->>SemanticInfiller: execute(skeletons)
SemanticInfiller->>LLM: Generate fields_to_generate
LLM-->>SemanticInfiller: JSON output
alt Diversity enabled
SemanticInfiller->>Embedding: Embed existing samples
SemanticInfiller->>Embedding: Embed candidate sample
Embedding-->>SemanticInfiller: embeddings
SemanticInfiller->>SemanticInfiller: Cosine similarity check
alt Too similar
SemanticInfiller->>LLM: Retry with negative exemplars/raised temp
end
end
SemanticInfiller-->>Pipeline: {samples, _usage}
Pipeline->>DuplicateRemover: execute(samples)
DuplicateRemover->>Embedding: Embed seed samples (cached per trace)
DuplicateRemover->>Embedding: Embed generated samples
Embedding-->>DuplicateRemover: embeddings
DuplicateRemover->>DuplicateRemover: Compute similarities\nflag is_duplicate
DuplicateRemover-->>Pipeline: {generated_samples, similarity metrics}
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 5
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
frontend/src/pages/Generator.tsx (1)
355-363: Static description text doesn't reflect markdown mode.

The description still says "Upload a JSON seed file" but the component now supports markdown files too. Consider making this dynamic based on `needsMarkdown`.

Proposed fix

```diff
 <Text sx={{ color: "fg.default" }}>
-  Upload a JSON seed file with input data. Each seed will be executed through your pipeline
-  multiple times based on repetitions.
+  {needsMarkdown
+    ? "Upload a Markdown file with your content. The file will be processed through your pipeline."
+    : "Upload a JSON seed file with input data. Each seed will be executed through your pipeline multiple times based on repetitions."}
 </Text>
```

Makefile (1)
1-1: Add new targets to the `.PHONY` declaration.

The new `test-e2e` and `test-e2e-ui` targets should be declared as phony to prevent conflicts with potential files of the same name.

Proposed fix

```diff
-.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration pre-merge setup
+.PHONY: check-deps install dev dev-ui dev-backend run-dev build-ui run mock-llm clean lint format lint-frontend format-frontend format-all lint-all typecheck typecheck-frontend typecheck-all test test-integration test-e2e test-e2e-ui pre-merge setup
```

llm/state-project.md (1)
55-60: Directory structure omits tests/e2e.

Lines 55-60 list test directories, but the PR adds e2e tests; please include `tests/e2e/` (or mention it in the testing section). As per coding guidelines, docs should reflect the actual codebase.

lib/blocks/config.py (1)
41-67: Document the `_config_formats` metadata pattern in block configuration docs.

The `_config_formats` attribute is extracted in schema generation (line 49) and applied to JSON schema format hints (lines 35-36), but is not documented in llm/state-backend.md or llm/state-project.md. Update the BaseBlock interface sections to include this pattern alongside `_config_enums`, `_field_references`, and `_config_descriptions`.

lib/blocks/builtin/ragas_metrics.py (1)
295-310: Update RAGAS dependency constraint to match the implemented API version.

The code uses the RAGAS 0.4.x API (`ascore(**kwargs)` returning a `MetricResult` object with a `.value` attribute, parameter names like `reference`), but pyproject.toml allows `ragas>=0.2.0`. RAGAS 0.2.x has an incompatible API (returns a float directly, uses `ground_truths` instead of `reference`, different import paths). Update the constraint to `ragas>=0.4.0` to prevent runtime failures.
🤖 Fix all issues with AI agents
In @.coderabbit.yaml:
- Around line 13-19: The auto_review block uses the invalid key 'branches' which
the schema ignores; replace the 'branches' field with the schema-expected
'base_branches' (keeping values/regex patterns) so auto_incremental_review
targets the intended branches—update the 'auto_review' mapping in
.coderabbit.yaml by renaming 'branches' to 'base_branches' while preserving
other keys like 'enabled' and 'auto_incremental_review'.
In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 254-288: The model dropdowns in BlockConfigPanel.tsx (the "model"
and "embedding_model" branches) can drop saved custom model names that aren't
returned by llmModels/embeddingModels; update the rendering so if value is
non-empty and not present in the respective model array you add a Select.Option
for that exact value (e.g., a disabled or clearly labeled "Custom: {value}"
option or a normal option) before mapping the API models, so the current value
remains selectable and handleChange will preserve it on save; keep the existing
empty/default option and the onChange handler (handleChange) unchanged.
In `@lib/blocks/builtin/structured_generator.py`:
- Around line 113-120: The except block catching json.JSONDecodeError should
re-raise BlockExecutionError using exception chaining to preserve the original
traceback; update the raise in the except json.JSONDecodeError as e handler so
it raises BlockExecutionError(...) from e, keeping the same message and detail
payload (including self.json_schema_template and schema_rendered).
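For reference, a minimal sketch of the chaining pattern this asks for; the `BlockExecutionError` signature here is a simplified stand-in, not the project's actual class:

```python
import json


class BlockExecutionError(Exception):
    """simplified stand-in for the project's error type (hypothetical signature)"""

    def __init__(self, message: str, detail: dict | None = None):
        super().__init__(message)
        self.detail = detail or {}


def parse_rendered_schema(schema_rendered: str) -> dict:
    try:
        return json.loads(schema_rendered)
    except json.JSONDecodeError as e:
        # `raise ... from e` keeps the original JSONDecodeError attached
        # as __cause__, so the full traceback survives for debugging
        raise BlockExecutionError(
            f"json_schema must be valid JSON: {e}",
            detail={"rendered": schema_rendered},
        ) from e
```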
In `@pyproject.toml`:
- Around line 97-100: Consolidate dev dependencies by removing the duplicate
[project.optional-dependencies].dev or [dependency-groups].dev section and
migrating all dev packages into a single dev group using the preferred style
(use [dependency-groups].dev if adopting PEP 735); locate entries under
[project.optional-dependencies].dev and [dependency-groups].dev, combine their
package strings into the chosen section (ensuring versions like
"playwright>=1.57.0" are preserved) and delete the other section so only one dev
dependency block remains.
In `@scripts/with_server.py`:
- Around line 35-41: The subprocess.Popen invocation that starts the
long-running server (the proc created from cmd with preexec_fn=os.setsid)
currently pipes stdout/stderr which can deadlock when buffers fill; change the
Popen call to not use subprocess.PIPE for stdout and stderr (redirect them to
subprocess.DEVNULL or leave as inherited/None) so output is not buffered into a
fixed-size pipe while still keeping the process group cleanup (os.setsid /
os.killpg) logic intact.
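A minimal sketch of the suggested change, assuming a hypothetical server command; the key points are `DEVNULL` instead of `PIPE` and keeping the process-group cleanup:

```python
import os
import signal
import subprocess

# start the server in its own process group; sending output to DEVNULL means
# no fixed-size pipe buffer can fill up and block the child process
proc = subprocess.Popen(
    ["uvicorn", "app:app", "--port", "8000"],  # hypothetical command
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    preexec_fn=os.setsid,
)

# ... run tests against the server ...

# terminate the whole process group, matching the os.setsid/os.killpg cleanup
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
```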
🟡 Minor comments (21)
.gitignore-22-23 (1)
22-23: Unignore contents of `.claude/skills/` explicitly.

Without the `/**` rule, files under the directory may remain ignored.

Suggested fix

```diff
 .claude/*
 !.claude/skills/
+!.claude/skills/**
```

.github/pull_request_template.md-26-26 (1)
26-26: Clarify checklist wording.
Current phrasing is awkward; consider tightening it for clarity.

Suggested wording

```diff
-- [ ] Ask CodeRabbit review addressed (comment `@coderabbitai review`)
+- [ ] CodeRabbit review requested and addressed (comment `@coderabbitai review`)
```

tests/e2e/README.md-270-270 (1)
270-270: Capitalize "GitHub" properly.

"github" should be "GitHub" (official spelling).
Proposed fix
```diff
-example github actions workflow:
+example GitHub Actions workflow:
```

tests/e2e/README.md-27-35 (1)
27-35: Add language specifier to fenced code block.

The code block is missing a language identifier. Add `bash` for syntax highlighting consistency.

Proposed fix

````diff
 make sure both backend and frontend can start:
-```
+```bash
 # test backend (port 8000)
 uv run uvicorn app:app --reload --host 0.0.0.0 --port 8000
````

frontend/src/components/pipeline-editor/BlockConfigPanel.tsx-393-465 (1)
393-465: Document the new json-or-template editor behavior.

This introduces a new editing mode + model selector UX. Please update llm/state-frontend.md accordingly. As per coding guidelines, document the new UI patterns.

scripts/inspect_db_configs.py-16-51 (1)
16-51: Ensure DB connections close on error paths.
If any query raises, the current cleanup is skipped. Wrap the body in `try/finally` to always `await storage.close()`.

💡 Suggested fix

```diff
 async def main():
     storage = Storage("data/qa_records.db")
-    await storage.init_db()
-
-    # get LLM models
-    print("=== LLM Models ===")
-    llm_models = []
-
-    async def get_llm_models(db):
-        db.row_factory = aiosqlite.Row
-        cursor = await db.execute("SELECT * FROM llm_models")
-        return await cursor.fetchall()
-
-    llm_rows = await storage._execute_with_connection(get_llm_models)
-    for row in llm_rows:
-        model_dict = {key: row[key] for key in row.keys()}
-        print(model_dict)
-        llm_models.append(model_dict)
-
-    print("\n=== Embedding Models ===")
-    embedding_models = []
-
-    async def get_embedding_models(db):
-        db.row_factory = aiosqlite.Row
-        cursor = await db.execute("SELECT * FROM embedding_models")
-        return await cursor.fetchall()
-
-    emb_rows = await storage._execute_with_connection(get_embedding_models)
-    for row in emb_rows:
-        model_dict = {key: row[key] for key in row.keys()}
-        print(model_dict)
-        embedding_models.append(model_dict)
-
-    await storage.close()
-
-    return llm_models, embedding_models
+    try:
+        await storage.init_db()
+
+        # get LLM models
+        print("=== LLM Models ===")
+        llm_models = []
+
+        async def get_llm_models(db):
+            db.row_factory = aiosqlite.Row
+            cursor = await db.execute("SELECT * FROM llm_models")
+            return await cursor.fetchall()
+
+        llm_rows = await storage._execute_with_connection(get_llm_models)
+        for row in llm_rows:
+            model_dict = {key: row[key] for key in row.keys()}
+            print(model_dict)
+            llm_models.append(model_dict)
+
+        print("\n=== Embedding Models ===")
+        embedding_models = []
+
+        async def get_embedding_models(db):
+            db.row_factory = aiosqlite.Row
+            cursor = await db.execute("SELECT * FROM embedding_models")
+            return await cursor.fetchall()
+
+        emb_rows = await storage._execute_with_connection(get_embedding_models)
+        for row in emb_rows:
+            model_dict = {key: row[key] for key in row.keys()}
+            print(model_dict)
+            embedding_models.append(model_dict)
+
+        return llm_models, embedding_models
+    finally:
+        await storage.close()
```

tests/e2e/test_helpers.py-57-65 (1)
57-65: Log errors when pipeline count fails.
Silent failures make flaky E2E diagnostics harder. The `cleanup_database()` function in the same file already follows this pattern—catch exceptions and print diagnostics instead of silently passing.

```diff
 def get_pipeline_count():
     """get number of pipelines in database"""
     try:
         response = httpx.get("http://localhost:8000/api/pipelines", timeout=10.0)
         if response.status_code == 200:
             return len(response.json())
-    except Exception:
-        pass
+    except Exception as e:
+        print(f"get_pipeline_count warning: {e}")
     return -1
```

tests/e2e/test_helpers.py-17-37 (1)
17-37: Surface HTTP errors from cleanup operations instead of silently ignoring them.
Without checking response status codes on API calls, errors go undetected and tests can run against dirty database state. Currently, line 23 ignores the delete response entirely, and lines 27–32 skip pipeline cleanup if the list API doesn't return 200 but don't raise an error. Raise an exception on non-2xx responses so the test suite fails loudly if cleanup fails.

💡 Suggested fix

```diff
 def cleanup_database():
     """delete all pipelines, jobs, and records from the database"""
     base_url = "http://localhost:8000"
     try:
         # delete all records
-        httpx.delete(f"{base_url}/api/records", timeout=10.0)
+        resp = httpx.delete(f"{base_url}/api/records", timeout=10.0)
+        if resp.status_code >= 400:
+            raise RuntimeError(f"failed to delete records: {resp.status_code}")

         # get all pipelines
         response = httpx.get(f"{base_url}/api/pipelines", timeout=10.0)
-        if response.status_code == 200:
-            pipelines = response.json()
+        if response.status_code >= 400:
+            raise RuntimeError(f"failed to list pipelines: {response.status_code}")
+        pipelines = response.json()

-            # delete each pipeline
-            for pipeline in pipelines:
-                httpx.delete(f"{base_url}/api/pipelines/{pipeline['id']}", timeout=10.0)
+        # delete each pipeline
+        for pipeline in pipelines:
+            resp = httpx.delete(f"{base_url}/api/pipelines/{pipeline['id']}", timeout=10.0)
+            if resp.status_code >= 400:
+                raise RuntimeError(f"failed to delete pipeline {pipeline['id']}: {resp.status_code}")

         time.sleep(0.5)  # wait for cleanup to complete
     except Exception as e:
-        print(f"cleanup warning: {e}")
+        raise RuntimeError(f"cleanup failed: {e}") from e
```

tests/e2e/test_review_e2e.py-23-25 (1)
23-25: Locator result unused - no actual verification.

The `.or_()` chain returns a locator, but the result is never used or asserted. This test doesn't actually verify the page loaded correctly.

Suggested fix

```diff
 # verify we're on review page
-# look for job selector or records section
-page.get_by_text("Select Job", exact=False).or_(page.get_by_text("Records", exact=False))
+job_or_records = page.get_by_text("Select Job", exact=False).or_(page.get_by_text("Records", exact=False))
+assert job_or_records.count() > 0, "Review page should show job selector or records section"
```

tests/e2e/test_review_e2e.py-180-184 (1)
180-184: Another unused locator - verification incomplete.

Same issue: the locator chain returns a result, but no assertion verifies the trace content is visible.

Suggested fix

```diff
 # verify trace content is visible
-# look for block type, execution time, or trace data
-page.get_by_text("block_type", exact=False).or_(
-    page.get_by_text("execution_time", exact=False)
-)
+trace_content = page.get_by_text("block_type", exact=False).or_(
+    page.get_by_text("execution_time", exact=False)
+)
+assert trace_content.count() > 0, "Trace should show block_type or execution_time"
```

tests/blocks/test_field_mapper.py-16-28 (1)
16-28: Align test names with the required pattern.

Names like line 17's `test_init_with_mappings` don't follow `test_<method>_<scenario>_<expected>`. Please align across this file. As per coding guidelines, keep test naming consistent.

tests/blocks/commons/test_template_utils.py-15-46 (1)
15-46: Test names don't follow the required pattern.

Line 16 (`test_render_and_parse_list`) and similar names don't follow the required `test_<method>_<scenario>_<expected>` convention. Please align across this file (e.g., `test_render_and_parse_json_list_success`). As per coding guidelines, please keep test naming consistent.

tests/e2e/test_generator_e2e.py-176-178 (1)
176-178: Locators created but never asserted.

Lines 176-178 and 201-205 create locators without any assertions. These are effectively no-ops.

```diff
-    # verify job progress appears
-    # look for progress indicators
-    page.get_by_text("Progress", exact=False).or_(
-        page.get_by_text("Generated", exact=False)
-    )
+    # verify job progress appears
+    progress_indicator = page.get_by_text("Progress", exact=False).or_(
+        page.get_by_text("Generated", exact=False)
+    )
+    expect(progress_indicator).to_be_visible(timeout=5000)
```

Also applies to: 201-205
tests/integration/test_data_augmentation_pipeline.py-148-169 (1)
148-169: Missing `@pytest.mark.integration` marker.

The first test has `@pytest.mark.integration`, but this test doesn't. For consistent test filtering in CI, add the marker.

```diff
+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_data_augmentation_pipeline_missing_fields_error():
```

tests/e2e/test_generator_e2e.py-49-70 (1)
49-70: Test passes silently when no selectors found.

If `len(selectors) > 0` is false, the test completes without any assertions. This hides failures when the UI structure changes.

Add assertion or skip

```diff
 selectors = page.locator('select, [role="combobox"]').all()
-if len(selectors) > 0:
+assert len(selectors) > 0, "No pipeline selector found on page"
+if len(selectors) > 0:
```

Or use `pytest.skip()` if the element is optional.

tests/integration/test_data_augmentation.py-81-83 (1)
81-83: Remove double JSON encoding in the `save_pipeline` call.

`save_pipeline` expects a `dict[str, Any]` and internally calls `json.dumps(definition)`. Passing `json.dumps(pipeline_def)` causes the string to be encoded again, storing escaped JSON in the database.

Pass dict directly

```diff
 # save pipeline to database
-pipeline_id = await storage.save_pipeline("test_augmentation", json.dumps(pipeline_def))
+pipeline_id = await storage.save_pipeline("test_augmentation", pipeline_def)
 assert pipeline_id > 0
```

tests/e2e/test_pipelines_e2e.py-38-69 (1)
38-69: Test lacks assertions - only prints debug info.
`test_view_templates` doesn't assert anything meaningful. The test will always pass regardless of whether templates are displayed.

🔧 Add assertion

```diff
 # if templates exist, there should be use template buttons
 # otherwise page should at least load without error
 print(f"found {use_template_buttons.count()} template buttons")
+
+# At minimum, verify page loaded without error
+expect(page.get_by_text("Pipelines", exact=True)).to_be_visible()

 browser.close()
```

tests/e2e/test_pipelines_e2e.py-114-172 (1)
114-172: Overly broad locator for delete buttons may match unintended elements.

Line 142's `page.locator("button:has(svg)")` matches any button with an SVG icon, not specifically delete buttons. This could lead to clicking the wrong button.

🔧 Tighten the selector

```diff
 delete_buttons = (
     page.get_by_role("button")
     .filter(has_text="Delete")
     .or_(page.locator('button[aria-label*="Delete"]'))
-    .or_(page.locator("button:has(svg)"))
+    .or_(page.locator('button[aria-label*="delete" i]'))
 )
```

tests/e2e/test_pipelines_e2e.py-71-112 (1)
71-112: Test silently passes when no templates exist.

If `create_buttons.count() == 0`, the test prints a message and passes without testing anything. Consider using `pytest.skip()` or making template availability a precondition.

🔧 Suggested fix

```diff
 if create_buttons.count() > 0:
     # ... existing logic ...
 else:
-    print("no template buttons found, skipping test")
+    pytest.skip("no template buttons found - templates may not be loaded")

 browser.close()
```

lib/blocks/builtin/semantic_infiller.py-228-235 (1)
228-235: Add a length check or `strict=True` to prevent silent truncation.

If `vec1` and `vec2` have different lengths, `zip` silently truncates, producing incorrect similarity scores. Embedding dimensions should match, but defensive coding is warranted.

Proposed fix

```diff
 def _cosine_similarity(self, vec1: list[float], vec2: list[float]) -> float:
     """compute cosine similarity between two vectors"""
+    if len(vec1) != len(vec2):
+        logger.warning(f"Vector length mismatch: {len(vec1)} vs {len(vec2)}")
+        return 0.0
     dot = sum(a * b for a, b in zip(vec1, vec2))
     norm1 = sum(a * a for a in vec1) ** 0.5
     norm2 = sum(b * b for b in vec2) ** 0.5
```

lib/blocks/builtin/structure_sampler.py-370-373 (1)
370-373: The instance attribute assignment pattern is safe in the current sequential execution model, but represents poor design practice that creates technical debt.

The pipeline executes blocks sequentially with `await block.execute(context)` in a for loop, which guarantees no concurrent access to the same block instance. However, mutating instance state during `execute()` violates idempotency and creates fragility if the execution model ever changes to support concurrent block execution.

Pass `categorical_fields`, `numeric_fields`, and `dependencies` as method parameters, or store them in the execution context instead of instance attributes, as in the sketch below. This makes the method stateless and maintains single responsibility.
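A hypothetical sketch of the stateless shape; the method and helper names are illustrative, not the block's actual API:

```python
async def execute(self, context):
    # analyze distributions into locals instead of stashing results on self,
    # so repeated or concurrent calls cannot observe stale instance state
    categorical_fields, numeric_fields, dependencies = self._analyze_distributions(
        context.accumulated_state["samples"]
    )
    skeletons = self._sample_skeletons(
        categorical_fields=categorical_fields,
        numeric_fields=numeric_fields,
        dependencies=dependencies,
    )
    return {"skeletons": skeletons}
```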
🧹 Nitpick comments (39)
.claude/skills/implementing-datagenflow-blocks/SKILL.md (1)
144-151: Minor: `format` shadows a Python builtin.

The example uses `format` as a parameter name, which shadows the built-in `format()` function. While functional, consider noting this or using `output_format` in the example to avoid confusion.

tests/e2e/README.md (1)
281-281: Update to actions/checkout@v6 (current latest version).

`actions/checkout@v3` is outdated. Version 6 is the current recommended release.

tests/e2e/test_helpers.py (1)
40-54: Validate URL scheme in server wait helper.
Restrict to http/https to avoid accidental `file:` or other schemes.

💡 Suggested fix

```diff
 def wait_for_server(url: str = "http://localhost:8000/health", timeout: int = 30):
     """wait for server to be ready"""
     import urllib.error
     import urllib.request
+    from urllib.parse import urlparse
+
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        raise ValueError("wait_for_server only supports http/https URLs")
```

frontend/src/pages/Generator.tsx (1)
43-44: Consider removing or documenting the relationship between `isMultiplierPipeline` and `needsMarkdown`.

Both states are set from the same API response but serve slightly different purposes. If `isMultiplierPipeline` is no longer used elsewhere, consider removing it to reduce state complexity.

```bash
#!/bin/bash
# Check if isMultiplierPipeline is used elsewhere in the codebase
rg -n "isMultiplierPipeline" frontend/src/ --type ts --type tsx
```

lib/blocks/builtin/structured_generator.py (1)
37-39: Add a `ClassVar` annotation for the mutable class attribute.

Per Ruff RUF012, mutable class attributes should be annotated with `typing.ClassVar` to make the intent explicit.

Proposed fix

```diff
+from typing import Any, ClassVar
-from typing import Any

-_config_formats = {
+_config_formats: ClassVar[dict[str, str]] = {
     "json_schema": "json-or-template",
 }
```

tests/e2e/run_all_tests.sh (2)
39-44: Consider adding a Playwright browser installation check.

The script checks if the `playwright` module is installed but doesn't verify the browser (chromium) is installed. A missing browser will cause cryptic failures.

Proposed enhancement

```diff
 # check if playwright is installed
 if ! uv run python -c "import playwright" 2>/dev/null; then
     echo "❌ Playwright not installed"
     echo "Install with: uv pip install playwright && uv run playwright install chromium"
     exit 1
 fi
+
+# check if chromium browser is installed
+if ! uv run python -c "from playwright.sync_api import sync_playwright; p = sync_playwright().start(); p.chromium.launch(headless=True).close(); p.stop()" 2>/dev/null; then
+    echo "❌ Chromium browser not installed"
+    echo "Install with: uv run playwright install chromium"
+    exit 1
+fi
```
53-76: Test suites restart servers for each suite.

Each test suite spawns fresh backend/frontend instances. This is safe but slow. If tests are independent, consider running all suites under a single server session.
tests/integration/conftest.py (2)
24-34: Hardcoded endpoint IP reduces portability.

The endpoint `http://172.20.160.1:11434` is hardcoded and will fail in different environments. Consider using an environment variable with a fallback:

Suggested fix

```diff
+import os
+
+OLLAMA_ENDPOINT = os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434")
+
 # add default LLM model (ollama gemma3:1b)
 llm_config = LLMModelConfig(
     name="default",
     provider=LLMProvider.OLLAMA,
-    endpoint="http://172.20.160.1:11434/v1/chat/completions",
+    endpoint=f"{OLLAMA_ENDPOINT}/v1/chat/completions",
     api_key="",
     model_name="gemma3:1b",
 )
 ...
 embedding_config = EmbeddingModelConfig(
     name="ollama-nomic",
     provider=LLMProvider.OLLAMA,
-    endpoint="http://172.20.160.1:11434/v1/embeddings",
+    endpoint=f"{OLLAMA_ENDPOINT}/v1/embeddings",
     api_key="",
     model_name="nomic-embed-text",
     dimensions=768,
 )
```
43-51: SQLite WAL files may persist after cleanup.

SQLite can create `.db-wal` and `.db-shm` files that won't be cleaned up. Also, move the `os` import to the top of the file.

Suggested fix

```diff
 # cleanup
 try:
     await storage.close()
 finally:
-    # ensure cleanup happens even if close() fails
-    import os
-
-    if os.path.exists("data/test_e2e_records.db"):
-        os.remove("data/test_e2e_records.db")
+    db_path = "data/test_e2e_records.db"
+    for suffix in ("", "-wal", "-shm"):
+        path = db_path + suffix
+        if os.path.exists(path):
+            os.remove(path)
```

.claude/skills/debugging-pipelines/SKILL.md (1)
90-97: Line number references will become stale.

Hardcoded line ranges like `lib/workflow.py:85-224` and `lib/workflow.py:305-449` will drift as code evolves. Consider referencing function/method names instead, or adding a note that line numbers are approximate.

Example: "Normal pipeline execution flow (see `execute()` method)" instead of specific line numbers.

tests/e2e/test_review_e2e.py (2)
41-41: Prefer Playwright wait methods over `time.sleep()`.

`time.sleep()` is brittle and slows tests unnecessarily. Use Playwright's built-in waiting:

```python
# Instead of: time.sleep(2)
page.wait_for_selector('select, [role="combobox"]', timeout=5000)
```

This pattern repeats throughout the file (lines 49, 56, 72, 80, 107, 115, 128, 140, 156, 164, 178, 200, 208, 220, 231, 247, 255).
282-313: Main block bypasses pytest.

Running tests via `if __name__ == "__main__"` bypasses pytest's test collection, fixtures, and reporting. This sequential execution also masks failures (all tests print "✓ passed" regardless of exceptions in previous tests).

Consider removing it or using pytest's CLI invocation instead:

```bash
pytest tests/e2e/test_review_e2e.py -v
```

docs/template_data_augmentation.md (1)
106-108: Blank line between blockquotes creates parsing issues.

Markdown linters flag this (MD028). Combine into a single blockquote or use different formatting.
Suggested fix
```diff
 > **Note:** `price` appears in both `numeric_fields` and `fields_to_generate`. This provides range hints to guide the LLM while letting it generate contextually appropriate prices.
-
-> **Tip:** Use 4-10 diverse samples for best results. More samples = better distribution learning.
+>
+> **Tip:** Use 4-10 diverse samples for best results. More samples = better distribution learning.
```

lib/blocks/builtin/validator.py (1)
1-8: Deduplicate forbidden_words parsing via template_utils.

Lines 43-69 re-implement JSON render/parse + string-list validation that already exists in `lib.blocks.commons.template_utils`. Reusing the helper centralizes error handling and keeps block logic slimmer.

♻️ Suggested refactor

```diff
-from lib.template_renderer import render_template
+from lib.blocks.commons.template_utils import render_and_parse_json, validate_string_list
@@
-        if self.forbidden_words_template:
-            words_rendered = render_template(
-                self.forbidden_words_template, context.accumulated_state
-            )
-            try:
-                words_list = json.loads(words_rendered)
-                if not isinstance(words_list, list):
-                    raise BlockExecutionError(
-                        "forbidden_words must be a JSON array",
-                        detail={"rendered_value": words_rendered},
-                    )
-                if not all(isinstance(w, str) for w in words_list):
-                    raise BlockExecutionError(
-                        "All items in forbidden_words must be strings",
-                        detail={"forbidden_words": words_list},
-                    )
-                forbidden_words = words_list
-            except json.JSONDecodeError as e:
-                raise BlockExecutionError(
-                    f"forbidden_words must be valid JSON: {str(e)}",
-                    detail={
-                        "template": self.forbidden_words_template,
-                        "rendered": words_rendered,
-                    },
-                )
+        if self.forbidden_words_template:
+            forbidden_words = render_and_parse_json(
+                self.forbidden_words_template,
+                context.accumulated_state,
+                "forbidden_words",
+                list,
+            )
+            validate_string_list(forbidden_words, "forbidden_words")
```

Also applies to: 43-69

tests/blocks/test_field_mapper.py (1)
tests/blocks/test_field_mapper.py (1)
44-66: Make xfail strict to avoid silent XPASS.For the known failures, consider
strict=Trueso a future fix fails the test and forces cleanup.🔧 Suggested adjustment
- `@pytest.mark.xfail`( + `@pytest.mark.xfail`( + strict=True, reason="tojson produces pretty-printed JSON with newlines causing parse error" ) @@ - `@pytest.mark.xfail`( + `@pytest.mark.xfail`( + strict=True, reason="tojson produces pretty-printed JSON with newlines causing parse error" ) @@ - `@pytest.mark.xfail`(reason="StrictUndefined raises error instead of returning empty string") + `@pytest.mark.xfail`( + strict=True, + reason="StrictUndefined raises error instead of returning empty string" + )lib/workflow.py (1)
380-393: Potential issue: filtering applied twice for saved results.
`filtered_result` is computed at line 381, then `_save_seed_result` is called at line 384, which internally calls `_filter_output_data` again on `context.accumulated_state`. This is redundant but not harmful, since both produce the same result.

Consider passing `filtered_result` directly, or removing the redundant filter call inside `_save_seed_result`, to avoid confusion:

Option: Pass filtered data directly to save

```diff
 if storage and pipeline_id > 0 and job_id > 0:
     await self._save_seed_result(
         initial_data,
-        context.accumulated_state,
-        initial_data_keys,
+        filtered_result,
         context.trace,
         pipeline_id,
         job_id,
```

This would require updating `_save_seed_result` to accept already-filtered data.

lib/blocks/builtin/field_mapper.py (2)
29-31: ClassVar annotation would silence the linter, but optional.

The static analysis flags mutable class attributes. Adding `typing.ClassVar` would be more explicit, but this is a common pattern in the codebase for block metadata.
65-72: Add exception chaining for better tracebacks.

When re-raising a different exception from a caught exception, use `from e` to preserve the original traceback. This aids debugging JSON parse failures.

Fix exception chaining

```diff
 except json.JSONDecodeError as e:
     raise BlockExecutionError(
-        f"mappings must be valid JSON: {str(e)}",
+        f"mappings must be valid JSON: {e!s}",
         detail={
             "template": self.mappings_template,
             "rendered": mappings_rendered,
         },
-    )
+    ) from e
```

tests/e2e/test_generator_e2e.py (3)
44-52: Prefer explicit waits over `time.sleep`.

Hardcoded sleeps make tests flaky and slow. Playwright provides `wait_for_selector` and `expect` with auto-waiting.

Replace sleep with explicit wait

```diff
 # find pipeline selector (dropdown or select)
 selectors = page.locator('select, [role="combobox"]').all()
 if len(selectors) > 0:
     # click first selector
     selectors[0].click()
-    time.sleep(0.5)
+    page.wait_for_timeout(500)  # or better: wait for dropdown to appear
```
239-273: Main block bypasses pytest test discovery.

Running tests via `if __name__ == "__main__"` means pytest won't collect them properly for CI. The `print("✓ passed")` after each call doesn't verify the test actually passed - exceptions would halt execution.

Consider using pytest markers and fixtures for proper test orchestration instead of manual sequencing.

As per coding guidelines, tests should follow the naming convention `test_<method>_<scenario>_<expected>` and use proper fixtures, as in the hypothetical examples below.
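For illustration, hypothetical names following that convention:

```python
# test_<method>_<scenario>_<expected> (names are illustrative only)
def test_upload_seed_file_valid_json_shows_preview(): ...
def test_start_generation_job_missing_pipeline_raises_error(): ...
```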
90-117: Seed file cleanup in a finally block for reliability.

If the test fails before `os.remove(seed_path)`, the file remains. Use a fixture or try/finally.

Ensure cleanup on failure

```diff
+    try:
         # go to generator page
         page.goto("http://localhost:5173")

         # ... rest of test ...
-
-        # cleanup
-        os.remove(seed_path)
-
-        browser.close()
+    finally:
+        # cleanup
+        if os.path.exists(seed_path):
+            os.remove(seed_path)
+        browser.close()
```

tests/integration/test_data_augmentation_pipeline.py (2)
28-32: Monkey patching works, but consider a pytest fixture.

Direct monkey patching with try/finally restoration is functional. A pytest fixture with `monkeypatch` would be cleaner and auto-restore.

Alternative using monkeypatch fixture

```python
@pytest.fixture
def patched_llm_config(e2e_storage, monkeypatch):
    manager = LLMConfigManager(e2e_storage)
    import app
    monkeypatch.setattr(app, "llm_config_manager", manager)
    return manager
```
165-169: Error assertion could be more specific.

Checking for "undefined" or "target_count" is loose. Consider a more precise error match.

```diff
 error_msg = str(exc_info.value)
-assert "undefined" in error_msg.lower() or "target_count" in error_msg
+assert "undefined" in error_msg.lower() or "target_count" in error_msg.lower(), (
+    f"Expected error about undefined/target_count, got: {error_msg}"
+)
```
43-119: Tests rely on internal state manipulation - acceptable for unit testing private methods.

Setting `block.categorical_fields`, `block.dependencies`, and `block.numeric_fields` directly couples tests to implementation details. This is fine for testing computation methods in isolation, but consider adding integration-style tests that go through `execute()` for these scenarios to ensure the full flow works.

tests/blocks/test_duplicate_remover.py (1)
180-214: Verify the cache test actually tests caching behavior.

The test creates two contexts with the same `trace_id` and checks `mock_embedding.call_count == 3`. However, both contexts are created with `make_context()`, which sets `trace_id="test-trace"`, and line 209 redundantly sets it again. The assertion verifies embedding is called 3 times (1 seed + 2 batches), which confirms seed embeddings aren't recomputed.

Consider adding a comment clarifying the expected call pattern.
lib/blocks/builtin/duplicate_remover.py (1)
43-57: Instance-level cache without eviction could cause memory growth.
`_embeddings_cache` stores embeddings keyed by `trace_id` but is never cleared. For long-running processes with many pipeline executions, this could accumulate significant memory.

Consider adding a cache size limit or using `weakref` for automatic cleanup when trace contexts are garbage collected.

💡 Suggested improvement

```python
from collections import OrderedDict

# In __init__:
self._embeddings_cache: OrderedDict[str, list[list[float]]] = OrderedDict()
self._max_cache_size = 100  # configurable

# In _get_seed_embeddings, after caching:
if len(self._embeddings_cache) > self._max_cache_size:
    self._embeddings_cache.popitem(last=False)  # evict oldest
```

tests/e2e/test_pipelines_e2e.py (2)
13-36: Replace `time.sleep(1)` with Playwright's built-in waiting.

Hard-coded sleeps are flaky. Use `page.wait_for_selector()` or `expect(...).to_be_visible()` instead.

♻️ Suggested fix

```diff
 pipelines_link.click()
 page.wait_for_load_state("networkidle")
-time.sleep(1)
+# Wait for content to be visible instead of arbitrary sleep
+page.wait_for_selector("text=Pipelines", state="visible")
```
174-216: Use a context manager or try/finally for browser cleanup.

If an assertion fails, `browser.close()` won't be called, potentially leaving browser processes orphaned.

♻️ Use fixture or context manager pattern

```python
def test_pipeline_editor_opens():
    """test that pipeline editor modal opens"""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=get_headless_mode())
        try:
            page = browser.new_page()
            # ... test logic ...
        finally:
            browser.close()
```

Or better, extract browser setup to a pytest fixture that handles cleanup automatically.
lib/blocks/builtin/json_validator.py (1)
53-81: Chain the exception for better debugging.

When re-raising `BlockExecutionError` from `JSONDecodeError`, use `from e` to preserve the original traceback.

♻️ Fix exception chaining

```diff
 except json.JSONDecodeError as e:
     raise BlockExecutionError(
-        f"required_fields must be valid JSON: {str(e)}",
+        f"required_fields must be valid JSON: {e}",
         detail={
             "template": self.required_fields_template,
             "rendered": fields_rendered,
         },
-    )
+    ) from e
```

lib/blocks/commons/template_utils.py (3)
26-35: Add exception chaining to preserve the traceback.

Missing `from e` loses the original JSON parse error context, making debugging harder.

Proposed fix

```diff
 try:
     parsed = json.loads(rendered)
 except json.JSONDecodeError as e:
     raise BlockExecutionError(
-        f"{field_name} must be valid JSON: {str(e)}",
+        f"{field_name} must be valid JSON: {e!s}",
         detail={
             "template": template,
             "rendered": rendered,
         },
-    )
+    ) from e
```
85-91: Greedy regex may over-match in edge cases.

The regex `\{.*\}` with `re.DOTALL` matches from the first `{` to the last `}` in the content. If the LLM returns JSON followed by text containing braces, this could grab extra content.

Consider non-greedy matching, though this may break legitimate nested JSON:

Alternative approach (trade-off)

```diff
-json_match = re.search(r"\{.*\}", content, re.DOTALL)
+# Non-greedy - may fail on deeply nested JSON but avoids over-matching
+json_match = re.search(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}", content, re.DOTALL)
```

The current approach is reasonable as a last-resort fallback since strategies 1 and 2 handle most cases.
107-122: Hardcoded metadata fields may drift out of sync.

This set is defined locally, but metadata fields are used across multiple blocks. Consider defining `METADATA_FIELDS` as a module-level constant or in a shared location to ensure consistency.

Proposed improvement

```diff
+# module-level constant for reuse
+METADATA_FIELDS = frozenset({
+    "samples",
+    "target_count",
+    "categorical_fields",
+    "numeric_fields",
+    "dependencies",
+    "comparison_fields",
+    "similarity_threshold",
+    "fields_to_generate",
+})
+
+
 def clean_metadata_fields(state: dict[str, Any]) -> dict[str, Any]:
     """
     remove pipeline metadata fields from state

     returns new dict without mutation
     """
-    metadata_fields = {
-        "samples",
-        "target_count",
-        "categorical_fields",
-        "numeric_fields",
-        "dependencies",
-        "comparison_fields",
-        "similarity_threshold",
-        "fields_to_generate",
-    }
-    return {key: value for key, value in state.items() if key not in metadata_fields}
+    return {key: value for key, value in state.items() if key not in METADATA_FIELDS}
```

lib/blocks/builtin/ragas_metrics.py (2)
74-78: Consider using `normalize_template_param` for consistency.

Other blocks use `normalize_template_param(metrics, list)` from template_utils.py. This inline implementation duplicates that logic.

Proposed refactor

```diff
+from lib.blocks.commons.template_utils import normalize_template_param

 # in __init__:
-if isinstance(metrics, list):
-    self.metrics_template = json.dumps(metrics)
-else:
-    self.metrics_template = metrics
+self.metrics_template = normalize_template_param(metrics, list)
```
101-108: Add exception chaining and consider using shared utilities.

- Missing `from e` loses traceback context.
- This parsing logic duplicates `render_and_parse_json` + `validate_string_list` from template_utils.py.

Proposed fix

```diff
+from lib.blocks.commons.template_utils import render_and_parse_json, validate_string_list

 # In execute(), replace lines 86-108 with:
-metrics_rendered = render_template(self.metrics_template, context.accumulated_state)
-try:
-    metrics_list = json.loads(metrics_rendered)
-    if not isinstance(metrics_list, list):
-        raise BlockExecutionError(
-            "metrics must be a JSON array",
-            detail={"rendered_value": metrics_rendered},
-        )
-    if not all(isinstance(m, str) for m in metrics_list):
-        raise BlockExecutionError(
-            "All items in metrics must be strings",
-            detail={"metrics": metrics_list},
-        )
-    metrics = metrics_list
-except json.JSONDecodeError as e:
-    raise BlockExecutionError(
-        f"metrics must be valid JSON: {str(e)}",
-        detail={
-            "template": self.metrics_template,
-            "rendered": metrics_rendered,
-        },
-    )
+metrics = render_and_parse_json(
+    self.metrics_template,
+    context.accumulated_state,
+    "metrics",
+    expected_type=list,
+)
+validate_string_list(metrics, "metrics")
```
319-326: Add exception chaining for better debugging.

Missing `from e` loses the original `ValueError` context.

Proposed fix

```diff
 except ValueError as e:
     raise BlockExecutionError(
-        f"target_count must be a valid integer: {str(e)}",
+        f"target_count must be a valid integer: {e!s}",
         detail={
             "template": self.target_count_template,
             "rendered": target_count_rendered,
         },
-    )
+    ) from e
```

lib/blocks/builtin/semantic_infiller.py (4)
172-182: Add exception chaining for LLM errors.

Missing `from e` loses the original exception context.

Proposed fix

```diff
 try:
     response = await litellm.acompletion(**llm_params)
 except Exception as e:
     raise BlockExecutionError(
-        f"LLM call failed: {str(e)}",
+        f"LLM call failed: {e!s}",
         detail={
             "skeleton": skeleton,
             "prompt_preview": prompt[:200],
             "error": str(e),
         },
-    )
+    ) from e
```
265-268: Unused loop variable `sim`.

The similarity score is unpacked but never used. Prefix it with `_` to indicate it is intentionally unused.

Proposed fix

```diff
-for sim, sample in similar_samples:
+for _sim, sample in similar_samples:
     fields_str = json.dumps({f: sample.get(f, "") for f in fields_to_generate})
     negative_lines.append(f"  - {fields_str}")
```
330-336: Add exception chaining for LLM retry errors.

Same issue as in `_process_skeleton` - missing `from e`.

Proposed fix

```diff
 try:
     response = await litellm.acompletion(**llm_params)
 except Exception as e:
     raise BlockExecutionError(
-        f"LLM call failed: {str(e)}",
+        f"LLM call failed: {e!s}",
         detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
-    )
+    ) from e
```
387-388: Misleading comment - this code path is reachable.

This return executes when all retries complete but similarity remains above the threshold. The comment suggests it's unreachable, but it's actually the fallback for exhausted retries.

Proposed fix

```diff
-# should not reach here, but return last result
+# fallback: return last result after exhausting retries
 return result, embedding
```
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 436-440: The Editor is using defaultLanguage which only applies on
mount, so toggling isJsonMode doesn't update highlighting; replace the
defaultLanguage prop with the dynamic language prop (e.g., language={isJsonMode
? "json" : "python"}) on the <Editor> (and remove any stale usage of
defaultLanguage or unnecessary key remount hacks), ensuring the Editor's
language prop is driven by isJsonMode so syntax highlighting updates when
toggled.
In `@llm/state-backend.md`:
- Around line 441-443: The docs for DuplicateRemover list an output named
similarity_score which is inconsistent with tests; update the DuplicateRemover
output specification to match the actual outputs used in tests by replacing
similarity_score with similarity_to_seeds and similarity_to_generated (or
explicitly list both fields) and ensure the outputs line reads: "outputs: *,
is_duplicate, similarity_to_seeds, similarity_to_generated" so the doc matches
the DuplicateRemover behavior and tests.
In `@llm/state-project.md`:
- Around line 117-121: The documentation for the validators lists
DuplicateRemover output as "similarity_score" but the implementation/tests
expect "similarity_to_seeds" and "similarity_to_generated"; update the docs in
the validators section (DuplicateRemover) to list the actual output fields
"is_duplicate", "similarity_to_seeds", and "similarity_to_generated" (or
alternatively change the DuplicateRemover implementation/tests to emit a single
"similarity_score" if you prefer that public contract) so the public contract
matches the tests and implementation; reference the DuplicateRemover symbol and
ensure the validators list reflects the exact output field names used by tests.
In `@tests/e2e/README.md`:
- Around line 264-266: Replace the direct instruction to delete the default DB
file ("remove test database: `rm data/qa_records.db`") with a safer alternative:
either point to a dedicated test DB path or add a clear warning that this
command only applies to throwaway/test-only databases (and not
production/default data); update the README example text around the "remove test
database" bullet so it references a test-specific file name (e.g.,
`data/qa_records_test.db`) or prepends a cautionary note explaining to back up
or confirm the file is test-only before running removal.
- Around line 111-123: The README.md's directory-tree fenced code block is
missing a language tag which breaks lint rules; update the opening fence to
include a language (e.g., change the triple backticks that start the tree block
to ```text) so the block becomes a valid "text" fenced code block, ensuring the
closing ``` remains unchanged; locate the fenced block in README.md that begins
with the directory listing (the tree block shown) and add the language tag to
the opening fence.
In `@tests/e2e/test_generator_e2e.py`:
- Around line 100-104: The test is calling selectors[0].select_option(index=1)
without ensuring a real pipeline option exists, which can throw if only a
placeholder option is present; after you build selectors =
page.locator("select").all(), guard by asserting selectors is non-empty and then
check options_count = selectors[0].locator("option").count() (or equivalent) and
only call selectors[0].select_option(index=1) if options_count > 1, otherwise
fail fast with a clear assertion/error message; apply the same guard to the
other occurrence where select_option(index=1) is used.
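A sketch of the guard described above; it assumes the `page` object from the surrounding Playwright test:

```python
selectors = page.locator("select").all()
assert selectors, "No pipeline selector found on page"

# a placeholder-only <select> has a single option; select_option(index=1)
# would throw, so fail fast with a clear message instead
options_count = selectors[0].locator("option").count()
assert options_count > 1, "Pipeline selector has no real options to choose"
selectors[0].select_option(index=1)
```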
In `@tests/e2e/test_pipelines_e2e.py`:
- Around line 223-255: The main-block that calls wait_for_server() and
cleanup_database() is not run under pytest, causing flaky CI; remove the entire
if __name__ == "__main__" block and add an autouse, module-scoped pytest fixture
named _e2e_setup_teardown that calls wait_for_server() (skip tests if it returns
False), runs cleanup_database() before yielding, and runs cleanup_database()
again after yield to teardown; reference symbols to update: wait_for_server,
cleanup_database, and add pytest.fixture(scope="module", autouse=True) on
_e2e_setup_teardown.
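A sketch of the fixture this describes, assuming the helpers from tests/e2e/test_helpers.py:

```python
import pytest

from test_helpers import cleanup_database, wait_for_server


@pytest.fixture(scope="module", autouse=True)
def _e2e_setup_teardown():
    """replaces the __main__ block: setup and teardown under pytest"""
    if not wait_for_server():
        pytest.skip("server not ready for e2e tests")
    cleanup_database()
    yield
    cleanup_database()
```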
- Around line 54-70: The test currently collects use_template_buttons (variable
use_template_buttons) but only prints the count and asserts the page loaded;
update test_view_templates so it validates template rendering: call
use_template_buttons.count() and if it is zero call pytest.skip with a clear
message (or alternatively assert that count() > 0), otherwise assert that at
least one button is visible (e.g.,
expect(use_template_buttons.first()).to_be_visible()) to fail on regressions;
ensure you import pytest if using pytest.skip and keep the existing page load
expectation (expect(page.get_by_text("Pipelines", exact=True)).to_be_visible()).
- Around line 150-176: The tests test_delete_pipeline() and
test_pipeline_editor_opens() should explicitly skip when required UI buttons are
missing instead of silently passing: import pytest if not already, check
delete_buttons.count() and the edit/open button query (the same
page.get_by_role("button") filter used in test_pipeline_editor_opens) and call
pytest.skip("reason") when count == 0 before performing any clicks; likewise,
after opening the confirmation dialog check confirm_buttons.count() and call
pytest.skip("confirmation dialog not present") instead of proceeding silently,
and ensure these skips are placed immediately after the corresponding .count()
checks (refer to delete_buttons, confirm_buttons, and the
page.get_by_role("button").filter(...) calls).
- Around line 33-34: Update each test in tests/e2e/test_pipelines_e2e.py that
calls page.screenshot so it uses pytest's tmp_path fixture instead of hardcoded
"/tmp" paths: add tmp_path as a parameter to the test functions and replace
occurrences like page.screenshot(path="/tmp/pipelines_page.png", ...) with
page.screenshot(path=(tmp_path / "pipelines_page.png"), ...). Do this for all
page.screenshot calls in the file (there are five instances) so Playwright
receives a pathlib.Path and each test gets an isolated temp directory.
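A sketch of one converted test, mirroring the suite's existing shape (Playwright accepts a `pathlib.Path` for the screenshot path):

```python
from playwright.sync_api import sync_playwright


def test_pipelines_page_loads(tmp_path):
    """tmp_path is a per-test pathlib.Path provided by pytest"""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("http://localhost:5173")
        # no str() needed; each test writes into its own temp directory
        page.screenshot(path=tmp_path / "pipelines_page.png", full_page=True)
        browser.close()
```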
In `@tests/e2e/test_review_e2e.py`:
- Line 8: Import TimeoutError from playwright.sync_api and update the export
test so the entire page.expect_download() context manager is wrapped in a
try/except that catches TimeoutError (not a blind Exception). Specifically, in
tests/e2e/test_review_e2e.py add "from playwright.sync_api import TimeoutError"
and wrap the with page.expect_download(...) as download: block (and the code
that enters that context) in a try/except TimeoutError to handle the case where
expect_download times out when no records exist, replacing the generic Exception
handler with a TimeoutError-specific handler.
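A sketch of the narrowed handler; `page` and `export_button` stand in for whatever objects the test already uses:

```python
from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

try:
    with page.expect_download(timeout=5000) as download_info:
        export_button.click()
    download = download_info.value  # only reached if a download event fired
except PlaywrightTimeoutError:
    # no records exist, so no download was triggered; handle as a known case
    pass
```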
🧹 Nitpick comments (9)
scripts/with_server.py (2)
46-63: Unused `proc` in health check loop.

The `proc` variable from the tuple unpacking isn't used - only `port` is needed for the health check URL construction.

Suggested fix

```diff
-for proc, port in self.processes:
+for _proc, port in self.processes:
```
148-148: Add `strict=True` to `zip()` for argument validation.

Since the code already validates `len(args.servers) != len(args.ports)` at line 136, adding `strict=True` provides runtime safety if that check is ever bypassed.

Suggested fix

```diff
-servers = list(zip(args.servers, args.ports))
+servers = list(zip(args.servers, args.ports, strict=True))
```

tests/e2e/test_helpers.py (1)
45-59: Move imports to module level.

The `urllib.error` and `urllib.request` imports are inside the function. While functional, module-level imports are idiomatic Python and avoid repeated import overhead on each call.

Suggested fix

Add imports at module level (lines 6-9):

```diff
 import os
 import time
+import urllib.error
+import urllib.request

 import httpx
```

Then remove lines 47-48:

```diff
 def wait_for_server(url: str = "http://localhost:8000/health", timeout: int = 30):
     """wait for server to be ready"""
-    import urllib.error
-    import urllib.request
-
     start_time = time.time()
```

lib/blocks/builtin/semantic_infiller.py (5)
28-29: Add `ClassVar` annotation for mutable class attributes.

Consistency with `StructuredGenerator`, which uses `ClassVar` for `_config_formats`. Mutable class attributes without a `ClassVar` annotation can cause unexpected sharing between instances.

Suggested fix

```diff
+from typing import Any, ClassVar
-from typing import Any, cast

 # ... then in the class:
-inputs = ["skeletons"]
-outputs = ["samples"]
+inputs: ClassVar[list[str]] = ["skeletons"]
+outputs: ClassVar[list[str]] = ["samples"]
```

And for `_config_descriptions` and `_config_formats`:

```diff
-_config_descriptions = {
+_config_descriptions: ClassVar[dict[str, str]] = {
     ...
 }
-_config_formats = {
+_config_formats: ClassVar[dict[str, str]] = {
```
172-182: Use exception chaining with `raise ... from e`.

Preserves the original traceback for debugging. Also use the `{e!s}` conversion flag for a cleaner f-string.

Suggested fix

```diff
 try:
     response = await litellm.acompletion(**llm_params)
 except Exception as e:
     raise BlockExecutionError(
-        f"LLM call failed: {str(e)}",
+        f"LLM call failed: {e!s}",
         detail={
             "skeleton": skeleton,
             "prompt_preview": prompt[:200],
             "error": str(e),
         },
-    )
+    ) from e
```
332-338: Use exception chaining with `raise ... from e`.

Same pattern as above - preserves the traceback for LLM failures during the diversity retry loop.

Suggested fix

```diff
 try:
     response = await litellm.acompletion(**llm_params)
 except Exception as e:
     raise BlockExecutionError(
-        f"LLM call failed: {str(e)}",
+        f"LLM call failed: {e!s}",
         detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
-    )
+    ) from e
```
247-252: Add `strict=True` to `zip()`.

Ensures the `embeddings` and `samples` lists have matching lengths at runtime.

Suggested fix

```diff
-for emb, sample in zip(embeddings, samples):
+for emb, sample in zip(embeddings, samples, strict=True):
```
267-270: Rename unused loop variable `sim` to `_sim`.

The similarity score `sim` isn't used in the loop body - only the sample is extracted for the negative example.

Suggested fix

```diff
-for sim, sample in similar_samples:
+for _sim, sample in similar_samples:
```

tests/e2e/test_pipelines_e2e.py (1)
21-28: Prefer configurable base URL and event-driven waits over fixed sleeps.

Hard-coded `http://localhost:5173` plus `time.sleep(...)` makes these e2e tests flaky across environments and slower in CI. Use a configurable base URL and wait for a stable UI element instead.

♻️ Proposed fix (example in this test)

```diff
+import os
 ...
-def test_pipelines_page_loads():
+BASE_URL = os.getenv("E2E_BASE_URL", "http://localhost:5173")
+
+def test_pipelines_page_loads():
     """verify pipelines page loads successfully"""
     with sync_playwright() as p:
         browser = p.chromium.launch(headless=get_headless_mode())
         page = browser.new_page()
         ...
-        page.goto("http://localhost:5173")
+        page.goto(BASE_URL)
         page.wait_for_load_state("networkidle")
         ...
-        time.sleep(1)
+        expect(page.get_by_text("Pipelines", exact=True)).to_be_visible()
```
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@frontend/src/components/pipeline-editor/BlockConfigPanel.tsx`:
- Around line 405-429: Update the frontend state docs (llm/state-frontend.md) to
document the new JSON-mode toggle and model selection state: describe the
jsonMode object keyed by config keys (accessed as jsonMode[key] with default
true), how setJsonMode updates it (setJsonMode(prev => ({ ...prev, [key]: value
}))), the UI toggle label/behavior in BlockConfigPanel (JSON mode vs Jinja2
template) and how jsonValue is derived (string value or JSON.stringify(value,
null, 2)); also note where model selection state lives and how it should be
persisted/updated in the same frontend state conventions.
- Around line 155-162: The code currently skips all fields with
fieldSchema.format === "json-or-template", allowing invalid JSON to be saved;
update the logic in BlockConfigPanel so that when jsonMode[key] is true you
attempt to parse/validate processedData[key] (but do not mutate processedData)
and surface an error if parsing fails, while still returning early for
non-JSON-mode json-or-template fields; also ensure jsonMode is included in the
relevant dependency array (e.g., the useEffect or validator hook that iterates
over Object.entries(schema)) so validation re-runs when JSON mode toggles.
- Around line 35-36: The jsonMode state (initialized via useState in
BlockConfigPanel.tsx) is keyed only by field name and can leak between nodes;
add an effect that watches node.id (or the node object used in this component)
and calls setJsonMode({}) to reset jsonMode whenever the node changes so toggles
don’t bleed across nodes; locate the jsonMode/setJsonMode usage in
BlockConfigPanel and add a useEffect referencing node.id to clear the map on
change.
In `@llm/state-backend.md`:
- Around line 420-431: The doc entries for StructureSampler and SemanticInfiller
need explicit outputs and missing parameters: update the StructureSampler entry
(symbol: StructureSampler) to use outputs ["skeletons", "_seed_samples"] instead
of "*" and enumerate its parameters (target_count, categorical_fields,
numeric_fields, dependencies, seed) as already present; update the
SemanticInfiller entry (symbol: SemanticInfiller) to use outputs ["samples"] and
add the five optional parameters system_prompt, embedding_model,
diversity_threshold, negative_examples_count, and max_diversity_retries to its
parameter list along with the existing fields_to_generate, model, temperature,
max_tokens; ensure both entries include full parameter lists and correct
categories (seeders or generators) and remove the "*" shorthand.
In `@tests/e2e/README.md`:
- Line 316: Replace the broken placeholder link target used for "[webapp-testing
skill]" that currently points to
"/.claude/plugins/cache/anthropic-agent-skills/document-skills/.../webapp-testing/"
with a valid relative or absolute documentation URL (or remove the link if no
public location exists); locate the markdown entry for "[webapp-testing skill]"
in the README and update the link target to the correct path to the
webapp-testing skill documentation or to an appropriate repository/README
anchor.
🧹 Nitpick comments (8)
tests/e2e/test_helpers.py (2)
47-61: Consider using httpx consistently instead of urllib.

`cleanup_database()` and `get_pipeline_count()` use httpx, but `wait_for_server()` uses urllib. Using httpx throughout would be more consistent.

♻️ Optional: Use httpx for consistency

```diff
 def wait_for_server(url: str = "http://localhost:8000/health", timeout: int = 30):
     """wait for server to be ready"""
-    import urllib.error
-    import urllib.request
-
     start_time = time.time()
     while time.time() - start_time < timeout:
         try:
-            with urllib.request.urlopen(url, timeout=2) as response:
-                if response.status == 200:
-                    return True
-        except (urllib.error.URLError, TimeoutError):
+            response = httpx.get(url, timeout=2.0)
+            if response.status_code == 200:
+                return True
+        except (httpx.RequestError, httpx.TimeoutException):
             time.sleep(1)
     return False
```
64-72: Catching bare `Exception` obscures errors.

The broad `except Exception` in `get_pipeline_count` silently swallows all errors. Consider catching specific exceptions or at least logging the exception type.

♻️ Suggested fix

```diff
 def get_pipeline_count():
     """get number of pipelines in database"""
     try:
         response = httpx.get("http://localhost:8000/api/pipelines", timeout=10.0)
         if response.status_code == 200:
             return len(response.json())
-    except Exception as e:
-        print(f"get_pipeline_count warning: {e}")
+        return -1
+    except httpx.RequestError as e:
+        print(f"get_pipeline_count warning: request failed - {e}")
     return -1
```

tests/e2e/test_review_e2e.py (2)
17-37: Use `tmp_path` fixture for screenshot paths (consistency).

`test_pipelines_e2e.py` uses pytest's `tmp_path` fixture for screenshots, but this file uses hardcoded `/tmp` paths. This causes S108 warnings and potential path collisions in parallel runs.

♻️ Pattern to apply to all test functions

```diff
-def test_review_page_loads():
+def test_review_page_loads(tmp_path):
     """verify review page loads successfully"""
     with sync_playwright() as p:
         ...
-        page.screenshot(path="/tmp/review_page.png", full_page=True)
+        page.screenshot(path=str(tmp_path / "review_page.png"), full_page=True)
```

Apply this pattern to all 7 test functions in this file.
1-15: Add autouse fixture for server readiness and cleanup.

Unlike `test_pipelines_e2e.py`, this module lacks a setup/teardown fixture. This causes inconsistent behavior between pytest runs and direct execution.

♻️ Add fixture after imports

```python
@pytest.fixture(scope="module", autouse=True)
def _e2e_setup_teardown():
    """setup and teardown for e2e tests"""
    from test_helpers import wait_for_server

    if not wait_for_server():
        pytest.skip("server not ready for e2e tests")
    yield
```

Also add `import pytest` to imports.

tests/e2e/test_generator_e2e.py (3)
51-53: Redundant conditional after assertion.

Line 51 asserts `len(selectors) > 0`, so the `if len(selectors) > 0` check on line 53 is always true.

♻️ Remove redundant check

```diff
 selectors = page.locator('select, [role="combobox"]').all()
 assert len(selectors) > 0, "No pipeline selector found on page"

-if len(selectors) > 0:
-    # click first selector
-    selectors[0].click()
+# click first selector
+selectors[0].click()
```
18-36: Use `tmp_path` fixture and add autouse setup fixture.

Same consistency issues as `test_review_e2e.py`:

- Hardcoded `/tmp` paths for screenshots
- Missing autouse fixture for server readiness

Follow the pattern from `test_pipelines_e2e.py` for consistency across the e2e test suite.
94-96: Seed file creation in `/tmp` should use `tmp_path` fixture.

Seed files created at `/tmp/test_seed.json` can collide in parallel runs. Use the `tmp_path` fixture for test isolation.

♻️ Example fix

```diff
-def test_upload_seed_file():
+def test_upload_seed_file(tmp_path):
     """test uploading a seed JSON file"""
     ...
-    seed_path = "/tmp/test_seed.json"
+    seed_path = tmp_path / "test_seed.json"
     with open(seed_path, "w") as f:
         json.dump(seed_data, f)
```

Apply to both `test_upload_seed_file` and `test_start_generation_job`.

pyproject.toml (1)
16-17: Move test dependencies to dev group.

`pytest>=8.4.2` and `pytest-asyncio>=0.24.0` are in the main project dependencies but should be in `[dependency-groups].dev`. End users don't need test frameworks installed.

♻️ Suggested fix

```diff
 [project]
 dependencies = [
     "fastapi>=0.115.0",
     "uvicorn>=0.32.0",
     "pydantic>=2.9.0",
     "aiosqlite>=0.20.0",
     "httpx>=0.27.0",
     "python-dotenv>=1.0.0",
     "python-multipart>=0.0.20",
     "loguru>=0.7.0",
     "tqdm>=4.66.0",
-    "pytest>=8.4.2",
-    "pytest-asyncio>=0.24.0",
     "jinja2>=3.1.6",
     ...
 ]

 [dependency-groups]
 dev = [
     "ruff>=0.7.0",
     "mypy>=1.13.0",
     "playwright>=1.57.0",
+    "pytest>=8.4.2",
+    "pytest-asyncio>=0.24.0",
+    "pytest-timeout>=2.4.0",
 ]
```

Note: `pytest-timeout>=2.4.0` (line 27) should also move to dev.
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In @.claude/skills/webapp-testing/examples/console_logging.py:
- Line 1: Ensure the output directory '/mnt/user-data/outputs' exists before
writing logs: add an import for os if missing and call
os.makedirs('/mnt/user-data/outputs', exist_ok=True) just before the code that
opens/writes the log file (the open(...) / write(...) calls around lines 30-32 /
any usage of output_path or output_file); this guarantees no FileNotFoundError
when the script writes logs.
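For reference, a minimal sketch of that guard; the log filename and the stand-in `console_logs` list are assumptions for illustration, not the example script's exact code:

```python
import os

OUTPUT_DIR = "/mnt/user-data/outputs"
console_logs = ["[log] page loaded"]  # stand-in for the captured messages

# create the directory tree up front; exist_ok makes this a no-op when it exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

# the filename is hypothetical; the point is that makedirs precedes open()
with open(os.path.join(OUTPUT_DIR, "console_logs.txt"), "w") as f:
    f.write("\n".join(console_logs))
```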
In @.claude/skills/webapp-testing/examples/element_discovery.py:
- Around line 36-38: Replace the fixed "/tmp/page_discovery.png" usage in
page.screenshot and the subsequent print with a unique temporary file: import
and use the tempfile module (e.g., tempfile.NamedTemporaryFile(delete=False,
suffix=".png") or tempfile.mkstemp) to create a one-off path, close the handle
if necessary, pass that temp file's name to page.screenshot, and print that
generated filename; update any surrounding logic in element_discovery.py that
references the old hard-coded path.
- Around line 5-40: Wrap the browser/page interaction in a try/finally so
browser.close() always runs even on errors: after obtaining the Playwright
instance (sync_playwright()) and launching the browser (p.chromium.launch), put
everything that uses page (page.goto, page.wait_for_load_state, page.locator,
page.screenshot, etc.) inside a try block and call browser.close() in the
finally block; guard the finally close with a safety check (e.g., if browser is
not None or "browser" in locals()) to avoid referencing an uninitialized
variable. Ensure you keep the outer sync_playwright() context manager but move
cleanup of the browser into the finally to avoid leaking the browser process in
CI.
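To make the cleanup pattern in the item above concrete, a minimal sketch; the target URL and screenshot name are assumptions, and initializing `browser = None` before the `try` replaces the `locals()` check with a simpler guard:

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = None
    try:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("http://localhost:3000")  # assumed URL for illustration
        page.wait_for_load_state("networkidle")
        page.screenshot(path="discovery.png")  # hypothetical output name
    finally:
        # runs on success and on error alike, so the browser never leaks in CI
        if browser is not None:
            browser.close()
```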
In @.claude/skills/webapp-testing/examples/static_html_automation.py:
- Around line 18-30: The screenshots use a hardcoded path
"/mnt/user-data/outputs" which may not exist; update the code around
page.screenshot (both calls) to use a repo-relative or configurable output
directory (e.g., "./outputs" or an env var) and ensure the directory exists
before saving (create it with os.makedirs or
pathlib.Path(...).mkdir(parents=True, exist_ok=True)); keep the same filenames
("static_page.png" and "after_submit.png") but build their full paths from the
validated/created output dir so writes cannot fail.
- Around line 7-9: The file URL is built in a way that breaks on Windows and
doesn't percent-encode paths: replace the os.path.abspath +
f"file://{html_file_path}" construction by using
pathlib.Path(html_file_path).resolve().as_uri() to produce a correct,
cross-platform file URL; update the variables html_file_path and file_url
accordingly and add the pathlib import if missing.
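A short sketch combining both fixes above; the HTML filename and the `OUTPUT_DIR` environment variable are assumptions for illustration:

```python
import os
from pathlib import Path
from playwright.sync_api import sync_playwright

# configurable output directory, created before any screenshot is written
output_dir = Path(os.environ.get("OUTPUT_DIR", "./outputs"))
output_dir.mkdir(parents=True, exist_ok=True)

# as_uri() yields a percent-encoded, cross-platform file:// URL
file_url = Path("static_page.html").resolve().as_uri()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto(file_url)
    page.screenshot(path=str(output_dir / "static_page.png"))
    browser.close()
```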
In @.claude/skills/webapp-testing/scripts/with_server.py:
- Around line 87-89: The subprocess.Popen call that creates the test server
currently sets stdout=subprocess.PIPE and stderr=subprocess.PIPE which will
deadlock if those buffers fill; update the subprocess.Popen invocation that
launches server["cmd"] (the process variable) to not pipe these streams — either
remove the stdout/stderr args so the child inherits the parent streams, or set
stdout=subprocess.DEVNULL and stderr=subprocess.DEVNULL if output should be
discarded — and ensure any subsequent code that expects to read output from
process is adjusted accordingly (look for the subprocess.Popen(...) call that
assigns to process in with_server.py).
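For illustration, the non-piping variant of that call; the `server` dict contents here are a stand-in for the script's actual configuration:

```python
import subprocess

server = {"cmd": "python -m http.server 8000"}  # stand-in for the script's dict

# DEVNULL discards child output; PIPE without a reader deadlocks once the
# OS pipe buffer fills up on a chatty server
process = subprocess.Popen(
    server["cmd"],
    shell=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
```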
In @.claude/skills/webapp-testing/SKILL.md:
- Line 18: The fenced code block in .claude/skills/webapp-testing/SKILL.md is
missing a language identifier; update the opening backticks to include a
language (e.g., change ``` to ```text) so markdownlint passes and readability
improves, leaving the block contents unchanged.
- Line 14: Fix the typo "abslutely" to "absolutely" in the guidance sentence
that starts with "Always run scripts with `--help` first" (replace the
misspelled word in the same sentence in SKILL.md); ensure the corrected sentence
reads "...find that a customized solution is absolutely necessary." and keep
surrounding formatting/backticks unchanged.
In `@llm/state-backend.md`:
- Around line 420-431: Update the documented output field names for the
StructureSampler and SemanticInfiller blocks to match their implementations:
replace the dynamic "*" outputs with explicit arrays — for StructureSampler list
["skeletons", "_seed_samples"] (note the leading underscore on "_seed_samples")
and for SemanticInfiller list ["samples"]; ensure the entries appear under the
corresponding StructureSampler and SemanticInfiller headings in the doc so the
documented outputs match the actual block declarations.
In `@tests/e2e/README.md`:
- Around line 127-133: Update the "pipelines tests" README paragraph to
explicitly warn that the tests will delete pipelines, jobs, and records and
therefore must be run against a dedicated test database or isolated environment
(e.g., separate DB instance, schema, or test-only config), and mention that you
should set test-specific credentials/connection (CI or local env) before running
the pipelines tests to avoid impacting production data.
- Around line 264-266: Replace the blanket "killall uvicorn" / "killall node"
instructions with targeted cleanup: instruct tests to capture server PIDs when
starting (e.g., write PID files or export variables) and then use "kill <PID>"
or "kill -TERM $(cat /path/to/pidfile)" to stop only those processes, or use
"pkill -f '<unique_start_command_or_pattern>'" scoped to the test-runner
pattern; keep the "rm data/test_qa_records.db" note but add a caution to only
remove the DB if using a dedicated test path and ensure the README warns that
indiscriminate killall may terminate unrelated developer processes.
🧹 Nitpick comments (3)
.claude/skills/webapp-testing/examples/static_html_automation.py (1)

21-27: Consider waiting for selectors before interactions.

Direct clicks/fills can be flaky if elements render after load. A small wait (or `locator().wait_for()`) makes the example more reliable.

♻️ Suggested tweak

```diff
-    page.click("text=Click Me")
-    page.fill("#name", "John Doe")
-    page.fill("#email", "john@example.com")
+    page.locator("text=Click Me").wait_for()
+    page.click("text=Click Me")
+    page.locator("#name").wait_for()
+    page.fill("#name", "John Doe")
+    page.fill("#email", "john@example.com")
```

.claude/skills/webapp-testing/examples/console_logging.py (1)
9-28: Ensure the browser closes even when a step fails.

If `page.goto()` or `page.click()` throws, the browser stays open. Wrap the interaction in `try/finally` so cleanup always runs.

♻️ Suggested change

```diff
 with sync_playwright() as p:
     browser = p.chromium.launch(headless=True)
     page = browser.new_page(viewport={"width": 1920, "height": 1080})

-    # Set up console log capture
-    def handle_console_message(msg):
-        console_logs.append(f"[{msg.type}] {msg.text}")
-        print(f"Console: [{msg.type}] {msg.text}")
-
-    page.on("console", handle_console_message)
-
-    # Navigate to page
-    page.goto(url)
-    page.wait_for_load_state("networkidle")
-
-    # Interact with the page (triggers console logs)
-    page.click("text=Dashboard")
-    page.wait_for_timeout(1000)
-
-    browser.close()
+    try:
+        # Set up console log capture
+        def handle_console_message(msg):
+            console_logs.append(f"[{msg.type}] {msg.text}")
+            print(f"Console: [{msg.type}] {msg.text}")
+
+        page.on("console", handle_console_message)
+
+        # Navigate to page
+        page.goto(url)
+        page.wait_for_load_state("networkidle")
+
+        # Interact with the page (triggers console logs)
+        page.click("text=Dashboard")
+        page.wait_for_timeout(1000)
+    finally:
+        browser.close()
```

.claude/skills/webapp-testing/scripts/with_server.py (1)
17-21: Terminate server process groups to avoid orphans.

With `shell=True`, `terminate()` may only kill the shell while the actual server keeps running. Consider starting a new session and killing the process group.

♻️ Suggested change (POSIX)

```diff
-import argparse
+import argparse
+import os
+import signal
@@
-    process = subprocess.Popen(server["cmd"], shell=True)
+    process = subprocess.Popen(
+        server["cmd"],
+        shell=True,
+        preexec_fn=os.setsid,
+    )
@@
-    try:
-        process.terminate()
-        process.wait(timeout=5)
-    except subprocess.TimeoutExpired:
-        process.kill()
-        process.wait()
+    try:
+        os.killpg(process.pid, signal.SIGTERM)
+        process.wait(timeout=5)
+    except subprocess.TimeoutExpired:
+        os.killpg(process.pid, signal.SIGKILL)
+        process.wait()
```

Also applies to: 87-89, 112-117
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py:
- Around line 111-123: Validate CLI args at program start: ensure pr_number is a
numeric value before calling fetch_comments and, when mode == "--id", verify a
third argument exists and is numeric (set target_id only after this validation);
if validation fails print __doc__ and exit(1). Update the top-level block around
pr_number, mode, and target_id to perform these checks so missing or non-numeric
inputs cannot fall through to unintended API calls (references: variables
pr_number, mode, target_id and function fetch_comments).
- Around line 51-80: parse_comment currently only populates desc when
TITLE_PATTERN matches and ignores <details> at position 0, causing comments
without bold titles to lose their bodies; update parse_comment so that if
title_match is missing you set after_title = body (use the full body as the
source), and change the <details> handling to consider details_pos >= 0 (or ==
0) when slicing so content before a details block is kept; keep the existing
length cutoff (len(after_title) < 500) and the markdown-cleaning steps (removing
<!-- --> and collapsing extra newlines) after computing desc; refer to
parse_comment, title_match, TITLE_PATTERN, after_title, details_pos, and desc
when making the change.
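To make both fixes above concrete, a hedged sketch; the function names and signatures here are illustrative, not the script's actual structure:

```python
import sys

def validate_args(argv: list[str]) -> tuple[str, str, int | None]:
    """Validate CLI input up front; print usage and exit on bad input."""
    if len(argv) < 2 or not argv[1].isdigit():
        print(__doc__)
        sys.exit(1)
    pr_number = argv[1]
    mode = argv[2] if len(argv) > 2 else "--unresolved"
    target_id = None
    if mode == "--id":
        if len(argv) < 4 or not argv[3].isdigit():
            print(__doc__)
            sys.exit(1)
        target_id = int(argv[3])
    return pr_number, mode, target_id

def extract_desc(body: str, title_match) -> str:
    """Fall back to the full body when no bold title matched."""
    after_title = body[title_match.end():] if title_match else body
    details_pos = after_title.find("<details>")
    if details_pos >= 0:  # >= 0 also truncates a <details> block at position 0
        after_title = after_title[:details_pos]
    return after_title.strip()
```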
In `@llm/state-backend.md`:
- Around line 420-461: The document's block category labels are inconsistent
with the BaseBlock category list; update either the BaseBlock category comment
or the block entries so they match. Specifically, edit the BaseBlock category
list to include "multipliers", "observability", and "utilities" (or rename
MarkdownMultiplierBlock, LangfuseBlock, FieldMapper categories to ones already
in BaseBlock) and ensure each block entry (StructureSampler, TextGenerator,
StructuredGenerator, SemanticInfiller, MarkdownMultiplierBlock, ValidatorBlock,
JSONValidatorBlock, DuplicateRemover, DiversityScore, CoherenceScore,
RougeScore, LangfuseBlock, FieldMapper, RagasMetrics) uses a category label that
exactly matches the canonical list; keep naming consistent (singular vs plural)
across BaseBlock and all block definitions.
In `@tests/e2e/README.md`:
- Around line 266-268: Update the pkill pattern used in the cleanup notes:
replace the hardcoded "uvicorn.*5000" reference with the actual documented port
"uvicorn.*8000" or make it a placeholder like "uvicorn.*<PORT>" and instruct
users to substitute their configured port so the pkill/kill command targets the
correct uvicorn process.
♻️ Duplicate comments (1)
.claude/skills/webapp-testing/examples/element_discovery.py (1)
37-38: Hardcoded `/tmp` path still present.

This was already flagged in a previous review. The static analysis (Ruff S108) also flags this as a potential security issue due to predictable temp file paths. Consider using `tempfile.NamedTemporaryFile` for unique paths.
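A minimal sketch of the tempfile-based replacement, assuming `page` is the example's existing Playwright page:

```python
import tempfile

# unique, unpredictable path; delete=False keeps the file on disk so
# Playwright can write the screenshot into it after the handle closes
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    screenshot_path = tmp.name

page.screenshot(path=screenshot_path, full_page=True)
print(f"screenshot saved to {screenshot_path}")
```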
🧹 Nitpick comments (1)
.claude/skills/webapp-testing/scripts/with_server.py (1)
76-77: Add `strict=True` to `zip()` for Python 3.10+.

The manual length validation at lines 71-73 makes this safe, but `strict=True` is idiomatic Python 3.10+ and self-documenting.

♻️ Suggested change

```diff
-    for cmd, port in zip(args.servers, args.ports):
+    for cmd, port in zip(args.servers, args.ports, strict=True):
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/how_to_create_blocks.md`:
- Line 42: The description for the MarkdownMultiplierBlock uses a lowercase
"markdown" — update the text for the symbol MarkdownMultiplierBlock so the
phrase reads "Split Markdown documents into chunks for batch processing"
(capitalize "Markdown" only in the human-readable description).
In `@docs/template_data_augmentation.md`:
- Around line 106-108: Remove the stray empty line between the two blockquote
lines so the "**Note:** `price` appears in both `numeric_fields` and
`fields_to_generate`..." and the "**Tip:** Use 4-10 diverse samples..." lines
form a single contiguous blockquote (i.e., no blank line between the lines
starting with >), or alternatively explicitly prefix the blank line with > to
make it a valid blockquote paragraph; update the blockquote content accordingly.
♻️ Duplicate comments (4)
.claude/skills/address-pr-review/scripts/fetch_comments.py (2)
67-75: Comment bodies dropped when no bold title - not addressed.

The previous review flagged that `desc` remains empty when `title_match` is None, and `details_pos > 0` ignores `<details>` at position 0. Human review comments (often without bold titles) will have empty descriptions.

113-131: CLI arg validation - not addressed.

Previous review flagged: `pr_number` should be validated as numeric, and `--id` mode needs proper validation. Additionally, `int(sys.argv[3])` on line 125 will raise an unhandled `ValueError` if the argument isn't numeric.

frontend/src/components/pipeline-editor/BlockConfigPanel.tsx (2)
266-312: LGTM!

Both dropdowns correctly preserve custom model values not returned by the API, preventing silent data loss on save.
417-489: Document the new JSON-mode toggle in frontend state docs.

Per coding guidelines, new UI flows should be reflected in `llm/state-frontend.md`. The `jsonMode` state and JSON/Jinja2 toggle behavior should be documented.
🧹 Nitpick comments (12)
.claude/skills/address-pr-review/scripts/fetch_comments.py (3)
23-31: Silent exit on failure obscures the root cause.

When `gh repo view` fails, the script exits without any error message, making it hard to diagnose issues (e.g., not in a repo, `gh` not authenticated).

Suggested improvement

```diff
     if result.returncode != 0:
+        print(f"Failed to get repo: {result.stderr.strip()}", file=sys.stderr)
         sys.exit(1)
```
34-43: Same silent failure pattern.

Add stderr output on API failure for debuggability.

Suggested improvement

```diff
     if result.returncode != 0:
+        print(f"Failed to fetch comments: {result.stderr.strip()}", file=sys.stderr)
         sys.exit(1)
```
147-151: Unrecognized mode silently defaults to `--unresolved`.

Line 147's condition `mode not in ["--all", "--id", "--summary"]` causes any typo (e.g., `--unresloved`) to silently behave as `--unresolved`. Consider warning or erroring on unknown modes.

Suggested improvement

```diff
-    if mode == "--unresolved" or mode not in ["--all", "--id", "--summary"]:
+    if mode not in ["--unresolved", "--all", "--id", "--summary"]:
+        print(f"Unknown mode '{mode}', use --unresolved, --all, --id, or --summary", file=sys.stderr)
+        sys.exit(1)
+    if mode == "--unresolved":
         top_level = [c for c in top_level if not is_resolved(c)]
         print(f"showing {len(top_level)} unresolved comments")
```

frontend/src/pages/Generator.tsx (3)
50-50: Remove unused `validated` state.

The `validated` value is discarded (`[, setValidated]`) and never read. If validation gating is needed, use `validationResult` instead. Otherwise, remove this state entirely to reduce noise.

```diff
-  const [, setValidated] = useState(false);
```

Then remove the `setValidated(...)` calls at lines 78, 137, 248, and 275.
125-127: Magic string for block type detection.

The hardcoded `"MarkdownMultiplierBlock"` string is fragile. If the backend renames this block type, the check fails silently. Consider having the API return a format hint (e.g., `expected_input_format: "markdown" | "json"`) or extracting this to a constant.

```diff
+const MARKDOWN_BLOCK_TYPE = "MarkdownMultiplierBlock";
+
 const fetchPipelineDetails = async () => {
   // ...
   const firstBlockType = data.first_block_type || "";
-  const needsMd = firstBlockType === "MarkdownMultiplierBlock";
+  const needsMd = firstBlockType === MARKDOWN_BLOCK_TYPE;
```
732-732: Minor redundancy in verify button condition.

The `!needsMarkdown && file.name.endsWith(".json")` check is defensive but redundant: if `needsMarkdown` is false and a file passed validation, it's already JSON. Safe to keep for robustness, or simplify to just `!needsMarkdown`.

lib/blocks/builtin/duplicate_remover.py (1)
56-57: Potential memory leak in embeddings cache.

The `_embeddings_cache` dictionary grows unbounded as new `trace_id` values are added but never evicted. In long-running processes with many pipeline executions, this could lead to memory exhaustion.

Consider adding cache eviction (e.g., LRU cache, max size limit, or explicit cleanup after pipeline completion).

♻️ Suggested approach using functools.lru_cache or bounded dict

```python
# Option 1: Add a max cache size
MAX_CACHE_SIZE = 100

def _get_seed_embeddings(self, ...):
    # Evict oldest entries if cache too large
    if len(self._embeddings_cache) >= MAX_CACHE_SIZE:
        oldest_key = next(iter(self._embeddings_cache))
        del self._embeddings_cache[oldest_key]
    # ... rest of method
```

lib/blocks/builtin/semantic_infiller.py (5)
174-182: Use `raise ... from err` to preserve exception chain.

When re-raising exceptions, use `raise ... from err` to maintain the exception chain for better debugging.

📝 Suggested fix

```diff
         try:
             response = await litellm.acompletion(**llm_params)
         except Exception as e:
             raise BlockExecutionError(
-                f"LLM call failed: {str(e)}",
+                f"LLM call failed: {e!s}",
                 detail={
                     "skeleton": skeleton,
                     "prompt_preview": prompt[:200],
                     "error": str(e),
                 },
-            )
+            ) from e
```
247-247: Add `strict=True` to zip for safety.

Adding `strict=True` ensures embeddings and samples lists have matching lengths, catching potential bugs early.

📝 Suggested fix

```diff
-    for emb, sample in zip(embeddings, samples):
+    for emb, sample in zip(embeddings, samples, strict=True):
```
268-270: Rename unused loop variable `sim` to `_sim`.

The similarity value isn't used in the loop body - only the sample is needed. Prefix with underscore to indicate intentional non-use.

📝 Suggested fix

```diff
-    for sim, sample in similar_samples:
+    for _sim, sample in similar_samples:
         fields_str = json.dumps({f: sample.get(f, "") for f in fields_to_generate})
         negative_lines.append(f"  - {fields_str}")
```
322-322: Temperature could exceed valid range on retries.

With `max_diversity_retries=5` and base temperature 0.9, the adjusted temperature becomes 1.4. LLMs typically accept 0.0-2.0, but values >1.0 may produce unexpected results with some providers.

Consider capping the temperature:

📝 Suggested fix

```diff
-    temperature=self.temperature + (attempt * 0.1),  # increase temp on retry
+    temperature=min(self.temperature + (attempt * 0.1), 1.0),  # increase temp on retry, cap at 1.0
```
334-338: Use `raise ... from e` here as well.

Same issue as line 175 - preserve the exception chain.

📝 Suggested fix

```diff
         except Exception as e:
             raise BlockExecutionError(
-                f"LLM call failed: {str(e)}",
+                f"LLM call failed: {e!s}",
                 detail={"skeleton": skeleton, "attempt": attempt, "error": str(e)},
-            )
+            ) from e
```
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.claude/skills/address-pr-review/scripts/fetch_comments.py:
- Around line 34-43: The gh API call in fetch_comments currently uses ["gh",
"api", f"repos/{repo}/pulls/{pr_number}/comments", "--paginate"] which can emit
multiple JSON documents that json.loads cannot parse; update the subprocess.run
invocation in fetch_comments to include the "--slurp" flag (e.g., add "--slurp"
to the argument list) so gh will wrap all pages into a single JSON array and
json.loads(result.stdout) will succeed.
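For reference, a sketch of the adjusted call; note that `--slurp` wraps each page of output into one top-level array, so the result is a list of pages that likely still needs flattening (an assumption worth verifying against the script's parsing). The `repo` and `pr_number` values are stand-ins:

```python
import json
import subprocess

repo, pr_number = "owner/repo", "57"  # stand-ins for the script's variables

result = subprocess.run(
    ["gh", "api", f"repos/{repo}/pulls/{pr_number}/comments",
     "--paginate", "--slurp"],
    capture_output=True,
    text=True,
)
# --slurp yields one top-level array whose elements are the individual pages
pages = json.loads(result.stdout)
comments = [comment for page in pages for comment in page]
```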
In `@tests/e2e/test_generator_e2e.py`:
- Around line 196-219: The test_job_progress_monitoring currently asserts any
presence of progress or upload UI via the progress_or_upload locator, which is
too permissive; either rename the test to reflect it's only checking UI
visibility (e.g., test_generator_page_shows_expected_ui) or modify it to start a
generation job and then assert real progress updates: interact with the page to
initiate a job (e.g., upload/select input and click the start button), then wait
for the progress bar locator ('[role="progressbar"]' / ".progress, .Progress")
or percentage text to appear and assert that its value or text changes over time
(e.g., percentage increases or status transitions), and keep using the
progress_or_upload locator only after the job start to verify actual progress
updates in test_job_progress_monitoring.
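A hedged sketch of the stronger assertion, assuming `page` is the test's Playwright page; the `aria-valuenow` attribute and the 2-second sampling window are assumptions about how the progress bar is rendered:

```python
from playwright.sync_api import expect

# after clicking start, wait for a real progress bar rather than any static UI
progress = page.locator('[role="progressbar"]').first
expect(progress).to_be_visible(timeout=10_000)

# sample the reported value twice and require it to move
value_before = progress.get_attribute("aria-valuenow")
page.wait_for_timeout(2_000)
value_after = progress.get_attribute("aria-valuenow")
assert value_after != value_before, "progress bar value never changed"
```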
🧹 Nitpick comments (3)
tests/e2e/test_generator_e2e.py (2)
167-185: Consider using Playwright's auto-waiting instead of fixed sleep.

The 3-second `time.sleep(3)` at line 177 is a code smell. Playwright's `expect` with timeout or `wait_for_selector` would be more reliable and faster.

♻️ Suggested improvement

```diff
 if generate_buttons.count() > 0:
     generate_buttons.first.click()

-    # wait for job to start
-    time.sleep(3)
-    page.wait_for_load_state("networkidle")
-
-    # verify job progress appears
-    # look for progress indicators
-    progress_indicator = page.get_by_text("Progress", exact=False).or_(
+    # wait for progress indicator to appear
+    progress_indicator = page.get_by_text("Progress", exact=False).or_(
         page.get_by_text("Generated", exact=False)
     )
+    expect(progress_indicator.first).to_be_visible(timeout=10000)
-    assert progress_indicator.count() > 0, "Progress indicator should be visible"
```
247-281: Main block runs tests sequentially without pytest.

This works for manual execution but tests won't benefit from pytest's fixtures, parallel execution, or reporting when run this way. Consider adding pytest markers for proper test discovery.

♻️ Suggested improvement

Add a `conftest.py` with shared fixtures and use pytest markers:

```python
# In conftest.py or at module level
import pytest

@pytest.fixture(scope="module", autouse=True)
def setup_and_cleanup():
    """Setup pipeline and cleanup after all tests in module."""
    wait_for_server()
    cleanup_database()
    setup_test_pipeline()
    yield
    cleanup_database()
```

Then run with: `pytest tests/e2e/test_generator_e2e.py -v`

.claude/skills/address-pr-review/scripts/fetch_comments.py (1)
23-31: Silent failure on `gh repo view` error.

When `gh` fails, the script exits without telling the user why. A brief stderr message helps debugging.

Proposed fix

```diff
     if result.returncode != 0:
+        print(f"failed to get repo: {result.stderr.strip()}", file=sys.stderr)
         sys.exit(1)
```
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tests/e2e/test_generator_e2e.py`:
- Around line 160-175: The test currently skips the flow when the file input or
Generate/Start button is missing; add explicit assertions so the test fails when
required UI elements are absent: assert that file_inputs
(page.locator('input[type="file"]').all()) has length > 0 before calling
set_input_files(seed_path) or, if file upload is optional, document that and
only assert presence of the generate/start control; similarly assert
generate_buttons.count() > 0 (or call pytest.fail with a clear message) before
invoking generate_buttons.first.click(); reference the variables file_inputs and
generate_buttons and the methods set_input_files, count, and first.click when
making the checks.
- Around line 110-116: The test currently attempts to upload without verifying
the file input exists; before calling file_inputs[0].set_input_files(seed_path)
assert that the locator 'input[type="file"]' has at least one element (e.g.,
assert len(file_inputs) > 0 or use Playwright's
expect(file_inputs[0]).to_be_visible()) so the test fails clearly when the file
input is missing; keep the existing variables file_inputs, seed_path and the
page.locator call and only proceed to set_input_files and sleep after the
existence assertion.
- Around line 239-272: Remove the guarded "__main__" block and replace its
setup/teardown logic with a module-scoped autouse pytest fixture that calls
wait_for_server(), cleanup_database(), and setup_test_pipeline() before yielding
and then calls cleanup_database() after yield; keep the existing test_*
functions (test_generator_page_loads, test_select_pipeline,
test_upload_seed_file, test_start_generation_job,
test_generator_shows_upload_ui) as normal tests so pytest will run them,
matching the pattern used in test_pipelines_e2e.py.
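For the first two items above, a sketch of the fail-fast assertions, assuming `page` and `seed_path` come from the surrounding test; the button locator is an assumption about the page's markup:

```python
file_inputs = page.locator('input[type="file"]').all()
assert len(file_inputs) > 0, "file input not found on generator page"
file_inputs[0].set_input_files(seed_path)

generate_buttons = page.get_by_role("button", name="Generate")
assert generate_buttons.count() > 0, "Generate/Start button not found"
generate_buttons.first.click()
```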
♻️ Duplicate comments (1)
tests/e2e/test_generator_e2e.py (1)
58-67: Fail fast when no real pipeline option exists.
Right now the test can pass even if only a placeholder option exists (or no custom dropdown items). Make it assert the availability of a selectable option.

🔧 Suggested fix

```diff
 if selectors[0].evaluate("el => el.tagName") == "SELECT":
     options = selectors[0].locator("option").all()
-    if len(options) > 1:  # skip "select pipeline" placeholder
-        selectors[0].select_option(index=1)
+    assert len(options) > 1, "No pipelines available; create one before running e2e tests"
+    selectors[0].select_option(index=1)
 else:
     # for custom dropdowns, click first item
     items = page.locator('[role="option"]').all()
-    if len(items) > 0:
-        items[0].click()
+    assert len(items) > 0, "No pipeline options found in custom dropdown"
+    items[0].click()
```
🧹 Nitpick comments (2)
.claude/skills/address-pr-review/scripts/fetch_comments.py (2)
23-31: Silent failure makes debugging harder.

`fetch_comments()` prints stderr on failure, but `get_repo()` exits silently. If `gh` isn't authenticated or there's no repo context, users get no feedback.

♻️ Suggested fix

```diff
 def get_repo() -> str:
     result = subprocess.run(
         ["gh", "repo", "view", "--json", "owner,name", "-q", '.owner.login + "/" + .name'],
         capture_output=True,
         text=True,
     )
     if result.returncode != 0:
+        print(f"failed to get repo: {result.stderr.strip()}", file=sys.stderr)
         sys.exit(1)
     return result.stdout.strip()
```
162-166: Unknown modes silently treated as `--unresolved`.

A typo like `--unreslved` would be silently accepted. Consider warning when an unrecognized mode is used.

♻️ Optional: warn on unknown modes

```diff
-    if mode == "--unresolved" or mode not in ["--all", "--id", "--summary"]:
+    known_modes = ["--unresolved", "--all", "--id", "--summary"]
+    if mode not in known_modes:
+        print(f"unknown mode '{mode}', defaulting to --unresolved", file=sys.stderr)
+    if mode == "--unresolved" or mode not in known_modes:
         top_level = [c for c in top_level if not is_resolved(c)]
         print(f"showing {len(top_level)} unresolved comments")
```
Description
Related Issue
Checklist
- `make format` passes
- `make pre-merge` passes

Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.