Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
bccc0f3
Improve auto-record assertion reliability for dynamic nav flows
vriesd Feb 22, 2026
084a217
Avoid content-coupled snapshot assertions for dynamic navigation clicks
vriesd Feb 22, 2026
529998a
Use neutral fixture values in snapshot suppression test
vriesd Feb 22, 2026
6bdc840
Narrow nav-like suppression to dynamic links and add regressions
vriesd Feb 22, 2026
660923a
Narrow content-card heuristic to extracted text fragments
vriesd Feb 22, 2026
0eda4cd
Stabilize improve plan payload application
vriesd Mar 8, 2026
ed444e4
Add deterministic planning scrutiny reports
vriesd Mar 9, 2026
71ac760
Add deterministic-planning-core user-testing validation artifacts
vriesd Mar 9, 2026
64f33cc
test: cover recording provenance persistence
vriesd Mar 9, 2026
975c748
Make record auto-apply preserve raw output
vriesd Mar 9, 2026
93d6419
validation: add recording-provenance scrutiny synthesis
vriesd Mar 9, 2026
7ec2da5
Align record fallback improve profile
vriesd Mar 9, 2026
c122dd9
validation: synthesize recording-provenance scrutiny rerun
vriesd Mar 9, 2026
4a015d7
validation: capture user-testing flow quirk
vriesd Mar 9, 2026
638921e
validation: add parity-and-determinism scrutiny synthesis
vriesd Mar 9, 2026
b9333c2
validation: add parity-and-determinism user-testing synthesis
vriesd Mar 9, 2026
a597df4
Improve determinism and readiness in record/improve flows
vriesd Mar 9, 2026
f57a114
Restore selector normalization and complete mainline rebase
vriesd Mar 9, 2026
4052cd0
Remove obsolete selector-normalize module and tests
vriesd Mar 9, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions .factory/library/user-testing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# User Testing

Testing surface, tools, setup steps, and known quirks.

**What belongs here:** how validators/workers should exercise the user-facing surface.

---

## Primary Surface

- CLI commands:
- `ui-test record`
- `ui-test improve --plan`
- `ui-test improve --apply-plan`
- `ui-test play`

## Preferred Validation Path

1. Use focused Vitest coverage for record/improve behavior.
2. Use `src/core/improve/improve.dynamic.integration.test.ts` for brittle-fixture repair proof.
3. Use `src/core/player.integration.test.ts` plus `scripts/run-headed-parity.test.mjs` for parity coverage.
4. Final gate: `npm run test:parity:headed`.

## Constraints

- Do not use live external websites as proof of determinism.
- Prefer controlled local fixtures and ephemeral localhost servers created by tests.
- Optional manual slice may use the example app on `127.0.0.1:5173` only if needed.

## Flow Validator Guidance: CLI

- Run CLI validations in isolated temp directories (`mktemp -d`) to avoid file collisions between parallel validators.
- Use unique filename prefixes per validator (for example `flow-a-*`, `flow-b-*`) for plan/report/output artifacts.
- Do not reuse another validator's generated plan, report, or YAML output paths.
- Keep execution local and deterministic: use repository fixtures/tests only; never use external websites.
- Prefer `vitest run <target files>` coverage that directly maps to assigned assertions.

## Known Validation Quirks

- `improve` runtime treats `data:` navigation URLs as relative and reports `Cannot resolve relative navigation URL`; this can be used as deterministic runtime-failure evidence in candidate-skip assertions.
- If transient local TypeScript worktree errors block `npm run build`, validators may use the existing `dist/bin/ui-test.js` binary for CLI-flow checks.
- Vitest in this repo only runs tests under configured include globs; ad-hoc validator test files created under `/tmp` will be ignored. Place any temporary flow tests under `scripts/` (with unique prefixes) if dynamic test generation is needed.
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
{
"featureId": "deterministic-improve-plan-core",
"reviewedAt": "2026-03-09T06:26:58.784479+00:00",
"commitId": "89a3b4a8105657f051c0d4836616fd6ef48998c3",
"transcriptSkeletonReviewed": true,
"diffReviewed": true,
"status": "pass",
"codeReview": {
"summary": "The implementation covers the feature contract: v2 improve plans now carry deterministic, reviewable payload metadata; plan generation remains non-mutating; apply-plan writes the reviewed payload directly; moved-file same-content applies with warning; and source drift fails closed before any YAML write. Focused service/schema/core tests provide direct evidence for these behaviors.",
"issues": []
},
"sharedStateObservations": [
{
"area": "knowledge",
"observation": "Known full-suite baseline failures discovered by the worker are not captured in shared mission library notes, so future workers may keep re-discovering them.",
"evidence": "Handoff `/Users/vriesd/.factory/missions/0b99802b-e794-4378-8e69-1426fde695f4/handoffs/2026-03-08T23-57-33-900Z__deterministic-improve-plan-core__ad067bfe-64b6-4fb6-aa31-552a7b848184.json` lists pre-existing failures in `src/core/player.integration.test.ts` and `src/core/improve/assertion-candidates-inventory.test.ts`; `.factory/library/user-testing.md` currently has no known-failures section."
},
{
"area": "skills",
"observation": "The skill requires a CLI-level sanity check but does not require the check to run against freshly built/current sources, which can allow stale `dist` verification.",
"evidence": "Transcript skeleton for session `ad067bfe-64b6-4fb6-aa31-552a7b848184` includes `node ./dist/index.js improve ... --apply-plan ...` without a preceding build step."
}
],
"addressesFailureFrom": null,
"summary": "Pass. I reviewed the feature entry, handoff, transcript skeleton, skill, and commit diff; the deterministic plan/apply behavior matches the feature requirements and no blocking code defects were found in the reviewed implementation."
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"featureId": "deterministic-planning-validator-blockers",
"reviewedAt": "2026-03-09T06:26:41Z",
"commitId": "89a3b4a8105657f051c0d4836616fd6ef48998c3",
"transcriptSkeletonReviewed": true,
"diffReviewed": true,
"status": "pass",
"codeReview": {
"summary": "The blocker fixes are minimal and aligned with deterministic-planning intent: overlay locator-handler triggering is narrowed to dialog/CMP contexts to avoid strict-mode collisions, network-idle warning coverage now matches the opt-in readiness contract, and assertion inventory expectations are updated to landmark-scoped fallback behavior.",
"issues": []
},
"sharedStateObservations": [
{
"area": "conventions",
"observation": "Handoff commit provenance is ambiguous for dirty-worktree fix features: the handoff commitId points to a prior commit whose diff does not contain this feature's three edited files, so review evidence had to come from current working-tree file diffs.",
"evidence": "Handoff commitId=89a3b4a8105657f051c0d4836616fd6ef48998c3; `git show --stat 89a3b4a` changes only improve-service/improve-plan files, while `git diff -- src/core/runtime/overlay-handler.ts src/core/player.integration.test.ts src/core/improve/assertion-candidates-inventory.test.ts` shows the actual blocker-fix edits."
}
],
"addressesFailureFrom": null,
"summary": "Pass. The feature restores validator blockers with targeted edits in exactly the expected areas (overlay handler trigger scope, network-idle opt-in test semantics, and scoped inventory fallback assertions), with no new blocking code issues found in the reviewed changes."
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
{
"milestone": "deterministic-planning-core",
"round": 1,
"status": "pass",
"validatorsRun": {
"test": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" test -- --maxWorkers=5",
"exitCode": 0
},
"typecheck": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" run typecheck:test",
"exitCode": 0
},
"lint": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" run lint",
"exitCode": 0
}
},
"reviewsSummary": {
"total": 2,
"passed": 2,
"failed": 0,
"failedFeatures": []
},
"blockingIssues": [],
"appliedUpdates": [],
"suggestedGuidanceUpdates": [
{
"target": "skills/record-improve-worker",
"suggestion": "Require CLI sanity checks that exercise built outputs to run after an explicit build step (or against source entrypoints) so validation cannot silently use stale dist artifacts.",
"evidence": "Review of deterministic-improve-plan-core noted transcript usage of `node ./dist/index.js improve ... --apply-plan ...` without a preceding build step.",
"isSystemic": true
},
{
"target": "AGENTS.md",
"suggestion": "For dirty-worktree fix features, require handoffs to include explicit edited file paths and diff anchors in addition to commitId, since commitId may not isolate the feature's changes.",
"evidence": "Review of deterministic-planning-validator-blockers found handoff commitId `89a3b4a...` did not include the blocker-fix files; evidence had to come from direct working-tree diffs.",
"isSystemic": true
}
],
"rejectedObservations": [
{
"observation": "Known full-suite baseline failures should be added to mission library notes.",
"reason": "ambiguous"
}
],
"previousRound": null
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
{
"milestone": "deterministic-planning-core",
"flowId": "cli-apply-core",
"assertionResults": [
{
"id": "VAL-IMPROVE-003",
"status": "pass",
"reason": "Applying the reviewed plan produced YAML structurally identical to plan.test payload, indicating apply used the reviewed payload rather than recomputing recommendations.",
"evidence": [
"CLI: \u2714 Applied improve plan: /tmp/utv-dpc-flow-b/flow-b-source.improve-plan.json",
"CLI: \u2714 Updated test file: /tmp/utv-dpc-flow-b/flow-b-source.improved.yaml",
"Comparison: val003_structural_match=true (plan.test vs flow-b-source.improved.yaml)",
"Artifact: /tmp/utv-dpc-flow-b/flow-b-source.improved.yaml"
]
},
{
"id": "VAL-IMPROVE-004",
"status": "pass",
"reason": "Applying the same plan to moved-but-identical content succeeded and emitted explicit fingerprint-match warning while writing plan-matching output.",
"evidence": [
"CLI: \u26a0 Plan path resolved to /tmp/utv-dpc-flow-b/flow-b-source.yaml, but the requested target matched by content fingerprint.",
"CLI: \u2714 Updated test file: /tmp/utv-dpc-flow-b/flow-b-moved.improved.yaml",
"Comparison: val004_structural_match=true (plan.test vs flow-b-moved.improved.yaml)",
"Artifact: /tmp/utv-dpc-flow-b/flow-b-moved.yaml"
]
},
{
"id": "VAL-IMPROVE-005",
"status": "pass",
"reason": "Applying the reviewed plan to drifted source content failed closed with mismatch error and wrote no output; source file remained unchanged.",
"evidence": [
"Execution: val005_exit_code=1",
"CLI: Plan source mismatch: /tmp/utv-dpc-flow-b/flow-b-drift.yaml no longer matches the test content used to generate /tmp/utv-dpc-flow-b/flow-b-source.improve-plan.json.",
"File integrity: val005_source_unchanged=true",
"Write guard: val005_output_written=no",
"Log: /tmp/utv-dpc-flow-b/val005-cli.log"
]
}
],
"frictions": [
"npm run build failed in current worktree due TypeScript exactOptionalPropertyTypes errors, so validation used existing dist CLI binary (/Users/vriesd/projects/ui-test/dist/bin/ui-test.js)."
],
"blockers": [],
"toolsUsed": [
"Read",
"Grep",
"Execute",
"TodoWrite"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
{
"milestone": "deterministic-planning-core",
"flowId": "cli-plan-core",
"assertionResults": [
{
"id": "VAL-IMPROVE-001",
"status": "pass",
"reason": "`ui-test improve <file> --plan` generated both plan and report artifacts while leaving the source YAML unchanged.",
"evidence": [
"Command: node /Users/vriesd/projects/ui-test/dist/bin/ui-test.js improve /tmp/utv-dpc-flow-a/flow-a-plan.yaml --plan --report /tmp/utv-dpc-flow-a/artifacts/flow-a-plan.improve-report.json (exit 0)",
"Artifacts: /tmp/utv-dpc-flow-a/flow-a-plan.improve-plan.json and /tmp/utv-dpc-flow-a/artifacts/flow-a-plan.improve-report.json both exist",
"Source checksum before/after plan runs remained cfb0f03f77aae0cc408480bf107337f28490c83d28540de1f9c1d9301ffab789"
]
},
{
"id": "VAL-IMPROVE-002",
"status": "pass",
"reason": "Repeated `--plan` runs on unchanged input produced identical normalized plan content, stable proposed test content, and stable assertion candidate ordering.",
"evidence": [
"Run1 artifact: /tmp/utv-dpc-flow-a/artifacts/flow-a-plan.improve-plan.run1.json; Run2 artifact: /tmp/utv-dpc-flow-a/flow-a-plan.improve-plan.json",
"Comparator output: normalized_plan_equal=True, proposed_test_equal=True",
"Comparator output: run1_assertion_order=['assertValue','assertVisible'], run2_assertion_order=['assertValue','assertVisible']; run1_indexes=[1,2], run2_indexes=[1,2]"
]
},
{
"id": "VAL-IMPROVE-006",
"status": "pass",
"reason": "Runtime-failing assertion candidates were surfaced as skipped and runtime-failing steps were retained by safety guard, with no silent assertion writes into YAML.",
"evidence": [
"Command: node /Users/vriesd/projects/ui-test/dist/bin/ui-test.js improve /tmp/utv-dpc-flow-a/flow-a-candidate-skip.yaml --apply --in-place --report /tmp/utv-dpc-flow-a/artifacts/flow-a-candidate-skip.improve-report.json (exit 0)",
"CLI summary included: Assertion apply status: skipped_runtime_failure=2 and runtimeFailingStepsRetained=2, runtimeFailingStepsRemoved=0",
"Report /tmp/utv-dpc-flow-a/artifacts/flow-a-candidate-skip.improve-report.json contains assertionCandidates[*].applyStatus=skipped_runtime_failure and diagnostics code runtime_failing_step_retained",
"Diff /tmp/utv-dpc-flow-a/artifacts/flow-a-candidate-skip.before.yaml vs /tmp/utv-dpc-flow-a/flow-a-candidate-skip.yaml shows formatting/key-order changes only; no new assert* steps were inserted"
]
},
{
"id": "VAL-IMPROVE-007",
"status": "pass",
"reason": "When apply-mode mutations produced schema-invalid output (no remaining steps), CLI failed closed, wrote diagnostics report, and left source YAML untouched.",
"evidence": [
"Command: node /Users/vriesd/projects/ui-test/dist/bin/ui-test.js improve /tmp/utv-dpc-flow-a/flow-a-invalid-output.yaml --apply --in-place --assertions none --report /tmp/utv-dpc-flow-a/artifacts/flow-a-invalid-output.improve-report.json (exit 1)",
"Terminal output: 'Improve apply aborted: generated output is invalid and was not written' with validation issue 'steps: Test must have at least one step'",
"Report /tmp/utv-dpc-flow-a/artifacts/flow-a-invalid-output.improve-report.json includes diagnostics code apply_write_blocked_invalid_output",
"Diff /tmp/utv-dpc-flow-a/artifacts/flow-a-invalid-output.before.yaml vs /tmp/utv-dpc-flow-a/flow-a-invalid-output.yaml returned no differences (exit 0)"
]
}
],
"frictions": [
"Navigation to data: URLs is treated as a relative navigation in improve runtime ('Cannot resolve relative navigation URL'); used this deterministic behavior intentionally for runtime-failure candidate/retention validation evidence."
],
"blockers": [],
"toolsUsed": [
"execute",
"read",
"grep"
]
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
{
"milestone": "deterministic-planning-core",
"round": 1,
"status": "pass",
"assertionsSummary": {
"total": 7,
"passed": 7,
"failed": 0,
"blocked": 0
},
"passedAssertions": [
"VAL-IMPROVE-001",
"VAL-IMPROVE-002",
"VAL-IMPROVE-003",
"VAL-IMPROVE-004",
"VAL-IMPROVE-005",
"VAL-IMPROVE-006",
"VAL-IMPROVE-007"
],
"failedAssertions": [],
"blockedAssertions": [],
"appliedUpdates": [
{
"target": "user-testing.md",
"description": "Added Flow Validator Guidance for CLI surface covering temp-directory isolation, unique artifact prefixes, and local-only deterministic boundaries for parallel subagents.",
"source": "setup"
},
{
"target": "user-testing.md",
"description": "Captured CLI validation quirks from flow reports: deterministic data: URL relative-resolution behavior and fallback to existing dist binary when local build is blocked by unrelated worktree type errors.",
"source": "flow-report"
}
],
"previousRound": null
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"featureId": "brittle-fixture-repair-determinism",
"reviewedAt": "2026-03-09T07:43:44Z",
"commitId": "aad204e05c88dc7981b00a9c2f5990435c334690",
"transcriptSkeletonReviewed": true,
"diffReviewed": true,
"status": "pass",
"codeReview": {
"summary": "Reviewed transcript skeleton, handoff, and relevant improve/runtime selector+assertion code paths for the brittle fixture contract. The controlled baseline failure + repaired replay flow is covered in improve.dynamic.integration.test.ts, runtime step removal is safety-gated on disposition+mutationSafety+confidence in improve-runner.ts, content/business safeguards are present in runtime-failure-classifier.ts, and deterministic tie-break behavior is exercised in assertion-apply.test.ts. No blocking or non-blocking correctness issues were found against the feature contract.",
"issues": []
},
"sharedStateObservations": [
{
"area": "conventions",
"observation": "Feature traceability is weaker than ideal because the handoff commit for this feature does not contain the feature implementation diff; the worker validated behavior already present in a dirty worktree.",
"evidence": "Handoff for worker session eeb1546a-f90c-4174-aa32-55801f27d419 states \"No new edits were required\" while `git show aad204e05c88dc7981b00a9c2f5990435c334690` only changes `.factory/library/user-testing.md`."
}
],
"addressesFailureFrom": null,
"summary": "Pass. The brittle fixture repair/determinism contract is satisfied by the reviewed code paths and tests, with no correctness defects identified; noted a non-blocking traceability convention gap for commit-to-feature mapping."
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"featureId": "headed-parity-surface-stabilization",
"reviewedAt": "2026-03-09T07:42:23Z",
"commitId": "aad204e",
"transcriptSkeletonReviewed": true,
"diffReviewed": true,
"status": "pass",
"codeReview": {
"summary": "The headed parity implementation meets the feature contract: resolver-based suite discovery now fails loudly on zero-match suites, mission-critical player/improve parity files are included, and parity-focused integration coverage demonstrates the same improved YAML passes in headless and headed runs when parity is required.",
"issues": [
{
"file": "scripts/run-headed-parity.test.mjs",
"line": 16,
"severity": "non_blocking",
"description": "The resolver contract test allows extra files (`length >= 2`) rather than asserting the exact expected parity set, so future broad-glob drift could silently expand parity scope without a failing test."
}
]
},
"sharedStateObservations": [],
"addressesFailureFrom": null,
"summary": "Review passed. The feature behavior is correctly implemented and validated for headed/headless parity, with one non-blocking test strictness gap in resolver-scope assertions."
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
{
"milestone": "parity-and-determinism-hardening",
"round": 1,
"status": "pass",
"validatorsRun": {
"test": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" test -- --maxWorkers=5",
"exitCode": 0
},
"typecheck": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" run typecheck:test",
"exitCode": 0
},
"lint": {
"passed": true,
"command": "npm --prefix \"/Users/vriesd/projects/ui-test\" run lint",
"exitCode": 0
}
},
"reviewsSummary": {
"total": 2,
"passed": 2,
"failed": 0,
"failedFeatures": []
},
"blockingIssues": [],
"appliedUpdates": [],
"suggestedGuidanceUpdates": [
{
"target": "AGENTS.md",
"suggestion": "Require dirty-worktree feature handoffs to include explicit edited file paths and diff anchors in addition to commit IDs so scrutiny can reliably trace implementation evidence.",
"evidence": "Review for feature brittle-fixture-repair-determinism observed worker session eeb1546a-f90c-4174-aa32-55801f27d419 validated behavior in dirty worktree while cited commit aad204e only changed .factory/library/user-testing.md.",
"isSystemic": true
}
],
"rejectedObservations": [],
"previousRound": null
}
Loading