VeriFlow
A QA automation copilot that turns Azure DevOps requirements (or manual user stories) into runnable Playwright browser tests, executes them, and presents structured, reviewable results.
The problem
Requirements live in user stories; automation lives in separate test code. VeriFlow closes that gap by generating executable browser tests directly from requirement text and making the outcomes easy to review.
What it does
- Imports Azure DevOps work items via WIQL to pull user stories and acceptance criteria directly.
- Generates a test plan, test cases, and a runnable Playwright spec (ES module syntax) from requirements plus acceptance criteria.
- Executes the generated suite against a target URL.
- Reports structured results: totals, durations, and per-test pass/fail/skip outcomes.
- Produces plain-English run summaries (AI-generated, with a deterministic fallback).
- Persists refresh-safe run history with filtering.
- Supports scheduled reruns (every 5m, 30m, or 1h), which fire only while the server is running.
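As a rough illustration of the WIQL import step, the sketch below builds (but does not send) a query request for user stories. The helper name buildWiqlRequest and its parameters (org, project, pat) are hypothetical and not taken from VeriFlow's code; the query text is only one plausible way to select stories. It assumes the Azure DevOps "Query By Wiql" REST endpoint and personal access token authentication.

```javascript
// Hypothetical helper: builds (but does not send) a WIQL query request.
// org, project, and pat are placeholders supplied by the caller.
function buildWiqlRequest({ org, project, pat, apiVersion = "7.1" }) {
  const url =
    `https://dev.azure.com/${encodeURIComponent(org)}/` +
    `${encodeURIComponent(project)}/_apis/wit/wiql?api-version=${apiVersion}`;

  // A WIQL query returns references to matching work items. Fields such as
  // the acceptance criteria are fetched in a follow-up work item request.
  const query =
    "SELECT [System.Id] FROM WorkItems " +
    "WHERE [System.TeamProject] = @project " +
    "AND [System.WorkItemType] = 'User Story' " +
    "ORDER BY [System.ChangedDate] DESC";

  return {
    url,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // A personal access token is sent as the Basic auth password
      // with an empty user name.
      Authorization: "Basic " + Buffer.from(`:${pat}`).toString("base64"),
    },
    body: JSON.stringify({ query }),
  };
}
```

A caller could split the result into a URL and request options and pass both to fetch, then request the title and acceptance criteria fields for the returned work item IDs.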
Approach & architecture
Claude turns a requirement into a test plan, test cases, and a Playwright spec written to generated.spec.js. The suite runs via Playwright with its JSON reporter, which produces an authoritative structured report. Run success is derived from real pass/fail/skip totals rather than scraped stdout, and history is persisted so reviews survive a refresh.
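To make the "structured report" idea concrete, here is a minimal sketch of flattening a Playwright JSON report into per-test outcomes. It assumes the report shape in which suites nest and hold specs, each spec holds tests with a status of "expected", "unexpected", "flaky", or "skipped", and each test holds a list of results with a duration. The function names are hypothetical, and counting flaky tests as passes is a choice made for this example, not something the source specifies.

```javascript
// Maps a Playwright test status onto the pass/fail/skip vocabulary.
// Treating "flaky" as a pass is an assumption made for this sketch.
function toOutcome(status) {
  if (status === "skipped") return "skip";
  if (status === "expected" || status === "flaky") return "pass";
  return "fail";
}

// Walks the (assumed) report tree and returns one entry per test.
function collectOutcomes(report) {
  const outcomes = [];

  function visit(suite, path) {
    const here = suite.title ? [...path, suite.title] : path;
    for (const spec of suite.specs ?? []) {
      for (const test of spec.tests ?? []) {
        // Sum the duration of every attempt (retries included).
        const durationMs = (test.results ?? []).reduce(
          (sum, result) => sum + (result.duration ?? 0),
          0
        );
        outcomes.push({
          title: [...here, spec.title].join(" > "),
          outcome: toOutcome(test.status),
          durationMs,
        });
      }
    }
    for (const child of suite.suites ?? []) visit(child, here);
  }

  for (const suite of report.suites ?? []) visit(suite, []);
  return outcomes;
}
```

Totals then follow from counting entries by outcome, so the run verdict never depends on the text the reporter prints to the terminal.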
How it fits together
Requirements → sandboxed, self-healing browser tests
Turns user stories into runnable Playwright tests, executes them safely, and reports results you can trust.
- Requirement intake: Azure DevOps work item (WIQL) or a manual user story.
- Claude test generation: Produces a test plan, test cases, and a runnable Playwright spec from acceptance criteria.
- Sandboxed execution: Docker (dropped capabilities, no secrets) or a hardened process fallback. LLM-generated code is treated as untrusted and isolated from the host.
- Playwright JSON reporter: Authoritative pass / fail / skip totals, with no brittle stdout scraping.
- AI self-heal (on selector failure): Capture live DOM → Claude proposes one replacement selector → rewrite spec → re-run once (bounded).
- Structured report + traceability matrix: Each acceptance criterion mapped to its covering test(s) and their pass/fail status.
- Run summary + persisted history: Plain-English summary (with fallback); refresh-safe history; scheduled reruns.
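The traceability matrix step can be sketched as follows. How VeriFlow links a criterion to its tests is not described here, so this example assumes a purely hypothetical convention: each generated test title carries a tag such as [AC-1], and each outcome is an object with a title and an outcome of "pass", "fail", or "skip". The function name and the "uncovered" status are likewise illustrative.

```javascript
// Hypothetical sketch: maps each acceptance criterion to the tests whose
// titles carry its tag (for example "[AC-1]") and rolls up a status.
function buildTraceabilityMatrix(criteria, outcomes) {
  return criteria.map((criterion) => {
    const tag = `[${criterion.id}]`;
    const covering = outcomes.filter((o) => o.title.includes(tag));

    let status = "uncovered"; // no test references this criterion
    if (covering.length > 0) {
      if (covering.some((o) => o.outcome === "fail")) status = "fail";
      else if (covering.every((o) => o.outcome === "skip")) status = "skip";
      else status = "pass";
    }

    return {
      id: criterion.id,
      text: criterion.text,
      tests: covering.map((o) => o.title),
      status,
    };
  });
}
```

Reporting "uncovered" separately from "pass" keeps a criterion with no tests from looking verified.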
Key engineering decisions
Trustworthy result reporting
Replaced brittle reporter-text scraping with Playwright's JSON reporter and derived success from authoritative pass/fail/skip totals. This fixed a false positive in which runs with zero tests were reported as "passed." Verified across all-pass, mixed, and zero-test scenarios.
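A minimal sketch of deriving the verdict from totals, including the zero-test guard, might look like this. The function name and the status labels ("no-tests", "skipped") are assumptions for illustration; the source only states that a zero-test run must not count as passed.

```javascript
// Hypothetical sketch: derives a run verdict from pass/fail/skip totals.
function deriveRunStatus({ passed = 0, failed = 0, skipped = 0 }) {
  const total = passed + failed + skipped;

  // Guard first: an empty run has nothing to fail, so a naive
  // "failed === 0" check would wrongly call it a pass.
  if (total === 0) return { status: "no-tests", total };
  if (failed > 0) return { status: "failed", total };
  if (passed === 0) return { status: "skipped", total }; // every test skipped
  return { status: "passed", total };
}
```

The three scenarios named above map onto the branches directly: an all-pass run returns "passed", a mixed run with any failure returns "failed", and a zero-test run returns "no-tests".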
Resilient run pipeline
Parses the structured report in both the success and non-zero-exit paths, and persists history for refresh-safe review.
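Playwright exits with a non-zero code when any test fails, yet the JSON reporter still emits its report, so the parse step should not depend on the exit code. The sketch below shows one way to do that. The helper name readReport and its input shape (exitCode, stdout) are hypothetical, and it assumes the report was written to stdout.

```javascript
// Hypothetical helper: extracts the JSON report from a finished process,
// whether or not it exited with code 0.
function readReport({ exitCode, stdout }) {
  let report = null;
  try {
    report = JSON.parse(stdout);
  } catch {
    // stdout was not valid JSON, for example because the runner
    // crashed before the reporter could write anything.
  }

  if (report === null || typeof report !== "object") {
    return { hasReport: false, exitCode, report: null };
  }
  // The exit code is passed through but never used as the verdict.
  return { hasReport: true, exitCode, report };
}
```

With a promisified child_process.execFile, a non-zero exit rejects with an error that still carries the captured stdout, so both the resolved and the rejected branch can hand their output to the same helper.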
Results & outcomes
- Requirements become executable tests; QA becomes repeatable and verifiable.
- Eliminated a false positive that reported zero-test runs as passing.
- Plain-English summaries, persisted history, and scheduled reruns make results easy to review over time.



