---
name: agent-ops-testing
description: "Test strategy, execution, and coverage analysis. Use when designing tests, running test suites, or analyzing test results beyond baseline checks."
category: core
invokes: [agent-ops-state, agent-ops-tasks, agent-ops-debugging]
invoked_by: [agent-ops-planning, agent-ops-implementation, agent-ops-critical-review]
state_files:
  read: [constitution.md, baseline.md, focus.md, issues/*.md]
  write: [focus.md, issues/*.md]
---

# Testing Workflow

**Works with or without the `aoc` CLI installed.** Issue tracking can be done via direct file editing.

## Purpose

Provide structured guidance for test design, execution, and analysis that goes beyond baseline capture. This skill covers test strategy during planning, incremental testing during implementation, and coverage analysis.

## Test Commands (from constitution)

```bash
# Python (uv/pytest)
uv run pytest                           # Run all tests
uv run pytest tests/ -v                 # Verbose output
uv run pytest tests/ -m "not slow"      # Skip slow tests
uv run pytest tests/ --tb=short -q      # Quick summary
uv run pytest --cov=src --cov-report=html  # Coverage report

# TypeScript/Node (vitest/jest)
npm run test                            # Run all tests
npm run test -- --coverage              # With coverage

# .NET (dotnet test)
dotnet test                             # Run all tests
dotnet test --collect:"XPlat Code Coverage"  # With coverage
```

## Issue Tracking (File-Based — Default)

| Operation | How to Do It |
|-----------|--------------|
| Create test issue | Append to `.agent/issues/medium.md` with type `TEST` |
| Create bug from failure | Append to `.agent/issues/high.md` with type `BUG` |
| Log test results | Edit issue's `### Log` section in priority file |

### Example: Post-Test Issue Creation (File-Based)

1. Increment `.agent/issues/.counter`
2. Append issue to appropriate priority file
3. Add log entry with test run results
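
A minimal sketch of these three steps in Python, assuming a plain-integer `.counter` file; the entry layout and the `TEST-NNNN` ID format are illustrative, not prescribed:

```python
from datetime import date
from pathlib import Path

def create_test_issue(title: str, log_line: str) -> str:
    """Append a TEST issue to the medium-priority file (illustrative layout)."""
    issues_dir = Path(".agent/issues")
    counter_file = issues_dir / ".counter"

    # 1. Increment the counter
    next_id = int(counter_file.read_text().strip()) + 1
    counter_file.write_text(str(next_id))

    # 2. Append the issue to the appropriate priority file
    issue_id = f"TEST-{next_id:04d}"
    entry = (
        f"\n## {issue_id}: {title}\n"
        f"- Type: TEST\n"
        f"- Created: {date.today().isoformat()}\n"
        f"\n### Log\n"
        f"- {log_line}\n"  # 3. Log entry with the test run results
    )
    with (issues_dir / "medium.md").open("a", encoding="utf-8") as f:
        f.write(entry)
    return issue_id
```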

## CLI Integration (when `aoc` is available)

When the `aoc` CLI is detected in `.agent/tools.json`, these commands provide convenience shortcuts:

| Operation | Command |
|-----------|---------|
| Create test issue | `aoc issues create --type TEST --title "Add tests for..."` |
| Create bug from failure | `aoc issues create --type BUG --priority high --title "Test failure: ..."` |
| Log test results | `aoc issues update <ID> --log "Tests: 45 pass, 2 fail"` |

## Test Isolation (MANDATORY)

**Tests must NEVER create, modify, or delete files in the project folder.**

### Unit Tests
- Use mocks/patches for ALL file system operations
- Use in-memory data structures where possible
- NEVER call real file I/O against project paths
- Use `unittest.mock.patch` for `Path`, `open()`, file operations
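
For example, file access can be patched at the point of use so the test never touches a real path; `read_settings` here is a hypothetical stand-in for project code:

```python
from pathlib import Path
from unittest.mock import patch

def read_settings(path: str) -> str:
    """Hypothetical stand-in for project code that reads a file."""
    return Path(path).read_text()

def test_read_settings_never_touches_real_files():
    # Patch Path.read_text so no real file I/O happens
    with patch("pathlib.Path.read_text", return_value="debug=true"):
        assert read_settings("config/settings.ini") == "debug=true"
```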

### Integration Tests
- ALWAYS use pytest's `tmp_path` fixture (auto-cleaned)
- Use Docker containers for service dependencies (API, DB, etc.)
- Fixtures MUST handle cleanup on both success AND failure (see the fixture sketch below)
- Test data lives ONLY in temp directories
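
A sketch of an integration-style fixture whose teardown runs whether the test passes or fails, using SQLite purely as a stand-in for a heavier service dependency:

```python
import sqlite3
import pytest

@pytest.fixture
def temp_database(tmp_path):
    """Connection to a throwaway SQLite file inside tmp_path."""
    conn = sqlite3.connect(str(tmp_path / "test.sqlite"))
    try:
        yield conn    # the test runs here
    finally:
        conn.close()  # runs on success AND failure; tmp_path itself is auto-cleaned

def test_insert_row(temp_database):
    temp_database.execute("CREATE TABLE runs (id INTEGER)")
    temp_database.execute("INSERT INTO runs VALUES (1)")
    assert temp_database.execute("SELECT COUNT(*) FROM runs").fetchone()[0] == 1
```

A real service dependency (API, DB container) would follow the same yield/finally shape, with the teardown removing the container.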

### Forbidden Patterns
```python
# ❌ NEVER do this - pollutes project
Path(".agent/test.md").write_text("test")
Path("src/data/fixture.json").write_text("{}")
open("tests/output.log", "w").write("log")

# ✅ Always use tmp_path
def test_example(tmp_path):
    test_file = tmp_path / "test.md"
    test_file.write_text("test")  # Auto-cleaned
```

### Review Checklist (before approving tests)
- [ ] No hardcoded paths to project directories
- [ ] All file operations use `tmp_path` or mocks
- [ ] Integration tests use fixtures with cleanup
- [ ] Docker fixtures auto-remove containers

## When to Use

- During planning: designing test strategy for new features
- During implementation: running incremental tests
- During review: analyzing coverage and gaps
- On demand: investigating test failures, improving test suite

## Preconditions

- `.agent/constitution.md` exists with confirmed test command
- `.agent/baseline.md` exists (for comparison)

## Test Strategy Design

### For New Features

1. **Identify test levels needed**:
   - Unit tests: isolated function/method behavior
   - Integration tests: component interaction
   - E2E tests: user-facing workflows (if applicable)

2. **Define test cases from requirements** (a parametrized sketch follows this list):
   - Happy path: expected inputs → expected outputs
   - Edge cases: boundary values, empty inputs, max values
   - Error cases: invalid inputs, failure scenarios
   - Regression cases: ensure existing behavior unchanged

3. **Document in task/plan**:
   ```markdown
   ## Test Strategy
   - Unit: [list of unit test cases]
   - Integration: [list of integration scenarios]
   - Edge cases: [specific edge cases to cover]
   - Not testing: [explicitly excluded with rationale]
   ```
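
A sketch of how those case categories map onto a parametrized test; `parse_quantity` is a hypothetical function included only to make the example runnable:

```python
import pytest

def parse_quantity(raw: str) -> int:
    """Hypothetical function under test: parse a non-negative quantity."""
    value = int(raw)
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("3", 3),            # happy path
        ("0", 0),            # edge case: boundary value
        ("999999", 999999),  # edge case: large value
    ],
)
def test_parse_quantity_valid(raw, expected):
    assert parse_quantity(raw) == expected

@pytest.mark.parametrize("raw", ["", "abc", "-1"])  # error cases: invalid inputs
def test_parse_quantity_invalid(raw):
    with pytest.raises(ValueError):
        parse_quantity(raw)
```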

### For Bug Fixes

1. Write a failing test FIRST that reproduces the bug (see the sketch below)
2. Fix the bug
3. Verify test passes
4. Check for related regression tests needed
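
A minimal illustration of step 1, assuming a hypothetical off-by-one style bug in a `paginate` helper; the test is written first and fails against the buggy code:

```python
def paginate(items, page_size):
    """Hypothetical buggy helper: silently drops the final partial page."""
    full_pages = len(items) // page_size  # BUG: ignores the remainder
    return [items[i * page_size:(i + 1) * page_size] for i in range(full_pages)]

def test_paginate_keeps_last_partial_page():
    # Step 1: this fails against the buggy implementation above
    pages = paginate(list(range(10)), page_size=4)
    assert pages == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

After the fix (rounding the page count up), the same test passes unchanged and remains as the regression guard for step 4.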

## Test Execution

### Incremental Testing (during implementation)

After each implementation step:
1. Run the smallest reliable test subset covering changed code
2. If tests fail: stop, diagnose, and fix before proceeding
3. Log test results in focus.md

### Full Test Suite (end of implementation)

1. Run complete test command from constitution
2. Compare results to baseline
3. Investigate ANY new failures (even in unrelated areas)

### Test Command Patterns

```bash
# Run specific test file
<test-runner> path/to/test_file.py

# Run tests matching pattern
<test-runner> -k "test_feature_name"

# Run with coverage
<test-runner> --coverage

# Run failed tests only (re-run)
<test-runner> --failed
```

Actual commands must come from the constitution.

## Coverage Analysis

### Confidence-Based Coverage Thresholds (MANDATORY)

**Coverage requirements scale with confidence level:**

| Confidence | Line Coverage | Branch Coverage | Enforcement |
|------------|---------------|-----------------|-------------|
| LOW | ≥90% on changed code | ≥85% on changed code | HARD — blocks completion |
| NORMAL | ≥80% on changed code | ≥70% on changed code | SOFT — warning if missed |
| HIGH | Tests pass | N/A | MINIMAL — existing tests only |

**Rationale:**
- LOW confidence = more unknowns = more code paths to verify
- HIGH confidence = well-understood = existing tests sufficient

**Enforcement:**
```
🎯 COVERAGE CHECK — {CONFIDENCE} Confidence

Required: ≥{line_threshold}% line, ≥{branch_threshold}% branch
Actual:   {actual_line}% line, {actual_branch}% branch

[PASS] Coverage meets threshold
— OR —
[FAIL] Coverage below threshold — must add tests before completion
```
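
A sketch of how this check might be automated, assuming a Cobertura-style `coverage.xml` (as written by `coverage xml` or `pytest --cov-report=xml`). Note that it reads whole-report rates; restricting the check to changed code would need a diff-coverage tool on top:

```python
import xml.etree.ElementTree as ET

def check_coverage(report_path: str, line_threshold: float, branch_threshold: float) -> bool:
    """Compare a Cobertura-style coverage report against the confidence thresholds."""
    root = ET.parse(report_path).getroot()
    line_pct = float(root.get("line-rate", "0")) * 100
    branch_pct = float(root.get("branch-rate", "0")) * 100

    print(f"Required: ≥{line_threshold}% line, ≥{branch_threshold}% branch")
    print(f"Actual:   {line_pct:.1f}% line, {branch_pct:.1f}% branch")
    return line_pct >= line_threshold and branch_pct >= branch_threshold

# Example: LOW confidence thresholds from the table above
# check_coverage("coverage.xml", line_threshold=90, branch_threshold=85)
```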

**For LOW confidence failures:**
- Coverage failure is a HARD BLOCK
- Cannot proceed until threshold is met
- Document why if the threshold is truly unachievable (rare)

### When to Analyze Coverage

- After completing a feature (before critical review)
- When investigating untested code paths
- During improvement discovery

### Coverage Metrics to Track

| Metric | Target | Notes |
|--------|--------|-------|
| Line coverage | ≥80% for new code | Not a hard rule; quality over quantity |
| Branch coverage | Critical paths covered | Focus on decision points |
| Uncovered lines | Document rationale | Some code legitimately untestable |

### Coverage Gaps to Flag

- New code with 0% coverage → **must address**
- Error handling paths untested → **should address**
- Complex logic untested → **investigate**
- Generated/boilerplate untested → **acceptable**

## Test Quality Checklist

### Good Tests

- [ ] Test behavior, not implementation (see the example below)
- [ ] Independent (no test order dependencies)
- [ ] Deterministic (same result every run)
- [ ] Fast (< 1 second per unit test)
- [ ] Readable (test name describes scenario)
- [ ] Minimal mocking (only external dependencies)
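
An illustration of the first item, with a hypothetical `apply_discount` function: the first test asserts observable behavior and survives refactoring, while the second pins an internal detail and breaks as soon as the rounding strategy changes.

```python
from unittest.mock import patch

def apply_discount(price: float, rate: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - rate), 2)

# ✅ Behavior: asserts the observable result
def test_discount_reduces_price():
    assert apply_discount(100.0, 0.2) == 80.0

# ❌ Implementation detail: asserts HOW the result is computed
def test_discount_calls_round():
    with patch("builtins.round") as mock_round:
        apply_discount(100.0, 0.2)
    mock_round.assert_called_once()
```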

### Anti-Patterns to Avoid

- ❌ Testing implementation details (breaks on refactor)
- ❌ Excessive mocking (tests mock, not real code)
- ❌ Flaky tests (intermittent failures)
- ❌ Slow tests without justification
- ❌ Tests that require manual setup
- ❌ Commented-out tests

## Failure Investigation

When tests fail unexpectedly, **invoke `agent-ops-debugging`**:

1. **Apply systematic debugging process**:
   - Isolate: Run the failing test alone
   - Reproduce: Confirm the failure is consistent (see the rerun sketch at the end of this section)
   - Form hypothesis: What might cause this?
   - Test hypothesis: Add logging, inspect state

2. **Categorize the failure**:
   | Category | Evidence | Action |
   |----------|----------|--------|
   | Agent's change | Test passed in baseline | Fix the change |
   | Pre-existing | Test failed in baseline | Document, create issue |
   | Flaky | Intermittent, no code change | Fix test or document |
   | Environment | Works elsewhere | Check constitution assumptions |

3. **Handoff decision**:
   ```
   🔍 Test failure analysis:
   
   - Test: {test_name}
   - Category: {agent_change | pre_existing | flaky | environment}
   - Root cause: {diagnosis}
   
   Next steps:
   1. Fix and re-run (if agent's change)
   2. Create issue and continue (if pre-existing)
   3. Deep dive with /agent-debug (if unclear)
   ```
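
For the Reproduce step above, a small sketch that reruns one test repeatedly to separate flaky from consistent failures; the node ID is a placeholder, and plugins such as `pytest-repeat` offer the same thing more conveniently:

```python
import subprocess

def rerun_test(node_id: str, runs: int = 10) -> None:
    """Rerun a single test repeatedly and tally failures to spot flakiness."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            ["uv", "run", "pytest", node_id, "-q", "--tb=no"],
            capture_output=True,
        )
        failures += result.returncode != 0
    # 0/N → could not reproduce; N/N → consistent failure; in between → flaky
    print(f"{node_id}: failed {failures}/{runs} runs")

# rerun_test("tests/test_payments.py::test_process_async")
```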

## Output

After test activities, update:
- `.agent/focus.md`: test results summary
- `.agent/baseline.md`: if establishing new baseline

## Issue Discovery After Testing

**After test analysis, invoke `agent-ops-tasks` discovery procedure:**

1) **Collect test-related findings:**
   - Failing tests → `BUG` (high)
   - Missing test coverage → `TEST` (medium)
   - Flaky tests identified → `CHORE` (medium)
   - Test anti-patterns found → `REFAC` (low)
   - Missing edge case tests → `TEST` (medium)

2) **Present to user:**
   ```
   📋 Test analysis found {N} items:
   
   High:
   - [BUG] Flaky test: PaymentService.processAsync (failed 2/10 runs)
   
   Medium:
   - [TEST] Missing coverage for error handling in UserController
   - [TEST] No edge case tests for empty input scenarios
   
   Low:
   - [REFAC] Tests have excessive mocking in OrderService.test.ts
   
   Create issues for these? [A]ll / [S]elect / [N]one
   ```

3) **After creating issues:**
   ```
   Created {N} test-related issues. What's next?
   
   1. Start fixing highest priority (BUG-0024@abc123 - flaky test)
   2. Continue with current work
   3. Review test coverage report
   ```
