Test Matrix for Capture Reliability

This matrix is the companion to architecture.md (where the I-1..I-15 invariants live) and anti-patterns.md. It exists to make one question answerable at a glance: for every failure avenue we know about, is there a test that would have caught it?

Every failure mode in the 2026-04-22 sev1 investigation (issues #49–#53, #58, plus the two hotfixes #54/#55) and every invariant from architecture.md must appear as a row. When a row is marked blocked-on-*, the test file and a #[ignore] skeleton must still exist, so that enabling the infrastructure is a one-line change rather than “remember to write the test later”.
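
As an illustration of the skeleton convention, a blocked row might be committed roughly like the sketch below. It assumes a hypothetical pure predicate, heartbeat_is_stale, and a placeholder observed age; only the 180 s limit (row F-13) and the test name i7_watchdog_heartbeat come from this matrix.

```rust
use std::time::Duration;

/// Staleness limit from row F-13 (watchdog heartbeat stale > 180 s).
const HEARTBEAT_STALE_AFTER: Duration = Duration::from_secs(180);

/// Hypothetical pure predicate: lets the assertion logic exist before
/// the watchdog process (P1.1) does.
fn heartbeat_is_stale(age: Duration) -> bool {
    age > HEARTBEAT_STALE_AFTER
}

#[cfg(test)]
mod tests {
    use super::*;

    // Removing the `#[ignore]` line is the one-line change that enables the test.
    #[test]
    #[ignore = "blocked on P1.1 (watchdog process)"]
    fn i7_watchdog_heartbeat() {
        // Placeholder value: a real test would read the age from the watchdog.
        let observed_age = Duration::from_secs(30);
        assert!(!heartbeat_is_stale(observed_age));
    }
}
```

The ignored test still compiles on every build, so signature drift is caught early, and cargo test -- --ignored runs it on demand.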

Conventions

“Invariant” refers to I-1..I-15 defined in architecture.md.

Failure modes

| # | Failure mode | Trigger / evidence | Test type | Location | Status | Invariant |
|---|---|---|---|---|---|---|
| F-1 | tmux hook new-window without -t lands in wrong session | #48 (1330113) | shell integration | (was tests/shell/test-claude-session-hook.sh) | retired in T-8 (tmux path deleted; failure mode no longer reachable) | I-2 |
| F-2 | Firefox extension dist/ absent at runtime | #54: build pipeline produced no dist/*.js; hippo doctor didn’t flag it | rust integration (doctor check) | crates/hippo-daemon/src/commands.rs #[cfg(test)] mod tests | existing (#54) | I-4 |
| F-3 | hippo ingest claude-session fires but claude_sessions rows are 0 | #58 | rust integration | crates/hippo-daemon/tests/claude_session.rs | added-by-#58-fix (in-flight, worktree agent-ac306c6f) | I-2 |
| F-4 | Redaction regex false-positives drop or corrupt legitimate events | #52 | rust unit (negative cases) | crates/hippo-core/src/redaction.rs (mod tests) | new (this PR) | I-5 |
| F-5 | claade / other wrappers break PID-chain assumption | #50 | shell integration | (was tests/shell/test-hook-pid-ppid.sh) | retired in T-8 (the slim hook no longer walks PIDs) | I-2 |
| F-6 | Native Messaging manifest path drifts after binary move | user moves binary; doctor never cross-checks manifest path field | rust integration (doctor check) | crates/hippo-daemon/tests/nm_manifest_doctor.rs | new (this PR): skeleton #[ignore] until doctor grows the check | source-change-required |
| F-7 | Daemon restart during NM send silently drops browser visits | #51 | rust integration | crates/hippo-daemon/tests/nm_restart_integration.rs | new (this PR): fallback-file-survives-restart exercised; end-to-end NM send across restart is #[ignore] pending a test harness for the NM stdio stream | I-4 |
| F-8 | Fallback JSONL accumulates > 24 h while daemon is up (drain broken) | design invariant I-9 | rust integration (doctor check) | crates/hippo-daemon/tests/fallback_age_doctor.rs | new (this PR): skeleton #[ignore] with note that doctor currently only counts fallback files, does not inspect mtime | I-9 / source-change-required |
| F-9 | Apr 10–17 capture blackout (root cause unknown) | #49 | investigation pending | | blocked-on-#49 | |
| F-10 | Claude JSONL grows but no claude_sessions row within 5 min | invariant I-2 | rust integration (probe) | crates/hippo-daemon/tests/capture_invariants.rs::i2_claude_session_end_to_end | blocked-on-P2.1 (FS-watcher) + P0.1 (source_health) | I-2 |
| F-11 | Shell hook fires in 2 min but no events row appears | invariant I-1 | rust integration (probe) | crates/hippo-daemon/tests/capture_invariants.rs::i1_shell_liveness | blocked-on-P0.1 (source_health) | I-1 |
| F-12 | Synthetic probe round-trip > 15 min | invariant I-8 | rust integration | crates/hippo-daemon/tests/capture_invariants.rs::i8_probe_round_trip | blocked-on-P2.2 (synthetic probes) | I-8 |
| F-13 | Watchdog heartbeat stale > 180 s | invariant I-7 | rust integration | crates/hippo-daemon/tests/capture_invariants.rs::i7_watchdog_heartbeat | blocked-on-P1.1 (watchdog process) | I-7 |
| F-14 | source_health stops updating when brain is down | invariant I-10 (decoupling) | rust integration (kill-brain canary) | crates/hippo-daemon/tests/capture_invariants.rs::i10_decoupled_from_brain | blocked-on-P0.2 (source_health writes on every capture path) | I-10 |
| F-15 | Hippo’s own CI / sev1 failures never graduate into lessons | #53 | brain unit (xfail) | brain/tests/test_lessons_graduation_hippo.py | new (this PR): @pytest.mark.xfail(reason="tracked in #53"); fails-closed on fix | |
| F-16 | Schema version drift between daemon and brain | v0.13.0 handshake incident | rust integration | crates/hippo-daemon/tests/schema_handshake.rs (existing) + negative case added | existing + new (this PR) | |
| F-17 | Silent error swallowing via .filter_map(Result::ok) in capture paths | AP-11 in anti-patterns.md; observed at crates/hippo-core/src/storage.rs:805 | static analysis (semgrep) + regression test for the rule itself | .semgrep.yml + tests/semgrep/silent_swallow_fixture.rs | new (this PR): rule file + fixture; wiring into CI (adding .semgrep.yml to the security workflow) is follow-up because security.yml is currently path-scoped to shell/ only | AP-11 |
| F-18 | tmux base-index != 0 causes “index N in use” | #48 (1330113, pre-fix path) | shell integration | (was tests/shell/test-claude-session-hook.sh) | retired in T-8 | I-2 |
| F-19 | Session name with shell metacharacters (spaces, colons) breaks hook | defensive | shell integration | (was tests/shell/test-claude-session-hook-extended.sh) | retired in T-8 (slim hook no longer interpolates session names) | I-2 |
| F-20 | No tmux server running at hook time (batch fallback path) | hook line 106-110 | shell integration | (was tests/shell/test-claude-session-hook-extended.sh) | retired in T-8 (no tmux fallback path exists; manual hippo ingest claude-session <path> is the recovery) | I-2 |
| F-21 | $TMUX_PANE unset but tmux server is up (fallback hippo-session reuse) | hook line 96-105 | shell integration | (was tests/shell/test-claude-session-hook-extended.sh) | retired in T-8 | I-2 |
| F-22 | check_claude_session_hook_at false-OK when settings.json is malformed / not-object | regression for #45, #46, #48 | rust unit | crates/hippo-daemon/src/commands.rs mod tests (test_hook_check_structural_type_mismatch, test_hook_check_not_configured, test_hook_check_match_missing_script) | existing | |
| F-23 | Claude settings.json hooks.SessionStart array has multiple hippo entries, one stale, one current | observed during #48 rollout | rust unit | same as F-22 (test_hook_check_multiple_entries_one_exact_match) | existing | |
| F-24 | hippo doctor output for hook check is not behaviourally asserted, only smoke-tested (“does not panic”) | code review of commands.rs mod tests | rust unit (assert on captured stdout) | same as F-22 | source-change-required (would need println! → returning String, or a writeln!(w, …) injection) | |
| F-25 | INSERT OR IGNORE on (session_id, segment_index) silently freezes segment content at first partial extraction (Bug A) | 2026-04-26 investigation; AP-12; ../archive/capture-reliability-overhaul/11-watcher-data-loss-fix.md | rust unit (hash, upsert, enqueue gate, sweep, backfill) + migration + Python | crates/hippo-daemon/src/claude_session.rs, crates/hippo-daemon/src/watch_claude_sessions.rs, crates/hippo-daemon/src/backfill.rs, crates/hippo-core/src/storage.rs, brain/tests/test_claude_sessions.py | new (T-A.1–T-A.7) | I-2 |
| F-26 | Opencode poller mixes time_created / time_updated cursor semantics; silently skips updated sessions | review of PR #149: cursor field name and read-query filter disagree | rust integration + unit | crates/hippo-daemon/tests/opencode_session.rs (poll-tick end-to-end) + crates/hippo-daemon/src/opencode_session.rs::tests::* (cursor + summary_text + read-only DB) | new (this PR) | I-11 |
| F-27 | Opencode brain claim path bypasses agentic_enrichment_queue; enrichment failures lose segments | review of PR #149: direct enriched=1 flip with no retry | brain unit | brain/tests/test_opencode_sessions.py (TestClaimPath, TestMarkQueueFailed) | new (this PR) | I-11 |
| F-28 | Opencode junction-table insert uses malformed VALUES (?, ?, …) clause | review of PR #149: len(segment_ids) placeholders for a 2-col table | brain unit | brain/tests/test_opencode_sessions.py::TestWriteKnowledgeNode | new (this PR) | |
| F-29 | Brain preflight fails silently against a misconfigured inference backend (e.g. [lmstudio][inference] rename drift; oMLX port mismatch) and the enrichment queue stalls indefinitely with no alarm | 2026-05-10 incident: post-omlx config rename pointed brain at port 1234 forever; lmstudio_reachable=false was visible on /health but no watchdog invariant gated on it | rust unit (invariant) + rust integration (config-section guard) + python unit (config loader) | crates/hippo-daemon/src/watchdog.rs::tests::watchdog_i12_*, crates/hippo-core/src/config.rs::tests::test_legacy_lmstudio_section_is_rejected, brain/tests/test_init.py::test_load_runtime_settings_rejects_legacy_lmstudio_section, brain/tests/test_mcp_server.py::test_legacy_lmstudio_section_is_rejected | new (this PR) | I-12 |
| F-30 | Cursor transcript lands zero rows in claude_sessions (poll_tick silent skip) | cursor_session poller ships; I-15 guards proxy predicate | rust integration | crates/hippo-daemon/tests/source_audit/cursor_agent.rs::cursor_agent_transcript_lands_row_and_bumps_health | new (cursor-ingestion PR) | I-15 |
| F-31 | Cursor source_health row not bumped on successful ingest | same as F-30 | rust integration | crates/hippo-daemon/tests/cursor_session.rs::poll_tick_ingests_idle_files_and_advances_cursor | new (cursor-ingestion PR) | I-15 |
| F-32 | Zero-segment Cursor transcript bumps source_health erroneously | boundary: file with no parseable segments | rust unit | crates/hippo-daemon/tests/cursor_session.rs::poll_tick_zero_segment_file_advances_cursor_without_health_bump | new (cursor-ingestion PR) | I-15 |
| F-33 | Cursor probe fails when ~/.cursor/projects is absent or contains no settled transcripts | probe trivial-pass path | rust unit | crates/hippo-daemon/src/probe.rs::tests::cursor_probe_trivial_pass_when_no_transcripts | new (cursor-ingestion PR) | I-15 |
| F-34 | Cursor probe rows appear in user-facing queries (AP-6 regression) | shared probe_tag IS NULL filter | static analysis (semgrep) | shared .semgrep.yml rule: same probe_tag IS NULL Semgrep rule covers all claude_sessions queries | existing (shared filter) | AP-6 |
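
To make row F-17 concrete, the sketch below contrasts the .filter_map(Result::ok) shape with one alternative that keeps the failures visible. parse_event and both wrapper functions are hypothetical stand-ins; this is not the code at storage.rs:805, and returning the errors is only one of several reasonable fixes.

```rust
/// Hypothetical stand-in for a fallible row decoder on a capture path.
fn parse_event(line: &str) -> Result<u64, std::num::ParseIntError> {
    line.trim().parse::<u64>()
}

/// The AP-11 shape: every Err is dropped with no log line and no counter.
fn parse_swallowing(lines: &[&str]) -> Vec<u64> {
    lines
        .iter()
        .map(|line| parse_event(line))
        .filter_map(Result::ok)
        .collect()
}

/// One alternative: keep the good rows, but hand the failures back to the
/// caller so they can be logged, counted, or turned into a hard error.
fn parse_reporting(lines: &[&str]) -> (Vec<u64>, Vec<String>) {
    let mut events = Vec::new();
    let mut errors = Vec::new();
    for line in lines {
        match parse_event(line) {
            Ok(event) => events.push(event),
            Err(e) => errors.push(format!("{line:?}: {e}")),
        }
    }
    (events, errors)
}
```

Both functions return the same events for the same input; only the second lets a caller notice that a row was lost.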

Phase 1 (Bug A) test coverage — F-25

The tests below cover the watcher data-loss fix shipped in T-A.1–T-A.7 (2026-04-27). Row F-25 in the table above represents the failure mode; the entries here give individual test names and file paths for traceability.

| Group | Tests | File |
|---|---|---|
| Schema migration | test_migrate_v11_to_v12_adds_content_hash_columns, test_migrate_v11_to_v12_recovers_from_partial_success | crates/hippo-core/src/storage.rs |
| Content hash | test_hash_empty_segment_is_stable, test_hash_is_deterministic, test_hash_changes_when_tools_change, test_hash_changes_when_prompts_change, test_hash_changes_when_assistant_text_changes | crates/hippo-daemon/src/claude_session.rs |
| Upsert (replaces INSERT OR IGNORE) | test_upsert_inserts_new_segment, test_upsert_updates_existing_segment_on_growth, test_upsert_idempotent_on_same_content | crates/hippo-daemon/src/claude_session.rs |
| Enqueue gate | test_decide_enqueue_inserts_always, test_decide_enqueue_skip_when_hash_unchanged, test_decide_enqueue_skip_when_within_debounce, test_decide_enqueue_skip_when_processing, test_decide_enqueue_enqueue_when_hash_changed_and_debounced, test_decide_enqueue_enqueue_when_no_prior_queue_row | crates/hippo-daemon/src/claude_session.rs |
| Empty-segment short-circuit | test_insert_segments_skips_enqueue_for_empty_segment | crates/hippo-daemon/src/claude_session.rs |
| Settling sweep | test_sweep_enqueues_segment_with_old_mtime_and_hash_mismatch, test_sweep_skips_recent_mtime, test_sweep_skips_when_hash_matches, test_sweep_replaces_done_queue_row, test_sweep_skips_when_processing, test_sweep_skips_empty_segment, test_sweep_skips_missing_file, test_sweep_caps_at_max_per_tick, test_sweep_returns_zero_on_pre_migration_db | crates/hippo-daemon/src/watch_claude_sessions.rs |
| Backfill CLI | test_backfill_dry_run_writes_nothing, test_backfill_resets_offset_for_matched_files, test_backfill_reparses_and_updates_segment, test_backfill_idempotent_on_second_run, test_backfill_skips_files_older_than_since, test_backfill_glob_matches_multiple_files | crates/hippo-daemon/src/backfill.rs |
| Backfill CLI helpers | test_parse_since_date_valid, test_parse_since_date_invalid, test_reset_offset_no_row_is_ok | crates/hippo-daemon/src/backfill.rs |
| Brain hash propagation | TestContentHashPropagation::test_claim_pending_segments_returns_content_hash, TestContentHashPropagation::test_enrichment_writes_last_enriched_content_hash, TestContentHashPropagation::test_enrichment_failure_does_not_write_hash, TestContentHashPropagation::test_null_content_hash_skips_write | brain/tests/test_claude_sessions.py |
| Review fix-ups (T-A.10) | test_enqueue_does_not_clobber_processing_lock, test_process_file_short_circuit_preserves_queue_state, test_check_backfill_needed_warns_when_null_hash_post_cutoff, test_check_backfill_needed_silent_when_hash_set | crates/hippo-daemon/src/watch_claude_sessions.rs |
| Review fix-ups (round 3) | test_upsert_counts_legacy_null_hash_as_inserted (CP-1/R2-6), test_source_health_not_bumped_on_idempotent_reparse (C-3) | crates/hippo-daemon/src/claude_session.rs |
| Review fix-ups (round 3) | test_check_backfill_needed_silent_on_pre_migration_db (R2-7), test_sweep_returns_zero_on_pre_migration_db (CP-4/R2-3, now uses true v11 fixture) | crates/hippo-daemon/src/watch_claude_sessions.rs |
| Review fix-ups (round 3) | test_expand_tilde_passthrough, test_expand_tilde_prefix, test_collect_paths_expands_tilde (C-2: ~ expansion in backfill glob) | crates/hippo-daemon/src/backfill.rs |
| Review fix-ups (round 3) | test_brain_server_rejects_v11_db (C-4/R2-5: v11 dropped from ACCEPTED_READ_VERSIONS) | brain/tests/test_server.py |
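
The behaviour the Content hash and Enqueue gate groups pin down can be sketched as follows. This is a reconstruction from the test names alone, not the code in claude_session.rs: the function signatures, the QueueRow fields, the order of the checks, and the use of DefaultHasher are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Stand-in content hash over the three inputs the hash tests name.
/// DefaultHasher is illustrative only; a persisted hash needs an algorithm
/// whose output is guaranteed stable across builds.
fn content_hash(prompts: &[&str], assistant_text: &[&str], tools: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    prompts.hash(&mut hasher);
    assistant_text.hash(&mut hasher);
    tools.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical view of an existing enrichment-queue row.
struct QueueRow {
    processing: bool,
    age: Duration,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Enqueue,
    Skip,
}

/// Gate order inferred from the test names, not from the source.
fn decide_enqueue(
    newly_inserted: bool,
    stored_hash: Option<u64>,
    new_hash: u64,
    queue_row: Option<&QueueRow>,
    debounce: Duration,
) -> Decision {
    if newly_inserted {
        return Decision::Enqueue; // first sighting of the segment
    }
    if stored_hash == Some(new_hash) {
        return Decision::Skip; // content did not change
    }
    match queue_row {
        None => Decision::Enqueue,                         // nothing queued yet
        Some(row) if row.processing => Decision::Skip,     // leave a held lock alone
        Some(row) if row.age < debounce => Decision::Skip, // still settling
        Some(_) => Decision::Enqueue,                      // changed and debounced
    }
}
```

Keeping the decision a pure function of its inputs is what would let each case be tested without a database or a clock.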

Running Phase 1 tests:

# Rust — migration, hash, upsert, enqueue gate, sweep, backfill
cargo test -p hippo-core storage::tests::test_migrate_v11
cargo test -p hippo-daemon claude_session::tests::test_hash_
cargo test -p hippo-daemon claude_session::tests::test_upsert_
cargo test -p hippo-daemon claude_session::tests::test_decide_enqueue_
cargo test -p hippo-daemon claude_session::tests::test_insert_segments_skips_enqueue_for_empty_segment
cargo test -p hippo-daemon watch_claude_sessions::tests::test_sweep_
cargo test -p hippo-daemon backfill::tests::test_backfill_

# Python — brain hash propagation
uv run --project brain pytest brain/tests/test_claude_sessions.py::TestContentHashPropagation -v

Invariant coverage cross-check

| Invariant | Row(s) | Status |
|---|---|---|
| I-1 Shell liveness | F-11 | blocked-on-P0.1 |
| I-2 Claude-session end-to-end | F-3, F-10, F-25 (active); F-1, F-5, F-18..F-21 (retired in T-8; failure modes structurally eliminated by removing the tmux path) | F-3 covers batch-import; F-10 covers the FSEvents watcher end-to-end; F-25 covers segment-content truncation (Bug A upsert fix) |
| I-3 Claude-tool liveness | | not yet implemented; skeleton row TBD when invariant test design lands |
| I-4 Browser liveness | F-2, F-7 | fix PRs + new (this PR) |
| I-5 Redaction correctness (no over-redaction) | F-4 | new (this PR) |
| I-6 Daemon liveness | implicit in existing daemon start-up tests | existing |
| I-7 Watchdog heartbeat | F-13 | blocked-on-P1.1 |
| I-8 Probe round-trip | F-12 | blocked-on-P2.2 |
| I-9 Fallback recovery freshness | F-8 | skeleton; blocked on doctor growing an age check |
| I-10 Capture decoupled from enrichment | F-14 | blocked-on-P0.2 |
| I-11 Opencode-session coverage proxy | F-26, F-27 | new (this PR); production probe still deferred |
| I-12 Brain preflight stuck | F-29 | new (this PR); gates on brain-preflight.consecutive_failures > 12 |
| I-13 Codex-session coverage proxy | | implemented in watchdog; test coverage via watchdog unit tests |
| I-14 Embedding orphan backlog | | implemented in watchdog; test coverage via watchdog unit tests |
| I-15 Cursor-session coverage proxy | F-30, F-31, F-32, F-33, F-34 | new (cursor-ingestion PR) |

Any invariant without at least one new (this PR) or existing row is by definition gated on a P0/P1/P2 task. If you see an invariant listed in architecture.md that is not in the table above, that is a gap — open an issue and add a row.
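
The gap rule lends itself to a mechanical check. The sketch below assumes the matrix has already been parsed into (row id, invariant numbers) pairs; the function name and that input shape are hypothetical, and no such tool is described in this document.

```rust
use std::collections::BTreeSet;

/// Return every invariant number in 1..=total that no matrix row references.
/// `rows` pairs a row id (e.g. "F-11") with the invariants that row lists.
fn uncovered_invariants(total: u32, rows: &[(&str, Vec<u32>)]) -> Vec<u32> {
    let covered: BTreeSet<u32> = rows
        .iter()
        .flat_map(|(_, invariants)| invariants.iter().copied())
        .collect();
    (1..=total).filter(|i| !covered.contains(i)).collect()
}
```

Anything the function returns is either a known deferred invariant or a gap that needs an issue and a new row.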

Test coverage gaps

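These are the failure modes that cannot be tested against main today without a source change, taken from the rows marked source-change-required in the Failure modes table:

  • F-6: doctor never cross-checks the Native Messaging manifest path field.
  • F-8: doctor currently only counts fallback files and does not inspect mtime.
  • F-24: the hook check prints with println!; asserting on its output would need it to return a String or accept a writeln!(w, …) writer.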

Running the tests

# Rust — the redaction negative cases, doctor unit tests, NM/fallback skeletons
cargo test -p hippo-core redaction::
cargo test -p hippo-daemon commands::tests::test_hook_check
cargo test -p hippo-daemon --test nm_manifest_doctor
cargo test -p hippo-daemon --test nm_restart_integration
cargo test -p hippo-daemon --test fallback_age_doctor
cargo test -p hippo-daemon --test capture_invariants
cargo test -p hippo-daemon --test schema_handshake_negative

# Every `#[ignore]` skeleton — re-enable when its P0/P1/P2 dependency lands
cargo test -p hippo-daemon -- --ignored --nocapture

# Shell: nothing to run. tests/shell/test-claude-session-hook-extended.sh and
# tests/shell/test-hook-pid-ppid.sh were retired in T-8 (rows F-1, F-5, F-18..F-21).

# Brain (xfail stays green; becomes pass on #53 fix)
uv run --project brain pytest brain/tests/test_lessons_graduation_hippo.py -v

# Static analysis (when wired into CI)
semgrep --config .semgrep.yml crates/ brain/

How to extend

When you add a new capture path (say, iMessage ingestion) or discover a new failure mode, you MUST:

  1. Add a row to the Failure modes table above. Include:
    • A one-sentence description of the failure.
    • The trigger (issue number, commit, design-doc reference).
    • The chosen test type and file path.
    • An invariant reference, if any.
  2. Write the test. If the test depends on infrastructure that does not exist yet, commit the test file with a #[ignore = "blocked on <task-id>"] attribute (Rust) or @pytest.mark.skip(reason=...) / xfail (Python) so the file compiles and the skeleton is visible.
  3. If the test cannot be written at all without changing source, add a row to Test coverage gaps explaining why, and open a follow-up issue for the source change. Do not silently drop the failure mode.
  4. Update the Invariant coverage cross-check if the new row fills a gap.
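
For step 3, row F-24 is a useful reference case: its test needs a source change because the check prints straight to stdout. A minimal sketch of the writeln!(w, …) injection that row mentions is shown below. The HookCheck enum, the function names, and the message strings are placeholders, not the actual types or output in commands.rs.

```rust
use std::io::{self, Write};

/// Hypothetical outcome type; the real check's return type is not shown here.
enum HookCheck {
    Ok,
    NotConfigured,
    MissingScript(String),
}

/// Writes the report to any `Write`, so production passes stdout and tests
/// pass an in-memory buffer. Message text is placeholder wording.
fn report_hook_check<W: Write>(w: &mut W, result: &HookCheck) -> io::Result<()> {
    match result {
        HookCheck::Ok => writeln!(w, "[ok] claude-session hook configured"),
        HookCheck::NotConfigured => writeln!(w, "[!!] claude-session hook not configured"),
        HookCheck::MissingScript(path) => writeln!(w, "[!!] hook script missing: {path}"),
    }
}

/// Test helper: capture the report as a String for behavioural assertions.
fn render(result: &HookCheck) -> String {
    let mut buf: Vec<u8> = Vec::new();
    report_hook_check(&mut buf, result).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("report is valid UTF-8")
}
```

A production caller would pass &mut io::stdout().lock(), which leaves user-visible behaviour unchanged while making the output assertable.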

The matrix is the source of truth for “what do we test?” If a failure recurs and its row’s test did not fire, that is a bug in the test, not additional justification to skip writing a test next time.