Lifecycle of an Event
End-to-end traces of how events become knowledge in hippo, citing the real symbols that handle each step. Companion to capture/architecture.md (which describes the system in cross-section) and capture/operator-runbook.md (first-aid recipes).
For a power user diagnosing a missing event, this doc and the SQL recipes at the bottom should be enough. Three lifecycles are covered: shell command, Claude session segment, and browser visit. Workflow runs and Codex/Xcode-side ingest follow analogous patterns and are noted where they diverge.
Shell command
zsh preexec
|
v
shell/hippo.zsh::_hippo_preexec captures cmd_start_ms, cwd, git_*
| -- registered via add-zsh-hook preexec _hippo_preexec
|
v (foreground, microseconds)
zsh runs the command
|
v
shell/hippo.zsh::_hippo_precmd captures exit_code, duration_ms, captured output
| -- registered via add-zsh-hook precmd _hippo_precmd
| -- single combined stream sent via --output (head + tail truncated)
|
v (DISOWNED, fire-and-forget)
hippo send-event shell -- a child process
|
v (length-prefixed JSON over Unix socket)
crates/hippo-daemon/src/commands.rs::send_event_fire_and_forget
| -- writes into in-memory buffer, returns immediately
| -- contract: success = "the frame hit the socket", NOT "SQLite was touched"
v
crates/hippo-daemon/src/daemon.rs::flush_events
| -- background tokio timer, every flush_interval_ms (default 100 ms)
| -- batches buffered frames, opens single SQLite transaction
| -- calls redact_shell_event before insert (hippo-core/src/redaction.rs)
| -- on insert failure: write_fallback_jsonl + bump drop counter
| -- updates source_health after the batch (last_event_ts, counters)
v
crates/hippo-core/src/storage.rs::insert_event_at
| -- inserts the events row inside flush_events' transaction
| -- does not redact, does not touch source_health, does not write fallback
v
events table (source_kind='shell'); source_health updated by flush_events
|
v
brain/src/hippo_brain/enrichment.py::claim_pending_events_by_session
| -- session-grouped, 60s gap-split, max_claim_batch cap
| -- calls _skip_ineligible_shell_events() to stamp
| enrichment_queue.status='skipped' for ineligible shell events
v
brain/src/hippo_brain/enrichment.py::is_enrichment_eligible
| -- pure predicate; returns (ok, reason)
| -- filters trivial commands (clear, exec zsh, true, :) under 100 ms
| with no stdout/stderr; reason is stored as queue.error_message
v
brain/src/hippo_brain/server.py::_enrich_shell_batches
| -- builds prompt via build_enrichment_prompt
| -- 3 retries via _call_llm_with_retries (inference backend /v1/chat/completions)
v
brain/src/hippo_brain/enrichment.py::write_knowledge_node
| -- single transaction: knowledge_nodes + tags + entities + link tables
| -- bumps queue.status='done' atomically
v
knowledge_nodes row + knowledge_node_entities + knowledge_node_events
|
v (background asyncio.create_task)
brain/src/hippo_brain/embeddings.py::embed_knowledge_node
v
knowledge_vectors row (sqlite-vec INSERT OR REPLACE)
|
v
MCP-visible: search_events / search_knowledge / ask
The key invariant for shell capture: the hook never touches SQLite. Latency in the user’s interactive prompt is bounded by the socket write — typically 20–50 ms. SQLite writes happen in flush_events on the daemon’s tokio runtime. (See capture/anti-patterns.md AP-1.)
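The trace above describes the wire format only as length-prefixed JSON. As a rough sketch of what that framing involves, assuming a 4-byte big-endian length prefix (the actual prefix width and byte order are not stated in this doc) and hypothetical helper names encode_frame / decode_frames:

```python
import json
import struct


def encode_frame(event: dict) -> bytes:
    """Serialize an event as JSON and prepend its byte length.

    Assumption: a 4-byte big-endian unsigned length prefix. The real
    hippo wire format may use a different width or byte order.
    """
    payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_frames(buffer: bytes) -> tuple[list[dict], bytes]:
    """Decode every complete frame in buffer; return (events, leftover bytes)."""
    events = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from(">I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # incomplete frame: keep the bytes and wait for more
        start = offset + 4
        events.append(json.loads(buffer[start:start + length].decode("utf-8")))
        offset = start + length
    return events, buffer[offset:]
```

The sender's job ends once encode_frame's bytes are written to the socket, which matches the "frame hit the socket" contract. The receiver has to carry leftover bytes between reads because a stream socket does not preserve message boundaries.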
Truncation. The shell hook captures a single combined stream (stdout-only via the script’s --output flag; stderr is not separately captured today). It is truncated to [daemon] output_head_lines lines from the head and [daemon] output_tail_lines from the tail (defaults: 50 head / 100 tail; see config/config.default.toml). Long output in between is replaced with an ellipsis marker. Configure in ~/.config/hippo/config.toml.
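The head + tail rule can be sketched as follows. This is an illustration in Python, not the hook's actual zsh implementation; the function name and the "..." marker string are placeholders, and only the 50 / 100 defaults come from the text above.

```python
def truncate_output(text: str, head_lines: int = 50, tail_lines: int = 100,
                    marker: str = "...") -> str:
    """Keep the first head_lines and last tail_lines lines of text.

    The marker is a placeholder; the real ellipsis marker may differ.
    """
    lines = text.splitlines()
    if len(lines) <= head_lines + tail_lines:
        return text  # short enough: nothing is dropped
    head = lines[:head_lines]
    # Index from len(lines) rather than using a negative slice, so that
    # tail_lines == 0 yields an empty tail instead of the whole list.
    tail = lines[len(lines) - tail_lines:]
    return "\n".join(head + [marker] + tail)
```

With head_lines=2 and tail_lines=3, ten lines numbered 0 to 9 collapse to 0, 1, the marker, then 7, 8, 9.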
Redaction. crates/hippo-core/src/redaction.rs runs on the event’s command, stdout, and stderr before storage. Patterns come from ~/.config/hippo/redact.toml. (Limits are documented in config/README.md; a deeper redaction reference is tracked in #114.)
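To make the idea concrete, here is a minimal pattern-based redaction sketch. Everything specific in it is an assumption: the two regexes, the "[REDACTED]" placeholder, and the function names are illustrative only, and the schema of redact.toml is not shown in this doc.

```python
import re

# Hypothetical patterns for illustration only; the real set is loaded
# from ~/.config/hippo/redact.toml.
EXAMPLE_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS-style access key id
    re.compile(r"(?i)(password|token)=\S+"),  # key=value secrets
]


def redact(text: str, patterns=EXAMPLE_PATTERNS, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = pattern.sub(placeholder, text)
    return text


def redact_event(event: dict, fields=("command", "stdout", "stderr")) -> dict:
    """Return a copy of event with the listed text fields redacted."""
    cleaned = dict(event)
    for field in fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = redact(cleaned[field])
    return cleaned
```

Running redaction before the insert, as flush_events does, means the unredacted text never reaches the events table.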
Claude session segment
Claude Code writes to ~/.claude/projects/<project>/<session>.jsonl
|
v (file growth)
macOS FSEvents notifies com.hippo.claude-session-watcher (LaunchAgent)
|
v
crates/hippo-daemon/src/watch_claude_sessions.rs::process_file
| -- reads from claude_session_offsets per file (resume state)
| -- re-runs extract_segments on every growth event (idempotent)
v
crates/hippo-daemon/src/claude_session.rs::extract_segments
| -- splits the JSONL into time-bounded SessionSegments
| -- segment_index is monotonic, derived from message ranges
| -- Python port for batch / non-watcher ingest paths:
| brain/src/hippo_brain/claude_sessions.py::extract_segments
v
crates/hippo-daemon/src/claude_session.rs::insert_segments
| -- INSERT ... ON CONFLICT(session_id, segment_index) DO UPDATE SET
| (mutable cols) -- AP-12: NOT "OR IGNORE"; the segment grows
| -- content_hash compared with last_enriched_content_hash
| to gate re-enrichment of unchanged segments
v
claude_sessions table -- one row per (session_id, segment_index)
+
events table (source_kind='claude-tool') -- per tool_use line
+
claude_enrichment_queue -- for segments where content_hash changed
|
v
brain/src/hippo_brain/claude_sessions.py::claim_pending_claude_segments
| -- one segment at a time (no session grouping like shell)
v
brain/src/hippo_brain/server.py::_enrich_claude_batches
| -- prompt = "\n---\n".join(segment.summary_text for segment in batch)
| -- the live brain joins pre-summarized segment text rather than
| calling build_claude_enrichment_prompt -- which is reserved for
| contexts (re-enrichment, eval) that need the full segment shape
v
brain/src/hippo_brain/claude_sessions.py::write_claude_knowledge_node
| -- writes knowledge_nodes + knowledge_node_claude_sessions +
| entities + last_enriched_content_hash on the segment
v
knowledge_nodes (one node per claim batch) + entities + embedding
Key idempotency contract: the watcher re-runs extract_segments on every FSEvents notification. The same (session_id, segment_index) will appear with growing message_count over time. The historical bug class — INSERT OR IGNORE on a bucket key whose content mutates — is documented in capture/anti-patterns.md AP-12. The current code uses ON CONFLICT DO UPDATE plus a content hash to detect “did anything actually change?” before re-enqueueing.
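The upsert-plus-hash contract can be demonstrated against an in-memory SQLite database. The segments table below is a simplified stand-in for claude_sessions (the real schema has more columns), and upsert_segment / mark_enriched are hypothetical helper names; SHA-256 is an assumed hash choice.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for claude_sessions.
conn.execute("""
    CREATE TABLE segments (
        session_id TEXT NOT NULL,
        segment_index INTEGER NOT NULL,
        message_count INTEGER NOT NULL,
        content_hash TEXT NOT NULL,
        last_enriched_content_hash TEXT,
        PRIMARY KEY (session_id, segment_index)
    )
""")


def upsert_segment(session_id: str, segment_index: int, messages: list[str]) -> bool:
    """Upsert one segment; return True if it needs (re-)enrichment."""
    content_hash = hashlib.sha256("\n".join(messages).encode("utf-8")).hexdigest()
    conn.execute(
        """
        INSERT INTO segments (session_id, segment_index, message_count, content_hash)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(session_id, segment_index) DO UPDATE SET
            message_count = excluded.message_count,
            content_hash = excluded.content_hash
        """,
        (session_id, segment_index, len(messages), content_hash),
    )
    row = conn.execute(
        "SELECT last_enriched_content_hash FROM segments"
        " WHERE session_id = ? AND segment_index = ?",
        (session_id, segment_index),
    ).fetchone()
    return row[0] != content_hash


def mark_enriched(session_id: str, segment_index: int) -> None:
    """Record that the current content has been enriched."""
    conn.execute(
        "UPDATE segments SET last_enriched_content_hash = content_hash"
        " WHERE session_id = ? AND segment_index = ?",
        (session_id, segment_index),
    )
```

Re-running upsert_segment with identical messages after mark_enriched returns False, so an FSEvents notification that brings no new content does not re-enqueue the segment. With INSERT OR IGNORE in place of the upsert, the grown segment would have been dropped and message_count would stay at its first value.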
Manual recovery. If the watcher is wedged, hippo ingest claude-session <path> does a one-shot batch import via claude_session.rs::ingest_session_file.
Subagent sessions (<project>/<parent>/subagents/<id>.jsonl) follow the same path; is_subagent=1 and parent_session_id are set during segment extraction.
Codex sessions are ingested by a Rust poller — the com.hippo.codex-session launchd job runs hippo codex-poll, whose codex_session::poll_tick (in crates/hippo-daemon/src/codex_session.rs) walks ~/.codex/sessions (plus ~/.codex/archived_sessions and Xcode’s CodingAssistant/codex/sessions as secondary locations), parses the distinct envelope shape, and writes segmented rows through the same claude_sessions table, sharing claude_enrichment_queue for enrichment. Capture-health is keyed agentic-session-codex.
Cursor Agent sessions are ingested by a parallel Rust poller — the com.hippo.cursor-session launchd job runs hippo cursor-poll, whose cursor_session::poll_tick (in crates/hippo-daemon/src/cursor_session.rs) walks ~/.cursor/projects/**/agent-transcripts/**/*.jsonl (main sessions and subagents), parses the Anthropic-style transcript into char-bounded segments stamped from file mtime, and writes segmented rows into claude_sessions, sharing claude_enrichment_queue with Claude Code and Codex for enrichment. Subagents land with is_subagent=1 and parent_session_id set. Capture-health is keyed agentic-session-cursor; watchdog invariant I-15 gates on proxy consecutive-failures.
Browser visit
Firefox content script in extension/firefox/src/content.ts
| -- captures URL/title/dwell on page departure (visibilitychange)
| -- runs Mozilla Readability to extract main article text
| -- only fires on allowlisted domains
v
extension/firefox/src/background.ts
| -- batches recent visits, applies engagement filter
| (scroll >= 15% OR has search query OR dwell > long_dwell_bypass_ms)
v (Native Messaging stdio)
crates/hippo-daemon/src/native_messaging.rs::run
| -- length-prefixed JSON frames over stdin/stdout
| -- strip_sensitive_params runs against the URL using
| [browser.url_redaction] strip_params
| -- make_envelope_id deduplicates same-URL repeats within
| [browser] dedup_window_minutes (default 10)
v
crates/hippo-daemon/src/commands.rs::send_event_fire_and_forget
v
crates/hippo-daemon/src/daemon.rs::flush_events
v
crates/hippo-core/src/storage.rs::insert_browser_event
|
v
browser_events table + browser_enrichment_queue
|
v
brain/src/hippo_brain/browser_enrichment.py::claim_pending_browser_events
| -- 5-minute gap chunking; engagement filter applied at claim time
v
brain/src/hippo_brain/server.py::_enrich_browser_batches
| -- build_browser_enrichment_prompt(events)
v
write_knowledge_node + entities + embedding (same write path as shell)
Allowlist. Configured in [browser.allowlist] in config.toml. Visits to non-allowlisted domains are dropped in the content script — they never reach the daemon.
URL redaction. [browser.url_redaction] strip_params lists query-parameter names to strip (default includes session_id, auth_token, access_token, etc.). Path components are preserved; only matching query params are removed.
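A sketch of query-parameter stripping, written in Python for illustration (the real strip_sensitive_params lives in native_messaging.rs). It assumes exact, case-sensitive name matching, which may not be what the daemon does, and lists only the three default names mentioned above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Example names taken from the defaults mentioned above; the full
# default list lives in the config and is not reproduced here.
EXAMPLE_STRIP_PARAMS = {"session_id", "auth_token", "access_token"}


def strip_sensitive_params(url: str, strip_params=EXAMPLE_STRIP_PARAMS) -> str:
    """Drop matching query parameters; leave scheme, host, path, fragment alone."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in strip_params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

A path segment that happens to be named session_id survives, since only the query string is inspected.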
Dedup. Same URL within dedup_window_minutes collapses to a single envelope via make_envelope_id (UUID derived from URL + window-bucket).
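One way to derive such an id is a name-based UUID over the URL and a fixed time bucket. The sketch below assumes UUIDv5, a made-up namespace, and a "url|bucket" name format; none of these are confirmed to match the Rust make_envelope_id.

```python
import uuid

# Hypothetical namespace for illustration; not hippo's actual constant.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/hippo-envelope")


def make_envelope_id(url: str, timestamp_ms: int, dedup_window_minutes: int = 10) -> uuid.UUID:
    """Derive a deterministic UUID from the URL and its time bucket."""
    window_ms = dedup_window_minutes * 60_000
    bucket = timestamp_ms // window_ms  # same bucket => same id for the same URL
    return uuid.uuid5(EXAMPLE_NAMESPACE, f"{url}|{bucket}")
```

Because the id is deterministic, a repeat visit in the same bucket produces the same envelope id and can be collapsed on that key. Under this fixed-bucket assumption, two visits a minute apart that straddle a bucket boundary would get different ids.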
Probe events
Probes are synthetic round-trips that bypass none of the above. The com.hippo.probe LaunchAgent invokes hippo probe --source <name>, which emits an event tagged with probe_tag (a per-run UUID) through the same capture path the source uses. The probe code waits for the event to land and records source_health.probe_lag_ms.
Probe events are filtered out of every user-facing query at the daemon-side query path (and the brain side enforces the same filter as belt-and-braces). A Semgrep rule blocks new query call-sites that omit AND probe_tag IS NULL. See capture/anti-patterns.md AP-6.
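The effect of the probe_tag IS NULL filter can be shown on a toy table. The events table here is a minimal stand-in with only the columns needed, and user_visible_commands is a hypothetical helper, not a hippo function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for the events table; only the columns used here.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, command TEXT, probe_tag TEXT)")
conn.executemany(
    "INSERT INTO events (command, probe_tag) VALUES (?, ?)",
    [("git status", None), ("hippo-probe", "example-probe-run"), ("cargo test", None)],
)


def user_visible_commands(db: sqlite3.Connection) -> list[str]:
    """Return commands for user-facing views, excluding synthetic probes."""
    rows = db.execute(
        "SELECT command FROM events WHERE probe_tag IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

Dropping the WHERE clause would surface the probe row alongside real commands, which is the class of regression the Semgrep rule is meant to block.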
Where capture can fail silently
The failure modes below are the historical reasons behind the I-1..I-15 invariants in capture/architecture.md:
- Hook not sourced. The user’s ~/.zshrc was edited but never re-loaded. No errors anywhere; events just never appear.
- NM manifest missing. The Firefox extension was reloaded but hippo daemon install --force wasn’t re-run. The extension can’t reach the daemon. Captured by I-4.
- INSERT OR IGNORE on growing JSONL (AP-12). Segments captured at first FSEvents notification, usually 2–4 messages. Subsequent reparses with full content silently rejected. Symptom: pct_with_tools drops from ~50% to ~6%. Captured by I-2 once the migration to ON CONFLICT DO UPDATE shipped.
- Daemon crash mid-flush. Buffer empties to fallback JSONL via write_fallback_jsonl. Drained on next start. Captured by I-9 (fallback file age) if the daemon comes back but the drain is broken.
- Brain unreachable but daemon up. Capture continues to land events; only enrichment is delayed. The watchdog must NOT couple capture health to enrichment health. Captured by I-10 (architectural invariant).
Diagnosing a missing event
If a shell command ran at 14:30 and isn’t in hippo events, walk the lifecycle backward:
Recipe 1 — Did the event reach SQLite at all?
-- shell (replace timestamp window as appropriate)
SELECT id, command, timestamp, exit_code, source_kind
FROM events
WHERE source_kind = 'shell'
AND timestamp BETWEEN strftime('%s','now') * 1000 - 1800000 -- 30 min ago
AND strftime('%s','now') * 1000
ORDER BY id DESC LIMIT 20;
If the row is there but you don’t see it via hippo events, check whether your filters exclude it (e.g., session, branch, source) and whether probe_tag is non-null (probes are filtered out of user-facing queries — that’s correct behavior).
Recipe 2 — Did the source health update recently?
SELECT source, last_event_ts,
(strftime('%s','now') * 1000 - last_event_ts) / 1000 AS seconds_ago,
consecutive_failures, probe_ok, probe_lag_ms
FROM source_health
ORDER BY source;
If seconds_ago is climbing for the source you expected to capture, the capture path stopped writing (not just enrichment). Fall through to the next recipe.
Recipe 3 — Are enrichment claims piling up?
SELECT status, COUNT(*) FROM enrichment_queue GROUP BY status;
SELECT status, COUNT(*) FROM claude_enrichment_queue GROUP BY status;
SELECT status, COUNT(*) FROM browser_enrichment_queue GROUP BY status;
pending climbing means the brain isn’t claiming fast enough (inference backend slow / unloaded / wrong model name in [models].enrichment).
processing rows older than lock_timeout_secs are reaped by the brain watchdog (see docs/brain-watchdog.md). A persistent failed count means rows hit max_retries; inspect with:
SELECT id, retry_count, error_message
FROM enrichment_queue
WHERE status = 'failed'
ORDER BY id DESC LIMIT 10;
Recipe 4 — Are events landing but stuck in the fallback path?
ls -la ~/.local/share/hippo/fallback/*.jsonl 2>/dev/null
A fallback file present means the event could not be durably stored to SQLite at that moment — either the CLI couldn’t reach the daemon socket (commands.rs::send_event_fire_and_forget falls back to disk), or the daemon hit a SQLite/transaction failure inside flush_events. The next daemon start replays them via recover_fallback_files. Run hippo doctor --explain and skim ~/.local/share/hippo/daemon.stderr.log to determine which failure mode caused the fallback. If the file persists for > 24 h, I-9 fires.
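To check for fallback files past the 24 h mark without waiting for the alarm, a small script along these lines can help. It assumes file mtime is a reasonable proxy for the age that I-9 measures, which this doc does not confirm; stale_fallback_files is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import Optional

FALLBACK_DIR = Path.home() / ".local/share/hippo/fallback"
MAX_AGE_SECONDS = 24 * 60 * 60  # the 24 h threshold mentioned for I-9


def stale_fallback_files(directory: Path = FALLBACK_DIR,
                         now: Optional[float] = None,
                         max_age_seconds: float = MAX_AGE_SECONDS) -> list[Path]:
    """List *.jsonl fallback files whose mtime is older than max_age_seconds."""
    if now is None:
        now = time.time()
    if not directory.is_dir():
        return []  # no fallback directory: nothing has ever fallen back
    return sorted(
        path for path in directory.glob("*.jsonl")
        if now - path.stat().st_mtime > max_age_seconds
    )


if __name__ == "__main__":
    for path in stale_fallback_files():
        print(path)
```

An empty result with files still present in the directory means the fallbacks are recent and the next daemon start should replay them.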
Recipe 5 — Has the watchdog noticed anything?
hippo alarms list # exits 1 if any unacknowledged
hippo doctor --explain # CAUSE / FIX / DOC per failure
Doctor is the highest-leverage check; it summarizes everything above in 2 seconds.
See also
- capture/architecture.md: the system in cross-section (source_health, invariants, watchdog, probes, alarms).
- capture/sources.md: per-source coverage matrix.
- capture/anti-patterns.md: review-blocker rules (AP-1..AP-12).
- capture/operator-runbook.md: first-aid recipes.
- brain-watchdog.md: enrichment-queue reaper + claim-batch caps.
- crates/hippo-core/src/schema.sql: the SQL schema referenced throughout.