Redaction Reference
How hippo’s secret-redaction layer works, what it catches by default, and what it explicitly doesn’t. Companion to the README’s Privacy and Security section.
For the bigger threat model (data flow, encryption, MCP trust boundary), read the README section first. This doc is the regex-pattern deep-dive.
What redaction is
A regex-based filter that runs over event text before storage. Implemented in crates/hippo-core/src/redaction.rs::RedactionEngine. Applied to:
- Shell command strings, plus values of env-allowlisted env vars, in daemon.rs::flush_events (via redact_shell_event in crates/hippo-daemon/src/lib.rs). The shell hook’s captured output (stdout/stderr payload sent as the event’s --output) is not currently passed through RedactionEngine; only the command and allowlisted env values are filtered before insert_event_at.
- Claude session segment text: user_prompts, assistant_texts, and per-tool-call summary fields are redacted in the FS-watcher path (crates/hippo-daemon/src/claude_session.rs::extract_segments, not in flush_events). The shared pattern set is loaded once via RedactionEngine::builtin().
- Browser visits: URL query parameters listed in [browser.url_redaction] strip_params are stripped in native_messaging.rs::strip_sensitive_params (see Browser URL redaction below). Browser titles and Readability-extracted page content are NOT currently passed through RedactionEngine before storage.
Redaction is best-effort. It catches known secret formats; it cannot catch secrets in arbitrary positions. Treat it as a noise filter, not a security guarantee. If you need stronger guarantees, don’t paste secrets into your terminal in front of a tool that captures stdout.
What redaction isn’t
- Not a network filter. Redaction runs at storage time, not at LLM-call time. The LM Studio prompt is built from already-redacted events, so secrets that did slip past redaction will reach the LLM. That LLM is local (LM Studio at localhost:1234); there’s no third-party LLM in the path. But if you point LM Studio at a remote backend, see the README’s Privacy section.
- Not a database scrubber. Once a non-redacted secret has been stored in events, hippo doesn’t re-process old rows when you add a pattern. Add a pattern → only future captures benefit.
- Not a substitute for FileVault. The DB is unencrypted at rest.
Pattern format
~/.config/hippo/redact.toml is a TOML file with an array of pattern tables:
[[patterns]]
name = "aws_access_key"
regex = 'AKIA[0-9A-Z]{16}'
replacement = "[REDACTED]"
[[patterns]]
name = "github_pat"
regex = 'ghp_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9_]{82}'
replacement = "[REDACTED]"
| Field | Required | Behavior |
|---|---|---|
| name | yes | Rule identifier. Surfaces in metrics (hippo.daemon.redactions{rule=<name>}) and in hippo redact test output. Must be unique. |
| regex | yes | Rust regex crate syntax. No PCRE backreferences or lookaround. Compiled once at daemon startup (and RegexSet-bundled for fast match dispatch). Use (?i) for case-insensitivity. |
| replacement | no | The replacement string for matched substrings. Defaults to "[REDACTED]" when omitted (see RedactConfig in crates/hippo-core/src/config.rs); can include capture-group references ($1, $2) per the regex crate’s Replacer. |
The whole redact.toml is the rule set; patterns are loaded in file order. See crates/hippo-core/src/config.rs::RedactConfig for the deserialization shape.
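To make the field rules concrete, here is a minimal Python sketch of the load step. It is not hippo's Rust code: load_patterns and DEFAULT_REPLACEMENT are hypothetical names, Python's re module stands in for the Rust regex crate, and raising on a duplicate name is an assumption about how "must be unique" could be enforced. The input is the dict a TOML parser would produce for a two-rule file.

```python
import re

# What a TOML parser would hand back for a two-rule redact.toml.
# The second table omits `replacement` on purpose.
PARSED = {
    "patterns": [
        {"name": "aws_access_key", "regex": r"AKIA[0-9A-Z]{16}",
         "replacement": "[REDACTED]"},
        {"name": "github_pat",
         "regex": r"ghp_[a-zA-Z0-9]{36}|github_pat_[a-zA-Z0-9_]{82}"},
    ]
}

DEFAULT_REPLACEMENT = "[REDACTED]"  # documented default for an omitted field


def load_patterns(parsed):
    """Return (name, compiled_regex, replacement) tuples in file order."""
    rules = []
    seen = set()
    for table in parsed.get("patterns", []):
        name = table["name"]  # required field
        if name in seen:
            # Assumption: duplicates are rejected at load time.
            raise ValueError(f"duplicate rule name: {name}")
        seen.add(name)
        compiled = re.compile(table["regex"])  # required field, compiled once
        replacement = table.get("replacement", DEFAULT_REPLACEMENT)
        rules.append((name, compiled, replacement))
    return rules


rules = load_patterns(PARSED)
print([(name, replacement) for name, _, replacement in rules])
```

Both rules end up with the same replacement string, and the list keeps file order, which is what the evaluation model below relies on.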
Evaluation model
- All patterns apply, not first-match. RedactionEngine::redact iterates over RegexSet::matches, then calls replace_all for each matching rule. A single command can fire multiple rules.
- Order is deterministic. Patterns evaluate in the order they appear in redact.toml. After each pattern’s replace_all, subsequent patterns operate on the already-redacted text. This matters when patterns can overlap: an earlier pattern that replaces a substring with [REDACTED] prevents a later pattern from matching what was there.
- Per-rule hit attribution. Counting happens before replacement (counting after replace_all would return zero, since [REDACTED] doesn’t match the original pattern). Hit counts feed the OTel counter hippo.daemon.redactions with the rule name as the rule attribute (see crates/hippo-daemon/src/metrics.rs). Note that hits are only emitted for the shell command field today; redactions of allowlisted env values fire but aren’t surfaced as separate counter increments.
- No event dropping. When the entire command matches a pattern, the substring is replaced with [REDACTED] in-place; the event row is still stored. Hippo doesn’t delete events even when redaction renders them empty. (See issue #52 for the open discussion of “over-redaction silently producing empty events”; the current behavior is “store the redacted row,” which is auditable but means a power user might see [REDACTED] lines in hippo events.)
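The ordering and counting rules above can be sketched in a few lines of Python. This is a simplified model, not the RedactionEngine source: it skips the RegexSet pre-filter, uses Python's re in place of the Rust regex crate, and counts each rule's hits on the text as it stands just before that rule rewrites it (an assumption about where exactly counting happens). The redact function and the sample command are illustrative.

```python
import re

GITHUB_PAT = ("github_pat", r"ghp_[a-zA-Z0-9]{36}", "[REDACTED]")
GENERIC = (
    "generic_secret_assignment",
    r"(?i)(api[_-]?key|api[_-]?token|access[_-]?token|auth[_-]?token"
    r"|secret[_-]?key|private[_-]?key|password)\s*[=:]\s*\S{8,}",
    "[REDACTED]",
)


def redact(text, rules):
    """Apply every rule in order; return (redacted_text, hits_per_rule)."""
    hits = {}
    for name, pattern, replacement in rules:
        regex = re.compile(pattern)
        # Count before replacing: afterwards the matches are gone.
        hits[name] = sum(1 for _ in regex.finditer(text))
        # A callable replacement avoids Python's backslash-escape handling.
        text = regex.sub(lambda _match: replacement, text)
    return text, hits


command = "export API_KEY=ghp_" + "a" * 36 + " && echo done"

text_a, hits_a = redact(command, [GITHUB_PAT, GENERIC])
text_b, hits_b = redact(command, [GENERIC, GITHUB_PAT])
print(text_a, hits_a)
print(text_b, hits_b)
```

Both orders produce the same stored text here, but the attribution differs: with generic_secret_assignment first, the token is already gone by the time github_pat runs, so in this sketch github_pat records zero hits.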
Default patterns
Shipped in config/redact.default.toml. All replacements are "[REDACTED]". The regex column below shows the patterns verbatim, except that Markdown’s | cell separator is escaped as \| where it appears inside a regex’s alternation; the raw TOML uses a literal |.
| Rule | Regex | Catches |
|---|---|---|
| aws_access_key | AKIA[0-9A-Z]{16} | Long-lived AWS access key IDs (the AKIA* prefix). |
| github_pat | ghp_[a-zA-Z0-9]{36}\|github_pat_[a-zA-Z0-9_]{82} | GitHub classic personal access tokens (ghp_) and fine-grained tokens (github_pat_). |
| generic_secret_assignment | (?i)(api[_-]?key\|api[_-]?token\|access[_-]?token\|auth[_-]?token\|secret[_-]?key\|private[_-]?key\|password)\s*[=:]\s*\S{8,} | key=value and key: value assignments where the key matches a known secret-y name and the value is ≥ 8 non-whitespace characters. |
| jwt | eyJ[a-zA-Z0-9_-]{10,}\.eyJ[a-zA-Z0-9_-]{10,}\.[a-zA-Z0-9_-]+ | Three-segment JWTs whose header and payload segments start with eyJ, the base64 encoding of a leading {". |
| bearer_header | (?i)authorization:\s*bearer\s+\S+ | HTTP Authorization: Bearer <token> headers. |
| private_key_pem | -----BEGIN [A-Z ]*PRIVATE KEY----- | The leading line of any PEM-encoded private key. (The body and trailing -----END line aren’t matched, so a key body in stdout would have its header redacted but the body would persist. This is a known gap.) |
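The private_key_pem gap is easy to see in a small Python sketch. HEADER_ONLY is the shipped regex from the table; WHOLE_BLOCK is a hypothetical custom rule, not part of hippo's defaults, that uses (?s) and a lazy .*? to span from the BEGIN line to the matching END line. Python's re stands in for the Rust regex crate (both accept this syntax), and the sketch assumes the text reaches the engine with its newlines intact, which this document does not confirm for every capture path.

```python
import re

HEADER_ONLY = r"-----BEGIN [A-Z ]*PRIVATE KEY-----"
# Hypothetical custom rule: (?s) lets "." cross newlines, ".*?" stops at the
# first END line instead of swallowing everything up to the last one.
WHOLE_BLOCK = (
    r"(?s)-----BEGIN [A-Z ]*PRIVATE KEY-----"
    r".*?-----END [A-Z ]*PRIVATE KEY-----"
)

pem = "\n".join([
    "-----BEGIN RSA PRIVATE KEY-----",
    "FAKE-KEY-BODY-LINE-1",
    "FAKE-KEY-BODY-LINE-2",
    "-----END RSA PRIVATE KEY-----",
])

header_only = re.sub(HEADER_ONLY, "[REDACTED]", pem)
whole_block = re.sub(WHOLE_BLOCK, "[REDACTED]", pem)

print(header_only)   # header replaced, body and END line still present
print(whole_block)   # the entire block collapses to one marker
```

If you add a rule like WHOLE_BLOCK to redact.toml, test it with multi-line input before relying on it.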
Known false-negatives
The default rules do not catch:
- AWS temporary keys (ASIA* prefix, used by STS).
- Secrets in positional arguments: ./deploy prod my-secret-token won’t match anything; there’s no key= prefix.
- Secrets in env-var names that don’t end in a listed keyword: STRIPE_KEY=sk_live_... won’t fire generic_secret_assignment because nothing in STRIPE_KEY matches the keyword alternation. (The regex is unanchored, so a name that merely ends in a keyword, such as SENDGRID_APIKEY=..., does fire via its APIKEY suffix as long as the value is ≥ 8 characters.)
- Secrets renamed locally: x=ghp_... only fires github_pat (because the value matches), not generic_secret_assignment (because x isn’t a recognized keyword). If the user pastes a non-ghp_ token under a non-keyword name, neither rule fires.
- Multi-line secrets: a key body across multiple stdout lines isn’t matched by private_key_pem (which only matches the header).
- Secrets in JSON payloads not matching keyword=value: e.g. {"apiKey": "..."}. The keyword regex requires = or : adjacent to the keyword name (optionally separated by whitespace); in JSON the closing quote sits between the key and the colon, so embedded JSON shapes pass through.
If your workflow involves any of the above, write custom patterns. (See Writing custom patterns.)
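Before writing a custom pattern, it helps to probe what generic_secret_assignment does and doesn't match. The Python sketch below copies the default regex from the table above and runs it over a handful of made-up inputs; Python's re stands in for the Rust regex crate, and the sample strings are illustrative, not taken from hippo's test suite.

```python
import re

GENERIC_SECRET_ASSIGNMENT = re.compile(
    r"(?i)(api[_-]?key|api[_-]?token|access[_-]?token|auth[_-]?token"
    r"|secret[_-]?key|private[_-]?key|password)\s*[=:]\s*\S{8,}"
)

SAMPLES = [
    "./deploy prod my-secret-token",       # positional argument, no key=
    "STRIPE_KEY=sk_live_exampleexample",   # name has no listed keyword
    "MY_API_KEY=abcdefgh1234",             # name ends in a listed keyword
    '{"apiKey": "abcdefgh12345678"}',      # JSON: quote sits before the colon
    "apiKey: abcdefgh12345678",            # YAML-style key: value
    "password=short",                      # value shorter than 8 characters
]

results = {s: bool(GENERIC_SECRET_ASSIGNMENT.search(s)) for s in SAMPLES}
for sample, fired in results.items():
    print(f"{'fires' if fired else 'miss ':5}  {sample}")
```

Only the MY_API_KEY and YAML-style lines fire. The other four are the shapes that need a dedicated rule.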
Writing custom patterns
The hippo redact test "<input>" CLI compiles your live redact.toml and reports which rules fire on the given input. Use it iteratively while writing a new pattern.
Example session
You want to catch your team’s internal API tokens, which look like xtok_<32-hex>:
# Step 1: confirm the default rules don't catch it
hippo redact test "deploy --token xtok_a1b2c3d4e5f60718293a4b5c6d7e8f90"
# No patterns matched.
# Redacted (0 replacements):
# deploy --token xtok_a1b2c3d4e5f60718293a4b5c6d7e8f90
# Step 2: add the pattern
$EDITOR ~/.config/hippo/redact.toml
[[patterns]]
name = "internal_xtok"
regex = 'xtok_[a-f0-9]{32}'
replacement = "[REDACTED]"
# Step 3: verify
hippo redact test "deploy --token xtok_a1b2c3d4e5f60718293a4b5c6d7e8f90"
# Matched patterns: internal_xtok
# Redacted (1 replacements):
# deploy --token [REDACTED]
# Step 4: confirm it doesn't false-positive on benign text
hippo redact test "git checkout xtok-feature-branch"
# No patterns matched.
# Redacted (0 replacements):
# git checkout xtok-feature-branch
More example sessions
# Stripe live keys
hippo redact test "STRIPE_KEY=sk_live_AbCdEfGhIjKlMnOpQrStUvWx"
# No patterns matched.
# Redacted (0 replacements): ... (STRIPE_KEY isn't on the keyword list)
# After adding a pattern named `stripe_secret_key` with regex `sk_live_[a-zA-Z0-9]{24,}`:
hippo redact test "STRIPE_KEY=sk_live_AbCdEfGhIjKlMnOpQrStUvWx"
# Matched patterns: stripe_secret_key
# Redacted (1 replacements):
# STRIPE_KEY=[REDACTED]
# Slack incoming webhooks (placeholder shown; replace with your team's URL when testing)
hippo redact test "curl -X POST https://hooks.slack.com/services/<TEAM>/<CHANNEL>/<TOKEN>"
# No patterns matched.
# Redacted (0 replacements): ...
# After adding a pattern named `slack_webhook_url` with regex = 'https://hooks\.slack\.com/services/[A-Za-z0-9/]+'
# the same call with a real webhook URL in place of the placeholders would report:
# Matched patterns: slack_webhook_url
# Redacted (1 replacements): curl -X POST [REDACTED]
# Database connection strings
hippo redact test "DATABASE_URL=postgres://admin:hunter2@db.example.com:5432/prod"
# No patterns matched.
# Redacted (0 replacements): ... (DATABASE_URL isn't on the keyword list)
# Add a pattern named `db_connection_string` with regex = '(?i)(database_url|postgres|mysql)://[^@\s]+:[^@\s]+@'
hippo redact test "DATABASE_URL=postgres://admin:hunter2@db.example.com:5432/prod"
# Matched patterns: db_connection_string
# Redacted (1 replacements): DATABASE_URL=[REDACTED]db.example.com:5432/prod
After adding any new pattern, restart the daemon: mise run restart. The engine compiles patterns at startup and doesn’t reload on redact.toml change.
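Alongside hippo redact test, a candidate regex can be sanity-checked offline against examples that must and must not match. The helper below is a hypothetical Python sketch, not part of hippo; Python's re approximates the Rust regex crate for simple patterns like this one, so treat hippo redact test as the final word.

```python
import re


def check_pattern(regex, should_match, should_not_match):
    """Return a list of human-readable failures; empty means all passed."""
    compiled = re.compile(regex)
    failures = []
    for text in should_match:
        if not compiled.search(text):
            failures.append(f"missed: {text}")
    for text in should_not_match:
        if compiled.search(text):
            failures.append(f"false positive: {text}")
    return failures


failures = check_pattern(
    r"xtok_[a-f0-9]{32}",
    should_match=[
        "deploy --token xtok_" + "0123456789abcdef" * 2,
    ],
    should_not_match=[
        "git checkout xtok-feature-branch",   # hyphen, not underscore
        "deploy --token xtok_" + "g" * 32,    # right length, non-hex characters
        "deploy --token xtok_" + "a" * 31,    # one character short
    ],
)
print(failures or "all checks passed")
```

Keeping the negative cases next to the positive ones catches the most common mistake: a sample token that looks right but contains characters outside the class the regex allows.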
Browser URL redaction
Separate from redact.toml. Configured in [browser.url_redaction] in ~/.config/hippo/config.toml:
[browser.url_redaction]
strip_params = ["session_id", "auth_token", "access_token", "api_key", "token"]
Implemented in crates/hippo-daemon/src/native_messaging.rs::strip_sensitive_params. For each URL passed by the Firefox extension, query parameters whose names match strip_params are removed before storage. Path components are preserved.
What it catches:
- https://example.com/api?api_key=abc&user=joe → https://example.com/api?user=joe
- https://example.com/login?token=xyz → https://example.com/login
What it doesn’t catch:
- Secrets in path segments: https://example.com/api/v1/abc-token-xyz/users keeps the path as-is.
- Secrets in fragment: https://example.com/page#auth=abc keeps the fragment.
- Encoded params: https://example.com/api?secret%3Dabc won’t be parsed as a secret=abc query.
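The behavior described above can be modeled in Python with urllib.parse. This is a sketch of the documented behavior, not a port of the Rust function: exact, case-sensitive name matching on the raw (undecoded) parameter name is an assumption, and the function name is reused here only for readability.

```python
from urllib.parse import urlsplit, urlunsplit

STRIP_PARAMS = {"session_id", "auth_token", "access_token", "api_key", "token"}


def strip_sensitive_params(url, strip_params=STRIP_PARAMS):
    """Drop query parameters whose raw name is listed in strip_params."""
    parts = urlsplit(url)
    kept = [
        segment
        for segment in parts.query.split("&")
        # Compare the text before the first "=" without percent-decoding it.
        if segment and segment.split("=", 1)[0] not in strip_params
    ]
    # Path and fragment are passed through untouched.
    return urlunsplit(parts._replace(query="&".join(kept)))


for url in [
    "https://example.com/api?api_key=abc&user=joe",
    "https://example.com/login?token=xyz",
    "https://example.com/api/v1/abc-token-xyz/users",
    "https://example.com/page#auth=abc",
    "https://example.com/api?token%3Dxyz",
]:
    print(strip_sensitive_params(url))
```

The last three URLs come back unchanged, which is the gap listed above: only well-formed name=value pairs in the query string are examined.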
Threat model
| Hippo defends against | Hippo does NOT defend against |
|---|---|
| Accidental capture of common token formats (AWS, GitHub PAT, JWT, PEM private keys) into the LLM context | Secrets in positional arguments, non-standard env-var names, or arbitrary file content the user pastes |
| Common URL-borne tokens in browser visit URLs | Tokens in URL path segments or page content |
| Secrets in Authorization: Bearer … HTTP headers logged to stdout | Secrets in custom auth schemes |
| Storing structured PEM private-key headers | The body of a multi-line private key (only the header line matches) |
| Replay of [REDACTED] strings across the LLM/MCP path (since the secret has been replaced before storage) | Secrets that were already stored before a pattern was added |
For threats outside this list, the answer is “don’t paste secrets into your terminal.” Hippo’s job is to catch the common case.
See also
- README Privacy and Security: the wider data-flow story.
- config/redact.default.toml: the default ruleset.
- crates/hippo-core/src/redaction.rs: RedactionEngine implementation.
- crates/hippo-daemon/src/native_messaging.rs: strip_sensitive_params for browser URL redaction.
- Issue #52: open discussion on over-redaction behavior.