MCP Tool Reference

Full reference for hippo’s MCP server: every tool that hippo-mcp exposes, with arguments, return shapes, examples, and selection guidance. The MCP server source is brain/src/hippo_brain/mcp.py; this doc is meant to be read in place of the source.

For setup (adding hippo to your MCP config), see the README’s MCP Server section. For the trust-boundary discussion (what granting MCP access actually exposes), see the README’s Privacy and Security section.

Tool selection guide

| You want… | Reach for | Why |
| --- | --- | --- |
| A synthesized prose answer with cited sources | ask | Performs retrieval + LLM synthesis end-to-end. Slow (~1-3 s) but most useful for “what was I working on?” / “how did I fix that?” / “why did we choose X?” |
| A list of relevant knowledge nodes (no synthesis) | search_knowledge or search_hybrid | Retrieval only. Fastest path. Use search_hybrid when you want score-fused vec0 + FTS5 results; search_knowledge for the simpler “semantic with lexical fallback” path. |
| A Markdown context block ready to paste into another agent’s prompt | get_context | Same retrieval as search_hybrid, rendered as a prompt-shaped block (numbered list + per-hit summary/outcome/cwd/uuid). |
| Raw shell commands / Claude tool calls / browser visits (not enriched summaries) | search_events | Operates on the events tables, not knowledge nodes. Use for “what command did I run?” / “what URL was I on?” |
| The list of projects in the corpus | list_projects | Use for discovery before filtering other tools by project. |
| Extracted entities (project, file, tool, env_var, etc.) | get_entities | Knowledge graph view. Filter by type to scope. |
| CI status for a recent push | get_ci_status | Structured data; preferred over ask for known-shape queries. |
| Lessons (graduated recurring CI failures) | get_lessons | Pre-flight before editing in a known failure-prone area. Lessons require ≥ 2 occurrences to graduate. |

The ones to reach for first: ask for natural-language questions with answer synthesis; search_hybrid for raw retrieval; search_events for raw event lookup.

Common arguments

Several tools share filter arguments. They all behave the same way:

| Argument | Type | Behavior |
| --- | --- | --- |
| since | str | Time window. Strict format only: ^<digits><unit>$ where unit is m/h/d. Examples: "30m", "24h", "7d". parse_since returns 0 (no filter) for inputs with spaces, words, bare numbers, mixed case, or anything else that doesn’t match. Empty string disables. |
| project | str | Substring match on cwd or git_repo of the events/sessions linked to a knowledge node. Use list_projects first to find candidates. |
| source | str | One of "shell", "claude", "browser", "workflow". Empty string means all sources. |
| branch | str | Exact-match git_branch filter. Ignored for browser events. |
| limit | int | Max results. Clamped to a sane upper bound by _clamp_limit; values above that are silently capped. |
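
The strict since format can be illustrated with a small sketch. This is not hippo’s parse_since: the function name parse_since_sketch and the choice of milliseconds as the return unit are assumptions made for illustration. Only the accepted pattern and the "return 0 on anything else" rule come from the table above.

```python
import re

# Assumed unit table: the doc only says a successful parse yields a time
# window; milliseconds are chosen here for illustration.
_UNIT_MS = {"m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_since_sketch(since: str) -> int:
    """Return the window length in ms, or 0 (no filter) for any other input."""
    # fullmatch enforces the strict <digits><unit> shape: no spaces, no words,
    # no bare numbers, no uppercase units.
    match = re.fullmatch(r"([0-9]+)([mhd])", since)
    if match is None:
        return 0
    digits, unit = match.groups()
    return int(digits) * _UNIT_MS[unit]
```

Because malformed input silently means "no filter", a caller that builds since values from user input may want to validate them with the same pattern before calling a tool.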

Tools

ask

Natural-language question → synthesized answer with cited sources.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| question | str | required | The natural-language question. |
| limit | int | 10 | Number of knowledge nodes to retrieve as context. |
| project / since / source / branch | str | "" | See Common arguments. |

Returns — a single string (Markdown). Begins with the synthesized answer; ends with a Sources: block listing each cited node’s score, summary, cwd, and timestamp. Rendered nicely by glow.

Example

ask({"question": "What dep bumps shipped in v0.13.0?", "since": "30d"})
v0.13.0 included two CVE-related upgrades:
- python-multipart 0.0.22 → 0.0.26
- pygments 2.19.2 → 2.20.0

Sources:
  1. [98%] Patched two transitive Python vulnerabilities (python-multipart and pygments)…
     /Users/carpenter/projects/hippo (feat/claude-tool-enrichment-policy) — 2026-04-22
  2. [94%] Pushed v0.13.0 release tag…
     /Users/carpenter/projects/hippo (main) — 2026-04-22
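
Since ask returns one Markdown string, a caller that wants the answer and the citations separately has to split it. A minimal sketch, assuming the Sources: marker sits on its own line as in the example above; split_ask_output is a hypothetical helper, not part of hippo.

```python
def split_ask_output(text: str) -> tuple[str, str]:
    """Split an ask() result into (answer, sources block)."""
    # rpartition picks the last marker, since the Sources: block ends the output.
    answer, marker, sources = text.rpartition("\nSources:\n")
    if not marker:
        # No Sources: block found; treat the whole string as the answer.
        return text.strip(), ""
    return answer.strip(), sources.strip()
```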

Pitfalls


search_knowledge

Search enriched knowledge nodes; no synthesis. Defaults to semantic; falls back to lexical on embedding failure or when filters are applied.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| query | str | required | Search query text. |
| mode | str | "semantic" | "semantic" (vector similarity via the inference server’s embedding model) or "lexical" (SQL LIKE over knowledge_nodes.content / embed_text; does NOT use the FTS5 index). |
| limit | int | 10 | |
| project / since / source / branch | str | "" | See Common arguments. |

When any filter is applied, the implementation forces lexical mode (filter pushdown isn’t supported in the semantic path).
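
The mode override can be pictured as a tiny decision function. This is a sketch of the documented rule only; effective_mode is a hypothetical name, and how hippo treats a since value that fails to parse is not stated here, so the sketch treats any non-empty string as a filter.

```python
def effective_mode(requested: str, project: str = "", since: str = "",
                   source: str = "", branch: str = "") -> str:
    """Return the mode search_knowledge would actually run under this rule."""
    # Any non-empty filter forces lexical, because the semantic path
    # has no filter pushdown.
    if any((project, since, source, branch)):
        return "lexical"
    return requested
```

In practice this means a filtered search_knowledge call never returns vector-ranked results, which is one reason to prefer search_hybrid when filters matter.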

Returns — list of SearchResult-shaped dicts (from shape_semantic_results / search_knowledge_lexical in brain/src/hippo_brain/mcp_queries.py):

{
  "uuid": "node-uuid-...",
  "score": 0.87,
  "summary": "...",
  "intent": "",
  "outcome": "success" | "partial" | "failure" | "unknown",
  "tags": ["tag1", "tag2"],
  "embed_text": "identifier-dense tag soup",
  "cwd": "/Users/.../projects/hippo",
  "git_branch": "main",
  "captured_at": 1730000000000,
  "linked_event_ids": [12345, 12346],
  "linked_claude_session_ids": [501, 502],
  "linked_browser_event_ids": [9001]
}

The linked_*_ids arrays are empty when a node has no links to that source (e.g., a browser-only node returns [] for linked_event_ids).
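
For typed client code, the shape above can be written down as a TypedDict. The field names are copied from the example; the helper linked_sources and its mapping of linked_event_ids to the "shell" source are assumptions for illustration.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    """Field names copied from the documented return shape."""
    uuid: str
    score: float
    summary: str
    intent: str
    outcome: str  # "success" | "partial" | "failure" | "unknown"
    tags: list[str]
    embed_text: str
    cwd: str
    git_branch: str
    captured_at: int
    linked_event_ids: list[int]
    linked_claude_session_ids: list[int]
    linked_browser_event_ids: list[int]


def linked_sources(result: SearchResult) -> list[str]:
    """Hypothetical helper: name the sources a node has at least one link to."""
    fields = {
        "shell": "linked_event_ids",  # assumed: the events table holds shell rows
        "claude": "linked_claude_session_ids",
        "browser": "linked_browser_event_ids",
    }
    # An empty list is falsy, so sources with no links are skipped.
    return [source for source, field in fields.items() if result[field]]
```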


search_hybrid

Hybrid retrieval (sqlite-vec + FTS5 score fusion) over knowledge nodes. No synthesis; same return shape as search_knowledge.

Arguments

Same as search_knowledge, plus:

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| mode | str | "hybrid" | "hybrid" (default; RRF score fusion), "semantic", "lexical", or "recent". |
| entity | str | "" | Require a specific canonical entity name to appear among the node’s linked entities. |

Returns — same SearchResult shape as search_knowledge.

When to prefer over search_knowledge

search_hybrid is the structured retrieval path used by ask/get_context internally; it supports filter pushdown and the entity argument. Reach for search_knowledge only when you want the legacy “semantic with lexical fallback” behavior.
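
The "hybrid" mode is described as RRF score fusion. The sketch below shows textbook reciprocal rank fusion over two ranked ID lists. It is a generic illustration, not hippo’s implementation: the function name rrf_fuse and the constant k=60 (a common default in the literature) are assumptions.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings into one list of (id, score) pairs."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); appearing in both lists adds up.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage: one ranking from the vector index, one from full-text search.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d"]
fused = rrf_fuse([vector_hits, fts_hits])
```

Here "b" ranks first because it appears in both lists, which is the property that makes rank fusion useful when vector and lexical scores are not directly comparable.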


search_events

Search raw events — shell commands, Claude tool calls, browser visits — not enriched summaries.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| query | str | "" | Text to search for. Substring match on commands / Claude session summary_text / browser titles. |
| source | str | "all" | "shell", "claude", "browser", "all". |
| since / project / branch | str | "" | See Common arguments. |
| limit | int | 20 | |

Returns — list of normalized event dicts. The shape is the same across all three sources (the per-source helpers in mcp_queries.py project each row into this canonical envelope):

{
  "id": 12345,
  "source": "shell" | "claude" | "browser",
  "timestamp": 1730000000000,
  "summary": "...",
  "cwd": "/Users/.../projects/hippo",
  "detail": "...",
  "git_branch": "main"
}

What lands in summary and detail is source-specific:

| Source | summary | detail | cwd / git_branch |
| --- | --- | --- | --- |
| shell (rows from events) | the command text | "exit=<code> duration=<ms>ms" | cwd and git_branch from the event |
| claude (rows from agentic_sessions where harness IN ('claude-code', 'opencode', 'codex', 'cursor')) | summary_text | "messages=<count> tools=<count>" (tool count derived from tool_calls_json) | cwd / git_branch from the session |
| browser (rows from browser_events) | "<domain> — <title or url>" | "dwell=<ms>ms scroll=<pct>%" | empty strings (browser events have no cwd/branch) |

When source="all", results are interleaved by timestamp desc and capped at limit.
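
The interleaving rule amounts to "combine, sort newest first, cut at limit". A minimal sketch over already-normalized event dicts; interleave_events is a hypothetical name, and the real implementation may well do this in SQL.

```python
def interleave_events(*per_source: list[dict], limit: int = 20) -> list[dict]:
    """Merge per-source event lists by timestamp, newest first, capped at limit."""
    combined = [event for events in per_source for event in events]
    combined.sort(key=lambda event: event["timestamp"], reverse=True)
    return combined[:limit]
```

One consequence worth noting: a burst of activity in one source can fill the whole result, so pass an explicit source when you need events of a specific kind.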

The original per-table fields (command, exit_code, url, tool_calls_json, etc.) are not returned — they’re projected into summary/detail and the underlying row stays in SQLite. Use hippo events (CLI) or query SQLite directly when you need the raw columns.
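
When only the exit code and duration of a shell event are needed, the detail string documented in the table above can be parsed on the client side. A sketch, assuming the format is exactly "exit=<code> duration=<ms>ms" with integer values; parse_shell_detail is a hypothetical helper, and rows with a missing exit code may render differently.

```python
import re
from typing import Optional


def parse_shell_detail(detail: str) -> Optional[tuple[int, int]]:
    """Return (exit_code, duration_ms) or None if the string has another shape."""
    match = re.fullmatch(r"exit=(-?[0-9]+) duration=([0-9]+)ms", detail)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```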


get_entities

Browse the entities knowledge graph.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| type | str | "" | One of the schema’s entities.type CHECK values: "project", "file", "tool", "service", "repo", "host", "person", "concept", "domain", "env_var" (added in schema v13). Empty = all types. The brain doesn’t emit every category in every corpus; query get_entities with no filter to see what your DB actually contains. |
| query | str | "" | Substring match on entity name. |
| limit | int | 50 | |
| project | str | "" | Substring match on cwd/git_repo of co-occurring nodes. |
| since | str | "" | Window applied to entities.last_seen. |

Returns — list of entity dicts (get_entities_impl in brain/src/hippo_brain/mcp_queries.py):

{
  "type": "file",
  "name": "/Users/.../brain/src/hippo_brain/enrichment.py",
  "canonical": "brain/src/hippo_brain/enrichment.py",
  "first_seen": 1730000000000,
  "last_seen": 1730900000000
}

canonical is the dedup key (worktree-stripped, project-root-relative); name is the display value the LLM emitted (worktree-stripped at write time per #105). The internal entities.id and any aggregate occurrence count are not exposed today.
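
The first_seen and last_seen values in the example look like Unix epoch milliseconds. That is an inference from their magnitude, not something this reference states. Under that assumption, a small hypothetical helper can render an entity with readable dates:

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    return _EPOCH + timedelta(milliseconds=ms)


def describe_entity(entity: dict) -> str:
    """One-line summary keyed on canonical, the documented dedup key."""
    first = ms_to_utc(entity["first_seen"]).date().isoformat()
    last = ms_to_utc(entity["last_seen"]).date().isoformat()
    return f'{entity["type"]}:{entity["canonical"]} ({first} .. {last})'
```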


list_projects

Distinct projects in the corpus, ordered by most-recent activity first.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| limit | int | 50 | |

Returns — list of dicts (list_projects_impl in brain/src/hippo_brain/mcp_queries.py):

{
  "git_repo": "stevencarpenter/hippo",
  "cwd_root": "/Users/carpenter/projects/hippo",
  "last_seen": 1730900000000
}

The list is the union of distinct (git_repo, cwd_root) pairs from shell events and agentic_sessions (project_dir, restricted to harness IN ('claude-code', 'codex', 'cursor', 'opencode'); browser events have no cwd, so they’re skipped). last_seen is MAX(timestamp) / MAX(start_time) across both sources. There is no event_count field today.
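
The union-and-max rule can be sketched in plain Python. The input rows here are hypothetical (git_repo, cwd_root, timestamp) tuples standing in for the two sources; merge_projects is not hippo’s list_projects_impl, which presumably does the equivalent work in SQL.

```python
from itertools import chain


def merge_projects(shell_rows, session_rows, limit: int = 50) -> list[dict]:
    """Union distinct (git_repo, cwd_root) pairs, keeping the latest timestamp."""
    latest: dict[tuple[str, str], int] = {}
    for git_repo, cwd_root, timestamp in chain(shell_rows, session_rows):
        key = (git_repo, cwd_root)
        latest[key] = max(timestamp, latest.get(key, timestamp))
    # Most recent activity first, matching the documented ordering.
    ordered = sorted(latest.items(), key=lambda item: item[1], reverse=True)
    return [
        {"git_repo": repo, "cwd_root": root, "last_seen": seen}
        for (repo, root), seen in ordered[:limit]
    ]
```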


get_context

Hybrid retrieval rendered as a Markdown context block, ready to paste into another agent’s prompt.

Arguments: query, limit, project, since, source (same semantics as search_hybrid). Does NOT accept mode, branch, or entity. Always uses mode="hybrid" internally.

Returns — single Markdown string rendered by format_context_block in brain/src/hippo_brain/mcp_queries.py. Shape:

# Hippo context for: <query>

## [1] <summary> (score: 0.87)
- **Outcome:** <success/partial/failure>
- **CWD:** `<cwd>`
- **Branch:** `<git_branch>`
- **When:** <ISO timestamp>
- **uuid:** `<uuid>`

<truncated embed_text — up to 600 chars then `…`>

## [2] ...

When no results match, the block is "# Hippo context for: <query>\n\n_No relevant knowledge found._". Embed text per hit is truncated to 600 characters (with a trailing …) to keep the block prompt-budget-friendly.
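
A rough sketch of a renderer that produces this layout, including the empty case and the 600-character cut. It is not format_context_block itself: the function names are hypothetical, and the conversion of captured_at from epoch milliseconds to an ISO timestamp is an assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_EMBED_CHARS = 600  # documented truncation length
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate_embed_text(text: str, max_chars: int = MAX_EMBED_CHARS) -> str:
    """Cut long embed text and mark the cut with a trailing ellipsis."""
    return text if len(text) <= max_chars else text[:max_chars] + "…"


def render_context_block(query: str, hits: list[dict]) -> str:
    """Render hits in the documented layout; timestamps assumed to be epoch ms."""
    lines = [f"# Hippo context for: {query}", ""]
    if not hits:
        lines.append("_No relevant knowledge found._")
        return "\n".join(lines)
    for index, hit in enumerate(hits, start=1):
        when = (_EPOCH + timedelta(milliseconds=hit["captured_at"])).isoformat()
        lines += [
            f"## [{index}] {hit['summary']} (score: {hit['score']:.2f})",
            f"- **Outcome:** {hit['outcome']}",
            f"- **CWD:** `{hit['cwd']}`",
            f"- **Branch:** `{hit['git_branch']}`",
            f"- **When:** {when}",
            f"- **uuid:** `{hit['uuid']}`",
            "",
            truncate_embed_text(hit["embed_text"]),
            "",
        ]
    return "\n".join(lines).rstrip("\n")
```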


get_ci_status

Structured CI status from a recent git push.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| repo | str | required | "owner/repo" format. |
| sha | str \| None | None | Git commit SHA. |
| branch | str \| None | None | Branch name (used when sha is not provided). |

Returns — dict with the most recent run’s structured data (status, conclusion, jobs with annotations, started_at, completed_at, html_url). Empty dict {} if no matching run is found.

When to prefer over ask

get_ci_status is the right tool for “did my push pass CI?” — it returns structured data your script can act on without parsing prose.
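
A script consuming the result might reduce it to a single verdict. The sketch below relies only on the documented status and conclusion keys and the empty-dict case. The specific values "completed" and "success" follow GitHub Actions conventions and are an assumption here, as is the helper name ci_verdict.

```python
def ci_verdict(run: dict) -> str:
    """Map a get_ci_status result to 'unknown', 'pending', 'passed', or 'failed'."""
    if not run:
        # Documented: {} means no matching run was found.
        return "unknown"
    if run.get("status") != "completed":  # assumed status vocabulary
        return "pending"
    return "passed" if run.get("conclusion") == "success" else "failed"
```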


get_lessons

Distilled past-mistake lessons. Pre-flight before editing in a known failure-prone area.

Arguments

| Name | Type | Default | Notes |
| --- | --- | --- | --- |
| repo | str \| None | None | "owner/repo" format. |
| path | str \| None | None | Returns lessons whose stored path_prefix matches as a prefix of this path. |
| tool | str \| None | None | Filter by tool name ("ruff", "clippy", etc.). |
| limit | int | 10 | |

Returns — list of lesson dicts (from dataclasses.asdict(Lesson)): {id, repo, tool, rule_id, path_prefix, summary, fix_hint, occurrences, first_seen_at, last_seen_at}. id is the lessons.id primary key.

Lessons graduate only after 2+ occurrences (single failures stay in lesson_pending and don’t surface here).
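
The path filter and the graduation rule can both be sketched as simple predicates. These helpers are hypothetical. In particular, the sketch uses a plain string-prefix test, and the real matching may be path-component-aware.

```python
def is_graduated(occurrences: int, threshold: int = 2) -> bool:
    """Documented rule: a lesson surfaces only after 2+ occurrences."""
    return occurrences >= threshold


def lessons_for_path(lessons: list[dict], path: str) -> list[dict]:
    """Keep lessons whose stored path_prefix is a prefix of the queried path."""
    return [lesson for lesson in lessons if path.startswith(lesson["path_prefix"])]
```

Note the direction of the match: the stored path_prefix must be a prefix of the path you pass, so querying with a full file path returns lessons recorded for any of its parent directories.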

Common pitfalls

See also