RAG / hippo ask expert summary

Mean scores: accuracy 4.36 / succinctness 3.73 / usefulness 3.26 / ask 3.44 / mcp 4.02

Corpus stats (post-truncation render):

Top strengths (top 3, concrete)

Top weaknesses (top 3, concrete)

Worst 5 nodes

uuid                                  reason
67368c74-9405-4f53-8341-1da7bf98e634  ask=1 use=3 mcp=5
0111018e-2904-4f3b-b7a9-bf72584af367  ask=1 use=2 mcp=3
25b32204-7111-4d8d-8f9b-e0a2d5e4610a  ask=1 use=3 mcp=5
6dd039b8-5525-4539-87ed-782f59ed6ad5  ask=1 use=3 mcp=5
b239a21e-24d2-434e-b4bf-ffe8ebe757c1  ask=1 use=3 mcp=3

Render gaps (RAG lens — actionable)

Fields the model populates well but the RAG synthesis prompt builder either never renders or renders without the context that explains them. These are the biggest wins available without re-enriching: add render branches in brain/src/hippo_brain/rag.py::_hit_lines.

  1. key_decisions (most damaging) — populated on a large fraction of nodes with terse, identifier-rich bullet summaries (“Decided to use X instead of Y because Z”). This is exactly synthesis-prompt-shaped content and is dropped on the floor. Examples with the most orphaned content: 25b32204-7111-4d8d-8f9b-e0a2d5e4610a (1906 chars), 6dd039b8-5525-4539-87ed-782f59ed6ad5 (1906 chars), 6dbadd52-7c4e-4b92-a770-f0e330015656 (1646 chars).
  2. problems_encountered — when populated, often carries the error message and recovery action the user will absolutely query for (“what went wrong with X?”). Render currently skips it. Notable cases: 87976564-3ace-4903-b8a4-0e666720d4f1, 3a3ccf2f-2ef2-4a2d-8f5d-20e8ba2c1db3.
  3. outcome is rendered as a one-token line but the meaning of “partial” or “failure” is in problems_encountered. Currently the synthesizer sees Outcome: partial with no context for why, even though the model wrote it down. Same orphans as #2.
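
The two missing render branches could look roughly like the sketch below. It is not the actual _hit_lines code: the names extra_hit_lines and _clip, the dict-shaped hit, the assumption that both fields hold a list of strings (or a single string), and the 400-character budget are all hypothetical and chosen for illustration only. The sketch shows "Decisions:" and "Problems:" lines sharing one budget proportionally, so that a long key_decisions value cannot starve problems_encountered.

```python
from typing import Any


def _clip(text: str, limit: int) -> str:
    """Truncate text to at most `limit` characters, marking a cut with '...'."""
    if len(text) <= limit:
        return text
    if limit <= 3:
        return text[: max(limit, 0)]
    return text[: limit - 3].rstrip() + "..."


def extra_hit_lines(hit: dict[str, Any], budget: int = 400) -> list[str]:
    """Render 'Decisions:' and 'Problems:' lines for one hit.

    Hypothetical helper: the field layout and the budget are assumptions,
    not the real hippo_brain data model.
    """
    rendered: list[tuple[str, str]] = []
    for label, field in (
        ("Decisions", "key_decisions"),
        ("Problems", "problems_encountered"),
    ):
        items = hit.get(field) or []
        if isinstance(items, str):
            items = [items]
        text = "; ".join(s.strip() for s in items if s and s.strip())
        if text:
            rendered.append((label, text))

    total = sum(len(text) for _, text in rendered)
    lines: list[str] = []
    for label, text in rendered:
        # Everything fits: no truncation. Otherwise each field receives a
        # share of the budget proportional to its own length.
        share = len(text) if total <= budget else budget * len(text) // total
        clipped = _clip(text, share)
        if clipped:
            lines.append(f"{label}: {clipped}")
    return lines


if __name__ == "__main__":
    example = {
        "outcome": "partial",
        "key_decisions": ["Decided to use X instead of Y because Z"],
        "problems_encountered": ["Build failed on first attempt; retried after a clean"],
    }
    for line in extra_hit_lines(example):
        print(line)
```

With lines like these in the prompt, an "Outcome: partial" line would be followed directly by a "Problems:" line that says why, which addresses item 3 without a separate branch.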

Cross-cutting observation

The single biggest lever for hippo ask quality is not re-enrichment; it is plumbing. 66/100 nodes have populated content in key_decisions and/or problems_encountered that the RAG render in brain/src/hippo_brain/rag.py::_hit_lines silently drops. The synthesis LLM never sees any of it, regardless of how good the enrichment is. Add two render branches (“Decisions:” and “Problems:” lines, under the same proportional truncation budgeting already applied to embed_text/commands/design_decisions) and the post-truncation context for the questions users actually type (“why did I do X?”, “what error did I hit when Y?”) would improve materially without touching the model.

Secondary lever: a 10-node tail of summaries with filler openings (“The user requested…”) burns the 120-char Summary budget. That is a prompt fix in the enricher (“Lead the summary with a strong verb + artefact, never with subject-first prose”), not a structural change.
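
To track the filler-opening tail before and after an enricher prompt change, a small audit along the following lines may help. This is a sketch under assumptions: the dict-shaped node records with "uuid" and "summary" keys, the helper names, and the single regex pattern are hypothetical. Only the “The user requested…” opening and the 120-char Summary budget come from the findings above.

```python
import re

SUMMARY_BUDGET = 120  # chars of Summary that survive rendering, per the findings

# Assumed pattern: only the "The user ..." opening is attested in this report.
FILLER_OPENING = re.compile(r"^\s*the user\b", re.IGNORECASE)


def is_filler_opening(summary: str) -> bool:
    """True when the summary starts with subject-first filler prose."""
    return bool(FILLER_OPENING.match(summary))


def filler_nodes(nodes: list[dict]) -> list[str]:
    """Return the uuids of nodes whose summary opens with filler."""
    return [n["uuid"] for n in nodes if is_filler_opening(n.get("summary", ""))]


def rendered_summary(summary: str, budget: int = SUMMARY_BUDGET) -> str:
    """What a hard cut at the Summary budget would leave (assumed behaviour)."""
    return summary[:budget]


if __name__ == "__main__":
    sample = [
        {"uuid": "node-a", "summary": "The user requested a refactor of the parser."},
        {"uuid": "node-b", "summary": "Refactored the parser to stream tokens."},
    ]
    print(filler_nodes(sample))
```

Running the audit on the same corpus after the prompt change gives a direct count of how many summaries still open with filler.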