Hippo v3 enrichment scorecard rubric

You are one of five experts on a panel scoring 100 re-enriched knowledge nodes produced by qwen3.6-35b-a3b-ud-mlx against the v3 enrichment system prompt.

Inputs

v3 system-prompt rules the model was bound by

  1. Verbatim preservation of identifiers, env vars (UPPERCASE_WITH_UNDERSCORES), versions (\d+\.\d+\.\d+), package@version, symbol names, CLI flags, file paths, command names. If unsure → omit, never guess.
  2. embed_text must be identifier-dense tag soup — keyword retrieval, not prose.
  3. Specific, not generic summaries (no “edited a Rust file”).
  4. design_decisions must be a list of {considered, chosen, reason} objects; empty list if no alternatives were weighed.
  5. Entity buckets: projects/tools/files/services/errors/env_vars. Worktree prefixes (.claude/worktrees/<X>/) stripped from path-typed entity names.
  6. Outcome ∈ {success, partial, failure, unknown}.
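Several of the rules above are mechanical enough to pre-check before scoring by hand. The following is a minimal sketch, not part of the v3 prompt: the node layout (`outcome`, `design_decisions`, `entities.files`) and all function names are assumptions made for illustration, and the identifier check covers only versions and env vars, using a loose substring match against the source text.

```python
import re

# Patterns taken from rules 1 and 5; the helper names below are hypothetical.
WORKTREE_PREFIX = re.compile(r"^\.claude/worktrees/[^/]+/")
SEMVER = re.compile(r"\d+\.\d+\.\d+")
ENV_VAR = re.compile(r"\b[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)+\b")
OUTCOMES = {"success", "partial", "failure", "unknown"}
DECISION_KEYS = {"considered", "chosen", "reason"}


def strip_worktree_prefix(path: str) -> str:
    """Rule 5: drop a leading .claude/worktrees/<X>/ from a path."""
    return WORKTREE_PREFIX.sub("", path)


def invented_identifiers(source: str, enriched: str) -> set[str]:
    """Rule 1: versions and env vars in the enrichment that never occur in the source."""
    found = set(SEMVER.findall(enriched)) | set(ENV_VAR.findall(enriched))
    return {tok for tok in found if tok not in source}


def rule_violations(node: dict) -> list[str]:
    """Rules 4-6: shape checks on one enriched node (assumed JSON layout)."""
    problems = []
    if node.get("outcome") not in OUTCOMES:
        problems.append("outcome not in allowed set")
    decisions = node.get("design_decisions")
    if not isinstance(decisions, list):
        problems.append("design_decisions is not a list")
    else:
        for entry in decisions:
            if not isinstance(entry, dict) or set(entry) != DECISION_KEYS:
                problems.append("design_decisions entry has wrong shape")
                break
    for path in node.get("entities", {}).get("files", []):
        if WORKTREE_PREFIX.match(path):
            problems.append(f"worktree prefix not stripped: {path}")
    return problems
```

A clean result from these checks says nothing about rules 2 and 3, which still need a human judgment; an omitted identifier is also not flagged, since rule 1 allows omission.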

Consumption sites (inform “suitability” scores)

Per-node scoring (1=poor, 5=excellent)

Score every node on EVERY dimension. Use integers from 1 to 5 only.

Output format

Write your output as JSONL to /tmp/hippo-eval-panel/scores_<EXPERT_ID>.jsonl, one row per node:

{"node_id": 1234, "uuid": "...", "accuracy": 4, "succinctness": 5,
 "usefulness": 4, "ask_suitability": 4, "mcp_suitability": 5,
 "notes": "<≤140 chars; concrete>"}
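Before handing in the file, each row can be checked against the format above. This is a rough sketch under the assumption that every row is a single JSON object on one line; `row_problems` is a hypothetical helper name, not something the panel tooling provides.

```python
import json

DIMENSIONS = (
    "accuracy",
    "succinctness",
    "usefulness",
    "ask_suitability",
    "mcp_suitability",
)
MAX_NOTE_CHARS = 140


def row_problems(line: str) -> list[str]:
    """Return the format problems found in one JSONL score row."""
    row = json.loads(line)
    problems = []
    for key in ("node_id", "uuid", "notes"):
        if key not in row:
            problems.append(f"missing {key}")
    for dim in DIMENSIONS:
        value = row.get(dim)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            problems.append(f"{dim} must be an integer from 1 to 5")
    notes = row.get("notes", "")
    if not isinstance(notes, str) or len(notes) > MAX_NOTE_CHARS:
        problems.append(f"notes must be a string of at most {MAX_NOTE_CHARS} chars")
    return problems
```

An empty list means the row matches the stated format; it does not say whether the scores themselves are well judged.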

Then write a markdown summary to /tmp/hippo-eval-panel/summary_<EXPERT_ID>.md:

# <Expert role> summary
Mean scores: accuracy X.X / succinctness X.X / usefulness X.X / ask X.X / mcp X.X
## Top strengths (top 3, concrete)
## Top weaknesses (top 3, concrete)
## Worst 5 nodes
| uuid | reason |
## Cross-cutting observation
<1 paragraph: the single most important finding from your lens>
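The "Mean scores" line of the summary can be derived from the JSONL rows rather than estimated. A small sketch, assuming the rows have already passed format checks and that means are reported to one decimal place as the X.X placeholders suggest; `mean_scores_line` is a hypothetical name.

```python
import json
from statistics import mean

# (label used in the summary line, key used in the JSONL rows)
LABELS = [
    ("accuracy", "accuracy"),
    ("succinctness", "succinctness"),
    ("usefulness", "usefulness"),
    ("ask", "ask_suitability"),
    ("mcp", "mcp_suitability"),
]


def mean_scores_line(jsonl_text: str) -> str:
    """Build the 'Mean scores: ...' summary line from JSONL score rows."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    parts = [f"{label} {mean(row[key] for row in rows):.1f}" for label, key in LABELS]
    return "Mean scores: " + " / ".join(parts)
```

Passing the contents of the scores file to `mean_scores_line` yields a line that can be pasted directly under the summary heading.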

Discipline