Schema / Python-data expert summary

Mean scores: accuracy 4.82 / succinctness 4.51 / usefulness 4.38 / ask 4.94 / mcp 4.43

100 nodes scored, written to scores_schema.jsonl.

Structural fault tally

| Fault type | Count | Example uuid |
| --- | --- | --- |
| invalid_json | 0 | - |
| missing required field | 0 | - |
| null where list expected | 0 | - |
| invalid outcome | 0 | - |
| entities not dict / missing buckets / wrong-type buckets | 0 | - |
| design_decisions wrong shape | 0 | - |
| worktree prefix not stripped (path-typed) | 6 | c1260596-e479-4bc7-a9ff-6904f722fefd |
| env_var bad case / not env-shaped | 3 | fe5d66df-c6f0-43bd-ad00-f6f18bbd7f0f |
| tool entity carries args (“cargo clippy”) | 19 | 842bfe93-ec2e-4b4b-af2d-769485a8f887 |
| sparse tags (≤2) | 1 | d3368650-f2ee-474a-a51b-ca034c396465 |
| generic-only tag set / within-node duplicates | 0 | - |
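
The semantic rows of the tally (env_var shape, tool entities carrying args, sparse tags) can be approximated with checks like the ones below. This is a minimal sketch, not the contents of check_schema.py: the bucket names tools and env_vars, the assumption that entity names are plain strings, and the env-var regex are all illustrative guesses.

```python
import re

# Assumed shape of an env-var name: upper-case letters, digits, underscores.
ENV_VAR_RE = re.compile(r"^[A-Z_][A-Z0-9_]*$")


def semantic_faults(node: dict) -> list[str]:
    """Return the semantic fault labels that apply to a single node."""
    faults = []
    entities = node.get("entities") or {}

    # Hypothetical bucket names; the real schema may use different keys.
    if any(not ENV_VAR_RE.match(name) for name in entities.get("env_vars", [])):
        faults.append("env_var bad case / not env-shaped")

    # A tool name with more than one whitespace-separated token is
    # treated as "command plus args" (e.g. "cargo clippy").
    if any(len(name.split()) > 1 for name in entities.get("tools", [])):
        faults.append("tool entity carries args")

    # Sparse tags: two or fewer, matching the (≤2) threshold in the tally.
    if len(node.get("tags") or []) <= 2:
        faults.append("sparse tags")

    return faults
```

Running a function like this over every line of scores_schema.jsonl and counting the labels would give one count per fault type, with the first offending uuid as the example.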

design_decisions distribution: 0 nulls, 61 empty [], 39 populated. Every populated entry is a well-formed {considered, chosen, reason} dict. Outcomes: success 81 / partial 14 / failure 3 / unknown 2 (all valid).
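
A distribution like the one above could be computed roughly as follows. This is a sketch under assumptions: it treats the four outcomes listed here as the full enum and requires each design_decisions entry to have exactly the keys considered, chosen, and reason. The actual checker may be stricter or looser on both points.

```python
from collections import Counter

# Assumed to be the full outcome enum; only these four appear in the report.
VALID_OUTCOMES = {"success", "partial", "failure", "unknown"}
DD_KEYS = {"considered", "chosen", "reason"}


def dd_status(node: dict) -> str:
    """Classify a node's design_decisions as null, empty, populated, or wrong_shape."""
    dd = node.get("design_decisions")
    if dd is None:
        return "null"
    if not isinstance(dd, list):
        return "wrong_shape"
    if not dd:
        return "empty"
    well_formed = all(isinstance(e, dict) and set(e) == DD_KEYS for e in dd)
    return "populated" if well_formed else "wrong_shape"


def outcome_status(node: dict) -> str:
    """Return the outcome if it is in the assumed enum, else 'invalid'."""
    outcome = node.get("outcome")
    if isinstance(outcome, str) and outcome in VALID_OUTCOMES:
        return outcome
    return "invalid"


def summarize(nodes: list[dict]) -> tuple[Counter, Counter]:
    """Tally design_decisions shapes and outcomes across all nodes."""
    return (
        Counter(dd_status(n) for n in nodes),
        Counter(outcome_status(n) for n in nodes),
    )
```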

Worst 5 nodes

Cross-cutting observation

The hard schema is 100% clean: JSON validity, required keys, the outcome enum, and the entities and design_decisions shapes show zero violations across all 100 nodes. The remaining drift is concentrated in two semantic-taxonomy areas the validator does not enforce:

  1. Worktree prefixes leak into path-typed entity names in 6/100 nodes, and because upsert_entities fixes names only on conflict, the pollution stays baked into first-writer rows.
  2. In 19/100 nodes the tools bucket is filled with shell-invocation phrases (cargo clippy, git log, uv run --project) instead of bare command names, which defeats cross-node dedup and inflates get_entities cardinality.

Both are higher-leverage to fix in upsert_entities than to chase with more prompt rules: add a tool-name normalization pass, and tighten the worktree strip so it applies unconditionally to path types. Those changes also retroactively clean already-written rows.
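
The two normalization passes proposed above might look like the sketch below. Nothing here is taken from the real upsert_entities: the function names, the entity kind strings "tool" and "path", and the .worktrees/<name>/ layout matched by the regex are hypothetical, and the real worktree prefix may follow a different pattern.

```python
import re

# Hypothetical worktree layout: everything up to and including
# ".worktrees/<name>/" is treated as the prefix to strip.
WORKTREE_PREFIX_RE = re.compile(r"^(?:.*/)?\.worktrees/[^/]+/")


def normalize_tool(name: str) -> str:
    """Reduce a shell invocation to its bare command name (first token)."""
    parts = name.split()
    return parts[0] if parts else ""


def strip_worktree(path: str) -> str:
    """Remove the assumed worktree prefix; other paths pass through unchanged."""
    return WORKTREE_PREFIX_RE.sub("", path, count=1)


def normalize_entity(kind: str, name: str) -> str:
    """Normalize on every write, not only when a conflict is detected."""
    if kind == "tool":
        return normalize_tool(name)
    if kind == "path":
        return strip_worktree(name)
    return name
```

With this shape, cargo clippy and cargo build both collapse to cargo, so they dedupe to a single tool row across nodes. Calling normalize_entity before the conflict check, rather than inside it, is what makes the worktree strip unconditional.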

Authoritative refs consulted:

Checker script: check_schema.py. Per-node detail dump: _schema_fault_tally.json.