Repair \` → ` in YAML; live test finding from WYL-72 round 1

Judge-Gemini (on github-copilot/gemini-3.1-pro-preview) emitted reports
with \` (backslash-backtick) inside double-quoted YAML strings,
imitating markdown escaping (e.g. evidence: "see \`foo.py\`").  \`
is not a valid YAML escape sequence, so yaml.safe_load rejected the entire
report.  Judge-GPT did not make this mistake, so the consensus degenerated
from 3-to-2 parseable reports to 1-to-2 (and therefore produced a spurious
single-judge convergence).
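
The rejection described above can be reproduced in isolation. This is a minimal sketch, assuming PyYAML is installed; the `parses` helper and the sample strings are illustrative and are not taken from the project.

```python
import yaml  # PyYAML, the library that provides yaml.safe_load


def parses(text: str) -> bool:
    """Return True if PyYAML accepts the text, False on any YAML error."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False


# Backslash-backtick inside a double-quoted scalar: not a YAML escape.
bad = 'evidence: "see \\`foo.py\\`"\n'
# The same line with plain backticks.
good = 'evidence: "see `foo.py`"\n'

print(parses(bad))   # False
print(parses(good))  # True
```

Because a single bad escape fails the whole document, one stray `\`` is enough to drop an otherwise complete report.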

The fix is a targeted cleanup in _extract_yaml: replace \` with literal
` before parsing.  No other interpretation of \` exists in YAML, so
this does not mask semantics.
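
As a rough, standard-library-only sketch of the cleanup path: `FENCE_RE` below is a hypothetical stand-in for the module's `_YAML_FENCE_RE` (whose actual pattern is not shown in this commit), and `extract_yaml_sketch` is an illustrative name, not the project's function.

```python
import html
import re

# Hypothetical stand-in for _YAML_FENCE_RE; the real pattern may differ.
FENCE_RE = re.compile(r"```(?:yaml|yml)?\s*\n(.*?)```", re.DOTALL)


def extract_yaml_sketch(content: str) -> str:
    """Unescape HTML entities, strip a YAML fence if present, repair \\` to `."""
    unescaped = html.unescape(content)
    m = FENCE_RE.search(unescaped)
    text = m.group(1).strip() if m else unescaped.strip()
    # Replace every backslash-backtick pair with a plain backtick.
    return text.replace("\\`", "`")


reply = '```yaml\nevidence: "see \\`foo.py\\`"\n```'
print(extract_yaml_sketch(reply))  # evidence: "see `foo.py`"
```

Note that in this sketch the replacement runs over the whole text, so a `\`` inside a plain or single-quoted scalar (where YAML treats the backslash as an ordinary character) would be rewritten as well.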

Tests: 71 passed (added 2 for the new cleanup path).

Separately surfaced (not yet addressed in this commit):
- Judge-Claude on github-copilot/claude-sonnet-4.6 and
  github-copilot/claude-opus-4.5 returned model_not_supported.  Per GitHub
  Copilot Student plan docs (2026-03-13 update), Claude Opus/Sonnet are no
  longer student-selectable.  Tower-Copilot-Claude now runs
  claude-haiku-4.5 (which is student-accessible).
2026-04-18 22:26:50 +02:00
parent d1039d01de
commit 1a77ddcb99
2 changed files with 36 additions and 3 deletions
+9 -3
@@ -182,15 +182,21 @@ def _extract_yaml(content: str) -> str:
     (``&quot;`` for ``"``, ``&gt;`` for ``>``, etc.). Agent replies are
     plain UTF-8 to begin with, so we unescape first, then extract.

+    Also repairs ``\\``` (backslash-backtick) sequences to literal backticks.
+    Some models emit evidence strings like ``evidence: "see \\`foo.py\\`"`` that
+    imitate markdown escaping, but ``\\``` is not a valid YAML escape — fixing
+    it here is a cleanup of an objective mistake, not toleration of malformed
+    semantics.
+
     Returns the YAML text (without fences), or the original content if no fence
     is found. The caller is responsible for deciding whether the raw content
     is parseable.
     """
     unescaped = html.unescape(content)
     m = _YAML_FENCE_RE.search(unescaped)
-    if m:
-        return m.group(1).strip()
-    return unescaped.strip()
+    text = m.group(1).strip() if m else unescaped.strip()
+    # Repair \` → ` (invalid YAML escape; models mean a literal backtick)
+    return text.replace("\\`", "`")


 def _parse_rubric(content: str) -> dict[str, Any] | None:
+27
@@ -215,6 +215,33 @@ def test_parse_rubric_accepts_html_encoded_input():
     assert "checklist" in spec


+def test_extract_yaml_repairs_backslash_backtick():
+    # Gemini (and similar) emit \` inside double-quoted YAML strings, imitating
+    # markdown escaping. \` is not a valid YAML escape, so we repair it.
+    content = "evaluation_report:\n  rubric_scores:\n    - name: X\n      score: 4\n      evidence: \"see \\`foo.py\\` and \\`bar.py\\`\"\n"
+    y = _extract_yaml(content)
+    assert "\\`" not in y
+    assert "`foo.py`" in y
+
+
+def test_parse_judge_report_tolerates_backslash_backtick():
+    content = (
+        "```yaml\n"
+        "evaluation_report:\n"
+        "  score_calculation:\n"
+        "    final_score: 4.0\n"
+        "  rubric_scores:\n"
+        "    - name: Correctness\n"
+        "      score: 4\n"
+        "      weight: 1.0\n"
+        "      evidence: \"see \\`foo.py\\`\"\n"
+        "```"
+    )
+    r = _parse_judge_report(content)
+    assert r is not None
+    assert r["score_calculation"]["final_score"] == 4.0
+
+
 def test_parse_rubric_valid_flat():
     spec = _parse_rubric(f"```yaml\n{_rubric_yaml_sample()}\n```")
     assert spec is not None