m-platform-admin 00ff80fbbb Port CEK's debate-round instructions verbatim; prevent sycophantic convergence
Live round on WYL-72 exposed that my debate-round comment produced
sycophancy, not debate.  Judge-Gemini moved from a 4.0 ACCEPT stance to
"I agree my initial score was too lenient... I have adjusted my scores
to 3s and 2s" after reading Judge-GPT's 2.92 report, without defending
its original scores or challenging GPT's evidence.  This is classic
social-pressure convergence.

Root cause: my debate comment said "You may hold your position if you
have new evidence; you may move if you find the other reasoning more
grounded.  Do not split the difference to compromise."  That phrasing
is weaker than CEK's intent, and it drops every structural
anti-sycophancy instruction CEK spells out in judge-with-debate/SKILL.md:

  Missing: "Identify disagreements (where your scores differ by >1 point)"
  Missing: "Defend your position with evidence from the specification"
  Missing: "Challenge the other judge's position with counter-evidence"
  Missing: "Only revise if you find their evidence compelling"
  Missing: "Defend your original scores if you still believe them"

Also: I asked judges to post a REVISED report (implicitly retracting
their prior position).  CEK asks them to APPEND a debate round section
to their prior report, keeping both visible so the revision is a change
ON TOP OF the original rather than a replacement.
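
A minimal sketch of the append-instead-of-replace behaviour described
above.  The helper name append_debate_round and the "Debate Round N"
heading are hypothetical, chosen for illustration; neither is taken from
CEK's SKILL.md or from this repository.

```python
def append_debate_round(prior_report: str, round_no: int, debate_text: str) -> str:
    """Return the prior report with a debate-round section appended.

    The original report text is kept untouched at the top, so any score
    revision reads as a change on top of the original position.
    """
    # Hypothetical heading format; the real section title is not shown here.
    header = f"## Debate Round {round_no}"
    return f"{prior_report.rstrip()}\n\n{header}\n\n{debate_text.strip()}\n"


if __name__ == "__main__":
    original = "Initial report\nOverall: 4.0 ACCEPT\n"
    updated = append_debate_round(original, 1, "I defend my score on criterion 2.")
    print(updated)
```

Because the original text is a prefix of the result, a reader can always
compare a judge's revised stance against what it first wrote.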

Fixed by porting CEK's instruction block verbatim into _build_debate_round_comment.
Added a regression test that fails if any future edit removes these exact
clauses.

Tests: 72 passed (+1 regression test).
2026-04-18 22:39:30 +02:00