WYL-52: suppress debater mention-chain cascade #1

Merged
m-senior-developer merged 4 commits from agent/senior-developer/wyl-52 into main 2026-04-16 00:55:23 +02:00

Add explicit no-mention rule to all DEBATER_PROMPTS and JUDGE_PROMPT_TEMPLATE.

Each prompt now ends with a bold instruction telling agents not to @-mention other agents. Such mentions were the root cause of the ~7 cascade tasks that fired after the WYL-47 live run.

Implements option 1 from WYL-52: prompt edit (cheap, immediate effect). Escalate to option 3 (marker comment) if cascade volume is still noisy after a follow-up live run.

m-senior-developer added 4 commits 2026-04-16 00:48:17 +02:00
New module: src/coordinator/orchestrator.py
- DEBATER_NAMES, JUDGE_NAME, DEBATER_PROMPTS, JUDGE_PROMPT_TEMPLATE hardcoded for v1
- Per-debater prompts tell each debater exactly which tool output to ground evidence in
- orchestrate_pending() is the main entry point called from watch_loop
- _start_round(): pending→running, posts debater mention comment, phase→awaiting_debaters
- _advance_awaiting_debaters(): polls for replies, handles timeout with partial evidence,
  posts judge comment, phase→awaiting_judge
- _advance_awaiting_judge(): polls for verdict; RACE FIX — update_issue_status() called
  BEFORE queue.update_status("done") so poll_once can never double-enqueue
- Detection: primary=author_id match, fallback=[{name} response]: content marker (enables tests)
- Restart-safe: phase field persisted on every mutation; in-flight rounds resume correctly
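
A rough sketch of the detection rule and the race-fix ordering described above. This is an illustration only, not the code in orchestrator.py: the comment dict shape (author_id / content keys), the helper names is_reply_from and finish_round, and the argument lists of update_issue_status() and update_status() are all assumptions.

```python
def is_reply_from(comment: dict, agent_id: str, agent_name: str) -> bool:
    """Return True if `comment` looks like a reply from the given agent.

    Primary check: author_id matches the agent.
    Fallback: content starts with a "[<name> response]:" marker, so tests
    can simulate replies without real agent accounts. Matching at the
    start of the content is an assumption of this sketch.
    """
    if comment.get("author_id") == agent_id:
        return True
    marker = f"[{agent_name} response]:"
    content = comment.get("content") or ""
    return content.lstrip().startswith(marker)


def finish_round(client, queue, round_id: str, issue_id: str, issue_status: str) -> None:
    """Close a round in the order that avoids a double enqueue.

    The issue status is changed first, so a concurrent poll no longer sees
    the issue in a state that would enqueue it. Only then is the round
    marked done in the queue.
    """
    client.update_issue_status(issue_id, issue_status)  # assumed signature
    queue.update_status(round_id, "done")               # assumed signature
```

If the two calls in finish_round were swapped, a poll landing between them could see a finished round on an issue whose status has not changed yet and enqueue it again.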

Extended src/coordinator/queue.py:
- Round gains phase, phase_entered_at, coordinator_comment_id, judge_comment_id fields
- DebateQueue.update_phase() and running() added
- All new fields default-empty so existing queue.json files load cleanly
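
One way the default-empty fields could keep old queue.json records loadable is sketched below. The four new field names come from this commit; the id, issue_id and status fields, the string types, and the from_dict helper are assumptions for the example, not the actual definition in queue.py.

```python
from dataclasses import dataclass, fields


@dataclass
class Round:
    # Pre-existing fields (names assumed for this sketch).
    id: str
    issue_id: str
    status: str = "pending"
    # Fields added in this commit. Empty defaults mean a record written
    # before the change still constructs without errors.
    phase: str = ""
    phase_entered_at: str = ""
    coordinator_comment_id: str = ""
    judge_comment_id: str = ""

    @classmethod
    def from_dict(cls, data: dict) -> "Round":
        """Build a Round from a JSON record, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

An old record such as {"id": "r1", "issue_id": "i1", "status": "pending"} then loads with phase and the comment ids set to empty strings.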

Extended src/coordinator/multica_client.py:
- update_issue_status() convenience wrapper
- create_issue() for integration / smoke tests

Updated src/coordinator/__main__.py:
- _orchestrate_pending stub replaced with real import from orchestrator

Tests:
- tests/test_orchestrator.py: 32 new unit tests covering phase transitions, timeouts,
  race fix ordering, restart resume, full lifecycle
- tests/test_integration.py: @pytest.mark.integration test against real API
- smoke_test.py: standalone end-to-end script; ran against real API, verdict OK

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After _advance_awaiting_judge kicks an issue back to in_progress on a
REJECT, post a short follow-up comment that @-mentions the agent assignee
with the verdict, the top 2-3 failure reasons, and a retry prompt.

Corner cases handled:
- assignee_type != 'agent' (member or unset) → skip silently
- ACCEPT branch → no notification
- notification failure → logged, round still completes (non-blocking)
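
The corner cases above amount to a guarded, non-blocking call. A minimal sketch, assuming a dict-shaped issue with id and assignee_type keys and a post_comment callable; these names and the function notify_on_reject are hypothetical and do not mirror the real helper signatures.

```python
import logging


def notify_on_reject(issue: dict, verdict: str, body: str, post_comment, logger) -> bool:
    """Post a follow-up comment on REJECT; return True only if it was posted.

    Never raises: a failed notification is logged and the caller carries on,
    so the round can still complete.
    """
    if verdict != "REJECT":
        return False  # ACCEPT branch: no notification
    if issue.get("assignee_type") != "agent":
        return False  # member or unset assignee: skip silently
    try:
        post_comment(issue["id"], body)
    except Exception:
        logger.exception("reject notification failed for issue %s", issue.get("id"))
        return False
    return True
```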

New helpers: _extract_rejection_reasons, _build_rejection_followup,
_notify_assignee_on_reject.

+12 tests (5 for _extract_rejection_reasons, 7 for the notify path).
Total: 66 passed.
Replace the round-1 summariser approach with the full anchor pattern:

A. _post_rejection_retrigger(round_, client, issue, verdict_comment_content, logger)
   - Renamed from _notify_assignee_on_reject
   - Non-agent/no-assignee path: post a non-mentioning coordinator note and return
   - Agent path: build full anchor comment via _build_retrigger_comment

B. Anchor comment structure (verbatim, no summarising):
   1. [@AssigneeName](mention://agent/<id>)
   2. Verdict: REJECT (round <id>)
   3. ## ANCHOR — Original requirements + full issue description in blockquote
   4. ## Why this was rejected + full judge verdict in blockquote
   5. ## Instructions for rework + REWORK_INSTRUCTIONS constant (verbatim)
   6. Trailing audit line

C. Round.retrigger_comment_id + DebateQueue.set_retrigger_comment_id

D. 8 required tests (D1–D8): mention, verbatim description, no-drift constant,
   member skip, no-assignee skip, accept no-op, id persistence, race regression

E. test_retrigger_on_reject_end_to_end integration test

Removed: _extract_rejection_reasons, _build_rejection_followup (summarisers)
Added: REWORK_INSTRUCTIONS constant, MulticaClient.get_agent_name

Unit: 63 passed. Integration: 1 passed.
Add _NO_MENTION_RULE to all DEBATER_PROMPTS and JUDGE_PROMPT_TEMPLATE
explicitly instructing agents not to @-mention other agents in their
replies. Mentions trigger Multica's mention-trigger mechanism, causing
cascade tasks from debaters responding to each other's comments.

Plain-name references (e.g. 'Code Reviewer' not '@Code Reviewer') are
still allowed for cross-reference in evidence text.
m-senior-developer merged commit d3db6cfcd7 into main 2026-04-16 00:55:23 +02:00
Reference: multica/coordinator#1