multica/coordinator

Commit Graph

Author	SHA1	Message	Date

m-platform-admin

f88255096e

Replace hand-written debater pipeline with CEK judge-with-debate

The prior pipeline (4 hand-written debater prompts + 1 judge with my prompt
template) kept missing scope drift because every prompt was mine and the
reviewers were all on the same model tier with correlated priors.

This commit replaces the whole review step with CEK's judge-with-debate
pattern translated to multica-native execution:

  pending → awaiting_rubric (meta-judge writes YAML spec from issue alone)
          → awaiting_judges (3 judges on 3 copilot models score independently)
          → consensus check (overall within 0.5, criteria within 1.0)
          → on consensus: accept or reject; otherwise awaiting_debate (up to 3 rounds)
          → error on malformed YAML or when the 3-round cap is hit
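The consensus check above can be sketched as a spread test. This assumes "within 0.5" means the highest minus the lowest overall score across judges is at most 0.5, and that the same max-minus-min rule applies per criterion with a 1.0 tolerance. The report shape (`overall` plus a `criteria` mapping) and the rule that a criterion missing from one judge counts as disagreement are also assumptions. None of this is taken from orchestrator.py.

```python
from typing import Mapping, Sequence

OVERALL_TOLERANCE = 0.5    # max allowed spread of overall scores (assumed semantics)
CRITERION_TOLERANCE = 1.0  # max allowed spread per rubric criterion (assumed semantics)
_EPS = 1e-9                # absorbs float rounding, e.g. 4.2 - 3.7 = 0.5000000000000004


def _spread(values: Sequence[float]) -> float:
    """Difference between the highest and lowest score."""
    return max(values) - min(values)


def has_consensus(reports: Sequence[Mapping]) -> bool:
    """Return True if all judges agree within the tolerances.

    Assumed report shape: {"overall": float, "criteria": {name: float}}.
    """
    if not reports:
        raise ValueError("no parseable judge reports")
    if _spread([r["overall"] for r in reports]) > OVERALL_TOLERANCE + _EPS:
        return False
    # Check every criterion that any judge scored.
    names = set().union(*(r["criteria"].keys() for r in reports))
    for name in names:
        scores = [r["criteria"].get(name) for r in reports]
        if any(s is None for s in scores):
            return False  # a judge skipped this criterion: treated as disagreement
        if _spread(scores) > CRITERION_TOLERANCE + _EPS:
            return False
    return True
```

The epsilon prevents a spread such as 4.2 - 3.7 from failing the 0.5 tolerance because of binary rounding.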

Per higher-management direction, we do not accommodate a model that cannot
produce YAML: a malformed rubric, or judge reports that are all unparseable,
fail the round immediately (no retries, no fallback to hand-written prompts).
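The fail-fast rule can be sketched with PyYAML's `yaml.safe_load`. This assumes a rubric must parse to a mapping and that an individual unparseable judge report is skipped, with the round failing only when none of them parse. `RoundError` is a hypothetical exception name standing in for "move the round to the error phase".

```python
import yaml  # PyYAML, the dependency this commit adds


class RoundError(Exception):
    """Hypothetical: raised to move the round straight to the error phase."""


def parse_rubric(text: str) -> dict:
    """Parse the meta-judge's YAML spec; any failure ends the round (no retry)."""
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise RoundError(f"malformed rubric YAML: {exc}") from exc
    if not isinstance(data, dict):
        raise RoundError("rubric YAML is not a mapping")
    return data


def parse_judge_reports(texts: list[str]) -> list[dict]:
    """Keep the reports that parse to a mapping; fail only if none do."""
    parsed = []
    for text in texts:
        try:
            data = yaml.safe_load(text)
        except yaml.YAMLError:
            continue  # assumed: one bad report is dropped, not fatal on its own
        if isinstance(data, dict):
            parsed.append(data)
    if not parsed:
        raise RoundError("all judge reports were unparseable")
    return parsed
```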

The anchor retrigger on REJECT (WYL-51 behaviour) is preserved verbatim.

Agent prompts for the meta-judge and the 3 judges come from the CEK agents
themselves (Meta-Judge / Judge-GPT / Judge-Claude / Judge-Gemini), whose
`instructions` fields contain the CEK meta-judge.md / judge.md files uploaded
byte-for-byte. No prompts are authored in this coordinator's source.

Adds pyyaml dependency.

- src/coordinator/orchestrator.py: rewritten for the new phase machine
- src/coordinator/queue.py: Round extended with rubric_yaml, judge_report_comment_ids, debate_round
- tests/test_orchestrator.py: 40 tests for new pipeline (helpers, parsers, consensus math, phase handlers, race fix, retrigger)
- tests/test_integration.py: removed (tested old debater pipeline)
- pyproject.toml: adds pyyaml
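The new Round fields (rubric_yaml, judge_report_comment_ids, debate_round) could drive the post-judging transition from the phase diagram roughly as follows. This is a sketch only. The `issue_id` field, the `"verdict"` phase label and the helper name `advance_after_judging` are hypothetical, and the consensus result is assumed to be computed elsewhere.

```python
from dataclasses import dataclass, field

MAX_DEBATE_ROUNDS = 3  # "awaiting_debate rounds up to 3"


@dataclass
class Round:
    issue_id: str                      # hypothetical identifier field
    phase: str = "awaiting_judges"
    rubric_yaml: str = ""              # meta-judge spec, stored verbatim
    judge_report_comment_ids: list[str] = field(default_factory=list)
    debate_round: int = 0              # debate rounds started so far


def advance_after_judging(round_: Round, consensus: bool) -> str:
    """Pick the next phase once every judge report for this pass is in."""
    if consensus:
        round_.phase = "verdict"       # hypothetical label: accept or reject is decided
    elif round_.debate_round >= MAX_DEBATE_ROUNDS:
        round_.phase = "error"         # judges still disagree after the cap
    else:
        round_.debate_round += 1
        round_.phase = "awaiting_debate"
    return round_.phase
```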

Tests: 67 passed in 0.20s (40 orchestrator + 15 queue + 7 watcher + 5 other).

2026-04-18 22:01:18 +02:00

m-senior-developer

0e44846032

Implement debate round orchestration (WYL-45)

New module: src/coordinator/orchestrator.py
- DEBATER_NAMES, JUDGE_NAME, DEBATER_PROMPTS, JUDGE_PROMPT_TEMPLATE hardcoded for v1
- Per-debater prompts tell each debater exactly which tool output to ground evidence in
- orchestrate_pending() is the main entry point called from watch_loop
- _start_round(): pending→running, posts debater mention comment, phase→awaiting_debaters
- _advance_awaiting_debaters(): polls for replies, handles timeout with partial evidence,
  posts judge comment, phase→awaiting_judge
- _advance_awaiting_judge(): polls for verdict; RACE FIX — update_issue_status() called
  BEFORE queue.update_status("done") so poll_once can never double-enqueue
- Reply detection: primary = author_id match; fallback = a `[{name} response]:` content
  marker (lets tests simulate replies)
- Restart-safe: phase field persisted on every mutation; in-flight rounds resume correctly
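The reply detection and the race-fix ordering described above can be sketched as follows. The `Comment` shape, the assumption that the marker appears at the start of the content, and the `update_issue_status(issue_id, status)` / `update_status(issue_id, status)` signatures are all hypothetical. The ordering is the point: the issue leaves in_review before the queue entry is marked done, so a concurrent poll never sees an in_review issue with no active round.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author_id: str
    content: str


def is_reply_from(comment: Comment, agent_name: str, agent_id: str) -> bool:
    """Primary: author_id match. Fallback: a '[{name} response]:' marker."""
    if comment.author_id == agent_id:
        return True
    # Assumed: the marker leads the comment body.
    return comment.content.lstrip().startswith(f"[{agent_name} response]:")


def finish_round(client, queue, issue_id: str, new_status: str) -> None:
    """RACE FIX ordering (signatures are hypothetical)."""
    # 1. Move the issue out of in_review first ...
    client.update_issue_status(issue_id, new_status)
    # 2. ... then mark the round done. Done in the reverse order, a poll between
    #    the two calls would see an in_review issue with no running round and
    #    enqueue it again.
    queue.update_status(issue_id, "done")
```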

Extended src/coordinator/queue.py:
- Round gains phase, phase_entered_at, coordinator_comment_id, judge_comment_id fields
- DebateQueue.update_phase() and running() added
- All new fields default-empty so existing queue.json files load cleanly
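A minimal sketch of how default-empty fields let an older queue.json load cleanly, and how update_phase() could persist on every mutation for restart safety. This assumes queue.json is a JSON list of round dicts, that `issue_id` and `status` fields exist, and that `phase_entered_at` is an ISO-8601 string. None of these details are confirmed by the commit.

```python
import json
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class Round:
    issue_id: str
    status: str = "pending"
    # Fields added in WYL-45; all default-empty so old files still load.
    phase: str = ""
    phase_entered_at: str = ""
    coordinator_comment_id: str = ""
    judge_comment_id: str = ""

    @classmethod
    def from_dict(cls, data: dict) -> "Round":
        # Missing keys fall back to defaults; unknown keys are ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


class DebateQueue:
    def __init__(self, path: Path):
        self.path = path
        raw = json.loads(path.read_text()) if path.exists() else []
        self.rounds = [Round.from_dict(r) for r in raw]

    def _save(self) -> None:
        self.path.write_text(json.dumps([asdict(r) for r in self.rounds], indent=2))

    def update_phase(self, issue_id: str, phase: str) -> Round:
        for r in self.rounds:
            if r.issue_id == issue_id:
                r.phase = phase
                r.phase_entered_at = datetime.now(timezone.utc).isoformat()
                self._save()  # persist immediately so a restart resumes in this phase
                return r
        raise KeyError(issue_id)

    def running(self) -> list[Round]:
        return [r for r in self.rounds if r.status == "running"]
```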

Extended src/coordinator/multica_client.py:
- update_issue_status() convenience wrapper
- create_issue() for integration / smoke tests

Updated src/coordinator/__main__.py:
- _orchestrate_pending stub replaced with real import from orchestrator

Tests:
- tests/test_orchestrator.py: 32 new unit tests covering phase transitions, timeouts,
  race fix ordering, restart resume, full lifecycle
- tests/test_integration.py: @pytest.mark.integration test against real API
- smoke_test.py: standalone end-to-end script; ran against real API, verdict OK

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-15 21:43:17 +00:00

m-platform-admin

6da434039c

WYL-42: Python skeleton + in_review watcher loop

Minimum viable structure for the Tool-MAD coordinator:
- coordinator.config: env-loaded Config dataclass, writes state to ~/.coordinator/
- coordinator.multica_client: thin requests wrapper for issues/comments/agents
- coordinator.state: flat-json SeenState tracking issue_id -> last_seen_updated_at
- coordinator.__main__: watch_loop() that polls in_review and logs candidates
- README.md: why this exists + how to run
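The SeenState idea (a flat JSON map of issue_id to last_seen_updated_at) and the watcher's candidate filter could look roughly like this. The issue dict keys (`id`, `status`, `updated_at`) and the helper name `new_candidates` are assumptions, not the multica API. An issue counts as a new candidate whenever its `updated_at` differs from the value last recorded.

```python
import json
from pathlib import Path


class SeenState:
    """Flat JSON map: issue_id -> last_seen_updated_at."""

    def __init__(self, path: Path):
        self.path = path
        self.data: dict[str, str] = json.loads(path.read_text()) if path.exists() else {}

    def is_new(self, issue_id: str, updated_at: str) -> bool:
        return self.data.get(issue_id) != updated_at

    def mark(self, issue_id: str, updated_at: str) -> None:
        self.data[issue_id] = updated_at
        self.path.parent.mkdir(parents=True, exist_ok=True)  # e.g. ~/.coordinator/
        self.path.write_text(json.dumps(self.data, indent=2))


def new_candidates(issues: list[dict], state: SeenState) -> list[str]:
    """Return ids of in_review issues not yet seen at their current updated_at."""
    out = []
    for issue in issues:
        if issue.get("status") != "in_review":
            continue
        if state.is_new(issue["id"], issue["updated_at"]):
            out.append(issue["id"])
            state.mark(issue["id"], issue["updated_at"])
    return out
```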

v0 only detects in_review transitions; convening debate rounds is WYL-45.
Dependencies: stdlib + requests (nothing else until a working v1 ships).

2026-04-15 23:04:06 +02:00

3 Commits