0e44846032
New module: src/coordinator/orchestrator.py
- DEBATER_NAMES, JUDGE_NAME, DEBATER_PROMPTS, JUDGE_PROMPT_TEMPLATE hardcoded for v1
- Per-debater prompts tell each debater exactly which tool output to ground evidence in
- orchestrate_pending() is the main entry point called from watch_loop
- _start_round(): pending→running, posts debater mention comment, phase→awaiting_debaters
- _advance_awaiting_debaters(): polls for replies, handles timeout with partial evidence,
  posts judge comment, phase→awaiting_judge
- _advance_awaiting_judge(): polls for verdict; RACE FIX: update_issue_status() called
  BEFORE queue.update_status("done") so poll_once can never double-enqueue
- Reply detection: primary = author_id match; fallback = "[{name} response]:" content marker (enables tests)
- Restart-safe: phase field persisted on every mutation; in-flight rounds resume correctly
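The ordering behind the race fix can be sketched as follows. This is a minimal illustration, not the code in orchestrator.py: the Round shape, the two recording stand-ins, the finish_round helper, the method signatures, and the "todo" issue status are all hypothetical. It assumes poll_once enqueues any issue still in in_review that has no active round, which is why the issue status is written first.

```python
from dataclasses import dataclass


@dataclass
class Round:
    # Hypothetical shape; the real Round lives in src/coordinator/queue.py.
    issue_id: str
    status: str = "pending"  # pending -> running -> done
    phase: str = ""          # "" -> awaiting_debaters -> awaiting_judge


class RecordingClient:
    """Stand-in for the multica client; records calls instead of doing HTTP."""

    def __init__(self, log):
        self.log = log

    def update_issue_status(self, issue_id, status):
        self.log.append(f"issue:{status}")


class RecordingQueue:
    """Stand-in for DebateQueue; records status writes."""

    def __init__(self, log):
        self.log = log

    def update_status(self, rnd, status):
        rnd.status = status
        self.log.append(f"queue:{status}")


def finish_round(rnd, client, queue, issue_status):
    # Move the issue out of in_review first, so a concurrent poll no longer
    # sees it as a candidate for a new round.
    client.update_issue_status(rnd.issue_id, issue_status)
    # Only then release the round in the queue.
    queue.update_status(rnd, "done")


log = []
rnd = Round(issue_id="ISSUE-1", status="running", phase="awaiting_judge")
finish_round(rnd, RecordingClient(log), RecordingQueue(log), issue_status="todo")
print(log)
```

With the two calls swapped, there would be a window in which the round is already done while the issue is still in in_review, which is the double-enqueue case the commit describes.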
Extended src/coordinator/queue.py:
- Round gains phase, phase_entered_at, coordinator_comment_id, judge_comment_id fields
- DebateQueue.update_phase() and running() added
- All new fields default-empty so existing queue.json files load cleanly
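A rough sketch of why default-empty fields keep older queue.json files loadable. The field names come from the list above; the string types, the issue_id and status fields, and the round_from_dict helper are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Round:
    issue_id: str
    status: str = "pending"
    # Fields added in this change; each has an empty default (types assumed).
    phase: str = ""
    phase_entered_at: str = ""
    coordinator_comment_id: str = ""
    judge_comment_id: str = ""


def round_from_dict(raw):
    # Keep only keys the dataclass knows; missing keys fall back to defaults.
    known = {f.name for f in fields(Round)}
    return Round(**{k: v for k, v in raw.items() if k in known})


# An entry written before the new fields existed.
old_entry = json.loads('{"issue_id": "ISSUE-1", "status": "pending"}')
rnd = round_from_dict(old_entry)
print(rnd)
```

Because every new field has a default, an entry that predates the change constructs without a TypeError and simply reports an empty phase.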
Extended src/coordinator/multica_client.py:
- update_issue_status() convenience wrapper
- create_issue() for integration / smoke tests
Updated src/coordinator/__main__.py:
- _orchestrate_pending stub replaced with real import from orchestrator
Tests:
- tests/test_orchestrator.py: 32 new unit tests covering phase transitions, timeouts,
  race fix ordering, restart resume, full lifecycle
- tests/test_integration.py: @pytest.mark.integration test against real API
- smoke_test.py: standalone end-to-end script; ran against real API, verdict OK
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
27 lines
779 B
TOML
[project]
name = "coordinator"
version = "0.1.0"
description = "Tool-MAD middle-management layer for multica. Watches for in_review transitions, convenes debate rounds, and actions judge verdicts."
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    # v1 intentionally depends only on stdlib + requests. No async, no frameworks.
    # If this list grows past 3 items before a working v1 is shipped, something is wrong.
    "requests>=2.32",
]

[project.scripts]
coordinator = "coordinator.__main__:main"

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
markers = [
    "integration: mark test as integration test (requires real API credentials)",
]