f88255096e
The prior pipeline (4 hand-written debater prompts + 1 judge with my prompt
template) kept missing scope drift because every prompt was mine and the
reviewers were all on the same model tier with correlated priors.
This commit replaces the whole review step with CEK's judge-with-debate
pattern translated to multica-native execution:
pending → awaiting_rubric (meta-judge writes YAML spec from issue alone)
→ awaiting_judges (3 judges on 3 copilot models score independently)
→ consensus check (overall within 0.5, criteria within 1.0)
→ accept or reject, or awaiting_debate (up to 3 rounds)
→ error on malformed YAML or cap hit
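The consensus step above can be sketched as follows. This is a minimal illustration, not the code in orchestrator.py: the `JudgeReport` shape, the `has_consensus` and `next_phase` names, the "decide" label, the pairwise comparison, and the choice to treat a missing criterion as disagreement are all assumptions. Only the 0.5 and 1.0 tolerances, the 3-round cap, and the `awaiting_debate` and `error` phase names come from the message.

```python
from dataclasses import dataclass
from itertools import combinations

OVERALL_TOLERANCE = 0.5    # from the commit message: overall within 0.5
CRITERION_TOLERANCE = 1.0  # from the commit message: criteria within 1.0
MAX_DEBATE_ROUNDS = 3      # from the commit message: debate rounds up to 3


@dataclass(frozen=True)
class JudgeReport:
    """Hypothetical shape of one parsed judge report."""
    overall: float
    criteria: dict[str, float]


def has_consensus(reports: list[JudgeReport]) -> bool:
    """Return True when every pair of judges agrees within both tolerances."""
    for a, b in combinations(reports, 2):
        if abs(a.overall - b.overall) > OVERALL_TOLERANCE:
            return False
        for name in a.criteria.keys() | b.criteria.keys():
            # Assumption: a criterion scored by only one judge is a disagreement.
            if name not in a.criteria or name not in b.criteria:
                return False
            if abs(a.criteria[name] - b.criteria[name]) > CRITERION_TOLERANCE:
                return False
    return True


def next_phase(reports: list[JudgeReport], debate_round: int) -> str:
    """Pick the next step after a scoring pass.

    "decide" is a hypothetical label for the accept-or-reject step; how the
    verdict itself is derived from the scores is not covered here.
    """
    if has_consensus(reports):
        return "decide"
    if debate_round >= MAX_DEBATE_ROUNDS:
        return "error"  # cap hit
    return "awaiting_debate"
```

Comparing every pair is equivalent to checking that the spread between the highest and lowest score stays within the tolerance, so the result does not depend on the order of the judges.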
Per higher-management direction, we do not accommodate a model that cannot
produce YAML: a malformed rubric, or a set of judge reports that are all
unparseable, fails the round immediately (no retries, no fallback to
hand-written prompts).
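A rough sketch of that fail-fast policy, assuming PyYAML's `yaml.safe_load` is the parser. `RoundFailed`, `parse_rubric`, and `parse_judge_reports` are hypothetical names, and requiring each document to be a YAML mapping is an assumption; the actual parsers in orchestrator.py may differ.

```python
import yaml  # PyYAML, the dependency this commit adds


class RoundFailed(Exception):
    """Hypothetical error type: the round goes straight to the error phase."""


def parse_rubric(text: str) -> dict:
    """Parse the meta-judge rubric; any YAML problem fails the round at once."""
    try:
        data = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise RoundFailed(f"malformed rubric YAML: {exc}") from exc
    # Assumption: a usable rubric is a mapping, not a bare scalar or list.
    if not isinstance(data, dict):
        raise RoundFailed("rubric YAML is not a mapping")
    return data


def parse_judge_reports(texts: list[str]) -> list[dict]:
    """Keep the reports that parse; fail the round only if none of them do."""
    parsed = []
    for text in texts:
        try:
            data = yaml.safe_load(text)
        except yaml.YAMLError:
            continue  # no retry: this report is simply dropped
        if isinstance(data, dict):
            parsed.append(data)
    if not parsed:
        raise RoundFailed("all judge reports were unparseable")
    return parsed
```

Neither function retries or substitutes a default, so a caller can map `RoundFailed` directly onto the error phase.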
The anchor retrigger on REJECT (WYL-51 behaviour) is preserved verbatim.
Agent prompts for meta-judge and the 3 judges come from the CEK agents
themselves (Meta-Judge / Judge-GPT / Judge-Claude / Judge-Gemini) whose
`instructions` field is the CEK meta-judge.md / judge.md files uploaded
byte-for-byte. No prompts are authored in this coordinator's source.
Adds pyyaml dependency.
- src/coordinator/orchestrator.py: rewritten for the new phase machine
- src/coordinator/queue.py: Round extended with rubric_yaml, judge_report_comment_ids, debate_round
- tests/test_orchestrator.py: 40 tests for new pipeline (helpers, parsers, consensus math, phase handlers, race fix, retrigger)
- tests/test_integration.py: removed (tested old debater pipeline)
- pyproject.toml: adds pyyaml
Tests: 67 passed in 0.20s (40 orchestrator + 15 queue + 7 watcher + 5 other).
pyproject.toml (28 lines, 807 B, TOML)
[project]
name = "coordinator"
version = "0.1.0"
description = "Tool-MAD middle-management layer for multica. Watches for in_review transitions, convenes debate rounds, and actions judge verdicts."
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    # v1 intentionally depends only on stdlib + requests + pyyaml. No async, no frameworks.
    # If this list grows past 4 items before a working v1 is shipped, something is wrong.
    "requests>=2.32",
    "pyyaml>=6.0",
]

[project.scripts]
coordinator = "coordinator.__main__:main"

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

[tool.pytest.ini_options]
markers = [
    "integration: mark test as integration test (requires real API credentials)",
]