WYL-42: Python skeleton + in_review watcher loop
Minimum viable structure for the Tool-MAD coordinator:

- coordinator.config: env-loaded Config dataclass, writes state to ~/.coordinator/
- coordinator.multica_client: thin requests wrapper for issues/comments/agents
- coordinator.state: flat-json SeenState tracking issue_id -> last_seen_updated_at
- coordinator.__main__: watch_loop() that polls in_review and logs candidates
- README.md: why this exists + how to run

v0 only detects in_review transitions; convening debate rounds is WYL-45.

Dependencies: stdlib + requests (nothing else until a working v1 ships).
@@ -0,0 +1,8 @@
__pycache__/
*.py[cod]
*.egg-info/
.venv/
venv/
build/
dist/
.coordinator/
@@ -1,3 +1,48 @@
# coordinator

Tool-MAD middle-management layer for multica: watches in_review transitions, convenes debate rounds, actions verdicts.
Tool-MAD middle-management layer for multica.

## Why this exists

Multica's agents produce work and set their own issue status to `in_review`. Nothing in multica ever looks at that state, so it is a dead end. Reviewers, when mentioned manually, tend to catch surface issues (length, structure) and miss semantic ones (scope drift, fabricated experiments, answering the wrong question).

This daemon is the missing middle-management layer. When an issue transitions to `in_review`, it:

1. Convenes a **debate round**: posts one comment on the issue mentioning a fixed set of debaters, each given a role-specific evidence-gathering prompt (e.g. Senior Developer greps the committed code for LLM API calls; Code Reviewer diffs the described method against the source; Project Manager Senior checks scope satisfaction against the original description).
2. **Waits** for every debater to reply (up to a timeout).
3. Posts a second comment mentioning the **judge** (Reality Checker), with the assembled debater transcript + the original issue body, and a hardcoded decision rule.
4. **Parses** the judge's structured verdict (`VERDICT: ACCEPT` or `VERDICT: REJECT\n- R<n>: <failure>`) and acts:
   - `ACCEPT` → PUT status `done`, post acceptance summary
   - `REJECT` → PUT status `in_progress`, post rejection listing every failure, re-trigger the original assignee
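
Step 4 can be sketched as a small parser. This only illustrates the verdict format quoted above and is not the WYL-46 implementation: `Verdict`, `parse_verdict`, and `next_status` are hypothetical names, and stopping at the first line that is not `- R<n>: ...` is an assumption about how strict the parser should be.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed shape of one rejection line: "- R<n>: <failure>".
_FAILURE_RE = re.compile(r"^-\s*R(\d+):\s*(.+)$")


@dataclass
class Verdict:
    accepted: bool
    failures: list[str] = field(default_factory=list)


def parse_verdict(text: str) -> Verdict | None:
    """Return the parsed verdict, or None if no VERDICT line is found."""
    lines = [ln.strip() for ln in text.splitlines()]
    for i, line in enumerate(lines):
        if line == "VERDICT: ACCEPT":
            return Verdict(accepted=True)
        if line == "VERDICT: REJECT":
            failures = []
            # Collect the consecutive "- R<n>: ..." lines that follow.
            for follow in lines[i + 1:]:
                m = _FAILURE_RE.match(follow)
                if not m:
                    break
                failures.append(f"R{m.group(1)}: {m.group(2)}")
            return Verdict(accepted=False, failures=failures)
    return None


def next_status(verdict: Verdict) -> str:
    """Map a verdict to the issue status named in step 4."""
    return "done" if verdict.accepted else "in_progress"
```

Returning `None` for a comment with no verdict line keeps "the judge has not answered yet" distinct from "the judge rejected", which is likely to matter once the timeout handling in step 2 exists.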

The pattern is **Tool-MAD (Multi-Agent Debate with heterogeneous tool augmentation)**. Reference: [arXiv 2601.04742](https://arxiv.org/html/2601.04742v1).

## Why debaters catch what a single judge misses

Each debater runs on a different agent (different runtime, different tool access, different role prompt) and is forced to ground its argument in a specific tool's output, not in its own recollection of the text. Example: if the question is "does this paper describe real LLM agent experiments", Senior Developer's grep for `anthropic|openai|ollama|model=` in the committed code either returns hits or it doesn't. That is a hard fact, not an opinion. The judge reads 4 grounded arguments and applies the decision rule "any debater reporting evidence of scope drift ⇒ REJECT."
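
The grep and the decision rule from the paragraph above can be sketched as follows. This is a rough illustration under stated assumptions, not how the debaters are actually wired up: `grep_llm_calls` and `decide` are hypothetical names, restricting the search to `*.py` files is an assumption, and representing each debater's report as a boolean is a simplification.

```python
from __future__ import annotations

import re
from pathlib import Path

# The pattern quoted in the README.
LLM_CALL_RE = re.compile(r"anthropic|openai|ollama|model=")


def grep_llm_calls(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every matching line under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):  # assumption: only Python sources matter
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LLM_CALL_RE.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


def decide(scope_drift_reports: dict[str, bool]) -> str:
    """Apply the rule: any debater reporting scope drift means REJECT."""
    return "REJECT" if any(scope_drift_reports.values()) else "ACCEPT"
```

An empty result from `grep_llm_calls` is itself evidence: a paper that claims LLM agent experiments while the committed code never references a model is exactly the kind of mismatch a debater would report.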

A naive LLM-as-a-judge reviewer reads the paper and scores it on surface dimensions. An agent-as-judge driven by debaters catches the underlying substitution. See WYL-41 for the live failure case that motivated this build.

## Run

```bash
pip install -e .
mkdir -p ~/.coordinator
cat > ~/.coordinator/env <<'EOF'
COORDINATOR_SERVER_URL=http://localhost:8089
COORDINATOR_WORKSPACE_ID=<wid>
COORDINATOR_TOKEN=<coordinator-member-pat>
EOF
chmod 600 ~/.coordinator/env
coordinator
```

Logs go to stderr and to `~/.coordinator/coordinator.log`. State file is `~/.coordinator/seen.json`.

## Status

- [x] WYL-42 skeleton + watcher (this commit)
- [ ] WYL-43 dedicated admin PAT
- [ ] WYL-44 hook watcher to real round trigger
- [ ] WYL-45 debate round orchestration
- [ ] WYL-46 verdict parser + action executor
- [ ] WYL-47 dry-run against WYL-41
@@ -0,0 +1,21 @@
[project]
name = "coordinator"
version = "0.1.0"
description = "Tool-MAD middle-management layer for multica. Watches for in_review transitions, convenes debate rounds, and actions judge verdicts."
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    # v1 intentionally depends only on stdlib + requests. No async, no frameworks.
    # If this list grows past 3 items before a working v1 is shipped, something is wrong.
    "requests>=2.32",
]

[project.scripts]
coordinator = "coordinator.__main__:main"

[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]
@@ -0,0 +1,3 @@
"""Tool-MAD middle-management layer for multica."""

__version__ = "0.1.0"
@@ -0,0 +1,74 @@
"""Coordinator daemon entrypoint.

For v0 this only does the watcher loop; debate orchestration and verdict handling
land in WYL-45 and WYL-46. Running it now will just tail the `in_review` list and
log what would have triggered a round.
"""
from __future__ import annotations

import logging
import sys
import time

from coordinator.config import Config
from coordinator.multica_client import MulticaClient
from coordinator.state import SeenState


def _setup_logging(log_file) -> logging.Logger:
    logger = logging.getLogger("coordinator")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%dT%H:%M:%S")
    stderr = logging.StreamHandler(sys.stderr)
    stderr.setFormatter(fmt)
    logger.addHandler(stderr)
    try:
        fh = logging.FileHandler(log_file)
        fh.setFormatter(fmt)
        logger.addHandler(fh)
    except OSError as e:
        logger.warning("could not open log file %s: %s", log_file, e)
    return logger


def watch_loop(cfg: Config, client: MulticaClient, state: SeenState, logger: logging.Logger) -> None:
    logger.info(
        "coordinator starting server=%s workspace=%s interval=%ds",
        cfg.server_url, cfg.workspace_id, cfg.poll_interval_s,
    )
    while True:
        try:
            issues = client.list_issues_by_status("in_review")
        except Exception as e:
            logger.error("poll failed: %s", e)
            time.sleep(cfg.poll_interval_s)
            continue

        for issue in issues:
            iid = issue["id"]
            updated = issue.get("updated_at", "")
            if state.last_seen(iid) == updated:
                continue
            logger.info(
                "candidate issue %s status=in_review updated_at=%s title=%r",
                issue.get("identifier", iid), updated, issue.get("title", ""),
            )
            # v0: only mark-and-log. Round convening is WYL-45.
            state.mark_seen(iid, updated)

        time.sleep(cfg.poll_interval_s)


def main() -> None:
    cfg = Config.from_env()
    logger = _setup_logging(cfg.log_file)
    client = MulticaClient(cfg.server_url, cfg.workspace_id, cfg.token)
    state = SeenState.load(cfg.seen_file)
    try:
        watch_loop(cfg, client, state, logger)
    except KeyboardInterrupt:
        logger.info("shutting down")


if __name__ == "__main__":
    main()
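
Because `watch_loop()` combines `while True`, `time.sleep`, and the change detection, it is awkward to exercise without a live server. One possible refactor, sketched below, pulls a single pass into its own function. `poll_once` is a hypothetical name that does not exist in this commit, and a plain dict stands in for `SeenState`.

```python
from __future__ import annotations

from typing import Any, Callable


def poll_once(
    list_in_review: Callable[[], list[dict[str, Any]]],
    seen: dict[str, str],
) -> list[dict[str, Any]]:
    """Run one polling pass and return the issues that are new or changed.

    `seen` maps issue id -> last observed updated_at and is updated in place.
    """
    candidates = []
    for issue in list_in_review():
        iid = issue["id"]
        updated = issue.get("updated_at", "")
        if seen.get(iid) == updated:
            continue  # nothing changed since the last pass
        candidates.append(issue)
        seen[iid] = updated
    return candidates


# Usage with canned data instead of a live server.
issues = [{"id": "a1", "updated_at": "2025-01-01T00:00:00Z", "title": "draft"}]
seen: dict[str, str] = {}
first = poll_once(lambda: issues, seen)   # new issue: reported
second = poll_once(lambda: issues, seen)  # unchanged: skipped
issues[0]["updated_at"] = "2025-01-02T00:00:00Z"
third = poll_once(lambda: issues, seen)   # updated_at moved: reported again
```

The third pass shows a property of the committed loop worth keeping in mind for WYL-44: any later update to an issue that is still `in_review`, such as a new comment, presumably bumps `updated_at` and would make the issue a candidate again.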
@@ -0,0 +1,65 @@
"""Configuration loading for the coordinator daemon.

Everything comes from environment variables (loaded from ~/.coordinator/env at
startup if present). No flags, no config files. Keep it boring.
"""
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


ENV_FILE = Path.home() / ".coordinator" / "env"
STATE_DIR = Path.home() / ".coordinator"


def load_env_file(path: Path = ENV_FILE) -> None:
    """Populate os.environ from a simple KEY=VALUE file. Silent if missing."""
    if not path.is_file():
        return
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in os.environ:
            os.environ[key] = value


@dataclass(frozen=True)
class Config:
    server_url: str
    workspace_id: str
    token: str
    poll_interval_s: int
    round_timeout_s: int
    max_concurrent_rounds: int
    seen_file: Path
    log_file: Path

    @classmethod
    def from_env(cls) -> "Config":
        load_env_file()
        required = ("COORDINATOR_SERVER_URL", "COORDINATOR_WORKSPACE_ID", "COORDINATOR_TOKEN")
        missing = [k for k in required if not os.environ.get(k)]
        if missing:
            raise SystemExit(
                f"coordinator: missing required env vars: {', '.join(missing)}\n"
                f"put them in {ENV_FILE} or export them before running"
            )
        STATE_DIR.mkdir(parents=True, exist_ok=True)
        return cls(
            server_url=os.environ["COORDINATOR_SERVER_URL"].rstrip("/"),
            workspace_id=os.environ["COORDINATOR_WORKSPACE_ID"],
            token=os.environ["COORDINATOR_TOKEN"],
            poll_interval_s=int(os.environ.get("COORDINATOR_POLL_INTERVAL_S", "30")),
            round_timeout_s=int(os.environ.get("COORDINATOR_ROUND_TIMEOUT_S", "600")),
            max_concurrent_rounds=int(os.environ.get("COORDINATOR_MAX_CONCURRENT_ROUNDS", "3")),
            seen_file=STATE_DIR / "seen.json",
            log_file=STATE_DIR / "coordinator.log",
        )
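
The precedence rule in `load_env_file()` (variables already in the environment win over the file) is easy to miss. The sketch below restates the same parsing steps as a pure function so the behaviour can be checked without touching `os.environ`. `parse_env_lines` is a hypothetical helper written for illustration, not part of this commit.

```python
from __future__ import annotations


def parse_env_lines(text: str, environ: dict[str, str]) -> dict[str, str]:
    """Apply KEY=VALUE lines to a copy of `environ`; existing keys are kept."""
    result = dict(environ)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in result:
            result[key] = value
    return result


env_file = """
# comment lines and blanks are skipped
COORDINATOR_SERVER_URL="http://localhost:8089"
COORDINATOR_POLL_INTERVAL_S=30
"""
# The exported interval (5) is kept; the file only fills in what is missing.
merged = parse_env_lines(env_file, {"COORDINATOR_POLL_INTERVAL_S": "5"})
```

One consequence of this simple format: a shell-style line such as `export FOO=bar` is stored under the key `export FOO`, so the env file should contain bare `KEY=VALUE` pairs only.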
@@ -0,0 +1,100 @@
"""Thin REST wrapper for multica's API. Only the endpoints the coordinator needs."""
from __future__ import annotations

from typing import Any, Iterable

import requests


class MulticaClient:
    def __init__(self, server_url: str, workspace_id: str, token: str, timeout: float = 15.0):
        self.server_url = server_url.rstrip("/")
        self.workspace_id = workspace_id
        self.timeout = timeout
        self._session = requests.Session()
        self._session.headers.update(
            {
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
                "Accept": "application/json",
            }
        )

    # ---- issues ------------------------------------------------------------

    def list_issues_by_status(self, status: str, limit: int = 200) -> list[dict[str, Any]]:
        """Return all issues in the workspace matching a given status."""
        params = {"workspace_id": self.workspace_id, "status": status, "limit": str(limit)}
        r = self._session.get(f"{self.server_url}/api/issues", params=params, timeout=self.timeout)
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, dict):
            return payload.get("issues", [])
        return payload  # some versions return a bare list

    def get_issue(self, issue_id: str) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/issues/{issue_id}", params=params, timeout=self.timeout
        )
        r.raise_for_status()
        return r.json()

    def update_issue(self, issue_id: str, **fields: Any) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.put(
            f"{self.server_url}/api/issues/{issue_id}",
            params=params,
            json=fields,
            timeout=self.timeout,
        )
        r.raise_for_status()
        return r.json()

    # ---- comments ----------------------------------------------------------

    def list_comments(self, issue_id: str) -> list[dict[str, Any]]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/issues/{issue_id}/comments",
            params=params,
            timeout=self.timeout,
        )
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, list):
            return payload
        return payload.get("comments", [])

    def post_comment(self, issue_id: str, content: str) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.post(
            f"{self.server_url}/api/issues/{issue_id}/comments",
            params=params,
            json={"content": content},
            timeout=self.timeout,
        )
        r.raise_for_status()
        return r.json()

    # ---- agents ------------------------------------------------------------

    def list_agents(self) -> list[dict[str, Any]]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/agents", params=params, timeout=self.timeout
        )
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, list):
            return payload
        return payload.get("agents", [])

    def find_agents_by_name(self, names: Iterable[str]) -> dict[str, str]:
        """Return {name: agent_id} for every agent in `names` present in the workspace."""
        wanted = set(names)
        found: dict[str, str] = {}
        for agent in self.list_agents():
            if agent.get("name") in wanted:
                found[agent["name"]] = agent["id"]
        return found
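
The three list endpoints (`list_issues_by_status`, `list_comments`, `list_agents`) each repeat the same "bare list or wrapped dict" check. A shared helper is one way to keep that in a single place. The sketch below is only a suggestion: `unwrap_list` is a hypothetical name, and falling back to an empty list for unexpected payloads is an assumption about the desired failure mode.

```python
from __future__ import annotations

from typing import Any


def unwrap_list(payload: Any, key: str) -> list[dict[str, Any]]:
    """Return the list from either a bare-list payload or a {key: [...]} wrapper."""
    if isinstance(payload, list):
        return payload
    if isinstance(payload, dict):
        value = payload.get(key, [])
        # Guard against {"issues": null} and similar shapes.
        return value if isinstance(value, list) else []
    return []
```

With this in place each method body could end in something like `return unwrap_list(r.json(), "issues")`, and a payload of an unexpected type would yield an empty list instead of an `AttributeError` inside the polling loop.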
@@ -0,0 +1,44 @@
"""Flat-file state tracking. Dumb JSON, human-readable, debuggable."""
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class SeenState:
    """Tracks which issues the coordinator has already acted on.

    Schema:
        issues: { issue_id: { last_seen_updated_at: iso8601, round_id: str|null } }
    """

    path: Path
    issues: dict[str, dict[str, Any]] = field(default_factory=dict)

    @classmethod
    def load(cls, path: Path) -> "SeenState":
        if not path.exists():
            return cls(path=path)
        try:
            data = json.loads(path.read_text())
        except (json.JSONDecodeError, OSError):
            return cls(path=path)
        return cls(path=path, issues=data.get("issues", {}))

    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps({"issues": self.issues}, indent=2, sort_keys=True))

    def last_seen(self, issue_id: str) -> str | None:
        entry = self.issues.get(issue_id)
        return entry.get("last_seen_updated_at") if entry else None

    def mark_seen(self, issue_id: str, updated_at: str, round_id: str | None = None) -> None:
        self.issues[issue_id] = {
            "last_seen_updated_at": updated_at,
            "round_id": round_id,
        }
        self.save()
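
`save()` rewrites the state file in place, so a crash partway through the write could leave truncated JSON behind, and `load()` treats unparseable JSON as an empty state. If that ever becomes a problem, one common hardening is to write to a temporary file and rename it over the target. The sketch below shows the idea. `atomic_write_json` is a hypothetical helper, not part of this commit.

```python
from __future__ import annotations

import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: dict[str, Any]) -> None:
    """Write JSON to a temp file in the same directory, then rename it over `path`."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2, sort_keys=True)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        # The rename is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Creating the temporary file in the same directory as the target matters: a rename across filesystems is not atomic, and `os.replace` may fail outright in that case.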