WYL-42: Python skeleton + in_review watcher loop

Minimum viable structure for the Tool-MAD coordinator: - coordinator.config: env-loaded Config dataclass, writes state to ~/.coordinator/ - coordinator.multica_client: thin requests wrapper for issues/comments/agents - coordinator.state: flat-json SeenState tracking issue_id -> last_seen_updated_at - coordinator.__main__: watch_loop() that polls in_review and logs candidates - README.md: why this exists + how to run v0 only detects in_review transitions; convening debate rounds is WYL-45. Dependencies: stdlib + requests (nothing else until a working v1 ships).
2026-04-15 23:04:06 +02:00
parent 3935102d96
commit 6da434039c
8 changed files with 361 additions and 1 deletions
@@ -0,0 +1,8 @@
 __pycache__/
 *.py[cod]
 *.egg-info/
 .venv/
 venv/
 build/
 dist/
 .coordinator/
@@ -1,3 +1,48 @@
 # coordinator
-Tool-MAD middle-management layer for multica: watches in_review transitions, convenes debate rounds, actions verdicts.
+Tool-MAD middle-management layer for multica.
 ## Why this exists
 Multica's agents produce work and self-set issue status to `in_review`. Nothing in multica ever looks at that state — it's a dead end. Reviewers, when mentioned manually, tend to catch surface issues (length, structure) and miss semantic ones (scope drift, fabricated experiments, answering the wrong question).
 This daemon is the missing middle-management layer. When an issue transitions to `in_review`, it:
 1. Convenes a **debate round**: posts one comment on the issue mentioning a fixed set of debaters, each given a role-specific evidence-gathering prompt (e.g. Senior Developer greps the committed code for LLM API calls; Code Reviewer diffs the described method against the source; Project Manager Senior checks scope satisfaction against the original description).
 2. **Waits** for every debater to reply (up to a timeout).
 3. Posts a second comment mentioning the **judge** (Reality Checker), with the assembled debater transcript + the original issue body, and a hardcoded decision rule.
 4. **Parses** the judge's structured verdict (`VERDICT: ACCEPT` or `VERDICT: REJECT\n- R<n>: <failure>`) and acts:
   - `ACCEPT` → PUT status `done`, post acceptance summary
   - `REJECT` → PUT status `in_progress`, post rejection listing every failure, re-trigger the original assignee
 The pattern is **Tool-MAD (Multi-Agent Debate with heterogeneous tool augmentation)**. Reference: [arxiv 2601.04742](https://arxiv.org/html/2601.04742v1).
 ## Why debaters catch what a single judge misses
 Each debater runs on a different agent (different runtime, different tool access, different role prompt) and is forced to ground its argument in a specific tool's output — not in its own recollection of the text. Example: if the question is "does this paper describe real LLM agent experiments", Senior Developer's grep for `anthropic|openai|ollama|model=` in the committed code either returns hits or it doesn't — that's a hard fact, not an opinion. The judge reads 4 grounded arguments and applies the decision rule "any debater reporting evidence of scope drift ⇒ REJECT."
 A naive LLM-as-a-judge reviewer reads the paper and scores it on surface dimensions. An agent-as-judge driven by debaters catches the underlying substitution. See WYL-41 for the live failure case that motivated this build.
 ## Run
 ```bash
 pip install -e .
 cat > ~/.coordinator/env <<'EOF'
 COORDINATOR_SERVER_URL=http://localhost:8089
 COORDINATOR_WORKSPACE_ID=<wid>
 COORDINATOR_TOKEN=<coordinator-member-pat>
 EOF
 chmod 600 ~/.coordinator/env
 coordinator
 ```
 Logs go to stderr and to `~/.coordinator/coordinator.log`. State file is `~/.coordinator/seen.json`.
 ## Status
 - [x] WYL-42 skeleton + watcher (this commit)
 - [ ] WYL-43 dedicated admin PAT
 - [ ] WYL-44 hook watcher to real round trigger
 - [ ] WYL-45 debate round orchestration
 - [ ] WYL-46 verdict parser + action executor
 - [ ] WYL-47 dry-run against WYL-41
@@ -0,0 +1,21 @@
 [project]
 name = "coordinator"
 version = "0.1.0"
 description = "Tool-MAD middle-management layer for multica. Watches for in_review transitions, convenes debate rounds, and actions judge verdicts."
 readme = "README.md"
 requires-python = ">=3.11"
 dependencies = [
    # v1 intentionally depends only on stdlib + requests. No async, no frameworks.
    # If this list grows past 3 items before a working v1 is shipped, something is wrong.
    "requests>=2.32",
 ]
 [project.scripts]
 coordinator = "coordinator.__main__:main"
 [build-system]
 requires = ["setuptools>=68"]
 build-backend = "setuptools.build_meta"
 [tool.setuptools.packages.find]
 where = ["src"]
@@ -0,0 +1,3 @@
 """Tool-MAD middle-management layer for multica."""
 __version__ = "0.1.0"
@@ -0,0 +1,74 @@
 """Coordinator daemon entrypoint.
 For v0 this only does the watcher loop — debate orchestration and verdict handling
 land in WYL-45 and WYL-46. Running it now will just tail the `in_review` list and
 log what would have triggered a round.
 """
 from __future__ import annotations
 import logging
 import sys
 import time
 from coordinator.config import Config
 from coordinator.multica_client import MulticaClient
 from coordinator.state import SeenState
 def _setup_logging(log_file) -> logging.Logger:
    logger = logging.getLogger("coordinator")
    logger.setLevel(logging.INFO)
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%dT%H:%M:%S")
    stderr = logging.StreamHandler(sys.stderr)
    stderr.setFormatter(fmt)
    logger.addHandler(stderr)
    try:
        fh = logging.FileHandler(log_file)
        fh.setFormatter(fmt)
        logger.addHandler(fh)
    except OSError as e:
        logger.warning("could not open log file %s: %s", log_file, e)
    return logger
 def watch_loop(cfg: Config, client: MulticaClient, state: SeenState, logger: logging.Logger) -> None:
    logger.info(
        "coordinator starting server=%s workspace=%s interval=%ds",
        cfg.server_url, cfg.workspace_id, cfg.poll_interval_s,
    )
    while True:
        try:
            issues = client.list_issues_by_status("in_review")
        except Exception as e:
            logger.error("poll failed: %s", e)
            time.sleep(cfg.poll_interval_s)
            continue
        for issue in issues:
            iid = issue["id"]
            updated = issue.get("updated_at", "")
            if state.last_seen(iid) == updated:
                continue
            logger.info(
                "candidate issue %s status=in_review updated_at=%s title=%r",
                issue.get("identifier", iid), updated, issue.get("title", ""),
            )
            # v0: only mark-and-log. Round convening is WYL-45.
            state.mark_seen(iid, updated)
        time.sleep(cfg.poll_interval_s)
 def main() -> None:
    cfg = Config.from_env()
    logger = _setup_logging(cfg.log_file)
    client = MulticaClient(cfg.server_url, cfg.workspace_id, cfg.token)
    state = SeenState.load(cfg.seen_file)
    try:
        watch_loop(cfg, client, state, logger)
    except KeyboardInterrupt:
        logger.info("shutting down")
 if __name__ == "__main__":
    main()
@@ -0,0 +1,65 @@
 """Configuration loading for the coordinator daemon.
 Everything comes from environment variables (loaded from ~/.coordinator/env at
 startup if present). No flags, no config files. Keep it boring.
 """
 from __future__ import annotations
 import os
 from dataclasses import dataclass
 from pathlib import Path
 ENV_FILE = Path.home() / ".coordinator" / "env"
 STATE_DIR = Path.home() / ".coordinator"
 def load_env_file(path: Path = ENV_FILE) -> None:
    """Populate os.environ from a simple KEY=VALUE file. Silent if missing."""
    if not path.is_file():
        return
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key and key not in os.environ:
            os.environ[key] = value
@dataclass(frozen=True)
 class Config:
    server_url: str
    workspace_id: str
    token: str
    poll_interval_s: int
    round_timeout_s: int
    max_concurrent_rounds: int
    seen_file: Path
    log_file: Path
    @classmethod
    def from_env(cls) -> "Config":
        load_env_file()
        required = ("COORDINATOR_SERVER_URL", "COORDINATOR_WORKSPACE_ID", "COORDINATOR_TOKEN")
        missing = [k for k in required if not os.environ.get(k)]
        if missing:
            raise SystemExit(
                f"coordinator: missing required env vars: {', '.join(missing)}\n"
                f"put them in {ENV_FILE} or export them before running"
            )
        STATE_DIR.mkdir(parents=True, exist_ok=True)
        return cls(
            server_url=os.environ["COORDINATOR_SERVER_URL"].rstrip("/"),
            workspace_id=os.environ["COORDINATOR_WORKSPACE_ID"],
            token=os.environ["COORDINATOR_TOKEN"],
            poll_interval_s=int(os.environ.get("COORDINATOR_POLL_INTERVAL_S", "30")),
            round_timeout_s=int(os.environ.get("COORDINATOR_ROUND_TIMEOUT_S", "600")),
            max_concurrent_rounds=int(os.environ.get("COORDINATOR_MAX_CONCURRENT_ROUNDS", "3")),
            seen_file=STATE_DIR / "seen.json",
            log_file=STATE_DIR / "coordinator.log",
        )
@@ -0,0 +1,100 @@
 """Thin REST wrapper for multica's API. Only the endpoints the coordinator needs."""
 from __future__ import annotations
 from typing import Any, Iterable
 import requests
 class MulticaClient:
    def __init__(self, server_url: str, workspace_id: str, token: str, timeout: float = 15.0):
        self.server_url = server_url.rstrip("/")
        self.workspace_id = workspace_id
        self.timeout = timeout
        self._session = requests.Session()
        self._session.headers.update(
            {
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
                "Accept": "application/json",
            }
        )
    # ---- issues ------------------------------------------------------------
    def list_issues_by_status(self, status: str, limit: int = 200) -> list[dict[str, Any]]:
        """Return all issues in the workspace matching a given status."""
        params = {"workspace_id": self.workspace_id, "status": status, "limit": str(limit)}
        r = self._session.get(f"{self.server_url}/api/issues", params=params, timeout=self.timeout)
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, dict):
            return payload.get("issues", [])
        return payload  # some versions return a bare list
    def get_issue(self, issue_id: str) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/issues/{issue_id}", params=params, timeout=self.timeout
        )
        r.raise_for_status()
        return r.json()
    def update_issue(self, issue_id: str, **fields: Any) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.put(
            f"{self.server_url}/api/issues/{issue_id}",
            params=params,
            json=fields,
            timeout=self.timeout,
        )
        r.raise_for_status()
        return r.json()
    # ---- comments ----------------------------------------------------------
    def list_comments(self, issue_id: str) -> list[dict[str, Any]]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/issues/{issue_id}/comments",
            params=params,
            timeout=self.timeout,
        )
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, list):
            return payload
        return payload.get("comments", [])
    def post_comment(self, issue_id: str, content: str) -> dict[str, Any]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.post(
            f"{self.server_url}/api/issues/{issue_id}/comments",
            params=params,
            json={"content": content},
            timeout=self.timeout,
        )
        r.raise_for_status()
        return r.json()
    # ---- agents ------------------------------------------------------------
    def list_agents(self) -> list[dict[str, Any]]:
        params = {"workspace_id": self.workspace_id}
        r = self._session.get(
            f"{self.server_url}/api/agents", params=params, timeout=self.timeout
        )
        r.raise_for_status()
        payload = r.json()
        if isinstance(payload, list):
            return payload
        return payload.get("agents", [])
    def find_agents_by_name(self, names: Iterable[str]) -> dict[str, str]:
        """Return {name: agent_id} for every agent in `names` present in the workspace."""
        wanted = set(names)
        found: dict[str, str] = {}
        for agent in self.list_agents():
            if agent.get("name") in wanted:
                found[agent["name"]] = agent["id"]
        return found
@@ -0,0 +1,44 @@
 """Flat-file state tracking. Dumb JSON, human-readable, debuggable."""
 from __future__ import annotations
 import json
 from dataclasses import dataclass, field
 from pathlib import Path
 from typing import Any
@dataclass
 class SeenState:
    """Tracks which issues the coordinator has already acted on.
    Schema:
        issues: { issue_id: { last_seen_updated_at: iso8601, round_id: str|null } }
    """
    path: Path
    issues: dict[str, dict[str, Any]] = field(default_factory=dict)
    @classmethod
    def load(cls, path: Path) -> "SeenState":
        if not path.exists():
            return cls(path=path)
        try:
            data = json.loads(path.read_text())
        except (json.JSONDecodeError, OSError):
            return cls(path=path)
        return cls(path=path, issues=data.get("issues", {}))
    def save(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps({"issues": self.issues}, indent=2, sort_keys=True))
    def last_seen(self, issue_id: str) -> str | None:
        entry = self.issues.get(issue_id)
        return entry.get("last_seen_updated_at") if entry else None
    def mark_seen(self, issue_id: str, updated_at: str, round_id: str | None = None) -> None:
        self.issues[issue_id] = {
            "last_seen_updated_at": updated_at,
            "round_id": round_id,
        }
        self.save()
		`@@ -0,0 +1,3 @@`
							`"""Tool-MAD middle-management layer for multica."""`

							`__version__ = "0.1.0"`