Compare commits
5 Commits
subagent-i...feat/rate-

| Author | SHA1 | Date |
|---|---|---|
| | 1d940cb4f2 | |
| | 84268edf01 | |
| | 9136cca1ff | |
| | 6035b70ae5 | |
| | e4c300bcfd | |
```diff
@@ -56,7 +56,7 @@ ENV PATH="/root/.local/bin:${PATH}"
 
 COPY pyproject.toml README.md LICENSE /app/
 COPY nanobot/ /app/nanobot/
-RUN uv pip install --system --no-cache --reinstall /app
+RUN uv pip install --system --no-cache --reinstall /app psycopg2-binary
 
 ENTRYPOINT ["nanobot"]
 CMD ["gateway"]
```
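The image now bakes `psycopg2-binary` in alongside the application install, presumably so nanobot can talk to PostgreSQL at runtime. A minimal smoke test of the new dependency inside the built image might look like this; the DSN and service name are hypothetical, not from this diff:

```python
# Hypothetical smoke test: confirm psycopg2 imports and can reach a
# PostgreSQL instance. The DSN below is an assumption, not from this diff.
import psycopg2

conn = psycopg2.connect("postgresql://nanobot:secret@db:5432/nanobot")
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])  # e.g. "PostgreSQL 16.x ..."
conn.close()
```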
```diff
@@ -424,7 +424,9 @@ Respond with ONLY valid JSON, no markdown fences."""
                 {"role": "system", "content": "You are a memory consolidation agent. Respond only with valid JSON."},
                 {"role": "user", "content": prompt},
             ],
-            model=self.model,
+            model="claude-haiku-4-5",
+            thinking_budget=0,
+            max_tokens=16384,
         )
         text = (response.content or "").strip()
         if text.startswith("```"):
```
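The trailing context shows the response text still being checked for markdown fences even though the prompt forbids them; the stripping logic itself falls outside this hunk. A standalone sketch of that kind of defensive cleanup (my own helper, not the repo's code):

```python
def strip_md_fences(text: str) -> str:
    """Remove a surrounding ```json ... ``` fence if the model added one.

    Hypothetical helper mirroring the defensive cleanup hinted at by the
    hunk's trailing context; the repo's actual implementation is not shown.
    """
    text = text.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```json)...
        lines = text.splitlines()[1:]
        # ...and a closing ``` line, if present.
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]
        text = "\n".join(lines).strip()
    return text
```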
```diff
@@ -121,7 +121,7 @@ class SubagentManager:
         ]
 
         # Run agent loop (limited iterations)
-        max_iterations = 15
+        max_iterations = 50
         iteration = 0
         final_result: str | None = None
 
```
```diff
@@ -225,6 +225,7 @@ class AnthropicOAuthProvider(LLMProvider):
         max_tokens: int = 4096,
         temperature: float = 0.7,
         tools: list[dict[str, Any]] | None = None,
+        thinking_budget_override: int | None = None,
     ) -> dict[str, Any]:
         """Make request to Anthropic API."""
         client = await self._get_client()
```
```diff
@@ -236,14 +237,15 @@ class AnthropicOAuthProvider(LLMProvider):
         }
 
         # Extended thinking: temperature must be 1 when enabled
-        if self.thinking_budget > 0:
+        effective_thinking = thinking_budget_override if thinking_budget_override is not None else self.thinking_budget
+        if effective_thinking > 0:
             payload["temperature"] = 1
             # max_tokens must exceed budget_tokens
-            if max_tokens <= self.thinking_budget:
-                payload["max_tokens"] = self.thinking_budget + 4096
+            if max_tokens <= effective_thinking:
+                payload["max_tokens"] = effective_thinking + 4096
             payload["thinking"] = {
                 "type": "enabled",
-                "budget_tokens": self.thinking_budget,
+                "budget_tokens": effective_thinking,
             }
         else:
             payload["temperature"] = temperature
```
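Isolated from the class, the override precedence and the extended-thinking constraints (temperature forced to 1, max_tokens strictly greater than budget_tokens) read as a small pure function. This is a sketch of the logic as it stands after the patch, not the repo's actual code:

```python
from typing import Any

def apply_thinking(
    payload: dict[str, Any],
    max_tokens: int,
    temperature: float,
    instance_budget: int,
    override: int | None = None,
) -> dict[str, Any]:
    """Sketch of the patched logic: a per-call override (0 disables thinking)
    takes precedence over the instance-level budget."""
    effective = override if override is not None else instance_budget
    if effective > 0:
        payload["temperature"] = 1  # required when extended thinking is on
        if max_tokens <= effective:
            payload["max_tokens"] = effective + 4096  # must exceed the budget
        payload["thinking"] = {"type": "enabled", "budget_tokens": effective}
    else:
        payload["temperature"] = temperature
    return payload
```

With `override=0` (as the consolidation call above passes `thinking_budget=0`), the function falls through to plain temperature sampling even when the instance default enables thinking.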
```diff
@@ -266,6 +268,43 @@ class AnthropicOAuthProvider(LLMProvider):
             json=payload,
         )
 
+        # Dump rate limit headers for analysis
+        try:
+            import datetime, os
+            header_dump = {
+                "timestamp": datetime.datetime.utcnow().isoformat(),
+                "status_code": response.status_code,
+                "model": payload.get("model"),
+                "headers": dict(response.headers),
+            }
+            dump_path = "/root/.nanobot/workspace/api_headers.jsonl"
+            with open(dump_path, "a") as f:
+                f.write(json.dumps(header_dump) + "\n")
+        except Exception:
+            pass
+
+        # Capture rate limit state for heartbeat throttling
+        try:
+            import datetime, os
+            headers = response.headers
+            rate_limit_state = {
+                "updated_at": datetime.datetime.utcnow().isoformat(),
+                "model": payload.get("model"),
+                "weekly_all_models": float(headers.get("anthropic-ratelimit-unified-7d-utilization", 0)) if headers.get("anthropic-ratelimit-unified-7d-utilization") else None,
+                "weekly_sonnet": float(headers.get("anthropic-ratelimit-unified-7d_sonnet-utilization", 0)) if headers.get("anthropic-ratelimit-unified-7d_sonnet-utilization") else None,
+                "session_5h": float(headers.get("anthropic-ratelimit-unified-5h-utilization", 0)) if headers.get("anthropic-ratelimit-unified-5h-utilization") else None,
+                "weekly_reset": int(headers.get("anthropic-ratelimit-unified-7d-reset", 0)) if headers.get("anthropic-ratelimit-unified-7d-reset") else None,
+                "session_reset": int(headers.get("anthropic-ratelimit-unified-5h-reset", 0)) if headers.get("anthropic-ratelimit-unified-5h-reset") else None,
+                "binding_limit": headers.get("anthropic-ratelimit-unified-representative-claim"),
+                "sonnet_fallback": headers.get("anthropic-ratelimit-unified-fallback"),
+            }
+            state_path = "/root/.nanobot/workspace/memory/rate_limits.json"
+            os.makedirs(os.path.dirname(state_path), exist_ok=True)
+            with open(state_path, "w") as f:
+                json.dump(rate_limit_state, f, indent=2)
+        except Exception:
+            pass
+
         if response.status_code != 200:
             error_text = response.text
             raise Exception(f"Anthropic API error {response.status_code}: {error_text}")
```
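The second block persists a compact snapshot that a heartbeat loop can poll to throttle itself. The header names and file path come from the diff; the consumer below is a hypothetical sketch, since the throttling side is not part of this compare:

```python
import json
import time
from pathlib import Path

STATE_PATH = Path("/root/.nanobot/workspace/memory/rate_limits.json")

def heartbeat_delay(default: float = 60.0) -> float:
    """Hypothetical consumer: stretch the heartbeat interval as the 5h
    session utilization climbs, and back off hard near the cap."""
    try:
        state = json.loads(STATE_PATH.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return default
    util = state.get("session_5h")
    if util is None:
        return default
    if util >= 0.95:
        # Nearly exhausted: sleep until the advertised reset, assuming the
        # reset header is a Unix epoch timestamp (not confirmed by this diff).
        reset = state.get("session_reset")
        if reset:
            return max(default, reset - time.time())
        return default * 10
    return default * (1 + 4 * util)  # linear stretch up to 5x
```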
```diff
@@ -279,6 +318,7 @@ class AnthropicOAuthProvider(LLMProvider):
         model: str | None = None,
         max_tokens: int = 4096,
         temperature: float = 0.7,
+        thinking_budget: int | None = None,
     ) -> LLMResponse:
         """Send chat completion request to Anthropic API."""
         model = model or self.default_model
```
```diff
@@ -293,6 +333,9 @@ class AnthropicOAuthProvider(LLMProvider):
         system, prepared_messages = self._prepare_messages(messages)
         anthropic_tools = self._convert_tools_to_anthropic(tools)
 
+        # Per-call thinking override (None = use instance default)
+        effective_thinking = self.thinking_budget if thinking_budget is None else thinking_budget
+
         try:
             response = await self._make_request(
                 messages=prepared_messages,
```
```diff
@@ -301,6 +344,7 @@ class AnthropicOAuthProvider(LLMProvider):
                 max_tokens=max_tokens,
                 temperature=temperature,
                 tools=anthropic_tools,
+                thinking_budget_override=effective_thinking,
             )
             return self._parse_response(response)
         except Exception as e:
```
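Taken together, the provider changes thread a per-call knob from `chat()` down into the request payload. Usage as implied by the consolidation hunk earlier; `provider` and `consolidate` are stand-in names for illustration:

```python
async def consolidate(provider, messages):
    # Per-call override, as in the consolidation hunk above:
    # thinking_budget=0 disables extended thinking for this request only,
    # regardless of the provider's instance-level default. Omitting the
    # argument (None) keeps the instance default.
    return await provider.chat(
        messages=messages,
        model="claude-haiku-4-5",
        thinking_budget=0,
        max_tokens=16384,
    )
```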
```diff
@@ -48,6 +48,7 @@ class LLMProvider(ABC):
         model: str | None = None,
         max_tokens: int = 4096,
         temperature: float = 0.7,
+        thinking_budget: int | None = None,
     ) -> LLMResponse:
         """
         Send a chat completion request.
```
```diff
@@ -106,6 +106,7 @@ class LiteLLMProvider(LLMProvider):
         model: str | None = None,
         max_tokens: int = 4096,
         temperature: float = 0.7,
+        thinking_budget: int | None = None,
     ) -> LLMResponse:
         """
         Send a chat completion request via LiteLLM.
```