Add design: Native Anthropic tools integration

Design for integrating bash_20250124, text_editor_20250728, and
computer_20251124 native tools into nanobot. These tools leverage
model-trained behaviors instead of instruction-following.

Key approach: Duck-typed registry supporting both function tools
and native tools, with beta flag management and ToolResult handling.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-27 07:51:10 +00:00
parent ae1dc44705
commit 9bb9cd0d55
@@ -0,0 +1,265 @@
# Design: Native Anthropic Tools Integration
**Goal**: Integrate Anthropic's native trained tools (bash_20250124, text_editor_20250728, computer_20251124) into nanobot to leverage the model's trained behaviors instead of relying on custom function tools.
## Overview
Anthropic's native tools are version-coupled to model training. Unlike custom function tools (which the model learns to use from the schema and description supplied at inference time), native tools have their behaviors trained into the model weights, so the model should call them more reliably.
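**Key Insight**: Both tool formats can coexist in the same request. Function tools are described in nanobot's internal format and converted before sending (see Section 4), while native tools are passed through unchanged:
- Function tools: `{type: "function", function: {name, description, parameters}}`
- Native tools: `{type: "bash_20250124", name: "bash"}` (schema-less)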
## Architecture
### 1. Tool Addition Strategy
Add three native tool implementations from anthropic-quickstarts reference:
- **BashTool20250124** - persistent bash session (replaces ExecTool)
- **EditTool20250728** - file operations with view/create/str_replace/insert (replaces EditFileTool, possibly ReadFileTool/WriteFileTool)
- **ComputerTool20251124** - VNC desktop control (new capability)
Location: `nanobot/agent/tools/anthropic/` (new subpackage)
Port from reference:
- Base classes: `BaseAnthropicTool`, `ToolResult`, `CLIResult`, `ToolError`
- Tool implementations with trained behaviors intact
- Session management (_BashSession for bash tool)
### 2. Registry Changes
Make `ToolRegistry` format-agnostic via duck typing:
**Current**: Only calls `tool.to_schema()`, expects function format
**New**: Support both interfaces
```python
def get_definitions(self) -> list[dict[str, Any]]:
    definitions = []
    for tool in self._tools.values():
        if hasattr(tool, 'to_params'):  # Native Anthropic tool
            definitions.append(tool.to_params())
        elif hasattr(tool, 'to_schema'):  # Function tool
            definitions.append(tool.to_schema())
        else:
            raise ValueError(f"Tool {tool.name} has no schema method")
    return definitions
```
**Execution**: No changes needed - `execute()` already looks up by name and calls the tool. Native tools implement `__call__(**kwargs)` which works with existing dispatch.
**Result**: Registry becomes thin coordination layer, doesn't enforce specific base class.
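To illustrate the duck-typed lookup in isolation, the sketch below runs the same dispatch over two stand-in tools. `FunctionTool` and `NativeBashTool` are hypothetical classes invented for this example, and `get_definitions` is written as a free function rather than a registry method so the snippet is self-contained.

```python
from typing import Any


class FunctionTool:
    """Hypothetical custom tool exposing the function-format schema."""
    name = "list_dir"

    def to_schema(self) -> dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "List a directory",
                "parameters": {"type": "object", "properties": {}},
            },
        }


class NativeBashTool:
    """Hypothetical native tool exposing a schema-less definition."""
    name = "bash"

    def to_params(self) -> dict[str, Any]:
        return {"type": "bash_20250124", "name": self.name}


def get_definitions(tools: list[Any]) -> list[dict[str, Any]]:
    """Pick the schema method by capability rather than by base class."""
    definitions = []
    for tool in tools:
        if hasattr(tool, "to_params"):
            definitions.append(tool.to_params())
        elif hasattr(tool, "to_schema"):
            definitions.append(tool.to_schema())
        else:
            raise ValueError(f"Tool {tool.name} has no schema method")
    return definitions


definitions = get_definitions([FunctionTool(), NativeBashTool()])
```

Neither stand-in inherits from a shared base class, which is the point: the registry only needs one of the two methods to exist.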
### 3. Tool Implementations
#### BashTool20250124
- Maintains persistent bash session via `_BashSession` class
- Sentinel-based output reading for reliable command capture
- Timeout handling (120s default)
- Restart capability
- Returns: `ToolResult(output=..., error=...)`
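Sentinel-based reading can be sketched without a real subprocess: the command is suffixed with an `echo` of a marker, and stdout is buffered until that marker appears. The marker value and the helper names `wrap_command` and `read_until_sentinel` are assumptions for illustration. A real session would read from the process's stdout pipe and enforce the timeout.

```python
from collections.abc import Iterable

SENTINEL = "<<exit>>"  # assumed marker; any string unlikely to occur in output


def wrap_command(command: str) -> str:
    """Append an echo of the sentinel so the end of output is detectable."""
    return f"{command}; echo '{SENTINEL}'\n"


def read_until_sentinel(chunks: Iterable[str]) -> str:
    """Accumulate stdout chunks and return everything before the sentinel."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        index = buffer.find(SENTINEL)
        if index != -1:
            return buffer[:index].rstrip("\n")
    raise RuntimeError("stream ended before the sentinel was seen")


# The sentinel may be split across reads, so the search runs on the whole
# buffer rather than on each chunk.
output = read_until_sentinel(["hello\nwor", "ld\n<<ex", "it>>\n"])
```

Searching the accumulated buffer instead of individual chunks is what makes the capture robust to arbitrary pipe read boundaries.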
#### EditTool20250728
- Commands: `view`, `create`, `str_replace`, `insert`
- Path validation (absolute paths required)
- `str_replace`: uniqueness checking before replacement
- `insert`: line number validation
- File history tracking for potential undo
- Returns: `CLIResult(output=...)` with formatted snippets
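The uniqueness check for `str_replace` and the line-number check for `insert` might look like the following. This is a sketch operating on in-memory strings with hypothetical function names and error messages. The actual tool works on files and returns `CLIResult` objects.

```python
def str_replace(content: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str only if old_str occurs exactly once."""
    occurrences = content.count(old_str)
    if occurrences == 0:
        raise ValueError("old_str not found; no replacement performed")
    if occurrences > 1:
        lines = [
            number
            for number, line in enumerate(content.split("\n"), start=1)
            if old_str in line
        ]
        raise ValueError(f"old_str is ambiguous; found on lines {lines}")
    return content.replace(old_str, new_str)


def insert_line(content: str, insert_after: int, new_text: str) -> str:
    """Insert new_text after line `insert_after` (0 inserts at the top)."""
    lines = content.split("\n")
    if not 0 <= insert_after <= len(lines):
        raise ValueError(f"insert_after must be within [0, {len(lines)}]")
    new_lines = lines[:insert_after] + new_text.split("\n") + lines[insert_after:]
    return "\n".join(new_lines)
```

Refusing ambiguous matches forces the model to supply more surrounding context instead of silently editing the wrong occurrence.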
#### ComputerTool20251124
- VNC desktop interaction (keyboard, mouse, screenshots)
- Actions: `key`, `type`, `mouse_move`, `left_click`, `right_click`, `double_click`, `screenshot`, etc.
- Screenshot returns `ToolResult(base64_image=...)`
- Coordinate scaling support
- Connects to VNC at 172.17.0.1:5900 (Windows VM from code-server)
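Coordinate scaling is needed when the screenshot shown to the model has a different resolution from the real desktop. A minimal sketch of the mapping, assuming a simple proportional rescale. The function name and the example resolutions are hypothetical.

```python
def scale_to_screen(
    x: int,
    y: int,
    api_size: tuple[int, int],
    screen_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a coordinate from the model-facing resolution to the real screen."""
    api_w, api_h = api_size
    screen_w, screen_h = screen_size
    if not (0 <= x <= api_w and 0 <= y <= api_h):
        raise ValueError(f"coordinate ({x}, {y}) is outside {api_w}x{api_h}")
    return round(x * screen_w / api_w), round(y * screen_h / api_h)


# A click at the center of a 1024x768 screenshot lands at the center of a
# 2048x1536 desktop.
target = scale_to_screen(512, 384, (1024, 768), (2048, 1536))
```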
### 4. API Integration
Update `anthropic_oauth.py._convert_tools_to_anthropic()` to pass through both formats:
**Current**: Only converts `type: "function"` tools
```python
if tool.get("type") == "function":
    ...  # convert to Anthropic format
```
**New**: Pass through ALL formats
```python
def _convert_tools_to_anthropic(self, tools: list[dict[str, Any]] | None) -> list[dict[str, Any]] | None:
    if not tools:
        return None
    anthropic_tools = []
    for tool in tools:
        if tool.get("type") == "function":
            # Convert function tool format
            func = tool["function"]
            anthropic_tools.append({
                "name": func["name"],
                "description": func.get("description", ""),
                "input_schema": func.get("parameters", {"type": "object", "properties": {}})
            })
        else:
            # Pass through native tool format as-is
            # (bash_20250124, text_editor_20250728, computer_20251124)
            anthropic_tools.append(tool)
    return anthropic_tools if anthropic_tools else None
```
**Distinction**: Based on `type` field
- `type == "function"` → function tool, needs conversion
- `type == "bash_20250124"` (or other native type) → pass through as-is
### 5. Tool Result Handling
**Current**: Tools return plain strings
**New**: Native tools return `ToolResult` objects
```python
from dataclasses import dataclass


@dataclass(kw_only=True, frozen=True)
class ToolResult:
    output: str | None = None
    error: str | None = None
    base64_image: str | None = None
    system: str | None = None
```
**Agent loop changes** (`loop.py`): Handle both return types
```python
result = await self.tools.execute(tool_name, tool_input)
if isinstance(result, ToolResult):
    # Native tool result - build structured content
    tool_result_content = []
    if result.output:
        tool_result_content.append({"type": "text", "text": result.output})
    if result.error:
        tool_result_content.append({"type": "text", "text": f"Error: {result.error}"})
    if result.base64_image:
        # Image handling (see Section 6)
        pass
    if result.system:
        # System messages for next turn
        pass
else:
    # Legacy string result from function tools
    tool_result_content = [{"type": "text", "text": str(result)}]
```
### 6. Image Handling Flow
**Goal**: Both model and user see screenshots from computer tool
**Implementation**: Track media across tool iteration loop
```python
import base64
import time
from pathlib import Path

# At start of agent turn
media_paths_for_turn: list[str] = []

# During tool execution
if isinstance(result, ToolResult) and result.base64_image:
    # 1. Save to disk for user
    media_dir = Path.home() / ".nanobot" / "media"
    media_dir.mkdir(parents=True, exist_ok=True)
    screenshot_path = media_dir / f"screenshot_{int(time.time())}.png"
    screenshot_path.write_bytes(base64.b64decode(result.base64_image))
    media_paths_for_turn.append(str(screenshot_path))
    # 2. Include in tool_result for model to see
    tool_result_content.append({
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": result.base64_image
        }
    })

# After final LLM response
await self.bus.publish(OutboundMessage(
    channel=inbound.channel,
    chat_id=inbound.chat_id,
    content=final_response,
    media=media_paths_for_turn  # Include all screenshots
))
```
**Result**:
- Model sees base64 in tool_result → analyzes and reasons about it
- User receives file via Telegram's media sending (`_send_with_media()`)
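Because a second-resolution timestamp can repeat when two screenshots arrive within the same second, one variation is to name files with a random identifier. A small sketch, assuming a hypothetical `save_screenshot` helper that takes the media directory as a parameter so it can be tested against a temporary path:

```python
import base64
import uuid
from pathlib import Path


def save_screenshot(base64_image: str, media_dir: Path) -> Path:
    """Decode a base64 PNG and write it under a collision-free file name."""
    media_dir.mkdir(parents=True, exist_ok=True)
    path = media_dir / f"screenshot_{uuid.uuid4().hex}.png"
    path.write_bytes(base64.b64decode(base64_image))
    return path
```

The returned path can be appended to `media_paths_for_turn` exactly as in the flow above.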
### 7. Version Management & Beta Flags
**Problem**: Each native tool version requires specific API beta flag
**Solution**: Add beta flag tracking to native tools
Each native tool class specifies its required beta flag:
```python
class BashTool20250124(BaseAnthropicTool):
    api_type = "bash_20250124"
    name = "bash"
    beta_flag = "computer-use-2025-11-24"  # Required for API
```
In `anthropic_oauth.py._make_request()`, collect beta flags:
```python
# Collect unique beta flags from native tools
beta_flags = set()
for tool in tools or []:
    if hasattr(tool, 'beta_flag') and tool.beta_flag:
        beta_flags.add(tool.beta_flag)

# Add to API request headers
if beta_flags:
    headers["anthropic-beta"] = ",".join(sorted(beta_flags))
```
**Note**: All three tools (bash, text_editor, computer) currently use the same beta flag: `"computer-use-2025-11-24"` as of the 2025-11-24 tool version.
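The flag collection can be exercised on its own with stand-in classes. This sketch assumes the function receives tool objects, not the already-serialized definition dicts, since only the objects carry a `beta_flag` attribute. `build_beta_headers` and the three classes are hypothetical names.

```python
from typing import Any


class NativeBash:
    """Hypothetical native tool carrying a beta flag."""
    beta_flag = "computer-use-2025-11-24"


class NativeComputer:
    """Hypothetical native tool sharing the same beta flag."""
    beta_flag = "computer-use-2025-11-24"


class PlainFunctionTool:
    """Hypothetical function tool with no beta_flag attribute at all."""


def build_beta_headers(tools: list[Any]) -> dict[str, str]:
    """Return the anthropic-beta header for the given tools, if any is needed."""
    flags: set[str] = set()
    for tool in tools:
        flag = getattr(tool, "beta_flag", None)
        if flag:
            flags.add(flag)
    if not flags:
        return {}
    # Sorting keeps the header value stable across runs.
    return {"anthropic-beta": ",".join(sorted(flags))}


headers = build_beta_headers([NativeBash(), NativeComputer(), PlainFunctionTool()])
```

Using a set means a flag shared by several tools is sent once, and tools without the attribute are skipped without special casing.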
### 8. Removing Overlapping Tools
Once native tools are implemented and tested, remove overlapping custom tools:
**To Remove**:
- `ExecTool` → replaced by `BashTool20250124` (persistent session, better output)
- `EditFileTool` → replaced by `EditTool20250728` (str_replace command)
- Possibly `ReadFileTool`, `WriteFileTool` → `EditTool20250728` has `view` and `create` commands
**To Keep**:
- `ListDirTool` → no native equivalent
- `WebSearchTool`, `WebFetchTool` → no native equivalent
- `MessageTool`, `SpawnTool`, `WaitForSubagentsTool` → nanobot-specific
- `CronTool` → nanobot-specific
**Migration Notes**:
- `EditTool20250728` only supports absolute paths (enforced in validation)
- `BashTool20250124` maintains session state across calls (different from ExecTool's one-shot)
- Test native tools thoroughly before removing custom ones
## Benefits
1. **Trained Behaviors**: Model knows how to use these tools from training, not instruction-following
2. **Better Reliability**: Persistent bash sessions, validated file operations
3. **New Capabilities**: Desktop interaction via computer tool
4. **Future-Proof**: Easy to add more native tools as Anthropic releases them (just port implementation)
5. **Unified System**: Both function tools and native tools work together in same request
## Trade-offs
1. **Code Duplication**: Porting reference implementations means maintaining separate codebase
- Mitigation: Keep close to reference implementation for easier updates
2. **Version Management**: Need to track tool versions and beta flags
- Mitigation: Simple beta_flag attribute on tool classes
3. **Testing Complexity**: Need to test both tool systems
- Mitigation: Gradual rollout, keep custom tools until native tools proven
## Success Criteria
1. All three native tools execute successfully
2. Model can use bash, edit, and computer tools in same conversation
3. Screenshots from computer tool visible to both model and user
4. No regression in existing functionality (other tools still work)
5. Performance comparable to custom tools