Add design: Native Anthropic tools integration
Design for integrating bash_20250124, text_editor_20250728, and computer_20251124 native tools into nanobot. These tools leverage model-trained behaviors instead of instruction-following.

Key approach: Duck-typed registry supporting both function tools and native tools, with beta flag management and ToolResult handling.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@@ -0,0 +1,265 @@
# Design: Native Anthropic Tools Integration

**Goal**: Integrate Anthropic's native trained tools (bash_20250124, text_editor_20250728, computer_20251124) into nanobot to leverage the model's trained behaviors instead of custom function tools.

## Overview

Anthropic's native tools are version-coupled to model training. Unlike custom function tools, which the model learns to use from the schema and description supplied at inference time (instruction-following), native tools have their behaviors baked into the model weights during training. The expectation is that this makes tool use more reliable.
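
**Key Insight**: Both tool formats can travel in the same request. nanobot describes function tools in its internal format and converts them before sending (Section 4); native tools are passed through unchanged:

- Function tools: `{type: "function", function: {name, description, parameters}}`, converted to Anthropic's `{name, description, input_schema}`
- Native tools: `{type: "bash_20250124", name: "bash"}` (schema-less)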
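
To make the two shapes concrete, here is a minimal sketch of a mixed tool list as nanobot might hold it before conversion. The `list_dir` tool and its schema are hypothetical placeholders; only the native entry is taken from this document.

```python
from typing import Any

# nanobot-internal function tool (hypothetical example); it is converted
# to Anthropic's {name, description, input_schema} shape before sending.
function_tool: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "list_dir",
        "description": "List the entries of a directory.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

# Native tool: identified by its versioned type, with no schema attached.
native_tool: dict[str, Any] = {"type": "bash_20250124", "name": "bash"}

tools = [function_tool, native_tool]

# Anything whose type is not "function" is treated as native.
native = [t for t in tools if t["type"] != "function"]
print([t["name"] for t in native])
```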

## Architecture

### 1. Tool Addition Strategy

Add three native tool implementations from the anthropic-quickstarts reference:

- **BashTool20250124** - persistent bash session (replaces ExecTool)
- **EditTool20250728** - file operations with view/create/str_replace/insert (replaces EditFileTool, possibly ReadFileTool/WriteFileTool)
- **ComputerTool20251124** - VNC desktop control (new capability)

Location: `nanobot/agent/tools/anthropic/` (new subpackage)

Port from reference:

- Base classes: `BaseAnthropicTool`, `ToolResult`, `CLIResult`, `ToolError`
- Tool implementations with trained behaviors intact
- Session management (`_BashSession` for bash tool)

### 2. Registry Changes

Make `ToolRegistry` format-agnostic via duck typing:

**Current**: Only calls `tool.to_schema()` and expects the function format

**New**: Support both interfaces
```python
def get_definitions(self) -> list[dict[str, Any]]:
    definitions = []
    for tool in self._tools.values():
        if hasattr(tool, 'to_params'):  # Native Anthropic tool
            definitions.append(tool.to_params())
        elif hasattr(tool, 'to_schema'):  # Function tool
            definitions.append(tool.to_schema())
        else:
            raise ValueError(f"Tool {tool.name} has no schema method")
    return definitions
```

**Execution**: No changes needed. `execute()` already looks up the tool by name and calls it, and native tools implement `__call__(**kwargs)`, which works with the existing dispatch.

**Result**: The registry becomes a thin coordination layer that does not enforce a specific base class.
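
The duck-typed lookup can be exercised end to end with two stand-in tools. Everything except the `get_definitions()` logic is hypothetical here: `ListDirTool`, `BashTool`, and `register()` are simplified placeholders rather than nanobot's actual classes.

```python
from typing import Any


class ListDirTool:
    """Hypothetical function-style tool: exposes to_schema()."""

    name = "list_dir"

    def to_schema(self) -> dict[str, Any]:
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": "List the entries of a directory.",
                "parameters": {"type": "object", "properties": {}},
            },
        }


class BashTool:
    """Hypothetical native-style tool: exposes to_params()."""

    name = "bash"

    def to_params(self) -> dict[str, Any]:
        return {"type": "bash_20250124", "name": self.name}


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Any] = {}

    def register(self, tool: Any) -> None:
        self._tools[tool.name] = tool

    def get_definitions(self) -> list[dict[str, Any]]:
        definitions = []
        for tool in self._tools.values():
            if hasattr(tool, "to_params"):  # native Anthropic tool
                definitions.append(tool.to_params())
            elif hasattr(tool, "to_schema"):  # function tool
                definitions.append(tool.to_schema())
            else:
                raise ValueError(f"Tool {tool.name} has no schema method")
        return definitions


registry = ToolRegistry()
registry.register(ListDirTool())
registry.register(BashTool())
definition_types = [d["type"] for d in registry.get_definitions()]
print(definition_types)
```

Checking `to_params` first means a tool that happens to expose both methods is treated as native.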

### 3. Tool Implementations

#### BashTool20250124

- Maintains persistent bash session via `_BashSession` class
- Sentinel-based output reading for reliable command capture
- Timeout handling (120s default)
- Restart capability
- Returns: `ToolResult(output=..., error=...)`
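
The sentinel idea can be sketched as follows: after each command, the session echoes a marker and reads stdout until the marker appears, so output boundaries are known even though the shell never exits. This is a simplified illustration, not the reference `_BashSession`; the sentinel string, the line-based reading, and the function names are assumptions, and it expects `/bin/bash` to exist.

```python
import asyncio

SENTINEL = "<<exit>>"  # marker text; the exact value is an assumption


async def run_in_session(
    process: asyncio.subprocess.Process, command: str, timeout: float = 120.0
) -> str:
    """Send one command to a long-lived bash process and return its output."""
    assert process.stdin is not None and process.stdout is not None
    # Echo the sentinel after the command so we know where its output ends.
    process.stdin.write(f"{command}; echo '{SENTINEL}'\n".encode())
    await process.stdin.drain()

    async def read_until_sentinel() -> str:
        chunks: list[str] = []
        while True:
            line = (await process.stdout.readline()).decode()
            if not line:  # EOF: the shell has exited
                break
            text = line.rstrip("\n")
            if text.endswith(SENTINEL):
                # Keep output that shared a line with the sentinel
                # (the command printed no trailing newline).
                chunks.append(text[: -len(SENTINEL)])
                break
            chunks.append(line)
        return "".join(chunks)

    return await asyncio.wait_for(read_until_sentinel(), timeout)


async def demo() -> tuple[str, str]:
    process = await asyncio.create_subprocess_exec(
        "/bin/bash",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    first = await run_in_session(process, "X=41")
    # The variable set by the first call is still visible: same shell.
    second = await run_in_session(process, "echo $((X + 1))")
    process.stdin.close()
    await process.wait()
    return first, second


first, second = asyncio.run(demo())
print(repr(first), repr(second))
```

The second call seeing `X` is the behavioral difference from a one-shot executor that Section 8 calls out.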

#### EditTool20250728

- Commands: `view`, `create`, `str_replace`, `insert`
- Path validation (absolute paths required)
- `str_replace`: uniqueness checking before replacement
- `insert`: line number validation
- File history tracking for potential undo
- Returns: `CLIResult(output=...)` with formatted snippets
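
The uniqueness check for `str_replace` is the part most likely to surprise callers, so a small sketch may help. It operates on an in-memory string rather than a file, and the function name, error wording, and local `ToolError` class are assumptions, not the reference implementation.

```python
class ToolError(Exception):
    """Local stand-in for the ToolError base class named in Section 1."""


def str_replace(content: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str only if old_str occurs exactly once."""
    if not old_str:
        raise ToolError("old_str must not be empty")
    occurrences = content.count(old_str)
    if occurrences == 0:
        raise ToolError("No replacement performed: old_str was not found")
    if occurrences > 1:
        # Report the 1-based lines that contain a match to help disambiguate.
        lines = [
            number
            for number, line in enumerate(content.split("\n"), start=1)
            if old_str in line
        ]
        raise ToolError(
            f"No replacement performed: old_str occurs {occurrences} times "
            f"(lines {lines}); include more context to make it unique"
        )
    return content.replace(old_str, new_str)


source = "a = 1\nb = 2\nc = 1\n"
updated = str_replace(source, "b = 2", "b = 3")
print(updated)
```

Refusing ambiguous edits pushes the model to quote enough surrounding context, which is the behavior the native editor tool is described as relying on.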

#### ComputerTool20251124

- VNC desktop interaction (keyboard, mouse, screenshots)
- Actions: `key`, `type`, `mouse_move`, `left_click`, `right_click`, `double_click`, `screenshot`, etc.
- Screenshot returns `ToolResult(base64_image=...)`
- Coordinate scaling support
- Connects to VNC at 172.17.0.1:5900 (Windows VM from code-server)
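
Coordinate scaling matters when the screenshot sent to the model is smaller than the real desktop: coordinates the model returns must be mapped back to screen pixels. A minimal sketch, assuming simple proportional scaling; the `Resolution` type, the function name, and the example resolutions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    width: int
    height: int


def scale_to_screen(
    x: int, y: int, api: Resolution, screen: Resolution
) -> tuple[int, int]:
    """Map a point in the downscaled (API) image to real screen pixels."""
    if not (0 <= x <= api.width and 0 <= y <= api.height):
        raise ValueError(f"Coordinates ({x}, {y}) are outside the {api} image")
    return (
        round(x * screen.width / api.width),
        round(y * screen.height / api.height),
    )


api_size = Resolution(1280, 800)      # size of the image the model sees
screen_size = Resolution(2560, 1600)  # size of the actual desktop
center = scale_to_screen(640, 400, api_size, screen_size)
print(center)
```

The inverse mapping (screen to API) is needed wherever the tool reports a position back to the model, such as a cursor location.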

### 4. API Integration

Update `anthropic_oauth.py._convert_tools_to_anthropic()` to pass through both formats:

**Current**: Only converts `type: "function"` tools
```python
if tool.get("type") == "function":
    # convert to Anthropic format
    ...
```

**New**: Pass through ALL formats
```python
def _convert_tools_to_anthropic(self, tools: list[dict[str, Any]] | None) -> list[dict[str, Any]] | None:
    if not tools:
        return None

    anthropic_tools = []
    for tool in tools:
        if tool.get("type") == "function":
            # Convert function tool format
            func = tool["function"]
            anthropic_tools.append({
                "name": func["name"],
                "description": func.get("description", ""),
                "input_schema": func.get("parameters", {"type": "object", "properties": {}})
            })
        else:
            # Pass through native tool format as-is
            # (bash_20250124, text_editor_20250728, computer_20251124)
            anthropic_tools.append(tool)

    return anthropic_tools if anthropic_tools else None
```

**Distinction**: Based on the `type` field

- `type == "function"` → function tool, needs conversion
- `type == "bash_20250124"` (or other native type) → pass through as-is

### 5. Tool Result Handling

**Current**: Tools return plain strings

**New**: Native tools return `ToolResult` objects
```python
from dataclasses import dataclass


@dataclass(kw_only=True, frozen=True)
class ToolResult:
    output: str | None = None
    error: str | None = None
    base64_image: str | None = None
    system: str | None = None
```

**Agent loop changes** (`loop.py`): Handle both return types
```python
result = await self.tools.execute(tool_name, tool_input)

if isinstance(result, ToolResult):
    # Native tool result - build structured content
    tool_result_content = []
    if result.output:
        tool_result_content.append({"type": "text", "text": result.output})
    if result.error:
        tool_result_content.append({"type": "text", "text": f"Error: {result.error}"})
    if result.base64_image:
        # Image handling (see Section 6)
        pass
    if result.system:
        # System messages for next turn
        pass
else:
    # Legacy string result from function tools
    tool_result_content = [{"type": "text", "text": str(result)}]
```

### 6. Image Handling Flow

**Goal**: Both model and user see screenshots from the computer tool

**Implementation**: Track media across the tool iteration loop

```python
import base64
import time
import uuid
from pathlib import Path

# At start of agent turn
media_paths_for_turn: list[str] = []

# During tool execution
if isinstance(result, ToolResult) and result.base64_image:
    # 1. Save to disk for user
    media_dir = Path.home() / ".nanobot" / "media"
    media_dir.mkdir(parents=True, exist_ok=True)
    # Random suffix keeps two screenshots taken within the same second from colliding
    screenshot_name = f"screenshot_{int(time.time())}_{uuid.uuid4().hex[:8]}.png"
    screenshot_path = media_dir / screenshot_name
    screenshot_path.write_bytes(base64.b64decode(result.base64_image))
    media_paths_for_turn.append(str(screenshot_path))

    # 2. Include in tool_result for model to see
    tool_result_content.append({
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": "image/png",
            "data": result.base64_image
        }
    })

# After final LLM response
await self.bus.publish(OutboundMessage(
    channel=inbound.channel,
    chat_id=inbound.chat_id,
    content=final_response,
    media=media_paths_for_turn  # Include all screenshots
))
```

**Result**:

- Model sees base64 in tool_result → analyzes and reasons about it
- User receives file via Telegram's media sending (`_send_with_media()`)

### 7. Version Management & Beta Flags

**Problem**: Each native tool version requires a specific API beta flag

**Solution**: Add beta flag tracking to native tools

Each native tool class specifies its required beta flag:
```python
class BashTool20250124(BaseAnthropicTool):
    api_type = "bash_20250124"
    name = "bash"
    beta_flag = "computer-use-2025-11-24"  # Required for API
```

In `anthropic_oauth.py._make_request()`, collect beta flags:
```python
# Collect unique beta flags from native tools.
# NOTE: this assumes `tools` holds tool instances; plain dict definitions
# (as returned by get_definitions()) have no `beta_flag` attribute.
beta_flags = set()
for tool in tools or []:
    if hasattr(tool, 'beta_flag') and tool.beta_flag:
        beta_flags.add(tool.beta_flag)

# Add to API request headers
if beta_flags:
    headers["anthropic-beta"] = ",".join(sorted(beta_flags))
```

**Note**: All three tools (bash, text_editor, computer) currently use the same beta flag, `"computer-use-2025-11-24"`, as of the 2025-11-24 tool version.

### 8. Removing Overlapping Tools

Once native tools are implemented and tested, remove overlapping custom tools:

**To Remove**:

- `ExecTool` → replaced by `BashTool20250124` (persistent session, better output)
- `EditFileTool` → replaced by `EditTool20250728` (str_replace command)
- Possibly `ReadFileTool`, `WriteFileTool` → `EditTool20250728` has `view` and `create` commands

**To Keep**:

- `ListDirTool` → no native equivalent
- `WebSearchTool`, `WebFetchTool` → no native equivalent
- `MessageTool`, `SpawnTool`, `WaitForSubagentsTool` → nanobot-specific
- `CronTool` → nanobot-specific

**Migration Notes**:

- `EditTool20250728` only supports absolute paths (enforced in validation)
- `BashTool20250124` maintains session state across calls (different from ExecTool's one-shot)
- Test native tools thoroughly before removing custom ones
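
Because the old file tools may have accepted relative paths, the absolute-path rule is the migration note most likely to produce visible errors. A minimal sketch of such a check follows; the `workspace` parameter, the function name, and the error wording are hypothetical, and `PurePosixPath` is used only so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def require_absolute(path: str, workspace: PurePosixPath) -> PurePosixPath:
    """Reject relative paths, suggesting the workspace-relative absolute form."""
    candidate = PurePosixPath(path)
    if not candidate.is_absolute():
        suggestion = workspace / candidate
        raise ValueError(
            f"The path {candidate} is not an absolute path. "
            f"Maybe you meant {suggestion}?"
        )
    return candidate


workspace = PurePosixPath("/srv/app")  # hypothetical workspace root
accepted = require_absolute("/srv/app/notes.txt", workspace)
print(accepted)
```

Returning a concrete suggestion in the error gives the model something to retry with on the next turn instead of failing repeatedly.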

## Benefits

1. **Trained Behaviors**: The model knows how to use these tools from training, not instruction-following
2. **Better Reliability**: Persistent bash sessions, validated file operations
3. **New Capabilities**: Desktop interaction via the computer tool
4. **Future-Proof**: Easy to add more native tools as Anthropic releases them (just port the implementation)
5. **Unified System**: Function tools and native tools work together in the same request

## Trade-offs

1. **Code Duplication**: Porting reference implementations means maintaining a separate codebase
   - Mitigation: Keep close to the reference implementation for easier updates
2. **Version Management**: Need to track tool versions and beta flags
   - Mitigation: Simple `beta_flag` attribute on tool classes
3. **Testing Complexity**: Need to test both tool systems
   - Mitigation: Gradual rollout, keep custom tools until native tools are proven

## Success Criteria

1. All three native tools execute successfully
2. Model can use bash, edit, and computer tools in the same conversation
3. Screenshots from the computer tool are visible to both model and user
4. No regression in existing functionality (other tools still work)
5. Performance comparable to custom tools