nanobot/ara

nanobot b275af2b4d fix: dereference orchestra-skills submodule, add as plain files

2026-05-05 23:28:24 +02:00

14 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

AI Research Skills Library - A comprehensive open-source library of 90 AI research skills enabling AI agents to autonomously conduct AI research — from idea to paper. Each skill provides expert-level guidance (200-500 lines) with real code examples, troubleshooting guides, and production-ready workflows.

Mission: Enable AI agents to autonomously conduct AI research from hypothesis to experimental verification, covering the full lifecycle: literature survey, ideation, dataset preparation, training pipelines, model deployment, evaluation, and paper writing.

Repository Architecture

Directory Structure (90 Skills Across 23 Categories)

Skills are organized into numbered categories representing the AI research lifecycle:

0-autoresearch-skill/ - Autonomous research orchestration (1 skill: Autoresearch — central layer that manages the full lifecycle and routes to all other skills)
01-model-architecture/ - Model architectures (5 skills: Megatron-Core, LitGPT, Mamba, RWKV, NanoGPT)
02-tokenization/ - Tokenizers (2 skills: HuggingFace Tokenizers, SentencePiece)
03-fine-tuning/ - Fine-tuning frameworks (4 skills: Axolotl, LLaMA-Factory, Unsloth, PEFT)
04-mechanistic-interpretability/ - Interpretability tools (4 skills: TransformerLens, SAELens, NNsight, Pyvene)
05-data-processing/ - Data curation (2 skills: Ray Data, NeMo Curator)
06-post-training/ - RLHF/DPO/GRPO (8 skills: TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge)
07-safety-alignment/ - Safety and guardrails (4 skills: Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard)
08-distributed-training/ - Distributed systems (6 skills: Megatron-Core, DeepSpeed, FSDP, Accelerate, PyTorch Lightning, Ray Train)
09-infrastructure/ - Cloud compute (3 skills: Modal, SkyPilot, Lambda Labs)
10-optimization/ - Optimization techniques (6 skills: Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF)
11-evaluation/ - Benchmarking (3 skills: lm-evaluation-harness, BigCode, NeMo Evaluator)
12-inference-serving/ - Inference engines (4 skills: vLLM, TensorRT-LLM, llama.cpp, SGLang)
13-mlops/ - Experiment tracking (3 skills: Weights & Biases, MLflow, TensorBoard)
14-agents/ - Agent frameworks (4 skills: LangChain, LlamaIndex, CrewAI, AutoGPT)
15-rag/ - Retrieval-augmented generation (5 skills: Chroma, FAISS, Sentence Transformers, Pinecone, Qdrant)
16-prompt-engineering/ - Structured output (4 skills: DSPy, Instructor, Guidance, Outlines)
17-observability/ - LLM observability (2 skills: LangSmith, Phoenix)
18-multimodal/ - Vision and speech (7 skills: CLIP, Whisper, LLaVA, Stable Diffusion, SAM, BLIP-2, AudioCraft)
19-emerging-techniques/ - Advanced methods (6 skills: MoE Training, Model Merging, Long Context, Speculative Decoding, Knowledge Distillation, Model Pruning)
20-ml-paper-writing/ - Paper writing (1 skill: ML Paper Writing with LaTeX templates for NeurIPS, ICML, ICLR, ACL, AAAI, COLM)
21-research-ideation/ - Ideation (2 skills: Research Brainstorming, Creative Thinking)
22-agent-native-research-artifact/ - Agent-Native Research Artifact tooling (3 skills: ARA Compiler, ARA Research Manager, ARA Rigor Reviewer — ingestion, post-task provenance recording, and Seal Level 2 epistemic review)

Skill File Structure

Each skill follows a standardized format:

skill-name/
├── SKILL.md                    # Main guidance (200-500 lines with YAML frontmatter)
├── references/                 # Deep documentation (300KB+ target)
│   ├── README.md              # From official docs
│   ├── api.md                 # API reference
│   ├── tutorials.md           # Step-by-step guides
│   ├── issues.md              # Real GitHub issues & solutions
│   └── releases.md            # Version history
├── scripts/                    # Helper scripts (optional)
├── templates/                  # Code templates (optional)
└── examples/                   # Example implementations (optional)

Skill Quality Standards

YAML Frontmatter Requirements (CRITICAL)

All SKILL.md files MUST include YAML frontmatter with these exact fields:

---
name: skill-name-here              # kebab-case, no quotes, gerund form preferred
description: Third-person description of what AND when to use this skill  # No quotes, max 1024 chars
version: 1.0.0                     # Semantic versioning
author: Orchestra Research         # Standard author
license: MIT                       # Standard license
tags: [Tag One, Tag Two]          # Title Case (except UPPERCASE acronyms like GRPO, TRL, RLHF)
dependencies: [pkg>=1.0.0]         # Optional, with version constraints
---

Critical Rules:

name: Use gerund form (e.g., serving-llms, processing-data, grpo-rl-training)
description: Third person ("Provides guidance for..."), include WHAT it does AND WHEN to use it
tags: Title Case for regular words, UPPERCASE for acronyms (GRPO, TRL, RLHF, DPO, PPO)
No quotes around any field values (except in arrays)
Dependencies should include version constraints: transformers>=4.47.0
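
The rules above can be checked mechanically. The following is a minimal sketch, not the repository's actual validator: `parse_frontmatter` and `check_frontmatter` are hypothetical helpers, and the parser is deliberately simplistic (top-level `key: value` lines only, standard library only), so it is no substitute for a real YAML loader.

```python
import re

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
# Fields listed as required above; `dependencies` is optional.
REQUIRED = ("name", "description", "version", "author", "license", "tags")


def parse_frontmatter(text):
    """Return top-level `key: value` pairs from the leading --- block.

    Assumption: flat frontmatter only. Nested YAML and inline comments
    are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_frontmatter(text):
    """Return a list of problems; an empty list means the checks passed."""
    fields = parse_frontmatter(text)
    problems = [f"missing field: {key}" for key in REQUIRED if key not in fields]
    name = fields.get("name", "")
    if name and not KEBAB.match(name):
        problems.append("name is not kebab-case")
    if len(fields.get("description", "")) > 1024:
        problems.append("description exceeds 1024 characters")
    version = fields.get("version", "")
    if version and not SEMVER.match(version):
        problems.append("version is not MAJOR.MINOR.PATCH")
    for key in ("name", "description"):
        if fields.get(key, "")[:1] in ("'", '"'):
            problems.append(f"{key} should not be quoted")
    return problems
```

Calling `check_frontmatter(open("skill-name/SKILL.md").read())` returns the list of violations. The gerund-form and third-person rules are left to human review because they are hard to test reliably with a regular expression.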

Content Quality Standards

Core Requirements (based on Anthropic official best practices):

✅ SKILL.md body: 200-500 lines (under 500 lines is critical for performance)
✅ Progressive disclosure: SKILL.md as overview, details in separate reference files
✅ Workflows with copy-paste checklists for complex tasks
✅ "When to use vs alternatives" guidance section
✅ Common issues section with solutions
✅ Concise content: assume Claude is smart, no over-explaining basics
✅ Code examples with language tags on code blocks (python, bash, etc.)
✅ References ONE level deep from SKILL.md (no nested references)

Gold Standard (aim for this - see 06-post-training/grpo-rl-training/):

✅ 2-3 complete workflows with step-by-step checklists
✅ Reference files for advanced topics (one level deep)
✅ Feedback loops (validate → fix → repeat) for quality-critical operations
✅ Consistent terminology throughout
✅ Concrete input/output examples
✅ Real GitHub issues with solutions (when available)

NOT Acceptable:

❌ SKILL.md over 500 lines (split into reference files instead)
❌ Over-explaining basics that Claude already knows
❌ First-person descriptions ("I can help you...")
❌ Vague skill names ("helper", "utils", "tools")
❌ Nested references (SKILL.md → ref1.md → ref2.md)
❌ Missing workflows with checklists for complex tasks

Development Workflow

Adding a New Skill

Choose skill from roadmap (see CONTRIBUTING.md or README.md)
Create directory structure in the appropriate numbered category (see Directory Structure above)
Write SKILL.md with YAML frontmatter following standards above
Add reference documentation (target 300KB+ from official sources)
Validate quality:
- Check SKILL.md has YAML frontmatter
- Verify SKILL.md is 200-500 lines
- Ensure code blocks have language tags
- Confirm references are one level deep from SKILL.md
- Check documentation size: du -sh skill-name/references/
Test the skill with real use cases before submitting

Improving Existing Skills

When updating skills:

Maintain YAML frontmatter format and fields
Keep SKILL.md under 500 lines - split into reference files if needed
Add workflows with checklists for complex operations
Update version number in YAML frontmatter
Test changes with representative tasks

Quality Validation Commands

# Check YAML frontmatter exists
head -20 skill-name/SKILL.md

# Verify SKILL.md line count (target 200-500 lines)
wc -l skill-name/SKILL.md

# Check documentation size (target 300KB+)
du -sh skill-name/references/

# Verify code blocks have language tags
grep -A 1 '```' skill-name/SKILL.md | head -20

# Validate YAML frontmatter syntax
python -c "import yaml; yaml.safe_load(open('skill-name/SKILL.md').read().split('---')[1])"
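
The `grep` check above only prints fence lines for manual inspection. As a rough alternative, this sketch reports which opening fences lack a language tag and whether the file fits the 200-500 line target. `untagged_fences` and `within_line_budget` are hypothetical names, and the fence handling assumes plain triple-backtick fences with no nesting.

```python
def untagged_fences(markdown):
    """Return 1-based line numbers of opening fences with no language tag."""
    missing = []
    inside = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("```"):
            continue
        if inside:
            inside = False  # this fence closes the current block
        else:
            inside = True  # this fence opens a new block
            if not stripped.lstrip("`").strip():
                missing.append(number)
    return missing


def within_line_budget(markdown, low=200, high=500):
    """True when the line count falls inside the target range."""
    return low <= len(markdown.splitlines()) <= high
```

Like `wc -l`, `within_line_budget` counts the whole file, including the frontmatter block.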

Key Files

README.md - Project overview, all 90 skills listed with descriptions and stats
CONTRIBUTING.md - Complete contribution guidelines and quality standards
SKILL_TEMPLATE.md - Copy-paste scaffold for new skills
ROADMAP.md - Development roadmap (90 skills achieved)
anthropic_official_docs/ - Anthropic's official best practices for skills

Git Workflow

Standard Git workflow:

# Create feature branch
git checkout -b add-skill-name

# Add and commit changes
git add category/skill-name/
git commit -m "Add [Skill Name] skill

- X lines of documentation
- Y GitHub issues with solutions
- API reference and examples included"

# Push to fork and create PR
git push origin add-skill-name

Automation: Orchestra Skill Marketplace Sync

How Auto-Sync Works

When skills are committed to the main branch, GitHub Actions automatically syncs them to the Orchestra skill marketplace:

GitHub Actions detects changed skill folders on push to main
For each changed skill:
- Extracts metadata from SKILL.md frontmatter (name, author, etc.)
- Creates ZIP file containing entire skill directory (SKILL.md, references/, scripts/, etc.)
- Uploads to Orchestra API endpoint
Orchestra stores ZIP in Supabase Storage and creates database record
Skill appears in marketplace at https://orchestra.com/research-skills
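
The change-detection step can be pictured as mapping changed file paths to skill directories. This is a sketch under stated assumptions, not the logic in sync-skills.yml: it assumes each skill lives at `category/skill-name/` under a numbered category, and `changed_skill_dirs` is a hypothetical helper. The path list could come from `git diff --name-only HEAD~1 HEAD`.

```python
import re
from pathlib import PurePosixPath

# Assumption: category directories start with digits and a hyphen,
# for example 06-post-training.
CATEGORY = re.compile(r"^\d+-")


def changed_skill_dirs(changed_paths):
    """Map changed file paths to a sorted list of category/skill directories."""
    skills = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Require category, skill, and at least one file beneath the skill.
        if len(parts) >= 3 and CATEGORY.match(parts[0]):
            skills.add(f"{parts[0]}/{parts[1]}")
    return sorted(skills)
```

Files outside numbered categories, such as README.md or workflow files, are ignored, which matches the statement that only changed skill directories are synced.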

Workflow File Location

File: .github/workflows/sync-skills.yml
Triggers: Push to main branch, manual workflow dispatch
What syncs: Only skill directories that changed in the commit

Author Detection (Orchestra vs Community)

The workflow reads the author: field from SKILL.md frontmatter to determine the badge:

Official Orchestra Skill:

---
author: Orchestra Research  # Contains "Orchestra"
---

Result: Source = orchestra (Official badge)
Storage: research-skills/orchestra/skill-name.zip

Community Skill:

---
author: Jane Doe  # Does NOT contain "Orchestra"
---

Result: Source = community (Community badge)
Storage: research-skills/community/skill-name.zip
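
The badge rule reduces to a substring test on the author field. The sketch below assumes the match is case-sensitive, which the workflow description does not confirm, and `classify_skill` is a hypothetical name rather than a function in the workflow.

```python
def classify_skill(author, skill_name):
    """Return (source, storage_path) for a skill based on its author field."""
    # Assumption: a case-sensitive substring match on "Orchestra".
    source = "orchestra" if "Orchestra" in author else "community"
    return source, f"research-skills/{source}/{skill_name}.zip"
```

For example, `classify_skill("Jane Doe", "skill-name")` yields the community source and the `research-skills/community/skill-name.zip` path shown above.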

What Gets Synced

The workflow zips ALL contents of the skill directory:

✅ SKILL.md
✅ references/ (all subdirectories)
✅ scripts/ (if exists)
✅ assets/ (if exists)
✅ examples/ (if exists)
✅ templates/ (if exists)
❌ Hidden files (.gitkeep, .DS_Store)
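
A packaging step with these inclusion rules might look like the following sketch. It uses only the standard library. `zip_skill` is a hypothetical helper, and skipping hidden directories as well as hidden files is an assumption beyond what the list states.

```python
import zipfile
from pathlib import Path


def zip_skill(skill_dir, zip_path):
    """Zip every non-hidden file under skill_dir; return the archived names."""
    skill_dir = Path(skill_dir)
    included = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(skill_dir.rglob("*")):
            if not path.is_file():
                continue
            relative = path.relative_to(skill_dir)
            # Skip dotfiles and anything inside a dot-directory.
            if any(part.startswith(".") for part in relative.parts):
                continue
            archive.write(path, relative.as_posix())
            included.append(relative.as_posix())
    return included
```

Write the archive outside the skill directory. Otherwise the directory walk can pick up the partially written ZIP itself.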

Testing the Sync

Manual trigger:

Go to GitHub Actions tab
Select "Sync Skills to Orchestra" workflow
Click "Run workflow"

Test with commit:

# Make a small change to any skill
printf '\n<!-- Updated %s -->\n' "$(date)" >> 01-model-architecture/litgpt/SKILL.md

# Commit and push to main
git add .
git commit -m "test: trigger auto-sync"
git push origin main

Verify sync worked:

Check GitHub Actions tab for workflow run status
Check Orchestra marketplace for updated skill
Check Supabase Storage for ZIP file

Important Notes

GitHub Secrets required: ORCHESTRA_API_URL, ORCHESTRA_SYNC_API_KEY (already configured)
Only syncs changed skills: Workflow detects which skill directories changed in commit
SKILL.md required: Skills without SKILL.md are skipped with warning
See detailed setup: dev_data/GITHUB_SKILLS_SYNC_SETUP.md

npm Package Publishing

How It Works

The publish-npm.yml workflow auto-publishes to npm when the version in packages/ai-research-skills/package.json changes on main.

Auth: Uses OIDC trusted publishing (no npm tokens). Configured on npmjs.com under the package's Trusted Publishers settings.
Provenance: --provenance flag signs packages with Sigstore for supply chain security.
Workflow: .github/workflows/publish-npm.yml

Bumping Versions

Always use npm version (not manual edits) to keep package-lock.json in sync:

cd packages/ai-research-skills
npm version patch   # 1.3.6 → 1.3.7
npm version minor   # 1.3.7 → 1.4.0
npm version major   # 1.4.0 → 2.0.0

Use --no-git-tag-version if you want to commit manually.
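
The arithmetic behind the three bump levels is sketched below for illustration only. `bump_version` is a hypothetical helper that handles plain MAJOR.MINOR.PATCH strings and ignores prerelease and build suffixes. It does not replace `npm version`, which also updates the lockfile.

```python
def bump_version(version, part):
    """Return the next version string for a patch, minor, or major bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # minor and patch reset to zero
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # patch resets to zero
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")
```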

Common Issues

npm ci fails in CI: package-lock.json is out of sync. Run npm install locally and commit the lockfile.
OIDC auth fails: The trusted publisher config on npmjs.com must match the repo exactly (case-sensitive: Orchestra-Research/AI-Research-SKILLs, workflow: publish-npm.yml).
NODE_AUTH_TOKEN blocks OIDC: actions/setup-node with registry-url auto-sets this token. The workflow unsets it before publish so OIDC takes over.
Version unchanged skip: The workflow compares HEAD vs HEAD~1. If only the lockfile changed (not package.json version), publish is skipped. Bump the version to trigger.
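
The version-unchanged skip amounts to comparing the `version` field of package.json at two commits. This is a sketch of that comparison, not the code in publish-npm.yml, and `should_publish` is a hypothetical name. The two inputs could be obtained with `git show HEAD~1:packages/ai-research-skills/package.json` and the same command for `HEAD`.

```python
import json


def should_publish(previous_package_json, current_package_json):
    """True when the version field differs between the two manifests."""
    previous = json.loads(previous_package_json).get("version")
    current = json.loads(current_package_json).get("version")
    return current is not None and current != previous
```

A commit that touches only the lockfile leaves both manifests identical, so the function returns False and nothing is published.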

Important Conventions

Naming Conventions

Skill names: Use gerund form (verb + -ing) in kebab-case: processing-pdfs, serving-llms, grpo-rl-training
Tags: Title Case for words, UPPERCASE for acronyms (GRPO, TRL, RLHF, DPO, PPO, FSDP, MoE)
Descriptions: Third person, include what AND when to use

Code Examples

Always tag fenced code blocks with their language (python, bash, etc.):

# Good - opening fence carries a language tag (python)
from transformers import AutoModel

NOT:

# Bad - opening fence has no language tag
from transformers import AutoModel

Progressive Disclosure Pattern

SKILL.md should link directly to reference files (one level deep):

## Advanced Features

**API Reference**: See [references/api.md](references/api.md)
**Troubleshooting**: See [references/issues.md](references/issues.md)
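
The one-level-deep rule can be approximated by following links out of SKILL.md and flagging any reference file that links on to further markdown files. This is a rough sketch: `local_markdown_links` and `nested_references` are hypothetical helpers, links with `#anchor` suffixes are not recognised, and the link targets are not resolved relative to each file.

```python
import re

# Inline markdown links of the form [text](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def local_markdown_links(markdown):
    """Return link targets that point at local .md files."""
    return [
        target
        for target in LINK.findall(markdown)
        if target.endswith(".md") and "://" not in target
    ]


def nested_references(files):
    """Return (reference_file, nested_target) pairs that break the rule.

    `files` maps relative paths (including "SKILL.md") to markdown text.
    """
    problems = []
    for target in local_markdown_links(files.get("SKILL.md", "")):
        for inner in local_markdown_links(files.get(target, "")):
            problems.append((target, inner))
    return problems
```

An empty result means every file linked from SKILL.md is a leaf, which is the SKILL.md → reference pattern described above.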

Philosophy

Quality over Quantity: This library maintains high standards by:

Requiring 200-500 line SKILL.md files (focused, actionable guidance)
Including 300KB+ documentation from official sources
Providing real GitHub issues with solutions
Following Anthropic's official best practices for skills
Testing skills with real use cases before inclusion

Each skill represents expert-level knowledge distilled into a format optimized for AI agent consumption.
