Files

14 KiB

Contributing to Claude AI Research Skills

Thank you for your interest in contributing! This guide will help you add new skills to the library.


🎯 What We're Building

Vision: The most comprehensive open-source library of AI research skills for Claude Code.

Target: 86 comprehensive skills covering the entire AI research lifecycle — from ideation to paper writing. Achieved.

Current Progress: 86/86 skills across 22 categories (100%)

Philosophy: Quality > Quantity. We deleted 9 low-quality skills to maintain high standards.


🤝 How to Contribute

Ways to Contribute

  1. Add a new skill - Most valuable contribution
  2. Improve existing skills - Update docs, add examples, fix errors
  3. Report issues - Outdated information, broken links, missing content
  4. Share feedback - What skills do you need? What's missing?

📝 Adding a New Skill

Step 1: Choose a Skill

Step 2: Fork and Clone

# Fork the repository on GitHub first
git clone https://github.com/YOUR_USERNAME/AI-research-SKILLs.git
cd claude-ai-research-skills

# Create a feature branch
git checkout -b add-vllm-skill

Step 3: Use Skill Seeker MCP

Option A: Documentation Scraping

# Create config file
python3 cli/doc_scraper.py --interactive
# Or copy and modify an existing config
cp configs/react.json configs/vllm.json

# Scrape and build
python3 cli/doc_scraper.py --config configs/vllm.json

Option B: GitHub Scraping

# Scrape from GitHub repository
export GITHUB_TOKEN=$(gh auth token)
python3 cli/github_scraper.py --repo vllm-project/vllm --name vllm --description "High-performance LLM inference with PagedAttention"

Option C: Unified Scraping (recommended for comprehensive skills)

# Combine documentation + GitHub + PDF
python3 cli/unified_scraper.py --config configs/vllm_unified.json

Step 4: Move to Correct Directory

# Determine the category (see directory structure below)
mv output/vllm/ 12-inference-serving/vllm/

# Move metadata
mv output/vllm_data/ .metadata/vllm_data/

Step 5: Validate Quality

Based on Anthropic Official Best Practices

Core Requirements (or skill will be rejected):

  • YAML frontmatter with name (gerund form, e.g., "serving-llms") and description (third person, includes what AND when)
  • SKILL.md body: 200-300 lines (under 500 lines maximum)
  • Progressive disclosure: SKILL.md as overview, details in separate reference files
  • Workflows with copy-paste checklists for complex tasks
  • When to use vs alternatives guidance
  • Common issues section with solutions
  • Concise content: assume Claude is smart, no over-explaining basics
  • Code examples with language detection (python, bash, etc.)

Gold Standard (aim for this):

  • SKILL.md: 200-300 lines of focused, actionable guidance
  • 2-3 complete workflows with step-by-step checklists
  • Reference files for advanced topics (one level deep from SKILL.md)
  • Feedback loops (validate → fix → repeat) for quality-critical operations
  • Consistent terminology throughout
  • Concrete examples (input/output pairs where helpful)
  • Clear, concise troubleshooting guide

NOT Acceptable:

  • SKILL.md over 500 lines (split into reference files instead)
  • Over-explaining basics that Claude already knows
  • First-person descriptions ("I can help you...")
  • Vague skill names ("helper", "utils", "tools")
  • Nested references (SKILL.md → ref1.md → ref2.md)
  • Generic templates that just link to README/CHANGELOG
  • Missing workflows with checklists for complex tasks
  • Time-sensitive information (use "old patterns" section instead)

Quick Quality Check:

# Check SKILL.md has real code examples
cat 12-inference-serving/vllm/SKILL.md

# Check reference files exist
ls -lh 12-inference-serving/vllm/references/

# Verify total documentation size (should be 300KB+)
du -sh 12-inference-serving/vllm/references/

YAML Frontmatter Format Standards

All SKILL.md files must include properly formatted YAML frontmatter with the following fields:

---
name: skill-name-here
description: Clear description of when to use this skill
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Tag One, Tag Two, Tag Three]
dependencies: [package1>=1.0.0, package2>=2.0.0]
---

Field Requirements:

Field Required Format Notes
name Yes kebab-case No quotes, lowercase with hyphens
description Yes Plain text No quotes, concise explanation
version Yes Semantic version Format: MAJOR.MINOR.PATCH
author Yes Plain text Use "Orchestra Research"
license Yes License identifier Typically MIT
tags Yes Array Capitalized words, no quotes
dependencies ⚠️ Optional Array Include version constraints

Tag Guidelines:

  • Use Title Case for all tags (capitalize first letter of each word)
  • Keep acronyms UPPERCASE (e.g., GRPO, TRL, RLHF, DPO)
  • Use descriptive, searchable terms
  • Include 5-10 relevant tags
  • No quotes around tags

Example Tags:

tags: [Reinforcement Learning, GRPO, TRL, Post-Training, RLHF, Reward Modeling]

Dependencies Guidelines:

  • Only include direct dependencies needed to use the skill
  • Include minimum version constraints using >=
  • No quotes around package names
  • List core packages first, optional packages last

Example Dependencies:

dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]

Complete Example:

---
name: grpo-rl-training
description: Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Reinforcement Learning, GRPO, TRL, Post-Training, RLHF, Reward Modeling, Reasoning, DPO, PPO, Structured Output]
dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]
---

Validation Checklist:

  • YAML frontmatter is present at the very beginning of SKILL.md
  • All required fields are included
  • No quotes around field values (except in arrays)
  • Tags use Title Case (capitalized words)
  • Dependencies include version constraints where appropriate
  • YAML is valid (test with: python -c "import yaml; yaml.safe_load(open('SKILL.md').read().split('---')[1])")

Step 6: Update Marketplace

Add your skill to .claude-plugin/marketplace.json so it appears in the Claude Code plugin marketplace.

Add a new entry to the plugins array:

{
  "name": "your-skill-name",
  "source": "./XX-category/skill-folder",
  "description": "Description from your SKILL.md frontmatter (what it does AND when to use it)"
}

Example:

{
  "name": "serving-llms-vllm",
  "source": "./12-inference-serving/vllm",
  "description": "Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs or optimizing inference latency/throughput."
}

Validation:

# Verify JSON is valid after editing
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"

Important: Place your entry in the correct position (skills are ordered by category number).

Step 7: Submit Pull Request

# Add your changes
git add 12-inference-serving/vllm/
git add .metadata/vllm_data/
git add .claude-plugin/marketplace.json

# Commit with descriptive message
git commit -m "Add vLLM inference serving skill

- 215 pages of documentation
- 12 GitHub issues with solutions
- API reference and examples
- Performance benchmarks included"

# Push to your fork
git push origin add-vllm-skill

Then create a Pull Request on GitHub with:

  • Title: "Add [Skill Name] skill"
  • Description:
    • What the skill covers
    • Source (docs, GitHub, or both)
    • Documentation size
    • Key features/examples included

📂 Directory Structure

Place skills in the correct category:

claude-ai-research-skills/
├── 01-model-architecture/      # Model architectures (GPT, LLaMA, etc.)
├── 02-tokenization/            # Tokenizers (HuggingFace, SentencePiece)
├── 03-fine-tuning/             # Fine-tuning frameworks (Axolotl, TRL)
├── 04-peft/                    # Parameter-efficient methods (LoRA, QLoRA)
├── 05-data-processing/         # Data curation and processing
├── 06-post-training/           # RLHF, DPO, PPO
├── 07-safety-alignment/        # Guardrails, safety, content moderation
├── 08-distributed-training/    # DeepSpeed, FSDP, distributed systems
├── 09-infrastructure/          # PyTorch Lightning, Ray, Composer
├── 10-optimization/            # Flash Attention, bitsandbytes, kernels
├── 11-evaluation/              # Benchmarks, evaluation frameworks
├── 12-inference-serving/       # vLLM, TensorRT-LLM, llama.cpp
├── 13-mlops/                   # Weights & Biases, MLflow, TensorBoard
├── 14-agents/                  # LangChain, LlamaIndex, CrewAI
├── 15-rag/                     # RAG pipelines, vector databases
├── 16-prompt-engineering/      # DSPy, Instructor, structured output
├── 17-observability/           # LangSmith, Phoenix, monitoring
├── 18-multimodal/              # LLaVA, Whisper, Stable Diffusion
└── 19-emerging-techniques/     # MoE, model merging, long context

📋 Skill Structure Template

Use SKILL_TEMPLATE.md as a starting point. Each skill should contain:

skill-name/
├── SKILL.md                    # Quick reference (50-150 lines)
│   ├── Metadata (name, description, version)
│   ├── When to use this skill
│   ├── Quick start examples
│   ├── Common patterns
│   └── Links to references
│
├── references/                 # Deep documentation (300KB+)
│   ├── README.md              # From GitHub/official docs
│   ├── api.md                 # API reference
│   ├── tutorials.md           # Step-by-step guides
│   ├── issues.md              # Real GitHub issues (if applicable)
│   ├── releases.md            # Version history (if applicable)
│   └── file_structure.md      # Codebase navigation (if applicable)
│
├── scripts/                    # Helper scripts (optional)
└── assets/                     # Templates & examples (optional)

🔍 Quality Standards

Code Examples

All code examples MUST have language detection:

Good:

```python
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```

Bad:

```
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```

Documentation Size

  • Minimum: 100KB total in references/
  • Target: 300KB+ total
  • Gold Standard: 500KB+ with issues, releases, examples

Real-World Content

Prefer skills with:

  • Real GitHub issues and solutions
  • Release notes and breaking changes
  • Community discussions
  • Performance benchmarks
  • Troubleshooting guides

Always include:

  • Official documentation link
  • GitHub repository link
  • License information
  • Version/release information

🧪 Testing

Before submitting, verify:

# 1. SKILL.md is well-formatted
cat your-skill/SKILL.md

# 2. All reference files exist
ls -R your-skill/references/

# 3. Documentation size is adequate (300KB+ target)
du -sh your-skill/references/

# 4. Code blocks have language tags
grep -A 1 '```' your-skill/SKILL.md | head -20

# 5. No broken links (manual check)
# Open SKILL.md and verify all [links](urls) work

# 6. Marketplace entry added and valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"

🎓 Examples of High-Quality Skills

Gold Standard (emulate this):

  1. 06-post-training/grpo-rl-training/ (569 lines)
    • Complete implementation workflow
    • 10+ code examples with explanations
    • Troubleshooting guide
    • Common pitfalls and solutions
    • Performance tips
    • This is the quality bar

Good Examples: 2. 03-fine-tuning/axolotl/ (151 lines)

  • Real configuration examples
  • When to use guidance
  • Comprehensive but could add more workflows
  1. 08-distributed-training/deepspeed/ (132 lines)
    • ZeRO optimization patterns
    • Configuration examples
    • Good foundation, needs more troubleshooting

🚫 What NOT to Contribute

  • Proprietary/closed-source tools
  • Deprecated libraries (unless historically important)
  • Duplicate skills (check existing skills first)
  • Incomplete skills (<50 lines SKILL.md, <100KB refs)
  • Skills without code examples

🎖️ Recognition

All contributors will be:

  • Listed in CONTRIBUTORS.md
  • Mentioned in release notes
  • Featured on project homepage (when launched)
  • Attributed in SKILL.md metadata

Top contributors (5+ skills) receive special recognition and maintainer status.


📞 Getting Help


📅 Review Process

  1. Automated Checks (when implemented):

    • File structure validation
    • Code block language detection
    • Documentation size check
    • Marketplace.json validation
  2. Manual Review (by maintainers):

    • Content quality and accuracy
    • Code example validity
    • Proper categorization
    • License compliance
  3. Feedback Loop:

    • Reviews within 48-72 hours
    • Constructive feedback provided
    • Iterate until approved
  4. Merge:

    • Merged to main branch
    • Added to release notes
    • Contributor recognized

🙏 Thank You!

Your contributions help the entire AI research community. Every skill added makes Claude Code more powerful for researchers, engineers, and students worldwide.

Let's build something amazing together! 🚀