Contributing to Claude AI Research Skills
Thank you for your interest in contributing! This guide will help you add new skills to the library.
🎯 What We're Building
Vision: The most comprehensive open-source library of AI research skills for Claude Code.
Target: 86 comprehensive skills covering the entire AI research lifecycle — from ideation to paper writing. ✅ Achieved.
Current Progress: 86/86 skills across 22 categories (100%)
Philosophy: Quality > Quantity. We deleted 9 low-quality skills to maintain high standards.
🤝 How to Contribute
Ways to Contribute
- Add a new skill - Most valuable contribution
- Improve existing skills - Update docs, add examples, fix errors
- Report issues - Outdated information, broken links, missing content
- Share feedback - What skills do you need? What's missing?
📝 Adding a New Skill
Step 1: Choose a Skill
Step 2: Fork and Clone
# Fork the repository on GitHub first
git clone https://github.com/YOUR_USERNAME/AI-research-SKILLs.git
cd AI-research-SKILLs
# Create a feature branch
git checkout -b add-vllm-skill
Step 3: Use Skill Seeker MCP
Option A: Documentation Scraping
# Create config file
python3 cli/doc_scraper.py --interactive
# Or copy and modify an existing config
cp configs/react.json configs/vllm.json
# Scrape and build
python3 cli/doc_scraper.py --config configs/vllm.json
Option B: GitHub Scraping
# Scrape from GitHub repository
export GITHUB_TOKEN=$(gh auth token)
python3 cli/github_scraper.py --repo vllm-project/vllm --name vllm --description "High-performance LLM inference with PagedAttention"
Option C: Unified Scraping (recommended for comprehensive skills)
# Combine documentation + GitHub + PDF
python3 cli/unified_scraper.py --config configs/vllm_unified.json
Step 4: Move to Correct Directory
# Determine the category (see directory structure below)
mv output/vllm/ 12-inference-serving/vllm/
# Move metadata
mv output/vllm_data/ .metadata/vllm_data/
Step 5: Validate Quality
Based on Anthropic Official Best Practices
Core Requirements (or skill will be rejected):
- ✅ YAML frontmatter with `name` (gerund form, e.g., "serving-llms") and `description` (third person, includes what AND when)
- ✅ SKILL.md body: 200-300 lines (under 500 lines maximum)
- ✅ Progressive disclosure: SKILL.md as overview, details in separate reference files
- ✅ Workflows with copy-paste checklists for complex tasks
- ✅ When to use vs alternatives guidance
- ✅ Common issues section with solutions
- ✅ Concise content: assume Claude is smart, no over-explaining basics
- ✅ Code examples with language detection (`python`, `bash`, etc.)
Gold Standard (aim for this):
- ✅ SKILL.md: 200-300 lines of focused, actionable guidance
- ✅ 2-3 complete workflows with step-by-step checklists
- ✅ Reference files for advanced topics (one level deep from SKILL.md)
- ✅ Feedback loops (validate → fix → repeat) for quality-critical operations
- ✅ Consistent terminology throughout
- ✅ Concrete examples (input/output pairs where helpful)
- ✅ Clear, concise troubleshooting guide
NOT Acceptable:
- ❌ SKILL.md over 500 lines (split into reference files instead)
- ❌ Over-explaining basics that Claude already knows
- ❌ First-person descriptions ("I can help you...")
- ❌ Vague skill names ("helper", "utils", "tools")
- ❌ Nested references (SKILL.md → ref1.md → ref2.md)
- ❌ Generic templates that just link to README/CHANGELOG
- ❌ Missing workflows with checklists for complex tasks
- ❌ Time-sensitive information (use "old patterns" section instead)
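Some of the rejection criteria above can be checked mechanically before review. The following is a minimal sketch, not a repository tool: the function name `lint_skill`, the vague-name set, and the first-person pattern are illustrative assumptions, and the pattern is a rough heuristic that can produce false positives (for example on "I/O").

```python
import re

# Hypothetical helper; none of these names exist in the repository.
VAGUE_NAMES = {"helper", "utils", "tools"}  # the examples given in the list above
FIRST_PERSON = re.compile(r"\b(I|we|my|our)\b", re.IGNORECASE)  # assumed heuristic
MAX_LINES = 500  # hard limit for SKILL.md stated above


def lint_skill(name: str, description: str, skill_md: str) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    line_count = len(skill_md.splitlines())
    if line_count > MAX_LINES:
        problems.append(f"SKILL.md has {line_count} lines (limit {MAX_LINES})")
    if name.lower() in VAGUE_NAMES:
        problems.append(f"vague skill name: {name!r}")
    if FIRST_PERSON.search(description):
        problems.append("description uses first person")
    return problems
```

The remaining criteria (over-explaining, generic templates, missing workflows) need a human reviewer and are not covered by this sketch.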
Quick Quality Check:
# Check SKILL.md has real code examples
cat 12-inference-serving/vllm/SKILL.md
# Check reference files exist
ls -lh 12-inference-serving/vllm/references/
# Verify total documentation size (should be 300KB+)
du -sh 12-inference-serving/vllm/references/
YAML Frontmatter Format Standards
All SKILL.md files must include properly formatted YAML frontmatter with the following fields:
---
name: skill-name-here
description: Clear description of when to use this skill
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Tag One, Tag Two, Tag Three]
dependencies: [package1>=1.0.0, package2>=2.0.0]
---
Field Requirements:
| Field | Required | Format | Notes |
|---|---|---|---|
| `name` | ✅ Yes | kebab-case | No quotes, lowercase with hyphens |
| `description` | ✅ Yes | Plain text | No quotes, concise explanation |
| `version` | ✅ Yes | Semantic version | Format: MAJOR.MINOR.PATCH |
| `author` | ✅ Yes | Plain text | Use "Orchestra Research" |
| `license` | ✅ Yes | License identifier | Typically MIT |
| `tags` | ✅ Yes | Array | Capitalized words, no quotes |
| `dependencies` | ⚠️ Optional | Array | Include version constraints |
Tag Guidelines:
- Use Title Case for all tags (capitalize first letter of each word)
- Keep acronyms UPPERCASE (e.g., `GRPO`, `TRL`, `RLHF`, `DPO`)
- Use descriptive, searchable terms
- Include 5-10 relevant tags
- No quotes around tags
Example Tags:
tags: [Reinforcement Learning, GRPO, TRL, Post-Training, RLHF, Reward Modeling]
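The tag guidelines above can be approximated in a few lines. This is a sketch under assumptions: `check_tags` is a hypothetical name, and "Title Case" is interpreted loosely as "no word starts with a lowercase letter", which lets uppercase acronyms through without a separate rule.

```python
import re


def check_tags(tags: list[str]) -> list[str]:
    """Return problems with a tag list; an empty list means it looks fine."""
    problems = []
    if not 5 <= len(tags) <= 10:
        problems.append(f"expected 5-10 tags, found {len(tags)}")
    for tag in tags:
        # Split on spaces and hyphens so "Post-Training" is checked word by word.
        words = [w for w in re.split(r"[ -]+", tag) if w]
        if any(w[0].islower() for w in words):
            problems.append(f"tag not in Title Case: {tag!r}")
    return problems
```

Called on the example tags above, `check_tags` returns an empty list.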
Dependencies Guidelines:
- Only include direct dependencies needed to use the skill
- Include minimum version constraints using `>=`
- No quotes around package names
- List core packages first, optional packages last
Example Dependencies:
dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]
Complete Example:
---
name: grpo-rl-training
description: Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training
version: 1.0.0
author: Orchestra Research
license: MIT
tags: [Reinforcement Learning, GRPO, TRL, Post-Training, RLHF, Reward Modeling, Reasoning, DPO, PPO, Structured Output]
dependencies: [transformers>=4.47.0, trl>=0.14.0, datasets>=3.2.0, peft>=0.14.0, torch]
---
Validation Checklist:
- YAML frontmatter is present at the very beginning of SKILL.md
- All required fields are included
- No quotes around field values (except in arrays)
- Tags use Title Case (capitalized words)
- Dependencies include version constraints where appropriate
- YAML is valid (test with: `python -c "import yaml; yaml.safe_load(open('SKILL.md').read().split('---')[1])"`)
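Parts of the checklist above can also be automated with the standard library alone. The sketch below is illustrative: `read_frontmatter` and `check_frontmatter` are hypothetical names, and the parser only understands the flat `key: value` layout shown in the examples, not arbitrary YAML. It complements a real YAML parse; it does not replace one.

```python
import re

REQUIRED_FIELDS = ("name", "description", "version", "author", "license", "tags")
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # MAJOR.MINOR.PATCH only


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Extract top-level `key: value` pairs from the leading --- block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("frontmatter must start on the first line")
    # Find the closing delimiter, skipping the opening one on line 0.
    end = next((i for i, l in enumerate(lines[1:], 1) if l.strip() == "---"), None)
    if end is None:
        raise ValueError("frontmatter is not closed with ---")
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_frontmatter(skill_md: str) -> list[str]:
    """Return problems with the frontmatter; an empty list means none found."""
    fields = read_frontmatter(skill_md)
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if not fields.get(f)]
    if "name" in fields and not KEBAB_CASE.match(fields["name"]):
        problems.append("name is not kebab-case")
    if "version" in fields and not SEMVER.match(fields["version"]):
        problems.append("version is not MAJOR.MINOR.PATCH")
    return problems
```

A typical call would be `check_frontmatter(open("SKILL.md", encoding="utf-8").read())`.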
Step 6: Update Marketplace
Add your skill to .claude-plugin/marketplace.json so it appears in the Claude Code plugin marketplace.
Add a new entry to the plugins array:
{
"name": "your-skill-name",
"source": "./XX-category/skill-folder",
"description": "Description from your SKILL.md frontmatter (what it does AND when to use it)"
}
Example:
{
"name": "serving-llms-vllm",
"source": "./12-inference-serving/vllm",
"description": "Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs or optimizing inference latency/throughput."
}
Validation:
# Verify JSON is valid after editing
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"
Important: Place your entry in the correct position (skills are ordered by category number).
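If you prefer not to place the entry by hand, the ordering rule can be sketched as below. This is an assumption-laden illustration rather than project tooling: `category_number` and `add_plugin` are hypothetical names, and the code assumes every `source` begins with a numbered category folder as in the example above.

```python
import re


def category_number(source: str) -> int:
    """Read the leading category number from a source like './12-inference-serving/vllm'."""
    match = re.match(r"(?:\./)?(\d+)-", source)
    if match is None:
        raise ValueError(f"no category number in source: {source!r}")
    return int(match.group(1))


def add_plugin(marketplace: dict, entry: dict) -> dict:
    """Return a copy of the marketplace with `entry` added, ordered by category."""
    plugins = list(marketplace.get("plugins", []))
    if any(p["name"] == entry["name"] for p in plugins):
        raise ValueError(f"duplicate plugin name: {entry['name']}")
    plugins.append(entry)
    # list.sort() is stable, so entries keep their relative order within a category
    # and the new entry lands last among its category's existing entries.
    plugins.sort(key=lambda p: category_number(p["source"]))
    return {**marketplace, "plugins": plugins}
```

Load the file with `json.load`, pass the result through `add_plugin`, and write it back with `json.dump(..., indent=2)`. Note that the sort reorders the whole array, so review the diff if the existing file is not already in category order.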
Step 7: Submit Pull Request
# Add your changes
git add 12-inference-serving/vllm/
git add .metadata/vllm_data/
git add .claude-plugin/marketplace.json
# Commit with descriptive message
git commit -m "Add vLLM inference serving skill
- 215 pages of documentation
- 12 GitHub issues with solutions
- API reference and examples
- Performance benchmarks included"
# Push to your fork
git push origin add-vllm-skill
Then create a Pull Request on GitHub with:
- Title: "Add [Skill Name] skill"
- Description:
  - What the skill covers
  - Source (docs, GitHub, or both)
  - Documentation size
  - Key features/examples included
📂 Directory Structure
Place skills in the correct category:
claude-ai-research-skills/
├── 01-model-architecture/ # Model architectures (GPT, LLaMA, etc.)
├── 02-tokenization/ # Tokenizers (HuggingFace, SentencePiece)
├── 03-fine-tuning/ # Fine-tuning frameworks (Axolotl, TRL)
├── 04-peft/ # Parameter-efficient methods (LoRA, QLoRA)
├── 05-data-processing/ # Data curation and processing
├── 06-post-training/ # RLHF, DPO, PPO
├── 07-safety-alignment/ # Guardrails, safety, content moderation
├── 08-distributed-training/ # DeepSpeed, FSDP, distributed systems
├── 09-infrastructure/ # PyTorch Lightning, Ray, Composer
├── 10-optimization/ # Flash Attention, bitsandbytes, kernels
├── 11-evaluation/ # Benchmarks, evaluation frameworks
├── 12-inference-serving/ # vLLM, TensorRT-LLM, llama.cpp
├── 13-mlops/ # Weights & Biases, MLflow, TensorBoard
├── 14-agents/ # LangChain, LlamaIndex, CrewAI
├── 15-rag/ # RAG pipelines, vector databases
├── 16-prompt-engineering/ # DSPy, Instructor, structured output
├── 17-observability/ # LangSmith, Phoenix, monitoring
├── 18-multimodal/ # LLaVA, Whisper, Stable Diffusion
└── 19-emerging-techniques/ # MoE, model merging, long context
📋 Skill Structure Template
Use SKILL_TEMPLATE.md as a starting point. Each skill should contain:
skill-name/
├── SKILL.md # Quick reference (200-300 lines, 500 maximum)
│ ├── Metadata (name, description, version)
│ ├── When to use this skill
│ ├── Quick start examples
│ ├── Common patterns
│ └── Links to references
│
├── references/ # Deep documentation (300KB+)
│ ├── README.md # From GitHub/official docs
│ ├── api.md # API reference
│ ├── tutorials.md # Step-by-step guides
│ ├── issues.md # Real GitHub issues (if applicable)
│ ├── releases.md # Version history (if applicable)
│ └── file_structure.md # Codebase navigation (if applicable)
│
├── scripts/ # Helper scripts (optional)
└── assets/ # Templates & examples (optional)
🔍 Quality Standards
Code Examples
All code examples MUST have language detection:
✅ Good:
```python
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```
❌ Bad:
```
from transformers import AutoModel
model = AutoModel.from_pretrained("gpt2")
```
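A check for untagged fences can be sketched in a few lines. The function name `untagged_fences` is hypothetical, and the sketch assumes backtick fences only, with no nesting and no tilde fences.

```python
def untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that lack a language tag."""
    missing = []
    inside = False  # True while we are between an opening and a closing fence
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("```"):
            continue
        if inside:
            inside = False  # this fence closes the current block
        else:
            inside = True  # this fence opens a new block
            if not stripped.lstrip("`").strip():
                missing.append(number)  # nothing after the backticks: no language
    return missing
```

An empty result means every opening fence in the file carries a language tag.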
Documentation Size
- Minimum: 100KB total in references/
- Target: 300KB+ total
- Gold Standard: 500KB+ with issues, releases, examples
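The tiers above map onto a simple size check. The sketch below uses hypothetical names (`references_size`, `size_tier`) and assumes "KB" means 1024 bytes. It sums file sizes, so its result can differ slightly from `du -sh`, which reports disk usage.

```python
from pathlib import Path

KB = 1024  # assumption: the thresholds above are in units of 1024 bytes


def references_size(references_dir: str) -> int:
    """Total size in bytes of all regular files under references_dir."""
    return sum(p.stat().st_size for p in Path(references_dir).rglob("*") if p.is_file())


def size_tier(total_bytes: int) -> str:
    """Map a byte count onto the documentation-size tiers listed above."""
    if total_bytes >= 500 * KB:
        return "gold standard"
    if total_bytes >= 300 * KB:
        return "target"
    if total_bytes >= 100 * KB:
        return "minimum"
    return "below minimum"
```

For example, `size_tier(references_size("12-inference-serving/vllm/references"))` classifies the vLLM skill used throughout this guide.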
Real-World Content
Prefer skills with:
- ✅ Real GitHub issues and solutions
- ✅ Release notes and breaking changes
- ✅ Community discussions
- ✅ Performance benchmarks
- ✅ Troubleshooting guides
Links and Citations
Always include:
- ✅ Official documentation link
- ✅ GitHub repository link
- ✅ License information
- ✅ Version/release information
🧪 Testing
Before submitting, verify:
# 1. SKILL.md is well-formatted
cat your-skill/SKILL.md
# 2. All reference files exist
ls -R your-skill/references/
# 3. Documentation size is adequate (300KB+ target)
du -sh your-skill/references/
# 4. Code blocks have language tags
grep -A 1 '```' your-skill/SKILL.md | head -20
# 5. No broken links (manual check)
# Open SKILL.md and verify all [links](urls) work
# 6. Marketplace entry added and valid
python3 -c "import json; json.load(open('.claude-plugin/marketplace.json'))"
🎓 Examples of High-Quality Skills
Gold Standard (emulate this):
1. 06-post-training/grpo-rl-training/ (569 lines) ⭐⭐⭐⭐⭐
   - Complete implementation workflow
   - 10+ code examples with explanations
   - Troubleshooting guide
   - Common pitfalls and solutions
   - Performance tips
   - This is the quality bar

Good Examples:
2. 03-fine-tuning/axolotl/ (151 lines)
   - Real configuration examples
   - When to use guidance
   - Comprehensive but could add more workflows
3. 08-distributed-training/deepspeed/ (132 lines)
   - ZeRO optimization patterns
   - Configuration examples
   - Good foundation, needs more troubleshooting
🚫 What NOT to Contribute
- ❌ Proprietary/closed-source tools
- ❌ Deprecated libraries (unless historically important)
- ❌ Duplicate skills (check existing skills first)
- ❌ Incomplete skills (<50 lines SKILL.md, <100KB refs)
- ❌ Skills without code examples
🎖️ Recognition
All contributors will be:
- ✅ Listed in CONTRIBUTORS.md
- ✅ Mentioned in release notes
- ✅ Featured on project homepage (when launched)
- ✅ Attributed in SKILL.md metadata
Top contributors (5+ skills) receive special recognition and maintainer status.
📞 Getting Help
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Questions: Open a discussion with "Question:" prefix
📅 Review Process
- Automated Checks (when implemented):
  - File structure validation
  - Code block language detection
  - Documentation size check
  - Marketplace.json validation
- Manual Review (by maintainers):
  - Content quality and accuracy
  - Code example validity
  - Proper categorization
  - License compliance
- Feedback Loop:
  - Reviews within 48-72 hours
  - Constructive feedback provided
  - Iterate until approved
- Merge:
  - Merged to main branch
  - Added to release notes
  - Contributor recognized
🙏 Thank You!
Your contributions help the entire AI research community. Every skill added makes Claude Code more powerful for researchers, engineers, and students worldwide.
Let's build something amazing together! 🚀