TransformerLens Reference Documentation

This directory contains comprehensive reference materials for TransformerLens.

Contents

  • api.md - Complete API reference for HookedTransformer, ActivationCache, and HookPoints
  • tutorials.md - Step-by-step tutorials for common interpretability workflows
  • papers.md - Key research papers and foundational concepts

Installation

pip install transformer-lens

Basic Usage

from transformer_lens import HookedTransformer

# Load model
model = HookedTransformer.from_pretrained("gpt2-small")

# Run with activation caching
tokens = model.to_tokens("Hello world")
logits, cache = model.run_with_cache(tokens)

# Access activations
residual = cache["resid_post", 5]  # Residual stream after block 5: [batch, pos, d_model]
attention = cache["pattern", 3]    # Block 3 attention patterns: [batch, head_index, query_pos, key_pos]

Key Concepts

HookPoints

Every activation in the transformer passes through a HookPoint, an identity module that hooks can attach to, enabling:

  • Reading activations via run_with_cache()
  • Modifying activations via run_with_hooks()

Activation Cache

The ActivationCache stores all intermediate activations with helper methods for:

  • Residual stream decomposition
  • Logit attribution
  • Layer-wise analysis

Supported Models (50+)

GPT-2, LLaMA, Mistral, Pythia, GPT-Neo, OPT, Gemma, Phi, and more.