Files
claude-hub/docs/docker-optimization.md
Jonathan bda604bfdc fix: address PR review feedback
- Implement self-hosted runner fallback via USE_SELF_HOSTED repository variable
- Add runner information logging for debugging
- Add timeout protection (30 minutes) to prevent hanging
- Update documentation to match actual implementation
- Fix npm permission context switching in Dockerfile
- Consolidate directory creation to minimize user context switches
2025-05-29 14:30:52 -05:00

230 lines
6.3 KiB
Markdown

# Docker Build Optimization Guide
This document describes the optimizations implemented in our Docker CI/CD pipeline for faster builds and better caching.
## Overview
Our optimized Docker build pipeline includes:
- Self-hosted runner support with automatic fallback
- Multi-stage builds for efficient layering
- Advanced caching strategies
- Container-based testing
- Parallel builds for multiple images
- Security scanning integration
## Self-Hosted Runners
### Configuration
- **Labels**: `self-hosted, linux, x64, docker`
- **Usage**: All Docker builds use self-hosted runners by default for improved performance
- **Local Cache**: Self-hosted runners maintain Docker layer cache between builds
- **Fallback**: Configurable via `USE_SELF_HOSTED` repository variable
### Runner Setup
Self-hosted runners provide:
- Persistent Docker layer cache
- Faster builds (no image pull overhead)
- Better network throughput for pushing images
- Cost savings on GitHub Actions minutes
### Fallback Strategy
The workflow implements a flexible fallback mechanism:
1. **Default behavior**: Uses self-hosted runners (`self-hosted, linux, x64, docker`)
2. **Override option**: Set repository variable `USE_SELF_HOSTED=false` to force GitHub-hosted runners
3. **Timeout protection**: 30-minute timeout prevents hanging on unavailable runners
4. **Failure detection**: `build-fallback` job provides instructions if self-hosted runners fail
To manually switch to GitHub-hosted runners:
```bash
# Via GitHub UI: Settings → Secrets and variables → Actions → Variables
# Add: USE_SELF_HOSTED = false
# Or via GitHub CLI:
gh variable set USE_SELF_HOSTED --body "false"
```
The runner selection logic:
```yaml
runs-on: ${{ fromJSON(format('["{0}"]', (vars.USE_SELF_HOSTED == 'false' && 'ubuntu-latest' || 'self-hosted, linux, x64, docker'))) }}
```
## Multi-Stage Dockerfile
Our Dockerfile uses multiple stages for optimal caching and smaller images:
1. **Builder Stage**: Compiles TypeScript
2. **Prod-deps Stage**: Installs production dependencies only
3. **Test Stage**: Includes dev dependencies and test files
4. **Production Stage**: Minimal runtime image
### Benefits
- Parallel builds of independent stages
- Smaller final image (no build tools or dev dependencies)
- Test stage can run in CI without affecting production image
- Better layer caching between builds
## Caching Strategies
### 1. GitHub Actions Cache (GHA)
```yaml
cache-from: type=gha,scope=${{ matrix.image }}-prod
cache-to: type=gha,mode=max,scope=${{ matrix.image }}-prod
```
### 2. Registry Cache
```yaml
cache-from: type=registry,ref=${{ org }}/claude-hub:nightly
```
### 3. Inline Cache
```yaml
build-args: BUILDKIT_INLINE_CACHE=1
outputs: type=inline
```
### 4. Layer Ordering
- Package files copied first (changes less frequently)
- Source code copied after dependencies
- Build artifacts cached between stages
## Container-Based Testing
Tests run inside Docker containers for:
- Consistent environment
- Parallel test execution
- Isolation from host system
- Same environment as production
### Test Execution
```bash
# Unit tests in container
docker run --rm claude-hub:test npm test
# Integration tests with docker-compose
docker-compose -f docker-compose.test.yml run integration-test
# E2E tests against running services
docker-compose -f docker-compose.test.yml run e2e-test
```
## Build Performance Optimizations
### 1. BuildKit Features
- `DOCKER_BUILDKIT=1` for improved performance
- `--mount=type=cache` for package manager caches
- Parallel stage execution
### 2. Docker Buildx
- Multi-platform builds (amd64, arm64)
- Advanced caching backends
- Build-only stages that don't ship to production
### 3. Context Optimization
- `.dockerignore` excludes unnecessary files
- Minimal context sent to Docker daemon
- Faster uploads and builds
### 4. Dependency Caching
- Separate stage for production dependencies
- npm ci with --omit=dev for smaller images
- Cache mount for npm packages
## Workflow Features
### PR Builds
- Build and test without publishing
- Single platform (amd64) for speed
- Container-based test execution
- Security scanning with Trivy
### Main Branch Builds
- Multi-platform builds (amd64, arm64)
- Push to registry with :nightly tag
- Update cache images
- Full test suite execution
### Version Tag Builds
- Semantic versioning tags
- :latest tag update
- Multi-platform support
- Production-ready images
## Security Scanning
### Integrated Scanners
1. **Trivy**: Vulnerability scanning for Docker images
2. **Hadolint**: Dockerfile linting
3. **npm audit**: Dependency vulnerability checks
4. **SARIF uploads**: Results visible in GitHub Security tab
## Monitoring and Metrics
### Build Performance
- Build time per stage
- Cache hit rates
- Image size tracking
- Test execution time
### Health Checks
```yaml
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3002/health"]
interval: 30s
timeout: 10s
retries: 3
```
## Local Development
### Building locally
```bash
# Build with BuildKit
DOCKER_BUILDKIT=1 docker build -t claude-hub:local .
# Build specific stage
docker build --target test -t claude-hub:test .
# Run tests locally
docker-compose -f docker-compose.test.yml run test
```
### Cache Management
```bash
# Clear builder cache
docker builder prune
# Use local cache
docker build --cache-from claude-hub:local .
```
## Best Practices
1. **Order Dockerfile commands** from least to most frequently changing
2. **Use specific versions** for base images and dependencies
3. **Minimize layers** by combining RUN commands
4. **Clean up** package manager caches in the same layer
5. **Use multi-stage builds** to reduce final image size
6. **Leverage BuildKit** features for better performance
7. **Test in containers** for consistency across environments
8. **Monitor build times** and optimize bottlenecks
## Troubleshooting
### Slow Builds
- Check cache hit rates in build logs
- Verify .dockerignore is excluding large files
- Use `--progress=plain` to see detailed timings
- Consider parallelizing independent stages
### Cache Misses
- Ensure consistent base image versions
- Check for unnecessary file changes triggering rebuilds
- Use cache mounts for package managers
- Verify registry cache is accessible
### Test Failures in Container
- Check environment variable differences
- Verify volume mounts are correct
- Ensure test dependencies are in test stage
- Check for hardcoded paths or ports