diff --git a/ss14-cicd/evidence/README.md b/ss14-cicd/evidence/README.md
new file mode 100644
index 0000000..a365251
--- /dev/null
+++ b/ss14-cicd/evidence/README.md
@@ -0,0 +1,18 @@
+# Evidence Index
+
+All evidence sourced from session logs in HISTORY.md (Makar Novozhilov's engineering notes,
+December 2025). No formal benchmarks or tables were produced — evidence is operational log data.
+
+## Tables
+
+| File | Source | Claims | Description |
+|------|--------|--------|-------------|
+| [tables/table1_runner_configurations.md](tables/table1_runner_configurations.md) | HISTORY.md: 2025-12-14 through 2025-12-19 | C01, C02, C03, C04 | All three runner configurations attempted, with outcomes and failure modes |
+| [tables/table2_cache_failure_modes.md](tables/table2_cache_failure_modes.md) | HISTORY.md: 2025-12-15, 2025-12-19 | C01, C02 | Cache strategies tested and their failure modes |
+| [tables/table3_capacity_oom_progression.md](tables/table3_capacity_oom_progression.md) | HISTORY.md: 2025-12-19 | C04 | OOM-driven concurrent job capacity reduction sequence |
+| [tables/table4_dns_approaches.md](tables/table4_dns_approaches.md) | HISTORY.md: 2025-12-14 | C03 | DNS resolution approaches tested and their outcomes |
+
+## Figures
+
+No quantitative figures available — this is an engineering log project, not a benchmarked experiment.
+Performance observations (e.g., "5 minutes vs 5 seconds") are captured in the tables above.
diff --git a/ss14-cicd/evidence/tables/table1_runner_configurations.md b/ss14-cicd/evidence/tables/table1_runner_configurations.md
new file mode 100644
index 0000000..f64eb19
--- /dev/null
+++ b/ss14-cicd/evidence/tables/table1_runner_configurations.md
@@ -0,0 +1,11 @@
+# Table 1 — Runner Configurations Attempted
+
+**Source**: Session logs in HISTORY.md: 2025-12-14, 2025-12-15, 2025-12-18, 2025-12-19
+**Caption**: All three act-runner configurations attempted for the wylab-station-14 CI/CD pipeline, with their architecture, outcome, and primary failure mode.
+**Extraction type**: raw_table
+
+| Configuration | Host | Architecture | Period | Primary Failure Mode | Outcome |
+|--------------|------|-------------|--------|---------------------|---------|
+| Unraid container runner | 192.168.1.50 (Unraid) | x86-64 (amd64) | 2025-12-14 | DNS resolution failure — job containers cannot resolve git.wylab.me in bridge network mode; 1/6 jobs succeeded with host networking | Reverted / partially abandoned |
+| External VPS runner | 45.137.68.83 (Contabo, Düsseldorf) | x86-64 (amd64) | 2025-12-15 | Node.js module errors in .cache/act/; native Gitea cache ETIMEDOUT on port 39913; .NET cache step 5 min vs 5 sec | Abandoned — crashing under load |
+| macOS ARM64 runner (OrbStack) | Developer MacBook, Apple Silicon | ARM64 (arm64) | 2025-12-18 – 2025-12-19 | OOM crashes with concurrent dotnet builds; mixed-arch cache corruption (arm64 entries consumed by x86-64 jobs, silent wrong-arch artifacts); runner kept crashing | Active but unstable as of 2025-12-19; fix: architecture-tagged cache keys OR runner label pinning |
diff --git a/ss14-cicd/evidence/tables/table2_cache_failure_modes.md b/ss14-cicd/evidence/tables/table2_cache_failure_modes.md
new file mode 100644
index 0000000..00bfae2
--- /dev/null
+++ b/ss14-cicd/evidence/tables/table2_cache_failure_modes.md
@@ -0,0 +1,12 @@
+# Table 2 — Cache Strategies and Failure Modes
+
+**Source**: Session logs in HISTORY.md: 2025-12-15,
2025-12-18, 2025-12-19
+**Caption**: Cache strategies tested for the wylab-station-14 .NET build pipeline, with their configuration, observed behavior, and disposition.
+**Extraction type**: raw_table
+
+| Cache Strategy | Protocol | Configured On | Observed Behavior | Failure Mode | Disposition |
+|---------------|----------|---------------|------------------|-------------|------------|
+| Native Gitea act-cache-server (remote) | HTTP on port 39913 | External VPS runner (45.137.68.83) | .NET cache step: ~5 minutes (vs ~5 seconds for other steps); ETIMEDOUT connecting to 45.137.68.83:39913 from inside job containers | Full cache miss every build; job containers cannot reach port 39913 through Docker bridge | Abandoned |
+| Local file cache (volume mount) | Filesystem (no HTTP) | macOS OrbStack runner | Stable cache hits; no timeout errors | No direct failure — but shared across arm64 and amd64 runners via NFS mount would cause C01 cache corruption | Adopted for single-runner use; requires architecture-tagged keys for multi-runner |
+| Architecture-agnostic cache key | N/A (key design flaw) | Both runners sharing cache | arm64 cache entries with key `dotnet-{hash}` returned as hits for amd64 jobs (same project hash, same key) | Silent wrong-arch artifact production: builds pass CI but produce arm64 binaries on x86-64 deployment target | Root cause of C01; fix: include runner.arch in key |
+| Architecture-tagged cache key | N/A (proposed fix) | Proposed for all runners | Not yet tested as of 2025-12-19 | No known failure mode — key includes arch, making cross-arch collision impossible | Proposed as C05 fix; see H01 |
diff --git a/ss14-cicd/evidence/tables/table3_capacity_oom_progression.md b/ss14-cicd/evidence/tables/table3_capacity_oom_progression.md
new file mode 100644
index 0000000..ce4623b
--- /dev/null
+++ b/ss14-cicd/evidence/tables/table3_capacity_oom_progression.md
@@ -0,0 +1,17 @@
+# Table 3 — OOM-Driven Concurrent Job Capacity Reduction
+
+**Source**:
Session log in HISTORY.md: 2025-12-19
+**Caption**: Sequential reduction of act-runner concurrent job capacity on the macOS ARM64 OrbStack runner due to out-of-memory crashes from concurrent dotnet builds. OrbStack does not expose swap memory (macOS manages memory pressure at hypervisor level).
+**Extraction type**: raw_table
+
+| Step | Capacity Setting | Observed Outcome | Action Taken |
+|------|-----------------|-----------------|-------------|
+| Initial | 6 concurrent jobs | OOM crash under dotnet build load | Reduced capacity |
+| Reduction 1 | 4 concurrent jobs | Still OOM crashing | Reduced capacity |
+| Reduction 2 | 3 concurrent jobs | Still OOM crashing | Reduced capacity |
+| Reduction 3 | 2 concurrent jobs | Stable — no OOM crashes observed | Kept at 2 |
+
+**Notes**:
+- OrbStack constraint: no swap exposed to Linux VM; OOM kills are abrupt without graceful degradation
+- Dotnet memory pressure: MSBuild build server + NuGet restore + compilation all consume significant memory; concurrent jobs multiply this linearly
+- This constraint is specific to OrbStack on macOS; Linux runners with swap enabled may support higher concurrency
diff --git a/ss14-cicd/evidence/tables/table4_dns_approaches.md b/ss14-cicd/evidence/tables/table4_dns_approaches.md
new file mode 100644
index 0000000..5aa6ae4
--- /dev/null
+++ b/ss14-cicd/evidence/tables/table4_dns_approaches.md
@@ -0,0 +1,14 @@
+# Table 4 — DNS Resolution Approaches and Outcomes
+
+**Source**: Session log in HISTORY.md: 2025-12-14
+**Caption**: DNS resolution approaches tested for enabling runner job containers to resolve the internal hostname git.wylab.me, which is only served by Technitium DNS at 192.168.1.50.
+**Extraction type**: raw_table
+
+| Approach | Configuration | Target | Result | Notes |
+|----------|--------------|--------|--------|-------|
+| Default bridge network | No custom DNS | Runner process container | Failed — git.wylab.me unresolvable in all job containers | Docker bridge DNS does not forward to Technitium |
+| External DNS (1.1.1.1) | Added 1.1.1.1 to runner DNS config | Runner process container | Failed — 1.1.1.1 cannot resolve private internal hostname | Public DNS has no record for git.wylab.me |
+| Host networking mode | container.network: host in runner config | Job containers | Partial — 1 out of 6 jobs succeeded; inconsistent | Mechanism of inconsistency not determined; changes reverted |
+| Apply DNS to app container (alternative) | DNS applied to Gitea app container (not runner) | Wrong target | Failed — wrong target; Gitea app does not need DNS fix, runner job containers do | Misidentification of which component needs the fix |
+| Docker bridge gateway (172.17.0.1) | container.dns: ["172.17.0.1"] in runner config (proposed) | Job containers | Not tested as of 2025-12-14 | Bridge gateway forwards to Technitium; expected to work based on Docker networking model |
+| Technitium direct (192.168.1.50) | container.dns: ["192.168.1.50"] in runner config (proposed) | Job containers | Not tested | Only works if 192.168.1.50 is reachable from Docker bridge subnet |
diff --git a/ss14-cicd/logic/concepts.md b/ss14-cicd/logic/concepts.md
new file mode 100644
index 0000000..3834864
--- /dev/null
+++ b/ss14-cicd/logic/concepts.md
@@ -0,0 +1,43 @@
+# Concepts
+
+## act-runner
+- **Notation**: `act-runner` (binary), `config.yml` (configuration file)
+- **Definition**: The Gitea Actions runner daemon. It polls the Gitea server for pending workflow jobs, spawns per-job containers (using Docker or a process executor), and streams logs back. It is the Gitea equivalent of the GitHub Actions self-hosted runner.
Each runner registers with a token and is assigned labels (e.g., `ubuntu-latest`, `self-hosted`). The runner binary manages the job container lifecycle including cache volume mounts.
+- **Boundary conditions**: Applies when Gitea Actions workflows are used (not Gitea CI's legacy YAML format). Requires Docker to be installed on the host if container executor is used. Job containers inherit the runner's Docker daemon socket by default. Does not apply to GitHub Actions (uses `actions/runner`, different protocol).
+- **Related concepts**: Runner label pinning, Runner job container, Cache key, Gitea Actions
+
+## Runner label pinning
+- **Notation**: `runs-on: