Solution Algorithm

Mathematical Formulation

Let $N = \{n_1, n_2, \ldots, n_k\}$ be the ordered list of nameservers in a container's /etc/resolv.conf.

Let $\text{reachable}(n_i)$ be a boolean function returning true if nameserver $n_i$ responds to UDP DNS queries from inside a bridge-networked container.

Let $T_\text{timeout}$ be the DNS resolver timeout per nameserver (typically 5 seconds; on this system the delay actually observed was approximately 8 seconds in total, including a retry).

The total DNS query latency is:

$$L_\text{total} = \sum_{i=1}^{j-1} T_\text{timeout} \cdot \mathbb{1}[\neg \text{reachable}(n_i)] + L_\text{query}(n_j)$$

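where $n_j$ is the first reachable nameserver and $L_\text{query}(n_j)$ is the actual query RTT. Because every $n_i$ with $i < j$ is unreachable by definition, the sum reduces to $(j-1) \cdot T_\text{timeout}$.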

Before fix: $N = [192.168.1.50, 169.254.24.117, 1.1.1.1]$, where $\text{reachable}(192.168.1.50) = \text{false}$ and $\text{reachable}(169.254.24.117) = \text{false}$. Thus $L_\text{total} \approx 2 \times T_\text{timeout} + L_\text{query}(1.1.1.1) \approx 8\text{ s}$.

After fix: $N = [172.17.0.1]$, where $\text{reachable}(172.17.0.1) = \text{true}$. Thus $L_\text{total} = L_\text{query}(172.17.0.1) \approx 2\text{ ms}$.
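
The latency formula above can be checked with a small model. This is a minimal sketch, not a measurement tool: the per-nameserver timeout of 4.0 s is an assumption chosen so that two timeouts reproduce the roughly 8 s total quoted above, and the 2 ms query RTT is applied to both cases for simplicity. The function name `total_latency` is hypothetical.

```python
from typing import Callable, Sequence


def total_latency(
    nameservers: Sequence[str],
    reachable: Callable[[str], bool],
    timeout_s: float,
    query_rtt_s: float,
) -> float:
    """Walk the list in order, paying one timeout per unreachable nameserver."""
    waited = 0.0
    for ns in nameservers:
        if reachable(ns):
            # First reachable nameserver answers; add its query RTT.
            return waited + query_rtt_s
        waited += timeout_s
    raise RuntimeError("no reachable nameserver in list")


# Reachability as described in the text; everything else is reachable.
UNREACHABLE = {"192.168.1.50", "169.254.24.117"}


def is_reachable(ns: str) -> bool:
    return ns not in UNREACHABLE


# Assumed values: 4.0 s effective timeout, 2 ms query RTT.
before = total_latency(
    ["192.168.1.50", "169.254.24.117", "1.1.1.1"], is_reachable, 4.0, 0.002
)
after = total_latency(["172.17.0.1"], is_reachable, 4.0, 0.002)
print(f"before: {before:.3f} s, after: {after:.3f} s")
```

With these assumed inputs the model gives about 8 s before the fix and about 2 ms after it, matching the figures in the text.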


DNS Resolution Algorithm (Post-Fix)

Algorithm: DNS_RESOLUTION_PATH
Input: hostname H, container C (bridge-networked)
Output: IP address for H

1. C issues DNS query for H to nameserver at /etc/resolv.conf[0] = 172.17.0.1
2. Query traverses docker0 bridge to host network namespace
3. Technitium DNS at 172.17.0.1:53 receives query
4. IF H ∈ internal_domain(wylab.me):
     RETURN internal_record(H)
   ELSE:
     FORWARD query to upstream resolver (e.g., 1.1.1.1)
     RETURN upstream_response(H)
5. Response travels back to C via docker0 bridge
6. C receives IP address for H
Total RTT: ~2ms for internal records; forwarded queries add the upstream round trip
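
The branch in step 4 can be sketched as a pure function. This is an illustration of the decision only, not of how Technitium is implemented: the names `in_zone` and `resolve`, the record table, and the stub forwarder are hypothetical, and the IP addresses come from the reserved documentation ranges rather than from the real setup.

```python
from typing import Callable, Mapping

INTERNAL_ZONE = "wylab.me"


def in_zone(hostname: str, zone: str) -> bool:
    """True if hostname is the zone apex or a name beneath it."""
    name = hostname.rstrip(".").lower()
    return name == zone or name.endswith("." + zone)


def resolve(
    hostname: str,
    internal_records: Mapping[str, str],
    forward: Callable[[str], str],
    zone: str = INTERNAL_ZONE,
) -> str:
    """Answer from local records for the internal zone, otherwise forward."""
    if in_zone(hostname, zone):
        # A missing internal name raises KeyError (standing in for NXDOMAIN).
        return internal_records[hostname.rstrip(".").lower()]
    return forward(hostname)


# Hypothetical data: documentation-range addresses, stub upstream resolver.
records = {"traefik.wylab.me": "192.0.2.10"}


def stub_upstream(hostname: str) -> str:
    return "203.0.113.5"


print(resolve("traefik.wylab.me", records, stub_upstream))
print(resolve("example.org", records, stub_upstream))
```

The suffix check includes the leading dot so that a name such as notwylab.me is forwarded rather than treated as internal.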

ACME Certificate Acquisition Algorithm (Circular Dependency Break)

Algorithm: ACME_CERT_ACQUISITION
Input: domain D (e.g., traefik.wylab.me), Traefik config with resolvers=["1.1.1.1:53"]
Output: Valid TLS certificate for D

1. Traefik ACME module initiates certificate request for D
2. Traefik resolves acme-v02.api.letsencrypt.org:
     DNS query → 1.1.1.1:53 (NOT via system resolver / NOT via Technitium)
     Returns: Let's Encrypt ACME server IP
   NOTE: Technitium state is IRRELEVANT in this step
3. Traefik connects to Let's Encrypt ACME server
4. ACME challenge issued (HTTP-01 or DNS-01):
   HTTP-01: Let's Encrypt verifies /.well-known/acme-challenge/ on D
   DNS-01: Let's Encrypt checks TXT record on _acme-challenge.D
5. Traefik responds to challenge (serves file or adds DNS record)
6. Let's Encrypt validates challenge, issues certificate
7. Traefik stores certificate in acme.json
8. Certificate served for all requests to D

Key invariant: step 2 uses 1.1.1.1, not Technitium.
The circular dependency chain [Traefik→Technitium→Traefik] is broken.
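
The dependency chain named above can be modelled as a small directed graph and checked for cycles. This is a sketch under assumptions: an edge "A depends on B" is read directly off the chain [Traefik→Technitium→Traefik], and the post-fix graph simply points Traefik's ACME lookup at 1.1.1.1 instead. The function `has_cycle` is a standard depth-first search, not something taken from the setup itself.

```python
from typing import Dict, List


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Depth-first search with three colours; a grey-to-grey edge is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = GREY
        for dep in graph.get(node, []):
            colour = state.get(dep, WHITE)
            if colour == GREY:
                return True  # back edge: dep is still on the current path
            if colour == WHITE and visit(dep):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in graph)


# Assumed edges: "X": ["Y"] means X needs Y to be working.
before_fix = {"Traefik": ["Technitium"], "Technitium": ["Traefik"]}
after_fix = {"Traefik": ["1.1.1.1"], "Technitium": ["Traefik"], "1.1.1.1": []}

print("cycle before fix:", has_cycle(before_fix))
print("cycle after fix:", has_cycle(after_fix))
```

In the post-fix graph Technitium may still depend on Traefik, but nothing Traefik needs for ACME leads back to Technitium, so the search finds no cycle.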

Boot Persistence Algorithm

Algorithm: UNRAID_BOOT_PERSISTENCE
Input: /boot/config/go startup script
Output: Correct runtime state after every Unraid reboot

ON EVERY BOOT:
1. /boot/config/go executes (runs as root during boot, before the array and Docker start)
2. Write daemon.json:
     echo '{"dns": ["172.17.0.1"]}' > /etc/docker/daemon.json
3. Apply iptables DNAT rules:
     iptables -t nat -A DOCKER ... [CONFIGURE per specific routing needs]
4. (Optional) Restart Docker daemon to pick up daemon.json:
     /etc/rc.d/rc.docker restart
     OR: if Docker is not yet started, it will pick up daemon.json on first start
5. All subsequently started containers receive:
     /etc/resolv.conf: "nameserver 172.17.0.1"
6. DNS latency: ~2ms from first container start

Idempotency note: Step 3 iptables rules should use --check before --append to avoid
duplicate rules on repeated executions.
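
The check-before-append pattern from the idempotency note can be sketched as follows. This is one possible shape, assuming the script is run as root on a host where the `iptables` binary is available; the helper names `run` and `ensure_rule` are hypothetical, and the rule specification is deliberately left as a parameter because the source does not state the actual rules. The `runner` argument exists so the logic can be exercised without touching the firewall.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], int]


def run(cmd: List[str]) -> int:
    """Execute a command quietly and return its exit status."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def ensure_rule(
    table: str,
    chain: str,
    rule_spec: Sequence[str],
    runner: Runner = run,
) -> bool:
    """Append the rule only if --check reports it missing.

    Returns True if the rule was added, False if it was already present.
    """
    base = ["iptables", "-t", table]
    # iptables --check exits 0 when an identical rule already exists.
    if runner(base + ["--check", chain, *rule_spec]) == 0:
        return False
    if runner(base + ["--append", chain, *rule_spec]) != 0:
        raise RuntimeError(f"failed to append rule to {table}/{chain}")
    return True


# Usage (rule_spec is a placeholder; the source leaves the rules unspecified):
#   ensure_rule("nat", "DOCKER", rule_spec)
```

Calling `ensure_rule` twice with the same arguments adds the rule once and returns False the second time, so repeated executions of the boot script do not stack duplicates. In a plain shell script the same effect is commonly obtained by running the `--check` form and falling through to `--append` only when it fails.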

Complexity Analysis

DNS latency reduction

  • Before: $O(k \cdot T_\text{timeout})$, where $k$ is the number of unreachable nameservers before the first reachable one
  • After: $O(1)$, with a single nameserver that is always reachable and no timeouts
  • Practical reduction: ~8000ms → ~2ms (4000× improvement)

Scope of fix

  • Daemon.json change: Affects all future containers system-wide — O(1) configuration change with O(n) effect across n containers
  • Container-level resolv.conf edit: Affects only 1 container, non-persistent — O(1) effect, O(1) scope (dead end)

ACME circular dependency

  • Without bypass: ACME success probability $P(\text{cert}) = P(\text{Technitium up}) \cdot P(\text{Traefik up})$, assuming the two fail independently; both must be healthy simultaneously
  • With bypass: ACME success probability $P(\text{cert}) = P(\text{1.1.1.1 reachable}) \approx 1$, independent of Technitium state
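
To make the two probability expressions concrete, the sketch below evaluates them for illustrative availability figures. The numbers (99% for each local service, 99.99% for the public resolver) are hypothetical and are not measurements from this system; the product form also assumes the two services fail independently.

```python
def p_cert_without_bypass(p_technitium_up: float, p_traefik_up: float) -> float:
    """Both services must be healthy at once (independence assumed)."""
    return p_technitium_up * p_traefik_up


def p_cert_with_bypass(p_upstream_reachable: float) -> float:
    """Only the upstream resolver matters for the ACME lookup."""
    return p_upstream_reachable


# Hypothetical availabilities, for illustration only.
without = p_cert_without_bypass(0.99, 0.99)
with_bypass = p_cert_with_bypass(0.9999)
print(f"without bypass: {without:.4f}")
print(f"with bypass:    {with_bypass:.4f}")
```

Under these assumed figures the coupled setup succeeds about 98% of the time, while the bypass inherits the availability of the public resolver alone.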

Boot persistence

  • Without /boot/config/go: Configuration lifetime = until next reboot
  • With /boot/config/go: Configuration lifetime = permanent (re-applied on every boot in O(1) time)