Experiments

E01: DNS latency measurement before and after daemon.json fix

  • Verifies: C01, C02
  • Setup:
    • System: Unraid UM790 Pro, 32GB RAM, Docker Engine (version not specified in source material)
    • Network: Default docker0 bridge (172.17.0.0/16), Technitium DNS in host network mode
    • Containers under test: Any bridge-networked container (e.g., nanobot container)
    • Pre-condition: /etc/docker/daemon.json does NOT yet contain the 172.17.0.1 DNS override
  • Procedure:
    1. From inside a bridge-networked container, run repeated dig or nslookup queries for a known hostname (e.g., google.com) and record round-trip time for each query.
    2. Inspect the container's /etc/resolv.conf to confirm the nameserver order: 192.168.1.50, 169.254.24.117, 1.1.1.1.
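    3. Test reachability of 192.168.1.50:53 via UDP from inside the container. Because UDP is connectionless, a bare nc -u probe cannot reliably distinguish "no answer" from "sent successfully", so send a real query with a short timeout (e.g., dig @192.168.1.50 google.com +time=1 +tries=1). Confirm it times out.
    4. Test reachability of 172.17.0.1:53 via UDP from inside the container the same way (e.g., dig @172.17.0.1 google.com +time=1 +tries=1). Confirm it responds.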
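    5. On the Unraid host, add {"dns": ["172.17.0.1"]} to /etc/docker/daemon.json and restart the Docker daemon. Unraid does not use systemd, so use /etc/rc.d/rc.docker restart or toggle Docker off and on in the Unraid settings; systemctl restart docker applies only to systemd-based hosts.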
    6. Recreate the test container (daemon.json changes only apply to newly started containers).
    7. Repeat the DNS latency measurements from step 1 inside the new container.
    8. Inspect the new container's /etc/resolv.conf to confirm it now lists only 172.17.0.1.
  • Metrics: DNS query round-trip time in milliseconds (before and after), nameserver list in resolv.conf (before and after)
  • Expected outcome:
    • Before: ~8-second query time (timeout waiting for 192.168.1.50), resolv.conf lists 192.168.1.50 first
    • After: ~2ms query time, resolv.conf lists only 172.17.0.1
    • 172.17.0.1:53 responds; 192.168.1.50:53 does not respond from inside bridge container
  • Baselines: DNS query latency via 1.1.1.1 directly (should be fast, demonstrating the delay is nameserver ordering, not network latency)
  • Dependencies: none
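
The measurements in steps 1 to 4 and the 1.1.1.1 baseline could be scripted roughly as follows. This is a minimal sketch, not part of the source material: the function names parse_nameservers, build_query, and time_query are hypothetical, and the script assumes a Python interpreter is available inside the container under test. It times one query against one named server, so it isolates per-server behaviour (steps 3 and 4); the ~8-second end-to-end delay from the resolver walking the nameserver list is still best observed with dig or nslookup as in step 1.

```python
import socket
import struct
import time


def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in resolv.conf."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server, hostname, timeout=10.0):
    """Return round-trip time in ms, or None if no reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(build_query(hostname), (server, 53))
            sock.recvfrom(512)
        except OSError:  # includes socket timeouts and unreachable errors
            return None
        return (time.perf_counter() - start) * 1000.0
```

A typical use would be to read /etc/resolv.conf, pass its text to parse_nameservers, and call time_query(server, "google.com") for each entry plus 1.1.1.1, recording the results before and after the daemon.json change.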

E02: Verify resolv.conf edit failure mode (controlled reproduction)

  • Verifies: C06
  • Setup:
    • System: Any Docker bridge-networked container on the Unraid host
    • Container: Disposable test container (NOT the nanobot container or any production container)
    • Pre-condition: Container has working DNS (post-E01 fix, 172.17.0.1 as nameserver)
  • Procedure:
    1. Inspect the container's /etc/resolv.conf and note the current working nameserver (172.17.0.1).
    2. Inside the container, overwrite /etc/resolv.conf with only a non-reachable nameserver (e.g., nameserver 192.0.2.1, an address from the TEST-NET-1 documentation range 192.0.2.0/24, which is not routed on the public Internet).
    3. Immediately attempt DNS resolution from inside the container (e.g., ping google.com). Observe failure.
    4. Without touching the container, check another bridge-networked container's DNS — confirm it is unaffected.
    5. Recreate the test container (remove it and start a new one from the same configuration; a plain stop and start is not a recreation). Inspect /etc/resolv.conf and confirm Docker has regenerated it from daemon.json, discarding the manual change.
  • Metrics: DNS resolution success/failure before and after /etc/resolv.conf edit, DNS resolution in sibling container (should be unaffected), resolv.conf content after container recreation
  • Expected outcome:
    • After edit: DNS fails in the edited container, succeeds in all other containers
    • After recreation: resolv.conf is restored to daemon.json-derived content (nameserver 172.17.0.1)
    • Confirms: the edit is container-scoped and non-persistent
  • Baselines: Sibling container DNS behavior (unchanged throughout)
  • Dependencies: E01
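
Steps 1 to 4 of this procedure could be driven from the host with the docker CLI, sketched below under several assumptions: the container names are placeholders, the test image provides sh and nslookup, and nslookup exits non-zero when resolution fails. The helper names e02_plan and run_plan are hypothetical. Step 5 is left out because how a container is recreated depends on how it was originally created (docker run, Compose, or an Unraid template).

```python
import subprocess

UNREACHABLE_NS = "192.0.2.1"  # TEST-NET-1 address used in step 2


def e02_plan(test_container, sibling_container, hostname="google.com"):
    """Return (label, argv) pairs covering steps 1 to 4."""
    overwrite = f"echo 'nameserver {UNREACHABLE_NS}' > /etc/resolv.conf"
    return [
        ("resolv.conf before edit",
         ["docker", "exec", test_container, "cat", "/etc/resolv.conf"]),
        ("overwrite resolv.conf",
         ["docker", "exec", test_container, "sh", "-c", overwrite]),
        ("lookup in edited container",
         ["docker", "exec", test_container, "nslookup", hostname]),
        ("lookup in sibling container",
         ["docker", "exec", sibling_container, "nslookup", hostname]),
    ]


def run_plan(plan, timeout=60):
    """Run each command in order and return {label: exit_code}."""
    results = {}
    for label, argv in plan:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
        results[label] = completed.returncode
    return results
```

Under the expected outcome, run_plan(e02_plan("dns-test", "sibling")) would report a non-zero exit code for the lookup in the edited container and zero for the sibling; the container names here are illustrative only.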

E03: Traefik ACME certificate acquisition with and without explicit resolver

  • Verifies: C03, C05
  • Setup:
    • System: Unraid host with Traefik running in bridge-networked Docker container
    • Traefik version: v2.x (exact version not specified in source material)
    • DNS: Technitium DNS at 172.17.0.1:53 (post-E01 fix)
    • Domain: wylab.me (with valid public DNS delegation for HTTP-01 or DNS-01 challenge)
    • Pre-condition: Traefik static config does NOT yet have resolvers field in certificatesResolvers
  • Procedure:
    1. Configure Traefik with a certificatesResolvers block pointing to Let's Encrypt (staging endpoint recommended for testing).
    2. Without the resolvers field, force Traefik to request a certificate for a test subdomain (e.g., test.wylab.me).
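    3. Observe whether Traefik successfully resolves the ACME directory host (acme-staging-v02.api.letsencrypt.org when using the staging endpoint from step 1, acme-v02.api.letsencrypt.org in production) and completes the challenge. Check Traefik logs for DNS resolution errors.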
    4. Simulate Technitium unavailability (stop the Technitium container). Repeat the certificate request. Observe failure.
    5. Add the resolvers = ["1.1.1.1:53"] field to the certificatesResolvers configuration. In Traefik v2 this option sits under acme.dnsChallenge, so it applies to the DNS-01 challenge.
    6. Restart Traefik. With Technitium stopped, attempt certificate acquisition again.
    7. Observe that certificate acquisition succeeds despite Technitium being unavailable.
    8. Test that http://172.17.0.1:PORT correctly reaches a host-networked service from Traefik's perspective.
  • Metrics: Certificate acquisition success/failure, Traefik log error messages related to DNS, time to certificate acquisition
  • Expected outcome:
    • Without resolvers: Certificate acquisition fails or is at risk when Technitium is unavailable
    • With resolvers = ["1.1.1.1:53"]: Certificate acquisition succeeds independently of Technitium state
    • Backend at http://172.17.0.1:PORT is reachable from Traefik; http://127.0.0.1:PORT is not
  • Baselines: Traefik certificate acquisition using system resolver (Technitium); direct curl to https://acme-v02.api.letsencrypt.org from the Traefik container
  • Dependencies: E01
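
The backend reachability check in step 8 can be approximated with a plain TCP connect test. The sketch below is an illustration rather than the source's method: can_connect and compare_backends are hypothetical names, PORT remains whatever port the host-networked service listens on, and because the Traefik image may not ship Python, the script is assumed to run from another container attached to the same bridge network.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def compare_backends(port, candidates=("172.17.0.1", "127.0.0.1")):
    """Map each candidate backend address to its TCP reachability."""
    return {host: can_connect(host, port) for host in candidates}
```

From a bridge-networked container, the expected outcome corresponds to compare_backends(PORT) reporting True for 172.17.0.1 and False for 127.0.0.1, since 127.0.0.1 inside the container refers to the container's own loopback interface rather than the host.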

E04: Verify persistence of daemon.json and iptables rules across Unraid reboot

  • Verifies: C04
  • Setup:
    • System: Unraid UM790 Pro with /boot/config/go configured with startup commands
    • Pre-condition: E01 fix applied AND persisted in /boot/config/go
  • Procedure:
    1. Confirm current state: daemon.json contains 172.17.0.1 DNS entry; iptables DNAT rules are active; DNS resolves fast from containers.
    2. Inspect /boot/config/go to confirm it contains the commands to write daemon.json and add iptables rules.
    3. Perform a full Unraid reboot (not just Docker restart).
    4. After reboot, inspect /etc/docker/daemon.json — confirm DNS entry is present.
    5. List active iptables rules — confirm DNAT rules are present.
    6. From a bridge-networked container, test DNS latency — confirm ~2ms resolution.
    7. From a host-networked container, test that iptables DNAT routes traffic correctly.
  • Metrics: daemon.json content after reboot, iptables rule presence after reboot, DNS latency after reboot
  • Expected outcome:
    • daemon.json contains the 172.17.0.1 DNS entry after reboot (rewritten by /boot/config/go on each boot rather than surviving on its own)
    • iptables DNAT rules are restored by /boot/config/go on each boot
    • DNS latency remains ~2ms after reboot (no regression to 8-second latency)
  • Baselines: State without /boot/config/go entries (expected: daemon.json reset to default, iptables rules lost, DNS latency returns to ~8s)
  • Dependencies: E01
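
The post-reboot checks in steps 4 and 5 could be automated along these lines. This is a sketch under assumptions: daemon_json_has_dns and dnat_rules are hypothetical helpers, the first takes the text of /etc/docker/daemon.json, and the second takes the output of iptables-save -t nat captured on the host. The source does not list the exact DNAT rules, so the helper only reports which rules jump to DNAT and leaves the comparison against the intended rules to the operator.

```python
import json


def daemon_json_has_dns(daemon_json_text, address="172.17.0.1"):
    """True if the text parses as JSON and its "dns" list contains address."""
    try:
        config = json.loads(daemon_json_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(config, dict):
        return False
    dns = config.get("dns")
    return isinstance(dns, list) and address in dns


def dnat_rules(iptables_save_text):
    """Return appended rules (-A ...) whose jump target is DNAT."""
    rules = []
    for line in iptables_save_text.splitlines():
        tokens = line.split()
        if tokens[:1] != ["-A"] or "DNAT" not in tokens:
            continue
        idx = tokens.index("DNAT")
        # Require "-j DNAT" (or "--jump DNAT") as adjacent tokens.
        if tokens[idx - 1] in ("-j", "--jump"):
            rules.append(line.strip())
    return rules
```

An empty list from dnat_rules, or False from daemon_json_has_dns, after a reboot would correspond to the baseline state described above, where the /boot/config/go entries are missing or did not run.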