Commit Graph

470 Commits

Author SHA1 Message Date
Cr4xy 636b53e70f Improve rocm-smi performance monitoring (#775)
Linux CI / run-tests (push) Successful in 2m53s
Windows CI / run-tests (push) Has been cancelled
Fix hardcoded indices for rocm-smi.
v217
2026-05-20 17:59:49 -07:00
gatkisson 59cd3b690d Added Windows performance monitoring using nvidia-smi (#773)
Linux CI / run-tests (push) Successful in 3m0s
Windows CI / run-tests (push) Has been cancelled
updates: #596, #771
2026-05-18 11:02:03 -07:00
Benson Wong 5d1e62d224 Disable auto review feature in coderabbit config 2026-05-18 10:40:21 -07:00
Benson Wong dbb869d019 Increase inactivity thresholds for stale issues
Updated stale issue and close messages to reflect new inactivity thresholds.
2026-05-17 22:52:58 -07:00
Benson Wong 26bb17e57e config.example.yaml: Improve matrix vs groups info
Validate JSON Schema / validate-schema (push) Successful in 11s
UI Tests / run-tests (push) Successful in 50s
For some use cases groups are simpler to use. Note this in the
documentation that it is still fully supported.
2026-05-17 15:59:25 -07:00
Benson Wong 2982dd3d40 ui-svelte: update link to performance discussion thread v216 2026-05-17 11:45:56 -07:00
knguyen298 79dc87f881 Add ROCm stats via rocm-smi (#767)
Linux CI / run-tests (push) Successful in 3m22s
Windows CI / run-tests (push) Has been cancelled
Add ROCm GPU stats support using `rocm-smi`.
v215
2026-05-17 07:58:26 -07:00
krzychdre b2fcc2daa1 ui-svelte: fix cached tokens total counting -1 sentinel (#760)
UI Tests / run-tests (push) Successful in 52s
Linux CI / run-tests (push) Successful in 3m3s
Windows CI / run-tests (push) Has been cancelled
The backend uses cache_tokens=-1 as a sentinel for endpoints that don't
report cache stats (embeddings, vLLM). The activity table correctly
renders these as "-", but the totals widget summed the sentinels
directly, so each such request subtracted 1 from the displayed total.

- clamp cache_tokens with Math.max(0, ...) when reducing
v214
2026-05-15 14:42:44 -07:00
cdwaage 6a9c4efc8f fix: use --loop instead of -loop for nvidia-smi (driver 540+ compat) (#759) 2026-05-15 13:20:29 -07:00
Benson Wong 0c813e44d1 ui-svelte: package updates
UI Tests / run-tests (push) Successful in 53s
Linux CI / run-tests (push) Successful in 3m3s
Windows CI / run-tests (push) Has been cancelled
v213
2026-05-14 21:56:04 -07:00
Benson Wong fe71e8a6ea proxy,ui-svelte: improve support for v1/messages and v1/responses (#758)
This improves the support for activity logging from the v1/responses and
v1/messages endpoints.

- add chat endpoint selection to Playground > Chat > Settings
- improve metrics extraction for streaming v1/messages and v1/responses
endpoints (tested with llama-server)

Fixes #742
2026-05-14 21:53:57 -07:00
Benson Wong aac7b8745a ci: set go-version-file in release workflow
Validate JSON Schema / validate-schema (push) Successful in 53s
Build Containers / build-and-push (cpu) (push) Failing after 2m44s
Build Containers / build-and-push (cuda) (push) Failing after 2m0s
Build Containers / build-and-push (cuda13) (push) Failing after 48s
Build Containers / build-and-push (intel) (push) Failing after 48s
Build Containers / build-and-push (musa) (push) Failing after 53s
Build Containers / build-and-push (rocm) (push) Failing after 49s
Build Containers / build-and-push (vulkan) (push) Failing after 45s
UI Tests / run-tests (push) Successful in 3m6s
Linux CI / run-tests (push) Successful in 5m21s
Build Containers / delete-untagged-containers (push) Has been skipped
Windows CI / run-tests (push) Has been cancelled
v212
2026-05-13 22:12:02 -07:00
Benson Wong 4e606feff0 ci: fix workflow bugs in release and go-ci
- release.yml: merge orphaned `uses:` into the Checkout step
- go-ci.yml: skip simple-responder build when restored from cache
2026-05-13 21:48:27 -07:00
Benson Wong a4b91e08cf Changes and fixes before the release (docs/small tweaks) (#750)
- update README.md with new docker instructions
- update docs/configuration.md
- update .github/workflows to have pinned action versions
- gofmt events package
- fix small bugs in CI scripts
- reduce config options for internal/perf/monitor and config. A ring buffer is used to keep 1hr of entries at max 5s granularity. For long term stats use prometheus monitoring on /metrics

Fixes #744
2026-05-13 21:18:19 -07:00
David Soušek 3e3646f9f9 perf: ignore LACT devices reporting zero VRAM (#753)
Linux CI / run-tests (push) Successful in 2m58s
Windows CI / run-tests (push) Has been cancelled
Ignore LACT devices that report zero total VRAM.

Some virtual GPUs on headless VMs report `MemTotalMB == 0` through LACT,
which makes them appear in performance monitoring despite not providing
useful memory data. Skip those entries so only usable GPU devices are
reported.

This makes performance monitoring cleaner on headless VMs with virtual
GPUs that report zero VRAM.

Co-authored-by: David Soušek <david.sousek@intelogy.co.uk>
2026-05-13 10:03:54 -07:00
rhtenhove a01afe261b ci: use manifest-aware cleanup action for multi-arch :cpu (#751)
Build Containers / build-and-push (cpu) (push) Failing after 55s
Build Containers / build-and-push (cuda) (push) Failing after 54s
Build Containers / build-and-push (cuda13) (push) Failing after 8s
Build Containers / build-and-push (intel) (push) Failing after 50s
Build Containers / build-and-push (musa) (push) Failing after 50s
Build Containers / build-and-push (vulkan) (push) Failing after 50s
Build Containers / build-and-push (rocm) (push) Failing after 57s
Build Containers / delete-untagged-containers (push) Has been skipped
actions/delete-package-versions can't see OCI manifest lists. When the
cpu build pushes a multi-arch image, the registry gets a tagged index
plus one untagged per-platform manifest per arch. The cleanup step with
`delete-only-untagged-versions: true` then deletes the per-platform
children, leaving the index dangling — `docker pull
ghcr.io/mostlygeek/llama-swap:cpu` 404s on the referenced sha.

Swap to dataaxiom/ghcr-cleanup-action, which inspects tagged manifest
lists first and excludes their children from deletion. Single-arch
backends behave the same as before.

Fix #746
2026-05-12 18:04:46 -07:00
rhtenhove 174e8562aa Multi arch cpu (#746)
Build Containers / build-and-push (cpu) (push) Failing after 2m12s
Build Containers / build-and-push (cuda) (push) Failing after 2m10s
Build Containers / build-and-push (cuda13) (push) Failing after 49s
Build Containers / build-and-push (intel) (push) Failing after 51s
Build Containers / build-and-push (musa) (push) Failing after 57s
Build Containers / build-and-push (rocm) (push) Failing after 52s
Build Containers / build-and-push (vulkan) (push) Failing after 46s
Build Containers / delete-untagged-containers (push) Has been skipped
Encountered a similar problem as in
https://github.com/mostlygeek/llama-swap/issues/709 but in my case I
only needed the :cpu version.

So decided to add the github action to build arm64 combined with the
amd64 version on the same :cpu tag. Already tested it from this fork:
ghcr.io/rhtenhove/llama-swap:cpu and it works perfectly fine.

Adding GPU support is a whole other beast, needing quite a bit more work
and isn't something I can test.
2026-05-11 21:03:48 -07:00
Abdulazez A. 085b54bc88 proxy: fix data race in /running endpoint and typo in error message (#748)
Linux CI / run-tests (push) Successful in 3m0s
Windows CI / run-tests (push) Has been cancelled
## Problem

The `/running` endpoint in `listRunningProcessesHandler` reads
`process.state` directly without holding `stateMutex`. Meanwhile,
`swapState()` writes to `process.state` while holding the write lock.
This is a data race flagged by the Go race detector.

Also fixes a minor typo: "processes was in state" → "process was in
state".

## Fix

- `proxymanager.go`: Replace `process.state` with
`process.CurrentState()` which acquires `stateMutex.RLock()` before
reading.
- `process.go`: Fix typo in error message.

## Verification

- `gofmt -l` — clean
- `go test -run "TestProcessGroup_|TestProxyManager_" ./proxy/` — all
pass
- `go test ./proxy/config/... ./proxy/cache/...
./proxy/configwatcher/...` — all pass
2026-05-11 12:49:18 -07:00
bankjaneo 2be3416baa ui: add auto theme switch mode based on system theme (#741)
UI Tests / run-tests (push) Successful in 51s
Add system theme detection with automatic switching when OS theme
changes.

- Add ThemeMode type with "light", "dark", and "system" options
- Add system theme listener using matchMedia API
- Update theme toggle to cycle through System → Light → Dark
- Add combined sun/moon icon for system theme mode
- Migrate existing theme preferences to new format
2026-05-09 20:22:18 -07:00
Benson Wong 7e3e94a08a proxy,ui: add performance monitoring with Prometheus metrics (#743)
Validate JSON Schema / validate-schema (push) Successful in 25s
UI Tests / run-tests (push) Successful in 1m16s
Linux CI / run-tests (push) Successful in 3m36s
Windows CI / run-tests (push) Has been cancelled
Add a comprehensive performance monitoring system that collects CPU, memory, swap, load average, network IO, and GPU stats. Provides both a REST API for the UI and a Prometheus /metrics endpoint.

Backend changes:
- New internal/perf package with configurable interval-based stats collection
- GPU monitoring via LACT (Unix socket) and nvidia-smi fallback on Linux
- Ring buffer (internal/ring) for time-series stat storage
- Prometheus /metrics endpoint with all system and GPU metrics
- Moved LogMonitor to internal/logmon package
- New PerformanceConfig for hot-reloadable monitoring settings
- REST /api/performance endpoint replacing SSE streaming

UI changes:
- New Performance page with real-time charts for CPU, memory, GPU, and network
- Reusable PerformanceChart component
- LLAMA_SWAP_URL environment variable support
- Improved capture dialog display

Other:
- Example Grafana dashboard for Prometheus metrics
- monitor-test standalone binary
- Config schema and example updates

fixes #596
2026-05-09 13:29:22 -07:00
Wim Vander Schelden e261745c66 proxy: add versionless API endpoint (#733)
Linux CI / run-tests (push) Successful in 3m10s
Close inactive issues / close-issues (push) Successful in 38s
Build Unified Docker Image / setup (push) Successful in 5s
Build Containers / build-and-push (cpu) (push) Failing after 26s
Build Containers / build-and-push (cuda) (push) Failing after 23s
Build Containers / build-and-push (cuda13) (push) Failing after 14s
Build Containers / build-and-push (intel) (push) Failing after 13s
Build Containers / build-and-push (musa) (push) Failing after 14s
Build Containers / build-and-push (rocm) (push) Failing after 46s
Build Containers / build-and-push (vulkan) (push) Failing after 39s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 14s
Windows CI / run-tests (push) Has been cancelled
Add versionless endpoints under v/ to support upstream peers that 
do not use the v1/ prefix.

Fixes #728.
2026-05-03 13:47:38 -07:00
Benson Wong 11b7913287 llama-swap.go: remove debounce, replace fmt.Printlns (#731)
Linux CI / run-tests (push) Successful in 3m22s
Build Unified Docker Image / setup (push) Successful in 4s
Build Containers / build-and-push (cpu) (push) Failing after 10s
Build Containers / build-and-push (cuda) (push) Failing after 10s
Build Containers / build-and-push (cuda13) (push) Failing after 10s
Build Containers / build-and-push (intel) (push) Failing after 10s
Build Containers / build-and-push (musa) (push) Failing after 10s
Build Containers / build-and-push (rocm) (push) Failing after 10s
Build Containers / build-and-push (vulkan) (push) Failing after 10s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 10s
Close inactive issues / close-issues (push) Successful in 5s
Windows CI / run-tests (push) Has been cancelled
small fixes to clean up the main(): 

- remove the debounced config reload 
- replace fmt.Println with a proxy.LogMonitor for consistency
2026-05-02 16:28:53 -07:00
Marcus c79114d40a proxy: fix logger not checking matrix for processes
Linux CI / run-tests (push) Successful in 4m50s
Build Unified Docker Image / setup (push) Successful in 2s
Build Containers / build-and-push (cpu) (push) Failing after 11s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 10s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 12s
Build Containers / build-and-push (rocm) (push) Failing after 12s
Build Containers / build-and-push (vulkan) (push) Failing after 12s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
Close inactive issues / close-issues (push) Successful in 11s
Windows CI / run-tests (push) Has been cancelled
Fix matrix not being used to search for a logger causing /logs/stream/model_name to return an error
v211
2026-05-01 16:43:20 -07:00
Benson Wong 430166d5eb proxy: fix zero duration for non streaming responses (#723)
Linux CI / run-tests (push) Successful in 4m9s
Close inactive issues / close-issues (push) Successful in 7s
Windows CI / run-tests (push) Has been cancelled
Updates #654
v210
2026-04-30 19:51:28 -07:00
Marcus 5b4beaceef fix: ?no-history flag and improve /logs monitoring docs (#721)
Linux CI / run-tests (push) Successful in 4m31s
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 12s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 14s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 26s
Build Containers / build-and-push (rocm) (push) Failing after 26s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 12s
Windows CI / run-tests (push) Has been cancelled
- improve logging documentation 
- small tweaks for edge case issues in upstream and log requests
2026-04-30 00:50:36 -07:00
Benson Wong fd3c28ffc5 Refactor Activity Page (#710)
UI Tests / run-tests (push) Successful in 1m16s
Linux CI / run-tests (push) Successful in 3m59s
Close inactive issues / close-issues (push) Successful in 8s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 13s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 13s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 12s
Build Containers / build-and-push (rocm) (push) Failing after 12s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 10s
Windows CI / run-tests (push) Has been cancelled
- inference handles to store an activity record for all inference endpoints
- add path, status code, and content type to Activities page
- toggle on/off columns no Activities page 
- add configurable capture level for inference endpoints so large binary blobs are not stored in memory
- store captures in compressed binary format
v209
2026-04-28 20:33:03 -07:00
Quentin Machu a846c4f18c config: remove hard cap on macro length (#718)
Linux CI / run-tests (push) Successful in 4m8s
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 11s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 11s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 10s
Windows CI / run-tests (push) Has been cancelled
Remove macro value limit of 1024 characters
2026-04-28 13:32:54 -07:00
Marcus 5bae33a769 ui-svelte: default theme to user preferred color scheme (#712)
UI Tests / run-tests (push) Successful in 5m37s
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 14s
Build Containers / build-and-push (cuda) (push) Failing after 12s
Build Containers / build-and-push (cuda13) (push) Failing after 12s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 13s
Build Containers / build-and-push (rocm) (push) Failing after 13s
Build Containers / build-and-push (vulkan) (push) Failing after 12s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
Simple, if not set is localStorage use whatever the user's preferred
color scheme is to start.
2026-04-27 06:44:22 -07:00
Benson Wong 8f4ff01f93 ui-svelte: make it easier to toggle panels in logs view
UI Tests / run-tests (push) Successful in 3m20s
2026-04-26 22:12:43 -07:00
Benson Wong e8d4384cd2 ui-svelte: support reasoning and reasoning_content (#708)
UI Tests / run-tests (push) Successful in 8m1s
Close inactive issues / close-issues (push) Successful in 2m22s
Build Unified Docker Image / setup (push) Successful in 4s
Build Containers / build-and-push (cpu) (push) Failing after 15s
Build Containers / build-and-push (cuda) (push) Failing after 13s
Build Containers / build-and-push (cuda13) (push) Failing after 26s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 10s
Build Unified Docker Image / build (push) Failing after 10s
Build Containers / build-and-push (intel) (push) Failing after 2m42s
Build Containers / delete-untagged-containers (push) Has been skipped
Support `reasoning` v1/chat/completion delta that vLLM uses.
v208
2026-04-26 13:11:48 -07:00
Benson Wong ce28485be2 ui-svelte: add prompt processing histogram (#705)
UI Tests / run-tests (push) Successful in 5m46s
Close inactive issues / close-issues (push) Successful in 2m23s
Build Unified Docker Image / setup (push) Successful in 2s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 11s
Build Containers / build-and-push (intel) (push) Failing after 10s
Build Containers / build-and-push (musa) (push) Failing after 10s
Build Containers / build-and-push (rocm) (push) Failing after 13s
Build Containers / build-and-push (vulkan) (push) Failing after 25s
Build Unified Docker Image / build (push) Failing after 10s
Build Containers / build-and-push (cpu) (push) Failing after 2m44s
Build Containers / delete-untagged-containers (push) Has been skipped
Activities page shows histograms for prompt processing and token generation times. 

Fix: #691
Fix: #703
2026-04-25 16:13:07 -07:00
Damir 3cd7837b1f fix: support architecture-specific download URLs in install script (#698)
Close inactive issues / close-issues (push) Successful in 4m37s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 14s
Build Containers / build-and-push (cuda) (push) Failing after 28s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Unified Docker Image / build (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 3m0s
Build Containers / delete-untagged-containers (push) Has been skipped
Just a small fix to include proper llama-swap binary when building the
arm64 architecture.
2026-04-23 18:05:33 -07:00
Benson Wong 0b31ccacc1 ui-svelte: fix histogram calculation (#695)
UI Tests / run-tests (push) Successful in 1m17s
Linux CI / run-tests (push) Successful in 4m8s
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 11s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 11s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 10s
Windows CI / run-tests (push) Has been cancelled
- Fix the histogram calculation to use server provided generation
tokens/second.
- Move histogram to Activities page where it can exist with the rest of
the token metrics

Fixes #681
v206 v207
2026-04-22 23:42:39 -07:00
Bryan Gahagan 5938dbee8f Push unified docker images on scheduled runs (#694)
Fixes #693
2026-04-22 20:46:51 -07:00
Benson Wong 66639e83f7 proxy: replace fsnotify with stat-poll watcher and add SIGHUP reload (#685)
Linux CI / run-tests (push) Successful in 4m9s
Close inactive issues / close-issues (push) Successful in 6s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 11s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 12s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 10s
Windows CI / run-tests (push) Has been cancelled
The fsnotify-based config watcher does not work reliably when the config
file is bind-mounted into a Docker container as an individual file, and
mishandles k8s ConfigMap projections (atomically swapped symlinks).
Replace it with a small os.Stat-polling watcher and add SIGHUP as an
explicit reload signal.

- new proxy/configwatcher package: 2s os.Stat poller, follows symlinks,
  fires on mtime/size change and on missing -> present transitions
- SIGHUP triggers reload unconditionally (works without --watch-config)
  via the same ConfigFileChangedEvent pipeline so the UI sees identical
  state transitions
- watcher goroutine now exits cleanly on shutdown via a context
- drop github.com/fsnotify/fsnotify dependency

fixes #682
v205
2026-04-21 23:21:48 -07:00
Benson Wong 625b296720 docker/unified: add uv via pip install (#681)
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 14s
Build Containers / build-and-push (cuda) (push) Failing after 12s
Build Containers / build-and-push (cuda13) (push) Failing after 12s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 20s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 13s
Install uv after the cpp tool binaries are copied and before the
llama-swap binary, enabling `uv run` usage for Python-based inference
backends like vLLM.

- add python3-pip to runtime apt installs
- add `pip install uv --break-system-packages` after cpp installs

fixes #628

Co-authored-by: Claude <noreply@anthropic.com>
2026-04-20 20:55:51 -07:00
Benson Wong 231e62291c proxy: fix matrix race and process stop bug (#677)
Linux CI / run-tests (push) Successful in 4m15s
Close inactive issues / close-issues (push) Successful in 7s
Windows CI / run-tests (push) Has been cancelled
- matrix.go change logic to consider any proxy.Process not in
StateStopped or StateShutdown
- process.StopImmediately, and Stop() which called it had a subtle bug
where it only handled state transitions from StateReady to
StateStopping. StateStarting -> StateStopping was ignored completely.

fix: #670
v204
2026-04-20 00:21:11 -07:00
Benson Wong 57ac666598 .github/workflows: tweak push ghcr conditional (#676)
Build Unified Docker Image / setup (push) Successful in 4s
Build Containers / build-and-push (cpu) (push) Failing after 14s
Build Containers / build-and-push (cuda) (push) Failing after 12s
Build Containers / build-and-push (cuda13) (push) Failing after 12s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 12s
Build Containers / build-and-push (rocm) (push) Failing after 13s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
2026-04-19 13:56:26 -07:00
Benson Wong 69728301f5 .github/workflows: add toggle for pushing unified images to github (#672)
Close inactive issues / close-issues (push) Successful in 7s
Add ability to dispatch (manually run) unified container builds in github without push to ghcr.io.
2026-04-19 10:10:48 -07:00
Benson Wong c176fa70f1 docker/unified: add spirv-headers to fix vulkan build (#669)
Close inactive issues / close-issues (push) Successful in 7s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 12s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 13s
Build Containers / build-and-push (intel) (push) Failing after 13s
Build Containers / build-and-push (musa) (push) Failing after 13s
Build Containers / build-and-push (rocm) (push) Failing after 15s
Build Containers / build-and-push (vulkan) (push) Failing after 12s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
2026-04-18 12:18:10 -07:00
Benson Wong 5e3c646829 proxy: compress captures with zstd (#668)
Linux CI / run-tests (push) Successful in 4m4s
Windows CI / run-tests (push) Has been cancelled
The previous captures were saved uncompressed in memory. In agentic
workflows there can be many turns with each request containing the
previous context in the body with a lot of redundant data. Use zstd to
compress the request and response data before keeping a copy of memory.

Results: 

- Average Percentage Saved: 73.19%
- Average Compression Factor: ~6.77:1
v203
2026-04-17 23:29:37 -07:00
Benson Wong c3f0d43e6e proxy: fix race conditions during swap (#667)
I pointed Opus 4.7 (high effort) at proxy.ProcessGroup to identify any
race conditions in the swapping code. It found a race condition where
there is a small window in the fast path for routing a request to a
loaded model. There is a very small window where:

- model M1 is loaded and ready for requests
- a request, R1, for M1 comes in 
- a request, R2, for M2 comes in almost immediately after
- R1 acquires the lock, sees M1 is loaded (fast path), releases the lock
`[race window]` and the request is ready to be forwarded
- the race window occurs between the release of the lock and the request
being forwarded
  - the lock is released so requests can be handled concurrently 
- R2 comes in within the `[race window]`, acquires the lock, triggers a
model swap to M2. stopping M1
- R1 is forwarded to a model that is unloaded or in the process of
shutting down creating an error response

In deployed systems the race window is very small and doesn't happen
often. However with #635 and PR #656 I though this deserved a bit more
attention. It is not concluded that this race is the cause of #635 but
the race is likely to happen more often under sustained or high load.

AI Note: Opus 4.7 x-high effort took about an hour to write the original
patch. With the pattern discovered the fix to matrix.go was very quick.
GLM 5.1 using the previous established patterns was able to easily write
the fix for ProcessGroup.StopProcesses().

Supersedes: #656
Updates: #277, #635
2026-04-17 21:23:17 -07:00
Benson Wong f6cf9f5844 proxy: Refactor tests (#660)
Linux CI / run-tests (push) Successful in 3m54s
Close inactive issues / close-issues (push) Successful in 12s
Build Unified Docker Image / setup (push) Successful in 2s
Build Containers / build-and-push (cpu) (push) Failing after 12s
Build Containers / build-and-push (cuda) (push) Failing after 11s
Build Containers / build-and-push (cuda13) (push) Failing after 14s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 12s
Build Containers / build-and-push (rocm) (push) Failing after 12s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
Windows CI / run-tests (push) Has been cancelled
- use YAML for test configurations
- remove most uses of simple-responder, opting to use
process.testHandler

Fixes #655
2026-04-16 22:47:42 -07:00
Benson Wong 121fd93ad8 Makefile: restore linux arm64 targets
Validate JSON Schema / validate-schema (push) Successful in 27s
Linux CI / run-tests (push) Successful in 7m45s
Windows CI / run-tests (push) Has been cancelled
Close inactive issues / close-issues (push) Successful in 10s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 13s
Build Containers / build-and-push (cuda) (push) Failing after 12s
Build Containers / build-and-push (cuda13) (push) Failing after 12s
Build Containers / build-and-push (intel) (push) Failing after 12s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 11s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 11s
Fix #641
2026-04-14 22:05:39 -07:00
Benson Wong 17233e9278 docs: update configuration.md for matrix v202 2026-04-14 22:01:03 -07:00
Benson Wong 4866d16c3e README.md: update to use matrix instead of groups 2026-04-14 21:57:49 -07:00
Benson Wong 35193f82f1 proxy: add swap matrix with solver-based model swapping (#646)
Add a new swap matrix to supersede groups for running concurrent models.
The matrix uses a solver that picks the lowest cost evictions to make a
requested model available. This simple approach along with a very basic
DSL grammar can enable very complex swapping scenarios.

- add DSL parser for set expressions with & (AND), | (OR), (), +ref
- add MatrixConfig structs, validation, and topological sort for +ref
- add MatrixSolver with cost-minimizing swap decisions
- add Matrix runtime integrating solver with Process lifecycle
- integrate matrix into ProxyManager with if-branches at all endpoints
- update config.example.yaml and config-schema.json with matrix schema
- config enforces groups XOR matrix (cannot use both)

fixes #643
2026-04-14 21:55:30 -07:00
Benson Wong 40e39f7a86 ui-svelte: fix security issues (#649)
UI Tests / run-tests (push) Successful in 1m19s
Close inactive issues / close-issues (push) Successful in 12s
Build Unified Docker Image / setup (push) Successful in 3s
Build Containers / build-and-push (cpu) (push) Failing after 17s
Build Containers / build-and-push (cuda) (push) Failing after 15s
Build Containers / build-and-push (cuda13) (push) Failing after 11s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (rocm) (push) Failing after 18s
Build Containers / build-and-push (vulkan) (push) Failing after 12s
Build Containers / delete-untagged-containers (push) Has been skipped
Build Unified Docker Image / build (push) Failing after 12s
2026-04-12 16:21:31 -07:00
Benson Wong a9d840ffd7 proxy,proxy/config: restore timeouts to pre PR 619 (#648)
Build Containers / build-and-push (cpu) (push) Failing after 53s
Build Containers / build-and-push (cuda) (push) Failing after 57s
Build Containers / build-and-push (cuda13) (push) Failing after 11s
Build Containers / build-and-push (intel) (push) Failing after 11s
Build Containers / build-and-push (musa) (push) Failing after 11s
Build Containers / build-and-push (vulkan) (push) Failing after 29s
Build Containers / build-and-push (rocm) (push) Failing after 49s
goreleaser / goreleaser (push) Failing after 13s
Build Containers / delete-untagged-containers (push) Has been skipped
goreleaser / trigger-tap-update (push) Has been skipped
Reset the default ResponseHeader timeout to 0 (no timeout) which was set
to 60 seconds in PR #619.

Fixes #647
v201
2026-04-11 20:42:13 -07:00
Benson Wong 7b2b82777f docker/unified: derive rootless image from root container (#644)
Build the root image once, then derive the rootless variant from it
using a small inline Dockerfile that adds the non-root user and chowns
the writable directories. This halves the number of CI jobs (4 → 2) and
eliminates the redundant full CUDA compilation for the rootless variant.

- remove RUN_UID build arg from build-image.sh
- derive rootless image inline after root build completes
- collapse variant matrix out of unified-docker.yml
- push both root and rootless tags in a single CI job

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 22:59:54 -07:00