llama-swap

mirror of https://github.com/mostlygeek/llama-swap.git synced 2026-06-09 14:56:34 +02:00

Author	SHA1	Message	Date
Cr4xy	636b53e70f	Improve rocm-smi performance monitoring (#775) Fix hardcoded indices for rocm-smi. v217	2026-05-20 17:59:49 -07:00
gatkisson	59cd3b690d	Added Windows performance monitoring using nvidia-smi (#773) updates: #596, #771	2026-05-18 11:02:03 -07:00
Benson Wong	5d1e62d224	Disable auto review feature in coderabbit config	2026-05-18 10:40:21 -07:00
Benson Wong	dbb869d019	Increase inactivity thresholds for stale issues Updated stale issue and close messages to reflect new inactivity thresholds.	2026-05-17 22:52:58 -07:00
Benson Wong	26bb17e57e	config.example.yaml: Improve matrix vs groups info. For some use cases groups are simpler to use. Note in the documentation that groups are still fully supported.	2026-05-17 15:59:25 -07:00
Benson Wong	2982dd3d40	ui-svelte: update link to performance discussion thread v216	2026-05-17 11:45:56 -07:00
knguyen298	79dc87f881	Add ROCm stats via rocm-smi (#767) Add ROCm GPU stats support using `rocm-smi`. v215	2026-05-17 07:58:26 -07:00
krzychdre	b2fcc2daa1	ui-svelte: fix cached tokens total counting -1 sentinel (#760) The backend uses cache_tokens=-1 as a sentinel for endpoints that don't report cache stats (embeddings, vLLM). The activity table correctly renders these as "-", but the totals widget summed the sentinels directly, so each such request subtracted 1 from the displayed total. - clamp cache_tokens with Math.max(0, ...) when reducing v214	2026-05-15 14:42:44 -07:00
cdwaage	6a9c4efc8f	fix: use --loop instead of -loop for nvidia-smi (driver 540+ compat) (#759)	2026-05-15 13:20:29 -07:00
Benson Wong	0c813e44d1	ui-svelte: package updates v213	2026-05-14 21:56:04 -07:00
Benson Wong	fe71e8a6ea	proxy,ui-svelte: improve support for v1/messages and v1/responses (#758) This improves the support for activity logging from the v1/responses and v1/messages endpoints. - add chat endpoint selection to Playground > Chat > Settings - improve metrics extraction for streaming v1/messages and v1/responses endpoints (tested with llama-server) Fixes #742	2026-05-14 21:53:57 -07:00
Benson Wong	aac7b8745a	ci: set go-version-file in release workflow v212	2026-05-13 22:12:02 -07:00
Benson Wong	4e606feff0	ci: fix workflow bugs in release and go-ci - release.yml: merge orphaned `uses:` into the Checkout step - go-ci.yml: skip simple-responder build when restored from cache	2026-05-13 21:48:27 -07:00
Benson Wong	a4b91e08cf	Changes and fixes before the release (docs/small tweaks) (#750) - update README.md with new docker instructions - update docs/configuration.md - update .github/workflows to have pinned action versions - gofmt events package - fix small bugs in CI scripts - reduce config options for internal/perf/monitor and config. A ring buffer is used to keep 1hr of entries at max 5s granularity. For long term stats use prometheus monitoring on /metrics. Fixes #744	2026-05-13 21:18:19 -07:00
David Soušek	3e3646f9f9	perf: ignore LACT devices reporting zero VRAM (#753) Ignore LACT devices that report zero total VRAM. Some virtual GPUs on headless VMs report `MemTotalMB == 0` through LACT, which makes them appear in performance monitoring despite not providing useful memory data. Skip those entries so only usable GPU devices are reported. This makes performance monitoring cleaner on headless VMs with virtual GPUs that report zero VRAM. Co-authored-by: David Soušek <david.sousek@intelogy.co.uk>	2026-05-13 10:03:54 -07:00
rhtenhove	a01afe261b	ci: use manifest-aware cleanup action for multi-arch :cpu (#751) actions/delete-package-versions can't see OCI manifest lists. When the cpu build pushes a multi-arch image, the registry gets a tagged index plus one untagged per-platform manifest per arch. The cleanup step with `delete-only-untagged-versions: true` then deletes the per-platform children, leaving the index dangling: `docker pull ghcr.io/mostlygeek/llama-swap:cpu` 404s on the referenced sha. Swap to dataaxiom/ghcr-cleanup-action, which inspects tagged manifest lists first and excludes their children from deletion. Single-arch backends behave the same as before. Fix #746	2026-05-12 18:04:46 -07:00
rhtenhove	174e8562aa	Multi arch cpu (#746) Encountered a similar problem as in https://github.com/mostlygeek/llama-swap/issues/709 but in my case I only needed the :cpu version. So decided to add the github action to build arm64 combined with the amd64 version on the same :cpu tag. Already tested it from this fork: ghcr.io/rhtenhove/llama-swap:cpu and it works perfectly fine. Adding GPU support is a whole other beast, needing quite a bit more work and isn't something I can test.	2026-05-11 21:03:48 -07:00
Abdulazez A.	085b54bc88	proxy: fix data race in /running endpoint and typo in error message (#748) ## Problem The `/running` endpoint in `listRunningProcessesHandler` reads `process.state` directly without holding `stateMutex`. Meanwhile, `swapState()` writes to `process.state` while holding the write lock. This is a data race flagged by the Go race detector. Also fixes a minor typo: "processes was in state" → "process was in state". ## Fix - `proxymanager.go`: Replace `process.state` with `process.CurrentState()` which acquires `stateMutex.RLock()` before reading. - `process.go`: Fix typo in error message. ## Verification - `gofmt -l` — clean - `go test -run "TestProcessGroup_\|TestProxyManager_" ./proxy/` — all pass - `go test ./proxy/config/... ./proxy/cache/... ./proxy/configwatcher/...` — all pass	2026-05-11 12:49:18 -07:00
bankjaneo	2be3416baa	ui: add auto theme switch mode based on system theme (#741) Add system theme detection with automatic switching when OS theme changes. - Add ThemeMode type with "light", "dark", and "system" options - Add system theme listener using matchMedia API - Update theme toggle to cycle through System → Light → Dark - Add combined sun/moon icon for system theme mode - Migrate existing theme preferences to new format	2026-05-09 20:22:18 -07:00
Benson Wong	7e3e94a08a	proxy,ui: add performance monitoring with Prometheus metrics (#743) Add a comprehensive performance monitoring system that collects CPU, memory, swap, load average, network IO, and GPU stats. Provides both a REST API for the UI and a Prometheus /metrics endpoint. Backend changes: - New internal/perf package with configurable interval-based stats collection - GPU monitoring via LACT (Unix socket) and nvidia-smi fallback on Linux - Ring buffer (internal/ring) for time-series stat storage - Prometheus /metrics endpoint with all system and GPU metrics - Moved LogMonitor to internal/logmon package - New PerformanceConfig for hot-reloadable monitoring settings - REST /api/performance endpoint replacing SSE streaming UI changes: - New Performance page with real-time charts for CPU, memory, GPU, and network - Reusable PerformanceChart component - LLAMA_SWAP_URL environment variable support - Improved capture dialog display Other: - Example Grafana dashboard for Prometheus metrics - monitor-test standalone binary - Config schema and example updates fixes #596	2026-05-09 13:29:22 -07:00
Wim Vander Schelden	e261745c66	proxy: add versionless API endpoint (#733) Add versionless endpoints under v/ to support upstream peers that do not use the v1/ prefix. Fixes #728.	2026-05-03 13:47:38 -07:00
Benson Wong	11b7913287	llama-swap.go: remove debounce, replace fmt.Printlns (#731) small fixes to clean up the main(): - remove the debounced config reload - replace fmt.Println with a proxy.LogMonitor for consistency	2026-05-02 16:28:53 -07:00
Marcus	c79114d40a	proxy: fix logger not checking matrix for processes. Fix matrix not being used to search for a logger causing /logs/stream/model_name to return an error v211	2026-05-01 16:43:20 -07:00
Benson Wong	430166d5eb	proxy: fix zero duration for non streaming responses (#723) Updates #654 v210	2026-04-30 19:51:28 -07:00
Marcus	5b4beaceef	fix: ?no-history flag and improve /logs monitoring docs (#721) - improve logging documentation - small tweaks for edge case issues in upstream and log requests	2026-04-30 00:50:36 -07:00
Benson Wong	fd3c28ffc5	Refactor Activity Page (#710) - inference handles to store an activity record for all inference endpoints - add path, status code, and content type to Activities page - toggle on/off columns on Activities page - add configurable capture level for inference endpoints so large binary blobs are not stored in memory - store captures in compressed binary format v209	2026-04-28 20:33:03 -07:00
Quentin Machu	a846c4f18c	config: remove hard cap on macro length (#718) Remove macro value limit of 1024 characters	2026-04-28 13:32:54 -07:00
Marcus	5bae33a769	ui-svelte: default theme to user preferred color scheme (#712) Simple: if the theme is not set in localStorage, start with whatever the user's preferred color scheme is.	2026-04-27 06:44:22 -07:00
Benson Wong	8f4ff01f93	ui-svelte: make it easier to toggle panels in logs view	2026-04-26 22:12:43 -07:00
Benson Wong	e8d4384cd2	ui-svelte: support reasoning and reasoning_content (#708) Support `reasoning` v1/chat/completion delta that vLLM uses. v208	2026-04-26 13:11:48 -07:00
Benson Wong	ce28485be2	ui-svelte: add prompt processing histogram (#705) Activities page shows histograms for prompt processing and token generation times. Fix: #691 Fix: #703	2026-04-25 16:13:07 -07:00
Damir	3cd7837b1f	fix: support architecture-specific download URLs in install script (#698) Just a small fix to include proper llama-swap binary when building the arm64 architecture.	2026-04-23 18:05:33 -07:00
Benson Wong	0b31ccacc1	ui-svelte: fix histogram calculation (#695) - Fix the histogram calculation to use server provided generation tokens/second. - Move histogram to Activities page where it can exist with the rest of the token metrics Fixes #681 v206 v207	2026-04-22 23:42:39 -07:00
Bryan Gahagan	5938dbee8f	Push unified docker images on scheduled runs (#694) Fixes #693	2026-04-22 20:46:51 -07:00
Benson Wong	66639e83f7	proxy: replace fsnotify with stat-poll watcher and add SIGHUP reload (#685) The fsnotify-based config watcher does not work reliably when the config file is bind-mounted into a Docker container as an individual file, and mishandles k8s ConfigMap projections (atomically swapped symlinks). Replace it with a small os.Stat-polling watcher and add SIGHUP as an explicit reload signal. - new proxy/configwatcher package: 2s os.Stat poller, follows symlinks, fires on mtime/size change and on missing -> present transitions - SIGHUP triggers reload unconditionally (works without --watch-config) via the same ConfigFileChangedEvent pipeline so the UI sees identical state transitions - watcher goroutine now exits cleanly on shutdown via a context - drop github.com/fsnotify/fsnotify dependency fixes #682 v205	2026-04-21 23:21:48 -07:00
Benson Wong	625b296720	docker/unified: add uv via pip install (#681) Install uv after the cpp tool binaries are copied and before the llama-swap binary, enabling `uv run` usage for Python-based inference backends like vLLM. - add python3-pip to runtime apt installs - add `pip install uv --break-system-packages` after cpp installs fixes #628 Co-authored-by: Claude <noreply@anthropic.com>	2026-04-20 20:55:51 -07:00
Benson Wong	231e62291c	proxy: fix matrix race and process stop bug (#677) - matrix.go change logic to consider any proxy.Process not in StateStopped or StateShutdown - process.StopImmediately, and Stop() which called it had a subtle bug where it only handled state transitions from StateReady to StateStopping. StateStarting -> StateStopping was ignored completely. fix: #670 v204	2026-04-20 00:21:11 -07:00
Benson Wong	57ac666598	.github/workflows: tweak push ghcr conditional (#676)	2026-04-19 13:56:26 -07:00
Benson Wong	69728301f5	.github/workflows: add toggle for pushing unified images to github (#672) Add ability to dispatch (manually run) unified container builds in github without push to ghcr.io.	2026-04-19 10:10:48 -07:00
Benson Wong	c176fa70f1	docker/unified: add spirv-headers to fix vulkan build (#669)	2026-04-18 12:18:10 -07:00
Benson Wong	5e3c646829	proxy: compress captures with zstd (#668) The previous captures were saved uncompressed in memory. In agentic workflows there can be many turns with each request containing the previous context in the body with a lot of redundant data. Use zstd to compress the request and response data before keeping a copy in memory. Results: - Average Percentage Saved: 73.19% - Average Compression Factor: ~6.77:1 v203	2026-04-17 23:29:37 -07:00
Benson Wong	c3f0d43e6e	proxy: fix race conditions during swap (#667) I pointed Opus 4.7 (high effort) at proxy.ProcessGroup to identify any race conditions in the swapping code. It found a race condition where there is a small window in the fast path for routing a request to a loaded model. There is a very small window where: - model M1 is loaded and ready for requests - a request, R1, for M1 comes in - a request, R2, for M2 comes in almost immediately after - R1 acquires the lock, sees M1 is loaded (fast path), releases the lock `[race window]` and the request is ready to be forwarded - the race window occurs between the release of the lock and the request being forwarded - the lock is released so requests can be handled concurrently - R2 comes in within the `[race window]`, acquires the lock, triggers a model swap to M2, stopping M1 - R1 is forwarded to a model that is unloaded or in the process of shutting down, creating an error response In deployed systems the race window is very small and doesn't happen often. However with #635 and PR #656 I thought this deserved a bit more attention. It is not concluded that this race is the cause of #635 but the race is likely to happen more often under sustained or high load. AI Note: Opus 4.7 x-high effort took about an hour to write the original patch. With the pattern discovered the fix to matrix.go was very quick. GLM 5.1 using the previous established patterns was able to easily write the fix for ProcessGroup.StopProcesses(). Supersedes: #656 Updates: #277, #635	2026-04-17 21:23:17 -07:00
Benson Wong	f6cf9f5844	proxy: Refactor tests (#660) - use YAML for test configurations - remove most uses of simple-responder, opting to use process.testHandler Fixes #655	2026-04-16 22:47:42 -07:00
Benson Wong	121fd93ad8	Makefile: restore linux arm64 targets Fix #641	2026-04-14 22:05:39 -07:00
Benson Wong	17233e9278	docs: update configuration.md for matrix v202	2026-04-14 22:01:03 -07:00
Benson Wong	4866d16c3e	README.md: update to use matrix instead of groups	2026-04-14 21:57:49 -07:00
Benson Wong	35193f82f1	proxy: add swap matrix with solver-based model swapping (#646) Add a new swap matrix to supersede groups for running concurrent models. The matrix uses a solver that picks the lowest cost evictions to make a requested model available. This simple approach along with a very basic DSL grammar can enable very complex swapping scenarios. - add DSL parser for set expressions with & (AND), | (OR), (), +ref - add MatrixConfig structs, validation, and topological sort for +ref - add MatrixSolver with cost-minimizing swap decisions - add Matrix runtime integrating solver with Process lifecycle - integrate matrix into ProxyManager with if-branches at all endpoints - update config.example.yaml and config-schema.json with matrix schema - config enforces groups XOR matrix (cannot use both) fixes #643	2026-04-14 21:55:30 -07:00
Benson Wong	40e39f7a86	ui-svelte: fix security issues (#649)	2026-04-12 16:21:31 -07:00
Benson Wong	a9d840ffd7	proxy,proxy/config: restore timeouts to pre PR 619 (#648) Reset the default ResponseHeader timeout to 0 (no timeout) which was set to 60 seconds in PR #619. Fixes #647 v201	2026-04-11 20:42:13 -07:00
Benson Wong	7b2b82777f	docker/unified: derive rootless image from root container (#644 ) Build the root image once, then derive the rootless variant from it using a small inline Dockerfile that adds the non-root user and chowns the writable directories. This halves the number of CI jobs (4 → 2) and eliminates the redundant full CUDA compilation for the rootless variant. - remove RUN_UID build arg from build-image.sh - derive rootless image inline after root build completes - collapse variant matrix out of unified-docker.yml - push both root and rootless tags in a single CI job Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 22:59:54 -07:00

470 Commits