llama-swap

mirror of https://github.com/mostlygeek/llama-swap.git synced 2026-06-09 06:46:34 +02:00

Author	SHA1	Message	Date
Benson Wong	4ca9c478a2	Makefile,internal/server: various release tweaks	2026-05-29 15:27:08 -07:00
Benson Wong	02e015fa49	Introduce new routing backend (#790) This is a huge backend change that essentially started with rewriting the concurrency handling for processes and blew up to a refactor of the entire application. In short these are the improvements: Better state and life cycle management: Life cycle management of processes has always been the trickiest part of the code. Juggling mutex locks between multiple locations to reduce race conditions was complex. Too complex for my feeble brain to build a simple mental model around as llama-swap gained more features. All of that has been refactored. Most of the locks are gone, replaced with a single run() that owns all state changes. There is one place to start from now to understand and extend routing logic. The improved life cycle management makes it easier to implement more complex swap optimization strategies in the future like #727. Collation of requests: llama-swap previously handled requests and swapping in the order they came in. For example requests for models in this order ABCABC would result in 5 swaps. Now those requests are handled in this order AABBCC. The result is less time waiting for swap under a high churn request queue. This fixes #588 #612. A possible future enhancement is to support a starvation parameter so swap can be forced when models have been waiting too long. Shared base implementation for groups and swap matrix: During the refactor it became clear that much of the swapping logic was shared between these two implementations. That is not surprising considering the swap matrix was added many moons after groups. Now they share a common base and their specific swap strategies are implemented in the swapPlanner interface. Requests for bespoke or specific swapping scenarios are a common theme in the issues. Now users can implement whatever bespoke and weird swapping strategy they want in their own fork. Just ask your agent of choice to implement swapPlanner. I'll still remain more conservative on what actually lands in core llama-swap and will continue to evaluate PRs on whether the changes are good for everyone or just one specific use case. AI / Agentic Disclosure: I paid very close attention to the low level swap concurrency design and implementation. It's important to keep that essential part reliable, boring and free of surprises. Backwards compatibility was also maintained, even the one way non-exclusive group model loading behaviour that people have rightly pointed out to be a weird design decision. With the underlying swap core done the web server, api and UI sitting on top were largely ported over with Claude Code and Opus 4.7 in multiple phases. If you're curious I kept the changes in docs/newrouter-todo.md. I did several passes to make sure things weren't left behind. However, even frontier LLMs at the time of this PR still make small decisions that don't make a lot of sense. They get shit wrong all the time, just in small, subtle ways. That said, there are likely to be some new bugs introduced with this massive refactor. I'm fairly confident that there are no major architectural flaws that would cause goal seeking agents to make dumb, ugly code decisions. For a little while the legacy llama-swap will be available under cmd/legacy/llama-swap. The plan is to eventually delete that entry point as well as the proxy package. On a bit of a personal note, this PR is exciting and a bit sad for me. I hand wrote much of the original code and this PR ultimately replaces much of it. While the old code served as a good reference for the agent to implement the new stuff, it is still a bit sad to eventually delete it all.	2026-05-28 21:47:01 -07:00
Benson Wong	7e3e94a08a	proxy,ui: add performance monitoring with Prometheus metrics (#743) Add a comprehensive performance monitoring system that collects CPU, memory, swap, load average, network IO, and GPU stats. Provides both a REST API for the UI and a Prometheus /metrics endpoint. Backend changes: - New internal/perf package with configurable interval-based stats collection - GPU monitoring via LACT (Unix socket) and nvidia-smi fallback on Linux - Ring buffer (internal/ring) for time-series stat storage - Prometheus /metrics endpoint with all system and GPU metrics - Moved LogMonitor to internal/logmon package - New PerformanceConfig for hot-reloadable monitoring settings - REST /api/performance endpoint replacing SSE streaming UI changes: - New Performance page with real-time charts for CPU, memory, GPU, and network - Reusable PerformanceChart component - LLAMA_SWAP_URL environment variable support - Improved capture dialog display Other: - Example Grafana dashboard for Prometheus metrics - monitor-test standalone binary - Config schema and example updates fixes #596	2026-05-09 13:29:22 -07:00
Benson Wong	0b31ccacc1	ui-svelte: fix histogram calculation (#695) - Fix the histogram calculation to use server provided generation tokens/second. - Move histogram to Activities page where it can exist with the rest of the token metrics Fixes #681	2026-04-22 23:42:39 -07:00
Benson Wong	121fd93ad8	Makefile: restore linux arm64 targets Fix #641	2026-04-14 22:05:39 -07:00
Benson Wong	916d13f5bd	.github/workflows,docker/unified: add cuda based unified container (#597) Add Docker build scripts for a unified cuda docker container with llama-server, stable-diffusion.cpp, whisper.cpp.	2026-03-22 21:11:54 +09:00
Benson Wong	4384315b44	ui-svelte: add Svelte port of React UI (#487) Trying out svelte for the UI. The port was done by Claude Code on the iOS app w/ Opus 4.5. --- * ui: add Svelte port of React UI Port the React-based UI to Svelte 5 with the following changes: - Create new ui-svelte directory with complete Svelte 5 implementation - Use Svelte stores instead of React contexts for state management - Implement custom ResizablePanels component to replace react-resizable-panels - Port all pages: LogViewer, Models, Activity - Port all components: Header, ConnectionStatus, LogPanel, ModelsPanel, etc. - Use svelte-spa-router for client-side routing - Same build output directory (proxy/ui_dist) and base path (/ui/) - Tailwind CSS 4 with same theme configuration https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP * ui-svelte: simplify state management - Remove redundant state syncing pattern in LogPanel and ModelsPanel - Use store values directly with $ syntax instead of manual subscriptions - Consolidate duplicate title sync logic in App.svelte - Use existing syncTitleToDocument() from theme.ts https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP * ui-svelte: use idiomatic Svelte 5 patterns - Use $effect for document side effects (theme, title) instead of store subscriptions - Use class: directive for active nav links in Header - Remove SSR guards (unnecessary for client-only SPA) - Remove leaked subscription in syncThemeToDocument - Simplify theme.ts by removing sync functions https://claude.ai/code/session_01F3xXLYsd62gePVSFv7aboP * ui-svelte: fix build warnings and improve accessibility Fix Svelte build warnings and add proper accessibility support to interactive components. - add aria-labels to buttons for screen readers - implement keyboard navigation for resizable separator - suppress intentional state initialization warnings - update Makefile to use ui-svelte build directory - add peer:true to package-lock.json dependencies * ui-svelte: reorganize navigation and add log view toggle Make Models the default landing page and add view mode toggle to the Logs page with persistent state. - set Models as default route at / - move Logs to /logs route - reorder navigation: Models, Activity, Logs - add view toggle with three modes: Panels, Proxy only, Upstream only - fix horizontal overflow with width constraints	2026-01-28 21:37:29 -08:00
Benson Wong	d18dc26d01	cmd/wol-proxy: tweak logs to show what is causing wake ups (#356) fix the extra wake ups being caused by wol-proxy * cmd/wol-proxy: tweak logs to show what is causing wake ups * cmd/wol-proxy: add skip wakeup * cmd/wol-proxy: replace ticker with SSE connection * cmd/wol-proxy: increase scanner buffer size * cmd/wol-proxy: improve failure tracking	2025-10-25 11:04:31 -07:00
Benson Wong	c07179d6e2	cmd/wol-proxy: add wol-proxy (#352) add a wake-on-lan proxy for llama-swap. When the target llama-swap server is unreachable it will hold the request, send a WoL packet and proxy the request when llama-swap is available.	2025-10-20 20:55:02 -07:00
Benson Wong	9fc0431531	Clean up and Documentation (#347) [skip ci] * cmd,misc: move misc binaries to cmd/ * docs: add docs and move examples/ there * misc: remove unused misc/assets dir * docs: add configuration.md * update README with better structure Updates: #334	2025-10-19 14:53:13 -07:00
Benson Wong	caf9e98b1e	Fix race conditions in proxy.Process (#349) - Fix data races found in proxy.Process by go's race detector. - Add data race detection to the CI tests. Fixes #348	2025-10-13 16:42:49 -07:00
Benson Wong	70930e4e91	proxy: add support for user defined metadata in model configs (#333) Changes: - add Metadata key to ModelConfig - include metadata in /v1/models under meta.llamaswap key - add recursive macro substitution into Metadata - change macros at global and model level to be any scalar type Note: This is the first mostly AI generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.	2025-10-04 19:56:41 -07:00
Benson Wong	216c40b951	proxy/config: create config package and migrate configuration (#329) * proxy/config: create config package and migrate configuration The configuration is becoming more complex as llama-swap adds more advanced features. This commit moves config to its own package so it can be developed independently of the proxy package. Additionally, enforcing a public API for the configuration will allow downstream usage to be more decoupled.	2025-09-28 16:50:06 -07:00
Aaron Ang	6307bd3205	Add support for building Linux ARM64 binary in Makefile (#221)	2025-08-05 16:26:06 -07:00
Benson Wong	54c519e365	update Makefile to install ui deps	2025-06-17 09:54:01 -07:00
Benson Wong	9a3c656738	New UI (#157, #164) - Add a react UI to replace the plain HTML one. - Serve as a foundation for better GUI interactions	2025-06-16 16:45:19 -07:00
Benson Wong	d7b390df74	Add GH Action for Testing on Windows (#132) * Add windows specific test changes * Change the command line parsing library - Possible breaking changes for windows users!	2025-05-14 21:51:53 -07:00
Benson Wong	7f37bcc6eb	Improve testing around using SIGKILL (#127) * Add test for SIGKILL of process * silence TestProxyManager_RunningEndpoint debug output * Ref #125	2025-05-13 21:21:52 -07:00
Benson Wong	0815bb4cc3	Add windows to goreleaser #54	2025-02-18 17:26:43 -08:00
daschiller	7187cfe52e	add Windows build support to Makefile (#54)	2025-02-18 17:24:31 -08:00
Benson Wong	13d4552edc	Add FreeBSD/amd64 to auto built releases (#51)	2025-02-13 16:44:31 -08:00
Benson Wong	d6ca535939	tweak release tagging so it is not based on number of commits	2024-12-14 15:46:10 -08:00
Benson Wong	27302c0c02	change llama-swap to use goreleaser default ldflag values	2024-12-14 10:30:06 -08:00
Benson Wong	4c94927658	Move release to Makefile out of goreleaser - less complexity - easier - goreleaser, github, pipelines: 1... mostlygeek: 0	2024-12-14 10:16:46 -08:00
Benson Wong	22d3f1a4f9	Change versioning to use git commit counts instead of semver - less work for me - more frequent releases	2024-12-14 09:53:13 -08:00
Benson Wong	533162ce6a	add support for automatically unloading a model (#10) (#14) * Make starting upstream process on-demand (#10) * Add automatic unload of model after TTL is reached * add `ttl` configuration parameter to models in seconds, default is 0 (never unload)	2024-11-19 16:32:51 -08:00
Benson Wong	e5c909ddf7	add tests for proxy.Process	2024-11-17 20:49:14 -08:00
Benson Wong	8cf2a389d8	Refactor log implementation - use []byte instead of unnecessary string conversions - make LogManager.Broadcast private - make LogManager.GetHistory public - add tests	2024-10-31 12:16:54 -07:00
Benson Wong	ef05c05f9c	renaming to llama-swap	2024-10-04 20:21:11 -07:00
Benson Wong	e0103d1884	build simple-responder with make all	2024-10-04 12:14:10 -07:00
Benson Wong	d682589fb1	support environment variables	2024-10-04 11:55:27 -07:00
Benson Wong	aaca9d889b	add Makefile	2024-10-04 11:07:00 -07:00

32 Commits
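
The request collation described in commit 02e015fa49 (ABCABC handled as AABBCC) can be illustrated with a small sketch. This is not llama-swap's implementation: the function names `collate` and `countSwaps` are hypothetical, requests are reduced to bare model names, and a "swap" is assumed to mean a change of loaded model between consecutive requests, which is consistent with the commit's figure of 5 swaps for ABCABC.

```go
package main

import "fmt"

// collate is a hypothetical helper, not llama-swap code. It groups
// pending requests by model, keeping models in first-seen order and
// preserving the number of requests per model.
func collate(requests []string) []string {
	order := []string{}        // models in the order they first appear
	counts := map[string]int{} // pending requests per model
	for _, model := range requests {
		if counts[model] == 0 {
			order = append(order, model)
		}
		counts[model]++
	}
	out := make([]string, 0, len(requests))
	for _, model := range order {
		for i := 0; i < counts[model]; i++ {
			out = append(out, model)
		}
	}
	return out
}

// countSwaps counts how often the model changes between consecutive
// requests. The initial load is not counted (an assumption).
func countSwaps(requests []string) int {
	swaps := 0
	for i := 1; i < len(requests); i++ {
		if requests[i] != requests[i-1] {
			swaps++
		}
	}
	return swaps
}

func main() {
	arrival := []string{"A", "B", "C", "A", "B", "C"}
	collated := collate(arrival)
	fmt.Println(arrival, "swaps:", countSwaps(arrival))
	fmt.Println(collated, "swaps:", countSwaps(collated))
}
```

Under these assumptions the arrival order costs 5 swaps and the collated order costs 2. A real scheduler would also have to handle requests that arrive while a batch is being served, which is where the starvation parameter mentioned in the commit message would come in.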