llama-swap

mirror of https://github.com/mostlygeek/llama-swap.git synced 2026-06-09 06:46:34 +02:00

Author	SHA1	Message	Date
Benson Wong	f6cf9f5844	proxy: Refactor tests (#660) - use YAML for test configurations - remove most uses of simple-responder, opting to use process.testHandler Fixes #655	2026-04-16 22:47:42 -07:00
Benson Wong	d3f329f924	proxy: Improve logging performance and allow separate log streaming (#421) Replace container/ring.Ring with a custom circularBuffer that uses a single contiguous []byte slice. This fixes the original implementation, which created 10,240 ring elements instead of 10KB of storage. GetHistory is now 139x faster (145μs → 1μs) and uses 117x less memory (1.2MB → 10KB). Allocations reduced from 2 to 1 per write operation. Create a LogMonitor per proxy.Process, replacing the usage of a shared one. The buffer in LogMonitor is lazily allocated on the first call to Write and freed when the Process is stopped. This reduces unnecessary memory usage when a model is not active. The /logs/stream/{model_id} endpoint was added to stream logs from a specific process.	2025-12-18 21:49:25 -08:00
Benson Wong	00b738cd0f	Add Macro-In-Macro Support (#337) Add full macro-in-macro support so any user-defined macro can contain another one as long as it was previously declared in the configuration file. Fixes #336 Supersedes #335	2025-10-06 22:57:15 -07:00
Benson Wong	70930e4e91	proxy: add support for user-defined metadata in model configs (#333) Changes: - add Metadata key to ModelConfig - include metadata in /v1/models under meta.llamaswap key - add recursive macro substitution into Metadata - change macros at global and model level to be any scalar type Note: This is the first mostly AI-generated change to llama-swap. See #333 for notes about the workflow and approach to AI going forward.	2025-10-04 19:56:41 -07:00

4 Commits