- Login to git.wylab.me instead of ghcr.io
- Use Gitea-hosted llama.cpp-rocm base image instead of ghcr.io
- Rewrite fetch_llama_tag to use anonymous OCI registry API
- Add LS_UPSTREAM for release binary fetches on forks
- Add REGISTRY and BASE_TAG overrides for self-hosted builds
- Only build rocm platform
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build the root image once, then derive the rootless variant from it
using a small inline Dockerfile that adds the non-root user and chowns
the writable directories. This halves the number of CI jobs (4 → 2) and
eliminates the redundant full CUDA compilation for the rootless variant.
- remove RUN_UID build arg from build-image.sh
- derive rootless image inline after root build completes
- collapse variant matrix out of unified-docker.yml
- push both root and rootless tags in a single CI job
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the existing config-schema workflow to also validate
config.example.yaml against config-schema.json using check-jsonschema.
- add config.example.yaml to PR and push path triggers
- install check-jsonschema via pip
- run validation of config.example.yaml against schema
https://claude.ai/code/session_01Y1oqwE6mwNs9UTJgZRgXtG
---------
Co-authored-by: Claude <noreply@anthropic.com>
multiple fixes to vulkan build:
- use ubuntu 26.04 to be compatible with AMD 395+ (Strix halo) hardware
- add home directory in container
- fix stable-diffusion install to actually enable vulkan
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- set up a GHA scheduled job to build the container nightly
- enabling pushing a llama-swap:unified and a llama-swap:unified-Y-M-D
image to ghcr.io
- tidy up Dockerfile to use a non-root user and llama-swap as an entry
point
Add proxy routes for stable-diffusion.cpp's /sdapi/v1/txt2img,
/sdapi/v1/img2img, and /sdapi/v1/loras endpoints. POST endpoints
use proxyInferenceHandler (model in JSON body), GET /loras uses
proxyGETModelHandler (model in query param).
Update the image playground with a dual-mode UI supporting both
OpenAI and SDAPI backends. In SDAPI mode, loras are fetched first
to prime the server-side cache, and all txt2img parameters are
exposed (negative prompt, steps, cfg_scale, seed, batch_size,
clip_skip, sampler, scheduler, lora selection with multipliers).
- Add 3 sdapi route registrations in proxymanager.go
- Add sdApi.ts client with generateSdImage and fetchSdLoras
- Add SDAPI types (SdApiTxt2ImgRequest, SdApiResponse, etc.)
- Add /sdapi to vite dev proxy config
- Add backend tests for sdapi routing
- Support batch image display in gallery grid
https://claude.ai/code/session_0186MGX6NXdHVBTv2KH45fqn
---------
Co-authored-by: Claude <noreply@anthropic.com>
Add `cuda13` as a supported build architecture, targeting the
`ghcr.io/ggml-org/llama.cpp:server-cuda13` upstream base image.
The `server-cuda13` image ships with CUDA 13 libraries, providing
improved performance on recent NVIDIA hardware compared to the existing
`server-cuda` (CUDA 12) image. Users with newer GPUs (e.g., RTX
50-series) benefit from reduced model load latency and higher token
throughput.
- Add `cuda13` to the allowed architectures list in
`docker/build-container.sh`
- Add `cuda13` to the CI matrix in `.github/workflows/containers.yml` so
the container is built and pushed automatically
* .github/workflows: add UI tests and path-filter Go CI
Add ui-tests.yml workflow to run svelte type checking and vitest
on push/PR to main when ui-svelte/ files change.
- Add path filters to go-ci.yml and go-ci-windows.yml to skip
Go tests when only non-backend files change
- Filter on **/*.go, go.mod, go.sum, and Makefile
https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7
* ui-svelte: remove unused declarations in SpeechInterface
Remove unused `generatedText` state and `clearAudio` function
that caused svelte-check errors.
https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7
* .github/workflows: update Node.js to v24
Node 23 is end-of-life; bump to 24 in ui-tests.yml and release.yml.
https://claude.ai/code/session_01E6acq54D8JjuE7pczxPGT7
---------
Co-authored-by: Claude <noreply@anthropic.com>
Replace the legacy React UI with the new Svelte-based one. Introduce a Playground in the UI to quickly test out text, image, text to speech and speech to text models behind llama-swap.
Key Changes
New Svelte UI (ui-svelte/)
- Multi-tab Playground with Chat, Image Generation, Audio Transcription, and Speech interfaces
- Chat: message editing/regeneration, markdown rendering with LaTeX math support, image attachments, code syntax highlighting
- Image: size selector, download/fullscreen viewing
- Audio: transcription with peer support
- Speech: voice caching with manual refresh, download button
- Responsive mobile layout with collapsible navigation
- XSS fixes and accessibility improvements
Proxy Improvements
- Add gzip/brotli compression for UI static assets (proxy/ui_compress.go)
- Add GET /v1/audio/voices?model={model} endpoint for voice listing
- Add peer support for /v1/audio/transcriptions
* docker: add .env usage in build-container.sh
* .github,docker: add rocm, improve logging
* .github,CLAUDE.md: fix workflow and update guidelines
Update containers workflow to only push images when triggered
manually or on schedule, not on workflow file changes.
- add push trigger for workflow file changes in containers.yml
- update push condition to skip on regular push events
- update CLAUDE.md commit message guidelines
* docker: remove comma in build-container.sh
* .github,docker: improve container build workflow
Add pagination support for fetching llama.cpp tags and improve debugging.
- add build-container.sh to workflow trigger paths
- implement fetch_llama_tag() with pagination support
- replace .env with local testing instructions
- add DEBUG_ABORT_BUILD flag for testing