607 Commits

Author SHA1 Message Date
Sigbjørn Skjæret c2ba3e47a2 add sycl to check-release (#24583) 2026-06-14 09:42:26 +08:00
Xuan-Son Nguyen 57fe1f07c3 server: clean up static assets handling (#24550)
* server: clean up static assets handling

* nits

* simplify file name handling, use static file name everywhere

* cmake/ui : bundle UI assets in an archive

* ui : run prettier on post-build.js

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
2026-06-13 11:51:20 +02:00
Muhammad Salem c34b92235b fix sycl links in release notes (#24527)
* fix sycl links in release notes

* remove extra line
2026-06-13 08:37:55 +08:00
Sigbjørn Skjæret f58bad4137 ci : unbreak release harder (#24545)
* unbreak release harder

* missed one

* remove missing test for now
2026-06-12 23:49:36 +02:00
Sigbjørn Skjæret cd5044661c ci : unbreak release (#24544) 2026-06-12 23:29:49 +03:00
Aleksander Grygier f7ca93d12c ui: PWA support (#23871)
* feat: Add basic PWA support and service worker for offline caching

* feat: Vite PWA implementation WIP

* feat: Improve PWA icons generation

* feat: Add PWA workbox to server routes

* feat: Include `version.json` in static assets

* feat: Add HTTP cache headers for PWA static assets

* feat: Update app name for `apple-mobile-web-app-title`

* feat: Implement PWA versioning and automatic update detection

* chore: Update `.gitignore` files

* feat: Splash Screens

* feat: Add dark mode favicon support

* refactor: Cleanup

* fix: Use dark logo for dark splash screens

* refactor: Simplify favicons SVG code

* fix: Adjust caching and polling for reliable service worker updates

* fix: Add missing favicon entry

* fix: Align PWA service worker configuration with SvelteKit build structure

* fix: Replace hashed bundle paths with versioned static paths

* test: Add PWA tests

* ci: Add build output for unit tests

* refactor: Cleanup

* fix: Server build & release versioning

* chore: Update package-lock.json

* chore: Increase PWA cache size

* chore: Update packages

* feat: Update favicons

* refactor: Post-merge fix

* feat: support explicit build version for PWA cache busting

* fix: CI

* feat: Improve PWA Refresh Alert UI

* feat: Add toggleable build version display

* refactor: Cleanup

* feat: Add version mismatch detection and manual app reload

* refactor: replace dynamic imports with static

* refactor: Cleanup

* feat: Add safe space for `pwa-<size>.png` rendered icons

* fix: use relative paths for PWA assets to support base path deployment

* feat: add PWA mode detection via URL query parameter

* feat: Use ?cache=true for SW-cached PWA assets

* refactor: Build process cleanup

* refactor: Decouple PWA versioning and remove ?cache=true workaround

* chore: Update README logo

* feat: Include PWA Assets generation in build script

* refactor: `usePwa` hook for core layout

* fix: Relativize base vite plugin

* fix: remove unnecessary backslash escapes in test regexes

* test: update static asset paths for API Key test

* refactor: Move SvelteKit PWA Options config to constants

* ui: fix update notification never appearing

Keep the PWA hook object intact instead of destructuring needRefreshByStorage,
which freezes the reactive getter. Also exclude loading.html from PWA
precache to prevent 404 errors and broken SW installation.
2026-06-12 15:53:26 +02:00
Neo Zhang 099ea76fb4 [SYCL] Fix CI build & release for SYCL backend (#24387)
* restore SYCL build and release, remove github cache

* modify for test only

* verify the ccache is used

* remove debug code change

* rm duplicate action, update key in ccache

* add action ccache-clear after building in both ubuntu and windows

* set %NUMBER_OF_PROCESSORS% in windows build
2026-06-12 09:30:24 +03:00
Sigbjørn Skjæret 039e20a2db ci : bump komac version (#24396) 2026-06-10 09:45:20 +02:00
Sigbjørn Skjæret e25a32e98c ci : fix windows release (#24369) 2026-06-09 19:42:23 +03:00
Reese Levine 3ac3c20c96 ggml-webgpu: Add clang-format job (#24308)
* Add clang-format job

* try local formatting
2026-06-08 20:54:24 -07:00
Sigbjørn Skjæret 3f7c79d7b5 docker : bump cuda13 to 13.3.0 (#24228) 2026-06-07 08:31:58 +02:00
Daniel Bevenius 46fa662b1f ci : build-msys job slimming [no ci] (#24157)
This PR attempts to slim down the dependencies for build-msys jobs
making the same changes that we applied in whisper.cpp to reduce the
size of the github actions cache, and should also improve the run time
due to fewer dependencies that need to be installed.

I realize this is a scheduled job but I think it would still make sense
to apply these changes.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3858
2026-06-05 07:57:36 +02:00
Georgi Gerganov 4da6370d43 ci : disable ccache for msvc windows release jobs (#23911) 2026-06-03 08:05:21 +03:00
Georgi Gerganov a468b89018 ci : reduce self-hosted server workflow jobs (#24012)
Reduce the number of parallel jobs in server-self-hosted.yml by stacking
test configurations as sequential steps within a single job, following the
pattern from #23927.

- server-metal: 4 matrix jobs -> 1 job with 4 sequential test steps
- server-cuda: 2 matrix jobs -> 1 job with 2 sequential test steps
- server-kleidiai: removed unnecessary single-entry matrix
- removed unused Setup Node.js step from server-metal

Total: 7 parallel jobs -> 3 parallel jobs

Assisted-by: llama.cpp:local pi
2026-06-02 13:17:59 +03:00
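The job totals quoted in commit a468b89018 above can be checked with a short calculation. This is only an illustrative sketch in Python: the dictionary layout and the name `count_jobs` are assumptions made for this example, and the "before" count of 1 for server-kleidiai is inferred from its single-entry matrix rather than stated outright.

```python
# Parallel job counts per group, as listed in the commit message.
# The server-kleidiai "before" value is an inference (single-entry matrix).
before = {"server-metal": 4, "server-cuda": 2, "server-kleidiai": 1}
after = {"server-metal": 1, "server-cuda": 1, "server-kleidiai": 1}


def count_jobs(groups):
    """Sum the number of parallel jobs across all groups."""
    return sum(groups.values())


print(count_jobs(before), "->", count_jobs(after))  # prints "7 -> 3"
```

The test configurations themselves are not dropped: the 4 metal and 2 CUDA configurations still run, only as sequential steps inside one job each.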
Georgi Gerganov 5dcb711666 speculative : fix n_outputs_max and remove draft-simple auto-enable (#23988)
* speculative : add common_speculative_n_max helper function

Extract the speculative max-draft-size logic from server_n_outputs_max
into a reusable common_speculative_n_max() function in common/speculative.

Assisted-by: llama.cpp:local pi

* cont : draft context always has n_parallel outputs

* llama : log n_outputs_max

* speculative : remove draft-simple auto-enable

* ci : enable server tests on PRs
2026-06-01 22:26:58 +03:00
Georgi Gerganov e22b0de60d ci : add missing Linux label to cpu-x64-high-perf runner (#23958)
Fixes: https://github.com/ggml-org/llama.cpp/pull/23927#discussion_r3332213086

The cpu-x64-high-perf job was missing the Linux label in its runs-on
specification, causing the runner to not be discovered. All other
self-hosted Linux jobs include this label.

Assisted-by: llama.cpp:local pi
2026-06-01 10:39:59 +03:00
Eve af6528e6df ci: remove redundant or duplicate jobs (#23927)
* remove redundant apple job

openvino gpu and cpu test can share the same build and machine

Update build-rpc.yml

Update build-openvino.yml

cpu any doesn't make sense as we have an arm job already, so do high perf on both x86 and arm

remove duplicate x86 vulkan

combine backend sampling

Update server.yml

run server on arm as windows is x86

* emdawn on one machine only

* fix openvino, remove cpu tag as we don't have many x64 machines with that tag
2026-06-01 06:32:17 +03:00
Georgi Gerganov 399739d5c5 ci : limit trigger paths for the CPU workflow (#23938) 2026-05-31 19:02:47 +03:00
Georgi Gerganov 4c4e91b799 ci : update ios-xcode release job to macos-26 (#23906)
* ci : disable libcommon build from xcframework

* ocd : fix name

* ci : ios-xcode change to macos-26

* cont : pin xcode

* cont : pin xcode to minor version
2026-05-30 13:21:46 +03:00
Georgi Gerganov 337528571d ci : fix s390x release job (#23898)
* ci : fix s390x release job

* ci : multi-thread build for `ios-xcode`

* ocd : names
2026-05-30 09:21:38 +03:00
Georgi Gerganov d4204b03a5 ci : clear cache instead of "no timestamp" keys + fix macos (#23895)
* ci : ios use macos-15 again

* ci : add and test ccache-clear

* cont : fix

* cont : set permission

* cont : another permission

* cont : token

* cont : print key

* cont : bring back perms

* cont : test windows

* cont : add token

* cont : cleanup

* ci : make release jobs clean-up their ccache
2026-05-30 08:52:30 +03:00
Georgi Gerganov dc71236b6c ci : update macos release to use macos-26 runner (#23878) 2026-05-29 20:41:57 +03:00
Sigbjørn Skjæret 3ef2369551 ci : run ui publish on ubuntu-slim (#23818)
* run ui publish on self-hosted fast

* run on ubuntu-slim
2026-05-28 20:58:32 +03:00
Georgi Gerganov 445b7cef62 ci : releases use Github-hosted builds for the UI (#23823)
* ci : releases use Github-hosted builds for the UI

* cont : fix name
2026-05-28 17:50:32 +03:00
Georgi Gerganov dd1557907a ci : change Vulkan builds to Release to reduce ccache (#23820)
* ci : disable all CPU variant builds for Vulkan workflow

* cont : change cache key

* cont : change build type
2026-05-28 17:29:11 +03:00
Georgi Gerganov 491c4d7d2e ci : refactor (#23789)
* ci : separate CUDA windows workflow + fix names

* ci : rename workflow

* ci : prefix cache names with workflow name

* ci : rename build.yml -> build-cpu.yml

* ci : cache keys

* ci : fix windows cuda/hip concurrency of release workflow

* ci : fix apple cache names

* ci : add TODOs

* cont : keep just the last cache

* ci : update release concurrency to queue

* ci : move the release trigger to ubuntu-slim

* ci : hip add TODO

* cont : improve words

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-05-28 09:44:25 +03:00
Georgi Gerganov ba4dd0bc67 ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780)
* ci : move ARM jobs to 3rd-party runners + disable kleidiai release

* cont : fix deps + fix names

* ocd : fix names

* cont : fix PR links
2026-05-27 17:22:20 +03:00
Sigbjørn Skjæret 2d0656fbdd ci : bump cuda release to 13.3 (#23749) 2026-05-27 15:06:08 +03:00
Georgi Gerganov 6b4e4bd582 common : fix env names to all have LLAMA_ARG_ prefix (#23778) 2026-05-27 14:52:47 +03:00
Georgi Gerganov 9f0e4b14d2 ci : fix windows ccaches (#23777)
* ci : server windows set build type explicitly

* cont : try windows-2025

* ci : use llvm

* cont : use ninja

* cont : fix shell

* ci : set number of jobs correctly

* ci : fix windows with vulkan ccache by using llvm

* ci : server ccache only on master

* ocd : fix job names

[no release]
2026-05-27 13:54:21 +03:00
Sigbjørn Skjæret b3a739c9b6 ci : remove wasm test (#23733)
* run tests in correct build folder

* remove wasm test
2026-05-27 13:11:37 +03:00
Georgi Gerganov 0d227ec358 ci : add ccache to server builds + fix undefined sanitizer build (#23763)
* ci : fix undefined sanitizer build to use Debug build type only

* ci : ccache the server builds

* cont : remove ui dependency + reuse ccache for both ubuntu jobs

* tmp : force ccache save

* Revert "tmp : force ccache save"

This reverts commit a857b03a10.

* cont : no need for node.js
2026-05-27 11:45:12 +03:00
Georgi Gerganov 0d18aaa9d1 ci : do not allocate ccache for 3rd-party hosted runners (#23730)
* ci : do not allocate ccache for 3rd-party hosted runners

[no release]

* cont : add prints

[no ci]
[no release]
2026-05-26 20:15:01 +03:00
Georgi Gerganov 08bc21b459 ci : move [no release] check to dedicated check_release job (#23734)
* ci : move [no release] check to dedicated check_release job

Move the workflow-level `if` condition that skips builds when the commit
message contains `[no release]` into a lightweight `check_release` job.
All build jobs now depend on it via `needs` and check its output.

This ensures the skip logic is evaluated at the job level rather than at
the workflow level, which is the recommended approach for conditional jobs.

Assisted-by: llama.cpp:local pi

* cont : use `fast` runner
2026-05-26 19:49:41 +03:00
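The gating pattern described in commit 08bc21b459 above can be sketched as follows. This is a minimal model in Python, not the workflow's actual YAML: the function names `check_release` and `jobs_to_run`, the output key `should_release`, and the plain substring match are all assumptions made for illustration.

```python
NO_RELEASE_MARKER = "[no release]"


def check_release(commit_message):
    """Model of the lightweight gate job: expose a single output.

    Assumption: the marker is detected with a plain substring match.
    """
    return {"should_release": NO_RELEASE_MARKER not in commit_message}


def jobs_to_run(commit_message, build_jobs):
    """Model of build jobs that depend on the gate and check its output."""
    outputs = check_release(commit_message)
    # Each build job is skipped individually when the gate says "no".
    return [job for job in build_jobs if outputs["should_release"]]


builds = ["ubuntu-cpu", "windows-cuda"]  # hypothetical job names
print(jobs_to_run("ci : fix windows ccaches\n\n[no release]", builds))
print(jobs_to_run("ci : bump komac version", builds))
```

Under these assumptions the first call yields an empty list and the second returns both jobs, which mirrors the move from one workflow-level condition to a per-job check against the gate's output.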
Georgi Gerganov 35a74c8fb9 ci : add [no release] keyword + fix sanitizer builds (#23728)
* ci : skip release workflow on master when commit message contains [no release]

Assisted-by: llama.cpp:local pi

* ci : restrict sanitizer builds to x86_64 + fix build type

the spark is apparently too slow for some reason

* tests : fix undefined warning

[no ci]
2026-05-26 19:05:48 +03:00
Georgi Gerganov 5190c2ea8d ci : move macos jobs to the apple workflow + fix names (#23721) 2026-05-26 16:57:55 +03:00
Georgi Gerganov 3a3ed153d9 ci : remove vulkan SDK dep from webgpu job (#23718)
* ci : remove vulkan dep from webgpu build

* cont : add ccache to `ubuntu-24-webgpu-wasm`

* ci : fix name + add wasm test
2026-05-26 16:40:30 +03:00
Georgi Gerganov 678d43d720 ci : move more CPU jobs to self-hosted runners (#23715) 2026-05-26 15:37:40 +03:00
Georgi Gerganov ef41a69179 ci : move sanitizer jobs to self-hosted runners (#23713) 2026-05-26 15:22:09 +03:00
Georgi Gerganov 3dc7684f39 ci : reduce (disable SYCL and CANN builds/releases) (#23705)
* ci : reduce

[no ci]

* cont : disable sycl, cann + rename caches

[no ci]

* cont : cann

[no ci]
2026-05-26 15:21:21 +03:00
Max Krasnyansky 4bead4e30d snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23680) 2026-05-25 10:57:43 -07:00
Georgi Gerganov 302e2c2652 ci : reduce PR jobs by matching backend paths (#23675)
* ci : disable SYCL f16 builds

* ci : extract android and hip into separate workflows

* ci : move webgpu to separate workflow

* ci : move the rpc to a separate workflow

* ci : extract s390x and ppcl jobs

* ci : extract opencl job into a separate workflow
2026-05-25 20:54:54 +03:00
alex-spacemit 5fdf07e33b ci : update spacemit toolchain url and enhance curl command (#23642)
* fix(action): update SpacemiT toolchain URL and version

Change-Id: If4cc1c738a855274103f8c3ad52daa33528acd0c

* fix(action): add -L flag to curl command for URL redirection

Change-Id: I9b6c37390f0c7a733a36308c8fb53d22d234ab06
2026-05-25 10:43:24 +02:00
Sigbjørn Skjæret 062d3115aa ci : fix pre-tokenizer-hashes check (#23651) 2026-05-25 10:41:25 +02:00
Aldehir Rojas d55fb97174 ci : install host compiler on android-ndk build (#23630) 2026-05-25 10:18:08 +03:00
Georgi Gerganov 28123a3937 ci : move most slim jobs to self-hosted runners (#23619)
* ci : remove tag from build-self-hosted.yml

* ci : slim -> self-hosted

* ci : prevent heavy CPU jobs from running on fast runners

* ci : prevent cmake pkg to run on dedicated fast runners

* ci : try to bump 3.11 -> 3.13

* ci : move lint back to 3.11

* ci : back to 3.11

* ci : add comment about UI jobs

* ci : move python requirements check to CPU runners

this job is a bit slow for a dedicated "fast" runner

* ci : add self-hosted ui workflow

* ci : fix UI naming

* tmp to check if arm64 fast is compatible with all jobs

* revert last commit
2026-05-25 08:11:19 +03:00
Georgi Gerganov 549b9d8433 ci : update build-self-hosted.yml (#23616) 2026-05-24 18:20:10 +03:00
Aldehir Rojas b22ff4b7b4 cmake/ui : refactor the build (#23352) 2026-05-23 17:08:22 -04:00
Georgi Gerganov bbce619adb cmake : add install() for impl libraries + fix apple builds (#23511)
* pi : update

* ci : fix ios build

* ci : fix android

* ci : fix apple builds

* cmake : add install() for impl libraries

Add install(TARGETS <target> LIBRARY) for all -impl libraries that were
changed from STATIC to shared (controlled by BUILD_SHARED_LIBS) in
commit bb28c1fe2. Without this, cmake --install fails to copy the shared
libraries, causing runtime errors like:

  llama-server: error while loading shared libraries: libllama-server-impl.so

Ref: https://github.com/ggml-org/llama.cpp/issues/23494#issuecomment-4512912515

Assisted-by: llama.cpp:local pi

* ci : fix xcframework build
2026-05-22 11:46:26 +03:00
Georgi Gerganov bb28c1fe24 cmake : remove STATIC from impl libraries, enable LLAMA_BUILD_APP by default (#23462)
* cmake : remove STATIC from impl libraries, allow BUILD_SHARED_LIBS control

Remove explicit STATIC from all -impl libraries (server, cli, completion, bench,
batched-bench, fit-params, quantize, perplexity) so BUILD_SHARED_LIBS controls
shared vs static linkage.

Add WINDOWS_EXPORT_ALL_SYMBOLS ON for proper DLL export on Windows.

Assisted-by: llama.cpp:local pi

* cmake : enable LLAMA_BUILD_APP by default

Assisted-by: llama.cpp:local pi

* ci : disable app in build-cmake-pkg.yml
2026-05-21 21:13:59 +03:00