* webui: add custom CSS injection via config
register a customCSS setting in the Developer section under Custom JSON,
syncable so it rides the existing ui-config pass-through. inject the value
into a single style element in the head, reactive on the setting. lets an
operator theme a prebuilt binary through --ui-config without rebuilding,
and lets a user set it from the settings panel.
* ui: address review from @niutech and @allozaur, rename custom JSON key and CSS field
* ui: address review from @allozaur, move custom CSS injection to a style tag in svelte:head
* ui: inject custom CSS through a svelte action instead of a bound element
move the textContent write into a use: action on the head style node.
the action is the idiomatic way to touch a node, so the no-dom-manipulating
lint rule is satisfied without a disable. value stays text through
textContent, never parsed as HTML.
* Update tools/ui/src/lib/constants/settings-keys.ts
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* ui: address review from @allozaur, rename custom config key to customJson with migration
rename the custom config key to customJson across the type, the chat
request builder, the settings save check and the custom tools reader,
keeping the custom API param name unchanged. add a non-destructive
migration that copies the legacy custom key to customJson at startup.
only render the head style tag when custom CSS is set.
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Support `-fa auto` in llama-bench
Make the default value of `-ngl` -1, similar to other tools.
Update README with latest usage and examples
* Address review comments
* ci : disable libcommon build from xcframework
* ocd : fix name
* ci : ios-xcode change to macos-26
* cont : pin xcode
* cont : pin xcode to minor version
* vulkan: add flash attention bf16 kv support
* vulkan: bf16 FA coopmat1 support
* vulkan: bf16 FA coopmat2 support
* fix FA bf16 f32 fallback
* fix FA bf16 coopmat1 shader
* fix FA bf16 coopmat2 shader
* code cleanup
* cleanup comment change
* address feedback
* add O_TYPE for cm2 FA
* use O_TYPE for gqaStore function
* reduce BFLOAT16 ifdefs
* ci : ios use macos-15 again
* ci : add and test ccache-clear
* cont : fix
* cont : set permission
* cont : another permission
* cont : token
* cont : print key
* cont : bring back perms
* cont : test windows
* cont : add token
* cont : cleanup
* ci : make release jobs clean-up their ccache
After #23007 reclassified integrated CUDA/HIP devices as IGPU, the device
selection logic dropped the local iGPU whenever any RPC server was added,
because RPC devices made `model->devices` non-empty. On systems where the
"iGPU" is the main compute device (e.g. Strix Halo with 128 GiB of unified
memory), this caused all tensors to be allocated on the RPC peer alone and
model loading to fail.
Gate the iGPU inclusion on `gpus.empty()` instead, so RPC peers no longer
suppress the local iGPU.
closes: #23858
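
A minimal sketch of the gating change described above, not the actual loader code. The `device` struct, the `dev_type` enum, and `select_devices` are hypothetical names used only for illustration, and the ordering of RPC and GPU devices is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical device categories, for illustration only.
enum class dev_type { GPU, IGPU, RPC };

struct device {
    std::string name;
    dev_type    type;
};

// Build the final device list from everything that was discovered.
// Integrated GPUs are kept only when no discrete GPU is present.
std::vector<device> select_devices(const std::vector<device> & available) {
    std::vector<device> rpc, gpus, igpus;
    for (const device & d : available) {
        switch (d.type) {
            case dev_type::RPC:  rpc.push_back(d);   break;
            case dev_type::GPU:  gpus.push_back(d);  break;
            case dev_type::IGPU: igpus.push_back(d); break;
        }
    }

    std::vector<device> selected;
    selected.insert(selected.end(), rpc.begin(),  rpc.end());
    selected.insert(selected.end(), gpus.begin(), gpus.end());

    // Gate on `gpus.empty()`, not on `selected.empty()`: RPC devices make
    // `selected` non-empty, which would otherwise drop the local iGPU.
    if (gpus.empty()) {
        selected.insert(selected.end(), igpus.begin(), igpus.end());
    }
    return selected;
}
```

With one RPC peer and one local iGPU, this keeps both devices. With a discrete GPU present, the iGPU is still left out.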
* CUDA: Check PTX version on host side to guard PDL dispatch
Checking `__CUDA_ARCH_LIST__` alone is insufficient for JIT, as this
variable doesn't differentiate between compiling for, say, sm_90, sm_90a,
or sm_90f (i.e. forward-jittable PTX vs. arch/family-specific PTX).
Thus, one can hit a bug when compiling with
`-DCMAKE_CUDA_ARCHITECTURES="89;90a"`, where the current code would wrongly
dispatch to PDL on sm_90/sm_120 in forward-JIT mode.
This PR fixes the issue by checking `cudaFuncAttributes::ptxVersion` of
the incoming kernel at runtime. A check on ptxVersion alone is
sufficient, as the device code's version will always be >= ptxVersion (any
violation of this would be a severe bug in CUDA/nvcc), see:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-code-code-code
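
A rough sketch of the host-side guard described above, kept free of CUDA headers so it compiles as plain C++. `kernel_supports_pdl` and `PDL_MIN_PTX_VERSION` are hypothetical names, and the threshold of 90 (compute capability 9.0) is an assumption of this sketch rather than a value taken from the patch.

```cpp
// Minimum PTX version (major * 10 + minor) assumed to support PDL.
// 90 corresponds to compute capability 9.0; assumption of this sketch.
constexpr int PDL_MIN_PTX_VERSION = 90;

// `ptx_version` is expected to come from cudaFuncAttributes::ptxVersion,
// queried on the host for the kernel about to be launched, e.g.:
//
//   cudaFuncAttributes attr;
//   cudaFuncGetAttributes(&attr, kernel);
//   bool use_pdl = kernel_supports_pdl(attr.ptxVersion);
//
// When an sm_120 device JIT-compiles compute_89 PTX, ptxVersion is 89,
// so the PDL launch path is skipped even though the device is newer.
constexpr bool kernel_supports_pdl(int ptx_version) {
    return ptx_version >= PDL_MIN_PTX_VERSION;
}
```

The point of the design is that the decision follows the PTX the kernel was actually built from, not the list of architectures passed to the compiler.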
* Implement MurmurHash3 mixer for better hash distribution
Magic constants were taken from boost:
https://github.com/boostorg/container_hash/blob/2698b43803c012601e6bb1a6116e83767b97986c/include/boost/container_hash/detail/hash_mix.hpp#L19-L65
* Update ggml/src/ggml-cuda/common.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Address review comments, make seed non-zero
* Apply code-formatting
* Replace std::size_t -> size_t for consistency
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
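
For reference, a sketch of a MurmurHash3-style mixer like the one mentioned above. It uses the reference `fmix64` finalizer constants from MurmurHash3, which are not necessarily the boost constants the commit adopts, and the function name is illustrative only.

```cpp
#include <cstdint>

// MurmurHash3 64-bit finalizer (fmix64) with the reference constants.
// Each step (xor-shift, multiply by an odd constant) is invertible,
// so distinct inputs always map to distinct outputs.
constexpr uint64_t fmix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Zero is a fixed point of the mixer: every step maps 0 to 0.
static_assert(fmix64(0) == 0, "fmix64 maps 0 to 0");
```

Because zero is a fixed point, an all-zero state passes through unmixed, which is one reason a non-zero seed is generally preferable with mixers of this shape.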
When model props are fetched asynchronously from the server,
modelPropsVersion is incremented to trigger reactivity, but
only the vision effect was listening to it.