Commit Graph

  • ac7fb69bc6 ci: revert to git.wylab.me (size limit was the issue, now fixed) master ClaudeBot 2026-04-13 00:54:02 +02:00
  • 691ce13071 ci: use host IP 192.168.1.50:3000 for registry (localhost is the container) ClaudeBot 2026-04-13 00:49:00 +02:00
  • 5e7f568058 ci: push to localhost:3000 to bypass Traefik timeout on large layers ClaudeBot 2026-04-13 00:41:54 +02:00
  • 5a1dd90fe6 ci: use REGISTRY_TOKEN secret for docker login ClaudeBot 2026-04-12 19:54:32 +02:00
  • 95c35de76f ci: add packages:write permission for registry push ClaudeBot 2026-04-12 19:49:28 +02:00
  • ab2a3331b5 ci: trigger workflow detection ClaudeBot 2026-04-12 19:44:38 +02:00
  • ebae963047 ci: add Gitea ROCm server image build workflow ClaudeBot 2026-04-12 19:36:07 +02:00
  • 1e9d771e2c convert : force f16 or f32 on step3-vl conv weights (#21646) Sigbjørn Skjæret 2026-04-12 19:22:29 +02:00
  • aa4695c5e5 mtmd: add gemma 4 test (vision + audio) [no ci] (#21806) Xuan-Son Nguyen 2026-04-12 16:29:03 +02:00
  • 547765a93e mtmd: add Gemma 4 audio conformer encoder support (#21421) b8766 Stephen Cox 2026-04-13 00:15:26 +12:00
  • 9e209c5aee fix: Proper messages rendering for "Show raw output" (#21672) Aleksander Grygier 2026-04-12 13:08:11 +02:00
  • 6313acbef0 docs: add guide on how to add multimodal support (#21778) Xuan-Son Nguyen 2026-04-12 13:02:38 +02:00
  • ff5ef82786 CUDA: skip compilation of superfluous FA kernels (#21768) b8763 Johannes Gäßler 2026-04-11 18:52:11 +02:00
  • 073bb2c20b mtmd : add MERaLiON-2 multimodal audio support (#21756) b8762 Sirui He 2026-04-11 20:15:48 +08:00
  • af1127d3c4 opencl: add basic support for q5_k (#21593) b8761 shaofeiqi 2026-04-11 01:46:19 -07:00
  • 865ff06b2f TP: fix Qwen 3 Next data split (#21732) b8760 Johannes Gäßler 2026-04-11 09:23:42 +02:00
  • 2b2cd57de6 ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (#21716) b8759 Sigbjørn Skjæret 2026-04-11 08:45:00 +02:00
  • 660386f6f8 py : Bump typer to latest to fix huggingface_hub issue (#21701) Bartowski 2026-04-11 02:44:15 -04:00
  • a29e4c0b7b CUDA: also store node->src ne/nb for graph equality (#21736) b8757 Aman Gupta 2026-04-11 10:30:30 +08:00
  • b136b62cf9 fix: Fix broken structured output when using $refs in json_schema (#21699) b8756 Galunid 2026-04-11 01:26:36 +02:00
  • 81069a808a hexagon: add support for linux on snapdragon (#21707) b8755 Todor Boinovski 2026-04-10 15:57:23 -07:00
  • 9aa2807769 hexagon: improved Op queuing, buffer and cache management (#21705) b8754 Max Krasnyansky 2026-04-10 15:47:43 -07:00
  • 3fc65063d9 common : better align to the updated official gemma4 template (#21704) b8753 Aldehir Rojas 2026-04-10 16:12:53 -05:00
  • 05b3caaa48 common : add callback interface for download progress (#21735) b8752 Adrien Gallouët 2026-04-10 22:17:00 +02:00
  • e62fa13c24 model : make Gemma 4 shared-KV tail attn_k tensors optional on load (#21739) b8751 MoonRide303 2026-04-10 21:45:50 +02:00
  • bfd1f453cb ggml-webgpu: support non-square subgroup matrix configs for Intel GPUs (#21669) b8750 Rithik Sharma 2026-04-10 10:52:38 -07:00
  • e4fed9d08d ggml-webgpu: address quantization precision and backend lifecycle managment (#21521) b8749 Chen Yuan 2026-04-10 13:52:01 -04:00
  • 5dd102539b server : ignore --alias when using --models-preset (#21380) b8748 Adrien Gallouët 2026-04-10 17:42:56 +02:00
  • fb38d6f278 common : fix when loading a cached HF models with unavailable API (#21670) b8747 Adrien Gallouët 2026-04-10 16:37:46 +02:00
  • 0893f50f2d common: mark --split-mode tensor as experimental (#21684) b8746 Johannes Gäßler 2026-04-10 12:27:27 +02:00
  • f989a6e39e webui: Static build output improvements (#21667) Aleksander Grygier 2026-04-10 11:49:47 +02:00
  • d7ff074c87 common : enable reasoning budget sampler for gemma4 (#21697) b8744 Berk Idem 2026-04-10 05:49:14 -04:00
  • 3f8752b559 docs : fix broken link to ggml-openvino in OPENVINO.md (#21709) Belem Zhang 2026-04-10 15:50:08 +08:00
  • cb1117d7db Revert "codeowners : use teams (#20526)" 0cc4m/codeowners Ruben Ortlam 2026-04-10 09:17:56 +02:00
  • 7b69125331 vulkan: Support Q1_0 (#21539) b8742 Jeff Bolz 2026-04-10 01:35:27 -05:00
  • e095a482a0 common : add fluidity to the progress bar (#21671) b8741 Adrien Gallouët 2026-04-10 08:24:53 +02:00
  • 15e9242451 fix mmq gate and shmem checks 0cc4m/vulkan-flash-attention-dp4a Ruben Ortlam 2026-04-08 13:43:03 +02:00
  • 678a4315b8 readd fast paths for <8bit quants Ruben Ortlam 2026-04-08 12:34:26 +02:00
  • 76c101d854 add supported quants to FA tests Ruben Ortlam 2026-04-08 11:31:22 +02:00
  • c2f4d7aedf fixes Ruben Ortlam 2026-04-08 11:30:59 +02:00
  • 90005edeea add missing KV type quants Ruben Ortlam 2026-04-08 11:18:22 +02:00
  • 8311266867 fix SHMEM_STAGING indexing Ruben Ortlam 2026-03-20 14:32:40 +01:00
  • 2090237330 small improvements Ruben Ortlam 2026-03-19 15:09:51 +01:00
  • 57ff5bf96d use integer dot product for quantized KV flash attention Ruben Ortlam 2026-03-19 14:22:31 +01:00
  • bb254903f9 minimal device tuning 0cc4m/vulkan-im2col-opt Ruben Ortlam 2026-04-10 06:46:54 +02:00
  • e34f042154 CUDA: fuse muls (#21665) b8740 Aman Gupta 2026-04-10 10:24:09 +08:00
  • d132f22fc9 HIP: add CDNA4 (gfx950) architecture support for MI350X/MI355X (#21570) b8739 andyluo7 2026-04-09 22:13:32 +03:00
  • d6f3030047 ggml: backend-agnostic tensor parallelism (experimental) (#19378) b8738 Johannes Gäßler 2026-04-09 16:42:19 +02:00
  • 9403602b19 cap workgroups Ruben Ortlam 2026-04-09 16:42:12 +02:00
  • 009a113326 ggml : check return value of CUB calls used in argsort and top-k (they all return cudaError_t) (#21676) b8737 fairydreaming 2026-04-09 15:17:11 +02:00
  • a2a9ca8b89 vulkan: improve im2col memory write layout Ruben Ortlam 2026-04-09 14:02:26 +02:00
  • 4cabbe36e0 state 0cc4m/vulkan-async-p2p Ruben Ortlam 2026-04-09 13:00:31 +02:00
  • 9f001cae27 state Ruben Ortlam 2026-04-09 12:51:43 +02:00
  • 88335c0490 state Ruben Ortlam 2026-04-09 12:39:51 +02:00
  • c8ac02fa1b requirements : update transformers to 5.5.1 (#21617) Daniel Bevenius 2026-04-09 12:36:29 +02:00
  • 204023c897 state Ruben Ortlam 2026-04-09 12:36:15 +02:00
  • d88d722fc1 state Ruben Ortlam 2026-04-09 12:32:08 +02:00
  • 4ef9301e4d webui: add "Send message on Enter" setting (#21577) JvM 2026-04-09 12:26:27 +02:00
  • 96d9516329 state Ruben Ortlam 2026-04-09 12:25:27 +02:00
  • ddf03c6d9a common : fix ambiguous grammar rule in gemma4 (#21661) b8734 Aldehir Rojas 2026-04-09 05:25:07 -05:00
  • 26229755c5 common : simplify autoparser tagged parser rules (#21216) b8733 Aldehir Rojas 2026-04-09 05:24:20 -05:00
  • 057dba336e model: fix multimodal padding token for gemma3n/gemma4 (#21625) b8732 Xuan-Son Nguyen 2026-04-09 12:18:23 +02:00
  • 501aeed18f mtmd: support dots.ocr (#17575) b8731 Xuan-Son Nguyen 2026-04-09 12:16:38 +02:00
  • 8a108eddb4 state Ruben Ortlam 2026-04-09 12:05:15 +02:00
  • 47dde34e00 state Ruben Ortlam 2026-04-09 11:58:46 +02:00
  • 8d0e158076 state Ruben Ortlam 2026-04-09 11:51:39 +02:00
  • aade0f81dd state Ruben Ortlam 2026-04-09 11:42:50 +02:00
  • 0ec191e1d7 vocab: add gemma4 tokenizer tests, fix edge case (#21534) b8730 Piotr Wilkin (ilintar) 2026-04-09 11:41:14 +02:00
  • 243532e556 jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623) b8729 Kwa Jie Hao 2026-04-09 17:28:33 +08:00
  • 700270239d state Ruben Ortlam 2026-04-09 11:24:21 +02:00
  • ddaafa3dc1 state Ruben Ortlam 2026-04-09 11:11:17 +02:00
  • e5e0be0add state Ruben Ortlam 2026-04-09 11:00:36 +02:00
  • 5e9c635463 metal : add missing mm-id specializations for q1_0 (#21662) b8728 Georgi Gerganov 2026-04-09 10:54:00 +03:00
  • 9949ad08f6 fix: Model Selector choice sync (#21628) Aleksander Grygier 2026-04-09 09:46:27 +02:00
  • 3ee9da0e4f server : fix grammar commandline args (#21543) b8726 AUTOMATIC1111 2026-04-09 10:16:54 +03:00
  • 75511a8d7e webui: Add option to pre-encode conversation for faster next turns (#21034) Aleksander Grygier 2026-04-09 09:10:18 +02:00
  • b54cb2e3d0 sycl : add flash-attn support for head size 512 (#21654) b8724 Akarshan Biswas 2026-04-09 12:06:48 +05:30
  • 8a65a7a8ee ci: drop v5 all: composition from labeler.yml (#21627) Marxist-Leninist 2026-04-09 07:20:19 +01:00
  • 3c4eae7dc9 state Ruben Ortlam 2026-04-09 07:50:05 +02:00
  • 7e2799c8c9 state Ruben Ortlam 2026-04-09 07:40:02 +02:00
  • 8a132faaa0 vulkan: unify type macros to use Vx instead of _VECx (#21605) b8722 Ruben Ortlam 2026-04-09 07:31:51 +02:00
  • 4293919068 common : skip non-primary GGUF split files when selecting model (#21633) b8721 Adrien Gallouët 2026-04-09 07:28:06 +02:00
  • cd0722594a state Ruben Ortlam 2026-04-09 07:25:33 +02:00
  • 09cd78874e Converge implementation with export-graph-ops cross-profiler Piotr Wilkin 2026-04-07 22:01:00 +02:00
  • 728f365497 Add missing op parameters to the profiler; add support for test-backend-ops to run performance tests with exactly the tensor shapes from the run Piotr Wilkin 2026-04-03 17:41:57 +02:00
  • c907d259ae docs, pass copy details Piotr Wilkin 2026-03-29 23:35:38 +02:00
  • c293858eb0 fix mul_mat_id stats, add throughput stat, add envvar trigger, add concurrent mode fix Piotr Wilkin 2026-03-29 22:52:33 +02:00
  • 7baee997b8 fix builds, integrate vulkan profiler, fix copy events, fix export Piotr Wilkin 2026-03-29 16:52:50 +02:00
  • 92e45c8c52 Fix more missing backend stuff (and Python errors) Piotr Wilkin 2026-03-29 01:57:02 +01:00
  • dee7edea92 add second dimension to reported tensors, fix Mac build, add missing initializer to all backends Piotr Wilkin 2026-03-29 01:49:52 +01:00
  • 26459c7ede feat: cool profiler thingy Piotr Wilkin 2026-03-29 01:14:09 +01:00
  • d12cc3d1ca CUDA: also store node->src->data ptrs for equality check (#21635) b8720 Aman Gupta 2026-04-09 01:01:56 +08:00
  • d5344395d0 benchmark 0cc4m/vulkan-device-cpy-benchmark Ruben Ortlam 2026-04-08 18:26:50 +02:00
  • 2dcb7f74ed fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (#21592) b8719 RealOrko 2026-04-08 16:40:15 +01:00
  • 660600081f server: respect the ignore eos flag (#21203) b8718 Yuri Khrustalev 2026-04-08 11:12:15 -04:00
  • d9a12c82f0 vocab : remove </s> eog token if gemma4 (#21492) b8717 Aldehir Rojas 2026-04-08 09:53:06 -05:00
  • 4a05e0c566 webui : send both backend_sampling == false/true (#18781) Georgi Gerganov 2026-04-08 17:35:52 +03:00
  • e9fd96283d Propose fix a couple of typos (#21581) b8715 John Eismeier 2026-04-08 10:29:03 -04:00
  • 3ba12fed0a kv-cache : extend cache quantization checks (#21586) b8714 Erik Scholz 2026-04-08 15:08:57 +02:00
  • 5473949070 webgpu : Query for adapter support when registering WebGPU backend (#21579) b8713 Reese Levine 2026-04-08 06:08:29 -07:00