Commit Graph

  • 0b56d283bf mtmd: n_head_kv defaults to n_head (#23782) b9394 Saba Fallah 2026-05-28 16:44:36 +02:00
  • d6be3158e1 mtmd: fix gemma 4 audio rms norm eps (#23815) b9393 Xuan-Son Nguyen 2026-05-28 16:31:37 +02:00
  • dd1557907a ci : change Vulkan builds to Release to reduce ccache (#23820) Georgi Gerganov 2026-05-28 17:29:11 +03:00
  • 7fb1e70b59 arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167) b9391 Mikolaj Kucharski 2026-05-28 14:25:40 +00:00
  • d374e71e55 test-llama-archs: fix table format [no release] (#23810) Johannes Gäßler 2026-05-28 15:53:54 +02:00
  • 30af6e2b98 ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007) b9389 fl0rianr 2026-05-28 15:01:14 +02:00
  • d7be46189f mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729) b9388 redfox 2026-05-28 20:51:14 +08:00
  • bc81d47aba CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227) b9387 Jaden_Mach 2026-05-28 08:50:25 -04:00
  • 0b246862b9 server: minor tweaks to use more cpp features (#23785) b9386 Funtowicz Morgan 2026-05-28 14:00:25 +02:00
  • a919001134 hexagon: minor refresh for HMX FA and MM (#23796) Max Krasnyansky 2026-05-28 04:49:11 -07:00
  • 48e7078ee0 vulkan: fast path for walsh-hadamard transform (#23687) b9384 Jeff Bolz 2026-05-28 06:18:43 -05:00
  • bb771cbd2b chat : add Granite 4.1 chat template (#23518) b9383 Jesus Talavera 2026-05-28 13:13:33 +02:00
  • 7c48fb81ce vulkan: fix wrong index variable in inner loop (#23665) b9382 Winston Ma 2026-05-28 18:48:34 +08:00
  • 91eb8f4fa0 vulkan: Fix memory logger unsafe iterator access (#23667) b9381 Winston Ma 2026-05-28 18:46:07 +08:00
  • d205df6812 server, ui : Add support for HTTP ETags in llama-server (#23701) b9380 Markus Tavenrath 2026-05-28 20:21:24 +10:00
  • e8d2567429 docker : add ZenDNN Dockerfile (#23716) Sachin Sharma 2026-05-28 15:10:49 +05:30
  • 09e7b76c93 cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610) b9378 fairydreaming 2026-05-28 10:55:42 +02:00
  • 48e7eae41c perplexity : fix format specifier in LOG_ERR (#23788) b9377 Adrien Gallouët 2026-05-28 09:34:58 +02:00
  • c5229087a5 convert : add FP8 to Q8 conversion (#23250) ynankani 2026-05-28 07:16:17 +00:00
  • e31cdaa0eb ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841) b9375 Martin Klacer 2026-05-28 08:04:21 +01:00
  • 491c4d7d2e ci : refactor (#23789) b9374 Georgi Gerganov 2026-05-28 09:44:25 +03:00
  • 939a7dd648 Hexagon: OP_GATED_DELTA_NET K>1 support (#23531) ymcki 2026-05-28 14:05:25 +08:00
  • 8ad8aef447 opencl: OP_GATED_DELTA_NET (#23312) ymcki 2026-05-28 12:23:21 +08:00
  • f12cc6d0fa ggml-webgpu: remove legacy constants (#23672) b9371 Reese Levine 2026-05-27 14:22:33 -07:00
  • aa50b2c2ae hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647) b9370 Max Krasnyansky 2026-05-27 10:46:11 -07:00
  • c40006a62e ggml-webgpu: Fix how to dispatch WG to some ops (#23750) b9369 Masashi Yoshimura 2026-05-28 01:48:12 +09:00
  • c6e4088376 vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887) b9368 Matt Corallo 2026-05-27 15:19:23 +00:00
  • b36eefc1b3 vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541) b9367 Jeff Bolz 2026-05-27 10:18:28 -05:00
  • 837bb6b447 vulkan: add REPEAT op support for f16 to f16. (#23298) b9366 l8bloom 2026-05-27 16:59:08 +02:00
  • ba4dd0bc67 ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780) b9365 Georgi Gerganov 2026-05-27 17:22:20 +03:00
  • 617255d437 vendor : update cpp-httplib to 0.46.0 (#23650) Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-27 10:36:24 -03:00
  • 87b0a60cdd pyproject : add conversion folder and update dependencies (#23746) Sigbjørn Skjæret 2026-05-27 15:06:18 +02:00
  • fda8528aa8 CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (#23742) Oliver Simons 2026-05-27 14:21:04 +02:00
  • 2d0656fbdd ci : bump cuda release to 13.3 (#23749) Sigbjørn Skjæret 2026-05-27 14:06:08 +02:00
  • 6b4e4bd582 common : fix env names to all have LLAMA_ARG_ prefix (#23778) b9360 Georgi Gerganov 2026-05-27 14:52:47 +03:00
  • 9f0e4b14d2 ci : fix windows ccaches (#23777) Georgi Gerganov 2026-05-27 13:54:21 +03:00
  • b3a739c9b6 ci : remove wasm test (#23733) Sigbjørn Skjæret 2026-05-27 12:11:37 +02:00
  • 4d8cc0c56f vulkan: avoid preferring transfer queue on AMD UMA devices (#22455) b9357 Winston Ma 2026-05-27 17:48:40 +08:00
  • 0d227ec358 ci : add ccache to server builds + fix undefined sanitizer build (#23763) Georgi Gerganov 2026-05-27 11:45:12 +03:00
  • 1d971bba36 docs : fix duplicated "the" in granitevision and model-conversion docs (#23767) quyentonndbs 2026-05-27 15:34:06 +08:00
  • 9777256c31 convert: add MiniCPM5 tokenizer support (#23384) b9354 zhangtao2-1 2026-05-27 13:08:33 +08:00
  • 7085492c6f server : fix the log message when using SSL (#23393) b9353 Radoslav Gerganov 2026-05-27 08:06:30 +03:00
  • b4c0549a49 ggml-zendnn : fixed naming of matmul function (#20964) b9352 Vladislav 2026-05-27 01:59:35 +03:00
  • 0d18aaa9d1 ci : do not allocate ccache for 3rd-party hosted runners (#23730) b9351 Georgi Gerganov 2026-05-26 20:15:01 +03:00
  • 08bc21b459 ci : move [no release] check to dedicated check_release job (#23734) Georgi Gerganov 2026-05-26 19:49:41 +03:00
  • 35a74c8fb9 ci : add [no release] keyword + fix sanitizer builds (#23728) Georgi Gerganov 2026-05-26 19:05:48 +03:00
  • 5190c2ea8d ci : move macos jobs to the apple workflow + fix names (#23721) Georgi Gerganov 2026-05-26 16:57:55 +03:00
  • 7799d31e68 vulkan: optimize conv2d and implement coopmat1 support (#22620) Jeff Bolz 2026-05-26 08:48:05 -05:00
  • 3a3ed153d9 ci : remove vulkan SDK dep from webgpu job (#23718) Georgi Gerganov 2026-05-26 16:40:30 +03:00
  • ef66bfab68 hexagon: add support for CONCAT op (#23648) Max Krasnyansky 2026-05-26 06:20:05 -07:00
  • 678d43d720 ci : move more CPU jobs to self-hosted runners (#23715) Georgi Gerganov 2026-05-26 15:37:40 +03:00
  • ef41a69179 ci : move sanitizer jobs to self-hosted runners (#23713) Georgi Gerganov 2026-05-26 15:22:09 +03:00
  • 3dc7684f39 ci : reduce (disable SYCL and CANN builds/releases) (#23705) Georgi Gerganov 2026-05-26 15:21:21 +03:00
  • dbe9c0c8ce convert : support Gemma4ForCausalLM architecture (#23682) b9341 ghleg 2026-05-26 07:00:31 +02:00
  • 6fe90deffa models : Attach Mistral3 NVFP4 weight scales (#23629) Michael Wand 2026-05-26 00:59:59 -04:00
  • 581d020b12 SYCL: implement ggml_sycl_pool_vmm (#22862) Alexey Kopytko 2026-05-26 13:59:00 +09:00
  • 7623de11d9 tests: test-backend-ops -j <N> to run tests in parallel (#23637) Jeff Bolz 2026-05-25 23:57:56 -05:00
  • c9d98295a3 model : add support for talkie-1930-13b (#22596) Niklas Sheth 2026-05-26 00:57:38 -04:00
  • 1506d39e76 ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MUL_MAT pipeline (#23594) Masashi Yoshimura 2026-05-26 12:42:49 +09:00
  • 54121f7325 [WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling (#23457) Nikhil Jain 2026-05-25 20:32:49 -07:00
  • 192d8ae8b8 CUDA: missing PDL sync for FWHT, better fallback (#23690) b9334 Johannes Gäßler 2026-05-26 05:05:51 +02:00
  • 35c9b1f39e metal : add apple device id (#23566) b9333 forforever73 2026-05-26 02:05:16 +08:00
  • 4bead4e30d snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23680) Max Krasnyansky 2026-05-25 10:57:43 -07:00
  • 302e2c2652 ci : reduce PR jobs by matching backend paths (#23675) b9331 Georgi Gerganov 2026-05-25 20:54:54 +03:00
  • 328874d054 model: tag ffn_latent as MUL_MAT to fix buft probe (#23664) b9330 Pascal 2026-05-25 16:05:04 +02:00
  • c1f1e28d29 CUDA: add fast walsh-hadamard transform (#23615) b9329 Aman Gupta 2026-05-25 21:12:10 +08:00
  • 5a4126adc1 ui: fix stop/continue during an agentic loop (#23356) Pascal 2026-05-25 14:18:59 +02:00
  • a4d2d4ae41 convert : add compressed-tensors NVFP4 support (#21095) Michael Wand 2026-05-25 08:16:11 -04:00
  • d161ea7071 sync : ggml b9326 Georgi Gerganov 2026-05-25 12:42:28 +03:00
  • 45158f460e ggml : bump version to 0.13.0 (ggml/1510) Georgi Gerganov 2026-05-25 12:40:17 +03:00
  • 22307b3e8b sync : ggml Georgi Gerganov 2026-05-25 12:33:22 +03:00
  • ce5890b5f7 ggml : bump version to 0.12.1 (ggml/1508) Georgi Gerganov 2026-05-25 12:13:21 +03:00
  • b251f74f49 ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500) Ori Pekelman 2026-05-21 12:00:16 +00:00
  • fa97041524 ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492) Dev-X25874 2026-05-21 17:28:08 +05:30
  • ae251b5ff2 TP: fix ggml context size calculation (#22616) b9320 Johannes Gäßler 2026-05-25 11:37:25 +02:00
  • 66efd13375 ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341) b9319 Gilad S. 2026-05-25 11:33:29 +02:00
  • 6c4cbdc70b server: MTP layer kv-cache should respect draft type ctk (#23646) b9318 Aman Gupta 2026-05-25 16:46:23 +08:00
  • 5fdf07e33b ci : update spacemit toolchain url and enhance curl command (#23642) alex-spacemit 2026-05-25 16:43:24 +08:00
  • 062d3115aa ci : fix pre-tokenizer-hashes check (#23651) Sigbjørn Skjæret 2026-05-25 10:41:25 +02:00
  • 314e729347 llama : document that only one on-device state can be saved per sequence (#23520) b9315 Tim Neumann 2026-05-25 09:29:28 +02:00
  • d55fb97174 ci : install host compiler on android-ndk build (#23630) Aldehir Rojas 2026-05-25 03:18:08 -04:00
  • 826539ce59 ggml : Parallelize quant LUT init (#23595) b9313 Jeff Bolz 2026-05-25 02:15:46 -05:00
  • f3ba33ec35 address feedback 0cc4m/cuda-get-memory-contextless Ruben Ortlam 2026-05-25 08:52:58 +02:00
  • b96487645c ui: media attachments before text (#23467) Saba Fallah 2026-05-25 08:50:41 +02:00
  • 9627d0f540 vendor : update cpp-httplib to 0.45.1 (#23639) b9311 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-05-25 03:45:22 -03:00
  • baac31998a cont : another try gg/ui-fix-ci-errors Georgi Gerganov 2026-05-25 09:14:48 +03:00
  • e2ef8fe42c server: fix checkpoints creation (#22929) b9310 jacekpoplawski 2026-05-25 07:56:18 +02:00
  • 3021f0f4c5 ui : try to fix e2e demo test Georgi Gerganov 2026-05-25 08:32:44 +03:00
  • b02a677519 ui : run prettier Georgi Gerganov 2026-05-25 08:28:25 +03:00
  • 6d57c26ef8 perplexity : fix even more integer overflows (#23623) b9309 fairydreaming 2026-05-25 07:12:39 +02:00
  • 28123a3937 ci : move most slim jobs to self-hosted runners (#23619) Georgi Gerganov 2026-05-25 08:11:19 +03:00
  • 87f18f760e ci : add self-hosted ui workflow gg/ci-ui-test Georgi Gerganov 2026-05-24 22:18:31 +03:00
  • 16b648c897 ci : try ui SH gg/ci-ui-sh Georgi Gerganov 2026-05-24 21:09:13 +03:00
  • cf285e195e ci : move python requirements check to CPU runners Georgi Gerganov 2026-05-24 20:16:00 +03:00
  • 07ec9fd8d9 ci : add comment about UI jobs Georgi Gerganov 2026-05-24 20:10:36 +03:00
  • 36aa88a853 cont : move e2e to SH gg/ci-ui-self-hosted Georgi Gerganov 2026-05-24 20:00:15 +03:00
  • a85051e51c ci : try to move UI to self hosted runner Georgi Gerganov 2026-05-24 19:56:32 +03:00
  • 5a2e768430 ci : back to 3.11 Georgi Gerganov 2026-05-24 19:39:33 +03:00
  • 5a727def3d ci : move lint back to 3.11 Georgi Gerganov 2026-05-24 19:35:39 +03:00
  • f0bbb1a9ea ci : try to bump 3.11 -> 3.13 Georgi Gerganov 2026-05-24 19:24:35 +03:00