Commit Graph

  • 3686e9d643 CUDA: only support F32/F16 for GGML_OP_REPEAT (#24533) b9642 leonardHONG 2026-06-15 14:11:00 +08:00
  • 6e9007ae61 ggml-webgpu: improve i-quants mul_mat performance and speed up prefill (#24530) b9641 Masashi Yoshimura 2026-06-15 10:15:30 +09:00
  • dd4623a74f convert : fix lora base model arch retrieval (#24621) Sigbjørn Skjæret 2026-06-15 00:55:26 +02:00
  • 9ede367e6b Revert "Purge trailing spaces from grammar generation" fix-whitespace-parser-probs Piotr Wilkin 2026-06-15 00:17:53 +02:00
  • b0827ecb7d Purge trailing spaces from grammar generation Piotr Wilkin 2026-06-15 00:02:32 +02:00
  • ef8268feee fix(ui): render thinking/reasoning block content as markdown (#24611) franitel 2026-06-14 22:56:56 +02:00
  • c3fad44e50 chat: fix whitespace problems once and for all Piotr Wilkin 2026-06-14 22:40:20 +02:00
  • 5f04dc7ac3 ui: Add HEIC/HEIF image support (#24137) Nicolas Mowen 2026-06-14 12:42:16 -06:00
  • aedb2a5e9c chat: add dedicated Cohere2MoE (North Code) parser (#24615) b9637 Piotr Wilkin (ilintar) 2026-06-14 20:17:40 +02:00
  • 4c4a3e2596 Some renames to make @CISC happy :> cohere2moe-chat-parser Piotr Wilkin 2026-06-14 19:52:23 +02:00
  • 8edaca9034 docs : fix typos in CUDA-FEDORA.md and grammars/README.md (#24459) Mohammad Athar 2026-06-14 23:03:38 +05:30
  • 20c5266f8a docker: specify registry to simplify Podman builds (#24607) Alexander Batischev 2026-06-14 20:27:20 +03:00
  • fd5869fb62 UI/mobile keyboard and pwa popup fixes (#24610) Pascal 2026-06-14 18:35:00 +02:00
  • f2bb114a32 chat: add dedicated Cohere2MoE (North Code) parser Piotr Wilkin 2026-06-14 18:25:19 +02:00
  • 1fd6dfe9f3 ui : fix ui clipping in mobile due to incorrect height setup (#24605) Amos Wong 2026-06-14 22:15:51 +08:00
  • acd79d603c jinja : add count/d/e filter aliases (#24606) b9632 Sigbjørn Skjæret 2026-06-14 15:07:31 +02:00
  • 6e14286eda cli : fix not copying preserved tokens (#24258) b9631 Michael Wand 2026-06-14 02:52:15 -07:00
  • 8ed274ef46 Add cohere2moe to llama-vocab for TINY_AYA (#24601) b9630 Bartowski 2026-06-14 03:04:46 -04:00
  • 46722116b9 ci : use CUDA label for cuda backend (#24594) Sigbjørn Skjæret 2026-06-14 08:27:52 +02:00
  • c2ba3e47a2 add sycl to check-release (#24583) b9628 Sigbjørn Skjæret 2026-06-14 03:42:26 +02:00
  • 53bd47ea5b ui : fix llama-ui-embed crash when no asset dir is given (#24597) b9627 Aldehir Rojas 2026-06-13 17:53:30 -05:00
  • 4988f6e866 Add arch support for cohere2-MoE (#24260) b9626 Michael Wand 2026-06-13 10:49:00 -07:00
  • f05cf4676a jinja : fix negative step slice with start/stop values (#24580) b9625 Sigbjørn Skjæret 2026-06-13 18:28:40 +02:00
  • e8067a8b36 ui: build-time gzip compression (#24571) b9624 Xuan-Son Nguyen 2026-06-13 16:57:27 +02:00
  • 341babcf73 jinja : fix split and replace with empty first arg (#24574) b9623 Sigbjørn Skjæret 2026-06-13 16:56:59 +02:00
  • 1a7718b4c5 vulkan: support non-contig unary/glu ops (#24215) b9622 Jeff Bolz 2026-06-13 08:44:15 -05:00
  • 597b6672e8 ui: keep original file name and path (#24568) b9621 Xuan-Son Nguyen 2026-06-13 14:31:41 +02:00
  • 57fe1f07c3 server: clean up static assets handling (#24550) b9620 Xuan-Son Nguyen 2026-06-13 11:51:20 +02:00
  • d8a24ccee2 fit : wrap llama_device_memory_data (#24522) b9619 Georgi Gerganov 2026-06-13 08:09:52 +03:00
  • c34b92235b fix sycl links in release notes (#24527) Muhammad Salem 2026-06-13 03:37:55 +03:00
  • e37abd6b5f mtmd: add batching API (#24384) Xuan-Son Nguyen 2026-06-13 00:10:29 +02:00
  • f58bad4137 ci : unbreak release harder (#24545) b9616 Sigbjørn Skjæret 2026-06-12 23:49:36 +02:00
  • cd5044661c ci : unbreak release (#24544) Sigbjørn Skjæret 2026-06-12 22:29:49 +02:00
  • 3518061868 fit : wrap llama_device_memory_data gg/fit-wrap-dmd Georgi Gerganov 2026-06-12 18:12:24 +03:00
  • ebc10770ac server : fix reasoning budget WebUI precedence over model.ini (#24517) Georgi Gerganov 2026-06-12 17:59:56 +03:00
  • 3e7bd4f39a vulkan: add pipeline barriers for memcpy read operations (#23770) Ruben Ortlam 2026-06-12 16:43:50 +02:00
  • 9c1d7406b6 Revert "submit only twice for graph reuse" Ruben Ortlam 2026-06-12 16:43:11 +02:00
  • e218a39018 submit only twice for graph reuse Ruben Ortlam 2026-06-10 14:23:35 +02:00
  • ccceabc031 vulkan: capture and replay command buffers where possible Ruben Ortlam 2026-05-07 11:18:23 +02:00
  • f7ca93d12c ui: PWA support (#23871) Aleksander Grygier 2026-06-12 15:53:26 +02:00
  • 02182fc5b9 fit : avoid including llama-ext.h in fit.h (#24506) b9611 Georgi Gerganov 2026-06-12 15:57:05 +03:00
  • f532be8fac sync : ggml b9610 Georgi Gerganov 2026-06-12 15:55:01 +03:00
  • e08c226a2c ggml : bump version to 0.15.1 (ggml/1541) Georgi Gerganov 2026-06-12 15:32:00 +03:00
  • 70b54e140c vendor : update cpp-httplib to 0.47.0 (#24395) b9608 Adrien Gallouët 2026-06-12 11:34:44 +02:00
  • 6471e3c090 UI/jpeg exif orientation (#24196) Pascal 2026-06-12 10:20:27 +02:00
  • 88a39274ec spec: add EAGLE3 speculative decoding support (#18039) b9606 Ruixiang Wang 2026-06-12 09:21:06 +02:00
  • 85f99dca8b ggml: support concat for scalar types at cuda backend (#24011) b9605 ZihaoMu 2026-06-12 14:32:44 +08:00
  • 099ea76fb4 [SYCL] Fix CI build & release for SYCL backend (#24387) b9604 Neo Zhang 2026-06-12 14:30:24 +08:00
  • ba1df050f3 opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (#24319) b9603 shaofeiqi 2026-06-11 21:43:09 -07:00
  • 1593d5684d docker : support specifying the GCC version for CUDA (#24447) wencan 2026-06-12 05:12:09 +08:00
  • 4c6595503f vulkan: ifdef eMesaHoneykrisp (build fix) (#24479) b9601 Jeff Bolz 2026-06-11 13:22:17 -05:00
  • 263cc04a54 sync : ggml Georgi Gerganov 2026-06-11 19:33:33 +03:00
  • 17e59d6209 ggml : bump version to 0.15.0 (ggml/1539) Georgi Gerganov 2026-06-11 19:32:38 +03:00
  • fdc3db9b65 vulkan: add fast path for contiguous buffer transfers (#23973) Winston Ma 2026-06-11 21:46:25 +08:00
  • 1af154a76f vulkan: use medium matmul tile on Asahi Linux (#24306) Kevin Liu 2026-06-11 09:43:04 -04:00
  • 18ef86ecec server: skip unused log lines on router mode (#24463) b9596 Xuan-Son Nguyen 2026-06-11 11:36:35 +02:00
  • 1bfbdb134e vocab : adopt leading TemplateProcessing special token as BOS (#24428) o7si 2026-06-11 15:37:23 +08:00
  • 68f30663cf vocab : refactor normalizer flags into options struct, add strip_accents (#24371) b9594 o7si 2026-06-11 15:36:50 +08:00
  • db94854ff5 server : skip checkpoints beyond pos_next (#24411) Aldehir Rojas 2026-06-11 02:18:12 -05:00
  • ac4cddeb0d vendor : update LibreSSL to 4.3.2 (#24397) b9592 Adrien Gallouët 2026-06-10 22:28:03 +02:00
  • e95dae18d6 Remove padding and multiple D2D copies for MTP (#24086) b9591 Gaurav Garg 2026-06-10 23:21:16 +05:30
  • d2462f8f7a chat: fix LFM2/LFM2.5 ignoring json_schema (#24377) b9590 Tarek Dakhran 2026-06-10 14:41:41 +02:00
  • fb83cc9a07 CUDA: Fix ssm_scan_f32 data-races (#24360) b9589 Oliver Simons 2026-06-10 14:27:08 +02:00
  • 039e20a2db ci : bump komac version (#24396) Sigbjørn Skjæret 2026-06-10 09:45:20 +02:00
  • 41f049a840 Revert "speculative : fix "ngram-map-k4v" name in logging (#24253)" revert-24253-ngram-map-k-name-fix Piotr Wilkin (ilintar) 2026-06-10 09:31:42 +02:00
  • d2e22ed975 speculative : fix "ngram-map-k4v" name in logging (#24253) b9587 ddh0 2026-06-10 02:31:35 -05:00
  • 76da2450a4 webui: implement pinned conversations support (#21387) b9586 Rémy Mathieu 2026-06-09 21:33:22 +02:00
  • d73cd07674 graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357) b9585 Aarnav Pai 2026-06-09 23:16:27 +05:30
  • e25a32e98c ci : fix windows release (#24369) b9584 Sigbjørn Skjæret 2026-06-09 18:42:23 +02:00
  • 483609509d ui: add opt-in run_javascript frontend tool (#24244) Pascal 2026-06-09 18:02:31 +02:00
  • 49f3542190 mtmd: build_vit batching (#24352) Saba Fallah 2026-06-09 16:32:08 +02:00
  • 6c2cbc4e33 vulkan: disable FA mask_opt on GCN to improve performance 0cc4m/vulkan-fa-mask-opt-gcn Ruben Ortlam 2026-06-09 15:40:07 +02:00
  • b6cf9cd8fe mtmd, llama: shared backend sched xsn/mtmd_shared_sched Xuan Son Nguyen 2026-06-09 15:34:17 +02:00
  • d6d0ce8215 vulkan: reduce iq1 shared memory usage for mul_mm (#24287) b9581 Jeff Bolz 2026-06-09 06:27:38 -05:00
  • b4e3dc613b vulkan: add v_dot2_f32_f16 support in matrix-matrix multiplication and Flash Attention (#24123) b9580 Ruben Ortlam 2026-06-09 13:27:04 +02:00
  • ae735b1314 ui: Fix excessive style recalculation on hover (#24243) Nick Towle 2026-06-09 03:52:20 -07:00
  • 9682e351b8 mtmd: refactor video subproc handling (#24316) b9578 Xuan-Son Nguyen 2026-06-09 12:15:12 +02:00
  • 1e912561dd server: log prompts to directory (#22031) b9577 jacekpoplawski 2026-06-09 12:09:07 +02:00
  • efbacf8d21 ui: fix mobile chat form overflow and bust stale bundle cache (#24158) Pascal 2026-06-09 11:12:58 +02:00
  • 26021699bc ggml : add GGML_OP_COL2IM_1D (#24206) b9575 Pascal 2026-06-09 11:01:37 +02:00
  • 961e9a3e46 server : do not clear slots without unified KV cache (#24190) b9574 fiesh 2026-06-09 09:45:16 +02:00
  • f0152efe40 models : fix plamo2 attention_key/value_length regression (#24317) b9573 Sigbjørn Skjæret 2026-06-09 09:26:44 +02:00
  • fd3271e0b4 ggml-cpu : fix rms_norm_back wrong output under in-place aliasing (#24305) b9572 Yash Raj Pandey 2026-06-09 03:24:27 -04:00
  • e3471b3e73 Remove case for GGML_TYPE_Q4_K in mvvq.cu (#23528) b9571 ravel7524 2026-06-09 07:46:23 +02:00
  • 3ac3c20c96 ggml-webgpu: Add clang-format job (#24308) b9570 Reese Levine 2026-06-08 20:54:24 -07:00
  • 1e1aca09da ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants (#24225) Masashi Yoshimura 2026-06-09 07:19:56 +09:00
  • 7d2b45b4f7 mtp: support for gemma-4 E2B and E4B assistants (#24282) b9568 Max Krasnyansky 2026-06-08 13:48:52 -07:00
  • 9eb4e9dbb7 nits xsn/video_args Xuan Son Nguyen 2026-06-08 22:13:07 +02:00
  • c21dcd8bda gen docs Xuan Son Nguyen 2026-06-08 22:12:24 +02:00
  • e948fee3fd args: add --video-* CLI arguments Xuan Son Nguyen 2026-06-08 22:10:13 +02:00
  • 42a0afd594 server : do not parse when flushing http headers (#24281) b9567 Aldehir Rojas 2026-06-08 13:32:41 -05:00
  • a66d50588b graph: guard iswa kq_mask on its own buffer (#24294) b9566 Pascal 2026-06-08 19:20:28 +02:00
  • 1705d434f6 [ggml-webgpu] Handle buffer overlap / buffer aliasing for concat operator (#24000) b9565 Nikhil Jain 2026-06-08 08:07:31 -07:00
  • 3b3da01dc2 [ggml-webgpu] Implement 2D workgroups for scale, binary, and unary ops (#24044) b9564 Nikhil Jain 2026-06-08 08:07:15 -07:00
  • 3ebe862b5d docker: install ffmpeg in the released image (#24302) b9563 Xuan-Son Nguyen 2026-06-08 16:59:57 +02:00
  • de396e8790 nits (2) Xuan Son Nguyen 2026-06-08 14:05:24 +02:00
  • 2afe34a58c nits Xuan Son Nguyen 2026-06-08 14:02:29 +02:00
  • 93e126aa08 wire up input_video, accept raw base64 Xuan Son Nguyen 2026-06-08 13:59:14 +02:00
  • 7705270eec Merge branch 'master' into xsn/server_input_file_schema Xuan Son Nguyen 2026-06-08 13:43:32 +02:00
  • 8f83d6c271 mtmd : add video input support (#24269) b9562 Xuan-Son Nguyen 2026-06-08 13:40:12 +02:00