Commit Graph

  • 159d093a43 server: fix non-bound n_discard value (ctx shifting) (#24786) b9722 Xuan-Son Nguyen 2026-06-19 10:53:44 +02:00
  • 5fd2dc2c41 sync : ggml b9721 Georgi Gerganov 2026-06-19 10:18:14 +03:00
  • 1868af13ac ggml : bump version to 0.15.2 (ggml/1548) Georgi Gerganov 2026-06-19 10:14:26 +03:00
  • 5bd21b8555 pi : remove docs from system prompt (#24791) Georgi Gerganov 2026-06-19 09:34:00 +03:00
  • 80452d65b9 server : consolidate slot selection into get_available_slot (#24755) b9718 Georgi Gerganov 2026-06-19 09:22:34 +03:00
  • 8141e730f1 ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (#24753) b9717 shalinib-ibm 2026-06-19 11:25:38 +05:30
  • db52540f73 mtmd: add batching support for internvl (#24775) b9716 Xuan-Son Nguyen 2026-06-19 01:16:16 +02:00
  • 3a3edc9ac6 Ggml/cuda col2im 1d (#24417) b9715 Pascal 2026-06-18 22:23:01 +02:00
  • 40f3aafc45 server: add "X-Accel-Buffering": "no" header to streaming endpoints (#24774) b9714 Reguna 2026-06-19 04:01:24 +08:00
  • a6b3260a42 mtmd: add batching for mtmd-cli, add video tests (#24778) b9713 Xuan-Son Nguyen 2026-06-18 21:55:04 +02:00
  • 959ce58197 improve Xuan Son Nguyen 2026-06-18 19:29:43 +02:00
  • 39fffcda7b Merge branch 'master' into xsn/mtmd_ds_ocr_tiles Xuan Son Nguyen 2026-06-18 18:59:26 +02:00
  • 32eddaf2ea cmake : fix ui build with read-only source (#24752) b9712 o7si 2026-06-19 00:59:18 +08:00
  • 060ce1bf72 mtmd: refactor llava-uhd overview image handling (always use ov_img_first) (#24769) b9711 Xuan-Son Nguyen 2026-06-18 18:53:49 +02:00
  • d2c67959b3 hexagon: support for op-trace (fine-grain tracing of HVX/HMX/DMA events) (#24592) Max Krasnyansky 2026-06-18 08:35:02 -07:00
  • 7b6c5a2aed docs: fix export-lora --lora-scaled syntax [no release] (#24703) Kangjia Gao 2026-06-18 22:46:17 +08:00
  • 4ea849efc7 rm debugging printf Xuan Son Nguyen 2026-06-18 16:41:12 +02:00
  • 9158400c42 adapt to new preprocessor api Xuan Son Nguyen 2026-06-18 16:32:08 +02:00
  • ea4d61c4f5 Merge branch 'master' into xsn/mtmd_ds_ocr_tiles Xuan Son Nguyen 2026-06-18 16:20:15 +02:00
  • fe7c8b2414 server: (router) fix stopping_thread potentially hang (#24728) Xuan-Son Nguyen 2026-06-18 15:41:09 +02:00
  • e1efd0991d server: add "schema" and validation (#24150) b9707 Xuan-Son Nguyen 2026-06-18 15:40:58 +02:00
  • 08023072ef server : add last-5-seconds generation speed display (#24291) Aarni Koskela 2026-06-18 15:02:20 +03:00
  • 20832179e2 ui: provide touch accessible model selection UI (#24604) Amos Wong 2026-06-18 19:14:20 +08:00
  • 10786217e9 server : return HTTP 400 on invalid grammar (#24144) (#24154) b9704 Anuj Attri 2026-06-18 06:49:14 -04:00
  • 552258c535 server: (router) rework -hf preset repo (#24739) b9703 Xuan-Son Nguyen 2026-06-18 12:45:23 +02:00
  • 968c43891a server: fix router args not being forwarded to child instances (#24760) b9702 Xuan-Son Nguyen 2026-06-18 12:15:46 +02:00
  • 24bba7b98e mtmd: refactor preprocessor, add mtmd_image_preproc_out (#24736) b9701 Xuan-Son Nguyen 2026-06-18 12:04:39 +02:00
  • 9724f664e8 [SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719) b9700 Neo Zhang 2026-06-18 16:18:26 +08:00
  • dd69db2924 sycl : support MUL_MAT and OUT_PROD with Q1_0 (#24721) b9699 Neo Zhang 2026-06-18 16:17:37 +08:00
  • 6ec59ddaea app : enable self-update only when built with llama-install.sh (#24754) b9698 Adrien Gallouët 2026-06-18 09:57:59 +02:00
  • 32e806b9c1 ci : fix check-release message parsing (#24751) b9697 Sigbjørn Skjæret 2026-06-18 09:32:56 +02:00
  • 6f1034b32a [SYCL] support OPs: conv_2d, conv_2d_dw, conv2d_transpose (#24600) Neo Zhang 2026-06-18 14:40:03 +08:00
  • 0b73fc79fe ui: Update code formatting command in pre-commit hook (#24685) Aleksander Grygier 2026-06-18 08:33:50 +02:00
  • 4a79037b8b ci : fix Windows x64 (OpenVINO) release link (#24731) b9694 Ravi Panchumarthy 2026-06-17 23:30:08 -07:00
  • cae0a3b0b0 metal : check for BF16 support in concat kernel (#24747) b9693 Georgi Gerganov 2026-06-18 09:16:06 +03:00
  • f3e1828164 mtmd: llava_uhd should no longer use batch dim (#24732) b9692 Xuan-Son Nguyen 2026-06-17 22:40:50 +02:00
  • 2e88c49c90 ggml-cpu: Conditionally enable power11 backend based on compiler support (#24687) b9691 shalinib-ibm 2026-06-18 00:15:19 +05:30
  • 0843245cb1 metal : implement rope_back operator (#24725) b9690 Georgi Gerganov 2026-06-17 20:36:05 +03:00
  • 8d2e580632 metal : add f16 and bf16 support for concat operator (#24724) b9689 Georgi Gerganov 2026-06-17 19:38:55 +03:00
  • 4b4d13ae72 server: (router) add model management API (#23976) b9688 Xuan-Son Nguyen 2026-06-17 18:04:58 +02:00
  • 37db4fa4be improve test 0cc4m/test-backend-copy Ruben Ortlam 2026-06-17 17:42:56 +02:00
  • b4024af6c2 llama : skip main_gpu validation when no devices are available (#23405) b9687 Dev-iL 2026-06-17 17:30:26 +03:00
  • 1a2dea29b9 spec: fix segfault error on long prompts for eagle3 (#24707) b9686 Ruixiang Wang 2026-06-17 16:29:49 +02:00
  • 74a80dd9c0 [SYCL] add dev2dev memcpy by SYCL API (#24476) b9685 Neo Zhang 2026-06-17 22:21:34 +08:00
  • d1759e4156 [SYCL] Add conv_3d (#24691) b9684 Neo Zhang 2026-06-17 22:20:01 +08:00
  • e804ed3fbe tests: add backend copy test Ruben Ortlam 2026-05-25 11:07:11 +02:00
  • 42874dfd8f clean up logging and timing 0cc4m/vulkan-graph-reuse Ruben Ortlam 2026-06-17 13:47:53 +02:00
  • 71d9373b82 simplify replay submission Ruben Ortlam 2026-06-17 13:32:30 +02:00
  • 8086439a4c webui: export conversations as jsonl (#24688) Julien Chaumond 2026-06-17 13:25:47 +02:00
  • f10a92dd17 fix queue debug utils label Ruben Ortlam 2026-06-17 13:20:52 +02:00
  • 558e221b70 vulkan: record actual memory properties during buffer creation (#24326) b9682 Winston Ma 2026-06-17 17:14:48 +08:00
  • ea21e03955 Revert "cuda: reset cuda context after reading memory size (#23935)" (#24715) Ruben Ortlam 2026-06-17 10:59:35 +02:00
  • 96d2b583ea almost working Xuan Son Nguyen 2026-06-17 10:38:30 +02:00
  • d5376cf5d7 ci: fix vulkan docker images (#24595) b9680 kononnable 2026-06-17 09:43:45 +02:00
  • bae36efa30 UI : fix SSE transport detection and routing through CORS proxy. Assi… (#24500) Harapan Rachman 2026-06-17 13:26:30 +07:00
  • 51571722aa opencl: optimize mul_mat_f16_f32_l4 for decode (#24504) b9678 lhez 2026-06-16 23:21:26 -07:00
  • cda63856b8 common: update logging to enforce max_capacity and optimize queue resizing (#24490) b9677 Max Krasnyansky 2026-06-16 23:19:11 -07:00
  • 890f1a27ed openvino: OV 2026.2, context-shift, Q5_1 support, gemma4 dense/embedding, and -fa off (#24503) Zijun Yu 2026-06-17 14:11:21 +08:00
  • 58728bdbf0 sycl : Enable to support fp16 by OPs: SQR, SQRT, LOG, SIN, COS, CLAMP (#24692) b9675 Neo Zhang 2026-06-17 13:58:03 +08:00
  • ebbc1e51c1 SYCL: fix use-after-free bug with async memcpy in MoE prefill (#24676) b9674 Alexey Kopytko 2026-06-17 14:57:29 +09:00
  • 9b260fc9ef sycl: Add optional USM system allocations (#22526) b9673 Francois Dugast 2026-06-17 07:54:21 +02:00
  • 6b482fd842 fuse row into a long image Xuan Son Nguyen 2026-06-17 00:44:57 +02:00
  • c8ba3e9010 remove hacky API Xuan Son Nguyen 2026-06-16 22:39:01 +02:00
  • d4bbef8083 mtmd: deepseek-ocr v1 multi-tile dynamic resolution + unified image-preprocessors for both versions (ds-ocr v1 and v2) Saba Fallah 2026-06-15 11:57:03 +02:00
  • 74ade52741 vendor : update BoringSSL to 0.20260616.0 (#24693) b9672 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-06-16 15:24:28 -03:00
  • c1304d7b28 ui: add source toggle to mermaid and svg blocks (#24652) Pascal 2026-06-16 14:14:22 +02:00
  • 02810c7aa8 Fix and restrict NVFP4 edge-cases in llama-graph (#24331) b9670 Oliver Simons 2026-06-16 11:52:38 +02:00
  • a1824902b5 spec: add backend sampling support for eagle3 (#24655) b9669 Ruixiang Wang 2026-06-16 11:05:52 +02:00
  • 32120c10e3 vulkan: prefer host-visible memory buffers on UMA devices (#22930) b9668 Winston Ma 2026-06-16 15:36:52 +08:00
  • d5fb104293 vulkan: Support gated_delta_net with S_v=16 (#24581) b9667 Jeff Bolz 2026-06-16 02:26:57 -05:00
  • 635b65ad7a spec: add spec metrics mean acceptance length and acceptance rate per position (#24536) Ruixiang Wang 2026-06-16 09:23:09 +02:00
  • e3a74b2990 bench : add --offline (#24511) b9665 Adrien Gallouët 2026-06-16 08:26:05 +02:00
  • ac79caa7ce sycl: support reordered Q4_K/Q5_K/Q6_K MoE MUL_MAT_ID (#24452) b9664 Frosty40 2026-06-16 00:35:00 -05:00
  • fdd109883d [SYCL] Support OP EXPM1, support all UT cases of FLOOR, TRUNC, ROUND (#24363) b9663 Neo Zhang 2026-06-16 13:34:29 +08:00
  • 4196b477da sycl : Make GGML_SYCL_F16=ON the default (#23996) Todd Malsbary 2026-06-15 22:34:02 -07:00
  • ad39ccaa19 vulkan: add col2im_1d op (#24425) b9661 Pascal 2026-06-16 06:34:43 +02:00
  • 7dad2f1a17 chat : fix LFM2 tool-call parsing double-escaping (#24667) b9660 Tarek Dakhran 2026-06-15 22:10:09 +02:00
  • e36a602ba3 mtmd: fix miscounting n_tokens (#24656) b9659 Xuan-Son Nguyen 2026-06-15 18:07:14 +02:00
  • 38d546330a chat: include full unparsed prompt in debug (#24650) b9658 Piotr Wilkin (ilintar) 2026-06-15 17:33:54 +02:00
  • a1eb756c0b docs: Add instructions to install llama.cpp from conda-forge (#22219) Julien Jerphanion 2026-06-15 17:12:25 +02:00
  • fcff47bcb1 Merge branch 'master' into add-long-debug-prompt add-long-debug-prompt Piotr Wilkin (ilintar) 2026-06-15 17:05:22 +02:00
  • 581e8eca8b chat: harden peg-native tool call parsing (#24329) b9656 Pascal 2026-06-15 15:37:04 +02:00
  • 0ae3f450f0 chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes (#24653) b9655 Piotr Wilkin (ilintar) 2026-06-15 15:27:47 +02:00
  • 911b67a603 update erroneous case in PEG parser test fix-gbnf-until Piotr Wilkin 2026-06-15 15:14:52 +02:00
  • 7c7be0fbc3 cont : fix cur_buf_size init after flushing a buffer gg/ggml-alloc-refactor-meta Georgi Gerganov 2026-05-25 18:38:35 +03:00
  • e0d7afdf74 ggml : add alloc_buffer_n to buffer type interface Georgi Gerganov 2026-05-25 16:50:13 +03:00
  • e3cab403bf mtmd : add post-decode callback (#24645) b9654 Georgi Gerganov 2026-06-15 16:02:05 +03:00
  • 6786edd14a chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes Piotr Wilkin 2026-06-15 15:00:53 +02:00
  • 6787472942 chat: include full unparsed prompt in debug message on parse error Piotr Wilkin 2026-06-15 13:22:23 +02:00
  • 9dbc6621ae vulkan: support more CONCAT types (#24579) b9653 Jeff Bolz 2026-06-15 06:19:21 -05:00
  • 6eab47181c wasm : fix fallback symbol collision (#24639) b9652 Andrei 2026-06-15 00:11:59 -07:00
  • e3bb1add8c SYCL: use native subgroup size for K-quant DMMV (#21700) b9651 Katostrofik 2026-06-15 03:10:53 -04:00
  • d8a3f523c8 sycl: fix soft_max_f32 max reduction (#24451) b9650 someoneinjd 2026-06-15 15:10:12 +08:00
  • 72be44f1d2 sycl : fix reorder function; add fp32/fp16 in build script (#24578) b9649 Neo Zhang 2026-06-15 15:08:34 +08:00
  • 8872ab5467 sycl : enhance set_rows to support q1_0, mxfp4, nvfp4 (#24564) Neo Zhang 2026-06-15 15:01:40 +08:00
  • 987fbd821d [SYCL] add to support pool_1d, move pool_1d/2d code to pool.cpp/hpp (#24584) b9647 Neo Zhang 2026-06-15 15:01:07 +08:00
  • c035ff4902 [SYCL]: Remove per-allocation Level Zero runtime checks (#23399) b9646 Alexey Kopytko 2026-06-15 15:58:42 +09:00
  • 272088b9f2 metal : add repeat bf16 (#24638) b9645 Georgi Gerganov 2026-06-15 09:57:16 +03:00
  • a6dff71270 chat: fix whitespace problems once and for all (#24624) b9644 Piotr Wilkin (ilintar) 2026-06-15 08:27:10 +02:00
  • 2a6c391a5e UI/svg block rendering (#24080) Pascal 2026-06-15 08:11:36 +02:00