Commit Graph

  • c2b1518fd4 sync : ggml b9561 Georgi Gerganov 2026-06-08 12:56:07 +03:00
  • 6a1de6fbf1 ggml : bump version to 0.14.0 (ggml/1533) Georgi Gerganov 2026-06-08 12:51:59 +03:00
  • 1458c8e581 server: refactor/generalize input file schema Xuan Son Nguyen 2026-06-08 13:07:26 +02:00
  • 715b86a366 cli: fix spinner not show during prompt processing (#24283) b9559 Xuan-Son Nguyen 2026-06-08 11:11:45 +02:00
  • c74759a244 vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (#23991) b9558 Jeff Bolz 2026-06-08 03:40:37 -05:00
  • 0f7fada56b cuda: reset cuda context after reading memory size (#23935) b9557 Ruben Ortlam 2026-06-08 10:22:44 +02:00
  • 19bba67c1f HIP: add gfx1152 and gfx1153 to RDNA3.5 (#24129) b9556 Harkirat Gill 2026-06-08 02:33:23 -04:00
  • daf6bc9f2d metal : fix im2col 1D case (audio models) (#24220) b9555 Xuan-Son Nguyen 2026-06-08 08:03:18 +02:00
  • d403f00ec3 [SYCL] Update compute runtime version to 26.x in docker (#24070) b9554 Neo Zhang 2026-06-08 10:35:18 +08:00
  • 9e3b928fd8 common : relax sampler name matching (#23744) b9553 ddh0 2026-06-07 15:48:11 -05:00
  • 8a963fc10e convert : fix conversion for Mistral-Medium-3.5-128B (#24268) David Friehs 2026-06-07 21:41:39 +02:00
  • 379ac6673b kv-cache : avoid kv cells copies (#24277) b9551 Georgi Gerganov 2026-06-07 21:42:54 +03:00
  • f0156d1401 kv-cache: follow the source cache size when sharing cells (#24267) b9550 Pascal 2026-06-07 17:33:00 +02:00
  • 04eb4c446d llama : add Gemma4 MTP (#23398) b9549 Aman Gupta 2026-06-07 20:50:54 +08:00
  • 8a091c47ab spec : fix vocab compatibility check (#24256) b9548 Sigbjørn Skjæret 2026-06-07 13:43:52 +02:00
  • 465b1f0e75 arg: Skip mmproj download when user supplied mmproj (#24239) b9547 konradmb 2026-06-07 11:18:44 +02:00
  • f71af352a5 convert : fix Gemma4 with no audio encoder (#24242) Sigbjørn Skjæret 2026-06-07 08:43:05 +02:00
  • 3f7c79d7b5 docker : bump cuda13 to 13.3.0 (#24228) Sigbjørn Skjæret 2026-06-07 08:31:58 +02:00
  • 98d5e8ba8a common/chat : fix LFM2/LFM2.5 reasoning round-trip and <think> leak (#24234) b9544 Tarek Dakhran 2026-06-06 22:39:21 +02:00
  • 22634e0eee Add tensor name to JSON output cross-profiler Piotr Wilkin 2026-06-06 22:33:01 +02:00
  • 2bfe4ff9ca tentative Metal support Piotr Wilkin 2026-05-19 11:52:22 +02:00
  • 28ef941775 Add missing unrolls Piotr Wilkin 2026-05-16 15:47:06 +02:00
  • 5ef996bd6a Revert accidental change. Piotr Wilkin 2026-05-13 17:22:19 +02:00
  • 1e47576c36 Fix braces Piotr Wilkin 2026-05-13 11:09:53 +02:00
  • 1b9b3e6489 Fix FATTN profiling Piotr Wilkin 2026-05-12 23:58:28 +02:00
  • 56f349fdd7 Converge implementation with export-graph-ops Piotr Wilkin 2026-04-07 22:01:00 +02:00
  • 041605fdc9 Add missing op parameters to the profiler; add support for test-backend-ops to run performance tests with exactly the tensor shapes from the run Piotr Wilkin 2026-04-03 17:41:57 +02:00
  • 3f00bcd871 docs, pass copy details Piotr Wilkin 2026-03-29 23:35:38 +02:00
  • 61bb65d9c9 fix mul_mat_id stats, add throughput stat, add envvar trigger, add concurrent mode fix Piotr Wilkin 2026-03-29 22:52:33 +02:00
  • fbb7ceff1d fix builds, integrate vulkan profiler, fix copy events, fix export Piotr Wilkin 2026-03-29 16:52:50 +02:00
  • 2895925203 Fix more missing backend stuff (and Python errors) Piotr Wilkin 2026-03-29 01:57:02 +01:00
  • 4e927afd4c add second dimension to reported tensors, fix Mac build, add missing initializer to all backends Piotr Wilkin 2026-03-29 01:49:52 +01:00
  • 893aa72363 feat: cool profiler thingy Piotr Wilkin 2026-03-29 01:14:09 +01:00
  • 31e82494c0 mtmd: support "frame merge" for qwen-vl-based models (#21858) b9543 Xuan-Son Nguyen 2026-06-06 21:17:25 +02:00
  • 37c56c245e wip gg/pr/23398-save Georgi Gerganov 2026-06-06 16:30:41 +03:00
  • 6b80c74f28 completion : remove useless statics (#24226) b9542 Adrien Gallouët 2026-06-06 12:16:16 +02:00
  • 588f0dc2ce completion : fix format specifier in LOG_INF (#24213) b9541 Adrien Gallouët 2026-06-06 11:24:27 +02:00
  • f5c6ae1827 mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API (#23913) Xuan-Son Nguyen 2026-06-06 11:06:51 +02:00
  • 1c4a91c0f3 wip Georgi Gerganov 2026-06-06 10:48:36 +03:00
  • 5a69c97439 vulkan: check coopmat2 features before reporting support (#24186) Ruben Ortlam 2026-06-06 09:11:35 +02:00
  • 5343f4502a model : rename local n_layer_all variable (#24209) b9538 Sigbjørn Skjæret 2026-06-06 06:07:20 +02:00
  • 603300b008 context : fix off-by-one comparisons to n_gpu_layers (#24208) b9537 Sigbjørn Skjæret 2026-06-06 06:06:47 +02:00
  • 308f61c31f opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160) b9536 lhez 2026-06-05 13:45:25 -07:00
  • da87e9b612 common/chat : unify and fix LFM2/LFM2.5 tool parser (#24178) b9535 Tarek Dakhran 2026-06-05 21:31:56 +02:00
  • e82beaa60d vulkan: add fwht support for Intel with shmem reduction (#23964) b9534 Ruben Ortlam 2026-06-05 19:44:40 +02:00
  • c4a278d68e model: fix build failed (#24193) b9533 Xuan-Son Nguyen 2026-06-05 18:12:27 +02:00
  • 64086f2b2f model, mtmd: Granite4 Vision (#23545) Gabe Goodhart 2026-06-05 09:44:59 -06:00
  • 6effcecd0b TP: round up granularity to 128 (#24180) b9531 Johannes Gäßler 2026-06-05 17:35:13 +02:00
  • 86591c7536 cli: fix model params not propagated (#23893) b9530 therealkenc 2026-06-05 08:29:41 -07:00
  • 65eef9549c Merge branch 'master' into pr/23398 Georgi Gerganov 2026-06-05 17:47:19 +03:00
  • 96fbe00393 model : fix llama_model::n_gpu_layers() (#24188) b9529 Georgi Gerganov 2026-06-05 17:11:42 +03:00
  • 2016bf2b3b ui: run npm install when package-lock.json is newer than node_modules (#24171) b9528 Pascal 2026-06-05 14:57:32 +02:00
  • 9c955c48b0 Fix link to available UI settings (#24169) Mario 2026-06-05 13:39:32 +01:00
  • cc7bef34e2 ui: add ignore-scripts=true to npmrc (#24149) Xuan-Son Nguyen 2026-06-05 14:31:03 +02:00
  • f0438b1b15 cont : avoid computations on the CPU Georgi Gerganov 2026-06-05 14:39:03 +03:00
  • d78a3864f0 cont : adjust to hparams changes Georgi Gerganov 2026-06-05 14:38:41 +03:00
  • 5954f196ed Merge branch 'master' into pr/23398 Georgi Gerganov 2026-06-05 14:02:53 +03:00
  • ad1b88ca0d docs: Update quantization readme (#24133) Pedro Cuenca 2026-06-05 12:21:26 +02:00
  • 59917d3922 minor : fix lint issues (#24165) b9524 Georgi Gerganov 2026-06-05 11:17:54 +03:00
  • 7acb4e8cd2 hparams : refactor hparams.n_layer (#24060) b9523 Georgi Gerganov 2026-06-05 11:09:36 +03:00
  • 3ecfb150a4 kleidiai : dynamic chunck-based scheduling for hybrid execution (#23819) b9522 Charles Xu 2026-06-05 09:11:47 +02:00
  • 4eaa3cee66 add unified assistant Aman Gupta 2026-06-05 14:59:44 +08:00
  • 2154a0fdcf CUDA: enroll mul_mat_vec_q_moe into pdl (#24087) b9521 Oliver Simons 2026-06-05 08:37:34 +02:00
  • 46fa662b1f ci : build-msys job slimming [no ci] (#24157) Daniel Bevenius 2026-06-05 07:57:36 +02:00
  • 7fe2ae45ab sycl : port multi-column MMVQ from CUDA backend (#21845) b9519 Mason Milburn 2026-06-05 01:10:31 -04:00
  • 7c158fbb4a server : disable on-device spec checkpoints (#24108) b9518 Georgi Gerganov 2026-06-04 19:30:59 +03:00
  • 260862b8ca arg: fix double mtp downloads (#24128) Xuan-Son Nguyen 2026-06-04 18:23:48 +02:00
  • 42b2d60e57 webui: [a11y] fix keyboard navigation issues in chat interface and sidebar (#23132) viggy 2026-06-04 08:59:00 -07:00
  • e7bcf1c3a8 Move duplicated imatrix code into single common imatrix-loader.cpp (#22445) b9515 Bartowski 2026-06-04 11:45:40 -04:00
  • 21444c822e ui: Fixed packages (#24119) Aleksander Grygier 2026-06-04 16:23:08 +02:00
  • 526977068f ui: added single line reasoning preview (#23601) MagicExists 2026-06-04 21:09:43 +07:00
  • 0dbfa66a1f return filter to save memory (#24125) b9512 forforever73 2026-06-04 21:56:33 +08:00
  • e8023568d0 convert: Fix Gemma 4 Unified conversion (#24118) Pedro Cuenca 2026-06-04 15:21:38 +02:00
  • 4c51309617 ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209) b9510 Kartik Sirohi 2026-06-04 18:42:38 +05:30
  • 6f3a9f3dee server: avoid unnecessary checkpoint restore when new tokens are present (#24110) b9509 Yongyue Sun 2026-06-04 21:09:01 +08:00
  • a121232fdc agents: refactor, include more guidelines (#24111) Xuan-Son Nguyen 2026-06-04 13:40:23 +02:00
  • 4586479852 webui: fix tool selector toggle/counter, key tools by stable identity (#24065) Pascal 2026-06-04 13:09:49 +02:00
  • 4d742877b2 build : use umbrella Headers directory for XCFramework module map (#23974) Gerard Martinez 2026-06-04 03:58:25 -07:00
  • dd97604fc4 move assistant to separate file Aman Gupta 2026-05-28 14:12:23 +08:00
  • c0da00af04 add exception in test-llama-archs Aman Gupta 2026-05-28 13:41:39 +08:00
  • 777af6af54 add temp hack to not use fit with gemma4, rm later Aman Gupta 2026-05-28 12:53:08 +08:00
  • 27461cd888 add Q rot when cache is quantized Aman Gupta 2026-05-22 00:17:02 +08:00
  • 7b87cd3598 add assert that draft + shared kv should be on same device Aman Gupta 2026-05-20 23:41:33 +08:00
  • 9af0434d8c fix multi-seq Aman Gupta 2026-05-19 22:17:09 +08:00
  • f268966d49 llama: Gemma 4 MTP Aman Gupta 2026-05-19 20:18:00 +08:00
  • 0066404085 server : add header to tools/server/server-http.h (#24089) b9505 A B 2026-06-04 05:14:46 -05:00
  • 7ac5a4225e cmake: skip cvector-generator and export-lora when CPU backend is disabled (#24053) b9504 Andrea Richiardi 2026-06-04 04:13:19 -06:00
  • e3ba22d6cc fix(mtmd): handle Gemma 4 audio projector embedding size (#24091) b9503 Andrei 2026-06-04 02:51:23 -07:00
  • 6ddc9430b1 readme : add status badges (#24104) Georgi Gerganov 2026-06-04 10:58:13 +03:00
  • 65ef50a0a4 tests : refactor test-save-load-state to accept token input (#24073) b9501 Georgi Gerganov 2026-06-04 08:06:36 +03:00
  • 3d1998634e metal : reduce rset heartbeat from 500ms -> 5ms (#24074) b9500 Georgi Gerganov 2026-06-04 08:05:32 +03:00
  • e8c54893f2 ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834) b9499 Reese Levine 2026-06-03 22:05:04 -07:00
  • 3c7450cee1 ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754) b9498 rehan-10xengineer 2026-06-04 10:03:40 +05:00
  • f478f1b6d7 sycl : Improve SYCL doc (#23025) Todd Malsbary 2026-06-03 22:02:54 -07:00
  • 94a220cd67 mtmd: fix Gemma 4 unified FPE (#24088) b9496 Andrei 2026-06-03 12:51:18 -07:00
  • 166fe29492 qwen35: use post-norm hidden state for MTP (#24025) b9495 Aman Gupta 2026-06-04 01:29:09 +08:00
  • c8d6a00636 mtmd: enable non-causal vision for gemma 4 unified (#24082) b9494 Xuan-Son Nguyen 2026-06-03 19:05:17 +02:00
  • a731805ced mtmd, model: allow skip build_vit() (#24077) b9493 Xuan-Son Nguyen 2026-06-03 17:10:35 +02:00
  • ee4cf705bb ui: Mermaid Diagrams in chat + interactive preview (#24032) Aleksander Grygier 2026-06-03 16:55:36 +02:00
  • 9e58d4d692 Avoid PDL race conditions by disabling __restrict__ when PDL is used (#24030) b9491 Andreas Kieslinger 2026-06-03 13:56:42 +02:00