Commit Graph

  • 2d9cf41192 check once 0cc4m/vulkan-device-cpy-benchmark Ruben Ortlam 2026-06-29 16:03:26 +02:00
  • 7b04249248 fix cleanup Ruben Ortlam 2026-06-29 15:45:12 +02:00
  • 25a1d63f43 vulkan: use flops instead of weight tensor size for submission heuristic (#25005) master Ruben Ortlam 2026-06-29 15:24:44 +02:00
  • 533749756c add dmabuf capability check Ruben Ortlam 2026-06-29 13:51:47 +02:00
  • 8c146a8366 DeepSeek V4 (#24162) b9840 Aman Gupta 2026-06-29 16:58:51 +08:00
  • 6cb18b2f2e tools/ui: restore Tailwind scanning in ignored worktrees (#24879) b9839 seryogakovalyov 2026-06-29 11:55:52 +03:00
  • 71292c3f59 add reverse order tests for dmabuf Ruben Ortlam 2026-05-21 14:44:52 +02:00
  • 28a7780793 skip dmabuf_p2p when one device is nvidia, due to driver crashes Ruben Ortlam 2026-05-21 14:01:49 +02:00
  • 359f1269aa improve output device consistency Ruben Ortlam 2026-05-21 13:54:12 +02:00
  • 0ec15bd686 catch driver issues in benchmarks Ruben Ortlam 2026-05-21 13:35:39 +02:00
  • 607dc68378 add host dmabuf p2p test Ruben Ortlam 2026-05-21 13:16:37 +02:00
  • c50ab7afa8 output device group info Ruben Ortlam 2026-05-01 14:53:12 +02:00
  • 57e31bf544 add device group test Ruben Ortlam 2026-05-01 07:49:49 +02:00
  • b53536b94a clean up tests, add dma_buf test Ruben Ortlam 2026-04-29 15:20:42 +02:00
  • ad6cddf0fc benchmark Ruben Ortlam 2026-04-08 18:26:50 +02:00
  • e4d2e198b9 server: add --models-memory-max parameter to allow dynamically unloading models when they exceed a memory size threshold 0cc4m/server-memory-limit Ruben Ortlam 2026-03-29 10:00:49 +02:00
  • 277a105dc8 common : remove unused regex-partial (#25118) b9838 o7si 2026-06-29 14:48:39 +08:00
  • b3fed31b99 jinja, chat: add --reasoning-preserve flag (#25105) b9837 Xuan-Son Nguyen 2026-06-28 23:33:51 +02:00
  • dbdaece23d Revert "ui: fix accessibility for hover-gated interactive elements assisted by claude(in debugging and tests) (#24727)" (#25098) Aleksander Grygier 2026-06-28 21:30:03 +02:00
  • 7cb8576e7c ui: fix stop and reasoning skip in single-model mode (#25084) b9835 Pascal 2026-06-28 21:06:43 +02:00
  • fa72bc6826 dflash: refactor draft model conversion (#25110) Ruixiang Wang 2026-06-28 20:31:48 +02:00
  • fcf66c1d80 correct help message xsn/jinja_preseve_thinking Xuan Son Nguyen 2026-06-28 19:52:36 +02:00
  • 6989cb1e53 jinja, chat: add --reasoning-preserve flag Xuan Son Nguyen 2026-06-28 17:40:43 +02:00
  • c818263f2a chat : implement minicpm5 parser (#24889) b9833 Aldehir Rojas 2026-06-28 09:53:32 -05:00
  • f68a788b0b jinja: add --dump-prog for debugging (#25086) b9832 Xuan-Son Nguyen 2026-06-28 15:50:31 +02:00
  • d1b34251bc spec : add DFlash support (#22105) b9831 Ruixiang Wang 2026-06-28 15:01:34 +02:00
  • c1a1c8ee94 common : allow --offline in llama download (#25091) b9830 Adrien Gallouët 2026-06-28 12:34:11 +02:00
  • adba174148 server : hint preserve_thinking when supported by chat template gg/preserve-thinking-hint Georgi Gerganov 2026-06-27 17:45:41 +03:00
  • 84ff9ef023 server : hint preserve_thinking when supported by chat template Georgi Gerganov 2026-06-27 17:42:42 +03:00
  • 2c9e01c217 server : hint preserve_thinking when supported by chat template Georgi Gerganov 2026-06-27 17:39:39 +03:00
  • 27c8bb4f63 logs : reduce v2 (#25078) b9829 Georgi Gerganov 2026-06-28 08:52:15 +03:00
  • ebd048fc5e opencl: flash attention improvement (#25069) b9828 Hongqiang Wang 2026-06-27 15:36:06 -07:00
  • 0ed235ea2c [CUDA] Added a cudaMemcpy2DAsync fast path to ggml_cuda_cpy (#25057) b9827 Gaurav Garg 2026-06-27 17:46:21 +05:30
  • 0aac18be29 use memcpy for small copies across host-visible memory 0cc4m/vulkan-d2d-copy Ruben Ortlam 2026-06-18 11:55:10 +02:00
  • 53e8d97e8c add semi-async staging copy and use it over syncfd for small copies Ruben Ortlam 2026-06-11 11:37:09 +02:00
  • 37e842aee9 swap dmabuf check order Ruben Ortlam 2026-06-10 15:00:00 +02:00
  • 43ed6f2584 use allocator instead of hop2 Ruben Ortlam 2026-05-29 15:08:39 +02:00
  • e9c5242fd4 use sync_fd binary semaphores for cross-driver synchronization Ruben Ortlam 2026-05-26 09:33:34 +02:00
  • fd1d315da0 double buffering Ruben Ortlam 2026-05-26 08:05:14 +02:00
  • 412c0f19f7 fixes, disable semaphore sharing on Nvidia + non-Nvidia Ruben Ortlam 2026-05-26 07:37:24 +02:00
  • 69855b4779 add async copy Ruben Ortlam 2026-05-25 16:27:17 +02:00
  • 3d8b3e16ce vulkan: add optimized device to device copy function Ruben Ortlam 2026-05-25 16:03:44 +02:00
  • f7c1df6502 metal : per-op source split + parallel compile (#24021) dev-metal YiChen Lv 2026-06-20 18:36:32 +08:00
  • 9bebfcb4bc sycl : fix failed ut cases of norm (#25044) b9826 Neo Zhang 2026-06-27 17:13:43 +08:00
  • 0b6529d818 vulkan: fix step operator for 0 input (#25036) b9825 Ruben Ortlam 2026-06-27 10:57:31 +02:00
  • c299a92c38 binaries : Improve rpc-server and export-graph-ops names. (#25045) b9824 Christian Kastner 2026-06-27 09:31:29 +02:00
  • 0275c0f800 ci : add windows-openvino to check-release (#25022) b9823 Sigbjørn Skjæret 2026-06-27 09:30:56 +02:00
  • 83d385b429 tests : fix test-chat-template --no-common option (#25075) b9822 Sigbjørn Skjæret 2026-06-27 09:30:19 +02:00
  • 050ee92d04 app : allow --version, --licenses & --help (#25054) b9821 Adrien Gallouët 2026-06-26 23:18:11 +02:00
  • 3fc4e10527 sched : reintroduce less synchronizations during split compute (#20793) b9820 Andreas Kieslinger 2026-06-26 16:18:30 +02:00
  • 5d8ccdf9d1 devops : add llama in all docker images (#25035) Adrien Gallouët 2026-06-26 15:15:48 +02:00
  • 024930c6ad arg: fix handling --spec-draft-hf and --hf-repo-v (#25043) Xuan-Son Nguyen 2026-06-26 14:36:03 +02:00
  • 5397c36194 openvino: Update to OV 2026.2.1, self-contained release packages, operator improvements (#24974) b9817 Ravi Panchumarthy 2026-06-26 05:07:19 -07:00
  • e7ea94afcb sync : ggml b9816 Georgi Gerganov 2026-06-26 15:04:05 +03:00
  • 96183e9820 ggml : bump version to 0.15.3 (ggml/1550) Georgi Gerganov 2026-06-26 14:37:43 +03:00
  • 487a6cc164 vulkan: opt mul_mat_vecq for mi50 (#22933) b9814 nullname 2026-06-26 19:49:24 +08:00
  • 5a6a0dd7e1 vulkan: add INTEL_XE1 arch enum and enable coopmat1 on Intel Xe-LPG Plus (#24404) b9813 Jiang, Fish 2026-06-26 11:26:22 +00:00
  • c35f33b0a1 fix missing mparams.hf_file xsn/fix_handling_spec_hf Xuan Son Nguyen 2026-06-26 13:17:04 +02:00
  • ded1561b42 ui: fix accessibility for hover-gated interactive elements assisted by claude(in debugging and tests) (#24727) Sanjay Ahari 2026-06-26 16:25:38 +05:30
  • df3ba2874f arg: fix handling --spec-draft-hf and --hf-repo-v Xuan Son Nguyen 2026-06-26 12:55:06 +02:00
  • 9df06805ee vulkan: Workaround compiler bug in conv2d coopmat2 path (#24924) b9811 Jeff Bolz 2026-06-26 04:53:32 -05:00
  • 2f18fe13c5 CUDA: add cublasSgemmBatched mapping for HIP/MUSA vendor headers (#25033) b9810 leonardHONG 2026-06-26 17:42:56 +08:00
  • 5004859421 server-stream : pimpl gg/server-stream-clean-up Georgi Gerganov 2026-06-26 11:36:15 +03:00
  • c16c35b814 ggml-cpu: fix SVE leftover path in ggml_vec_dot_f32 (#24699) Tarek Dakhran 2026-06-26 09:41:56 +02:00
  • 1a87dcdc45 server + ui: SSE Replay Buffer (#23226) Pascal 2026-06-26 09:31:29 +02:00
  • e7e3f35090 sycl : clamp softmax input to avoid underflow (#24941) Jassieluo 2026-06-26 15:02:42 +08:00
  • b11f7c16bc mtmd: add more validations (#25013) Xuan-Son Nguyen 2026-06-26 08:43:29 +02:00
  • f818065d75 CUDA: batch out_prod broadcast (dps2>1) path with cublasSgemmBatched (#24426) leonardHONG 2026-06-26 13:51:25 +08:00
  • 960d628f46 mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check (#23082) b9804 Arsen Arutunan 2026-06-26 08:50:54 +03:00
  • 5c7c22c3e1 opencl: flush profiling batch at shutdown for incomplete batches (#25016) b9803 shaofeiqi 2026-06-25 18:48:24 -07:00
  • beac5309f1 xcframework : disable mtmd video on i/tv/visionos (#25018) b9802 Sigbjørn Skjæret 2026-06-26 00:13:59 +02:00
  • 9d5d882d8c model : Add label for LFM2.5-230M (#25008) Tarek Dakhran 2026-06-25 18:58:52 +02:00
  • 81313a35ae type check for get_arr_int xsn/mtmd_fix_2 Xuan Son Nguyen 2026-06-25 18:54:57 +02:00
  • a4b1c14c1a refactor a bit Xuan Son Nguyen 2026-06-25 18:49:30 +02:00
  • 732d5b6fd8 fix Xuan Son Nguyen 2026-06-25 17:36:53 +02:00
  • 1ec44d178d CUDA: Various fixes to cpy.cu (#25000) Oliver Simons 2026-06-25 17:29:23 +02:00
  • 8eef8c1b21 mtmd: add more validations Xuan Son Nguyen 2026-06-25 17:29:23 +02:00
  • c7cddefcbd misc: fix labeler (#25012) Xuan-Son Nguyen 2026-06-25 17:23:37 +02:00
  • e9d1b76d0a server: use status code 403 for disabled features (#24970) Xuan-Son Nguyen 2026-06-25 16:36:40 +02:00
  • 2e4cbade70 Merge branch 'master' into xsn/mtmd_ds_ocr_tiles xsn/mtmd_ds_ocr_tiles Xuan Son Nguyen 2026-06-25 16:28:50 +02:00
  • 099bf06952 misc: update lables (#24920) Xuan-Son Nguyen 2026-06-25 16:26:56 +02:00
  • 68ed5149fb bring back examples, add mtmd xsn/update_labels Xuan Son Nguyen 2026-06-25 15:23:03 +02:00
  • 60bc8866b1 common: refactor model handling (#24980) Xuan-Son Nguyen 2026-06-25 15:17:51 +02:00
  • bf05250df9 use unsigned ints 0cc4m/vulkan-submission-threshold-flops Ruben Ortlam 2026-06-25 15:02:50 +02:00
  • e8ecce53b8 docs : Eagle3 qwen3 draft model support (#24977) Kashif Rasul 2026-06-25 14:58:00 +02:00
  • cb2a4259aa use flops instead of matmul src0 tensor size for submission threshold Ruben Ortlam 2026-06-24 15:24:56 +02:00
  • 492adff8fb vulkan: extract flops calculation into function Ruben Ortlam 2026-06-24 14:30:38 +02:00
  • 683b04cc4a app : add the llama download subcommand (#24982) Adrien Gallouët 2026-06-25 13:36:36 +02:00
  • f728adab68 ggml : address integer overflows in binary ops CUDA implementation (#24706) fairydreaming 2026-06-25 10:06:44 +02:00
  • 3e61ea0e2f ui: fix always-show-sidebar-on-desktop setting after navigation refactor (#24979) Pascal 2026-06-25 09:45:55 +02:00
  • fdbd6abee2 tests : synchronize contexts at end of test-thread-safety (#24935) Christopher Albert 2026-06-25 08:22:51 +02:00
  • e12a0128ab build: include libmtmd in Apple XCFramework (#21935) Abraham Gonzalez 2026-06-25 01:37:30 -04:00
  • b3ce5cedf4 quant : fix quantizing moe with mtp (#24986) b9789 Sigbjørn Skjæret 2026-06-25 07:36:49 +02:00
  • e9fb3b3fc0 sycl : support --split-mode tensor (#24152) b9788 David Spruill 2026-06-25 01:35:21 -04:00
  • 9c10954865 sycl : fix the failed UT cases of conv_3d (#24900) b9787 Neo Zhang 2026-06-25 13:27:58 +08:00
  • fdb2c11c70 opencl: support non-contig rows in norm (#24965) b9786 lhez 2026-06-24 19:21:25 -07:00
  • 09cedfd699 chat: harden caps check (#24973) b9785 Piotr Wilkin (ilintar) 2026-06-25 02:49:22 +02:00
  • 8be759e6f7 hexagon: MUL_MAT and MUL_MAT_ID rework : 32x32 tiled weight repack, kernel-params, cached graphs (#24954) b9784 Max Krasnyansky 2026-06-24 12:14:25 -07:00
  • 894bb27af3 mtmd: model: unlimited-ocr: converter + parity test (#24969) Saba Fallah 2026-06-24 18:20:22 +02:00
  • fb401045cc common: remove unused json-partial (#24968) b9782 Xuan-Son Nguyen 2026-06-24 18:12:16 +02:00