Commit Graph

  • fb401045cc common: remove unused json-partial (#24968) b9782 Xuan-Son Nguyen 2026-06-24 18:12:16 +02:00
  • 51eae8cfca vulkan: allow reducing the graph submission batches to avoid timeouts (#24872) b9781 Wagner Bruna 2026-06-24 11:29:24 -03:00
  • 3199d5357c chat: harden caps check caps-harden Piotr Wilkin 2026-06-24 15:16:13 +02:00
  • a14f8d2ed5 fix test case xsn/server_403_disabled_endpoints Xuan Son Nguyen 2026-06-24 13:38:25 +02:00
  • d9a0c0fe9b cont Xuan Son Nguyen 2026-06-24 13:29:28 +02:00
  • 796b1ada8d server: use status code 403 for disabled features Xuan Son Nguyen 2026-06-24 12:56:51 +02:00
  • ef687feb42 common: remove unused json-partial xsn/rm_unused_json_partial Xuan Son Nguyen 2026-06-24 12:49:42 +02:00
  • 1191758c5d vulkan: fail the build when a shader fails to compile (#24450) b9780 liminfei-amd 2026-06-24 17:42:03 +08:00
  • 00139b660b ui: loading bar below the model picker (#24931) Pascal 2026-06-24 10:50:44 +02:00
  • ef9c13d4c2 ui: New Logo + Navigation cleanup & Mobile UI/UX improvements (#24897) Aleksander Grygier 2026-06-24 10:21:33 +02:00
  • 88636e178f model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M (#24913) b9777 Tarek Dakhran 2026-06-24 08:49:46 +02:00
  • ac4105d68b vulkan: Apply bias before softmax in FA, to avoid overflow (#24909) b9776 Jeff Bolz 2026-06-23 22:34:00 -05:00
  • a432e6f863 use destructor instead xsn/cli_http_based Xuan Son Nguyen 2026-06-23 22:57:20 +02:00
  • 5d67f69f59 remove outdated comment Xuan Son Nguyen 2026-06-23 22:49:40 +02:00
  • beef5cf077 Apply suggestions from code review Xuan-Son Nguyen 2026-06-23 22:48:04 +02:00
  • be4a6a63eb server : check draft context creation error (#24922) b9775 kononnable 2026-06-23 16:56:50 +02:00
  • 72a9269172 vulkan: support all backend tests for SQR/SQRT/SIN/COS/CLAMP/LEAKY_RELU/NORM (#24582) b9774 Jeff Bolz 2026-06-23 09:48:24 -05:00
  • b093e46873 case: router with only one model Xuan Son Nguyen 2026-06-23 16:47:30 +02:00
  • 1401fc3ca7 cli support router mode Xuan Son Nguyen 2026-06-23 16:39:59 +02:00
  • 85c58bbcd0 remote server ok Xuan Son Nguyen 2026-06-23 16:19:28 +02:00
  • 19296c1735 working Xuan Son Nguyen 2026-06-23 16:09:09 +02:00
  • 92e854ab83 vulkan: Support GET_ROWS_BACK (#24883) b9773 Jeff Bolz 2026-06-23 08:39:37 -05:00
  • c5606364b2 vulkan: support CONV_3D (#24612) Jeff Bolz 2026-06-23 08:39:20 -05:00
  • 0eb874d374 vulkan: make mul_mm ALIGNED a spec constant (#24689) b9771 Jeff Bolz 2026-06-23 07:26:17 -05:00
  • 90c111bf98 Merge branch 'master' into xsn/cli_http_based Xuan Son Nguyen 2026-06-23 13:29:22 +02:00
  • 75ad0b23ed server: fix remote preset handling, add test (#24938) b9770 Xuan-Son Nguyen 2026-06-23 13:28:34 +02:00
  • f7421eabe8 wip Xuan Son Nguyen 2026-06-23 13:28:14 +02:00
  • 59797670dc cli: move to HTTP-based implementation Xuan Son Nguyen 2026-06-23 13:14:28 +02:00
  • c926ad0985 vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled (#24444) b9769 Wyatt Caldwell 2026-06-23 03:55:46 -07:00
  • a3900a6694 model: Granite Speech Plus (#24818) b9768 Gabe Goodhart 2026-06-23 04:03:31 -06:00
  • 7c908502ea ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811) b9767 Masashi Yoshimura 2026-06-23 17:13:55 +09:00
  • 035cd8f9a6 codeowners: add yomaytk to ggml-webgpu (#24930) Masashi Yoshimura 2026-06-23 15:19:34 +09:00
  • 73618f27a8 server: improve user message detection and create checkpoints at every user message (#24176) b9765 Aldehir Rojas 2026-06-23 00:27:28 -05:00
  • 23ee8797e1 opencl: q8_0 gemv precision improvement (#24923) Shawn Gu 2026-06-22 22:25:21 -07:00
  • a19f3ea631 misc: update lables Xuan Son Nguyen 2026-06-23 00:22:12 +02:00
  • dec5ca5577 server : Add id to tool call responses api (#24882) b9763 Matt Thompson 2026-06-22 14:03:12 -07:00
  • 9c0ac887f3 ui: Prioritize favorite models in model selection (#24766) Mahdiou Diallo 2026-06-22 21:00:21 +02:00
  • 095058ca19 add arg --threads-sampling xsn/server_multithread_sampling Xuan Son Nguyen 2026-06-22 20:03:49 +02:00
  • c62fdd5fd0 working Xuan Son Nguyen 2026-06-22 19:38:25 +02:00
  • 41ed530be2 wip Xuan Son Nguyen 2026-06-22 19:30:11 +02:00
  • fe03cce8db server: run sampling in a threadpool Xuan Son Nguyen 2026-06-22 19:05:39 +02:00
  • 721354fbdf server: (router) move model downloading to dedicated process (#24834) b9761 Xuan-Son Nguyen 2026-06-22 18:24:04 +02:00
  • 6ee0f65793 server: refactor/generalize input file schema (#24299) b9760 Xuan-Son Nguyen 2026-06-22 16:42:47 +02:00
  • 1b82e9ae51 fix windows xsn/server_input_file_schema Xuan Son Nguyen 2026-06-22 16:20:56 +02:00
  • 61653c7989 Merge branch 'master' into xsn/server_input_file_schema Xuan Son Nguyen 2026-06-22 16:19:59 +02:00
  • 099b579acb ui: model status and load progress via /models/sse feed (#24878) Pascal 2026-06-22 15:55:30 +02:00
  • 037397792a vulkan: split ggml-vulkan.cpp file 0cc4m/vulkan-cpp-split Ruben Ortlam 2026-06-22 15:50:01 +02:00
  • f8cc15f163 [SYCL] support bf16 on bin_bcast OP and unary OPs (#24838) b9758 Neo Zhang 2026-06-22 19:09:02 +08:00
  • 37957e8531 sampling : remove unconditional softmax+sort in top-n-sigma sampler (#22645) b9757 Tim Neumann 2026-06-22 13:08:32 +02:00
  • d0f9d2e5ac server: fix edit_file crash on append at end of file (line_start -1) (#24893) b9756 Pascal 2026-06-22 10:55:28 +02:00
  • 0ef6f06d55 docs/android.md: Add dependency libandroid-spawn for building in termux (#21812) b9755 aafsmarak 2026-06-22 09:18:31 +05:30
  • 52b3df0023 common/peg : implement ac parser for stricter grammar generation (#24869) b9754 Aldehir Rojas 2026-06-21 16:20:58 -05:00
  • 7c082bc417 server: fix report progress for loading spec models, add "stages" list (#24870) b9753 Xuan-Son Nguyen 2026-06-21 17:36:52 +02:00
  • bddfd2b113 server: refactor batch construction (#24843) b9752 Xuan-Son Nguyen 2026-06-21 14:16:11 +02:00
  • 0d135df48c mtmd: fix mtmd_get_memory_usage (#24867) b9751 Xuan-Son Nguyen 2026-06-21 14:12:15 +02:00
  • bf533823cd jinja : implement call statement (#24847) b9750 Sigbjørn Skjæret 2026-06-21 14:04:52 +02:00
  • 2f89acc2bc mtmd: add load progress callback (#24865) Xuan-Son Nguyen 2026-06-21 13:40:52 +02:00
  • 7ac864bf97 disable DEBUG_TIMINGS xsn/server_refactor_batch Xuan Son Nguyen 2026-06-21 13:38:09 +02:00
  • d37414510b address comments Xuan Son Nguyen 2026-06-21 13:15:58 +02:00
  • bfa3219177 server: add "verbose" field to schema (#24864) b9748 Xuan-Son Nguyen 2026-06-21 13:03:14 +02:00
  • d6d899580d server: real-time model load progress tracking via /models/sse (#24828) b9747 Xuan-Son Nguyen 2026-06-21 11:58:14 +02:00
  • f1ef61fb1b server: add "verbose" field to schema xsn/server_verbose_field Xuan Son Nguyen 2026-06-21 11:16:06 +02:00
  • 8a118ee86c minor : clean-up whitespaces (#24862) Georgi Gerganov 2026-06-21 11:37:12 +03:00
  • d789527482 spec : Support Step3.5/3.7 flash mtp3 (#24340) b9745 YiChen Lv 2026-06-21 16:33:18 +08:00
  • 063d9c156e common/peg : refactor until gbnf grammar generation (#24839) b9744 Aldehir Rojas 2026-06-20 21:15:06 -05:00
  • c57607016a common/json-schema-to-grammar : align spacing rules with parsers (#24835) b9743 Aldehir Rojas 2026-06-20 17:43:04 -05:00
  • 4a80943174 fix(hexagon): use padded stride for ssm-conv weights (#24470) b9742 Guanhuai Zhang 2026-06-21 05:58:49 +08:00
  • 447b0c3646 poc: threadpool sampling xsn/tmp_smpl_parallel Xuan Son Nguyen 2026-06-20 22:08:42 +02:00
  • a527509d0f debug: force llama_synchronize for accurate timings Xuan Son Nguyen 2026-06-20 20:22:31 +02:00
  • 7486a39756 (debug) add timings Xuan Son Nguyen 2026-06-20 20:12:05 +02:00
  • 84de01a1f1 llama : use LLM_KV for quantization_version & file_type (#24802) b9741 Adrien Gallouët 2026-06-20 20:07:01 +02:00
  • ea65a4b1c8 small nits Xuan Son Nguyen 2026-06-20 19:54:31 +02:00
  • b28e3682e5 Merge branch 'master' into xsn/server_refactor_batch Xuan Son Nguyen 2026-06-20 19:48:36 +02:00
  • 53763db789 rm debug log Xuan Son Nguyen 2026-06-20 19:48:14 +02:00
  • 75f460ac28 arg: try fixing test-args-parser randomly fails (#24826) b9740 Xuan-Son Nguyen 2026-06-20 19:45:27 +02:00
  • bf36838ebd fix assert Xuan Son Nguyen 2026-06-20 19:32:47 +02:00
  • 64ec03d10b handle batch full more carefully Xuan Son Nguyen 2026-06-20 19:30:59 +02:00
  • d704c7929b add abort_all_slots Xuan Son Nguyen 2026-06-20 19:20:12 +02:00
  • af583e3ed3 wip 4 Xuan Son Nguyen 2026-06-20 19:18:05 +02:00
  • b786bb2e60 wip 3 Xuan Son Nguyen 2026-06-20 18:56:58 +02:00
  • 2b2eed8fd7 wip 2 Xuan Son Nguyen 2026-06-20 18:41:56 +02:00
  • 8452824611 release: add missing link for win opencl adreno arm64 (#24809) b9739 Muhammad Salem 2026-06-20 18:08:59 +03:00
  • 6c5c5a29d6 wip Xuan Son Nguyen 2026-06-20 16:48:12 +02:00
  • d5037c508a server: refactor batch construction Xuan Son Nguyen 2026-06-20 16:35:57 +02:00
  • e27f308597 server: avoid forwarding auth headers in CORS proxy (#24373) b9738 Matti4 2026-06-20 15:34:47 +02:00
  • 67e9fd3b74 docker : prebuild web UI for s390x build [no release] (#24829) b9737 Aldehir Rojas 2026-06-20 05:54:42 -05:00
  • 796f41bedc model : glm-dsa load DSA indexer tensors as optional (#24770) b9736 davidrhodus 2026-06-20 03:48:24 -07:00
  • 37a77fb057 ggml : optimize AMX (#24806) b9735 Adrien Gallouët 2026-06-20 12:43:06 +02:00
  • f4043fec01 convert : more consistent handling of rope_parameters (#24833) Sigbjørn Skjæret 2026-06-20 12:42:36 +02:00
  • f449e05537 ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA b9733 Masashi Yoshimura 2026-06-20 08:12:32 +09:00
  • 2b686a9120 server: refactor child --> router communication (#24821) b9732 Xuan-Son Nguyen 2026-06-20 01:02:26 +02:00
  • 4b48a53b6c server : optimize get_token_probabilities (#24796) b9731 Adrien Gallouët 2026-06-19 23:26:54 +02:00
  • e475fa2b5f mtmd, arg: fix utf8 handling on windows (#24779) b9730 Xuan-Son Nguyen 2026-06-19 22:28:38 +02:00
  • 175147e8f6 server: remove all internal mentions about "webui" (#24817) b9729 Xuan-Son Nguyen 2026-06-19 22:12:46 +02:00
  • fabde3bf51 arg: Add comment line support to --api-key-file (#23168) b9728 Mikolaj Kucharski 2026-06-19 15:33:54 +00:00
  • 0d2d9ccbf6 vendor : update cpp-httplib to 0.48.0 (#24787) b9727 Alessandro de Oliveira Faria (A.K.A.CABELO) 2026-06-19 11:16:35 -03:00
  • 8c2d6f6475 server: add --agent arg, remove redundant webui naming compat (#24801) b9726 Xuan-Son Nguyen 2026-06-19 16:06:13 +02:00
  • 38724ab593 docker : build the UI (#24794) b9725 Aldehir Rojas 2026-06-19 08:32:31 -05:00
  • e2e7a9b2d0 mtmd: several bug fixes (#24784) b9724 Xuan-Son Nguyen 2026-06-19 12:18:36 +02:00
  • b14e3fb90c spec: support eagle3 for qwen3.5 & 3.6 (#24593) b9723 Ruixiang Wang 2026-06-19 12:08:50 +02:00