llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-30 09:37:42 +02:00

Masashi Yoshimura 7c908502ea ggml-webgpu: improve MTP inference by using mat-vec path for small batches (#24811)

* ggml-webgpu: improve small batches decoding

* Add barrier to the NUM_COLS loop in mul-mat-vec

2026-06-23 17:13:55 +09:00

CMakeLists.txt

2026-06-04 08:05:04 +03:00

ggml-webgpu-shader-lib.hpp

2026-06-23 17:13:55 +09:00

ggml-webgpu.cpp

2026-06-23 17:13:55 +09:00

pre_wgsl.hpp

2026-06-04 08:05:04 +03:00