llama.cpp/src at b9670 - llama.cpp - Gitea

wylab/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-30 01:27:42 +02:00

Oliver Simons 02810c7aa8 Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

* Move the post-GEMM MUL required for dequantization before the LoRA and bias add

See https://github.com/ggml-org/llama.cpp/pull/23484:
1. For LoRA, I would presume we want fully dequantized values before
   doing the residuals, but this depends on how the LoRAs were
   generated. Literature tells me LoRA happens post-mul but pre-bias add: https://github.com/ggml-org/llama.cpp/pull/8332
2. For ModelOpt, the bias add should happen on [fully-dequantized
   values](https://github.com/NVIDIA/Model-Optimizer/blob/b49f9b9e2d747af992d78a3aa7f10efe5a8847e1/modelopt/torch/quantization/backends/nvfp4_gemm.py#L59-L64)

* Restrict build_ffn for NVFP4 to supported combinations

2026-06-16 11:52:38 +02:00
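The operation order described in the commit message can be sketched in plain C++: first the post-GEMM dequantization MUL, then the LoRA residual, then the bias add. This is a minimal illustration and is not llama.cpp's `llama-graph` code or the ggml API. `linear_layer`, `matvec` and `forward` are hypothetical names. The sketch also assumes the post-GEMM MUL is a single per-tensor scale on the GEMM output, and it uses dense float weights as a stand-in for NVFP4 storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the llama.cpp/ggml API: dense float math only.

// Row-major matrix-vector product y = W * x, where W is rows x cols.
static std::vector<float> matvec(const std::vector<float> & W, size_t rows, size_t cols,
                                 const std::vector<float> & x) {
    assert(W.size() == rows * cols && x.size() == cols);
    std::vector<float> y(rows, 0.0f);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            y[r] += W[r * cols + c] * x[c];
        }
    }
    return y;
}

// Hypothetical linear layer: weights that still need a per-tensor scale
// after the GEMM (assumed stand-in for NVFP4), optional LoRA, optional bias.
struct linear_layer {
    std::vector<float> w;          // n_out x n_in, not yet fully dequantized
    float              w_scale;    // per-tensor scale applied post-GEMM
    size_t             n_in;
    size_t             n_out;
    std::vector<float> lora_a;     // rank x n_in
    std::vector<float> lora_b;     // n_out x rank
    size_t             rank;       // 0 = no LoRA
    float              lora_scale;
    std::vector<float> bias;       // n_out entries, empty = no bias
};

static std::vector<float> forward(const linear_layer & L, const std::vector<float> & x) {
    // 1. GEMM on the not-yet-dequantized weights.
    std::vector<float> y = matvec(L.w, L.n_out, L.n_in, x);

    // 2. Post-GEMM MUL: finish dequantization before anything else is added.
    for (float & v : y) {
        v *= L.w_scale;
    }

    // 3. LoRA residual B * (A * x), added to fully dequantized values.
    if (L.rank > 0) {
        std::vector<float> ax  = matvec(L.lora_a, L.rank, L.n_in, x);
        std::vector<float> bax = matvec(L.lora_b, L.n_out, L.rank, ax);
        for (size_t i = 0; i < L.n_out; ++i) {
            y[i] += L.lora_scale * bax[i];
        }
    }

    // 4. Bias add last, also on fully dequantized values.
    if (!L.bias.empty()) {
        assert(L.bias.size() == L.n_out);
        for (size_t i = 0; i < L.n_out; ++i) {
            y[i] += L.bias[i];
        }
    }
    return y;
}
```

Take identity weights with `w_scale = 2`, a rank-1 LoRA (`lora_a = {1, 1}`, `lora_b = {1, 0}`, `lora_scale = 0.5`), bias `{10, 20}` and input `{3, 4}`. This order produces `{19.5, 28}`. If the scale were applied after the LoRA and bias adds, it would also double the LoRA delta and the bias, and the result would be `{33, 48}`.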

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

CMakeLists.txt

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-adapter.cpp

hparams : refactor hparams.n_layer (#24060)

2026-06-05 11:09:36 +03:00

llama-adapter.h

llama : re-enable manual LoRA adapter free (#19983)

2026-03-18 12:03:26 +02:00

llama-arch.cpp

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

llama-arch.h

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

llama-batch.cpp

kv-cache : fix M-RoPE checkpoints (#20132)

2026-03-06 08:46:51 +02:00

llama-batch.h

fix: correct misspellings in code comments (#21217)

2026-03-31 13:50:51 +02:00

llama-chat.cpp

chat : add Granite 4.1 chat template (#23518)

2026-05-28 13:13:33 +02:00

llama-chat.h

chat : add Granite 4.1 chat template (#23518)

2026-05-28 13:13:33 +02:00

llama-context.cpp

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-context.h

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-cparams.cpp

cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)

2025-06-15 10:08:58 +03:00

llama-cparams.h

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-ext.h

fit : avoid including llama-ext.h in fit.h (#24506)

2026-06-12 15:57:05 +03:00

llama-grammar.cpp

common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)

2026-03-21 18:43:35 +01:00

llama-grammar.h

common/grammar : replace problematic backtracking regex [\s\S]* (#18342)

2026-01-03 16:02:43 -06:00

llama-graph.cpp

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

2026-06-16 11:52:38 +02:00

llama-graph.h

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

2026-06-16 11:52:38 +02:00

llama-hparams.cpp

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-hparams.h

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-impl.cpp

llama : correct platform-independent loading of BOOL metadata (#21428)

2026-04-06 01:40:38 +02:00

llama-impl.h

llama: use f16 mask for FA to save VRAM (#23764)

2026-05-29 15:44:43 +08:00

llama-io.cpp

server : avoid checkpoint data host copies (#22558)

2026-05-02 18:03:25 +03:00

llama-io.h

llama : add option to save memory in device buffers (#22679)

2026-05-05 06:35:07 +03:00

llama-kv-cache-dsa.cpp

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-kv-cache-dsa.h

model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)

2026-05-29 10:15:17 +02:00

llama-kv-cache-iswa.cpp

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-kv-cache-iswa.h

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-kv-cache.cpp

kv-cache : avoid kv cells copies (#24277)

2026-06-07 21:42:54 +03:00

llama-kv-cache.h

kv-cache : avoid kv cells copies (#24277)

2026-06-07 21:42:54 +03:00

llama-kv-cells.h

kv-cache : avoid kv cells copies (#24277)

2026-06-07 21:42:54 +03:00

llama-memory-hybrid-iswa.cpp

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-memory-hybrid-iswa.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-hybrid.cpp

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-memory-hybrid.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-memory-recurrent.cpp

hparams : refactor hparams.n_layer (#24060)

2026-06-05 11:09:36 +03:00

llama-memory-recurrent.h

llama : MTP clean-up (#23269)

2026-05-19 15:32:58 +03:00

llama-memory.cpp

memory : correctly handle failure in apply() (#14438)

2025-06-30 18:03:03 +03:00

llama-memory.h

llama : add Gemma4 MTP (#23398)

2026-06-07 20:50:54 +08:00

llama-mmap.cpp

Update llama-mmap to use ftello/fseeko (#22497)

2026-04-30 14:17:52 -07:00

llama-mmap.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model-loader.cpp

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-model-loader.h

llama + spec: MTP Support (#22673)

2026-05-16 20:06:23 +08:00

llama-model-saver.cpp

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

llama-model-saver.h

llama: fix llama-model-saver (#20503)

2026-03-25 12:53:16 +02:00

llama-model.cpp

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

llama-model.h

spec: add EAGLE3 speculative decoding support (#18039)

2026-06-12 10:21:06 +03:00

llama-quant.cpp

hparams : refactor hparams.n_layer (#24060)

2026-06-05 11:09:36 +03:00

llama-quant.h

llama : refactor src/llama.cpp (#10902)

2025-01-03 10:18:53 +02:00

llama-sampler.cpp

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-sampler.h

llama : rename llama-sampling to llama-sampler (#19363)

2026-02-06 07:26:54 +01:00

llama-vocab.cpp

Add cohere2moe to llama-vocab for TINY_AYA (#24601)

2026-06-14 09:04:46 +02:00

llama-vocab.h

vocab : refactor normalizer flags into options struct, add strip_accents (#24371)

2026-06-11 10:36:50 +03:00

llama.cpp

llama: only use one iGPU device by default (#23897)

2026-05-31 08:17:47 +02:00

unicode-data.cpp

server : better security control for public deployments (#9776)

2024-10-08 13:27:04 +02:00

unicode-data.h

llama : reduce compile time and binary size (#9712)

2024-10-02 15:49:55 +02:00

unicode.cpp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

unicode.h

vocab: fix Gemma4 tokenizer (#21343)

2026-04-03 10:33:03 +02:00