Files
llama.cpp/src
Oliver Simons 02810c7aa8 Fix and restrict NVFP4 edge-cases in llama-graph (#24331)
* Move post-GEMM MUL required for dequant b4 lora and bias add

see https://github.com/ggml-org/llama.cpp/pull/23484 :
1. For lora, I would presume we want fully dequantized values before
   doing the residuals, but this depends on how the LORAs were
generated. Literature tells me LORA happens post-mul but pre-bias add https://github.com/ggml-org/llama.cpp/pull/8332
2. For ModelOPT, bias-add should happen on [fully-dequantized
   values](https://github.com/NVIDIA/Model-Optimizer/blob/b49f9b9e2d747af992d78a3aa7f10efe5a8847e1/modelopt/torch/quantization/backends/nvfp4_gemm.py#L59-L64)

* Restrict build_ffn for NVFP4 to supported combinations
2026-06-16 11:52:38 +02:00
..
2026-06-07 20:50:54 +08:00
2026-04-03 10:33:03 +02:00