mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-30 09:37:42 +02:00
10e5b148b0
* llama-quant : correct `n_attention_wv` usage In #19770, I introduced a regression in the way the `quantize_state_impl` counter values were initialized. I was incrementing and using `n_attention_wv` in the same loop, when it should have been fixed by the time we're deciding tensor types in `llama_tensor_get_type_impl` (for `use_more_bits`). I never observed a difference in any of [my tests](https://github.com/ggml-org/llama.cpp/pull/19770#issuecomment-4000424712) - it was only after @bartowski kindly pointed this out that I realized it was incorrect. (Thanks!) * simplify