Georgi Gerganov
|
7acb4e8cd2
|
hparams : refactor hparams.n_layer (#24060)
* hparams : refactor hparams.n_layer
* cont : remove `n_layer_kv()`, use n_layer_all instead
* cont : type consistency
* pi : update SYSTEM.md
* models : fix Step3.5 MTP
* cont : remove duplicate switch cases
* cont : explicitly set `false` to extra layers for `is_swa` and `is_recr`
* cont : fix nextn layer count handling
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-06-05 11:09:36 +03:00 |
|
Georgi Gerganov
|
06938ac129
|
tests : add support for qwen3 SSM archs (#24031)
* tests : add support for qwen3 SSM archs
* arch : add LLM_KV_ATTENTION_RECURRENT_LAYERS
* cont : naming + TODOs
|
2026-06-03 10:15:27 +03:00 |
|
ynankani
|
42928bc14d
|
model : NvFP4 quantized LM head support (#23046)
* NvFP4 quantized LM head support
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Add assert for NvFp4 lm head and tied embeddings
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Create output_s tensor only when LM head NvFp4
Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
|
2026-05-16 11:09:27 +02:00 |
|
AesSedai
|
8e52631d55
|
model: Add Mimo v2.5 model support (#22493)
* add mimo-v2.5 support
* mimo-v2.5: fix modify_tensors row split
* mimo-v2.5: forgot `add_attn_value_scale` plumbing
* mimo-v2.5: fix tp dequant to detect tp rows
* mimo-v2.5: fix TP iteration to be descending
* mimo-v2.5: fix comment
* mimo-v2.5: retain fused qkv
* mimo-v2.5: missed the attn_value scale during merge
* mimo-v2.5: fused QKV needs contiguous for scaling attention value
* mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors
* Update src/llama-hparams.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/models/mimo2.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* mimo-v2.5: include MTP weights in gguf
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-05-07 13:21:58 +02:00 |
|
Xuan-Son Nguyen
|
994118a183
|
model: move load_hparams and load_tensors to per-model definition (#22004)
* git-friendly migration
* add build_graph
* nits
* exclude old code from build
* wip
* add llm_arch_model_i
* prepare downstream functions
* nits
* nits
* wip
* wip
* add back create_tensor_qkv
* fix files missing include
* enforce one llm_build per arch
* cmake: use glob
* missing model params
* nits
* wip
* wip (2)
* wip (3)
* test-llama-archs is happy
* improve switch case
* move more stuff into llm_arch_model_i
* fix downstream code
* nits
* nits (2)
* fix order
* llama_model_base
* LLAMA_LOAD_LOCALS
* small fix
* fix build errors
* auto
* rm migration script and ifdef
|
2026-05-04 12:36:59 +02:00 |
|