llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-09 07:16:44 +02:00

Author	SHA1	Message	Date
David Friehs	8a963fc10e	convert : fix conversion for Mistral-Medium-3.5-128B (#24268 ) Mistral explicitly sets `moe` and `llama_4_scaling` to `null` in params.json, breaking `key in dict` checks during conversion. Replace with `dict.get(key) is not None` where this matters. Fixes `convert-hf-to-gguf.py --mistral-format Mistral-Medium-3.5-128B`	2026-06-07 21:41:39 +02:00
Piotr Wilkin (ilintar)	2187e00337	StepFun 3.5 MTP (#23274 ) * StepFun 3.5 MTP * Simplify to single layer * Rollback core changes * fix flake8 errors * Remove scripts * modify to convention * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * dos2unix --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-06-02 17:44:35 +02:00
ynankani	c5229087a5	convert : add FP8 to Q8 conversion (#23250 ) Signed-off-by: ynankani <ynankani@nvidia.com>	2026-05-28 10:16:17 +03:00
Sigbjørn Skjæret	b7393a4d19	convert : update mtp related help (#23334 ) * update mtp related help * remove outdated experimental text	2026-05-19 21:16:58 +02:00
Aman Gupta	255582687b	llama + spec: MTP Support (#22673 ) * spec: support MTP * fix batch size * rename files * cont : simplify (#7) * MTP: clean-up (#9) * MTP: clean-up * review: use llama_context_type instead of llama_graph_type * review: remove llama_model_has_mtp * review: fix convert issues * convert: fix pycheck * review: formatting * use `mtp-` for identifying mtp models * convert: fix mtp conversion * mtp -> draft-mtp * remove unused llama_arch * add need_embd in speculative * llama: allow partial seq_rm for GDN models for speculative decoding Currently speculative checkpoint needs to restart from a checkpoint after some draft tokens are not accepted, this leads to some wastage in running the target again. This PR adds the ability to rollback upto `draft_max` by storing the GDN intermediates. * fix pending state * vulkan: add GDN partial rollback * meta: extend check to axis 1 * metal: add GDN partial rollback Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend. - Add K (snapshot slot count) as a function constant - Read input state from slot 0 of the 3D state tensor - Write intermediate states to different slots during token loop - For K=1, maintain backward-compatible single-slot behavior Ref: https://github.com/ggml-org/llama.cpp/commit/8c05923630110223669f069af2000e9cf10c02bc Assisted-by: llama.cpp:local pi * delta_net_base: use ggml_pad instead of new_tensor * review: add need_rs_seq * review: rename part_bounded to n_rs * review: deslop comments * review: rename, add asserts * server : adjust checkpoint logic (#11) * server : adjust checkpoint logic * cont : rm asserts * server-context: fix early exit * spec : fix compatibility with n-gram and add TODOs (#13) * metal : cleanup * llama : fix faulty bitwise check in recurrent memory * server : disable RS-based MTP in combination with other spec types * spec : add TODOs * cont : fix comment * cont : update comment * common : fix logic for ngram + mtp compat * llama-memory: enable checkpointing with partial rollback * cont: add test-case for loading into a dirty ctx * llama-memory-recurrent: clear rs_idx in clear * download: fix mtp path * llama-arch: fix enorm op * docs: update docs * conversion: fix type annotations --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-05-16 20:06:23 +08:00
Piotr Wilkin (ilintar)	cc7200bf12	Refactor: convert_hf_to_gguf.py (#17114 ) * move conversion code to a dedicated conversion directory and split the files akin to the src/models architecture --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-15 15:18:12 +02:00
fredzillman	cce09f0b2b	convert : fix Pixtral 12B --mistral-format conversion (3 bugs) (#22981 )	2026-05-12 21:46:01 +02:00
AesSedai	4178259130	mtmd: add MiMo v2.5 vision (#22883 ) * mimo-v2.5: vision support * mimo-v2.5: use fused qkv for vision * mimi-v2.5: fix f16 vision overflow * mimo-v2.5: comment cleanups * mimo-v2.5: Flash doesn't have mmproj more cleanup remember to use filter_tensors * mimo-v2.5: fix trailing whitespace	2026-05-12 11:11:14 +02:00
Daniel Bevenius	f5636f8fc7	convert : add image break token fallback (#22914 ) * convert : add image break token fallback This commit adds a image_break_token_id fallback for mistral where the config contains a image_break_token_id of -1: ```console "vision_encoder": { "image_token_id": 10, "image_break_token_id": -1, ... ``` But the tokenizer.json has this token: ```console 115 "id": 12, 116 "content": "[IMG_BREAK]", 117 "single_word": false, 118 "lstrip": false, 119 "rstrip": false, 120 "normalized": false, 121 "special": true 122 }, ``` If we look in convert_hf_to_gguf.py we have: ```python elif self.is_mistral_format: # hparams is already vision config here so norm_eps is only defined in global_config. self.hparams["norm_eps"] = self.global_config.get("norm_eps", None) assert self.hparams["norm_eps"] is not None, "norm_eps not found in params.json" if self.use_break_tok: self.img_break_tok_id = self.find_vparam(["image_break_token_id"]) ``` The motivation for this is that currently converting this models results in the following error: ```console load_hparams: model size: 5131.60 MiB load_hparams: metadata size: 0.15 MiB clip_init: failed to load model 'models/mmproj-Mistral-Medium-3.5-128B.gguf': operator(): unable to find tensor v.token_embd.img_break mtmd_init_from_file: error: Failed to load CLIP model from models/mmproj-Mistral-Medium-3.5-128B.gguf Failed to load vision model from models/mmproj-Mistral-Medium-3.5-128B.gguf ``` With this fallback the model loads successfully. Resolves: https://github.com/ggml-org/llama.cpp/issues/22901 * Revert "convert : add image break token fallback" This reverts commit `292e40cfdf`. * convert : add image break token fallback This commit adds a image_break_token_id fallback for mistral where the config contains a image_break_token_id of -1: ```console "vision_encoder": { "image_token_id": 10, "image_break_token_id": -1, ... ``` But the tokenizer.json has this token: ```console 115 "id": 12, 116 "content": "[IMG_BREAK]", 117 "single_word": false, 118 "lstrip": false, 119 "rstrip": false, 120 "normalized": false, 121 "special": true 122 }, ``` If we look in convert_hf_to_gguf.py we have: ```python elif self.is_mistral_format: # hparams is already vision config here so norm_eps is only defined in global_config. self.hparams["norm_eps"] = self.global_config.get("norm_eps", None) assert self.hparams["norm_eps"] is not None, "norm_eps not found in params.json" if self.use_break_tok: self.img_break_tok_id = self.find_vparam(["image_break_token_id"]) ``` The motivation for this is that currently converting this models results in the following error: ```console load_hparams: model size: 5131.60 MiB load_hparams: metadata size: 0.15 MiB clip_init: failed to load model 'models/mmproj-Mistral-Medium-3.5-128B.gguf': operator(): unable to find tensor v.token_embd.img_break mtmd_init_from_file: error: Failed to load CLIP model from models/mmproj-Mistral-Medium-3.5-128B.gguf Failed to load vision model from models/mmproj-Mistral-Medium-3.5-128B.gguf ``` With this fallback the model loads successfully. Co-authored-by: Pascal <admin@serveurperso.com> Resolves: https://github.com/ggml-org/llama.cpp/issues/22901 * convert : allow zero value for img_break_tok_id	2026-05-11 12:07:17 +02:00
Sumit Chatterjee	1e5ad35d56	model : add sarvam_moe architecture support (#20275 )	2026-05-09 16:31:50 +02:00
ynankani	9f5f0e689c	model : support Gemma4_26B_A4B_NVFP4 (#22804 ) * Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes Signed-off-by: ynankani <ynankani@nvidia.com> * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Address review comments Signed-off-by: ynankani <ynankani@nvidia.com> * fix CRLF Signed-off-by: ynankani <ynankani@nvidia.com> * Lint error fix Signed-off-by: ynankani <ynankani@nvidia.com> --------- Signed-off-by: ynankani <ynankani@nvidia.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-08 20:42:09 +02:00
Michał Piszczek	1d72d87349	convert : fix RuntimeError when stripping FP8 KV-cache scales (#22818 ) * convert : fix RuntimeError when stripping FP8 KV-cache scales In ModelBase._generate_nvfp4_tensors the final cleanup loop iterates self.model_tensors.keys() and calls del on the same dict, which raises RuntimeError: dictionary changed size during iteration when a ModelOpt NVFP4 model also has FP8 KV-cache scales (e.g. mmangkad/Qwen3.6-35B-A3B-NVFP4 and any modelopt config with kv_cache_quant_algo: FP8). Wrap the keys view in list() so the deletions happen on a snapshot. * re-add another accidentally removed list --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-08 06:55:48 +03:00
samuraieng	44dbe8c521	model: Support sarashina2.2-vision-3b model (#22103 )	2026-05-07 23:10:29 +02:00
AesSedai	8e52631d55	model: Add Mimo v2.5 model support (#22493 ) * add mimo-v2.5 support * mimo-v2.5: fix modify_tensors row split * mimi-v2.5: forgot `add_attn_value_scale` plumbing * mimi-v2.5: fix tp dequant to detect tp rows * mimo-v2.5: fix TP iteration to be descending * mimo-v2.5: fix comment * mimo-v2.5: retain fused qkv * mimo-v2.5: missed the attn_value scale during merge * mimo-v2.5: fused QKV needs contiguous for scaling attention value * mimo-v2.5: move `speech_embeddings.` to TextModel filter_tensors * Update src/llama-hparams.h Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/mimo2.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/mimo2.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/models/mimo2.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * mimo-v2.5: include MTP weights in gguf --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-07 13:21:58 +02:00
tc-mb	2496f9c149	mtmd : support MiniCPM-V 4.6 (#22529 ) * Support MiniCPM-V 4.6 in new branch Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code bug Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix pre-commit Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix convert Signed-off-by: tc-mb <tianchi_cai@icloud.com> * rename clip_graph_minicpmv4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use new TYPE_MINICPMV4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use build_attn to allow flash attention support Signed-off-by: tc-mb <tianchi_cai@icloud.com> * no use legacy code, restored here. Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use the existing tensors name Signed-off-by: tc-mb <tianchi_cai@icloud.com> * unused ctx->model.hparams.minicpmv_version Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use n_merge for slice alignment Signed-off-by: tc-mb <tianchi_cai@icloud.com> * borrow wa_layer_indexes for vit_merger insertion point Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code style Signed-off-by: tc-mb <tianchi_cai@icloud.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * use filter_tensors and add model.vision_tower Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix chkhsh Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix type check Signed-off-by: tc-mb <tianchi_cai@icloud.com> --------- Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-06 21:54:09 +02:00
Yakine Tahtah	a00e47e422	mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101 ) * mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) Conformer encoder with Shaw relative position encoding, QFormer projector, log-mel spectrogram with frame stacking. Encoder uses GLU gating, folded batch norm, and SSM depthwise conv. QFormer compresses encoder output via windowed cross-attention (window=15, queries=3) into the LLM embedding space. Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank, dynamic range compression, 2x frame stacking (80->160 mel). GGUF converter handles batch norm folding at export time, fused K/V split, and Conv1d weight reshaping. Tested against HF transformers reference: token-for-token match on 30s/60s audio clips with greedy decoding. * mtmd: rename gs_ prefixed tensors to generic/architecture names * mtmd: use tensor_mapping.py for all granite_speech tensors * convert: fold GraniteSpeechTextModel into GraniteModel * mtmd: replace n_layer hack with explicit has_standard_layers flag * mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech * mtmd: align KEY_A_ define spacing * convert: register GraniteModel for GraniteSpeechForConditionalGeneration * convert: fix ty type-check for GraniteSpeechMmprojModel registration * mtmd: align TN_ define spacing * mtmd: use generic layer loop for granite speech tensor loading * mtmd: merge qformer_proj_layer into clip_layer * mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs * mtmd: granite_speech add comment explaining why build_attn is not used * mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata * gguf: add spacing between granite_speech tensor mapping blocks * mtmd: make generic audio layer_norm_eps read optional * mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps * mtmd: align defines and struct fields in clip-impl.h and clip-model.h * mtmd: fix alignment and ordering issues across granite speech files * convert: granite_speech use filter_tensors instead of modify_tensors for skipping	2026-05-06 14:40:59 +02:00
Daniel Bevenius	a736e6c0ac	convert : ignore non-language tensors for Gemma4Model (#22753 ) * convert : ignore non-language tensors for Gemma4Model This commit adds a check to make sure only text language tensors are handled in filter_tensors. The motivation is that currently when trying to convert a Gemma4 model the following error occurs: ```console (venv) $ ./convert-gemma.sh INFO:hf-to-gguf:Loading model: gemma-4-E2B-it INFO:hf-to-gguf:Model architecture: Gemma4ForConditionalGeneration INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors' INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only INFO:hf-to-gguf:Exporting model... INFO:hf-to-gguf:rope_freqs.weight, torch.float32 --> F32, shape = {256} Traceback (most recent call last): File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 13752, in <module> main() File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 13746, in main model_instance.write() File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 945, in write self.prepare_tensors() File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 805, in prepare_tensors for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)): File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 7925, in modify_tensors yield from super().modify_tensors(data_torch, name, bid) File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 7290, in modify_tensors yield from super().modify_tensors(data_torch, name, bid) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 579, in modify_tensors new_name = self.map_tensor_name(name) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/danbev/work/llama.cpp/./convert_hf_to_gguf.py", line 572, in map_tensor_name raise ValueError(f"Can not map tensor {name!r}") ValueError: Can not map tensor 'model.embed_vision.embedding_projection.weight' ``` * add forgotten embed_vision and embed_audio * improve --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-06 13:50:44 +02:00
Sigbjørn Skjæret	74d6248f71	convert : add filter_tensors method to pre-filter tensors (#22597 ) * add filter_tensors classmethod * remove language_model * fix parts validation	2026-05-06 08:06:05 +02:00
Julien Denize	048a490f76	convert : Mistral format yarn apply_scale support (#22612 ) * [BUGFIX] Mistral format apply_scale support. * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix misunderstood boolean parameters --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-03 21:51:21 +02:00
JM Robles	db44417b02	convert : apply Q/K RoPE permutation in NVFP4 repack path (#22611 ) Llama-architecture q_proj/k_proj weights need an axis-0 row permutation to match GGML's RoPE convention. The BF16 path applies this in LlamaModel.modify_tensors via LlamaModel.permute, but the NVFP4 path bypasses modify_tensors and writes weights directly through ModelBase._repack_nvfp4. Without the permutation, attention heads end up scrambled at inference and the model produces gibberish. This change overrides _repack_nvfp4 on LlamaModel and applies the same permutation to both the nibble-packed weight and the per-block scale before delegating to ModelBase._repack_nvfp4 via super(). Reuses the existing LlamaModel.permute static helper and respects the existing undo_permute flag, so subclasses (Mistral, Granite, Llama4, etc.) inherit the fix automatically. Verified on TinyLlama-1.1B reproducer: perplexity drops from 4419 (gibberish) to 43.9, matching the BF16-dequantized baseline (44.0). Also verified end-to-end on ALIA-40b-instruct-2601 (BSC, Llama architecture) with multilingual generation in Spanish/Catalan/Basque/ Galician all coherent with the fix applied. Co-authored-by: Chema <chema@montevive.ai>	2026-05-03 18:22:00 +03:00
Csaba Kecskemeti	63d93d1733	convert : disable uint types (#18908 )	2026-05-02 09:05:59 +03:00
Sigbjørn Skjæret	6118c043b1	ci : bump ty to 0.0.33 (#22535 ) * bump ty to 0.0.33 * update typings	2026-04-30 16:15:54 +03:00
Daniel Bevenius	5d56effdee	convert : add support for Nemotron Nano 3 Omni (#22481 ) This commit adds support for NVIDIA Nemotron Nano 3 Omni model enabling this model to be converted to GGUF.	2026-04-28 19:17:57 +02:00
Sigbjørn Skjæret	d13540becd	convert : remove input_scale for dequantized fp8 modelopt (#22356 )	2026-04-27 08:45:01 +02:00
ynankani	5eaee65384	convert : Handle ModelOpt produced mixed precision model during convert to GGUF (#22247 ) * Handle ModelOpt produced mixed precision model during convert to GGUF * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-23 08:19:51 +03:00
manayang	7bfe60fdf9	mtmd, llama : Update HunyuanVL vision-language model support (#22037 ) * mtmd, llama : add HunyuanVL vision-language model support - add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support - add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder - add HunyuanVL-specific M-RoPE position encoding for image tokens - add GGUF conversion for HunyuanVL vision and text models - add smoke test in tools/mtmd/tests.sh * fix: fix HunyuanVL XD-RoPE h/w section order * fix: Remove redundant code * convert : fix HunyuanOCR / HunyuanVL conversion - Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF - successfully and produce correct inference output on Metal (F16 / Q8_0). * clip : fix -Werror=misleading-indentation in bilinear resize * fix CI: convert_hf_to_gguf type check error - convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit `dir_model: Path` parameter so ty can infer the type for load_hparams instead of reporting `Unknown \| None`. --------- Co-authored-by: wendadawen <wendadawen@tencent.com>	2026-04-22 11:58:43 +02:00
Dowon	8685e7b075	convert : support sentence-transformer 5.4 config files (#22087 ) * convert : support sentence-transformer 5.4 config files * fix: embeddinggemma * fix: mapping Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix: pooling_mode Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-19 10:25:39 +03:00
Sigbjørn Skjæret	89a5474f0e	convert : fix (ignore for now) typings errors (#22002 )	2026-04-18 09:36:41 +02:00
Anav Prasad	03b3d07798	Convert: Fix NemotronH Config Parsing (#21664 ) * fix NemotronH vocab loading by using trust_remote_code for unsupported config patterns * fix NemotronH tokenizer loading by overriding set_vocab with trust_remote_code	2026-04-16 13:11:45 +03:00
Xuan-Son Nguyen	21a4933042	mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441 ) * add qwen3a * wip * vision ok * no more deepstack for audio * convert ASR model ok * qwen3 asr working * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * nits * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix bad merge * fix multi inheritance --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-12 23:57:25 +02:00
Sigbjørn Skjæret	1e9d771e2c	convert : force f16 or f32 on step3-vl conv weights (#21646 )	2026-04-12 19:22:29 +02:00
Sirui He	073bb2c20b	mtmd : add MERaLiON-2 multimodal audio support (#21756 ) * mtmd : add MERaLiON-2 multimodal audio support Adds support for ASTAR's MERaLiON-2 audio-language model (3B and 10B) to the multimodal framework. Architecture: - Whisper large-v2 encoder for audio feature extraction - Gated MLP adaptor: ln_speech -> frame stack (x15) -> Linear+SiLU -> GLU -> out_proj - Gemma2 3B / 27B decoder The mmproj GGUF is generated via convert_hf_to_gguf.py --mmproj on the full MERaLiON-2 model directory (architecture: MERaLiON2ForConditionalGeneration). The decoder is converted separately as a standard Gemma2 model after stripping the text_decoder. weight prefix. New projector type: PROJECTOR_TYPE_MERALION Supports tasks: speech transcription (EN/ZH/MS/TA), translation, spoken QA. Model: https://huggingface.co/MERaLiON/MERaLiON-2-3B https://huggingface.co/MERaLiON/MERaLiON-2-10B simplify comments in meralion adaptor * meralion: use format_tensor_name, ascii arrows in comments	2026-04-11 14:15:48 +02:00
Daniel Bevenius	c8ac02fa1b	requirements : update transformers to 5.5.1 (#21617 ) * requirements : update transformers to 5.5.0 This commit updates the transformers dependency to version 5.5.0. The motivation for this is that transformers 5.5.0 includes support for Gemma4 and is required to be able to convert Gemma4 models. This is also causing issues for user of gguf-my-repo. Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/202 * fix huggingface_hub version * set version of transformers to 5.5.0 * convert : add ty ignore directives to convert_hf_to_gguf.py This commit adds `ty: ignore` directives to transformers tokenizers field/methods to avoid type check errors. There might be better ways to handle this and perhaps this can be done in a follow up commit. The motivation for this is that it looks like in transformers 5.5.0 AutoTokenizer.from_pretrained can return generic tokenizer types or None and the type checker now produces an error when the conversion script accesses field like tokenizer.vocab. * convert : add ty ignore to suppress type check errors * convert : remove incorrect type ignores * convert : fix remaining python checks I was running a newer version of ty locally but I've switched to version 0.0.26 which is what CI uses and I was then able to reproduce the errors. Sorry about the noise. * update transformers version to 5.5.1	2026-04-09 12:36:29 +02:00
Xuan-Son Nguyen	501aeed18f	mtmd: support dots.ocr (#17575 ) * convert gguf * clip impl * fix conversion * wip * corrections * update docs * add gguf to test script	2026-04-09 12:16:38 +02:00
forforever73	09343c0198	model : support step3-vl-10b (#21287 ) * feat: support step3-vl-10b * use fused QKV && mapping tensor in tensor_mapping.py * guard hardcoded params and drop crop metadata * get understand_projector_stride from global config * img_u8_resize_bilinear_to_f32 move in step3vl class * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix the \r\n mess * add width and heads to MmprojModel.set_gguf_parameters --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-08 09:51:31 +02:00
Sigbjørn Skjæret	941146b3f1	convert : fix block_ff_dim retrieval for lfm2 (#21508 )	2026-04-06 14:05:18 +02:00
Georgi Gerganov	400ac8e194	convert : set "add bos" == True for Gemma 4 (#21500 ) * convert : set "add bos" == True for Gemma 4 * cont : handle old GGUFs	2026-04-06 13:52:07 +03:00
Richard Davison	af76639f72	model : add HunyuanOCR support (#21395 ) * HunyuanOCR: add support for text and vision models - Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge - Add separate HUNYUAN_OCR chat template (content-before-role format) - Handle HunyuanOCR's invalid pad_token_id=-1 in converter - Fix EOS/EOT token IDs from generation_config.json - Support xdrope RoPE scaling type - Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.) - Register HunYuanVLForConditionalGeneration for both text and mmproj conversion * fix proper mapping * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * address comments * update * Fix typecheck * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-05 23:32:14 +02:00
Piotr Wilkin (ilintar)	b069b10ab4	vocab: fix Gemma4 tokenizer (#21343 ) * seems to work * fix case with new line Co-authored-by: sayap <sokann@gmail.com> * gemma 4: fix pre tok regex --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: sayap <sokann@gmail.com>	2026-04-03 10:33:03 +02:00
Xuan-Son Nguyen	63f8fe0ef4	model, mtmd: fix gguf conversion for audio/vision mmproj (#21309 ) * fix gguf conversion for audio/vision mmproj * fix test	2026-04-02 17:10:32 +02:00
Sigbjørn Skjæret	e2eb39e81c	ci : bump ty to 0.0.26 (#21156 ) * fix incorrect type ignore comments * bump ty to 0.0.26	2026-03-30 09:29:15 +02:00
Saba Fallah	1743d98057	mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr (#21027 ) * mtmd: fix "v.patch_embd" quant and unsupported im2col ops on Metal for deepseek-ocr * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-27 00:07:55 +01:00
Michael Wand	f8d4abae86	convert : support Qwen3.5/Qwen3.5 Moe NVFP4 and add input scales (#20505 ) * convert : fix Qwen3.5 NVFP4 conversion * Updated copilot concerns and rebased * move into _LinearAttentionVReorderBase and simplify * --flake * new_name not needed * Added input_scale to gguf * Fixed input_scale addition as tensor * Added input scale to loader and named _in_s * Update convert_hf_to_gguf.py Re-removed input_scale from aux cleanup Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-26 16:52:06 +01:00
Pavel Zloi	3d5acab3e7	convert : add RuGPT3XL (RuGPT3XLForCausalLM) support (#21011 ) * Support of ruGPT3XL model added * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * chkhsh for ruGPT3XL model added * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Fixing chkhsh for ruGPT3XL, rerun updated and _qkv_parts in RuGPT3XLModel --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-26 16:49:09 +01:00
Saba Fallah	a970515bdb	mtmd: Add DeepSeekOCR Support (#17400 ) * mtmd: llama.cpp DeepSeekOCR support init commit * loading sam tensors * mtmd: fix vision model processing * deepseek-ocr clip-vit model impl * mtmd: add DeepSeek-OCR LM support with standard attention * mtmd: successfully runs DeepSeek-OCR LM in llama-cli * mtmd: Fix RoPE type for DeepSeek-OCR LM. * loading LM testing Vision model loading * sam warmup working * sam erroneous return corrected * clip-vit: corrected cls_embd concat * clip-vit: model convert qkv_proj split * corrected combining of image encoders' results * fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model * concat image_newline and image_seperator tokens * visual_model warmup (technically) works * window partitioning using standard ggml ops * sam implementation without using CPU only ops * clip: fixed warnings * Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr * mtmd: fix get_rel_pos * mtmd: fixed the wrong scaler for get_rel_pos * image encoding technically works but the output can't be checked singe image decoding fails * mtmd: minor changed * mtmd: add native resolution support * - image encoding debugged - issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter * mtmd: correct token order * - dynamic resizing - changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4 * mtmd: quick fix token order * mtmd: fix danling pointer * mtmd: SAM numerically works * mtmd: debug CLIP-L (vit_pre_ln) * mtmd: debug CLIP-L & first working DeepSeek-OCR model * mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work * mtmd: simplify SAM patch embedding * mtmd: adapt Pillow image resizing function * mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing * mtmd: remove --dsocr-mode argument * mtmd: refactor code & remove unused helper functions * mtmd: fix tensor names for image newlines and view separator * clean up * reverting automatically removed spaces * reverting automatically removed spaces * mtmd: fixed bad ocr check in Deepseek2 (LM) * mtmd: support combined QKV projection in buid_vit * using common build_attn in sam * corrected code-branch when flash-attn disabled enabling usage of --flash-attn option * mtmd: minor fix * minor formatting and style * fixed flake8 lint issues * minor editorconfig-check fixes * minor editorconfig-check fixes * mtmd: simplify get_rel_pos * mtmd: make sam hparams configurable * mtmd: add detailed comments for resize_bicubic_pillow * mtmd: fixed wrong input setting * mtmd: convert model in FP16 * mtmd: minor fix * mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template * fix: test-1.jpg ORC issue with small (640) resolution setting min-resolution base (1024) max large (1280) for dynamic-resolution * minor: editconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909 added new opt to tests.sh to disable flash-attn * minor: editconfig-check fix * testing deepseek-ocr quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR * quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909 * refactoring, one single builder function and static helpers * added deepseek-ocr test to tests.sh * minor formatting fixes * check with fixed expected resutls * minor formatting * editorconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042 * minor - added GLM-4.6V to big tests - added missing deps for python test * convert: minor fix * mtmd: format code * convert: quick fix * convert: quick fix * minor python formatting * fixed merge build issue * merge resolved - fixed issues in convert - tested several deepseek models * minor fix * minor * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * - removed clip_is_deepseekocr - removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo - simplified image-preprocessing - removed/simplified debug functions * - cleaning commented out code * fixing instabilities issues reintroducing resize_bicubic_pillow * - use f16 model for deepseek-ocr test - ignore llama-arch test for deepseek-ocr * rename fc_w --> mm_fc_w * add links to OCR discussion * cleaner loading code * add missing .weight to some tensors * add default jinja template (to be used by server) * move test model to ggml-org * rolling back upscale change * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: bluebread <hotbread70127@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-03-25 19:57:40 +01:00
Saba Fallah	80322ebdaf	model: codefuse-ai/F2LLM-v2 support	2026-03-25 18:33:42 +01:00
Dowon	3a60d06ad9	convert : register Qwen3Model architecture (#20967 )	2026-03-25 10:37:59 +02:00
bssrdf	ec2b787ebe	mtmd: Add dynamic high-resolution image preprocessing for InternVL model (#20847 ) * added support for internvl's dynamic high-resolution (Qianfan-OCR needed) * add min/max dynamic patch to gguf meta * clean up * simplified handling min/max dynamic patch * reuse llava_uhd logic for slice images * provide default values for older models * flake8 * prevent writing 0 value to gguf * remove duplicated resolution candidates with a better algorithm * fix indentation * format * add protection from divide by zero * change to 0 to be safe --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-23 01:06:30 +01:00
Michael Wand	eac9c6ea83	Convert: Make NVFP4 and MXFP4 HF conversions say NVFP4/MXFP4 instead of BF16 (#20730 ) * Corrected convert script for NVFP4 naming and updated gguf constants * Add mostly_MXFP4 to FileType Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * simplify * set initial value [no ci] --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-21 13:35:21 +02:00
Sigbjørn Skjæret	29b28a9824	ci : switch from pyright to ty (#20826 ) * type fixes * switch to ty * tweak rules * tweak more rules * more tweaks * final tweak * use common import-not-found rule	2026-03-21 08:54:34 +01:00

1 2 3 4 5 ...

375 Commits