llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-29 17:17:40 +02:00

Files

Aman Gupta 8c146a8366 DeepSeek V4 (#24162)

* convert: add dsv4 conversion

* add basic setup

* add llm_graph_input_dsv4

* add save-load state

* add sinkhorn eps - correction by @fairydreaming

* add rope fix

* cleanup dead code

* fix bugs

* support pro model: added by @fairydreaming

* remove redundant V cache

* Chat template

* remove debugging leftovers

* Add mechanism for inlining templates based on architecture

* s/deepseek-v4-flash/deepseek4/g

* s/deepseek-v4-flash/deepseek4/g continued

* enable graph reuse

* enable FA

* fix test llama archs

* rename

* compatibility with antirez ds4 GGUFs

* simplified set_gguf_parameters() by calling super class method, replaced moe.score_func with expert_gating_func.

* reserve worst-case kv-cache

* revert max split inputs

* address review comments

* add padding to enable FA

* pad only the final value of plan.n_kv to 256

* remove built-in cpp chat template

* cont: remove cpp built-in template

* rm outdated test

* replace ggml_view_3d() with ggml_reshape_3d()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* only support n_seq=1 for now

* remove unused var

* cont: remove unused var

* use scale bias

* use correct ptr for can_reuse

* remove gen-chat-inline-templates.py

* simplify graph reuse

* cont: cleanup

* remove unused inputs

* enable partial checkpointing

* add correct shape for kq_mask + set llama_model_n_swa to 0 for dsv4

* precompute source_idx + add comment about dummy write

* support multi-seq

* remove restored_trim_pos

* use split_equal when possible

* fix indent

* address review comments

* use LLM_KV

* fix ci

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: fairydreaming <166155368+fairydreaming@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2026-06-29 16:58:51 +08:00

__init__.py

DeepSeek V4 (#24162)

2026-06-29 16:58:51 +08:00

afmoe.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

arctic.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

baichuan.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

bailingmoe.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

base.py

DeepSeek V4 (#24162)

2026-06-29 16:58:51 +08:00

bert.py

model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716)

2026-06-02 17:55:11 +02:00

bitnet.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

bloom.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

chameleon.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

chatglm.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

codeshell.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

cogvlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

command_r.py

Add arch support for cohere2-MoE (#24260)

2026-06-13 19:49:00 +02:00

dbrx.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

deci.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

deepseek.py

DeepSeek V4 (#24162)

2026-06-29 16:58:51 +08:00

dots1.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

dotsocr.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

dream.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

ernie.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

exaone.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

falcon_h1.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

falcon.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gemma.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

glm.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

gpt2.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gpt_oss.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

gptneox.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

granite.py

model: Granite Speech Plus (#24818)

2026-06-23 12:03:31 +02:00

grok.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

grovemoe.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

hunyuan.py

mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)

2026-05-21 00:35:37 +02:00

internlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

internvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

jais.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

jamba.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

januspro.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

kimi_linear.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

kimivl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

lfm2.py

model : Add LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M (#24913)

2026-06-24 09:49:46 +03:00

lighton_ocr.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llada.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llama4.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

llama.py

dflash: refactor draft model conversion (#25110)

2026-06-28 20:31:48 +02:00

llava.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

maincoder.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mamba.py

mamba2: remove hardcoded 2x expansion factor and invalid d_inner % d_state check (#23082)

2026-06-26 08:50:54 +03:00

mellum.py

model: add Mellum architecture (#23966)

2026-06-02 22:11:12 +03:00

mimo.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

minicpm.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

minimax.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mistral3.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

mistral.py

convert : fix conversion for Mistral-Medium-3.5-128B (#24268)

2026-06-07 21:41:39 +02:00

mpt.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

nemotron.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

olmo.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

openelm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

orion.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

pangu.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

phi.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

pixtral.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

plamo.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

plm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

qwen3vl.py

convert : fix Qwen3 ASR conversion (#23081)

2026-05-15 18:38:39 +02:00

qwen.py

dflash: refactor draft model conversion (#25110)

2026-06-28 20:31:48 +02:00

qwenvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

refact.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

rwkv.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

sarashina2.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

smallthinker.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

smolvlm.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

stablelm.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

starcoder.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

step3.py

convert : more consistent handling of rope_parameters (#24833)

2026-06-20 13:42:36 +03:00

t5.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

talkie.py

model : add support for talkie-1930-13b (#22596)

2026-05-26 07:57:38 +03:00

ultravox.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

wavtokenizer.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

xverse.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00

youtuvl.py

Refactor: convert_hf_to_gguf.py (#17114)

2026-05-15 15:18:12 +02:00