llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-30 09:37:42 +02:00

Aman Gupta 8c146a8366 DeepSeek V4 (#24162)

* convert: add dsv4 conversion

* add basic setup

* add llm_graph_input_dsv4

* add save-load state

* add sinkhorn eps - correction by @fairydreaming

* add rope fix

* cleanup dead code

* fix bugs

* support pro model: added by @fairydreaming

* remove redundant V cache

* Chat template

* remove debugging leftovers

* Add mechanism for inlining templates based on architecture

* s/deepseek-v4-flash/deepseek4/g

* s/deepseek-v4-flash/deepseek4/g continued

* enable graph reuse

* enable FA

* fix test llama archs

* rename

* compatibility with antirez ds4 GGUFs

* simplified set_gguf_parameters() by calling super class method, replaced moe.score_func with expert_gating_func.

* reserve worst-case kv-cache

* revert max split inputs

* address review comments

* add padding to enable FA

* pad only the final value of plan.n_kv to 256

* remove built-in cpp chat template

* cont: remove cpp built-in template

* rm outdated test

* replace ggml_view_3d() with ggml_reshape_3d()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* only support n_seq=1 for now

* remove unused var

* cont: remove unused var

* use scale bias

* use correct ptr for can_reuse

* remove gen-chat-inline-templates.py

* simplify graph reuse

* cont: cleanup

* remove unused inputs

* enable partial checkpointing

* add correct shape for kq_mask + set llama_model_n_swa to 0 for dsv4

* precompute source_idx + add comment about dummy write

* support multi-seq

* remove restored_trim_pos

* use split_equal when possible

* fix indent

* address review comments

* use LLM_KV

* fix ci

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: fairydreaming <166155368+fairydreaming@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2026-06-29 16:58:51 +08:00

| File | Last commit | Date |
| --- | --- | --- |
| Apertus-8B-Instruct.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| Apriel-1.6-15b-Thinker-fixed.jinja | common/parser: add proper reasoning tag prefill reading (#20424) | 2026-03-19 16:58:21 +01:00 |
| Bielik-11B-v3.0-Instruct.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| ByteDance-Seed-OSS.jinja | chat : Seed OSS thinking + tool call support (#15552) | 2025-08-29 14:53:41 +02:00 |
| Cohere2MoE.jinja | chat: add dedicated Cohere2MoE (North Code) parser (#24615) | 2026-06-14 20:17:40 +02:00 |
| CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| CohereForAI-c4ai-command-r-plus-tool_use.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja | common/parser: add proper reasoning tag prefill reading (#20424) | 2026-03-19 16:58:21 +01:00 |
| deepseek-ai-DeepSeek-V3.1.jinja | common/parser: add proper reasoning tag prefill reading (#20424) | 2026-03-19 16:58:21 +01:00 |
| deepseek-ai-DeepSeek-V3.2.jinja | chat: dedicated DeepSeek v3.2 parser + "official" template (#21785) | 2026-04-13 22:23:53 +02:00 |
| deepseek-ai-DeepSeek-V4.jinja | DeepSeek V4 (#24162) | 2026-06-29 16:58:51 +08:00 |
| fireworks-ai-llama-3-firefunction-v2.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| GigaChat3-10B-A1.8B.jinja | common/parser: add GigaChatV3/3.1 models support (#19931) | 2026-03-12 01:22:25 +01:00 |
| GigaChat3.1-10B-A1.8B.jinja | common/parser: add GigaChatV3/3.1 models support (#19931) | 2026-03-12 01:22:25 +01:00 |
| GLM-4.6.jinja | common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) | 2025-11-18 18:54:15 +01:00 |
| GLM-4.7-Flash.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| google-gemma-2-2b-it.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| google-gemma-4-31B-it-interleaved.jinja | common : better align to the updated official gemma4 template (#21704) | 2026-04-10 16:12:53 -05:00 |
| google-gemma-4-31B-it.jinja | common : better align to the updated official gemma4 template (#21704) | 2026-04-10 16:12:53 -05:00 |
| HuggingFaceTB-SmolLM3-3B.jinja | common/autoparser : detect reasoning markers when enable_thinking changes system prompt (#20859) | 2026-03-23 08:35:27 +01:00 |
| ibm-granite-granite-3.3-2B-Instruct.jinja | chat : support Granite model reasoning and tool call (#14864) | 2025-08-06 20:27:30 +02:00 |
| ibm-granite-granite-4.0.jinja | chat : add Granite 4.0 chat template with correct tool_call role mapping (#20804) | 2026-04-02 11:28:56 +02:00 |
| ibm-granite-granite-4.1.jinja | chat : add Granite 4.1 chat template (#23518) | 2026-05-28 13:13:33 +02:00 |
| Kimi-K2-Instruct.jinja | Fix Kimi-K2 tool-call parsing issues (#17376) | 2025-12-08 14:32:04 +01:00 |
| Kimi-K2-Thinking.jinja | Fix Kimi-K2 tool-call parsing issues (#17376) | 2025-12-08 14:32:04 +01:00 |
| LFM2-8B-A1B.jinja | PEG parser for LFM2 (#20251) | 2026-03-09 01:11:22 +01:00 |
| LFM2.5-8B-A1B.jinja | common/chat : fix LFM2/LFM2.5 reasoning round-trip and `<think>` leak (#24234) | 2026-06-06 22:39:21 +02:00 |
| LFM2.5-Instruct.jinja | fix: tool call parsing for LFM2 and LFM2.5 models (#21242) | 2026-04-01 16:22:44 +02:00 |
| llama-cpp-deepseek-r1.jinja | common/parser: add proper reasoning tag prefill reading (#20424) | 2026-03-19 16:58:21 +01:00 |
| llama-cpp-rwkv-world.jinja | llama : add jinja template for rwkv-world (#14665) | 2025-07-14 07:43:43 +08:00 |
| meetkai-functionary-medium-v3.1.jinja | common/parser: add proper reasoning tag prefill reading (#20424) | 2026-03-19 16:58:21 +01:00 |
| meetkai-functionary-medium-v3.2.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| meta-llama-Llama-3.1-8B-Instruct.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| meta-llama-Llama-3.2-3B-Instruct.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| meta-llama-Llama-3.3-70B-Instruct.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| microsoft-Phi-3.5-mini-instruct.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| MiMo-VL.jinja | common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) | 2025-11-18 18:54:15 +01:00 |
| MiniMax-M2.jinja | common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) | 2025-11-18 18:54:15 +01:00 |
| Mistral-Small-3.2-24B-Instruct-2506.jinja | jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349) | 2025-06-24 09:17:58 +03:00 |
| mistralai-Ministral-3-14B-Reasoning-2512.jinja | common : add parser for ministral/mistral large 3/devstral 2 (#17713) | 2025-12-09 17:31:04 -06:00 |
| mistralai-Mistral-Nemo-Instruct-2407.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| moonshotai-Kimi-K2.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| NousResearch-Hermes-3-Llama-3.1-8B-tool_use.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.jinja | common : implement new jinja template engine (#18462) | 2026-01-16 11:22:06 +01:00 |
| NVIDIA-Nemotron-Nano-v2.jinja | chat : nemotron thinking & toolcalling support (#15676) | 2025-09-05 01:22:22 +02:00 |
| openai-gpt-oss-120b.jinja | gpt-oss: implement harmony parsing (#15181) | 2025-08-14 17:23:11 +03:00 |
| openbmb-MiniCPM5-1B.jinja | chat : implement minicpm5 parser (#24889) | 2026-06-28 16:53:32 +02:00 |
| Qwen3-Coder.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| Qwen3.5-4B.jinja | common/parser: fix handling of tool definition with missing properties key (#21128) | 2026-03-28 20:41:32 +01:00 |
| Qwen-Qwen2.5-7B-Instruct.jinja | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| Qwen-Qwen3-0.6B.jinja | server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) | 2025-05-26 00:30:51 +01:00 |
| Qwen-QwQ-32B.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| README.md | chat : Deepseek V3.1 reasoning and tool calling support (OpenAI Style) (#15533) | 2025-09-08 16:59:48 +02:00 |
| Reka-Edge.jinja | autoparser: support case of JSON_NATIVE with per-call markers (test case: Reka-Edge) (#21892) | 2026-04-15 10:51:50 +02:00 |
| StepFun3.5-Flash.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| stepfun-ai-Step-3.5-Flash.jinja | common : fix Step-3.5-Flash format detection and thinking support (#19635) | 2026-02-19 22:40:52 +01:00 |
| unsloth-Apriel-1.5.jinja | Autoparser - complete refactoring of parser architecture (#18675) | 2026-03-06 21:01:00 +01:00 |
| unsloth-mistral-Devstral-Small-2507.jinja | mtmd : add support for Voxtral (#14862) | 2025-07-28 15:01:48 +02:00 |
| upstage-Solar-Open-100B.jinja | chat : add parsing for solar-open-100b (#18540) | 2026-01-29 16:06:15 +01:00 |

README.md

These templates can be updated with the following commands:

```sh
./scripts/get_chat_template.py CohereForAI/c4ai-command-r-plus tool_use      > models/templates/CohereForAI-c4ai-command-r-plus-tool_use.jinja
./scripts/get_chat_template.py CohereForAI/c4ai-command-r7b-12-2024 default  > models/templates/CohereForAI-c4ai-command-r7b-12-2024-default.jinja
./scripts/get_chat_template.py CohereForAI/c4ai-command-r7b-12-2024 rag      > models/templates/CohereForAI-c4ai-command-r7b-12-2024-rag.jinja
./scripts/get_chat_template.py CohereForAI/c4ai-command-r7b-12-2024 tool_use > models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja
./scripts/get_chat_template.py deepseek-ai/DeepSeek-R1-Distill-Llama-8B      > models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
./scripts/get_chat_template.py deepseek-ai/DeepSeek-R1-Distill-Qwen-32B      > models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
./scripts/get_chat_template.py fireworks-ai/llama-3-firefunction-v2          > models/templates/fireworks-ai-llama-3-firefunction-v2.jinja
./scripts/get_chat_template.py google/gemma-2-2b-it                          > models/templates/google-gemma-2-2b-it.jinja
./scripts/get_chat_template.py meetkai/functionary-medium-v3.1               > models/templates/meetkai-functionary-medium-v3.1.jinja
./scripts/get_chat_template.py meetkai/functionary-medium-v3.2               > models/templates/meetkai-functionary-medium-v3.2.jinja
./scripts/get_chat_template.py meta-llama/Llama-3.1-8B-Instruct              > models/templates/meta-llama-Llama-3.1-8B-Instruct.jinja
./scripts/get_chat_template.py meta-llama/Llama-3.2-3B-Instruct              > models/templates/meta-llama-Llama-3.2-3B-Instruct.jinja
./scripts/get_chat_template.py meta-llama/Llama-3.3-70B-Instruct             > models/templates/meta-llama-Llama-3.3-70B-Instruct.jinja
./scripts/get_chat_template.py microsoft/Phi-3.5-mini-instruct               > models/templates/microsoft-Phi-3.5-mini-instruct.jinja
./scripts/get_chat_template.py mistralai/Mistral-Nemo-Instruct-2407          > models/templates/mistralai-Mistral-Nemo-Instruct-2407.jinja
./scripts/get_chat_template.py NousResearch/Hermes-2-Pro-Llama-3-8B tool_use > models/templates/NousResearch-Hermes-2-Pro-Llama-3-8B-tool_use.jinja
./scripts/get_chat_template.py NousResearch/Hermes-3-Llama-3.1-8B tool_use   > models/templates/NousResearch-Hermes-3-Llama-3.1-8B-tool_use.jinja
./scripts/get_chat_template.py Qwen/Qwen2.5-7B-Instruct                      > models/templates/Qwen-Qwen2.5-7B-Instruct.jinja
./scripts/get_chat_template.py Qwen/QwQ-32B                                  > models/templates/Qwen-QwQ-32B.jinja
./scripts/get_chat_template.py Qwen/Qwen3-0.6B                               > models/templates/Qwen-Qwen3-0.6B.jinja
./scripts/get_chat_template.py zai-org/GLM-4.5                               > models/templates/zai-org-GLM-4.5.jinja
./scripts/get_chat_template.py deepseek-ai/DeepSeek-V3.1                     > models/templates/deepseek-ai-DeepSeek-V3.1.jinja
```
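
The commands above appear to follow one naming convention: the `/` in the Hugging Face model id becomes `-`, an optional variant such as `tool_use` is appended with another `-`, and the result gets a `.jinja` suffix under `models/templates/`. The sketch below reproduces that convention so a new command line can be generated consistently. The helper names `template_path` and `update_command` are hypothetical and not part of llama.cpp; the convention is inferred from the listed commands only, and some files in the directory (for example `GLM-4.6.jinja`) do not follow it.

```python
from pathlib import PurePosixPath
from typing import Optional

# Directory used by every command in the README list.
TEMPLATES_DIR = PurePosixPath("models/templates")


def template_path(model_id: str, variant: Optional[str] = None) -> str:
    """Derive the output file name from a model id and optional variant.

    Hypothetical helper: mirrors the pattern seen in the README commands,
    it is not something get_chat_template.py itself provides.
    """
    stem = model_id.replace("/", "-")   # "Qwen/QwQ-32B" -> "Qwen-QwQ-32B"
    if variant:
        stem = f"{stem}-{variant}"      # e.g. append "-tool_use"
    return str(TEMPLATES_DIR / f"{stem}.jinja")


def update_command(model_id: str, variant: Optional[str] = None) -> str:
    """Build the shell command line in the same shape as the README entries."""
    args = " ".join(part for part in (model_id, variant) if part)
    return f"./scripts/get_chat_template.py {args} > {template_path(model_id, variant)}"


if __name__ == "__main__":
    print(update_command("Qwen/QwQ-32B"))
    print(update_command("CohereForAI/c4ai-command-r-plus", "tool_use"))
```

Running the module prints two command lines that match the corresponding README entries apart from the column-alignment padding.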