llama.cpp/models at b9842 - llama.cpp

wylab/llama.cpp

mirror of https://github.com/ggml-org/llama.cpp.git synced 2026-06-30 09:37:42 +02:00

Aman Gupta 8c146a8366 DeepSeek V4 (#24162)

* convert: add dsv4 conversion

* add basic setup

* add llm_graph_input_dsv4

* add save-load state

* add sinkhorn eps - correction by @fairydreaming

* add rope fix

* cleanup dead code

* fix bugs

* support pro model: added by @fairydreaming

* remove redundant V cache

* Chat template

* remove debugging leftovers

* Add mechanism for inlining templates based on architecture

* s/deepseek-v4-flash/deepseek4/g

* s/deepseek-v4-flash/deepseek4/g continued

* enable graph reuse

* enable FA

* fix test llama archs

* rename

* compatibility with antirez ds4 GGUFs

* simplified set_gguf_parameters() by calling super class method, replaced moe.score_func with expert_gating_func.

* reserve worst-case kv-cache

* revert max split inputs

* address review comments

* add padding to enable FA

* pad only the final value of plan.n_kv to 256

* remove built-in cpp chat template

* cont: remove cpp built-in template

* rm outdated test

* replace ggml_view_3d() with ggml_reshape_3d()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* only support n_seq=1 for now

* remove unused var

* cont: remove unused var

* use scale bias

* use correct ptr for can_reuse

* remove gen-chat-inline-templates.py

* simplify graph reuse

* cont: cleanup

* remove unused inputs

* enable partial checkpointing

* add correct shape for kq_mask + set llama_model_n_swa to 0 for dsv4

* precompute source_idx + add comment about dummy write

* support multi-seq

* remove restored_trim_pos

* use split_equal when possible

* fix indent

* address review comments

* use LLM_KV

* fix ci

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: fairydreaming <166155368+fairydreaming@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2026-06-29 16:58:51 +08:00

.editorconfig

gguf : new file format with flexible meta data (beta) (#2398)

2023-08-21 23:07:43 +03:00

ggml-vocab-aquila.gguf

Work on the BPE tokenizer (#3252)

2023-10-03 09:16:26 +02:00

ggml-vocab-baichuan.gguf

Add more tokenizer tests (#3742)

2023-10-24 09:17:17 +02:00

ggml-vocab-bert-bge.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-bert-bge.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-bert-bge.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-command-r.gguf

command-r : add BPE pre-tokenization (#7063)

2024-05-05 08:19:30 +03:00

ggml-vocab-command-r.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-command-r.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-deepseek-coder.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-deepseek-coder.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-deepseek-coder.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-deepseek-llm.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-deepseek-llm.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-deepseek-llm.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-falcon.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-falcon.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-falcon.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-gemma-4.gguf

vocab: add gemma4 tokenizer tests, fix edge case (#21534)

2026-04-09 11:41:14 +02:00

ggml-vocab-gemma-4.gguf.inp

vocab: add gemma4 tokenizer tests, fix edge case (#21534)

2026-04-09 11:41:14 +02:00

ggml-vocab-gemma-4.gguf.out

vocab: add gemma4 tokenizer tests, fix edge case (#21534)

2026-04-09 11:41:14 +02:00

ggml-vocab-gpt-2.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-gpt-2.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-gpt-2.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-gpt-neox.gguf

Add more tokenizer tests (#3742)

2023-10-24 09:17:17 +02:00

ggml-vocab-llama-bpe.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-llama-bpe.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-llama-bpe.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-llama-spm.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-llama-spm.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-llama-spm.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-mpt.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-mpt.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-mpt.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-nomic-bert-moe.gguf

tests : improve UGM tokenizer test coverage (#13773)

2025-05-25 16:22:29 +02:00

ggml-vocab-phi-3.gguf

Per token attributes (#7685)

2024-06-04 09:17:17 +02:00

ggml-vocab-phi-3.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-phi-3.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-qwen2.gguf

llama : add BPE pre-tokenization for Qwen2 (#7114)

2024-05-08 15:06:43 +03:00

ggml-vocab-qwen2.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-qwen2.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-qwen35.gguf

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

ggml-vocab-qwen35.gguf.inp

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

ggml-vocab-qwen35.gguf.out

unicode,test: add Qwen3.5 non-backtracking tokenizer handler and regr… (#22110)

2026-05-14 11:03:40 +02:00

ggml-vocab-refact.gguf

tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)

2024-05-04 08:32:32 +03:00

ggml-vocab-refact.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-refact.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-starcoder.gguf

llama : fix BPE pre-tokenization (#6920)

2024-04-29 16:58:41 +03:00

ggml-vocab-starcoder.gguf.inp

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00

ggml-vocab-starcoder.gguf.out

convert : allow partial update to the chkhsh pre-tokenizer list (#13847)

2025-05-30 12:24:37 +02:00