Files
llama.cpp/tools
Adrien Gallouët 4b48a53b6c server : optimize get_token_probabilities (#24796)
Use std::partial_sort to order only the requested top-n tokens instead
of the full vocabulary

    logprobs sort: vocab=128000 n_top=0 iters=100
    full    sort:   8555.6 us/op
    partial sort:    704.3 us/op

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-06-19 23:26:54 +02:00
..
2026-05-14 13:05:52 +03:00