mirror of
https://github.com/ggml-org/llama.cpp.git
synced 2026-06-30 09:37:42 +02:00
b9824
10 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
52b3df0023 |
common/peg : implement ac parser for stricter grammar generation (#24869)
* common/peg : implement ac parser * cont : extract functions * cont : tidy up * cont : remove a test * cont : move ac() def |
||
|
|
063d9c156e |
common/peg : refactor until gbnf grammar generation (#24839)
* common/peg : refactor until gbnf grammar into an ac automaton * cont : add a test with multiple strings * cont : pad state with 0s so rules line up * cont : clean up comments * cont : use set everywhere * cont : inline state num string padding * cont : add a ref to PR * cont : fix regression in server-tools.cpp |
||
|
|
0ae3f450f0 |
chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes (#24653)
* chat: fix an "oldie but goodie" grammar generator bug that surfaced during last changes * update erroneous case in PEG parser test |
||
|
|
e21cdc11a0 |
common/gemma4 : handle parsing edge cases (#21760)
CI (android) / android (push) Failing after 6m14s
CI (android) / android-ndk (arm64-cpu, -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D … (push) Failing after 1s
CI (android) / android-ndk (arm64-snapdragon, --preset arm64-android-snapdragon-release) (push) Failing after 8s
CI (sanitize) / ubuntu-latest-sanitizer (Debug, ADDRESS) (push) Failing after 15s
CI (sanitize) / ubuntu-latest-sanitizer (Debug, THREAD) (push) Failing after 14s
CI (sanitize) / ubuntu-latest-sanitizer (Debug, UNDEFINED) (push) Failing after 13s
CI / ubuntu-latest-rpc (push) Failing after 22s
CI / ubuntu-latest-cuda (push) Failing after 5s
CI / build-cmake-pkg (push) Successful in 28m6s
Server (sanitize) / server (RelWithDebInfo, UNDEFINED) (push) Successful in 1h5m25s
Server (sanitize) / server (RelWithDebInfo, ADDRESS) (push) Successful in 1h31m12s
Build Actions Cache / ubuntu-24-vulkan-cache (push) Has been cancelled
Build Actions Cache / ubuntu-24-openvino-cache (push) Has been cancelled
Build Actions Cache / windows-2022-rocm-cache (push) Has been cancelled
Server / server (default) (push) Successful in 44m44s
Server / server (backend-sampling) (push) Successful in 38m30s
Update Winget Package / Update Winget Package (push) Has been skipped
CI (3rd-party) / ubuntu-24-llguidance (push) Has been cancelled
CI (apple) / macOS-latest-ios (push) Has been cancelled
CI (apple) / macos-latest-ios-xcode (push) Has been cancelled
CI (apple) / macOS-latest-tvos (push) Has been cancelled
CI (apple) / macOS-latest-visionos (push) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=iOS) (push) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=macOS) (push) Has been cancelled
CI (apple) / macOS-latest-swift (generic/platform=tvOS) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (aarch64, Release, 310p, off) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (aarch64, Release, 910b, off) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (aarch64, Release, 910b, on) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (x86, Release, 310p, off) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (x86, Release, 910b, off) (push) Has been cancelled
CI (cann) / openEuler-latest-cann (x86, Release, 910b, on) (push) Has been cancelled
CI (riscv) / ubuntu-riscv64-native-sanitizer (Debug, ADDRESS) (push) Has been cancelled
CI (riscv) / ubuntu-riscv64-native-sanitizer (Debug, THREAD) (push) Has been cancelled
CI (riscv) / ubuntu-riscv64-native-sanitizer (Debug, UNDEFINED) (push) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-cuda (push) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm (push) Has been cancelled
CI (self-hosted) / ggml-ci-nvidia-vulkan-cm2 (push) Has been cancelled
CI (self-hosted) / ggml-ci-linux-intel-vulkan (push) Has been cancelled
CI (self-hosted) / ggml-ci-win-intel-vulkan (push) Has been cancelled
CI (self-hosted) / ggml-ci-intel-openvino-gpu-low-perf (push) Has been cancelled
CI (vulkan) / ubuntu-24-vulkan-llvmpipe (push) Has been cancelled
CI / macOS-latest-arm64 (push) Has been cancelled
CI / macOS-latest-x64 (push) Has been cancelled
CI / macOS-latest-arm64-webgpu (push) Has been cancelled
CI / ubuntu-cpu (arm64, ubuntu-24.04-arm) (push) Has been cancelled
CI / ubuntu-cpu (ppc64le, ubuntu-24.04-ppc64le) (push) Has been cancelled
CI / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (push) Has been cancelled
CI / ubuntu-cpu (x64, ubuntu-22.04) (push) Has been cancelled
CI / ubuntu-24-vulkan (arm64, ubuntu-24.04-arm) (push) Has been cancelled
CI / ubuntu-24-vulkan (x64, ubuntu-24.04) (push) Has been cancelled
CI / ubuntu-24-webgpu (push) Has been cancelled
CI / ubuntu-24-webgpu-wasm (push) Has been cancelled
CI / ubuntu-22-hip (push) Has been cancelled
CI / ubuntu-22-musa (push) Has been cancelled
CI / ubuntu-22-sycl (push) Has been cancelled
CI / ubuntu-22-sycl-fp16 (push) Has been cancelled
CI / ubuntu-24-openvino-CPU (push) Has been cancelled
CI / ubuntu-24-openvino-GPU (push) Has been cancelled
CI / windows-latest (arm64, llvm-arm64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON) (push) Has been cancelled
CI / windows-latest (arm64, llvm-arm64-opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON) (push) Has been cancelled
CI / windows-latest (x64, cpu-x64 (static), -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF) (push) Has been cancelled
CI / windows-latest (x64, openblas-x64, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DG… (push) Has been cancelled
CI / windows-latest (x64, vulkan-x64, -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_VULKAN=ON) (push) Has been cancelled
CI / windows-2022-cuda (12.4) (push) Has been cancelled
CI / windows-latest-sycl (push) Has been cancelled
CI / windows-latest-hip (push) Has been cancelled
CI / ubuntu-cpu-riscv64-native (push) Has been cancelled
CI / ggml-ci-x64-cpu-low-perf (push) Has been cancelled
CI / ggml-ci-arm64-cpu-low-perf (push) Has been cancelled
CI / ggml-ci-x64-cpu-high-perf (push) Has been cancelled
CI / ggml-ci-arm64-cpu-high-perf (push) Has been cancelled
CI / ggml-ci-arm64-cpu-high-perf-sve (push) Has been cancelled
CI / ggml-ci-arm64-cpu-kleidiai (push) Has been cancelled
CI / ggml-ci-arm64-cpu-kleidiai-graviton4 (push) Has been cancelled
EditorConfig Checker / editorconfig (push) Has been cancelled
Release / macOS-cpu (arm64, arm64, -DGGML_METAL_USE_BF16=ON -DGGML_METAL_EMBED_LIBRARY=ON, macos-14) (push) Has been cancelled
Release / macOS-cpu (arm64, arm64-kleidiai, -DGGML_METAL_USE_BF16=ON -DGGML_METAL_EMBED_LIBRARY=ON -DGGML_CPU_KLEIDIAI=ON, macos-14) (push) Has been cancelled
Release / macOS-cpu (x64, x64, -DGGML_METAL=OFF -DCMAKE_OSX_DEPLOYMENT_TARGET=13.3, macos-15-intel) (push) Has been cancelled
Release / ubuntu-cpu (arm64, ubuntu-24.04-arm) (push) Has been cancelled
Release / ubuntu-cpu (s390x, ubuntu-24.04-s390x) (push) Has been cancelled
Release / ubuntu-cpu (x64, ubuntu-22.04) (push) Has been cancelled
Release / ubuntu-vulkan (arm64, ubuntu-24.04-arm) (push) Has been cancelled
Release / ubuntu-vulkan (x64, ubuntu-22.04) (push) Has been cancelled
Release / ubuntu-24-openvino (push) Has been cancelled
Release / windows-cpu (arm64) (push) Has been cancelled
Release / windows-cpu (x64) (push) Has been cancelled
Release / windows (arm64, opencl-adreno, -G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/arm64-windows-llvm.cmake -DCMAKE_PREFIX_PATH="$env:RUNNER_TEMP/opencl-arm64-release" -DGGML_OPENCL=ON -DGGML_OPENCL_USE_ADRENO_KERNELS=ON, ggml-opencl) (push) Has been cancelled
Release / windows (x64, vulkan, -DGGML_VULKAN=ON, ggml-vulkan) (push) Has been cancelled
Release / windows-cuda (12.4) (push) Has been cancelled
Release / windows-cuda (13.1) (push) Has been cancelled
Release / windows-sycl (push) Has been cancelled
Release / ubuntu-22-rocm (7.2.1, x64, gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1151;gfx1150;gfx1200;gfx1201) (push) Has been cancelled
Release / windows-hip (gfx1150;gfx1151;gfx1200;gfx1201;gfx1100;gfx1101;gfx1102;gfx1030;gfx1031;gfx1032, radeon) (push) Has been cancelled
Release / ios-xcode-build (push) Has been cancelled
Release / openEuler-cann (aarch64, Release, 310p, off) (push) Has been cancelled
Release / openEuler-cann (aarch64, Release, 910b, on) (push) Has been cancelled
Release / openEuler-cann (x86, Release, 310p, off) (push) Has been cancelled
Release / openEuler-cann (x86, Release, 910b, on) (push) Has been cancelled
Release / release (push) Has been cancelled
Server (self-hosted) / server-metal (GPUx2, backend-sampling) (push) Has been cancelled
Server (self-hosted) / server-metal (GPUx2) (push) Has been cancelled
Server (self-hosted) / server-metal (GPUx1) (push) Has been cancelled
Server (self-hosted) / server-metal (GPUx1, backend-sampling) (push) Has been cancelled
Server (self-hosted) / server-cuda (GPUx1) (push) Has been cancelled
Server (self-hosted) / server-cuda (GPUx1, backend-sampling) (push) Has been cancelled
Server / server-windows (push) Has been cancelled
Publish Docker image / Create and push git tag (push) Has been cancelled
Publish Docker image / Prepare Docker matrices (push) Has been cancelled
Publish Docker image / Push Docker image to Docker Registry (push) Has been cancelled
Publish Docker image / Create shared tags from digests (push) Has been cancelled
|
||
|
|
223373742b | common : add commentary rules for gpt-oss-20b (#21286) | ||
|
|
c96f608d98 |
common: consolidate PEG string parsers (#20263)
* common : consolidate PEG string parsers * cont : fix json_string_content() |
||
|
|
451ef08432 |
common : gracefully handle incomplete output (#20191)
* common : handle incomplete UTF-8 at end of input in PEG parser * cont : if reached end prematurely, emit needs_more_input to propagate partial output * cont: refactor peg parse context to add lenient flag * cont : remove partial flag, keep lenient flag |
||
|
|
566059a26b |
Autoparser - complete refactoring of parser architecture (#18675)
* Autoparser - full single commit squish * Final pre-merge changes: minor fixes, Kimi 2.5 model parser |
||
|
|
c15395f73c |
common : implement new jinja template engine (#18462)
* jinja vm * lexer * add vm types * demo * clean up * parser ok * binary_expression::execute * shadow naming * bin ops works! * fix map object * add string builtins * add more builtins * wip * use mk_val * eval with is_user_input * render gemma tmpl ok * track input string even after transformations * support binded functions * keyword arguments and slicing array * use shared_ptr for values * add mk_stmt * allow print source on exception * fix negate test * testing more templates * mostly works * add filter_statement * allow func to access ctx * add jinja-value.cpp * impl global_from_json * a lot of fixes * more tests * more fix, more tests * more fixes * rm workarounds * demo: type inferrence * add placeholder for tojson * improve function args handling * rm type inference * no more std::regex * trailing spaces * make testing more flexible * make output a bit cleaner * (wip) redirect minja calls * test: add --output * fix crash on macro kwargs * add minimal caps system * add some workarounds * rm caps_apply_workarounds * get rid of preprocessing * more fixes * fix test-chat-template * move test-chat-jinja into test-chat-template * rm test-chat-jinja from cmake * test-chat-template: use common * fix build * fix build (2) * rename vm --> interpreter * improve error reporting * correct lstrip behavior * add tojson * more fixes * disable tests for COMMON_CHAT_FORMAT_GENERIC * make sure tojson output correct order * add object.length * fully functional selectattr / rejectattr * improve error reporting * more builtins added, more fixes * create jinja rendering tests * fix testing.h path * adjust whitespace rules * more fixes * temporary disable test for ibm-granite * r/lstrip behavior matched with hf.js * minimax, glm4.5 ok * add append and pop * kimi-k2 ok * test-chat passed * fix lstrip_block * add more jinja tests * cast to unsigned char * allow dict key to be numeric * nemotron: rm windows newline * tests ok * fix test * rename interpreter --> runtime * fix build * add more checks * bring back generic format support * fix Apertus * [json.exception.out_of_range.403] key 'content' not found * rm generic test * refactor input marking * add docs * fix windows build * clarify error message * improved tests * split/rsplit with maxsplit * non-inverse maxsplit forgot to change after simplifying * implement separators for tojson and fix indent * i like to move it move it * rename null -- > none * token::eof * some nits + comments * add exception classes for lexer and parser * null -> none * rename global -> env * rm minja * update docs * docs: add input marking caveats * imlement missing jinja-tests functions * oops * support trim filter with args, remove bogus to_json reference * numerous argument fixes * updated tests * implement optional strip chars parameter * use new chars parameter * float filter also has default * always leave at least one decimal in float string * jinja : static analysis + header cleanup + minor fixes * add fuzz test * add string.cpp * fix chat_template_kwargs * nits * fix build * revert * unrevert sorry :) * add fuzz func_args, refactor to be safer * fix array.map() * loosen ensure_vals max count condition, add not impl for map(int) * hopefully fix windows * check if empty first * normalize newlines --------- Co-authored-by: Alde Rojas <hello@alde.dev> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> |
||
|
|
0a8026e768 |
common : introduce composable PEG parser combinators for chat parsing (#17136)
* common : implement parser combinators to simplify chat parsing * add virtual destructor to parser_base * fix memory leak from circular references of rules * implement gbnf grammar building * remove unused private variable * create a base visitor and implement id assignment as a visitor * fix const ref for grammar builder * clean up types, friend classes, and class declarations * remove builder usage from until_parser * Use a counter class to help assign rule ids * cache everything * add short description for each parser * create a type for the root parser * implement repetition parser * Make optional, one_or_more, and zero_or_more subclasses of repetition * improve context constructor * improve until parsing and add benchmarks * remove cached() pattern, cache in parser_base with specialized parsing functions for each parser * improve json parsing performance to better match legacy parsing * fix const auto * it for windows * move id assignment to classes instead of using a visitor * create named rules in the command r7b example * use '.' for any in GBNF * fix parens around choices in gbnf grammar * add convenience operators to turn strings to literals * add free-form operators for const char * to simplify defining literals * simplify test case parser * implement semantic actions * remove groups in favor of actions and a scratchpad * add built in actions for common operations * add actions to command r7b example * use std::default_searcher for platforms that don't have bm * improve parser_type handling and add cast helper * add partial result type to better control when to run actions * fix bug in until() * run actions on partial results by default * use common_chat_msg for result * add qwen3 example wip * trash partial idea and simplify * move action arguments to a struct * implement aho-corasick matcher for until_parser and to build exclusion grammars * use std::string for input, since std::string_view is incompatible with std::regex * Refactor tests * improve qwen3 example * implement sax-style parsing and refactor * fix json string in test * rename classes to use common_chat_ prefix * remove is_ suffix from functions * rename from id_counter to just counter * Final refactored tests * Fix executable name and editorconfig-checker * Third time's the charm... * add trigger parser to begin lazy grammar rule generation * working lazy grammar * refactor json rules now that we check for reachability * reduce pointer usage * print out grammars in example * rename to chat-peg-parser* and common_chat_peg_parser* * Revert unrelated changes * New macros for CMakeLists to enable multi-file compilations * starting unicode support * add unicode support to char_parser * use unparsed args as additional sources * Refactor tests to new harness * Fix CMakeLists * fix rate calculation * add unicode tests * fix trailing whitespace and line endings skip-checks: true * Helpers + rewrite qwen3 with helpers * Fix whitespace * extract unicode functions to separate file * refactor parse unicode function * fix compiler error * improve construction of sequence/choice parsers * be less clever * add make_parser helper function * expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source * lower bench iterations * add unicode support to until_parser * add unicode support to json_string_parser * clean up unicode tests * reduce unicode details to match src/unicode.cpp * simplify even further * remove unused functions * fix type * reformat char class parsing * clean up json string parser * clean up + fix diagnostics * reorder includes * compact builder functions * replace action_parser with capture_parser, rename env to semantics * rename env to semantics * clean up common_chat_parse_context * move type() to below constant * use default constructor for common_chat_peg_parser * make all operators functions for consistency * fix compilation errors in test-optional.cpp * simplify result values * rename json_string_unquoted to json_string_content * Move helper to separate class, add separate explicit and helper classes * Whitespace * Change + to append() * Reformat * Add extra helpers, tests and Minimax example * Add some extra optional debugging prints + real example of how to use them * fix bug in repetitions when min_count = 0 reports failures * dump rule in debug * fix token accumulation and assert parsing never fails * indent debug by depth * use LOG_* in tests so logs sync up with test logs * - Add selective testing - Refactor all messaging to use LOG_ERR - Fix lack of argument / tool name capturing - Temporary fix for double event capture * refactor rule() and introduce ref() * clean up visitor * clean up indirection in root parser w.r.t rules * store shared ptr directly in parser classes * replace aho-corasick automation with a simple trie * Reset prev for qwen3 helper example variant * refactor to use value semantics with std::variant/std::visit * simplify trie_matcher result * fix linting issues * add annotations to rules * revert test workaround * implement serializing the parser * remove redundant parsers * remove tests * gbnf generation fixes * remove LOG_* use in tests * update gbnf tests to test entire grammar * clean up gbnf generation and fix a few bugs * fix typo in test output * remove implicit conversion rules * improve test output * rename trie_matcher to trie * simplify trie to just know if a node is the end of a word * remove common_chat_ prefix and ensure a common_peg_ prefix to all types * rename chat-peg-parser -> peg-parser * promote chat-peg-parser-helper to chat-peg-parser * checkpoint * use a static_assert to ensure we handle every branch * inline trivial peg parser builders * use json strings for now * implement basic and native chat peg parser builders/extractors * resolve refs to their rules * remove packrat caching (for now) * update tests * compare parsers with incremental input * benchmark both complete and incremental parsing * add raw string generation from json schema * add support for string schemas in gbnf generation * fix qwen example to include \n * tidy up example * rename extractor to mapper * rename ast_arena to ast * place basic tests into one * use gbnf_format_literal from json-schema-to-grammar * integrate parser with common/chat and server * clean up schema and serialization * add json-schema raw string tests * clean up json creation and remove capture parser * trim spaces from reasoning and content * clean up redundant rules and comments * rename input_is_complete to is_partial to match rest of project * simplify json rules * remove extraneous file * remove comment * implement += and |= operators * add comments to qwen3 implementation * reorder arguments to common_chat_peg_parse * remove commented outdated tests * add explicit copy constructor * fix operators and constness * wip: update test-chat for qwen3-coder * bring json parser closer to json-schema-to-grammar rules * trim trailing space for most things * fix qwen3 coder rules w.r.t. trailing spaces * group rules * do not trim trailing space from string args * tweak spacing of qwen3 grammar * update qwen3-coder tests * qwen3-coder small fixes * place parser in common_chat_syntax to simplify invocation * use std::set to collect rules to keep order predictable for tests * initialize parser to make certain platforms happy * revert back to std::unordered_set, sort rule names at the end instead * uncomment rest of chat tests * define explicit default constructor * improve arena init and server integration * fix chat test * add json_member() * add a comprehensive native example * clean up example qwen test and add response_format example to native test * make build_peg_parser accept std::function instead of template * change peg parser parameters into const ref * push tool call on tool open for constructed parser * add parsing documentation * clean up some comments * add json schema support to qwen3-coder * add id initializer in tests * remove grammar debug line from qwen3-coder * refactor qwen3-coder to use sequence over operators * only call common_chat_peg_parse if appropriate format * simplify qwen3-coder space handling * revert qwen3-coder implementation * revert json-schema-to-grammar changes * remove unnecessary forward declaration * small adjustment to until_parser * rename C/C++ files to use dashes * codeowners : add aldehir to peg-parser and related files --------- Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com> |