SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087)

* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6
* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"
  This reverts commit f62dc45f31.
* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6

llama-run : fix context size (#11094)
8 changed files with 30 additions and 23 deletions
@@ -65,12 +65,22 @@ body:
        If possible, please do a git bisect and identify the exact commit that introduced the bug.
    validations:
      required: false
+  - type: textarea
+    id: command
+    attributes:
+      label: Compile command
+      description: >
+        Please provide the exact command you used to compile llama.cpp. For example: `cmake -B ...`.
+        This will be automatically formatted into code, so no need for backticks.
+      render: shell
+    validations:
+      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: >
-          Please copy and paste any relevant log output, including the command that you entered and any generated text.
+          Please copy and paste any relevant log output, including any generated text.
          This will be automatically formatted into code, so no need for backticks.
      render: shell
    validations:
@@ -52,6 +52,16 @@ body:
        - Other (Please specify in the next section)
    validations:
      required: false
+  - type: textarea
+    id: command
+    attributes:
+      label: Command line
+      description: >
+        Please provide the exact commands you entered, if applicable. For example: `llama-server -m ... -c ...`, `llama-cli -m ...`, etc.
+        This will be automatically formatted into code, so no need for backticks.
+      render: shell
+    validations:
+      required: false
  - type: textarea
    id: info
    attributes:
@@ -74,7 +84,7 @@ body:
    attributes:
      label: Relevant log output
      description: >
-          If applicable, please copy and paste any relevant log output, including the command that you entered and any generated text.
+          If applicable, please copy and paste any relevant log output, including any generated text.
          This will be automatically formatted into code, so no need for backticks.
      render: shell
    validations:
@@ -1,5 +1,5 @@
 # collaborators can optionally add themselves here to indicate their availability for reviewing related PRs

 /ci/ @ggerganov
-/.devops/ @ngxson
+/.devops/*.Dockerfile @ngxson
 /examples/server/ @ngxson
@@ -83,6 +83,7 @@ class Opt {
        }

        ctx_params.n_batch        = context_size >= 0 ? context_size : context_size_default;
+        ctx_params.n_ctx          = ctx_params.n_batch;
        model_params.n_gpu_layers = ngl >= 0 ? ngl : ngl_default;
        temperature               = temperature >= 0 ? temperature : temperature_default;
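
Review note: the hunk above makes `n_ctx` follow `n_batch`, so a `context_size` left unset (negative) falls back to the default for both fields. Below is a minimal sketch of that defaulting rule, not the actual `Opt` class. `ctx_params_sketch` and `resolve_context_sketch` are hypothetical names, and the 2048 default mentioned in the usage note is taken from the commit message for #11094.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two llama_context_params fields touched by the hunk.
struct ctx_params_sketch {
    uint32_t n_ctx   = 0; // placeholder, not the library default
    uint32_t n_batch = 0; // placeholder, not the library default
};

// Same rule as the patched code: a negative context_size means "not given
// on the command line", so the default is used; n_ctx then mirrors n_batch.
static ctx_params_sketch resolve_context_sketch(int context_size, int context_size_default) {
    ctx_params_sketch p;
    p.n_batch = static_cast<uint32_t>(context_size >= 0 ? context_size : context_size_default);
    p.n_ctx   = p.n_batch; // the line added by this commit
    return p;
}
```

With these assumptions, `resolve_context_sketch(-1, 2048)` yields `n_ctx == n_batch == 2048`, and `resolve_context_sketch(4096, 2048)` yields 4096 for both.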

@@ -3797,7 +3797,7 @@ int main(int argc, char ** argv) {
        data["input_extra"] = input_extra; // default to empty array if it's not exist

        std::string prompt = json_value(data, "prompt", std::string());
-        std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);
+        std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, false, true);
        SRV_DBG("creating infill tasks, n_prompts = %d\n", (int) tokenized_prompts.size());
        data["prompt"] = format_infill(
            ctx_server.ctx,
@@ -18,7 +18,7 @@ def test_infill_without_input_extra():
        "input_suffix": "}\n",
    })
    assert res.status_code == 200
-    assert match_regex("(Ann|small|shiny)+", res.body["content"])
+    assert match_regex("(Ann|small|shiny|Daddy)+", res.body["content"])


 def test_infill_with_input_extra():
@@ -131,7 +131,7 @@ void ggml_sycl_op_rwkv_wkv6(ggml_backend_sycl_context& ctx, const ggml_tensor* s
            [=](sycl::nd_item<3> item_ct1) {
                rwkv_wkv_f32_kernel(
                    B, T, C, H, k_d, v_d, r_d, tf_d, td_d, s_d, dst_d,
-                    item_ct1, shared_mem_acc.get_pointer()
+                    item_ct1, (float*)shared_mem_acc.get_multi_ptr<sycl::access::decorated::no>().get()
                );
            });
    });
@@ -8,7 +8,6 @@
 #include "llama-kv-cache.h"
 #include "llama-model-loader.h"
 #include "llama-model.h"
-#include "llama-quant.h"

 #include "ggml.h"
 #include "ggml-alloc.h"
@@ -18,12 +17,8 @@
 #include <algorithm>
 #include <array>
 #include <cassert>
-#include <cctype>
 #include <cfloat>
-#include <cinttypes>
-#include <climits>
 #include <cmath>
-#include <cstdarg>
 #include <cstddef>
 #include <cstdint>
 #include <cstdio>
@@ -31,10 +26,7 @@
 #include <ctime>
 #include <functional>
 #include <initializer_list>
-#include <locale>
 #include <map>
-#include <numeric>
-#include <type_traits>

 #if defined(_MSC_VER)
 #pragma warning(disable: 4244 4267) // possible loss of data
@@ -11519,13 +11511,7 @@ int32_t llama_lora_adapter_set(
            struct llama_context * ctx,
            struct llama_lora_adapter * adapter,
            float scale) {
-    if (ctx->cparams.flash_attn) {
-        LLAMA_LOG_ERROR("%s: flash_attn is not compatible with LoRA\n", __func__);
-        return -1;
-    }
-
    ctx->lora_adapters[adapter] = scale;
-
    return 0;
 }

@@ -12440,16 +12426,16 @@ int llama_split_path(char * split_path, size_t maxlen, const char * path_prefix,
    return 0;
 }

-int llama_split_prefix(char * dest, size_t maxlen, const char * split_path, int split_no, int split_count) {
+int llama_split_prefix(char * split_prefix, size_t maxlen, const char * split_path, int split_no, int split_count) {
    std::string str_split_path(split_path);
    char postfix[32];
    snprintf(postfix, 32, "-%05d-of-%05d.gguf", split_no + 1, split_count);
    std::string str_postfix(postfix);

-    // check if dest ends with postfix
+    // check if split_prefix ends with postfix
    int size_prefix = str_split_path.size() - str_postfix.size();
    if (size_prefix > 0 && str_split_path.find(str_postfix, size_prefix) != std::string::npos) {
-        snprintf(dest, std::min((size_t) size_prefix + 1, maxlen), "%s", split_path);
+        snprintf(split_prefix, std::min((size_t) size_prefix + 1, maxlen), "%s", split_path);
        return size_prefix;
    }
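
Review note: the rename above does not change behavior. The function still tests whether `split_path` ends with the `-NNNNN-of-NNNNN.gguf` postfix and, if so, copies everything before it into the output buffer. The sketch below restates that check with `std::string` for illustration only. `split_prefix_sketch` is a hypothetical helper, not part of the llama.cpp API, and the path in the usage note is made up.

```cpp
#include <cstdio>
#include <string>

// Hypothetical std::string variant of the prefix extraction shown in the hunk.
// Returns the part of split_path before "-<split_no+1>-of-<split_count>.gguf",
// or an empty string when split_path does not end with that postfix.
static std::string split_prefix_sketch(const std::string & split_path, int split_no, int split_count) {
    char postfix[32];
    std::snprintf(postfix, sizeof(postfix), "-%05d-of-%05d.gguf", split_no + 1, split_count);
    const std::string str_postfix(postfix);

    // The prefix must be non-empty, mirroring the size_prefix > 0 check.
    if (split_path.size() > str_postfix.size()) {
        const size_t size_prefix = split_path.size() - str_postfix.size();
        if (split_path.compare(size_prefix, str_postfix.size(), str_postfix) == 0) {
            return split_path.substr(0, size_prefix);
        }
    }
    return std::string();
}
```

For example, `split_prefix_sketch("/models/foo-00001-of-00003.gguf", 0, 3)` returns `"/models/foo"`, while passing `split_no = 1` for the same path returns an empty string because the expected postfix becomes `-00002-of-00003.gguf`.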
Author	SHA1	Message	Date
Akarshan Biswas	c0d6f790d0	SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) * SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 * Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6" This reverts commit `f62dc45f31`. * Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6	2025-01-07 14:26:07 +08:00
Eric Curtin	dc7cef9f37	llama-run : fix context size (#11094) Set `n_ctx` equal to `n_batch` in `Opt` class. Now context size is a more reasonable 2048. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-06 23:45:28 +01:00
Georgi Gerganov	ecebbd292d	llama : remove unused headers (#11109) ggml-ci	2025-01-06 17:52:35 +02:00
Xuan Son Nguyen	96be8c3264	github : add cmd line field to bug report (#11090) * github : cmd line to bug report * codeowners : (@ngxson) only watch dockerfile * Apply suggestions from code review [no ci] Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * rm cmd in log output [no ci] * rm 2 [no ci] * no need backticks [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-06 16:34:49 +01:00
Georgi Gerganov	e6e7c75d94	server : fix extra BOS in infill endpoint (#11106) * server : fix extra BOS in infill endpoing ggml-ci * server : update infill tests	2025-01-06 15:36:08 +02:00
Xuan Son Nguyen	09186fabbe	llama : remove check flash_attn with lora (#11104)	2025-01-06 13:41:12 +01:00