claudeopus46 1ea0b4798d
fix: FWHT butterfly loop warp divergence on RDNA3 iGPU (gfx1103)
The `if (tid < 64)` guard let only half the threads participate in the
128-element Walsh-Hadamard Transform, leaving elements 64-127
untransformed. The guard also caused warp divergence and register
pressure that crashed low-CU iGPUs (AMD 780M / gfx1103 with 6 CUs).

Replace it with a proper bounds check, `if (i + h < TURBO_HEAD_DIM)`, so
all 128 threads participate naturally. This fixes both correctness (full
FWHT over all elements) and GPU occupancy on resource-constrained hardware.
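
For checking the kernels against a known-good result, a minimal CPU
reference can help. The sketch below is not code from this repository:
`fwht_reference` and `kHeadDim` are hypothetical names, and it assumes an
unnormalized transform in natural (Hadamard) order, which may differ from
the scaling or ordering the turbo kernels use.

```cpp
#include <array>
#include <cstddef>

// Hypothetical constant mirroring TURBO_HEAD_DIM from the commit message.
constexpr std::size_t kHeadDim = 128;

// In-place, unnormalized fast Walsh-Hadamard transform (natural order).
// Each of the log2(128) = 7 stages performs kHeadDim / 2 butterflies that
// together touch every element, including indices 64..127.
inline void fwht_reference(std::array<float, kHeadDim>& x) {
    for (std::size_t h = 1; h < kHeadDim; h *= 2) {
        // Walk the array in blocks of 2*h; pair element i with i + h.
        for (std::size_t block = 0; block < kHeadDim; block += 2 * h) {
            for (std::size_t i = block; i < block + h; ++i) {
                const float a = x[i];
                const float b = x[i + h];
                x[i]     = a + b;  // sum goes to the lower slot
                x[i + h] = a - b;  // difference goes to the upper slot
            }
        }
    }
}
```

Because the unnormalized transform is its own inverse up to a factor of
128, applying `fwht_reference` twice should return the input scaled by
128. An input that is nonzero only in the upper half (indices 64-127)
makes a quick check that no element is left untransformed.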

Affects: dequantize_block_turbo3_0, dequantize_block_turbo4_0,
k_set_rows_turbo3, k_set_rows_turbo4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 00:19:35 +02:00