fix: match Ollama's proven gfx1103 approach — gfx1102 target + rocBLAS

Remove GGML_CUDA_FORCE_MMQ so that rocBLAS handles large-batch GEMMs using the gfx1102 TensileLibrary (available in ROCm 7.2). The GPU is spoofed as gfx1102 via HSA_OVERRIDE_GFX_VERSION=11.0.2 at runtime, matching Ollama's working configuration.

FORCE_MMQ caused crashes because the MMQ kernels' launch_bounds are tuned for GPUs with many CUs, so for large matrix dimensions the kernels cannot fit on the 6-CU iGPU.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 03:05:06 +02:00
parent 94127d7b33
commit 457e76fc0e
1 changed file with 0 additions and 1 deletion
@@ -38,7 +38,6 @@ RUN HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build \
        -DGGML_HIP=ON \
        -DGGML_HIP_ROCWMMA_FATTN=OFF \
-        -DGGML_CUDA_FORCE_MMQ=ON \
        -DAMDGPU_TARGETS="$ROCM_DOCKER_ARCH" \
        -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON \
        -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF \