Configuration:

CPU: Loongson 3A6000
RAM: 16 GB
Disk: 256 GB
GPU: a "Loongson-brand discrete GPU"

No GPU acceleration is used; generation runs purely on the CPU.

Technical stack

Ollama + deepseek-r1

Since the official Ollama site does not provide binaries for the LoongArch architecture, you need to compile Ollama by hand, start the Ollama server directly, and then pull the desired model to start chatting.

The system environment I used is AOSC OS:

PRETTY_NAME="AOSC OS (12.0.4)"
NAME="AOSC OS"
VERSION_ID="12.0.4"
VERSION="12.0.4 (localhost)"
BUILD_ID="20250122"
ID=aosc
ANSI_COLOR="1;36"
HOME_URL="https://aosc.io/"
SUPPORT_URL="https://github.com/AOSC-Dev/aosc-os-abbs"
BUG_REPORT_URL="https://github.com/AOSC-Dev/aosc-os-abbs/issues"

Preparing the build environment

Install the basic development toolchain:

sudo oma install build-essential

Install the Go and Python toolchains:

sudo oma install go python cmake

If your system is RPM-based, the equivalent commands are roughly:

yum groupinstall 'Development Tools'   # y + Enter
yum install cmake go python g++
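
If your system is Debian-based instead (a Debian sid attempt is discussed further down in this thread), a rough equivalent, using the standard Debian package names, would be:

sudo apt install build-essential golang cmake unzip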

Download the Ollama source

wget -c https://github.com/ollama/ollama/archive/refs/heads/main.zip

Unpack it:

unzip main.zip

Building

Speed up Go module downloads by enabling modules and setting a proxy:

export GO111MODULE=on

export GOPROXY=https://goproxy.io
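
You can confirm the settings are in effect with:

go env GO111MODULE GOPROXY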

Build Ollama:

cd ollama-main

go generate ./...

go build -p 8 .

When the build finishes, an executable binary named ollama is produced in the current directory.
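
As an optional sanity check (assuming the file utility is installed), you can confirm the binary really targets LoongArch:

file ./ollama   # should report an ELF 64-bit LoongArch executable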

Start the Ollama server

./ollama serve

Pull the deepseek-r1 model

Choose a model with the desired parameter count from:

https://ollama.com/library/deepseek-r1:7b

The more parameters, the more memory you need; memory consumption is roughly the size of the model file itself (a 7b model quantized to 4 bits is around 4-5 GB, which fits comfortably in 16 GB of RAM).

I went with the default 7b:

./ollama run deepseek-r1:7b

Time to play

Once the download and verification finish, you are dropped straight into interactive chat mode and can start asking questions.

Generation speed averages about one character per second. Have fun!

    One thing I forgot to mention: when pulling the model, open a second console; do not close the console that is running the ollama server.
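
    If you would rather keep everything in one console, a minimal alternative is to push the server into the background and log its output (the log file name here is arbitrary):

    nohup ./ollama serve > ollama.log 2>&1 &
    ./ollama run deepseek-r1:7b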

      I previously tried to port and build this package for Arch Linux, but ran into the following errors:

      ggml-cpu-quants.c:2263:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2264:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2265:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2266:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2271:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
      ggml-cpu-quants.c:2278:31: error: implicit declaration of function ‘mul_sum_i8_pairs’ [-Wimplicit-function-declaration]
      ggml-cpu-quants.c:2278:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2283:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2289:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
      ggml-cpu-quants.c:2296:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2301:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2322:12: error: implicit declaration of function ‘hsum_float_4x4’ [-Wimplicit-function-declaration]

      Has this been fixed in the main branch by now?

        I tried on Debian sid and it did not work either; I hit errors similar to the Arch Linux ones above:

        # github.com/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu
        ggml-cpu-quants.c: In function ‘ggml_vec_dot_q4_0_q8_0’:
        ggml-cpu-quants.c:2241:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2241 | __m128 acc_0 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2242:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2242 | __m128 acc_1 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2243:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2243 | __m128 acc_2 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2244:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2244 | __m128 acc_3 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2249:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
        2249 | const __m128 d_0_1 = __lsx_vreplgr2vr_w( GGML_FP16_TO_FP32(x[ib].d) * GGML_FP16_TO_FP32(y[ib].d) );
        | ^~~~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2256:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2256 | const __m128i i32_0 = mul_sum_i8_pairs(bx_0, by_0);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2261:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2261 | const __m128i i32_1 = mul_sum_i8_pairs(bx_1, by_1);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2267:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
        2267 | const __m128 d_2_3 = __lsx_vreplgr2vr_w( GGML_FP16_TO_FP32(x[ib + 1].d) * GGML_FP16_TO_FP32(y[ib + 1].d) );
        | ^~~~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2274:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2274 | const __m128i i32_2 = mul_sum_i8_pairs(bx_2, by_2);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2279:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2279 | const __m128i i32_3 = mul_sum_i8_pairs(bx_3, by_3);
        | ^~~~~~~~~~~~~~~~

          I found that the vector extensions do not seem to be enabled; the following diff enables them as a temporary workaround:

          diff --git a/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go b/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          index 895d093..e217f85 100644
          --- a/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          +++ b/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          @@ -1,7 +1,7 @@
           package cpu
           
          -// #cgo CFLAGS: -O3 -Wno-implicit-function-declaration
          -// #cgo CXXFLAGS: -std=c++17
          +// #cgo CFLAGS: -O3 -Wno-implicit-function-declaration -march=la464
          +// #cgo CXXFLAGS: -std=c++17 -march=la464
           // #cgo CPPFLAGS: -I${SRCDIR}/amx -I${SRCDIR}/llamafile -I${SRCDIR}/.. -I${SRCDIR}/../../include
           // #cgo CPPFLAGS: -DGGML_USE_LLAMAFILE
           // #cgo linux CPPFLAGS: -D_GNU_SOURCE
          diff --git a/ml/backend/ggml/ggml/src/ggml.go b/ml/backend/ggml/ggml/src/ggml.go
          index 94b0d18..543ba5c 100644
          --- a/ml/backend/ggml/ggml/src/ggml.go
          +++ b/ml/backend/ggml/ggml/src/ggml.go
          @@ -1,6 +1,7 @@
           package ggml
           
          -// #cgo CXXFLAGS: -std=c++17
          +// #cgo CFLAGS: -march=la464
          +// #cgo CXXFLAGS: -march=la464
           // #cgo CPPFLAGS: -DNDEBUG -DGGML_USE_CPU
           // #cgo CPPFLAGS: -I${SRCDIR}/../include -I${SRCDIR}/ggml-cpu
           // #cgo windows LDFLAGS: -lmsvcrt -static -static-libgcc -static-libstdc++
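
          To apply it to the unpacked source tree, something like the following should work (the patch file name is arbitrary):

          patch -p1 < enable-lsx.diff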

            junchao The return value of mul_sum_i8_pairs is not being converted to the right type; I will look into other fixes first.

            # github.com/ollama/ollama/llama
            ggml-cpu-quants.c: In function ‘ggml_vec_dot_q4_0_q8_0’:
            ggml-cpu-quants.c:2295:31: error: implicit declaration of function ‘mul_sum_i8_pairs’ [-Wimplicit-function-declaration]
             2295 |         const __m128i i32_0 = mul_sum_i8_pairs(bx_0, by_0);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2295:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
            ggml-cpu-quants.c:2300:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2300 |         const __m128i i32_1 = mul_sum_i8_pairs(bx_1, by_1);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2313:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2313 |         const __m128i i32_2 = mul_sum_i8_pairs(bx_2, by_2);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2318:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2318 |         const __m128i i32_3 = mul_sum_i8_pairs(bx_3, by_3);
                  |                               ^~~~~~~~~~~~~~~~

              wszqkzqk There does not seem to be a relevant fix in the upstream changes either.

                4 days later

                Sunny
                Tested: you do need to apply this diff.
                -march=la464 can be replaced with -march=native, after which the build goes through.

                  Some people are still on old-world systems, so here is a recipe for building on Loongnix 20.6. Download the Go 1.23.3 toolchain from the Loongson open source community site, put it on your PATH:
                  export PATH=/path/to/your/go1.23.3/bin:$PATH
                  and apply the following diff:

                  diff --git a/go.mod b/go.mod
                  index 1c99c09..ba7bba2 100644
                  --- a/go.mod
                  +++ b/go.mod
                  @@ -1,6 +1,6 @@
                   module github.com/ollama/ollama
                   
                  -go 1.23.4
                  +go 1.23.3
                   
                   require (
                          github.com/containerd/console v1.0.3
                  diff --git a/ml/backend/ggml/ggml/src/ggml.go b/ml/backend/ggml/ggml/src/ggml.go
                  index 543ba5c..823fc1c 100644
                  --- a/ml/backend/ggml/ggml/src/ggml.go
                  +++ b/ml/backend/ggml/ggml/src/ggml.go
                  @@ -4,6 +4,7 @@ package ggml
                   // #cgo CXXFLAGS: -march=la464
                   // #cgo CPPFLAGS: -DNDEBUG -DGGML_USE_CPU
                   // #cgo CPPFLAGS: -I${SRCDIR}/../include -I${SRCDIR}/ggml-cpu
                  +// #cgo LDFLAGS: -lstdc++fs
                   // #cgo windows LDFLAGS: -lmsvcrt -static -static-libgcc -static-libstdc++
                   // #include <stdlib.h>
                   // #include "ggml-backend.h"

                    Actually llama.cpp is a better fit: it can use the Vulkan backend directly; just build with -DGGML_VULKAN=ON.
                    With Ollama's 7B Q4_K model, llama-bench reaches 69.43 tokens/s on my RX 6750 XT.
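
                    A minimal build sketch for llama.cpp with the Vulkan backend (assuming the standard CMake workflow and a working Vulkan driver):

                    git clone https://github.com/ggerganov/llama.cpp
                    cd llama.cpp
                    cmake -B build -DGGML_VULKAN=ON
                    cmake --build build --config Release -j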

                    $ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib ~/.local/bin/llama-bench -m /home/user/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49
                    ggml_vulkan: Found 1 Vulkan devices:
                    ggml_vulkan: 0 = AMD Radeon RX 6750 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
                    | model                          |       size |     params | backend    | ngl |          test |                  t/s |
                    | ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
                    | qwen2 7B Q4_K - Medium         |   4.36 GiB |     7.62 B | Vulkan     |  99 |         pp512 |        483.12 ± 0.29 |
                    | qwen2 7B Q4_K - Medium         |   4.36 GiB |     7.62 B | Vulkan     |  99 |         tg128 |         69.43 ± 0.01 |

                    build: cfd74c86 (4610)

                      20 days later

                      无产者 Hello, may I ask: is this compiled ollama binary directly usable as-is?

                        Hi everyone, I have Loongson 3A4000 and 3B4000 machines and want to try installing DeepSeek on them, but they cannot reach the Internet, so I cannot pull models from the Ollama site, and the things needed to set up the environment cannot be downloaded directly either. Does anyone have pre-downloaded resources they could share? Thanks 🙏

                          There is not much point in playing with deepseek-r1 anymore; the distilled small models I tried performed poorly. Alibaba Cloud has just released the latest Tongyi Qianwen QwQ-32B reasoning model, which is claimed to match the full-size DeepSeek R1 while using only 1/20 of its parameters:

                          https://baijiahao.baidu.com/s?id=1825811100262658822&wfr=spider&for=pc

                          Ollama also added this model today: https://ollama.com/library/qwq

                          I just tried it on an x86_64 machine and it works quite well. To run it well on LoongArch you would need a serious GPU, such as AMD's top-end Radeon RX 7900 XTX 24 GB.

                            Currently, create.go under ollama/server uses os.OpenRoot, which requires Go 1.24.0. Do not change the Go version in go.mod, or the build will fail.
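
                            A quick way to confirm the toolchain is new enough before building:

                            go version   # should report go1.24.0 or later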
