Configuration:

CPU: Loongson 3A6000
RAM: 16 GB
Disk: 256 GB
GPU: a "Loongson-brand discrete GPU"

No GPU acceleration is used; generation runs purely on the CPU.

Technical stack

Ollama + deepseek-r1

Since the official Ollama site does not provide binaries for the LoongArch architecture, you need to compile Ollama by hand, start the Ollama server directly, and then pull the desired model to start chatting.

The system environment I used is AOSC OS:

PRETTY_NAME="AOSC OS (12.0.4)"
NAME="AOSC OS"
VERSION_ID="12.0.4"
VERSION="12.0.4 (localhost)"
BUILD_ID="20250122"
ID=aosc
ANSI_COLOR="1;36"
HOME_URL="https://aosc.io/"
SUPPORT_URL="https://github.com/AOSC-Dev/aosc-os-abbs"
BUG_REPORT_URL="https://github.com/AOSC-Dev/aosc-os-abbs/issues"

Preparing the build environment

Install the basic development toolchain:

sudo oma install build-essential

Install the Go and Python toolchains:

sudo oma install go python cmake

If your system is RPM-based, the equivalent commands are roughly:

yum groupinstall 'Development Tools'   # y + Enter
yum install cmake go python g++
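
If your system is Debian-based instead (a Debian sid attempt is discussed further down in this thread), a rough equivalent, using the standard Debian package names, would be:

sudo apt install build-essential golang cmake unzip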

Download the Ollama source

wget -c https://github.com/ollama/ollama/archive/refs/heads/main.zip

Unpack it:

unzip main.zip

Building

Speed up Go module downloads by enabling modules and setting a proxy:

export GO111MODULE=on

export GOPROXY=https://goproxy.io
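
You can confirm the settings are in effect with:

go env GO111MODULE GOPROXY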

Build Ollama:

cd ollama-main

go generate ./...

go build -p 8 .

When the build finishes, an executable binary named ollama is produced in the current directory.
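
As an optional sanity check (assuming the file utility is installed), you can confirm the binary really targets LoongArch:

file ./ollama   # should report an ELF 64-bit LoongArch executable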

Start the Ollama server

./ollama serve

Pull the deepseek-r1 model

Choose a model with the desired parameter count from:

https://ollama.com/library/deepseek-r1:7b

The more parameters, the more memory you need; memory consumption is roughly the size of the model file itself (a 7b model quantized to 4 bits is around 4-5 GB, which fits comfortably in 16 GB of RAM).

I went with the default 7b:

./ollama run deepseek-r1:7b

Time to play

Once the download and verification finish, you are dropped straight into interactive chat mode and can start asking questions.

Generation speed averages about one character per second. Have fun!

    One thing I forgot to mention: when pulling the model, open a second console; do not close the console that is running the ollama server.
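
    If you would rather keep everything in one console, a minimal alternative is to push the server into the background and log its output (the log file name here is arbitrary):

    nohup ./ollama serve > ollama.log 2>&1 &
    ./ollama run deepseek-r1:7b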

      I previously tried to port and build this package for Arch Linux, but ran into the following errors:

      ggml-cpu-quants.c:2263:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2264:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2265:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2266:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
      ggml-cpu-quants.c:2271:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
      ggml-cpu-quants.c:2278:31: error: implicit declaration of function ‘mul_sum_i8_pairs’ [-Wimplicit-function-declaration]
      ggml-cpu-quants.c:2278:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2283:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2289:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
      ggml-cpu-quants.c:2296:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2301:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
      ggml-cpu-quants.c:2322:12: error: implicit declaration of function ‘hsum_float_4x4’ [-Wimplicit-function-declaration]

      Has this been fixed in the main branch by now?

        I tried on Debian sid and it did not work either; I hit errors similar to the Arch Linux ones above:

        # github.com/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu
        ggml-cpu-quants.c: In function ‘ggml_vec_dot_q4_0_q8_0’:
        ggml-cpu-quants.c:2241:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2241 | __m128 acc_0 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2242:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2242 | __m128 acc_1 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2243:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2243 | __m128 acc_2 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2244:20: error: incompatible types when initializing type ‘__m128’ using type ‘__vector(2) long long int’
        2244 | __m128 acc_3 = __lsx_vldi(0);
        | ^~~~~~~~~~
        ggml-cpu-quants.c:2249:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
        2249 | const __m128 d_0_1 = __lsx_vreplgr2vr_w( GGML_FP16_TO_FP32(x[ib].d) * GGML_FP16_TO_FP32(y[ib].d) );
        | ^~~~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2256:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2256 | const __m128i i32_0 = mul_sum_i8_pairs(bx_0, by_0);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2261:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2261 | const __m128i i32_1 = mul_sum_i8_pairs(bx_1, by_1);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2267:30: error: incompatible types when initializing type ‘__m128’ using type ‘__m128i’
        2267 | const __m128 d_2_3 = __lsx_vreplgr2vr_w( GGML_FP16_TO_FP32(x[ib + 1].d) * GGML_FP16_TO_FP32(y[ib + 1].d) );
        | ^~~~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2274:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2274 | const __m128i i32_2 = mul_sum_i8_pairs(bx_2, by_2);
        | ^~~~~~~~~~~~~~~~
        ggml-cpu-quants.c:2279:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
        2279 | const __m128i i32_3 = mul_sum_i8_pairs(bx_3, by_3);
        | ^~~~~~~~~~~~~~~~

          I found that the vector extensions do not seem to be enabled; the following diff enables them as a temporary workaround:

          diff --git a/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go b/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          index 895d093..e217f85 100644
          --- a/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          +++ b/ml/backend/ggml/ggml/src/ggml-cpu/cpu.go
          @@ -1,7 +1,7 @@
           package cpu
           
          -// #cgo CFLAGS: -O3 -Wno-implicit-function-declaration
          -// #cgo CXXFLAGS: -std=c++17
          +// #cgo CFLAGS: -O3 -Wno-implicit-function-declaration -march=la464
          +// #cgo CXXFLAGS: -std=c++17 -march=la464
           // #cgo CPPFLAGS: -I${SRCDIR}/amx -I${SRCDIR}/llamafile -I${SRCDIR}/.. -I${SRCDIR}/../../include
           // #cgo CPPFLAGS: -DGGML_USE_LLAMAFILE
           // #cgo linux CPPFLAGS: -D_GNU_SOURCE
          diff --git a/ml/backend/ggml/ggml/src/ggml.go b/ml/backend/ggml/ggml/src/ggml.go
          index 94b0d18..543ba5c 100644
          --- a/ml/backend/ggml/ggml/src/ggml.go
          +++ b/ml/backend/ggml/ggml/src/ggml.go
          @@ -1,6 +1,7 @@
           package ggml
           
          -// #cgo CXXFLAGS: -std=c++17
          +// #cgo CFLAGS: -march=la464
          +// #cgo CXXFLAGS: -march=la464
           // #cgo CPPFLAGS: -DNDEBUG -DGGML_USE_CPU
           // #cgo CPPFLAGS: -I${SRCDIR}/../include -I${SRCDIR}/ggml-cpu
           // #cgo windows LDFLAGS: -lmsvcrt -static -static-libgcc -static-libstdc++
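
          To apply it to the unpacked source tree, something like the following should work (the patch file name is arbitrary):

          patch -p1 < enable-lsx.diff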

            junchao The return value of mul_sum_i8_pairs is not being converted to the right type; I will look into other fixes first.

            # github.com/ollama/ollama/llama
            ggml-cpu-quants.c: In function ‘ggml_vec_dot_q4_0_q8_0’:
            ggml-cpu-quants.c:2295:31: error: implicit declaration of function ‘mul_sum_i8_pairs’ [-Wimplicit-function-declaration]
             2295 |         const __m128i i32_0 = mul_sum_i8_pairs(bx_0, by_0);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2295:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
            ggml-cpu-quants.c:2300:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2300 |         const __m128i i32_1 = mul_sum_i8_pairs(bx_1, by_1);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2313:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2313 |         const __m128i i32_2 = mul_sum_i8_pairs(bx_2, by_2);
                  |                               ^~~~~~~~~~~~~~~~
            ggml-cpu-quants.c:2318:31: error: incompatible types when initializing type ‘__m128i’ using type ‘int’
             2318 |         const __m128i i32_3 = mul_sum_i8_pairs(bx_3, by_3);
                  |                               ^~~~~~~~~~~~~~~~

              wszqkzqk There does not seem to be a relevant fix in the upstream changes either.

                4 days later

                Sunny
                Tested: you do need to apply this diff.
                -march=la464 can be replaced with -march=native, after which the build goes through.

                  Some people are still on old-world systems, so here is a recipe for building on Loongnix 20.6. Download the Go 1.23.3 toolchain from the Loongson open source community site, put it on your PATH:
                  export PATH=/path/to/your/go1.23.3/bin:$PATH
                  and apply the following diff:

                  diff --git a/go.mod b/go.mod
                  index 1c99c09..ba7bba2 100644
                  --- a/go.mod
                  +++ b/go.mod
                  @@ -1,6 +1,6 @@
                   module github.com/ollama/ollama
                   
                  -go 1.23.4
                  +go 1.23.3
                   
                   require (
                          github.com/containerd/console v1.0.3
                  diff --git a/ml/backend/ggml/ggml/src/ggml.go b/ml/backend/ggml/ggml/src/ggml.go
                  index 543ba5c..823fc1c 100644
                  --- a/ml/backend/ggml/ggml/src/ggml.go
                  +++ b/ml/backend/ggml/ggml/src/ggml.go
                  @@ -4,6 +4,7 @@ package ggml
                   // #cgo CXXFLAGS: -march=la464
                   // #cgo CPPFLAGS: -DNDEBUG -DGGML_USE_CPU
                   // #cgo CPPFLAGS: -I${SRCDIR}/../include -I${SRCDIR}/ggml-cpu
                  +// #cgo LDFLAGS: -lstdc++fs
                   // #cgo windows LDFLAGS: -lmsvcrt -static -static-libgcc -static-libstdc++
                   // #include <stdlib.h>
                   // #include "ggml-backend.h"

                    Actually llama.cpp is a better fit: it can use the Vulkan backend directly; just build with -DGGML_VULKAN=ON.
                    With Ollama's 7B Q4_K model, llama-bench reaches 69.43 tokens/s on my RX 6750 XT.
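
                    A minimal build sketch for llama.cpp with the Vulkan backend (assuming the standard CMake workflow and a working Vulkan driver):

                    git clone https://github.com/ggerganov/llama.cpp
                    cd llama.cpp
                    cmake -B build -DGGML_VULKAN=ON
                    cmake --build build --config Release -j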

                    $ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.local/lib ~/.local/bin/llama-bench -m /home/user/.ollama/models/blobs/sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49
                    ggml_vulkan: Found 1 Vulkan devices:
                    ggml_vulkan: 0 = AMD Radeon RX 6750 XT (RADV NAVI22) (radv) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
                    | model                          |       size |     params | backend    | ngl |          test |                  t/s |
                    | ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
                    | qwen2 7B Q4_K - Medium         |   4.36 GiB |     7.62 B | Vulkan     |  99 |         pp512 |        483.12 ± 0.29 |
                    | qwen2 7B Q4_K - Medium         |   4.36 GiB |     7.62 B | Vulkan     |  99 |         tg128 |         69.43 ± 0.01 |

                    build: cfd74c86 (4610)

                      20 days later

                      无产者 Hello, may I ask: is this compiled ollama binary directly usable as-is?

                        Hi everyone, I have Loongson 3A4000 and 3B4000 machines and want to try installing DeepSeek on them, but they cannot reach the Internet, so I cannot pull models from the Ollama site, and the things needed to set up the environment cannot be downloaded directly either. Does anyone have pre-downloaded resources they could share? Thanks 🙏

                          There is not much point in playing with deepseek-r1 anymore; the distilled small models I tried performed poorly. Alibaba Cloud has just released the latest Tongyi Qianwen QwQ-32B reasoning model, which is claimed to match the full-size DeepSeek R1 while using only 1/20 of its parameters:

                          https://baijiahao.baidu.com/s?id=1825811100262658822&wfr=spider&for=pc

                          Ollama also added this model today: https://ollama.com/library/qwq

                          I just tried it on an x86_64 machine and it works quite well. To run it well on LoongArch you would need a serious GPU, such as AMD's top-end Radeon RX 7900 XTX 24 GB.

                            Currently, create.go under ollama/server uses os.OpenRoot, which requires Go 1.24.0. Do not change the Go version in go.mod, or the build will fail.
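
                            A quick way to confirm the toolchain is new enough before building:

                            go version   # should report go1.24.0 or later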
