QEMU ENABLE CXL 3.0

Virtual Machine Configuration

Direct attach through host bridges: two host bridges, each with two CXL devices
The topology is as follows:

Host
├── CXL Host Bridge (HB0)
│   ├── Root Port 0 → Type3 Device 0
│   └── Root Port 1 → Type3 Device 1
└── CXL Host Bridge (HB1)
    ├── Root Port 2 → Type3 Device 2
    └── Root Port 3 → Type3 Device 3

The launch script is as follows:

QEMU_BIN="$HOME/tool/qemu/build/qemu-system-x86_64"
KERNEL_IMG="$HOME/work/mempool/linux/arch/x86_64/boot/bzImage"
DISK_IMG="$HOME/tool/qemu/qemu-cxl-img"
# Host directory shared with the guest; replace it with your own home directory.
SHARE_DIR="/home/zmy"

"$QEMU_BIN" \
-s \
-kernel "$KERNEL_IMG" \
-append "root=/dev/sda rw console=ttyS0,115200 ignore_loglevel nokaslr \
cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm \
cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm cxl_port.dyndbg=+fplm \
cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm \
cxl_mock_mem.dyndbg=+fplm dax.dyndbg=+fplm dax_cxl.dyndbg=+fplm \
device_dax.dyndbg=+fplm" \
-smp 1 \
-accel kvm \
-serial mon:stdio \
-nographic \
-qmp tcp:localhost:4444,server,wait=off \
-netdev user,id=network0,hostfwd=tcp::2024-:22 \
-device e1000,netdev=network0 \
-monitor telnet:127.0.0.1:12345,server,nowait \
-drive file="$DISK_IMG",index=0,media=disk,format=raw \
-machine q35,cxl=on -m 8G,maxmem=32G,slots=8 \
-virtfs local,path=/lib/modules,mount_tag=modshare,security_model=mapped \
-virtfs local,path="$SHARE_DIR",mount_tag=homeshare,security_model=mapped \
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
-object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
-object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
-object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
-object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
-object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
-device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,sn=0x1 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
-device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,sn=0x2 \
-device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
-device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,sn=0x3 \
-device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
-device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,sn=0x4 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k
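
The last line of the script defines a fixed memory window that interleaves across the two host bridges at 8k granularity. As a rough illustration, assuming plain modulo interleaving, the target that serves a given byte offset into the window can be computed as below. The helper name `fmw_target` is made up for this sketch and is not part of QEMU.

```shell
# Sketch: which interleave target serves a byte offset into the window.
# Assumes modulo interleaving: target = (offset / granularity) % ways.
# Defaults match the script above: 8k granularity, 2 targets.
fmw_target() (
    offset=$1
    granularity=${2:-8192}
    ways=${3:-2}
    echo $(( (offset / granularity) % ways ))
)

# Walk the first four 8 KiB chunks of the window.
for offset in 0 8192 16384 24576; do
    echo "offset $offset -> target $(fmw_target "$offset")"
done
```

Target 0 corresponds to cxl.1 and target 1 to cxl.2, so consecutive 8 KiB chunks alternate between the two host bridges.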

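Attach through a switch: a single host bridge whose root port leads to a switch with four downstream ports, one CXL device per port; latency is slightly higher than with the first method
The topology is as follows: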

Host
└── CXL Host Bridge
    └── Root Port
        └── CXL Switch (Upstream Port)
            ├── Downstream Port 0 → Type3 Device 0
            ├── Downstream Port 1 → Type3 Device 1
            ├── Downstream Port 2 → Type3 Device 2
            └── Downstream Port 3 → Type3 Device 3

The launch script is as follows:

QEMU_BIN="$HOME/tool/qemu/build/qemu-system-x86_64"
KERNEL_IMG="$HOME/work/mempool/linux/arch/x86_64/boot/bzImage"
DISK_IMG="$HOME/tool/qemu/qemu-cxl-img"
# Host directory shared with the guest; replace it with your own home directory.
SHARE_DIR="/home/zmy"

"$QEMU_BIN" \
-s \
-kernel "$KERNEL_IMG" \
-append "root=/dev/sda rw console=ttyS0,115200 ignore_loglevel nokaslr \
cxl_acpi.dyndbg=+fplm cxl_pci.dyndbg=+fplm cxl_core.dyndbg=+fplm \
cxl_mem.dyndbg=+fplm cxl_pmem.dyndbg=+fplm cxl_port.dyndbg=+fplm \
cxl_region.dyndbg=+fplm cxl_test.dyndbg=+fplm cxl_mock.dyndbg=+fplm \
cxl_mock_mem.dyndbg=+fplm dax.dyndbg=+fplm dax_cxl.dyndbg=+fplm \
device_dax.dyndbg=+fplm" \
-smp 1 \
-accel kvm \
-serial mon:stdio \
-nographic \
-qmp tcp:localhost:4444,server,wait=off \
-netdev user,id=network0,hostfwd=tcp::2024-:22 \
-device e1000,netdev=network0 \
-monitor telnet:127.0.0.1:12345,server,nowait \
-drive file="$DISK_IMG",index=0,media=disk,format=raw \
-machine q35,cxl=on -m 8G,maxmem=32G,slots=8 \
-virtfs local,path=/lib/modules,mount_tag=modshare,security_model=mapped \
-virtfs local,path="$SHARE_DIR",mount_tag=homeshare,security_model=mapped \
-object memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M \
-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M \
-object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
-object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
-object memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M \
-object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M \
-object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
-object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
-device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \
-device cxl-upstream,bus=root_port0,id=us0 \
-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \
-device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,sn=0x1 \
-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \
-device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,sn=0x2 \
-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \
-device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,sn=0x3 \
-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \
-device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,sn=0x4 \
-M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k
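
The four downstream-port and Type 3 stanzas in the script differ only in their index, so they could be generated in a loop. The following is a sketch of that idea; the function name `switch_port_args` is introduced here for illustration, and it names every backing file with an index (`/tmp/cxltest0.raw`), whereas the script above uses `/tmp/cxltest.raw` for the first device.

```shell
# Sketch: print the per-device arguments for N Type 3 devices behind switch us0.
# IDs follow the script above (cxl-memN, cxl-lsaN, swportN, cxl-pmemN);
# slots start at 4 and serial numbers at 0x1, as in the script.
switch_port_args() (
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s\n' "-object memory-backend-file,id=cxl-mem$i,share=on,mem-path=/tmp/cxltest$i.raw,size=256M"
        printf '%s\n' "-object memory-backend-file,id=cxl-lsa$i,share=on,mem-path=/tmp/lsa$i.raw,size=256M"
        printf '%s\n' "-device cxl-downstream,port=$i,bus=us0,id=swport$i,chassis=0,slot=$(( i + 4 ))"
        printf '%s\n' "-device cxl-type3,bus=swport$i,persistent-memdev=cxl-mem$i,lsa=cxl-lsa$i,id=cxl-pmem$i,sn=$(printf '0x%x' $(( i + 1 )))"
        i=$(( i + 1 ))
    done
)

# Print the arguments for the four devices used in this topology.
switch_port_args 4
```

The output would be spliced into the QEMU command after the `-device cxl-upstream,...` line, for example as an unquoted `$(switch_port_args 4)`, which relies on word splitting and works here because none of the arguments contain spaces.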

The ndctl Tool

The ndctl package includes cxl-cli, which configures, manages, and monitors CXL from user space.

Installing from the official repository

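It can be installed with sudo apt install ndctl, but Ubuntu 22.04 only ships ndctl v72, which can do little more than list device memory information. Its CXL 3.0 support is poor: it offers only six commands, namely list, version, help, monitor, read-labels, and write-labels.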

sudo apt install ndctl

cxl --version
# 72.1+
cxl list -Mu # capacity only; regions, namespaces, and the like are not supported
{
  "memdev":"mem0",
  "pmem_size":"256.00 MiB (268.44 MB)",
  "ram_size":0
}
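
Because the usable feature set depends on the ndctl release, a script can check the version before relying on region commands. A minimal sketch, assuming the `cxl --version` output begins with the major version number as in the `72.1+` shown above; the helper name `cxl_major_at_least` is made up for this example, and 77 is the threshold this post arrives at.

```shell
# Sketch: succeed if a version string such as "72.1+" has major >= minimum.
cxl_major_at_least() (
    version=$1
    minimum=$2
    major=${version%%[!0-9]*}   # keep only the leading digits
    [ -n "$major" ] && [ "$major" -ge "$minimum" ]
)

# Example with the version string reported on Ubuntu 22.04.
if cxl_major_at_least "72.1+" 77; then
    echo "region commands expected to be available"
else
    echo "too old for region commands"
fi
```

On a real system the first argument would be `"$(cxl --version)"` instead of the literal string.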

Building and installing from source

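In my tests the latest releases, v80 through v83, failed to build (the CXL interface changed in kernels after Linux 6.11), so an older release is needed.
v77, the oldest release that supports CXL 3.x, builds and installs but then fails to run with the following errors: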

root@s53:~/ndctl/build# cxl
cxl: /lib/x86_64-linux-gnu/libcxl.so.1: version `LIBCXL_5' not found (required by cxl)
cxl: /lib/x86_64-linux-gnu/libcxl.so.1: version `LIBCXL_6' not found (required by cxl)
cxl: /lib/x86_64-linux-gnu/libcxl.so.1: version `LIBCXL_4' not found (required by cxl)
cxl: /lib/x86_64-linux-gnu/libcxl.so.1: version `LIBCXL_3' not found (required by cxl)
cxl: /lib/x86_64-linux-gnu/libcxl.so.1: version `LIBCXL_2' not found (required by cxl)

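After struggling with this for a while, I gave up and upgraded to Ubuntu 24.04, whose official repository already ships v77. Run sudo umount mntdir, set up the disk image again, and install the noble release of Ubuntu.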
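
For anyone who wants to dig further: the `LIBCXL_n' not found` messages show that the loader picked up `/lib/x86_64-linux-gnu/libcxl.so.1`, which lacks the symbol versions the new binary needs, so an older system copy of the library is probably shadowing the freshly built one. A small sketch for checking which library a binary resolves; the helper name `resolved_libcxl` is invented for this example.

```shell
# Sketch: print the libcxl path the dynamic loader resolves for a binary.
resolved_libcxl() {
    bin=$1
    if [ ! -x "$bin" ]; then
        echo "not an executable: $bin" >&2
        return 1
    fi
    # ldd lines look like: "libcxl.so.1 => /path/to/libcxl.so.1 (0x...)"
    ldd "$bin" | awk '/libcxl\.so/ { print $3 }'
}
```

Calling it as `resolved_libcxl "$(command -v cxl)"` shows the library in use. If that is the old system copy, pointing `LD_LIBRARY_PATH` at the directory holding the newly built library, or running `sudo ldconfig` after installation, may help. This is untested against the v77 failure described above.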

Using ndctl

The cxl tool from the ndctl package supports the following commands:

cxl --list-cmds
version
list
help
zero-labels
read-labels
write-labels
disable-memdev
enable-memdev
reserve-dpa
free-dpa
disable-port
enable-port
set-partition
disable-bus
create-region
enable-region
disable-region
destroy-region
monitor
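
Since the command set differs between releases, a script can test for a subcommand before using it. A sketch built on the `cxl --list-cmds` output shown above; the helper name `cxl_has_cmd` is made up for this example.

```shell
# Sketch: succeed if `cxl --list-cmds` lists the given subcommand.
cxl_has_cmd() {
    # -x matches whole lines only, -F treats the name as a literal string.
    cxl --list-cmds 2>/dev/null | grep -qxF -- "$1"
}

if cxl_has_cmd create-region; then
    echo "create-region is available"
else
    echo "create-region is missing; a newer ndctl is needed"
fi
```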

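So far I have only used the list command to inspect the system and the create-region command to create regions; the other commands have not been needed yet.
Once a region exists, a namespace can be created on it in one of four modes:
raw: block-device access, compatible with the traditional storage stack
fsdax (default): byte-addressable access, supports DAX file systems (ext4/xfs)
devdax: creates a character device that programs access directly by mapping it with the mmap system call (this is what Alibaba's database paper does)
sector: a block device with sector-atomic writes
After a region is created with cxl create-region, running ndctl create-namespace -m <mode> on it exposes the corresponding cxl-mem as a device under /dev/, which user programs can map into their address space with mmap.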
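
A possible end-to-end sequence, written as a dry run that only prints the commands: create a region with `cxl create-region`, then create a namespace in one of the modes above with `ndctl create-namespace`. The decoder name `decoder0.0`, the memdev names `mem0` and `mem1`, and the region name `region0` are placeholders; the real names come from `cxl list` on the guest. `run` is a helper defined here, not part of ndctl.

```shell
# Sketch: print (DRY_RUN=1, the default) or execute the setup commands.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Build a 2-way interleaved pmem region from two memdevs.
run cxl create-region -m -d decoder0.0 -w 2 -t pmem mem0 mem1
# 2. Create a devdax namespace on it, which yields a character device in /dev.
run ndctl create-namespace -m devdax -r region0
```

With the host-bridge topology, a 2-way region needs one memdev from under each host bridge, so the memdev names should be picked accordingly. Setting `DRY_RUN=0` runs the commands for real.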

References

  1. QEMU official documentation