Why is direct IO much slower than non-direct IO on an SSD (performance measured after dropping the system cache)?
I'm new to optimizing disk IO performance. I compared the performance of reading a file with and without direct IO enabled, using a chunk size of 512 KiB. Since direct IO reads data from the disk directly into a buffer in user space, I expected it to be faster than non-direct IO (the data is not cached before the measurement). However, the result is that non-direct IO is much faster than direct IO. If I change the chunk size to 2 MiB, the speeds are equal. Here are the test results:
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.32862 s, 404 MB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.365581 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.370193 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.36575 s, 1.5 GB/s
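For reference, here is a minimal C sketch (hypothetical, error handling trimmed, and not necessarily what dd does internally) of what I understand iflag=direct bs=512K to mean at the syscall level: open the file with O_DIRECT and issue one synchronous 512 KiB read() at a time into an aligned user-space buffer.

/* Hypothetical sketch of a direct-IO read loop, roughly equivalent to
 * `dd if=nodes-1G of=/dev/null iflag=direct bs=512K`. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t bs = 512 * 1024;   /* chunk size: 512 KiB */
    void *buf;

    /* O_DIRECT requires an aligned buffer; 4096 is assumed to match the
     * device's logical block size. */
    if (posix_memalign(&buf, 4096, bs) != 0) {
        perror("posix_memalign");
        return 1;
    }

    int fd = open("nodes-1G", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Each read() bypasses the page cache and blocks until the device
     * has delivered the requested chunk into the user-space buffer. */
    ssize_t n;
    while ((n = read(fd, buf, bs)) > 0)
        ;                           /* discard the data, like of=/dev/null */
    if (n < 0)
        perror("read");

    close(fd);
    free(buf);
    return 0;
}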
Output of df -h:
ps@701083:/mnt/md0/cuda-learning$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 1.3M 6.3G 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 117G 28G 83G 26% /
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/md0 3.5T 756G 2.6T 23% /mnt/md0
/dev/nvme0n1p2 976M 204M 705M 23% /boot
/dev/nvme0n1p1 511M 6.7M 505M 2% /boot/efi
tmpfs 6.3G 0 6.3G 0% /run/user/1000
ps@701083:/mnt/md0/cuda-learning$
Why?