Why is direct IO much slower than non-direct IO on an SSD (performance measured after dropping the system cache)?
I'm new to optimizing disk IO performance. I compared the performance of reading a file with and without direct IO enabled, using a chunk size of 512 KiB. Since direct IO reads data from the disk directly into a buffer in user space, I expected it to be faster than non-direct IO (the data is not cached before the measurement). However, the result is that non-direct IO is much faster than direct IO. If I change the chunk size to 2 MiB, the speeds are equal. Here are the test results:
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.32862 s, 404 MB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=512K count=1024
1024+0 records in
1024+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.365581 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.370193 s, 1.5 GB/s
ps@701083:/mnt/md0/cuda-learning$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
ps@701083:/mnt/md0/cuda-learning$ dd if=nodes-1G of=/dev/null iflag=direct bs=2M count=256
256+0 records in
256+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.36575 s, 1.5 GB/s
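For reference, here is a minimal C sketch (hypothetical, error handling trimmed, and not necessarily what dd does internally) of what I understand iflag=direct bs=512K to mean at the syscall level: open the file with O_DIRECT and issue one synchronous 512 KiB read() at a time into an aligned user-space buffer.

/* Hypothetical sketch of a direct-IO read loop, roughly equivalent to
 * `dd if=nodes-1G of=/dev/null iflag=direct bs=512K`. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t bs = 512 * 1024;   /* chunk size: 512 KiB */
    void *buf;

    /* O_DIRECT requires an aligned buffer; 4096 is assumed to match the
     * device's logical block size. */
    if (posix_memalign(&buf, 4096, bs) != 0) {
        perror("posix_memalign");
        return 1;
    }

    int fd = open("nodes-1G", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Each read() bypasses the page cache and blocks until the device
     * has delivered the requested chunk into the user-space buffer. */
    ssize_t n;
    while ((n = read(fd, buf, bs)) > 0)
        ;                           /* discard the data, like of=/dev/null */
    if (n < 0)
        perror("read");

    close(fd);
    free(buf);
    return 0;
}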
Output of df -h:
ps@701083:/mnt/md0/cuda-learning$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 1.3M 6.3G 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 117G 28G 83G 26% /
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/md0 3.5T 756G 2.6T 23% /mnt/md0
/dev/nvme0n1p2 976M 204M 705M 23% /boot
/dev/nvme0n1p1 511M 6.7M 505M 2% /boot/efi
tmpfs 6.3G 0 6.3G 0% /run/user/1000
ps@701083:/mnt/md0/cuda-learning$
Why?