Does the same idea about an optimal 'block size' apply to dd?
I think so. Consider a 4 kB buffer. It can be filled by a simple mmap operation if the filesystem also uses 4 kB blocks (common on newer flash disks). If one stream fills the buffer and the buffer then has to be written out, the input stream is blocked until the buffer is flushed; this is the reason for using more than one buffer. On the other hand, I assume a single buffer must be filled completely before it is written to the output, so even if some data is already available, it has to be held in memory. A 1 GiB buffer (or block size) would therefore not be optimal either.
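To make the buffering point a bit more concrete, here is a minimal sketch (the device names /dev/sdX and /dev/sdY are placeholders, not from the question) contrasting a single dd with two dd processes connected by a pipe, which gives each side its own buffer plus the kernel pipe buffer in between:

    # One dd, one buffer: while the 256k buffer is being flushed to the
    # output device, no new input is read.
    dd if=/dev/sdX of=/dev/sdY bs=256k

    # Two dd processes joined by a pipe: the reading and writing side each
    # have their own buffer, so input and output can overlap somewhat.
    # iflag=fullblock (GNU dd) makes the writer wait for full 256k blocks
    # coming out of the pipe.
    dd if=/dev/sdX bs=256k | dd of=/dev/sdY bs=256k iflag=fullblock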
I just had a look at the buffer size in gparted, and it seems they simply switch between 128 kB and 256 kB (it might be more complex than that); it looks like they want to make the most of the caches found in most systems. Given a disk cache of 2 MB, it can make sense to transfer data in blocks of that size, and such blocks might even fit in the CPU cache if no mmapped I/O is used.
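If you want to see what those cache sizes actually are on your own machine, something like this should work on Linux (treat /dev/sda as a placeholder device; the hdparm line only applies to ATA/SATA drives that report a cache size):

    # On-board cache/buffer size reported by the drive itself
    sudo hdparm -I /dev/sda | grep -i 'cache/buffer'

    # CPU cache sizes (L1/L2/L3)
    lscpu | grep -i cache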
If so, are there any specific situations in which it applies most strongly?
Any operation that transfers a lot of data, provided the data can be read and written blockwise.
How can I use the block size to my advantage?
By testing which block size is the fastest for you. It's that simple, and given the explanations above, something around 256 kB is a good starting point. Or add an autobufsize option to dd :).
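A minimal benchmarking sketch along those lines, assuming GNU dd on Linux (the scratch file /tmp/ddtest and the 256 MiB test size are arbitrary choices, and reading from /dev/zero means this measures write speed only):

    #!/bin/sh
    # Write the same 256 MiB with several block sizes and compare throughput.
    # conv=fdatasync flushes the data to disk before dd exits, so the page
    # cache does not hide the real write speed.
    total=$((256 * 1024 * 1024))
    for bs in 4096 65536 262144 1048576 4194304; do
        count=$((total / bs))
        printf 'bs=%-8s ' "$bs"
        dd if=/dev/zero of=/tmp/ddtest bs="$bs" count="$count" conv=fdatasync 2>&1 | tail -n 1
    done
    rm -f /tmp/ddtest

The last line of dd's output is its throughput summary, so the loop prints one line per block size that you can compare directly.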