You won't see disk errors, but you may see slowdowns over time, or during intensive disk writes by the VM. The reason you wouldn't compress a VM in place is the same reason you wouldn't compress a database in place: the virtual disk is a block device, and the VM addresses it by block offset. It assumes all blocks are the same size. If they are compressed, that is no longer true. The host file system has to translate between the assumed block location and the real (compressed) block location. That would be fairly trivial overhead if the thing were read-only, but blocks change, and so does their compressibility. Rewriting a compressed block may mean it no longer fits where it was. The host FS has to move it, which is an extra step, and which fragments the virtual disk.
Of course, there is always fragmentation when your VM writes to new locations that don't yet have a physical location. You can only beat this by using (uncompressed) full size virtual disks (full of empty padding, i.e. no auto-grow), and defragging both host and guest.
On solid state storage, fragmentation doesn't matter, and compression will help reduce writes, which lengthens the life of your disk. But you are still stuck with the CPU and memory overhead of block translation.
Also bear in mind that your disk controller may be deduplicating and/or compressing data itself, so your OS-level efforts may be redundant.
For VMs that mostly read, compression may turn out to be worthwhile. Unfortunately, modern OSs do so much housekeeping, logging and self-updating that they write continually, but you can confine that activity to a snapshot. So compress the base image in place, but not the snapshots. Write performance will be unaffected. Caveat: snapshots can easily grow to the size of the original disk, so you will still need to merge or delete them frequently; put them where you can see them.
Conclusion: keep it simple.
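One way to set up the compressed-base-plus-writable-snapshot layout sketched above is with QEMU's `qemu-img` (a sketch only: the file names are placeholders, and other hypervisors have their own equivalents):

```shell
# Sketch only: file names are placeholders; requires qemu-img from QEMU.
command -v qemu-img >/dev/null 2>&1 || exit 0  # skip if qemu-img is absent

# A dummy raw disk, so the example is self-contained.
qemu-img create -f raw vm-disk.raw 64M

# Compress the read-mostly base image (qcow2 with compressed clusters).
qemu-img convert -c -O qcow2 vm-disk.raw base.qcow2

# Direct all writes into an uncompressed overlay backed by the base.
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 snap.qcow2

# Later, when the overlay has grown too large, merge it back down:
#   qemu-img commit snap.qcow2
```

The guest boots from `snap.qcow2`: reads of unchanged data fall through to the compressed base, while every write lands uncompressed in the overlay, so write performance is unaffected.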
This is all about tradeoffs. It takes the disk a certain amount of time to read a certain number of bytes. If the time the disk takes to read the compressed data, plus the time the CPU takes to decompress it, is less than the time it would take to read the uncompressed data from the disk, then you win on performance.
The problem is that there are so many variables in this, and it is likely to be a fine margin either way. Your disk could read small blocks more slowly, or your data might not be very compressible, or your CPU might be really fast, and so on. The only way to know for sure whether it makes a difference is to try it and measure it. The answers you get for different data and machines are likely to differ.
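A minimal sketch of such a measurement, using the standard-library `zlib` as a stand-in for the filesystem's compressor. This only times the decompression side against the size reduction; a real test would also time reads from your actual disk, with your actual data.

```python
import time
import zlib

# Illustrative data: text-like and fairly repetitive, so it compresses well.
# Real results depend entirely on your data, disk, and CPU.
data = b"some fairly repetitive log line with a timestamp and a level\n" * 50000
compressed = zlib.compress(data)

t0 = time.perf_counter()
out = zlib.decompress(compressed)
decompress_s = time.perf_counter() - t0

ratio = len(compressed) / len(data)
print(f"compression ratio: {ratio:.3f}, decompress time: {decompress_s * 1000:.1f} ms")
# Compression wins only if the disk time saved by reading ratio * len(data)
# bytes instead of len(data) bytes exceeds decompress_s -- which you can
# only know by timing reads on the disk in question as well.
```

On typical hardware this kind of data shrinks by an order of magnitude and decompresses in milliseconds, but random or already-compressed data gives a ratio near 1.0 and the CPU cost is pure loss, which is why measuring your own workload is the only reliable answer.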