Compression to improve hard disk write performance
On a modern system can local hard disk write speeds be improved by compressing the output stream?
This question derives from a case I'm working with, where a program serially generates and dumps around 1-2 GB of text logging data to a raw text file on the hard disk, and I think it is IO bound. Would I expect to be able to decrease runtimes by compressing the data before it goes to disk, or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?
I know this would be affected by how much CPU is being used to generate the data, so rules of thumb on how much idle CPU time would be needed would be good.
I recall a video talk where someone used compression to improve read speeds for a database, but IIRC compressing is a lot more CPU intensive than decompressing.
Yes, yes, yes, absolutely.
Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it; time a huge fwrite or something.) Let's say 100 MB/s. Now take your CPU speed in megahertz; let's say 3 GHz = 3000 MHz. Divide the CPU speed by the disk write speed. That's the number of otherwise-idle CPU cycles you can spend per byte on compression. In this case 3000/100 = 30 cycles per byte.
If you had an algorithm that could compress your data by 20% for an effective 125 MB/s write speed, you would have 24 cycles per byte to run it in, and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.
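As a rough sketch of measuring that budget on your own machine (the file name, test size, and 3 GHz clock rate are assumptions, and the OS page cache can make the result optimistic unless you sync or use a file much larger than RAM):

    /* Sketch: time a large sequential write and derive a per-byte
       cycle budget for compression. Assumes POSIX clock_gettime;
       CPU_HZ is an assumed clock rate -- adjust for your machine. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define CPU_HZ     3.0e9              /* assumed 3 GHz CPU */
    #define TEST_BYTES (256L << 20)       /* 256 MiB test write */
    #define CHUNK      (1 << 20)          /* 1 MiB per fwrite */

    int main(void)
    {
        char *buf = malloc(CHUNK);
        memset(buf, 'x', CHUNK);
        FILE *f = fopen("writetest.tmp", "wb");
        if (!f) return 1;

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long n = 0; n < TEST_BYTES; n += CHUNK)
            fwrite(buf, 1, CHUNK, f);
        fclose(f);                        /* flushes stdio buffers */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double bps  = TEST_BYTES / secs;
        printf("write speed: %.1f MB/s\n", bps / 1e6);
        printf("budget: %.0f cycles/byte\n", CPU_HZ / bps);
        free(buf);
        return 0;
    }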
We do this all the time when reading optical media.
If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.
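A minimal sketch of that hand-off, assuming POSIX threads; the slot count, slot size, and the ring_t/ring_push names are made up for illustration, and real code would also want error handling:

    /* Sketch: ring of buffers between a producer (log generator) and
       a consumer (compress + write) thread on the idle core. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define NSLOTS 8
    #define SLOTSZ (64 * 1024)

    typedef struct {
        char data[NSLOTS][SLOTSZ];
        size_t len[NSLOTS];
        int head, tail, count;          /* producer fills head, consumer drains tail */
        pthread_mutex_t mu;
        pthread_cond_t not_full, not_empty;
    } ring_t;

    static ring_t ring = { .mu = PTHREAD_MUTEX_INITIALIZER,
                           .not_full = PTHREAD_COND_INITIALIZER,
                           .not_empty = PTHREAD_COND_INITIALIZER };

    static void ring_push(ring_t *r, const char *msg, size_t n)
    {
        pthread_mutex_lock(&r->mu);
        while (r->count == NSLOTS)      /* producer only blocks if the ring is full */
            pthread_cond_wait(&r->not_full, &r->mu);
        memcpy(r->data[r->head], msg, n);
        r->len[r->head] = n;
        r->head = (r->head + 1) % NSLOTS;
        r->count++;
        pthread_cond_signal(&r->not_empty);
        pthread_mutex_unlock(&r->mu);
    }

    static void *consumer(void *arg)
    {
        ring_t *r = arg;
        for (;;) {
            pthread_mutex_lock(&r->mu);
            while (r->count == 0)
                pthread_cond_wait(&r->not_empty, &r->mu);
            int slot = r->tail;
            pthread_mutex_unlock(&r->mu);

            if (r->len[slot] == 0)      /* zero-length sentinel: shut down */
                return NULL;
            /* ...compress r->data[slot] here and write the result to disk... */

            pthread_mutex_lock(&r->mu);
            r->tail = (r->tail + 1) % NSLOTS;
            r->count--;
            pthread_cond_signal(&r->not_full);
            pthread_mutex_unlock(&r->mu);
        }
    }

    int main(void)
    {
        pthread_t c;
        pthread_create(&c, NULL, consumer, &ring);

        char line[64];
        for (int i = 0; i < 100000; i++) {   /* producer: generate log records */
            int n = snprintf(line, sizeof line, "record %d\n", i);
            ring_push(&ring, line, (size_t)n);
        }
        ring_push(&ring, "", 0);             /* sentinel */
        pthread_join(c, NULL);
        return 0;
    }

The consumer can safely read its slot outside the lock because the producer cannot reuse that slot until count has been decremented.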
Yes, this has been true for at least 10 years. There are operating-systems papers about it. I think Chris Small may have worked on some of them.
For speed, gzip/zlib compression on lower quality levels is pretty fast; if that's not fast enough you can try FastLZ. A quick way to use an extra core is just to use popen(3) to send output through gzip.
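For illustration, a minimal sketch of the popen(3) route; the gzip level and output name are placeholders:

    /* Sketch: stream log output through an external gzip process,
       which runs on whatever core is free. "gzip -1" favors speed. */
    #include <stdio.h>

    int main(void)
    {
        FILE *out = popen("gzip -1 > app.log.gz", "w");
        if (!out) return 1;
        for (int i = 0; i < 1000000; i++)
            fprintf(out, "record %d: some textual log data\n", i);
        pclose(out);    /* waits for gzip to drain and exit */
        return 0;
    }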
For what it's worth, Sun's ZFS filesystem can enable on-the-fly compression to decrease the amount of disk IO without a significant increase in overhead, as an example of this in practice.
The Filesystems and Storage Lab at Stony Brook published a rather extensive performance (and energy) evaluation of file data compression on server systems at IBM's SYSTOR systems research conference this year: paper at ACM Digital Library, presentation.
The results depend on the compression algorithm and the kind of workload.
For example, in the measurements from the paper, a textual workload on a server environment using lzop with low compression effort is faster than plain writes, but bzip and gz aren't.
In your specific setting, you should try it out and measure. It really might improve performance, but that is not always the case.
CPUs have grown faster at a faster rate than hard drive access. Even back in the 80's, many compressed files could be read off the disk and uncompressed in less time than it took to read the original (uncompressed) file. That will not have changed.
Generally though, these days the compression/decompression is handled at a lower level than you would be writing at, for example in a database I/O layer.
As to the usefulness of a second core: it only counts if the system will also be doing a significant number of other things, and your program would have to be multi-threaded to take advantage of the additional CPU.
Logging the data in binary form may be a quick improvement. You'll write less to the disk and the CPU will spend less time converting numbers to text. It may not be useful if people are going to be reading the logs, but they won't be able to read compressed logs either.
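For instance, a minimal sketch of the difference; the record layout here is hypothetical:

    /* Sketch: logging the same hypothetical record as text vs. binary. */
    #include <inttypes.h>
    #include <stdio.h>

    struct record {
        uint64_t timestamp;
        uint32_t sensor_id;
        double   value;
    };

    static void log_text(FILE *f, const struct record *r)
    {
        /* number-to-text conversion costs CPU and produces more bytes */
        fprintf(f, "%" PRIu64 " %" PRIu32 " %.6f\n",
                r->timestamp, r->sensor_id, r->value);
    }

    static void log_binary(FILE *f, const struct record *r)
    {
        /* smaller and cheaper, but raw structs aren't portable across
           compilers/architectures (padding, endianness) */
        fwrite(r, sizeof *r, 1, f);
    }

    int main(void)
    {
        struct record r = { 1234567890ULL, 42, 3.14159 };
        FILE *bin = fopen("log.bin", "ab");
        log_text(stdout, &r);
        if (bin) { log_binary(bin, &r); fclose(bin); }
        return 0;
    }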
Windows already supports File Compression in NTFS, so all you have to do is to set the "Compressed" flag in the file attributes.
You can then measure if it was worth it or not.
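The same flag can also be set programmatically; a sketch using the documented Win32 FSCTL_SET_COMPRESSION control code (the path is a placeholder, error handling kept minimal):

    /* Sketch: enable NTFS compression on an existing file. */
    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h = CreateFileA("C:\\logs\\app.log",
                               GENERIC_READ | GENERIC_WRITE,
                               0, NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        USHORT fmt = COMPRESSION_FORMAT_DEFAULT;
        DWORD bytes;
        BOOL ok = DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                                  &fmt, sizeof fmt, NULL, 0, &bytes, NULL);
        printf("compression %s\n", ok ? "enabled" : "failed");
        CloseHandle(h);
        return 0;
    }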
This depends on lots of factors and I don't think there is one correct answer. It comes down to this:
Can you compress the raw data faster than the raw write performance of your disk times the compression ratio you are achieving (or the multiple in speed you are trying to get) given the CPU bandwidth you have available to dedicate to this purpose?
Given today's relatively high data write rates in the 10's of MBytes/second, this is a pretty high hurdle to get over. To the point of some of the other answers, you would likely have to have easily compressible data, and would just have to benchmark it with some reasonable test experiments to find out.
As for a specific opinion (guess!?) on the point about additional cores: if you thread up the compression of the data and keep the core(s) fed, then with the high compression ratio of text it is likely such a technique would bear some fruit. But this is just a guess. In a single-threaded application alternating between disk writes and compression operations, it seems much less likely to me.
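That break-even condition can be written down directly; a sketch assuming the compression and write stages overlap (all numbers are things you would measure on your own system):

    /* Sketch: break-even check for compress-then-write.
       Pipelined model: compression helps if the slower of the two
       stages still beats writing raw data.
       ratio = compressed size / original size (e.g. 0.5 for 2:1). */
    #include <stdbool.h>
    #include <stdio.h>

    bool compression_helps(double compress_mbps, double disk_mbps, double ratio)
    {
        double effective = disk_mbps / ratio;   /* uncompressed MB/s the disk absorbs */
        double pipeline  = compress_mbps < effective ? compress_mbps : effective;
        return pipeline > disk_mbps;
    }

    int main(void)
    {
        /* hypothetical: 150 MB/s compressor, 100 MB/s disk, 2:1 ratio */
        printf("%s\n", compression_helps(150.0, 100.0, 0.5) ? "helps" : "doesn't");
        return 0;
    }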
If it's just text, then compression could definitely help. Just choose a compression algorithm and settings that make the compression cheap. "gzip" is cheaper than "bzip2" and both have parameters that you can tweak to favor speed or compression ratio.
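For example, with zlib's gzFile API the compression level rides along in the mode string; a sketch (the filename and record format are placeholders; link with -lz):

    /* Sketch: in-process gzip output via zlib, tuned for speed.
       "wb1" = fastest compression, "wb9" = best ratio. */
    #include <zlib.h>

    int main(void)
    {
        gzFile out = gzopen("app.log.gz", "wb1");   /* favor speed */
        if (!out) return 1;
        for (int i = 0; i < 100000; i++)
            gzprintf(out, "record %d: some textual log data\n", i);
        gzclose(out);
        return 0;
    }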
If you are I/O bound saving human-readable text to the hard drive, I expect compression to reduce your total runtime.
If you have an idle 2 GHz core, and a relatively fast 100 MB/s streaming hard drive,
halving the net logging time requires at least 2:1 compression and no more than roughly 10 CPU cycles per uncompressed byte for the compressor to ponder the data (at 2:1 the compressor must keep up with 200 MB/s of uncompressed input, and 2000 MHz / 200 MB/s = 10 cycles per byte).
With a dual-pipe processor, that's (very roughly) 20 instructions per byte.
I see that LZRW1-A (one of the fastest compression algorithms) uses 10 to 20 instructions per byte, and compresses typical English text about 2:1.
At the upper end (20 instructions per byte), you're right on the edge between IO bound and CPU bound. At the middle and lower end, you're still IO bound, so there are a few cycles available (not many) for a slightly more sophisticated compressor to ponder the data a little longer.
If you have a more typical non-top-of-the-line hard drive, or the hard drive is slower for some other reason (fragmentation, other multitasking processes using the disk, etc.)
then you have even more time for a more sophisticated compressor to ponder the data.
You might consider setting up a compressed partition, saving the data to that partition (letting the device driver compress it), and comparing the speed to your original speed.
That may take less time and be less likely to introduce new bugs than changing your program and linking in a compression algorithm.
I see a list of compressed file systems based on FUSE, and I hear that NTFS also supports compressed partitions.
If this particular machine is often IO bound,
another way to speed it up is to install a RAID array.
That would give a speedup to every program and every kind of data (even incompressible data).
For example, the popular RAID 1+0 configuration with 4 total disks gives a speedup of nearly 2x.
The nearly as popular RAID 5 configuration, with the same 4 total disks, gives a speedup of nearly 3x.
It is relatively straightforward to set up a RAID array with a speed 8x the speed of a single drive.
High compression ratios, on the other hand, are apparently not so straightforward. Compression of "merely" 6.30 to one would give you a cash prize for breaking the current world record for compression (Hutter Prize).
This used to be something that could improve performance in quite a few applications way back when. I'd guess that today it's less likely to pay off, but it might in your specific circumstance, particularly if the data you're logging is easily compressible.
However, as Shog9 commented: