Purpose of high-performance compression algorithms besides storage efficiency
While trying to learn from the source code of U++, a C++-based RAD framework, I noticed that it makes heavy use of compression/decompression to read and write data.
As far as I understand, compression provides the advantage of storing data in a more compact manner while still maintaining integrity.
But as I looked more into the LZ4 algorithm, some sources mentioned that it provides faster reads/writes than direct read and write (unfortunately, I am unable to locate said sources any longer). I am wondering why this is the case, if it is so, because no matter what, the main data still has to be processed; the compression/decompression is just another extra step. Even if we consider a basic compression algorithm like Huffman coding, we still have to examine the original data space either way, and a regular read, for example, would do just that. But a compression algorithm would not only have to perform that step, it would also have to process that information further.
How could the presence of extra steps yield faster processing, given that both a regular IO operation and a compression/decompression operation seem to perform the same initial read of the data space?
U++ seems to use the zlib library heavily for writing/retrieving app-related resources. Is this done simply to use space efficiently, or for other reasons as well, like the one mentioned above?
1 Answer
Code written for and running on a CPU/GPU operates* on the original data space.
(* Exceptions apply: code can take the compression method into account and work on the compressed form directly; e.g. RLE data does not need to be decompressed for every purpose.)
But between persistent storage and the processing circuits, quite a few intermediate storages exist, and bandwidth jumps considerably along the way, with the exact figures depending on the "pipeline":
SD/HDD/SSD -> (RAM ->) cache memory -> CPU/GPU registers (the latter having very little capacity but extreme throughput),
and on whether the PUs are those of a PS5 or of a "generic" PC, from 2010 or from 2022.
I currently cannot find any sources or data to generalise from, but from what I remember: moving compressed data from a slow web server (connection) or HDD into the RAM of a client PC, then having the CPU decompress it (entirely or range by range) and put it back into RAM or cache, can mean a significantly shorter delay until the data is processed than transferring it uncompressed, and this in most cases.
Of course, this depends on the compression ratio, the decompression effort, the bandwidth of the pipeline pieces in between, the overall amount of data transported, and the intended processing on it.
I didn't read anything about compression on the U++ website.
Decompression is an extra step that costs time (think: time *= 1.01), but far more time is saved beforehand on transport (think: time *= 0.9).