Compression algorithm for JSON-encoded packets?
What would be the best compression algorithm to use to compress packets before sending them over the wire? The packets are encoded using JSON. Would LZW be a good one for this or is there something better?
Comments (7)
I think two questions will affect your answer:
1) How well can you predict the composition of the data without knowing what will happen on any particular run of the program? For instance, if your packets look like this:
-- then you would probably get your best compression by creating a hard-coded dictionary of the text strings that keep showing up in your data, and replacing each occurrence of one of those strings with the appropriate dictionary index. (Actually, if your data were this regular, you'd probably want to send just the values over the wire and simply write a function into the client to construct a JSON object from the values if a JSON object is needed.)
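A minimal sketch of the hard-coded-dictionary idea, in Python. The dictionary entries and the sample packet here are hypothetical placeholders; in practice you would harvest the recurring strings from your real traffic:

```python
import json

# Hypothetical hard-coded dictionary of strings that recur in every packet.
DICTIONARY = ["pressure", "temperature", "humidity", "deviceId"]
INDEX = {s: i for i, s in enumerate(DICTIONARY)}

def shrink(packet: dict) -> list:
    # Replace each known key with its dictionary index; keep values as-is.
    return [[INDEX.get(k, k), v] for k, v in packet.items()]

def expand(pairs: list) -> dict:
    # Reverse the substitution on the receiving end.
    return {DICTIONARY[k] if isinstance(k, int) else k: v for k, v in pairs}

packet = {"deviceId": "sensor-7", "temperature": 21.4, "humidity": 0.55}
wire = json.dumps(shrink(packet))
assert expand(json.loads(wire)) == packet      # lossless round trip
assert len(wire) < len(json.dumps(packet))     # shorter on the wire
```

Both ends must agree on the same dictionary, which is exactly the "predictable composition" condition above.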
If you cannot predict which headers will be used, you may need to use LZW, LZ77, or another method that looks at the data that has already gone through the stream and finds patterns it can express in an especially compact form. However...
2) Do the packets need to be compressed separately from each other? If so then LZW is definitely not the method you want; it will not have time to build its dictionary up to a size that will give substantial compression results by the end of a single packet. The only chance of getting really substantial compression in this scenario, IMHO, is to use a hard-coded dictionary.
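One practical way to get a hard-coded dictionary without writing your own coder is zlib's preset-dictionary support, which seeds the compressor's window before the first byte of the packet. A sketch, with a made-up dictionary and packet:

```python
import json
import zlib

# Hypothetical preset dictionary seeded with strings expected in every
# packet; in practice, build it from samples of your real traffic.
ZDICT = b'{"deviceId": "temperature": "humidity": "sensor-'

def compress_packet(packet: dict) -> bytes:
    c = zlib.compressobj(level=9, zdict=ZDICT)
    return c.compress(json.dumps(packet).encode()) + c.flush()

def decompress_packet(blob: bytes) -> dict:
    # The receiver must use the exact same preset dictionary.
    d = zlib.decompressobj(zdict=ZDICT)
    return json.loads(d.decompress(blob) + d.flush())

packet = {"deviceId": "sensor-7", "temperature": 21.4, "humidity": 0.55}
blob = compress_packet(packet)
assert decompress_packet(blob) == packet
```

Because the dictionary is pre-agreed, even a single small packet gets back-references from its very first byte, which is what plain per-packet LZW cannot do.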
(Addendum to all of the above: as Michael Kohne points out, sending JSON means you're probably sending all text, which means that you're underusing bandwidth that has the capability of sending a much wider range of characters than you're using. However, the problem of how to pack characters that fall into the range 0-127 into containers that hold values 0-255 is fairly simple and I think can be left as "an exercise for the reader", as they say.)
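The "exercise for the reader" can be sketched directly: pack the low 7 bits of each ASCII character into a continuous bit stream, saving one bit per character (the message string is made up):

```python
def pack7(text: str) -> bytes:
    # Concatenate the low 7 bits of each ASCII character into a bit stream.
    acc = bits = 0
    out = bytearray()
    for ch in text.encode("ascii"):
        acc = (acc << 7) | ch
        bits += 7
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    if bits:
        out.append((acc << (8 - bits)) & 0xFF)  # pad the final byte
    return bytes(out)

def unpack7(data: bytes, n_chars: int) -> str:
    # The receiver needs the character count to ignore the padding bits.
    acc = bits = 0
    chars = []
    for b in data:
        acc = (acc << 8) | b
        bits += 8
        while bits >= 7 and len(chars) < n_chars:
            bits -= 7
            chars.append(chr((acc >> bits) & 0x7F))
    return "".join(chars)

msg = '{"temperature": 21.4}'
packed = pack7(msg)
assert unpack7(packed, len(msg)) == msg
assert len(packed) < len(msg)  # ~7/8 of the original size
```

This only buys a fixed 12.5%, of course; a real compressor or a binary format gains far more.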
还有两种 JSON 压缩算法: CJson 和 CJson 。 惠普包
HPPack 的压缩效果非常好,可以与 gzip 压缩相媲美。
There are two more JSON compression algorithms: CJson & HPack
HPack does a very good job, comparable to gzip compression.
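The core idea behind HPack-style JSON compression (this is not the actual library, just a sketch of the homogeneous-collection transform it is based on, with made-up records) is to send the shared key list once and then only the values for each record:

```python
import json

def hpack_like(records: list) -> list:
    # Emit the shared key list once, then one row of values per record.
    keys = list(records[0])
    return [keys] + [[r[k] for k in keys] for r in records]

def hunpack_like(packed: list) -> list:
    keys, *rows = packed
    return [dict(zip(keys, row)) for row in rows]

records = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.2},
    {"id": 3, "name": "carol", "score": 8.8},
]
packed = hpack_like(records)
assert hunpack_like(packed) == records
# The repeated key names are gone, so the JSON text is shorter:
assert len(json.dumps(packed)) < len(json.dumps(records))
```

The win grows with the number of records, since the key names are paid for only once.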
Here is a short test on the compressibility of JSON data
original: crime-data_geojson.json, 72,844 bytes
(You can get the file here: https://github.com/lsauer/Data-Hub . The file was picked at random, but is not necessarily representative of average JSON data.)
Except for zip, all archiver parameters were set to ultra.
This means that compression is very high and beneficial. JSON data generally has a fairly high entropy: according to Wikipedia, English text carries roughly 0.6–1.3 bits of information per character, and the entropy of JSON data is often well above that. (In an experiment with 10 arbitrary JSON files of roughly equal size, I calculated 2.36 bits per character.)
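A figure like the 2.36 above can be reproduced with a zeroth-order Shannon-entropy calculation over a file's bytes (a sketch; the sample string here is made up rather than taken from the crime-data file):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    # Zeroth-order Shannon entropy of the byte distribution, in bits/byte.
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = b'{"type": "Feature", "geometry": {"type": "Point"}}' * 100
h = entropy_bits_per_byte(sample)
# ASCII JSON uses far fewer than 8 bits of each byte, hence the headroom.
assert 0.0 < h < 8.0
```

Anything below 8 bits per byte is headroom that a general-purpose compressor can reclaim.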
Ummm...Correct me if I'm wrong, but if you are implementing on-the-wire compression, then you control both ends of the connection, right? In that case, if JSON is too fat a protocol, why wouldn't you just choose a different wire protocol that isn't as fat? I mean, I understand the appeal of using a standard like JSON, but if you are concerned about bandwidth, then you probably ought to pick a wire protocol that isn't all text.
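To make the "not all text" point concrete, here is a sketch comparing JSON to a hypothetical fixed binary layout agreed on by both ends (the field names and the `<Hff` layout are made up for illustration):

```python
import json
import struct

packet = {"deviceId": 7, "temperature": 21.4, "humidity": 0.55}
as_json = json.dumps(packet).encode()

# Hypothetical agreed layout: little-endian uint16 id + two float32 values.
as_binary = struct.pack("<Hff", packet["deviceId"],
                        packet["temperature"], packet["humidity"])
assert len(as_binary) < len(as_json)  # 10 bytes vs. dozens of JSON text

device_id, temp, hum = struct.unpack("<Hff", as_binary)
assert device_id == 7 and abs(temp - 21.4) < 1e-5
```

The trade-off is exactly the one described: you lose JSON's self-describing, standard format but stop paying for field names and decimal text on every packet.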
Let the webserver compress and the browser decompress natively; gzip or deflate.
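In code terms this is just standard `Content-Encoding` negotiation; a sketch of what the two ends do (the header names are the real HTTP ones, the payload is made up):

```python
import gzip
import json

payload = json.dumps(
    {"rows": [{"id": i, "name": f"user{i}"} for i in range(100)]}
).encode()

# Server side, when the client sent "Accept-Encoding: gzip":
body = gzip.compress(payload, compresslevel=6)
headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}

# Browser side: decompression happens transparently before your JS sees it.
assert gzip.decompress(body) == payload
assert len(body) < len(payload)
```

Most web servers and browsers do all of this for you once compression is enabled, which is why it is usually the first thing to try.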
I have found that the compression algorithm tends to be more effective than choosing an alternative format. If this is 'real-time' compression, I would recommend investigating Brotli or Zstandard at a lower compression level (the high levels take a lot of CPU, but do give very good compression).
If you want to read about all the alternatives and how I came to that conclusion, the full details can be found on the Lucidchart techblog.
Gzip (the deflate algorithm) is pretty good at compression, although like all good compression algorithms it uses plenty of CPU (3–5× the overhead of JSON reading/writing in my testing).