Compression algorithm for JSON-encoded packets?
What would be the best compression algorithm to use to compress packets before sending them over the wire? The packets are encoded using JSON. Would LZW be a good one for this or is there something better?
Comments (7)
I think two questions will affect your answer:
1) How well can you predict the composition of the data without knowing what will happen on any particular run of the program? For instance, if your packets look like this:
-- then you would probably get your best compression by creating a hard-coded dictionary of the text strings that keep showing up in your data, and replacing each occurrence of one of those strings with the appropriate dictionary index. (Actually, if your data were this regular, you'd probably want to send just the values over the wire and simply write a function into the client to construct a JSON object from the values if a JSON object is needed.)
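A minimal sketch of the hard-coded-dictionary idea, in Python. The dictionary entries and the sample packet here are hypothetical placeholders; in practice you would harvest the recurring strings from your real traffic:

```python
import json

# Hypothetical hard-coded dictionary of strings that recur in every packet.
DICTIONARY = ["pressure", "temperature", "humidity", "deviceId"]
INDEX = {s: i for i, s in enumerate(DICTIONARY)}

def shrink(packet: dict) -> list:
    # Replace each known key with its dictionary index; keep values as-is.
    return [[INDEX.get(k, k), v] for k, v in packet.items()]

def expand(pairs: list) -> dict:
    # Reverse the substitution on the receiving end.
    return {DICTIONARY[k] if isinstance(k, int) else k: v for k, v in pairs}

packet = {"deviceId": "sensor-7", "temperature": 21.4, "humidity": 0.55}
wire = json.dumps(shrink(packet))
assert expand(json.loads(wire)) == packet      # lossless round trip
assert len(wire) < len(json.dumps(packet))     # shorter on the wire
```

Both ends must agree on the same dictionary, which is exactly the "predictable composition" condition above.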
If you cannot predict which headers will be used, you may need to use LZW, LZ77, or another method that looks at the data that has already gone through the stream and finds patterns it can express in an especially compact form. However...
2) Do the packets need to be compressed separately from each other? If so then LZW is definitely not the method you want; it will not have time to build its dictionary up to a size that will give substantial compression results by the end of a single packet. The only chance of getting really substantial compression in this scenario, IMHO, is to use a hard-coded dictionary.
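One practical way to get a hard-coded dictionary without writing your own coder is zlib's preset-dictionary support, which seeds the compressor's window before the first byte of the packet. A sketch, with a made-up dictionary and packet:

```python
import json
import zlib

# Hypothetical preset dictionary seeded with strings expected in every
# packet; in practice, build it from samples of your real traffic.
ZDICT = b'{"deviceId": "temperature": "humidity": "sensor-'

def compress_packet(packet: dict) -> bytes:
    c = zlib.compressobj(level=9, zdict=ZDICT)
    return c.compress(json.dumps(packet).encode()) + c.flush()

def decompress_packet(blob: bytes) -> dict:
    # The receiver must use the exact same preset dictionary.
    d = zlib.decompressobj(zdict=ZDICT)
    return json.loads(d.decompress(blob) + d.flush())

packet = {"deviceId": "sensor-7", "temperature": 21.4, "humidity": 0.55}
blob = compress_packet(packet)
assert decompress_packet(blob) == packet
```

Because the dictionary is pre-agreed, even a single small packet gets back-references from its very first byte, which is what plain per-packet LZW cannot do.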
(Addendum to all of the above: as Michael Kohne points out, sending JSON means you're probably sending all text, which means that you're underusing bandwidth that has the capability of sending a much wider range of characters than you're using. However, the problem of how to pack characters that fall into the range 0-127 into containers that hold values 0-255 is fairly simple and I think can be left as "an exercise for the reader", as they say.)
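The "exercise for the reader" can be sketched directly: pack the low 7 bits of each ASCII character into a continuous bit stream, saving one bit per character (the message string is made up):

```python
def pack7(text: str) -> bytes:
    # Concatenate the low 7 bits of each ASCII character into a bit stream.
    acc = bits = 0
    out = bytearray()
    for ch in text.encode("ascii"):
        acc = (acc << 7) | ch
        bits += 7
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
    if bits:
        out.append((acc << (8 - bits)) & 0xFF)  # pad the final byte
    return bytes(out)

def unpack7(data: bytes, n_chars: int) -> str:
    # The receiver needs the character count to ignore the padding bits.
    acc = bits = 0
    chars = []
    for b in data:
        acc = (acc << 8) | b
        bits += 8
        while bits >= 7 and len(chars) < n_chars:
            bits -= 7
            chars.append(chr((acc >> bits) & 0x7F))
    return "".join(chars)

msg = '{"temperature": 21.4}'
packed = pack7(msg)
assert unpack7(packed, len(msg)) == msg
assert len(packed) < len(msg)  # ~7/8 of the original size
```

This only buys a fixed 12.5%, of course; a real compressor or a binary format gains far more.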
还有两种 JSON 压缩算法: CJson 和 CJson 。 惠普包
HPPack 的压缩效果非常好,可以与 gzip 压缩相媲美。
There are two more JSON compression algorithms: CJson & HPack
HPack does a very good job, comparable to gzip compression.
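The core idea behind HPack-style JSON compression (this is not the actual library, just a sketch of the homogeneous-collection transform it is based on, with made-up records) is to send the shared key list once and then only the values for each record:

```python
import json

def hpack_like(records: list) -> list:
    # Emit the shared key list once, then one row of values per record.
    keys = list(records[0])
    return [keys] + [[r[k] for k in keys] for r in records]

def hunpack_like(packed: list) -> list:
    keys, *rows = packed
    return [dict(zip(keys, row)) for row in rows]

records = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.2},
    {"id": 3, "name": "carol", "score": 8.8},
]
packed = hpack_like(records)
assert hunpack_like(packed) == records
# The repeated key names are gone, so the JSON text is shorter:
assert len(json.dumps(packed)) < len(json.dumps(records))
```

The win grows with the number of records, since the key names are paid for only once.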
Here is a short test on the compressibility of JSON data
original: crime-data_geojson.json, 72,844 bytes
(You can get the file here: https://github.com/lsauer/Data-Hub . The file was picked at random, but is not necessarily representative of average JSON data.)
Except for zip, all archiver parameters were set to ultra.
This means that compression is very high and beneficial. JSON data generally has a fairly high entropy: according to Wikipedia, English text carries roughly 0.6–1.3 bits of information per character, and the entropy of JSON data is often well above that. (In an experiment with 10 arbitrary JSON files of roughly equal size, I calculated 2.36 bits per character.)
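A figure like the 2.36 above can be reproduced with a zeroth-order Shannon-entropy calculation over a file's bytes (a sketch; the sample string here is made up rather than taken from the crime-data file):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    # Zeroth-order Shannon entropy of the byte distribution, in bits/byte.
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

sample = b'{"type": "Feature", "geometry": {"type": "Point"}}' * 100
h = entropy_bits_per_byte(sample)
# ASCII JSON uses far fewer than 8 bits of each byte, hence the headroom.
assert 0.0 < h < 8.0
```

Anything below 8 bits per byte is headroom that a general-purpose compressor can reclaim.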
Ummm...Correct me if I'm wrong, but if you are implementing on-the-wire compression, then you control both ends of the connection, right? In that case, if JSON is too fat a protocol, why wouldn't you just choose a different wire protocol that isn't as fat? I mean, I understand the appeal of using a standard like JSON, but if you are concerned about bandwidth, then you probably ought to pick a wire protocol that isn't all text.
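To make the "not all text" point concrete, here is a sketch comparing JSON to a hypothetical fixed binary layout agreed on by both ends (the field names and the `<Hff` layout are made up for illustration):

```python
import json
import struct

packet = {"deviceId": 7, "temperature": 21.4, "humidity": 0.55}
as_json = json.dumps(packet).encode()

# Hypothetical agreed layout: little-endian uint16 id + two float32 values.
as_binary = struct.pack("<Hff", packet["deviceId"],
                        packet["temperature"], packet["humidity"])
assert len(as_binary) < len(as_json)  # 10 bytes vs. dozens of JSON text

device_id, temp, hum = struct.unpack("<Hff", as_binary)
assert device_id == 7 and abs(temp - 21.4) < 1e-5
```

The trade-off is exactly the one described: you lose JSON's self-describing, standard format but stop paying for field names and decimal text on every packet.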
Let the webserver compress and the browser decompress natively; gzip or deflate.
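In code terms this is just standard `Content-Encoding` negotiation; a sketch of what the two ends do (the header names are the real HTTP ones, the payload is made up):

```python
import gzip
import json

payload = json.dumps(
    {"rows": [{"id": i, "name": f"user{i}"} for i in range(100)]}
).encode()

# Server side, when the client sent "Accept-Encoding: gzip":
body = gzip.compress(payload, compresslevel=6)
headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}

# Browser side: decompression happens transparently before your JS sees it.
assert gzip.decompress(body) == payload
assert len(body) < len(payload)
```

Most web servers and browsers do all of this for you once compression is enabled, which is why it is usually the first thing to try.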
I have found that the compression algorithm tends to be more effective than choosing an alternative format. If this is 'real-time' compression, I would recommend investigating Brotli or Zstandard at a lower compression level (the high levels take a lot of CPU, but do give very good compression).
If you want to read about all the alternatives and how I came to that conclusion, the full details can be found on the Lucidchart techblog.
Gzip (the deflate algorithm) is pretty good at compression, although like all good compression algorithms it uses plenty of CPU (3–5× the overhead of JSON reading/writing in my testing).