如何更有效地将有序树编码为比特序列

发布于 2025-01-03 23:05:49 字数 201 浏览 1 评论 0原文

假设我们要保存一个有 n 个节点的有序树的形状,每个节点最多有 2 个子节点。 如果是二叉树,我们必须使用2n位。由于在我们的情况下,我们没有左孩子或右孩子,它们是相同的,所以我们必须有一些冗余序列。 那么,我们可以用更好的方式对其进行编码吗?看起来每个节点仍然有3种情况,没有子节点,1个子节点,2个子节点,但是我们可以将其存储在少于2位的内存中吗?或者总共有一个比 2 更好的常数?

Suppose we want to save the shape of an ordered tree of n node, each node has maximal 2 children.
If it is a binary tree, we must use 2n bits. Since in our situation, we don't have left or right child, they are the same, so we must have some redundant sequences.
So, can we encode it in a better way? It seems each node still have 3 cases, no child, one child, two children, but can we store it in less than 2 bits? Or in total have a better constant than 2?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

も星光 2025-01-10 23:05:49

您也许可以存储您提到的 2n 位,然后使用 huffman 编码 或其他< href="http://en.wikipedia.org/wiki/Lossless_data_compression" rel="nofollow">无损数据压缩技术来压缩此数据。

我认为你无法实现更好的最坏情况限制,但在平均情况下 - 它应该为你节省一些空间。

You might be able to store the 2n bits as you mention and then to use huffman coding or other lossless data compression technique to compress this data.

I don't think you can achieve a better worst case bound, but on the average case - it should save you some space.

电影里的梦 2025-01-10 23:05:49

有两种方法可以实现:

  1. 对多级子树进行编码。例如:在最大级别 2 上,您可以有四种形状:()、(a)、(a->b) 和 (a<-b->c)。现在对每种情况使用 0,10,110,111。对于简单的 2 级完整树编码为:111 0 0。3 级完整树为:111,10,10。对于 4 级完整树,这变为: 111 111 111 0 0 0 0 。分配是任意的。您可以使用霍夫曼编码方案(如阿米特提到的)来找到最佳编码。这种编码方案对于链来说更糟糕。对于纯链,您需要 3n-2 位来存储。

  2. 进行 2n 位编码,然后使用任何压缩算法进行压缩。

=== 另一种方法 ===

在每个节点的正常表示中,您可以使用以下三个选项之一:00、01、11。现在,一次获取三个节点。您总共可以有 27 种组合。您可以将每个组合存储在 5 位中。这样,所需的平均存储空间变为 5/3,而不是 2 位。此外,您可以尝试组合任意数量的节点。压缩率请参见下表:

如您所见,如果将 10 个节点组合在一起,存储空间将减少 1.25 倍(即空间减少 20%)

naive_length compr_length compr_factor
2 2 1.0
4 4 1.0
6 5 1.2
8 7 1.14285714286
10 8 1.25
12 10 1.2
14 12 1.16666666667
16 13 1.23076923077
18 15 1.2
20 16 1.25
22 18 1.22222222222
24 20 1.2
26 21 1.2380952381
28 23 1.21739130435
30 24 1.25
32 26 1.23076923077
34 27 1.25925925926

There are two ways to approach it:

  1. Encode multi-level sub-trees. Eg: at maximum level two you can have four shapes: (), (a), (a->b) and (a<-b->c). Now use 0,10,110,111 for each of these cases. For a simple 2 level complete tree coding is: 111 0 0. A 3 level complete tree is: 111,10,10. For 4 level complete tree this becomes: 111 111 111 0 0 0 0 . The assignments are arbitrary. You can use hoffman coding scheme (as amit mentioned) to find the optimal encoding. This encoding scheme is worse for chains. For a pure chain you need 3n-2 bits to store.

  2. Do the 2n bit coding and then compress with any compression algorithms.

=== Another Approach ===

In the normal representation for every node you can have one of these three options: 00, 01, 11. Now, take three nodes at a time. You can have total 27 combinations. You can store each of those combinations in 5 bits. This way the average storage needed becomes 5/3 instead on 2 bits. Further, you can try to combine any number of nodes you like. See the following table for compression rates:

As you can see, if you combine 10 nodes together you reduce the storage space by a factor of 1.25 (i.e. space reduced by 20%)

naive_length compr_length compr_factor
2 2 1.0
4 4 1.0
6 5 1.2
8 7 1.14285714286
10 8 1.25
12 10 1.2
14 12 1.16666666667
16 13 1.23076923077
18 15 1.2
20 16 1.25
22 18 1.22222222222
24 20 1.2
26 21 1.2380952381
28 23 1.21739130435
30 24 1.25
32 26 1.23076923077
34 27 1.25925925926
狠疯拽 2025-01-10 23:05:49

如果我理解正确的话,如果一个节点有一个子节点,那么它不是左子节点或右子节点,即在有一个子节点的情况下不区分左子节点和右子节点。然后我认为它可以在 (log 3) n 中完成,其中 log 以 2 为底。

您通过先序遍历来描述树,为每个节点写入子节点的数量(0、1 或 2)。这会创建一个长度为 n 的以 3 为基数的数字(实际上长度为 n - 1,最后一个节点始终有零个子节点)。正好有 3^(n - 1) 个这样的数字,这可以用二进制编码为 (log 3) (n - 1) ~= 1.59 (n - 1)。

可以使用 O(log n) 位来写入编码位串开头的位数。

更新:这是一个实现

If I understand correctly, if a node has one child, it's not a left or right child, i.e. you don't distinguish left and right in case of one child. Then I think it can be done in (log 3) n where the log is base 2.

You describe the tree by a preorder traversal, for every node you write the number of children (0, 1 or 2). This creates a number of base 3 of length n (in fact length n - 1, the last node will always have zero children). There are exactly 3^(n - 1) such numbers, this can be encoded in binary in (log 3) (n - 1) ~= 1.59 (n - 1).

One can use O(log n) bits to write the number of bits in the beginning of the encoded bitstring.

Update: here is an implementation.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文