冗余编码?
这更像是一个计算机科学/信息论问题,而不是一个简单的编程问题,所以如果有人知道更好的网站来发布这个问题,请告诉我。
假设我有一个 N 位数据,将在 M 条消息中冗余发送,其中至少 M-1 条消息将被成功接收。我对以每条消息更少的位数编码 N 位数据的不同方式感兴趣。 (这类似于 RAID,但级别要小得多,其中 N = 8 或 16 或 32 )
示例:假设 N = 16,M = 4。那么我可以使用以下算法:
1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15
如果我可以保证 4 条消息中的 3 条能够通过,那么每组中至少有一条消息能够通过。因此我可以用 9 位或更少的位来完成这项工作,可能有一种方法可以用更少的总位来完成这项工作,但我不确定如何做。
是否有一些简单的编码/解码算法可以完成这种事情? 这个问题有名字吗?(如果我知道它叫什么,我可以用谷歌搜索它!)
注意:在我的特定情况下,消息要么正确到达,要么不正确完全到达(没有消息到达时有错误)。
(编辑:将第二部分移至单独的问题)
This is more of a computer science / information theory question than a straightforward programming one, so if anyone knows of a better site to post this, please let me know.
Let's say I have an N-bit piece of data that will be sent redundantly in M messages, where at least M-1 of those messages will be received successfully. I am interested in different ways of encoding the N-bit piece of data in fewer bits per message. (this is similar to RAID but at a much smaller level, where N = 8 or 16 or 32)
Example: suppose N = 16 and M = 4. Then I could use the following algorithm:
1st and 3rd message: send "0" + bits 0-7
2nd and 4th message: send "1" + bits 8-15
If I can guarantee that 3 messages of the 4 will get through, then at least one message from each group will get through. Thus I can make this work with 9 bits or less, there's probably a way to do this with fewer total bits but I'm not sure how.
Are there some simple encoding/decoding algorithms to do this kind of thing? Does this problem have a name? (if I know what it's called, I can google it!)
note: in my particular case, the messages either arrive correctly or do not arrive at all (no messages arrive with errors).
(edit: moved 2nd part to a separate question)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
(以下是不完整的答案。我可能会在稍后添加更多内容。)
您可能感兴趣的术语是信道编码:向源添加冗余,以使其在噪声信道上传输期间保持鲁棒性。在信息论中,信道编码的补充问题是源编码:减少源中的冗余以使用更少的比特来表示它。 (这两个问题的组合称为联合源信道编码。)
您的第一个问题要求找到信道代码。您给出的简单示例类似于重复代码,即您发送同一消息两次以上(通常是奇数次),然后最常收到的消息被接受为原始消息。
这段代码效率低下。为了使用标准符号,令 k = 原始消息中的位数,n = 传输消息中的位数。对于您的示例,k = 16 且 n = 36。编码效率的衡量标准是 k/n,其中越高意味着效率越高。在您的情况下,k/n = 0.44。这是低的。
重复码是一种简单的块码,即,向每个k位块添加冗余以创建n位码字。正如其他人提到的,汉明和里德-所罗门代码也是如此。通过一些基本的线性代数,汉明码相对容易理解。
这些术语应该足够您自行搜索。祝你好运。
(Incomplete answer follows. I may add more later.)
The term you may be interested in is channel coding: adding redundancy to a source in order to make it robust during transmission over a noisy channel. In information theory, the complementary problem to channel coding is source coding: reducing the redundancy in a source to represent it using fewer bits. (The combination of these two problems is called joint source-channel coding.)
Your first question asks to find a channel code. The simple example you give is similar to a repetition code, i.e., you send the same message more than twice (usually an odd number of times), and then the message which is received most often is accepted as the original message.
This code is inefficient. To use standard notation, let k = number of bits in original message, and n = number of bits in the transmitted message. For your example, k = 16 and n = 36. A measure of coding efficiency is k/n, where higher means more efficient. In your case, k/n = 0.44. This is low.
The repetition code is a simple kind of block code, i.e., redundancy is added to each block of k bits to create a codeword of n bits. So are the Hamming and Reed-Solomon codes as others mentioned. Hamming codes are relatively easy to understand with some basic linear algebra.
These should be enough terms for you to search on your own. Good luck.
我不确定我是否正确理解了您问题的所有细节,但您的问题肯定是设计某种 纠错代码。这是计算机科学的一个广阔领域,关于它的著作已经有厚厚的著作。从维基百科开始,看看您是否可以获得任何简单的方案(例如汉明或里德所罗门代码)来适用于您的情况。
如果您不仅想处理符号损坏,还想删除符号,您应该查看纠删码< /a>,这绝对是一个更困难的任务,但在很多情况下都存在好的方法。
编辑:来自 hackersdelight.org 的材料似乎是一个很好的介绍。
I'm not sure if I understood all the details of your question correctly, but your problem is definitely aboud designing some kind of error correcting code. This is a vast area of computer science and thick tomes have been written about it. Start with wikipedia and see if you can get any simple schemes (like Hamming or Reed-Solomon codes) to work in your case.
If you want to deal not only with symbol corruption, but also deletion of symbols, you should look at erasure codes, this is definitely a more difficult task but good methods exist in many cases.
EDIT: This material from hackersdelight.org seems a nice introduction.
请参阅纠删码。
See erasure codes.
您正在寻找数据包擦除代码。只有两种有用的数据包擦除代码不完全受到专利的阻碍,并且只有一个开源库来实现这些代码。在这里找到它:http://planete-bcast.inrialpes.fr/rubrique。 php3?id_rubrique=5
You're looking for a packet erasure code. There are only two useful packet erasure codes that are not totally encumbered by patents, and there's only one open-source library to implement those. Find it here: http://planete-bcast.inrialpes.fr/rubrique.php3?id_rubrique=5
这是一个非常简单的方案,其效率几乎是您的示例的两倍。
您将消息切成 (N/M)*2 位的块。相反,将其切成 N/(M-1) 位块。 (如有必要,将其四舍五入。)第一个块
src[0]
本身进行编码:enc[0]=src[0]
。最后一个块也是如此:enc[M-1]=src[M-1]
。其他每个块都与其左邻居进行异或运算:enc[i]=src[i-1]^src[i]。基本上就像您所做的那样,为每个编码块添加一个 log(M) 位序列号作为前缀,以便接收器可以知道哪个被丢弃。 (如果您可以确定到达的块将按顺序到达,则可以使用 1 位序列号。只需交替使用 0 和 1。)
要解码,请依次从左侧和右侧进行异或,直到遇到丢弃的块。例如
src[1] == enc[0]^enc[1]
。 (删除一个端点块不是特殊情况 - 例如,如果第一个块被删除,则右侧的扫描将恢复它,并且左侧的扫描的长度为 0。)Here's a trivially simple scheme that's almost twice as efficient as your example.
You chopped the message into blocks of (N/M)*2 bits. Instead, chop it into N/(M-1)-bit blocks. (Round it up if necessary.) The first block,
src[0]
, encodes as itself:enc[0]=src[0]
. The same for the last block:enc[M-1]=src[M-1]
. Each of the other blocks gets XORed with its left neighbor:enc[i]=src[i-1]^src[i]
.Prefix each encoded block with a log(M)-bit sequence number, essentially as you did, so the receiver can tell which was dropped. (If you can be sure that whichever blocks arrive will arrive in order, then a 1-bit sequence number will do. Just alternate 0 and 1.)
To decode, successively XOR from the left and the right until you hit the dropped block. E.g.
src[1] == enc[0]^enc[1]
. (Dropping one of the endpoint blocks isn't a special case -- e.g. if the first block is dropped, the scan from the right recovers it, and the scan from the left is of length 0.)