您可以使用哪些技术在有损单向通道上对数据进行编码?
想象一下,您有一个固有有损且单向的通信渠道。 也就是说,存在一些无法消除的固有噪声,这些噪声会导致随机位被切换。 还可以想象这是一种方式 - 您不能请求重传。
但无论如何你都需要通过它发送数据。 您可以使用哪些技术通过该渠道发送号码和文本?
是否可以对数字进行编码,以便即使随机位旋转,它们仍然可以被解释为接近原始值(有损传输)?
有没有办法以无损方式发送字符串(例如 ASCII)?
这只是为了好玩。 我知道您可以使用莫尔斯电码或任何频率非常低的二进制通信。 我了解奇偶校验位和校验和来检测错误并重试。 我知道您不妨使用模拟信号。 我只是好奇是否有任何有趣的计算机科学技术可以通过有损通道发送这些东西。
Imagine you have a channel of communication that is inherently lossy and one-way. That is, there is some inherent noise that is impossible to remove that causes, say, random bits to be toggled. Also imagine that it is one way - you cannot request retransmission.
But you need to send data over it regardless. What techniques can you use to send numbers and text over that channel?
Is it possible to encode numbers so that even with random bit twiddling they can still be interpreted as values close to the original (lossy transmittion)?
Is there a way to send a string of characters (ASCII, say) in a lossless fashion?
This is just for fun. I know you can use morse code or any very low frequency binary communication. I know about parity bits and checksums to detect errors and retrying. I know that you might as well use an analog signal. I'm just curious if there are any interesting computer-sciency techniques to send this stuff over a lossy channel.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
根据您未提供的有关有损通道的一些详细信息,我建议首先使用 格雷码 确保单比特错误导致较小的差异(以满足您在有损传输中减少丢失的愿望),然后可能还使用一些“无损”对结果流进行编码(==尝试 无损;-) 编码。
如果您的噪音片段是容易发生小突发(例如,单个字节内的几个位错误),这应该与格雷编码良好地互操作(因为多位错误是格雷的“丢失减轻”方面的杀手,旨在优雅地降级线路上的单位错误)。 这是因为 RS 本质上是一种块方案,从 RS 的角度来看,一个块内的多个错误基本上与其中的单个错误相同;-)。
如果许多错误都是删除,那么 RS 就特别棒了——简单地说,就是删除是一个很可能在传输过程中被破坏的符号,但您确实知道它已被破坏的关键事实。 物理层,取决于它的设计方式,通常可以暗示这一事实,如果有办法让它通知更高层,那可能会提供至关重要的帮助。 让我稍微解释一下擦除...:
举一个简单的例子,0 以 -1 伏的电平发送,1 以 +1 伏的电平发送(相对于一些参考波),但是有噪声(物理噪声通常可以很好地建模,请询问任何有能力的通信工程师;-); 根据噪声模型,解码可能是 -0.7 V 及以下的任何内容都被视为 0 位,+0.7 V 及以上的任何内容都被视为 1 位,介于两者之间的任何内容都被视为擦除,即,较高层被告知有问题的位可能在传输过程中被损坏,因此应该被忽略。 (我有时将此作为我的论文的一个例子,即有时抽象应该“泄漏”——以一种受控和架构的方式:Spolsky 的 Martelli 推论 泄漏抽象法则!-)。
具有任何给定冗余比的 RS 码在纠正擦除(解码器被告知的错误)方面的效率大约是纠正其他未知错误时的两倍——也可以混合这两个方面,纠正一些擦除和一些其他未知的错误。
最重要的是,可以(相当容易地)设计和定制自定义 RS 代码,以将未纠正错误的概率降低到任何所需阈值 θ 以下,前提是在擦除和未检测到的错误(包括概率和突发性)。
实际上,我不会将整个领域称为“计算机科学”领域:当我毕业时(MSEE,30 年前),我主要试图避免“CS”内容,而倾向于芯片设计、系统设计、高级无线电系统,&c——但是我学到了这些东西(嗯,已经属于实际工程使用领域的子集;-)相当不错。
而且,只是为了确认事情在一代人的时间内没有发生太大变化:我的女儿刚刚获得了电信工程硕士学位(严格专注于先进无线电系统)——她无法设计任何严肃的程序、算法、或数据结构(虽然她在 C 和 Java 的必修课程中表现得很好,但这些课程中绝对没有 CS 深度,她的课程中的其他课程也没有 - 她的日常工作语言是 matlab.. .!-) - 但她对信息和编码理论的了解比我所了解的还要多,而且这还是在任何博士级别的学习之前(她正在攻读博士学位,但尚未开始)。
所以,我声称这些领域更多的是 EE-y,而不是 CS-y(尽管边界当然是模糊的 - 见证这样一个事实:在设计芯片几年后,我或多或少地偶然成为了一名软件人员,并且我的很多同时代人也是如此;-)。
Depending on some details that you don't supply about your lossy channel, I would recommend, first using a Gray code to ensure that single-bit errors result in small differences (to cover your desire for loss mitigation in lossy transmission), and then possibly also encoding the resulting stream with some "lossless" (==tries to be loss-less;-) encoding.
Reed-Solomon and variants thereof are particularly good if your noise episodes are prone to occur in small bursts (several bit mistakes within, say, a single byte), which should interoperate well with Gray coding (since multi-bit mistakes are the killers for the "loss mitigation" aspect of Gray, designed to degrade gracefully for single-bit errors on the wire). That's because R-S is intrinsically a block scheme, and multiple errors within one block are basically the same as a single error in it, from R-S's point of view;-).
R-S is particularly awesome if many of the errors are erasures -- to put it simply, an erasure is a symbol that has most probably been mangled in transmission, BUT for which you DO know the crucial fact that it HAS been mangled. The physical layer, depending on how it's designed, can often have hints about that fact, and if there's a way for it to inform the higher layers, that can be of crucial help. Let me explain erasures a bit...:
Say for a simplified example that a 0 is sent as a level of -1 volt and a 1 is send as a level of +1 volt (wrt some reference wave), but there's noise (physical noise can often be well-modeled, ask any competent communication engineer;-); depending on the noise model the decoding might be that anything -0.7 V and down is considered a 0 bit, anything +0.7 V and up is considered a 1 bit, anything in-between is considered an erasure, i.e., the higher layer is told that the bit in question was probably mangled in transmission and should therefore be disregarded. (I sometimes give this as one example of my thesis that sometimes abstractions SHOULD "leak" -- in a controlled and architected way: the Martelli corollary to Spolsky's Law of Leaky Abstractions!-).
A R-S code with any given redundancy ratio can be about twice as effective at correcting erasures (errors the decoder is told about) as it can be at correcting otherwise-unknown errors -- it's also possible to mix both aspects, correcting both some erasures AND some otherwise-unknown errors.
As the cherry on top, custom R-S codes can be (reasonably easily) designed and tailored to reduce the probability of uncorrected errors to below any required threshold θ given a precise model of the physical channel's characteristics in terms of both erasures and undetected errors (including both probability and burstiness).
I wouldn't call this whole area a "computer-sciency" one, actually: back when I graduated (MSEE, 30 years ago), I was mostly trying to avoid "CS" stuff in favor of chip design, system design, advanced radio systems, &c -- yet I was taught this stuff (well, the subset that was already within the realm of practical engineering use;-) pretty well.
And, just to confirm that things haven't changed all that much in one generation: my daughter just got her MS in telecom engineering (strictly focusing on advanced radio systems) -- she can't design just about any serious program, algorithm, or data structure (though she did just fine in the mandatory courses on C and Java, there was absolutely no CS depth in those courses, nor elsewhere in her curriculum -- her daily working language is matlab...!-) -- yet she knows more about information and coding theory than I ever learned, and that's before any PhD level study (she's staying for her PhD, but that hasn't yet begun).
So, I claim these fields are more EE-y than CS-y (though of course the boundaries are ever fuzzy -- witness the fact that after a few years designing chips I ended up as a SW guy more or less by accident, and so did a lot of my contemporaries;-).
这个问题是编码理论的主题。
This question is the subject of coding theory.
最著名的方法之一可能是使用汉明码。 它可能不是大规模纠正错误的最佳方法,但它非常容易理解。
Probably one of the better-known methods is to use Hamming code. It might not be the best way of correcting errors on large scales, but it's incredibly simple to understand.
Turbo 代码 或 一般数据的低密度奇偶校验码,因为这些最接近香农极限 - 请参阅维基百科。
Either Turbo Codes or Low-density parity-checking codes for general data, because these come closest to approaching the Shannon limit - see wikipedia.
您可以使用 Reed-Solomon 代码。
You can use Reed-Solomon codes.
另请参阅滑动窗口协议(TCP 使用)。
尽管这包括处理数据包重新排序或完全丢失,但这不是问题定义的一部分。
See also the Sliding Window Protocol (which is used by TCP).
Although this includes dealing with packets being re-ordered or lost altogether, which was not part of your problem definition.
正如 Alex Martelli 所说,世界上有很多编码理论,但里德-所罗门码绝对是一个最佳选择。 如果你真的想构建一些东西,Jim Plank 已经写了关于 Reed-Solomon 编码的好教程。 Plank 对编码有着专业的兴趣,并拥有大量的实践专业知识作为后盾。
As Alex Martelli says, there's lots of coding theory in the world, but Reed-Solomon codes are definitely a sweet spot. If you actually want to build something, Jim Plank has written a nice tutorial on Reed-Solomon coding. Plank has a professional interest in coding with a lot of practical expertise to back it up.
我会采纳其中一些建议,然后多次发送相同的数据。 这样,您就可以希望在流中的不同点引入不同的错误,并且您可以更轻松地推断出所需的数字。
I would go for some of these suggestions, followed by multiple sendings of the same data. So that way you can hope for different errors to be introduced at different points in the stream, and you may be able to infer the desired number a lot easier.