Data matching algorithm
I have a project where I am testing a device that is very sensitive to noise (electromagnetic, radio, etc.). Based on a given input (audio), the device generates 5-6 bytes per second of binary data (it looks like gibberish to an untrained eye).
Depending on the noise, the device sometimes misses characters, sometimes inserts random characters, and sometimes does several of both.
I have written an app that lets the user see, on the fly, the errors the device generates compared to the master file (i.e. what the device should output under ideal conditions). My algorithm basically takes each byte of the live data and compares it to the byte at the same position in the known master file. If the bytes don't match, I search a window of 10 characters in both directions from the current position for a nearby match. If one matches (plus a validation or two), I visually mark the location in the UI and register an error.
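A minimal Python sketch of the windowed resync approach described above, under stated assumptions: `compare_windowed`, `agree`, `WINDOW` and `CONFIRM` are hypothetical names; "10 characters both ways" is read as searching up to 10 bytes ahead in either stream (the master running ahead means the device dropped bytes, the live stream running ahead means it inserted bytes); and the "validation or two" is assumed to be `CONFIRM` consecutive matching bytes after the resync point.

```python
from typing import List, Tuple

WINDOW = 10   # search radius in bytes, from the question
CONFIRM = 3   # hypothetical: bytes that must agree before a resync is accepted


def agree(a: bytes, i: int, b: bytes, j: int, n: int) -> bool:
    """True if a[i:i+n] == b[j:j+n] with both slices fully inside their buffers."""
    return i + n <= len(a) and j + n <= len(b) and a[i:i + n] == b[j:j + n]


def compare_windowed(live: bytes, master: bytes) -> List[Tuple[str, int, int, int]]:
    """Walk live and master in lockstep. On a mismatch, look up to WINDOW bytes
    ahead in each stream for a point where CONFIRM bytes agree again.
    Returns (kind, live_pos, master_pos, count) records."""
    errors = []
    i = j = 0
    while i < len(live) and j < len(master):
        if live[i] == master[j]:
            i += 1
            j += 1
            continue
        for s in range(1, WINDOW + 1):
            if agree(live, i, master, j + s, CONFIRM):
                errors.append(("dropped", i, j, s))    # device skipped s master bytes
                j += s
                break
            if agree(live, i + s, master, j, CONFIRM):
                errors.append(("inserted", i, j, s))   # device emitted s extra bytes
                i += s
                break
        else:
            errors.append(("substituted", i, j, 1))    # no resync found: count one bad byte
            i += 1
            j += 1
    return errors
```

For example, `compare_windowed(b"ABCEFGHIJKLMNOP", b"ABCDEFGHIJKLMNOP")` reports a single dropped byte at live position 3. Each mismatch costs up to `2 * WINDOW` slice comparisons, and the first resync found wins, so dense bursts of mixed errors can be misclassified; bytes left over after one stream ends are not reported in this sketch.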
This approach works reasonably well and, given the speed of the incoming data, it also works in real time. However, I feel that what I am doing is not optimal, and that the approach would fall apart if the data streamed at higher rates.
Are there other approaches I could take? Are there known algorithms for this type of thing?
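One well-known family of techniques for insert/drop/substitute errors like these is edit-distance (Levenshtein) alignment, a dynamic program closely related to the longest-common-subsequence algorithms used by diff tools. Restricting the computation to a diagonal band (here the band width is assumed to mirror the 10-byte window) keeps the work at roughly O(n · band) instead of O(n · m). The sketch below is a hypothetical `banded_align` that aligns a finished chunk of live data against the master and reports the edits on one minimum-cost path; for brevity it stores the full matrix, whereas a streaming version would keep only the band in memory.

```python
from typing import List, Tuple

INF = float("inf")


def banded_align(live: bytes, master: bytes, band: int = 10) -> List[Tuple[str, int, int]]:
    """Minimum-edit alignment of live against master, restricted to |i - j| <= band.
    Returns (kind, live_pos, master_pos) for every edit on one optimal path."""
    n, m = len(live), len(master)
    if abs(n - m) > band:
        raise ValueError("length difference exceeds the band")
    # d[i][j] = fewest edits turning master[:j] into live[:i]; cells outside the band stay INF
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if i == 0 and j == 0:
                continue
            best = INF
            if i > 0 and j > 0:
                cost = 0 if live[i - 1] == master[j - 1] else 1
                best = d[i - 1][j - 1] + cost          # match or substitution
            if i > 0:
                best = min(best, d[i - 1][j] + 1)      # extra byte in live (insertion)
            if j > 0:
                best = min(best, d[i][j - 1] + 1)      # master byte missing (drop)
            d[i][j] = best

    # Trace back from the bottom-right corner, preferring diagonal moves.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if live[i - 1] == master[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                if cost:
                    ops.append(("substituted", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("inserted", i - 1, j))
            i -= 1
        else:
            ops.append(("dropped", i, j - 1))
            j -= 1
    ops.reverse()
    return ops
```

Unlike a greedy window search, the alignment minimizes the total number of edits over the whole chunk, so a burst that mixes insertions and drops is explained consistently; when several alignments tie, the traceback simply picks one of them.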
I read many years ago that NASA's data collection systems (e.g. those that communicate with spacecraft and with vehicles on the Moon and Mars) have had a data loss of only 0.00001% despite tremendous interference in space.
Any ideas?
Comment
I presume the main interest is the signal generated by the device? What matters more: detecting when an error has occurred, or making the signal 'robust' against such errors? I've been doing a lot of signal processing lately, and denoising signals is part of my routine: I'm basically trying to estimate the real signal and remove any contaminants.
I don't know how the signal generated by the device is used further... if it's being recorded to a computer, you can easily apply some denoising; try wavelet denoising, for instance. You will find packages for this in several languages of your choice.
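To illustrate the idea behind wavelet denoising without any package, here is a minimal sketch using the Haar wavelet: split the signal into pairwise averages (approximation) and differences (detail), shrink the small detail coefficients toward zero with soft thresholding, and reconstruct. `haar_denoise` and `soft` are hypothetical names, the threshold is assumed to be chosen by hand, and the signal length is assumed to be divisible by `2 ** levels`; a real application would use one of the packages mentioned (for example PyWavelets in Python) with a better wavelet and a principled threshold.

```python
import math
from typing import List


def soft(x: float, t: float) -> float:
    """Soft thresholding: shrink x toward zero by t, clamping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def haar_denoise(signal: List[float], threshold: float, levels: int = 1) -> List[float]:
    """Denoise by soft-thresholding Haar detail coefficients at each level.
    Assumes len(signal) is divisible by 2 ** levels."""
    if levels == 0:
        return list(signal)
    if len(signal) % 2:
        raise ValueError("length must be divisible by 2 ** levels")
    r = math.sqrt(2.0)
    # Forward transform: scaled pairwise sums (approximation) and differences (detail).
    approx = [(signal[k] + signal[k + 1]) / r for k in range(0, len(signal), 2)]
    detail = [soft((signal[k] - signal[k + 1]) / r, threshold)
              for k in range(0, len(signal), 2)]
    # Recurse on the coarse part; the coarsest approximation is left untouched.
    approx = haar_denoise(approx, threshold, levels - 1)
    # Inverse transform: rebuild each pair from its approximation and detail.
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / r, (a - d) / r))
    return out
```

With a threshold of zero the signal is reconstructed unchanged; with a very large threshold every detail is removed and each block of `2 ** levels` samples collapses to its mean, so practical thresholds sit in between, large enough to remove noise-sized differences but small enough to keep real transitions.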