提取结构/压缩序列的好算法有哪些?
从序列中提取层次结构有哪些好的算法?
我主要关心的是压缩序列,并且序列具有一些层次结构。我不太担心算法的运行时间,尽管序列的长度最多为 256k 个符号,并且运行时间不应超过几秒。
到目前为止,我知道 sequitur 算法,并且我想知道任何其他算法可能同样有用的算法/想法。
编辑:解压需要非常简单。
EDIT2:我正在压缩代码。我已经将一个相当大的函数详细阐述为一个巨大的基本代码块,在某些大小上该代码块的运行速度比原始递归函数更快,但随着我改变参数,代码会变得笨拙且庞大。我一直在尝试使用 sequitur 来压缩完全详细的函数,并且效果很好——它使我能够在递归函数和完全详细的基本块之间实现一些中间立场。我现在想知道是否还应该尝试其他算法。
What are some good algorithms for extracting hierarchical structure from sequences?
My primary concern is compressing the sequence, and the sequence has some hierarchical structure to it. I'm not too worried about runtime of the algorithm, though the length of the sequence is up to 256k symbols, and it shouldn't run longer than a few seconds.
So far I'm aware of the sequitur algorithm, and I'd like to know of any other algorithms/ideas that could be similarly useful.
EDIT: The decompression needs to be very simple.
EDIT2: I am compressing code. I have elaborated a rather large function into a huge basic block of code that runs faster than the original recursive function for some sizes, but then the code grows to be unwieldy and large as I vary a parameter. I have been experimenting with sequitur to compress the fully elaborated function, and it works well -- it allows me to achieve some middle ground between the recursive function and the fully elaborated basic block. I'm now wondering if there are other algorithms I should try as well.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
LZ77 和 LZ78 以及 Burrows-Wheeler 变换 是一个很好的开始方式。前两个可以很好地处理流数据,并且可以非常快速地实现。 LZ78的纯字典风格非常适合提取层次结构。
如果您不太关心快速压缩而只想要结构,那么 sequitur 算法将很难被击败 - AFAICT,它是同类中最好的。
LZ77 and LZ78 and the Burrows-Wheeler Transform are a good way to start. The first two work well with streamed data and can have very fast implementations. The pure dictionary style of LZ78 is well suited to extracting hierarchical structures.
If you were less concerned about fast compression and just wanted the structure, the sequitur algorithm will be hard to beat -- AFAICT, it is the best in class.