如何从亚线性空间/时间中的字符流计算回文?
我什至不知道是否存在解决方案。这是问题的详细内容。您是一个接受无限长字符流的程序(为简单起见,您可以假设字符为 1 或 0)。在任何时候,我都可以停止流(假设在传递了 N 个字符之后)并询问您到目前为止收到的字符串是否是回文。如何使用更少的次线性空间和/或时间来做到这一点?
I don't even know if a solution exists or not. Here is the problem in detail. You are a program that is accepting an infinitely long stream of characters (for simplicity you can assume characters are either 1 or 0). At any point, I can stop the stream (let's say after N characters were passed through) and ask you if the string received so far is a palindrome or not. How can you do this using less sub-linear space and/or time.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
是的。答案大约是向下的三分之二 http://rjlipton.wordpress.com/2011/01/12/stringology-the-real-string-theory/
编辑: 有些人要求我总结结果,如果链接失效。该链接提供了有关以下定理证明的一些详细信息:有一个多磁带图灵机,可以实时识别初始的非平凡回文。(文章也提供了摘要)链接:假设机器读取了输入的x1,x2,...,xk,那么它只有恒定的时间来决定是否x1,x2,...,xk。 em> 是回文。)
多磁带图灵机只是一台具有多个可以读取和写入的并排磁带的机器;在非常具体的意义上,它完全等同于标准图灵机。
实时计算是图灵机必须每 M 步(对于某个有界常数 M)至少从输入中读取一个字符一次的计算。很容易看出,任何实时算法都应该是线性时间的。
有一篇关于证明的论文,大约 10 页,可在机构付费专区此处获取,我不会转发到其他地方。如果您愿意,可以联系作者获取更详细的解释;我最近刚刚读过这篇文章,意识到这或多或少就是您所寻找的。
Yes. The answer is about two-thirds of the way down http://rjlipton.wordpress.com/2011/01/12/stringology-the-real-string-theory/
EDIT: Some people have asked me to summarize the result, in case the link dies. The link gives some details about a proof of the following theorem: There is a multi-tape Turing machine that can recognize initial non-trivial palindromes in real-time. (A summary, also provided by the article linked: Suppose the machine has read x1, x2, ..., xk of the input. Then it has only constant time to decide if x1, x2, ..., xk is a palindrome.)
A multitape Turing machine is just one with several side-by-side tapes that it can read and write to; in a very specific sense it is exactly equivalent to a standard Turing machine.
A real-time computation is one in which a Turing machine must read a character from input at least once every M steps (for some bounded constant M). It is readily seen that any real-time algorithm should be linear-time, then.
There is a paper on the proof which is around 10 pages which is available behind an institutional paywall here which I will not repost elsewhere. You can contact the author for a more detailed explanation if you'd like; I just had read this recently and realized it was more or less what you were looking for.
您可以使用滚动哈希或更多滚动哈希来提高准确性。按照读取的顺序以及读取的相反顺序增量计算到目前为止读取的字符的哈希值。
例如,如果您的哈希函数为
x*3^(k-1)+x*3^(k-2)+...+x*3^0
,其中x< /code> 是您读取的字符,这就是您的操作方式:
显然您必须计算对某物(可能是素数或 2 的幂)取模的哈希值。当然,这可能会导致误报。
You could use a rolling hash, or more rolling hashes for accuracy. Incrementally compute the hash of the characters read so far, in the order they were read, and in reverse order of reading.
If your hash function is
x*3^(k-1)+x*3^(k-2)+...+x*3^0
for example, wherex
is a character you read, this is how you'd do it:Obviously you'd have to calculate the hashes modulo something, probably a prime or a power of two. And of course, this could lead to false positives.
第二轮
据我所知,对于每个新字符,都会出现三种情况:
假设您有一个指针跟踪字符串并指向延续潜在回文的最后一个字符。
(我将使用括号来表示指向的字符)
假设您以 aa(b) 开头并得到一个:
左边并检查它是否是一个“a”(它
是)。你现在有a(a)b。
真正棘手的情况是 2,因为不知怎的,你必须知道你刚刚得到的字符不会影响对称性,它只是延长了中间。为此,您必须持有一个额外的指针来跟踪高原(中间)边缘所在的位置。例如,您有 (b)baabb,而您刚刚获得了另一个 'b',在这种情况下,您必须知道将指针重置到此处中间高原的底部:bbaa(b)bb。由于我们要持续恒定的时间,因此您必须首先将指针放在这里(您没有时间来搜索高原的边缘)。现在,如果您得到另一个“b”,您就知道您仍然处于该高原的边缘,并且您将指针保持在原来的位置,因此 bbaa(b)bb -> bbaa(b)bbb。现在,如果你得到一个“a”,你就知道“b”不是扩展中间的一部分,并且你重置了两个指针(跟踪指针和边缘指针),所以你现在有了 bbaabbbb((a))。
有了这三个案例,我想所有的基础都已经涵盖了。如果您想检查当前字符串是否为回文,请检查第一个指针(不是高原的边缘指针)是否位于索引 0。
Round 2
As I see it, with each new character, there are three cases:
Assume you have a pointer that tracks down the string and points to the last character that continued a potential palindrome.
(I am going to use parenthesis to indicate a pointed at character)
Lets say you are starting with aa(b) and get an:
the left and check if it's an 'a' (it
is). You now have a(a)b.
The really tricky case is 2, because somehow you have to know that the character you just got isn't affecting symmetry, it is just extending the middle. For this, you have to hold an additional pointer that tracks where the plateau's (middle's) edge lies. For example, you have (b)baabb and you just got another 'b', in this case you have to know to reset the pointer to the base of the middle plateau here: bbaa(b)bb. Since we are going for constant time, you have to hold a pointer here to begin with (you can't afford the time to search for the plateau's edge). Now if you get another 'b', you know that you are still on the edge of that plateau and you keep the pointer where it is, so bbaa(b)bb -> bbaa(b)bbb. Now, if you get an 'a', you know that the 'b's are not part of the extended middle and you reset both pointers (The tracking pointer and the edge pointer) so you now have bbaabbbb((a)).
With these three cases, I think all bases are covered. If you ever want to check if the current string is a palindrome, check if the first pointer (not the plateau's edge pointer) is at index 0.
这可能对您有帮助:
http://arxiv.org/pdf/1308.3466v1.pdf
如果您存储最后一个 $ k$ 个输入符号,您可以轻松找到长度达 $k$ 的回文。
如果您使用本文的算法,您可以找到回文的中点及其长度的长度估计。
This might help you:
http://arxiv.org/pdf/1308.3466v1.pdf
If you store the last $k$ many input symbols you can easily find palindromes up to a length of $k$.
If you use the algorithms of the paper you can find the midpoints of palindromes and an length estimate of its length.