移位字符串的数据结构

发布于 2024-12-28 11:03:56 字数 811 浏览 0 评论 0原文

我们对二进制字符串的数据结构感兴趣。令 S=s1s2....sm 为大小为 m 的二进制字符串。 Shift(S,i) 是字符串 S i 空格向左循环移位。即 Shift(S,i)=sisi+1si+2...sm< /sub>s1...si-1。建议一种高效的数据结构,支持:

  1. O(1) 内的空 DS 的 Init()
  2. Insert(s) 将二进制字符串插入到 O(|s 内的 DS) |^2)
  3. Search_circlic(s) 检查 O(|s|) 中是否存在任何 iShift(S,i)

空间复杂度: O(|S1|+|S2|+.....+|Sm|) 其中 S<如果我们已经插入了 m 个字符串,那么 sub>i 就是 1。

如果我必须为某个给定的 i 找到 Search_circlic(s,i) ,那么使用后缀树并在 O(|s|) 中遍历它就非常简单了。但在 Search_cycl(s) 中,我们没有给定的 i,所以我不知道在给定的复杂度下该怎么做。 OTOH,Insert(s) 通常需要 O(|s|) 来插入后缀树,这里我们给出 O(|s|^2)。

We're interested in a data structure for binary strings. Let S=s1s2....sm be a binary string of size m. Shift(S,i) is a cyclic shift of string S i spaces to the left. That is, Shift(S,i)=sisi+1si+2...sms1...si-1. Suggest an efficient data structure that supports:

  1. Init() of an empy DS in O(1)
  2. Insert(s) inserts a binary string to the DS in O(|s|^2)
  3. Search_cyclic(s) checks if there is a Shift(S,i) for ANY i in O(|s|).

Space Complexity: O(|S1|+|S2|+.....+|Sm|) where Si is one if the m strings we've inserted this far.

If i had to find Search_cyclic(s,i) for some given i, this is quite simple with using a suffix tree and just traversing it in O(|s|). But here in Search_cyclic(s) we don't have a given i, so I don't know what to do in the given complexity. OTOH, Insert(s) generally takes O(|s|) to insert to a suffix tree and here we are given O(|s|^2).

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

染墨丶若流云 2025-01-04 11:03:56

所以这是我可以向您提出的解决方案。复杂性甚至低于他们要求你的复杂性,但可能看起来有点复杂。

保存所有字符串的数据结构将是 Trie 甚至是 帕特里夏树。在这棵树中,对于每个字符串,您想要在所有可能的移位中插入最小循环移位(即所有可能的循环移位,按字典顺序最小)。您可以计算线性时间内字符串的最小循环移位,稍后我将给出一个可能的解决方案。现在我们假设你能做到。以下是所需操作的实现方式:

  1. Init() - trie 树和 patricia 树的 init 都是常量 - 这里没问题
  2. Insert(s) - 您在 O(|s|) 中计算 s 的最小循环移位 s' 并然后将其插入 O(|s'|) = O(|s|) 中的任一数据结构中。这比所需的复杂度 Search_circular(s) 更好
  3. - 再次在 O(|s|) 中计算 s 的最小循环移位,然后在 Patricia 或 Trie 中检查字符串是否存在,这又是在 O 中完成的(|s|)

此外,内存复杂性是根据需要的,如果构建 Patricia,内存复杂性可能会更低。

所以剩下的就是解释如何找到最小循环移位。既然您提到后缀树,我希望您知道如何在 中构建它线性时间。所以技巧是 - 将字符串 s 附加到自身(即加倍),然后为加倍的字符串构造一个后缀树。这对于 |s| 仍然是线性的所以没有问题。之后你所要做的就是找到这棵树中长度为 n 的后缀的最小值。我相信这并不难——从根开始,并始终遵循从当前节点开始的链接,该节点上写有最小的字符串,直到累积长度长于|s|。由于字符串加倍,您将始终能够遵循最小的字符串链接,直到累积长度至少 |s|。

希望这个答案有帮助。

So here is a solution I can propose to you. The complexities are even lower then the ones they asked of you but it may seem a bit complicated.

The data structure in which you keep all the strings will be a Trie or even a Patricia tree. In this tree for each string you want to insert the minimum cyclic shift(i.e. the cyclic shift of all possible ones which is minimum lexicographically) out of all of its possible shifts. You can calculate the minimum cyclic shift of a string in linear time and I will give one possible solution to that a bit later. For the moment lets assume you can do it. Here is how the operations required will be implemented:

  1. Init() - init of both trie and patricia tree are constant - no problem here
  2. Insert(s) - you compute the minimum cyclic shift s' of s in O(|s|) and then you insert it in either of the data structures in O(|s'|) = O(|s|). This is even better then the required complexity
  3. Search_cyclic(s) - again you compute the minimum cyclic shift of s in O(|s|) and then you check in the Patricia or Trie if the string is present, which again is done in O(|s|)

Also the memory complexity is as required and may be even lower if you construct a Patricia.

So all that is left is to exaplain how to find the minimum cyclic shift. Since you mention suffix tree I hope you know how to construct it in linear time. So the trick is - you append your string s to itself(i.e. double it) and then you construct a suffix tree for the doubled string. This is still linear with respect to |s| so no problem there. After that all you have to do is to find the minimum of the suffixes of length n in this tree. This is not hard at all I believe - start from the root and always follow the link from the current node that has minimal string written on it until you accumulate length longer then |s|. Because of the doubling of the string, you will always be able to follow minimal string links until you accumulate length at least |s|.

Hope this answer helps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文