生成具有所有排列的序列

发布于 2024-08-20 21:01:06 字数 186 浏览 4 评论 0原文

如何生成包含所有可能排列的最短序列?

例子: 对于长度 2,答案是 121,因为这个列表包含 12 和 21,它们都是可能的排列。

对于长度 3,答案是 123121321,因为此列表包含所有可能的排列: 123、231、312、121(无效)、213、132、321。

每个数字(在给定排列内)只能出现一次。

How can I generate the shortest sequence with contains all possible permutations?

Example:
For length 2 the answer is 121, because this list contains 12 and 21, which are all possible permutations.

For length 3 the answer is 123121321, because this list contains all possible permutations:
123, 231, 312, 121 (invalid), 213, 132, 321.

Each number (within a given permutation) may only occur once.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

北恋 2024-08-27 21:01:06

这种贪心算法产生相当短的最小序列。

更新:请注意对于n ≥ 6,该算法不产生尽可能短的字符串!

  • 制作所有排列的集合。
  • 从集合中删除第一个排列。
  • a = 第一个排列。
  • 查找集合中与 a 结尾重叠最大的序列。如果有平局,则选择字典顺序中第一个的顺序。从集合中删除所选序列并将不重叠的部分添加到a的末尾。重复此步骤,直到集合为空。

奇怪的打破平局的步骤对于正确性来说是必要的;随机打破领带似乎会导致更长的字符串。

我(通过编写一个更长、更慢的程序)验证了该算法给出的长度为 4 的答案 123412314231243121342132413214321 确实是最短的答案。然而,对于长度 6,它产生长度为 873 的答案,这比已知的最短解更长。

该算法的复杂度为 O(n!2)。

Python 中的实现:

import itertools

def costToAdd(a, b):
    for i in range(1, len(b)):
        if a.endswith(b[:-i]):
            return i
    return len(b)

def stringContainingAllPermutationsOf(s):
    perms = set(''.join(tpl) for tpl in itertools.permutations(s))
    perms.remove(s)
    a = s
    while perms:
        cost, next = min((costToAdd(a, x), x) for x in perms)
        perms.remove(next)
        a += next[-cost:]
    return a

该函数生成的字符串长度为 1, 3, 9, 33, 153, 873, 5913, ... 看起来是 这个整数序列

我有预感你可以比 O(n!2) 做得更好。

This greedy algorithm produces fairly short minimal sequences.

UPDATE: Note that for n ≥ 6, this algorithm does not produce the shortest possible string!

  • Make a collection of all permutations.
  • Remove the first permutation from the collection.
  • Let a = the first permutation.
  • Find the sequence in the collection that has the greatest overlap with the end of a. If there is a tie, choose the sequence is first in lexicographic order. Remove the chosen sequence from the collection and add the non-overlapping part to the end of a. Repeat this step until the collection is empty.

The curious tie-breaking step is necessary for correctness; breaking the tie at random instead seems to result in longer strings.

I verified (by writing a much longer, slower program) that the answer this algorithm gives for length 4, 123412314231243121342132413214321, is indeed the shortest answer. However, for length 6 it produces an answer of length 873, which is longer than the shortest known solution.

The algorithm is O(n!2).

An implementation in Python:

import itertools

def costToAdd(a, b):
    for i in range(1, len(b)):
        if a.endswith(b[:-i]):
            return i
    return len(b)

def stringContainingAllPermutationsOf(s):
    perms = set(''.join(tpl) for tpl in itertools.permutations(s))
    perms.remove(s)
    a = s
    while perms:
        cost, next = min((costToAdd(a, x), x) for x in perms)
        perms.remove(next)
        a += next[-cost:]
    return a

The length of the strings generated by this function are 1, 3, 9, 33, 153, 873, 5913, ... which appears to be this integer sequence.

I have a hunch you can do better than O(n!2).

御弟哥哥 2024-08-27 21:01:06
  • 创建所有排列。
  • 让每一个
    排列表示a中的一个节点
    图形。
  • 现在,对于任意两个状态添加一个
    如果它们共享,则边值为 1
    n-1 位数字(对于来自
    结束,并且对于目标
    end),如果共享 n-2 位数字,则为两个
    等等。
  • 现在,你需要寻找
    包含n的最短路径
    顶点。
  • Create all permutations.
  • Let each
    permutation represent a node in a
    graph.
  • Now, for any two states add an
    edge with a value 1 if they share
    n-1 digits (for the source from the
    end, and for the target from the
    end), two if they share n-2 digits
    and so on.
  • Now, you are left to find
    the shortest path containing n
    vertices.
摘星┃星的人 2024-08-27 21:01:06

这是一个快速算法,可以生成包含所有排列的短字符串。我很确定它会产生尽可能短的答案,但我手头没有完整的证明。

解释。下面是所有排列的树。图片不完整;想象这棵树永远向右延伸。

1 --+-- 12 --+-- 123 ...
    |        |
    |        +-- 231 ...
    |        |
    |        +-- 312 ...
    |
    +-- 21 --+-- 213 ...
             |
             +-- 132 ...
             |
             +-- 321 ...

这棵树的k层的节点都是长度的排列
k。此外,排列是按特定顺序排列的,其中有很多
每个排列与其上方和下方的邻居之间的重叠。

准确地说,每个节点的第一个子节点只需添加下一个节点即可找到
符号到最后。例如,213 的第一个孩子是 2134。其余的
的孩子是通过旋转到第一个孩子来找到的,留下一个符号
一次。旋转 2134 将产生 1342、3421、4213。

将给定级别的所有节点并将它们串在一起,重叠
尽可能多地生成字符串 1、121、123121321 等。

该序列中第 n 个字符串的长度为 x=1 到 nx!。 (您可以通过观察相邻排列之间有多少不重叠来证明这一点。兄弟姐妹在除 1 个符号之外的所有符号中重叠;表兄弟在除 2 个符号之外的所有符号中重叠;依此类推。)

证明草图。 我还没有完全证明这是最好的解决方案,但这里是证明如何进行的草图。首先证明任何包含 n 个不同排列的字符串的长度 ≥ 2n - 1。然后证明添加任何包含 n+1 个不同排列的字符串长度为 2n + 1。也就是说,再添加一个排列将花费两位数。继续计算包含 nPrn 的字符串的最小长度em>Pr + 1 个不同的排列,最多 n!。简而言之,这个顺序是最佳的,因为你不能为了让其他地方变得更好而让某个地方变得更糟。它在任何地方都已经是局部最优的。所有的动作都是被迫的。

算法。鉴于所有这些背景,该算法非常简单。将这棵树移动到所需的深度,并将该深度的所有节点串在一起。

幸运的是,我们实际上不必在内存中构建树。

def build(node, s):
    """String together all descendants of the given node at the target depth."""
    d = len(node)  # depth of this node. depth of "213" is 3.
    n = len(s)     # target depth
    if d == n - 1:
        return node + s[n - 1] + node    # children of 213 join to make "2134213"
    else:
        c0 = node + s[d]                 # first child node
        children = [c0[i:] + c0[:i] for i in range(d + 1)]  # all child nodes
        strings = [build(c, s) for c in children]  # recurse to the desired depth
        for j in range(1, d + 1):
            strings[j] = strings[j][d:]  # cut off overlap with previous sibling
        return ''.join(strings)          # join what's left

def stringContainingAllPermutationsOf(s):
    return build(s[:1], s)

性能。上面的代码已经比我的其他解决方案快得多,并且它对大字符串进行了大量剪切和粘贴,您可以对其进行优化。该算法的运行时间和内存与输出的大小成正比。

Here is a fast algorithm that produces a short string containing all permutations. I am pretty sure it produces the shortest possible answer, but I don't have a complete proof in hand.

Explanation. Below is a tree of All Permutations. The picture is incomplete; imagine that the tree goes on forever to the right.

1 --+-- 12 --+-- 123 ...
    |        |
    |        +-- 231 ...
    |        |
    |        +-- 312 ...
    |
    +-- 21 --+-- 213 ...
             |
             +-- 132 ...
             |
             +-- 321 ...

The nodes at level k of this tree are all the permutations of length
k. Furthermore, the permutations are in a particular order with a lot
of overlap between each permutation and its neighbors above and below.

To be precise, each node's first child is found by simply adding the next
symbol to the end. For example, the first child of 213 would be 2134. The rest
of the children are found by rotating to the first child to left one symbol at
a time. Rotating 2134 would produce 1342, 3421, 4213.

Taking all the nodes at a given level and stringing them together, overlapping
as much as possible, produces the strings 1, 121, 123121321, etc.

The length of the nth string in that sequence is the sum for x=1 to n of x!. (You can prove this by observing how much non-overlap there is between neighboring permutations. Siblings overlap in all but 1 symbol; first-cousins overlap in all but 2 symbols; and so on.)

Sketch of proof. I haven't completely proved that this is the best solution, but here's a sketch of how the proof would proceed. First show that any string containing n distinct permutations has length ≥ 2n - 1. Then show that adding any string containing n+1 distinct permutations has length 2n + 1. That is, adding one more permutation will cost you two digits. Proceed by calculating the minimum length of strings containing nPr and nPr + 1 distinct permutations, up to n!. In short, this sequence is optimal because you can't make it worse somewhere in the hope of making it better someplace else. It's already locally optimal everywhere. All the moves are forced.

Algorithm. Given all this background, the algorithm is very simple. Walk this tree to the desired depth and string together all the nodes at that depth.

Fortunately we do not actually have to build the tree in memory.

def build(node, s):
    """String together all descendants of the given node at the target depth."""
    d = len(node)  # depth of this node. depth of "213" is 3.
    n = len(s)     # target depth
    if d == n - 1:
        return node + s[n - 1] + node    # children of 213 join to make "2134213"
    else:
        c0 = node + s[d]                 # first child node
        children = [c0[i:] + c0[:i] for i in range(d + 1)]  # all child nodes
        strings = [build(c, s) for c in children]  # recurse to the desired depth
        for j in range(1, d + 1):
            strings[j] = strings[j][d:]  # cut off overlap with previous sibling
        return ''.join(strings)          # join what's left

def stringContainingAllPermutationsOf(s):
    return build(s[:1], s)

Performance. The above code is already much faster than my other solution, and it does a lot of cutting and pasting of large strings that you can optimize away. The algorithm can be made to run in time and memory proportional to the size of the output.

美男兮 2024-08-27 21:01:06

我展示了一个 DNA 序列的例子。有时,我需要生成所有可能的短 DNA 序列(通常称为寡核苷酸)。我使用队列来代替递归。我没有优化,如果你想节省内存,你可以使用resize of string到所需的长度。为了进一步节省内存(也许还有效率),您可以使用构造中间体的指针和长度。我的代码是C++。它可以转换任何其他语言。
对于那些 DNA(高中)是一个外来词,它只有 4 个字母:A、C、G、T。对于其他字母集,您将需要一个循环来节省一些输入;否则,您仍然可以通过复制粘贴手动输入(例如 English 26)代码行。

 vector<string> buildAllOligo(short len) {
     //int num = pow(4, len);
     //vector<char*> res;
     vector<string> res;
     queue<string> intermediate;
     intermediate.push(string());
     while (!intermediate.empty()) {
        string addA(std::move(intermediate.front()));
        string addC(addA);
        string addG(addA);
        string addT(addA);
        addA.push_back('A');
        addC.push_back('C');
        addG.push_back('G');
        addT.push_back('T');
        intermediate.pop();
        if (static_cast<short>(addA.size()) < len) {
           intermediate.push(std::move(addA));
           intermediate.push(std::move(addC));
           intermediate.push(std::move(addG));
           intermediate.push(std::move(addT));
        }
        else {
           res.push_back(std::move(addA));
           res.push_back(std::move(addC));
           res.push_back(std::move(addG));
           res.push_back(std::move(addT));
        }
     }
     assert(res.size() == pow(4,len));
     return res;
  }

I show one example with DNA sequences. Sometimes, I need to generate all possible short DNA sequences (usually called oligos). I use a queue to replace recursion. I did not optimize, if you want to save memory, you can use resize of string to the desired length. To further save memory (and maybe efficiency), you can use pointer and length of the the construction intermediate. My code is C++. It can be converted any any other language.
For those DNA (High school) is a foreign word, it has only 4 letters: A, C, G, T. For other set of alphabets, you will need a loop to save some typing; otherwise, you can still type, say English 26, lines of code manually with copy-pasting.

 vector<string> buildAllOligo(short len) {
     //int num = pow(4, len);
     //vector<char*> res;
     vector<string> res;
     queue<string> intermediate;
     intermediate.push(string());
     while (!intermediate.empty()) {
        string addA(std::move(intermediate.front()));
        string addC(addA);
        string addG(addA);
        string addT(addA);
        addA.push_back('A');
        addC.push_back('C');
        addG.push_back('G');
        addT.push_back('T');
        intermediate.pop();
        if (static_cast<short>(addA.size()) < len) {
           intermediate.push(std::move(addA));
           intermediate.push(std::move(addC));
           intermediate.push(std::move(addG));
           intermediate.push(std::move(addT));
        }
        else {
           res.push_back(std::move(addA));
           res.push_back(std::move(addC));
           res.push_back(std::move(addG));
           res.push_back(std::move(addT));
        }
     }
     assert(res.size() == pow(4,len));
     return res;
  }
︶葆Ⅱㄣ 2024-08-27 21:01:06

对于 n 3 长度的链是 8
12312132
在我看来,我们正在使用循环系统——换句话说,它是环。但我们正在像对待链条一样对待戒指。链确实是123121321 = 9
但环是12312132 = 8
我们从序列 12312132[1] 的开头取出 321 的最后 1。

For n 3 length chain is 8
12312132
Seems to me we are working with cycled system - it's ring, saying in other words. But we are are working with ring as if it is chain. Chain is realy 123121321 = 9
But the ring is 12312132 = 8
We take last 1 for 321 from the beginning of the sequence 12312132[1].

碍人泪离人颜 2024-08-27 21:01:06

这些被称为(最小长度)超级排列(参见维基百科)
当一位匿名用户在 4chan 上发布了新的下限时,人们对此重新产生了兴趣。 (有关历史,请参阅维基百科和许多其他网页。)

据我所知,截至今天我们才知道:

  • 它们的长度是 A180632(n) ≤ A007489(n) = Sum_{k=1..n} k!但这个界限仅在 n ≤ 5 时是尖锐的,即,对于 n ≤ 5,我们具有相等性,但对于 n > 5,我们具有相等性。 5.
  • 有一个非常简单的递归算法,如下所示,产生长度的超排列A007489(n),它始终是回文(但如上所述,这不是 n > 5 的最小长度)。
  • 对于 n ≥ 7,我们有更好的上限 n! + (n−1)! + (n−2)! + (n−3)! + n − 3。
  • 对于 n ≤ 5,所有最小 SP 都是已知的;并且对于所有 n > 5 我们不知道哪个是最小SP。
  • 对于 n = 1, 2, 3, 4,最小 SP 是唯一的(直到更改符号),由 (1, 121, 123121321, 123412314231243121342132413214321) 给出,长度为 A007489(1..4) = (1, 3, 9 ,33)。
  • 对于 n = 5,有 8 个最小长度的不等价 153 = A007489(5);下面的算法产生的回文是按字典顺序排列的第三个。
  • 对于 n = 6,休斯顿产生了数千个已知的最小长度 872 = A007489(6) - 1,但据我所知,我们仍然不知道这是否是最小的。
  • 对于 n = 7,伊根生成的长度为 5906(比上面给出的更好上限小一),但我们同样不知道这是否是最小的。

我编写了一个非常短的 PARI/GP 程序(您可以将其粘贴到 PARI/GP 网站),它实现了产生长度 A007489(n) 的回文超排列的标准算法:

extend(S,n=vecmax(s))={ my(t); concat([
  if(#Set(s)<n, [], /* discard if not a permutation */
    s=concat([s, n+1, s]); /* Now merge with preceding segment: */ 
    forstep(i=min(#s, #t)-1, 0, -1,
      if(s[1..1+i]==t[#t-i..#t], s=s[2+i..-1]; break));
    t=s /* store as previous for next */
  )/*endif*/
  | s <- [ S[i+1..i+n] | i <- [0..#S-n] ]])
}
SSP=vector(6, n, s=if(n>1, extend(s), [1])); // gives the first 6, the 6th being non-minimal

我认为这很容易翻译成任何其他语言。 (对于非 PARI 语言的人:“| x <-”表示“for x in”。)

These are called (minimal length) superpermutations (cf. Wikipedia).
Interest on this has re-sparked when an anonymous user has posted a new lower bound on 4chan. (See Wikipedia and many other web pages for history.)

AFAIK, as of today we just know:

  • Their length is A180632(n) ≤ A007489(n) = Sum_{k=1..n} k! but this bound is only sharp for n ≤ 5, i.e., we have equality for n ≤ 5 but strictly less for n > 5.
  • There's a very simple recursive algorithm, given below, producing a superpermutation of length A007489(n), which is always palindromic (but as said above this is not the minimal length for n > 5).
  • For n ≥ 7 we have the better upper bound n! + (n−1)! + (n−2)! + (n−3)! + n − 3.
  • For n ≤ 5 all minimal SP's are known; and for all n > 5 we don't know which is the minimal SP.
  • For n = 1, 2, 3, 4 the minimal SP's are unique (up to changing the symbols), given by (1, 121, 123121321, 123412314231243121342132413214321) of length A007489(1..4) = (1, 3, 9, 33).
  • For n = 5 there are 8 inequivalent ones of minimal length 153 = A007489(5); the palindromic one produced by the algorithm below is the 3rd in lexicographic order.
  • For n = 6 Houston produced thousands of the smallest known length 872 = A007489(6) - 1, but AFAIK we still don't know whether this is minimal.
  • For n = 7 Egan produced one of length 5906 (one less than the better upper bound given above) but again we don't know whether that's minimal.

I've written a very short PARI/GP program (you can paste to run it on the PARI/GP web site) which implements the standard algorithm producing a palindromic superpermutation of length A007489(n):

extend(S,n=vecmax(s))={ my(t); concat([
  if(#Set(s)<n, [], /* discard if not a permutation */
    s=concat([s, n+1, s]); /* Now merge with preceding segment: */ 
    forstep(i=min(#s, #t)-1, 0, -1,
      if(s[1..1+i]==t[#t-i..#t], s=s[2+i..-1]; break));
    t=s /* store as previous for next */
  )/*endif*/
  | s <- [ S[i+1..i+n] | i <- [0..#S-n] ]])
}
SSP=vector(6, n, s=if(n>1, extend(s), [1])); // gives the first 6, the 6th being non-minimal

I think that easily translates to any other language. (For non-PARI speaking persons: "| x <-" means "for x in".)

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文