最大化“交换”的最小变化算法
这是一个非数学家提出的组合数学问题,所以请耐心等待!
给定一个包含 n 个不同字符的数组,我想以最小变化顺序生成 k 个字符的子集,即第 i+1 代恰好包含一个不在第 i 代中的字符的顺序。这本身并不太难。但是,我还希望最大限度地增加在第 i+1 代中换出的字符与在第 i 代中换入相同的字符的情况数量。为了说明这一点,对于 n=7,k=3:
abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg
带星号的字符串表示我想要最大化的情况;例如,第 3 代中新出现的 e abe 取代了第 2 代中新出现的 abd。似乎不可能在每一代人中都发生这种情况,但我希望它尽可能频繁地发生。
我使用的典型数组大小为 20-30,子集大小约为 5-8。
我使用的是一种奇怪的语言,Icon(或者实际上是它的衍生语言Unicon),所以我不希望有人发布我可以直接使用的代码。但我将感谢伪代码中的答案或提示,并将尽力翻译 C 等。此外,我注意到此类问题经常用整数数组来讨论,我当然可以应用解决方案以这样的方式发布我自己的问题。
谢谢
Kim Bastin
编辑 2010 年 6 月 15 日:
我似乎确实比我想象的陷入了更深的困境,虽然我很感激所有的答案,但并非所有答案都是相关的。作为一个不充分的解决方案的示例,让我发布我自己的 Unicon 过程,用于以最小更改顺序生成字符集 s 的 k 进制子集。要理解代码,您需要知道的事情是:前置的 * 表示结构的大小,因此如果 s 是字符串,*s 表示 s 的大小(它包含的字符数)。 ||是一个字符串连接操作。一个前置!在连续的传递中依次产生结构的每个元素,例如字符串的每个字符。 “挂起”控制结构返回过程的结果,但使过程“悬而未决”,所有局部变量都就位,以便在循环中调用该过程时可以产生新结果。
procedure revdoor(s, k)
# Produces all k-subsets of a string or character set s in a 'revolving
# door' order. Each column except the first traverses the characters
# available to it in alphabetical and reverse alphabetical order
# alternately. The order of the input string is preserved.
# If called in a loop as revdoor("abcdefg", 3),
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg
local i
static Ctl
if /Ctl then { # this means 'if Ctl doesn't exist'
if k = 0 then return ""
Ctl := list(k, 1) # a list of k elements, each initialised to 1.
}
if Ctl[k] = 1 then {
if k = 1 then suspend !s else
every i := 1 to *s-k+1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
} else {
if k = 1 then suspend !reverse(s) else
every i := -k to -*s by -1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
}
# the following line multiplies element k of Ctl by -1 if k < size of Ctl
# (this controls the order of generation of characters),
# and destroys Ctl on final exit from the procedure.
if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null
end
请注意,在我看来,上述过程的输出并不是最佳的。到目前为止,我的调查结果之一是,n 个元素的 k 元子集列表的最大“交换分数”不小于 comm(n-1, k),或者在 n=7 的情况下,k= 3,最大分数至少为comb(6, 3) = 20。我将列表的“交换分数”定义为列表中新元素替换前一个项目中本身是新的元素的项目数。我没有数学设备来证明这一点,但是用 k=1 或 k=2 很容易看出。对于某些 (n,k) ,可能会有稍高的分数,如 n=7、k=3 的情况:
abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
求bdg bcg
bcd bce bcf
bdf 之前
def cef aef
adf acf
acd 王牌
阿德
bde cde
cdf cdg
ceg
deg(交换分数 = 21)
可能会注意到,上面的列表是“强最小变化顺序”(就像单词高尔夫:新角色始终处于与其替换的角色相同的位置),这可能表明我的方向自己的工作正在开展。我希望在几天内发布更多内容。
金
This is a question on combinatorics from a non-mathematician, so please try to bear with me!
Given an array of n distinct characters, I want to generate subsets of k characters in a minimal-change order, i.e. an order in which generation i+1 contains exactly one character that was not in generation i. That's not too hard in itself. However, I also want to maximise the number of cases in which the character that is swapped out in generation i+1 is the same character that was swapped in in generation i. To illustrate, for n=7, k=3:
abc abd abe* abf* abg* afg aeg* adg* acg* acd ace* acf* aef adf* ade
bde bdf bef bcf* bce bcd* bcg* bdg beg* bfg* cfg ceg* cdg* cde cdf* cef def deg dfg efg
The asterisked strings indicate the case I want to maximise; e.g. the e that is new in generation 3, abe, replaces a d that was new in generation 2, abd. It doesn't seem possible to have this happen in every generation, but I want it to happen as often as possible.
Typical array sizes that I use are 20-30 and subset sizes around 5-8.
I'm using an odd language, Icon (or actually its derivative Unicon), so I don't expect anyone to post code that I can used directly. But I will be grateful for answers or hints in pseudo-code, and will do my best to translate C etc. Also, I have noticed that problems of this kind are often discussed in terms of arrays of integers, and I can certainly apply solutions posted in such terms to my own problem.
Thanks
Kim Bastin
Edit 15 June 2010:
I do seem to have got into deeper water than I thought, and while I'm grateful for all answers, not all of them have been relevant. As an example of a solution which is NOT adequate, let me post my own Unicon procedure for generating k-ary subsets of a character set s in a minimal change order. Things you need to know to understand the code are: a preposed * means the size of a structure, so if s is a string, *s means the size of s (the number of characters it contains). || is a string concatenation operation. A preposed ! produces each element of a structure, e.g. each character of a string, in turn on successive passes. And the 'suspend' control structure returns a result from a procedure, but leaves the procedure 'in suspense', with all local variables in place, so that new results can be produced if the procedure is called in a loop.
procedure revdoor(s, k)
# Produces all k-subsets of a string or character set s in a 'revolving
# door' order. Each column except the first traverses the characters
# available to it in alphabetical and reverse alphabetical order
# alternately. The order of the input string is preserved.
# If called in a loop as revdoor("abcdefg", 3),
# the order of production is: abc, abd, abe, abf, abg, acg, acf, ace, acd,
# ade, adf, adg, aeg, aef, afg, bfg, bef, beg, bdg, bdf, bde, bcd, bce,
# bcf, bcg, cdg, cdf, cde, cef, ceg, cfg, dfg, deg, def, efg
local i
static Ctl
if /Ctl then { # this means 'if Ctl doesn't exist'
if k = 0 then return ""
Ctl := list(k, 1) # a list of k elements, each initialised to 1.
}
if Ctl[k] = 1 then {
if k = 1 then suspend !s else
every i := 1 to *s-k+1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
} else {
if k = 1 then suspend !reverse(s) else
every i := -k to -*s by -1 do {
suspend s[i] || revdoor(s[i+1:0], k-1)
}
}
# the following line multiplies element k of Ctl by -1 if k < size of Ctl
# (this controls the order of generation of characters),
# and destroys Ctl on final exit from the procedure.
if k < *Ctl then Ctl[k] *:= -1 else Ctl := &null
end
Note that the output of the above procedure is not optimal in my sense. One result of my investigations so far is that the maximum 'swapping score' for a list of k-ary subsets of n elements is not less than comb(n-1, k), or in the case of n=7, k=3, the maximum score is at least comb(6, 3) = 20. I define the 'swapping score' of a list as the number of items in the list whose new element replaces an element in the previous item which was itself new. I haven't got the mathematical equipment to prove this, but it is easy to see with k=1 or k=2. For certain (n,k) a slightly higher score is possible, as in the case of n=7, k=3:
abc abd abe abf abg
acg adg aeg afg
efg dfg cfg bfg
beg bdg bcg
bcd bce bcf
bdf bef
def cef aef
adf acf
acd ace
ade
bde cde
cdf cdg
ceg
deg (swapping score = 21)
It may be noted that the above list is in 'strong minimal change order' (like word golf: the new character is always in the same position as the character it replaces), which may indicate the direction my own work is taking. I hope to post something more in a few days.
Kim
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
这相当简单。为了最大化替换,只需将字符视为数字并将字符串加一,直到达到上限。
然后检查并确保您没有在字符串中使用相同的字符两次。
我认为这会起作用:
It's fairly straightforward. In order to maximise replacement just think of the characters as numbers and increment the string by one till you have reached the upper limit.
Then check to see that you don't use the same character twice in the string.
I think this would work:
Kim,你的问题描述听起来很像(家庭作业)尝试描述枚举一组 n 元素的所有 k 组合的最简单解决方案 - 而不会太容易给出实际解决方案。不管怎样,请看下面我的镜头。我使用 Java,但重要部分与 C 没有什么不同。
以下是示例的输出:
Kim, your problem description sounds very much like a (homework) attempt to describe the simplest solution for enumerating all k-combinations of a set of n elements - without giving the actual solution away too easily. Anyway, see below for my shot. I used Java but the important parts are not different from C.
Here is the output for your example:
我没有从算法开始,而是尝试想出一种方法来找到最大“交换分数”的形式,以便您知道要追求什么。通常,可以从这样的证明中得出产生所需结构的算法。
大学已经过去很长时间了,但我一直试图想出一个组合模型来帮助解决这个问题,但运气不太好。
我首先将一组组合想象为图中的顶点,其边对应于组合的“邻接”(只有一个元素差异)。所以:
有一个这些图有很多对称性。任何给定的 {n,k} 的图形与 {n,nk} 的图形相同。如果 k=1 或 k=n-1,则它是 n 个顶点上的完整图(每种组合与所有其他组合仅相差一个字符)。不过,我无法从中看到明显的算法。
编辑:我的下一个想法是用稍微不同的解释来构思该图表。您可以将每个 {n,k} 组合视为包含 k 个 1 的 n 位序列。 1 的位置对应于组合中出现的 n 个字符中的哪一个。因此,对于 n=7 k=3,abc 为 1110000,adg 为 1001001,efg 为 0000111。有了这个,您还可以想象位于 n 维超立方体的角上的点。因此,对于给定的子序列,如果边缘共面,则它们符合您的“最小交换”标准:我将它们视为穿过超立方体的“切割平面”。
您正在通过此组合图寻找一条符合您特殊标准的哈密顿路径。
思考问题的另一种方法是尽量减少序列中您更改组合中正在更改的字符的次数。
Rather than start with an algorithm, I've tried to think of a way to find a form for the maximum "swapping score", so that you know what to shoot for. Often an algorithm for producing the desired structure can emerge from such a proof.
It's been a long time since university, but I've tried to think of a combinatorial model that will help to figure this out, without very much luck.
I started by imagining the set of combinations as vertices in a graph, with a edges corresponding to the "adjacency" (only one element difference) of the combinations. So:
There's a lot of symmetry to these graphs. The graph is the same for any given {n,k} as it is for {n,n-k}. If k=1 or k=n-1 it's the complete graph on n vertices (each combinations differs from all the others by only one character). I can't see an obvious algorithm from this, though.
Edit: My next thought was to conceive the graph with a slightly different interpretation. You can think of each {n,k}-combination as a sequence of n bits where there are k 1s. The position of the 1s corresponds to which of the n characters is present in the combination. So for n=7 k=3, abc is 1110000, adg is 1001001, efg is 0000111. With this you can also imagine the points lying at the corners of an n-dimensional hypercube. So for a given subsequence, the edges match your "minimal swapping" criteria if they are co-planar: I think of them as "cutting planes" through the hypercube.
You are looking for a Hamiltonian path through this graph of combinations, one that meets your special criteria.
Another way to think of your problem is to minimize the number of times in the sequence that you do change which character in the combination is being altered.
为了得到一个好的答案,一次性计算所有组合列表是否可以接受,还是需要一次计算一个?换句话说,您是否需要一个功能:
或者可以
接受吗?
如果允许批量计算组合,则迭代深化 A* 搜索(或只是 A* 搜索,但我怀疑它会耗尽内存)可能会起作用。通过可接受的启发式,A* 保证给出最优值。我时间不够,所以我决定发布这个部分答案,并在有时间编写代码时编辑该帖子。
A*是一种图搜索算法。在这种情况下,节点是迄今为止使用的组合列表(列表中不允许重复)。我的计划是对节点使用位串表示。 n=30 适合 32 位整数。我们可以任意排列任何解决方案,以便第一个组合以 0 开始并以 1 结束,即 000...1111。如果两个列表在最后一个元素之前都相同,并且最后一个元素的不同之处仅在于一个 0 位翻转为 1,一个 1 位翻转为 0,则具有较短列表的节点将连接到较长的列表。如果存在“交换”,则两者之间的距离为 0;如果存在交换,则两者之间的距离为 1。
每个组合的第二个表示是打开的位的排序列表。该图的一种可能可接受的启发法是使用该索引列表。每次(在组合列表中)在索引列表中的特定位置使用索引时,将其标记出来。具有未使用索引的头寸数量 - 当前最后更改的索引是(我相信)需要发生的最小交换数量。
为了说明这一启发式,请考虑上面的序列 abc abd abe* abf* abg afg。 (在我的处理中,字母将是数字,但这是一个微小的差异)。该序列(将是搜索图中的一个节点)将标记以下位置:
因此,启发式算法将预测至少需要一个交换(因为位置 3 中没有未标记的元素,并且当前位置为 2) )。
如果我有时间,我将对其进行编辑以发布算法的代码和性能。
回复:NP 完整性结果(在对 Zac Thompson 答案的评论中)。我们在其上搜索最小成本哈密顿路径的图具有非常特殊的结构。例如,通常 NP 完全哈密顿路径问题可以使用“枚举所有组合”算法在 O(n) 时间内解决 - 其中 n 是图中的节点数。这种结构使得尽管在一般图上顶点覆盖很困难,但在您的图上它可能是多项式的(甚至是线性或二次的)。当然,由于该图有很多节点,例如 n=30、k=8,您可能仍然需要进行大量计算。
For a good answer, would computing the list of combinations all at once be acceptable, or do you need to compute them one at a time? In other words, do you need a function:
or would
be acceptable?
If computing the combinations in batch is permissible, it is possible that an iterative-deepening A* search (or just an A* search but I suspect it'd run out of memory) would work. With an admissible heuristic, A* is guaranteed to give the optimum. I'm short of time, so I decided to post this partial answer and edit the post if I get time to write code.
A* is a graph search algorithm. In this case, the nodes are lists of combinations used so far (no duplicates allowed in the list). My plan was to use a bit-string representation for the nodes. n=30 would fit into a 32 bit integer. We can arbitrarily permute any solution so that the first combination begins with 0's and ends in 1's, i.e. 000...1111. A node with a shorter list is connected to a longer one if the two lists are the same up until the last element and the last element differs only in having one 0'bit flipped to a 1 and one 1 bit flipped to a 0. The distance between the two is 0 if there was a "swap" and 1 if there was a swap.
A second representation for each combination is a sorted list of the bits that are turned on. One possible admissible heuristic for this graph is to use this index list. Each time (in the list of combinations) an index is used at a particular position in the index list, mark it off. The number of positions with un-used indices - the current last changed index is (I believe) the minimal number of swaps that need to happen.
To illustrate this heuristic, consider the sequence abc abd abe* abf* abg afg from above. (the letters would be numbers in my treatment, but that is a minor difference). This sequence (which would be one node in the search-graph) would have the following places marked:
Thus the heuristic would predict that there is at least one swap required (since there are no unmarked elements in position 3 and the current position is 2).
If I get the time, I'll edit this to post code and performance of the algorithm.
Re: the NP completeness result (in a comment to Zac Thompson's answer). The graph on which we are searching for a minimal cost Hamiltonian path has a very special structure. For example, the normally NP-complete Hamiltonian Path problem can be solved in O(n) time with the "enumerate all combinations" algorithm - with n being the number of nodes in the graph. This structure makes it possible that, though on a general graph, vertex cover is hard, on your graph it may be polynomial (even linear or quadratic). Of course, since the graph has a lot of nodes for e.g. n=30, k=8 you may still have a lot of computation ahead of you.
我在 2010 年研究过这个问题,但当时未能找到解决方案。几天前,我再次查看了当时的一些笔记,怀疑我已经非常接近解决方案了。几分钟后我拿到了钥匙。
概括地说:要求是字符串 s 的 k 子集的严格最小更改顺序,以便 LIFO(后进先出)最大化。我在之前的帖子中将其称为最大化“交换”。
我在关键要求之后将该算法称为 maxlifo(最大化 LIFO)。它有两个参数:一个字符串 s(不得包含重复字符)和一个不大于 s 大小的正整数 k。该算法是递归的,即 maxlifo(s, k) 使用 maxlifo(s, k-1) 的输出直至 k=1。输出以列表形式返回。
下面我使用字符串“abcdefg”和不同的 k 值给出一个非正式的解释,并举例说明。接下来是作为 Unicon 过程实现的示例。 (我对任何更常用的语言都不流利。)
k=1 的情况很简单——它按从第一个到最后一个的顺序返回 s 的元素。
对于 k>1,将以下规则应用于 maxlifo(s, k-1) 的输出:
(1) 对于 maxlifo(s, k-1) 输出的每个元素,连续列出构建的 k 子集从该元素依次删除 s 的每个缺失字符。子集中字符的顺序与 s 中相同。
(2) 从第二行开始,用空白“占位符”替换前面一行中出现的所有子集。 s 的每个 k 子集现在只出现一次。
(3) 在每个非空白行中,用首字母 ! 进行标记。每个子集使得下一行中的相同位置有一个占位符。这个标记的意思是“第一”。在每个非空白行中将恰好标记一个子集。
(4) 删除所有完全空白的行(仅包含占位符)。
(5) 在除最后一行之外的其余每一行中,用结尾 ! 进行标记。与下一行中标记为“first”的子集对应的位置中的子集。这个标记的意思是“最后”。
现在我们可以列出子集的最终 maxlifo 排序。从上到下的每一行都是有序的,并且其元素按该顺序添加到输出列表中。
(6) 每行从上到下:
(6.1) 删除所有空白占位符。
(6.2) 将标记为“first”(初始!)的子集添加到输出列表中,并将其从行中删除。
(6.3) 如果行中仍然有子集,则最左边或最右边的子集将被标记为“最后”(最终!)。如果最右边的子集标记为“最后”,则按从左到右的顺序将子集添加到输出列表,否则按从右到左的顺序。
(7) 处理完所有行后,返回输出列表。
使用 maxlifo("abcdefg", 2) 的示例:
Col1 包含 maxlifo("abcdefg", 1) 的输出。 Col2 的行包含由 s 的其余字符形成的派系:
清空出现在较早行中的子集:
标记每行中的“第一个”子集(下面有空格的子集):
删除所有完全空白的行 (在这种情况下只有一个):
标记每行中的“最后一个”子集(下面有“第一个”子集的子集)。
按上述顺序输出每一行:'first'、unmarked、'last':
输出:[ab ag af ae ad ac bc bg bf be bd cd cg cf ce df dg de ef eg fg]
3 <= 的示例k≤6 给出的细节较少。步骤 4 中删除的空白行保留在原处。
maxlifo("abcdefg", 3):
输出:[abc abd abe abf abg afg acg adg aeg aef acf adf ade ace acd
bcd bce bcf bcg bfg bdg beg bef bdf bde
CDE CDF CDG CFG CEG CEF
def deg dfg
efg]
maxlifo("abcdefg", 4):
输出:[abcd abcg abcf abce abde abdg abdf abef abeg abfg aefg adfg acfg acdg aceg adeg adef acef acdf acde
bcde bcdg bcdf bcef bceg bcfg befg bdfg bdeg bdef
cdef cdeg cdfg cefg
defg]
maxlifo("abcdefg", 5):
输出:[abcde abcdf abcdg abccfg abceg abcef abdef abdeg abdfg abefg adefg acefg acdfg acdeg acdef
bcdef bcdeg bcdfg bcefg bdefg
cdefg]
maxlifo("abcdefg", 6):
输出: [abcdef abcdeg abcdfg abcefg abdefg acdefg bcdefg]
Unicon 实现:
I worked on this problem in 2010 but failed to find a solution then. A few days ago I had another look at some of my notes from that time and suspected I had been very close to a solution. A few minutes later I had the key.
To recapitalute: the requirement is a strict minimal change ordering of the k-subsets of a string s such that LIFO (last in first out) is maximised. I refer to this as maximised ‘swapping’ in earlier posts.
I call the algorithm maxlifo (maximised LIFO) after the key requirement. It takes two parameters, a string s, which must not contain duplicated characters, and a positive integer k not greater than the size of s. The algorithm is recursive, i.e. maxlifo(s, k) uses the output of maxlifo(s, k-1) down to k=1. Output is returned as a list.
Below I give an informal explanation, with examples, using the string "abcdefg" and various values of k. This is followed by an example of implementation as a Unicon procedure. (I’m not fluent in any of the more commonly used languages.)
The case k=1 is trivial — it returns the elements of s in order from first to last.
For k>1, apply the following rules to the output of maxlifo(s, k-1):
(1) For each element of the output of maxlifo(s, k-1), list in a row the k-subsets built from that element with each missing character of s in turn. The order of characters in the subsets is as in s.
(2) Working from the second row down, substitute blank ‘placeholders’ for all occurrences of subsets that appear in an earlier row. Each k-subset of s now appears just once.
(3) In each non-blank row, mark with an initial ! each subset such that there is a placeholder at the same position in the next row. This marking means ‘first’. Exactly one subset will be so marked in each non-blank row.
(4) Delete all rows that are completely blank (contain only placeholders).
(5) In each remaining row except the last, mark with a final ! the subset in the position corresponding to the subset marked ‘first’ in the next lower row. This marking means ‘last’.
Now we can list the final maxlifo ordering of the subsets. Each row from top to bottom is ordered and its elements added in that order to the output list.
(6) In each row from the top down:
(6.1) Remove all blank placeholders.
(6.2) Add to the output list the subset marked ‘first’ (initial !) and remove it from the row.
(6.3) If there are still subsets remaining in the row, either the leftmost or the rightmost subset will be marked ‘last’ (final !). If the rightmost subset is marked ‘last’, add the subsets to the output list in order from left to right, otherwise in order from right to left.
(7) After processing all rows, return the output list.
Example using maxlifo("abcdefg", 2):
Col1 contains the output of maxlifo("abcdefg", 1). The rows of Col2 contain the cliques formed with the remaining characters of s:
Blank out subsets that appear in an earlier row:
Mark the ‘first’ subset in each row (the one with a blank below it):
Delete all completely blank rows (only one in this case):
Mark the ’last’ subset in each row (the one with a ‘first’ subset below it).
Output each row in the order described above: ‘first’, unmarked, ’last’:
Output: [ab ag af ae ad ac bc bg bf be bd cd cg cf ce df dg de ef eg fg]
Examples for 3 <= k <= 6 are given in less detail. The blank rows deleted in step 4 are left in place.
maxlifo("abcdefg", 3):
Output: [abc abd abe abf abg afg acg adg aeg aef acf adf ade ace acd
bcd bce bcf bcg bfg bdg beg bef bdf bde
cde cdf cdg cfg ceg cef
def deg dfg
efg]
maxlifo("abcdefg", 4):
Output: [abcd abcg abcf abce abde abdg abdf abef abeg abfg aefg adfg acfg acdg aceg adeg adef acef acdf acde
bcde bcdg bcdf bcef bceg bcfg befg bdfg bdeg bdef
cdef cdeg cdfg cefg
defg]
maxlifo("abcdefg", 5):
Output: [abcde abcdf abcdg abcfg abceg abcef abdef abdeg abdfg abefg adefg acefg acdfg acdeg acdef
bcdef bcdeg bcdfg bcefg bdefg
cdefg]
maxlifo("abcdefg", 6):
Output: [abcdef abcdeg abcdfg abcefg abdefg acdefg bcdefg]
Unicon implementation: