从每个列表中最优地选取一个元素
我遇到了一个老问题,你们 Mathematica/StackOverflow 的人可能会喜欢,并且对于后代来说,在 StackOverflow 上看起来很有价值。
假设您有一个列表列表,并且您想从每个列表中选取一个元素并将它们放入一个新列表中,以便最大化与其下一个邻居相同的元素数量。 换句话说,对于结果列表 l,最小化 Length@Split[l]。 换句话说,我们希望列表中相同连续元素的中断最少。
例如:(
pick[{ {1,2,3}, {2,3}, {1}, {1,3,4}, {4,1} }]
--> { 2, 2, 1, 1, 1 }
或者 {3,3,1,1,1} 同样好。)
这是一个荒谬的蛮力解决方案:
pick[x_] := argMax[-Length@Split[#]&, Tuples[x]]
其中 argMax 如下所述:
posmax:与 argmax 类似,但给出了 f[x] 最大的元素 x 的位置
你能想出更好的办法吗? 传奇人物卡尔·沃尔(Carl Woll)为我解决了这个问题,我将在一周内透露他的解决方案。
I came across an old problem that you Mathematica/StackOverflow folks will probably like and that seems valuable to have on StackOverflow for posterity.
Suppose you have a list of lists and you want to pick one element from each and put them in a new list so that the number of elements that are identical to their next neighbor is maximized.
In other words, for the resulting list l, minimize Length@Split[l].
In yet other words, we want the list with the fewest interruptions of identical contiguous elements.
For example:
pick[{ {1,2,3}, {2,3}, {1}, {1,3,4}, {4,1} }]
--> { 2, 2, 1, 1, 1 }
(Or {3,3,1,1,1} is equally good.)
Here's a preposterously brute force solution:
pick[x_] := argMax[-Length@Split[#]&, Tuples[x]]
where argMax is as described here:
posmax: like argmax but gives the position(s) of the element x for which f[x] is maximal
Can you come up with something better?
The legendary Carl Woll nailed this for me and I'll reveal his solution in a week.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
不是答案,而是这里提出的方法的比较。我生成了具有可变数量子集的测试集,该数量从 5 到 100 不等。每个测试集都是使用此代码生成的,
其中 rl 是涉及的子集数量。
对于以这种方式生成的每个测试集,我让所有算法都完成它们的工作。我这样做了 10 次(使用相同的测试集),算法以随机顺序运行,以平衡顺序效应和笔记本电脑上随机后台进程的影响。这会得出给定数据集的平均时间。上述直线对于每个 rl 长度使用 20 次,从中计算平均值(均值)和标准差。
结果如下(水平方向为子集数量,垂直方向为平均 AbsoluteTiming):
在此处输入图像描述">
更新
正如 Timo 在此所要求的,计时是可以从中选择的不同子集元素的数量以及每个子集中的元素的最大数量的函数。根据这行代码,为固定数量的子集 (50) 生成数据集:
我还将每个值尝试的数据集数量从 20 增加到 40。
这里有 5 个子集:
Not an answer, but a comparison of the methods proposed here. I generated test sets with a variable number of subsets this number varying from 5 to 100. Each test set was generated with this code
with rl the number of subsets involved.
For every test set that was generated this way I had all the algorithms do their thing. I did this 10 times (with the same test set) with the algorithms operating in a random order so as to level out order effects and the effects of random background processes on my laptop. This results in mean timing for the given data set. The above line was used 20 times for each rl length, from which a mean (of means) and a standard deviation were calculated.
The results are below (horizontally the number of subsets and vertically the mean AbsoluteTiming):
It seems that Mr.Wizard is the (not so clear) winner. Congrats!
Update
As requested by Timo here the timings as a function of the number of distinct subset elements that can be chosen from as well as the maximum number of elements in each subset. The data sets are generated for a fixed number of subsets (50) according to this line of code:
I also increased the number of datasets I tried for each value from 20 to 40.
Here for 5 subsets:
我会把这个扔进戒指里。我不确定它总是给出最佳解决方案,但它似乎与给出的其他一些答案的逻辑相同,而且速度很快。
findruns 提供游程编码输出,包括并行答案。如果需要严格指定的输出,请使用:
这是使用 Fold 的变体。在某些设定形状上速度更快,但在其他形状上速度稍慢。
I'll toss this into the ring. I am not certain it always gives an optimal solution, but it appears to work on the same logic as some other answers given, and it is fast.
findruns
gives run-length-encoded output, including parallel answers. If output as strictly specified is required, use:Here is a variation using Fold. It is faster on some set shapes, but a little slower on others.
这是我的看法,与 Sjoerd 做的事情几乎相同,只是代码量较少。
一些画廊:
编辑鉴于
Sjoerd的Dreeves的蛮力方法由于无法一次生成所有元组而在大样本上失败,这里是另一种蛮力方法:This brute-force -best-pick 可能会产生不同的分割,但根据原始问题,长度才是重要的。
在这个例子中 pick 失败了。
我发布此内容是为了防止有人想要搜索反例,例如 pickPath 或 LongestRuns 等代码确实会生成中断次数最少的序列。
This is my take on it, and does pretty much the same thing as Sjoerd, just in a less amount of code.
Some gallery:
EDIT given that
Sjoerd'sDreeves's brute force approach fails on large samples due to inability to generate all Tuples at once, here is another brute force approach:This brute-force-best-pick might generate different splitting, but it is length that matters according to the original question.
pick fails on this example.
I am posting this in case somebody might want to search for counterexamples that the code like pickPath or LongestRuns does indeed generate a sequence with smallest number of interruptions.
尝试一下...
runsByN:对于每个数字,显示它是否出现在每个子列表中
runsByN
是list
转置的,插入零来表示缺失的数字。它显示了出现 1、2、3 和 4 的子列表。myPick:挑选构成最佳路径的数字
myPick
递归地构建最长运行列表。它并不寻找所有最佳解决方案,而是寻找最小长度的第一个解决方案。感谢 Mr.Wizard 建议使用替换规则作为
TakeWhile
的有效替代方案。结语:可视化解决方案
路径 下图展示了
list
中的数据。每个绘制点对应于一个数字和它出现的子列表。
Here's a go at it...
runsByN: For each number, show whether it appears or not in each sublist
runsByN
islist
transposed, with zeros inserted to represent missing numbers. It shows the sublists in which 1, 2, 3, and 4 appeared.myPick: Picking numbers that constitute an optimal path
myPick
recursively builds a list of the longest runs. It doesn't look for all optimal solutions, but rather the first solution of minimal length.Thanks to Mr.Wizard for suggesting the use of a replacement rule as an efficient alternative to
TakeWhile
.Epilog:Visualizing the solution path
The chart below represents the data from
list
.Each plotted point corresponds to a number and the sublist in which it occurred.
这是我的“一行”,经过 Mr.Wizard 的改进:
它基本上在连续列表上重复使用交集,直到它变为空,然后一次又一次地执行。在一个巨大的折磨测试案例中,
我在 2GHz Core 2 Duo 上得到的
Timing[]
始终在 0.032 左右。下面是我的第一次尝试,我将其留给您细读。
对于给定的元素列表列表
M
,我们计算不同元素和列表数量,按规范顺序列出不同元素,并构造一个矩阵K[i,j]
code> 详细说明了列表j
中元素i
的存在:现在的问题相当于从左到右遍历该矩阵,仅步进 1,并将行更改为尽可能少几次。
为了实现这一点,我对行进行
排序
,在开头取连续 1 最多的行,跟踪我选择的元素,删除
来自的许多列K
并重复:它的
AbsoluteTiming
大约是 Sjoerd 的 方法。So here is my "one liner" with improvements by Mr.Wizard:
It basically uses intersection repeatedly on consecutive lists until it comes up empty, and then does it again and again. In a humongous torture test case with
I get
Timing[]
consistently around 0.032 on my 2GHz Core 2 Duo.Below this point is my first attempt, which I'll leave for your perusal.
For a given list of lists of elements
M
we count the different elements and the number of lists, list the different elements in canonical order, and construct a matrixK[i,j]
detailing the presence of elementi
in listj
:The problem is now equivalent to traversing this matrix from left to right, by only stepping on 1's, and changing rows as few times as possible.
To achieve this I
Sort
the rows, take the one with the most consecutive 1's at the start, keep track of what element I picked,Drop
that many columns fromK
and repeat:This has an
AbsoluteTiming
of approximately three times that of Sjoerd's approach.我的解决方案基于“贪婪是好的”这一观察。如果我可以在中断一条链条和开始一条新的、可能很长的链条之间做出选择,那么选择新的链条继续对我没有任何好处。新链条变长的量与旧链条变短的量相同。
因此,该算法基本上所做的就是从第一个子列表开始,为其每个成员查找具有相同成员的附加子列表的数量,并选择具有最邻近双胞胎的子列表成员。然后,此过程在第一个链末尾的子列表中继续,依此类推。
因此,将其结合到递归算法中,我们最终得到:
测试
Dreeves 的蛮力方法
我第一次使用稍长的测试列表。这种强力方法使我的计算机几乎陷入停顿,占用了它拥有的所有内存。很糟糕。 10 分钟后我不得不重新启动。由于电脑变得极其无响应,重新启动又花了我四分之一的时间。
My solution is based on the observation that 'greed is good' here. If I have the choice between interrupting a chain and beginning a new, potentially long chain, picking the new one to continue doesn't do me any good. The new chain gets longer with the same amount as the old chain gets shorter.
So, what the algorithm basically does is starting at the first sublist and for each of its members finding the number of additional sublists that have the same member and choosing the sublist member that has the most neighboring twins. This process then continues at the sublist at the end of this first chain and so on.
So combining this in a recursive algorithm we end up with:
Test
Dreeves' Brute Force approach
The first time I used a slightly longer test list. The brute force approach brought my computer to a virtual standstill, claiming all the memory it had. Pretty bad. I had to restart after 10 minutes. Restarting took me another quarter, due to the PC becoming extremely non-responsive.
可以使用整数线性规划。这是代码。
您的示例:
Out[88]= {1, {2, 2, 1, 1, 1}}
对于较大的问题,Minimize 可能会遇到麻烦,因为它使用精确的方法来解决宽松的 LP。在这种情况下,您可能需要切换到 NMinimize,并将域参数更改为 Element[fvars,Integers] 形式的约束。
丹尼尔·利希布劳
Could use integer linear programming. Here is code for that.
Your example:
Out[88]= {1, {2, 2, 1, 1, 1}}
For larger problems Minimize might run into trouble since it uses exact methods for solving relaxed LPs. In which case you might need to switch to NMinimize, and change the domain argument to a constraint of the form Element[fvars,Integers].
Daniel Lichtblau
一周就到了!这是卡尔·沃尔提出的传说中的解决方案。 (我试图让他自己发布。卡尔,如果你遇到这个并想获得官方认可,只需将其粘贴为单独的答案,我将删除这个!)
仍然引用卡尔:
A week is up! Here is the fabled solution from Carl Woll. (I tried to get him to post it himself. Carl, if you come across this and want to take official credit, just paste it in as a separate answer and I'll delete this one!)
Still quoting Carl: