Permute all unique enumerations of a vector in R
I'm trying to find a function that will generate all the unique permutations of a vector, while not counting juxtapositions within subsets of the same element type. For example:
dat <- c(1,0,3,4,1,0,0,3,0,4)
has
factorial(10)
> 3628800
possible permutations, but only 10!/(2!*2!*4!*2!)
factorial(10)/(factorial(2)*factorial(2)*factorial(2)*factorial(4))
> 18900
unique permutations when ignoring juxtapositions within subsets of the same element type.
I can get this by using unique() and the permn() function from the package combinat:
unique( permn(dat) )
but this is computationally very expensive, since it involves enumerating all n! permutations, which can be orders of magnitude more than I need (192 times more in this example). Is there a way to do this without first generating all n! permutations?
Comments (7)
EDIT: Here's a faster answer; again based on the ideas of Louisa Grey and Bryce Wagner, but with faster R code thanks to better use of matrix indexing. It's quite a bit faster than my original:
And the code:
It doesn't return the same order, but after sorting, the results are identical.
For my first attempt, see the edit history.
The following function (which implements the classic formula for permutations with repetition, just as you did manually in your question) seems quite fast to me:
It does compute n!, but unlike the permn function it does not generate all permutations first. See it in action:
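A minimal sketch of such a counting function, assuming the multiplicities are taken with table(). The name count_unique_perms is made up for illustration and this is not the answer's original code:

```r
# Count the distinct permutations of a vector with repeated values using
# n! / (k1! * k2! * ...), where the k's are the multiplicities.
# `count_unique_perms` is a hypothetical name chosen for this sketch.
count_unique_perms <- function(x) {
  counts <- as.vector(table(x))        # multiplicity of each distinct value
  factorial(length(x)) / prod(factorial(counts))
}

dat <- c(1, 0, 3, 4, 1, 0, 0, 3, 0, 4)
count_unique_perms(dat)                # 10! / (4! * 2! * 2! * 2!)
```

This only counts the arrangements; it never builds them, which is why it returns immediately even for long vectors.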
UPDATE: I have just realized that the question was about generating all unique permutations not just specifying the number of them - sorry for that!
You could improve the unique(perm(...)) part by computing the unique permutations for one less element and then adding the unique elements in front of them. Well, my explanation may fail, so let the source speak:
This way you could gain some speed. I was too lazy to run the code on the vector you provided (it took so much time), so here is a small comparison on a smaller vector:
I think you could gain a lot more by rewriting this function to be recursive!
UPDATE (again): I have tried to make up a recursive function with my limited knowledge:
Which has a great gain:
Please report back if this would work for you!
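The recursive idea can be sketched roughly as follows: fix each distinct value in the first position and recurse on the remaining elements. The name unique_perms is an assumption made for this illustration, and the code is not the original function from this answer:

```r
# Recursively build the distinct permutations of `x`, one per row.
# `unique_perms` is a hypothetical name; this is only a sketch of the idea.
unique_perms <- function(x) {
  if (length(x) <= 1) {
    return(matrix(x, nrow = 1))
  }
  # Put each distinct value first, then permute what is left.
  blocks <- lapply(unique(x), function(v) {
    rest <- x[-match(v, x)]            # drop one occurrence of v
    cbind(v, unique_perms(rest), deparse.level = 0)
  })
  do.call(rbind, blocks)
}

small <- c(1, 1, 2, 3)
perms <- unique_perms(small)
nrow(perms)                            # 12, i.e. 4! / 2!
```

Because each branch starts with a different value, no duplicate row is ever produced, so no call to unique() is needed afterwards.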
One option that hasn't been mentioned here is the allPerm function from the multicool package. It can be used pretty easily to get all the unique permutations:
In benchmarking I found it to be faster on dat than the solutions from the OP and daroczig but slower than the solution from Aaron.
I don't actually know R, but here's how I'd approach the problem:
Find how many of each element type there are, i.e. four 0s, two 1s, two 3s and two 4s.
Sort by frequency (which the above already is).
Start with the most frequent value, which takes up 4 of the 10 spots. Determine the unique combinations of 4 values within the 10 available spots.
(0,1,2,3),(0,1,2,4),(0,1,2,5),(0,1,2,6)
... (0,1,2,9),(0,1,3,4),(0,1,3,5)
... (6,7,8,9)
Go to the second most frequent value, which takes up 2 of the 6 remaining spots, and determine its unique combinations of 2 of 6.
(0,1),(0,2),(0,3),(0,4),(0,5),(1,2),(1,3) ... (3,5),(4,5)
Then 2 of 4:
(0,1),(0,2),(0,3),(1,2),(1,3),(2,3)
And the remaining values, 2 of 2:
(0,1)
Then you need to combine them into each possible combination. Here's some pseudocode (I'm convinced there's a more efficient algorithm for this, but this shouldn't be too bad):
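In R, the steps above could be sketched with combn(), which enumerates the ways to pick k of the free slots. The function name place_by_combinations is hypothetical, the input is assumed to be numeric without NA values, and R indexes positions from 1 rather than 0 as in the lists above:

```r
# Fill the free slots value by value, most frequent value first.
# `place_by_combinations` is a hypothetical name used only for this sketch.
place_by_combinations <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  values <- as.numeric(names(counts))  # assumes a numeric input vector
  results <- list(rep(NA_real_, length(x)))  # one empty arrangement
  for (i in seq_along(values)) {
    k <- counts[[i]]
    results <- unlist(lapply(results, function(partial) {
      free <- which(is.na(partial))    # slots not yet occupied
      picks <- combn(length(free), k)  # each column: one choice of k slots
      lapply(seq_len(ncol(picks)), function(j) {
        partial[free[picks[, j]]] <- values[i]
        partial
      })
    }), recursive = FALSE)
  }
  do.call(rbind, results)              # one arrangement per row
}

dat <- c(1, 0, 3, 4, 1, 0, 0, 3, 0, 4)
res <- place_by_combinations(dat)
nrow(res)                              # 210 * 15 * 6 * 1 = 18900
```

Every intermediate list holds only valid partial arrangements, so the work grows with the number of unique permutations rather than with n!.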
Another option is the iterpc package, which I believe is the fastest of the existing methods. More importantly, the result is in dictionary order (which may be preferable).
The benchmark indicates that iterpc is significantly faster than all other methods described here.
As this question is old and continues to attract many views, this post is solely meant to inform R users of the current state of the language with regard to performing the popular task outlined by the OP. As @RandyLai alludes to, there are packages developed with this task in mind. They are: arrangements and RcppAlgos*.
Efficiency
They are very efficient and quite easy to use for generating permutations of a multiset.
With RcppAlgos we can utilize parallel processing for even better efficiency on larger examples.
Lexicographical Order
A nice benefit of these packages is that the output is in lexicographical order:
Iterators
Additionally, both packages offer iterators that allow for memory-efficient generation of permutations, one by one:
* I am the author of RcppAlgos
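To illustrate what one-by-one generation in lexicographical order means, here is a base R sketch of the classic next-permutation step. The helper next_perm is hypothetical and is not part of arrangements or RcppAlgos, whose iterators are implemented in compiled code and will be much faster:

```r
# Return the next distinct permutation in lexicographic order, or NULL when
# `p` is already the last one. `next_perm` is a hypothetical helper.
next_perm <- function(p) {
  n <- length(p)
  if (n < 2) return(NULL)
  # rightmost position i with p[i] < p[i + 1]
  ascents <- which(p[-n] < p[-1])
  if (length(ascents) == 0) return(NULL)
  i <- max(ascents)
  # rightmost position j > i holding a value larger than p[i]
  j <- max(which(p > p[i] & seq_len(n) > i))
  p[c(i, j)] <- p[c(j, i)]
  # reverse the tail so that it is ascending again
  p[(i + 1):n] <- rev(p[(i + 1):n])
  p
}

p <- sort(c(1, 0, 0, 3))               # start from the smallest arrangement
count <- 1
while (!is.null(p <- next_perm(p))) count <- count + 1
count                                  # 12, i.e. 4! / 2!
```

Only the current permutation is held in memory, which is the property that makes iterators attractive when the full result would not fit in RAM.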
Another option is to use the Rcpp package. The difference is that it returns a list.