Randomizing a matrix in Perl, keeping row and column totals the same
I have a matrix that I want to randomize a couple of thousand times, while keeping the row and column totals the same:
  1 2 3
A 0 0 1
B 1 1 0
C 1 0 0
An example of a valid random matrix would be:
  1 2 3
A 1 0 0
B 1 1 0
C 0 0 1
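The constraint can also be stated as a quick check in code. This is only a sketch and not part of the original post: the helper names `totals` and `as_string` are made up for illustration, and the sparse hash-of-hashes layout is assumed to mirror the `%matrix` used in the posted code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (\%row_totals, \%col_totals) for a sparse
# 0/1 matrix stored as $m->{row}{col} = 1.
sub totals {
    my ($m) = @_;
    my (%row, %col);
    for my $r (keys %$m) {
        for my $c (keys %{ $m->{$r} }) {
            $row{$r} += $m->{$r}{$c};
            $col{$c} += $m->{$r}{$c};
        }
    }
    return (\%row, \%col);
}

# Flatten a totals hash into a comparable string such as "A=1,B=2,C=1".
sub as_string {
    my ($t) = @_;
    return join ',', map { "$_=$t->{$_}" } sort keys %$t;
}

# The two example matrices from the question.
my %original = (A => { 3 => 1 }, B => { 1 => 1, 2 => 1 }, C => { 1 => 1 });
my %shuffled = (A => { 1 => 1 }, B => { 1 => 1, 2 => 1 }, C => { 3 => 1 });

my ($r1, $c1) = totals(\%original);
my ($r2, $c2) = totals(\%shuffled);
print "rows match\n"    if as_string($r1) eq as_string($r2);
print "columns match\n" if as_string($c1) eq as_string($c2);
```

Any candidate randomization can be passed through the same check before it is accepted.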
My actual matrix is a lot bigger (about 600x600 items), so I really need an approach that is computationally efficient.
My initial (inefficient) approach consisted of shuffling arrays using the Perl Cookbook shuffle.
I pasted my current code below. I've got extra code in place to start with a new shuffled list of numbers, if no solution is found in the while loop. The algorithm works fine for a small matrix, but as soon as I start scaling up it takes forever to find a random matrix that fits the requirements.
Is there a more efficient way to accomplish what I'm searching for?
Thanks a lot!
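#!/usr/bin/perl -w
use strict;
# Sparse 0/1 matrix: $matrix{row}{column} = 1 for every cell holding a 1.
my %matrix = ( 'A' => {'3' => 1 },
               'B' => {'1' => 1,
                       '2' => 1 },
               'C' => {'1' => 1 }
             );
# One (letter, number) entry per 1 in the matrix.
my @letters = ();
my @numbers = ();
foreach my $letter (keys %matrix){
    foreach my $number (keys %{$matrix{$letter}}){
        push (@letters, $letter);
        push (@numbers, $number);
    }
}
my %random_matrix = ();
shuffle(\@numbers);
foreach my $letter (@letters){
    # Reshuffle until the first number is not already used in this row.
    # This loops forever if every remaining number is already used
    # (the restart code mentioned above is not shown here).
    while (exists($random_matrix{$letter}{$numbers[0]})){
        shuffle(\@numbers);
    }
    my $chosen_number = shift (@numbers);
    $random_matrix{$letter}{$chosen_number} = 1;
}
# In-place Fisher-Yates shuffle of the array behind the given reference.
sub shuffle {
    my $array = shift;
    my $i = scalar(@$array);
    my $j;
    foreach my $item (@$array) {
        --$i;
        $j = int rand ($i+1);
        next if $i == $j;
        @$array[$i,$j] = @$array[$j,$i];
    }
    return @$array;
}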
Comments (6)
The problem with your current algorithm is that you are trying to shuffle your way out of dead ends -- specifically, when your @letters and @numbers arrays (after the initial shuffle of @numbers) yield the same cell more than once. That approach works when the matrix is small, because it doesn't take too many tries to find a viable re-shuffle. However, it's a killer when the lists are big. Even if you could hunt for alternatives more efficiently -- for example, trying permutations rather than random shuffling -- the approach is probably doomed.
Rather than shuffling entire lists, you might tackle the problem by making small modifications to an existing matrix.
For example, let's start with your example matrix (call it M1). Randomly pick one cell to change (say, A1). At this point the matrix is in an illegal state. Our goal will be to fix it in the minimum number of edits -- specifically 3 more edits. You implement these 3 additional edits by "walking" around the matrix, with each repair of a row or column yielding another problem to be solved, until you have walked full circle (err ... full rectangle).
For example, after changing A1 from 0 to 1, there are 3 ways to walk for the next repair: A3, B1, and C1. Let's decide that the 1st edit should fix rows. So we pick A3. On the second edit, we will fix the column, so we have choices: B3 or C3 (say, C3). The final repair offers only one choice (C1), because we need to return to the column of our original edit. The end result is a new, valid matrix.
If an editing path leads to a dead end, you backtrack. If all of the repair paths fail, the initial edit can be rejected.
This approach will generate new, valid matrixes quickly. It will not necessarily produce random outcomes: M1 and M2 will still be highly correlated with each other, a point that will become more directly evident as the size of the matrix grows.
How do you increase the randomness? You mentioned that most cells (99% or more) are zeros. One idea would be to proceed like this: for each 1 in the matrix, set its value to 0 and then repair the matrix using the 4-edit method outlined above. In effect, you would be moving all of the ones to new, random locations.
Here is an illustration. There are probably further speed optimizations in here, but this approach yielded 10 new 600x600 matrixes, at 0.5% density, in 30 seconds or so on my Windows box. Don't know if that's fast enough.
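What follows is not the answer's original code. It is a minimal sketch of the 4-edit idea, assuming the sparse hash-of-hashes layout from the question. Instead of the backtracking walk described above, it picks two 1s at random and rejects the attempt when the opposite corners of their rectangle are not both 0. The name `swap_randomize` and the `$attempts` parameter are made up for illustration, and the sketch does not settle how many attempts give a well-mixed result.

```perl
use strict;
use warnings;

# Sketch: randomize a sparse 0/1 matrix ($m->{row}{col} = 1) in place by
# repeated rectangle swaps. Each accepted swap is 4 edits: two 1s are
# cleared and the two 0s on the opposite corners are set, so every row
# and column total is unchanged.
sub swap_randomize {
    my ($m, $attempts) = @_;

    # List of [row, col] pairs, one for every 1 in the matrix.
    my @ones;
    for my $r (keys %$m) {
        push @ones, [$r, $_] for keys %{ $m->{$r} };
    }
    return if @ones < 2;

    for (1 .. $attempts) {
        my $p = $ones[ int rand @ones ];
        my $q = $ones[ int rand @ones ];
        my ($r1, $c1) = @$p;
        my ($r2, $c2) = @$q;

        # Need a true rectangle whose other two corners are both 0.
        next if $r1 eq $r2 or $c1 eq $c2;
        next if exists $m->{$r1}{$c2} or exists $m->{$r2}{$c1};

        delete $m->{$r1}{$c1};
        delete $m->{$r2}{$c2};
        $m->{$r1}{$c2} = 1;
        $m->{$r2}{$c1} = 1;

        # Keep the list of ones in sync with the matrix.
        @$p = ($r1, $c2);
        @$q = ($r2, $c1);
    }
    return;
}

my %matrix = (A => { 3 => 1 }, B => { 1 => 1, 2 => 1 }, C => { 1 => 1 });
swap_randomize(\%matrix, 100);
for my $r (sort keys %matrix) {
    print "$r: @{[ sort keys %{ $matrix{$r} } ]}\n";
}
```

Because only the 1s are stored and touched, the cost per attempt does not depend on the matrix dimensions, which suits a 600x600 matrix that is mostly zeros.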
Step 1: First I would initialize the matrix to zeros and calculate the required row and column totals.
Step 2: Now pick a random row, weighted by the count of 1s that must be in that row (so a row with count 300 is more likely to be picked than a row with count 5).
Step 3: For this row, pick a random column, weighted by the count of 1s in that column (except ignore any cells that may already contain a 1 - more on this later).
Step 4: Place a one in this cell and reduce both the row and column count for the appropriate row and column.
Step 5: Go back to step 2 until no rows have non-zero count.
The problem though is that this algorithm can fail to terminate because you may have a row where you need to place a one, and a column that needs a one, but you've already placed a one in that cell, so you get 'stuck'. I'm not sure how likely this is to happen, but I wouldn't be surprised if it happened very frequently - enough to make the algorithm unusable. If this is a problem I can think of two ways to fix it:
a) Construct the above algorithm recursively and allow backtracking on failure.
b) Allow a cell to contain a value greater than 1 if there is no other option and keep going. Then at the end you have a correct row and column count but some cells may contain numbers greater than 1. You can fix this by finding a grouping that looks like this:
and changing it to:
It should be easy to find such a grouping if you have many zeros. I think b) is likely to be faster.
I'm not sure it's the best way, but it's probably faster than shuffling arrays. I'll be tracking this question to see what other people come up with.
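Steps 1 to 5 can be sketched as follows, assuming dense array-of-arrays storage and integer totals whose row and column sums agree. When an attempt gets stuck, this sketch simply restarts from scratch, which is simpler than either fix (a) or (b) and is not what the answer proposes. The names `weighted_pick`, `try_fill`, `weighted_fill` and the `$max_tries` parameter are made up for illustration, and the sketch makes no claim that the results are uniformly distributed over all valid matrices.

```perl
use strict;
use warnings;

# Pick an index at random with probability proportional to its
# (non-negative integer) weight. Returns undef if all weights are zero.
sub weighted_pick {
    my (@w) = @_;
    my $total = 0;
    $total += $_ for @w;
    return if $total <= 0;
    my $x = int rand $total;
    for my $i (0 .. $#w) {
        return $i if ($x -= $w[$i]) < 0;
    }
    return;
}

# One attempt at steps 1-5. Returns the filled matrix (array of arrays),
# or nothing if the attempt gets stuck.
sub try_fill {
    my ($row_tot, $col_tot) = @_;
    my @r = @$row_tot;                          # 1s still needed per row
    my @c = @$col_tot;                          # 1s still needed per column
    my @m = map { [ (0) x scalar @c ] } @r;     # step 1: all zeros

    while (defined(my $i = weighted_pick(@r))) {    # steps 2 and 5
        # Step 3: weight columns by remaining count, skipping filled cells.
        my @w = map { $m[$i][$_] ? 0 : $c[$_] } 0 .. $#c;
        my $j = weighted_pick(@w);
        return unless defined $j;               # stuck: no usable column
        $m[$i][$j] = 1;                         # step 4
        $r[$i]--;
        $c[$j]--;
    }
    return \@m;
}

# Retry from scratch until an attempt succeeds or $max_tries is used up.
sub weighted_fill {
    my ($row_tot, $col_tot, $max_tries) = @_;
    for (1 .. $max_tries) {
        my $m = try_fill($row_tot, $col_tot);
        return $m if $m;
    }
    return;
}

my $filled = weighted_fill([1, 2, 1], [2, 1, 1], 1000)
    or die "no matrix found\n";
print "@$_\n" for @$filled;
```

How often the restart is needed on a large, sparse matrix is exactly the open question raised above, so the retry count is worth measuring before relying on this approach.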
I'm not a mathematician, but I figure that if you need to keep the same column and row totals, then random versions of the matrix will have the same quantity of ones and zeros.
Correct me if I'm wrong, but that would mean that making subsequent versions of the matrix would only require you to shuffle around the rows and columns.
Randomly shuffling columns won't change your totals for rows and columns, and randomly shuffling rows won't either. So, what I would do, is first shuffle rows, and then shuffle columns.
That should be pretty fast.
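A small sketch of this idea, assuming a dense array-of-arrays matrix; the function name `shuffle_rows_and_columns` is made up for illustration. One caveat is worth checking with it: permuting rows leaves each column's total alone but moves the row totals to different positions, so the totals are preserved only as a set of values unless the row and column labels travel with the data.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Return a copy of a dense matrix (array of arrays) with its rows and
# then its columns put in random order.
sub shuffle_rows_and_columns {
    my ($m) = @_;
    my @row_order = shuffle 0 .. $#$m;
    my @col_order = shuffle 0 .. $#{ $m->[0] };
    return [ map { [ @{ $m->[$_] }[@col_order] ] } @row_order ];
}

my $m = [ [0, 0, 1], [1, 1, 0], [1, 0, 0] ];
my $s = shuffle_rows_and_columns($m);

# Same values before and after, but possibly in a different order.
my @row_before = map { sum @$_ } @$m;
my @row_after  = map { sum @$_ } @$s;
print "row totals before: @row_before\n";
print "row totals after:  @row_after\n";
```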
Not sure if it will help, but you can try starting from one corner and, for each row and column, tracking both the required total and the sum placed so far. Instead of trying to hit a good matrix by chance, treat each total as an amount to be split across the cells. For each element, take the smaller of (row total - current row sum) and (column total - current column sum). That gives you the upper bound for the random number in that cell.
Is it clear? Sorry I don't know Perl, so I cannot show any code.
Like @Gabriel I'm not a Perl programmer so it's possible that this is what your code already does ...
You've only posted one example. It's not clear whether you want a random matrix which has the same number of 1s in each row and column as your start matrix, or one which has the same rows and columns but shuffled. If the latter is good enough you could create an array of row (or column, it doesn't matter) indexes and randomly permute that. You can then read your original array in the order specified by the randomised index. No need to modify the original array or create a copy.
Of course, this might not meet aspects of your requirements which are not explicit.
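The index-array idea might look like this in Perl (a sketch written for illustration, not the answerer's code, assuming the matrix is held as an array of row arrays; the variable names are made up):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @matrix = ([0, 0, 1], [1, 1, 0], [1, 0, 0]);

# Randomly permuted row indexes; @matrix itself is never modified or copied.
my @row_index = shuffle 0 .. $#matrix;

# Read the rows in the shuffled order through the index array.
for my $i (@row_index) {
    print join(' ', @{ $matrix[$i] }), "\n";
}
```

A second index array over the columns would work the same way, and each new randomization then costs only a reshuffle of the index arrays.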
Thanks to FMc for the Perl code. Based on that solution, I rewrote it in Python (for my own use, and shared here for clarity), as shown below: