Sorting a binary 2D matrix?
I'm looking for some pointers here as I don't quite know where to start researching this one.
I have a 2D matrix with 0 or 1 in each cell, such as:
  1 2 3 4
A 0 1 1 0
B 1 1 1 0
C 0 1 0 0
D 1 1 0 0
And I'd like to sort it so it is as "upper triangular" as possible, like so:
  4 3 1 2
B 0 1 1 1
A 0 1 0 1
D 0 0 1 1
C 0 0 0 1
The rows and columns must remain intact, i.e. elements can't be moved individually and can only be swapped "whole".
I understand that there'll probably be pathological cases where a matrix has multiple possible sorted results (i.e. same shape, but differ in the identity of the "original" rows/columns.)
So, can anyone suggest where I might find some starting points for this? An existing library/algorithm would be great, but I'll settle for knowing the name of the problem I'm trying to solve!
I doubt it's a linear algebra problem as such, and maybe there's some kind of image processing technique that's applicable.
Any other ideas aside, my initial guess is just to write a simple insertion sort on the rows, then the columns and iterate that until it stabilises (and hope that detecting the pathological cases isn't too hard.)
More details: Some more information on what I'm trying to do may help clarify. Each row represents a competitor, each column represents a challenge. Each 1 or 0 represents "success" for the competitor on a particular challenge.
By sorting the matrix so all 1s are in the top-right, I hope to then provide a ranking of the intrinsic difficulty of each challenge and a ranking of the competitors (which will take into account the difficulty of the challenges they succeeded at, not just the number of successes.)
Note on accepted answer: I've accepted Simulated Annealing as "the answer" with the caveat that this question doesn't have a right answer. It seems like a good approach, though I haven't actually managed to come up with a scoring function that works for my problem.
Comments (6)
An algorithm based upon simulated annealing can handle this sort of thing without too much trouble. It is not a great fit if you have small matrices, which most likely have a fixed solution, but it works well if your matrices get larger and the problem becomes more difficult.
(However, it also fails your desire that insertions can be done incrementally.)
Preliminaries
Devise a performance function that "scores" a matrix - matrices that are closer to your triangleness should get a better score than those that are less triangle-y.
Devise a set of operations that are allowed on the matrix. Your description was a little ambiguous, but if you can swap rows then one op would be SwapRows(a, b). Another could be SwapCols(a, b).
The Annealing loop
I won't give a full exposition here, but the idea is simple. You perform random transformations on the matrix using your operations. You measure how much "better" the matrix is after the operation (using the performance function before and after the operation). Then you decide whether to commit that transformation. You repeat this process a lot.
Deciding whether to commit the transform is the fun part: you need to decide whether to perform that operation or not. Toward the end of the annealing process, you only accept transformations that improved the score of the matrix. But earlier on, in a more chaotic time, you allow transformations that don't improve the score. In the beginning, the algorithm is "hot" and anything goes. Eventually, the algorithm cools and only good transforms are allowed. If you linearly cool the algorithm, then the choice of whether to accept a transformation is:
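One way to sketch the loop and the acceptance rule under linear cooling is shown here. This is a minimal illustration, not the answer's original code: the names swap_rows, swap_cols, should_accept and anneal are hypothetical, the temperature is assumed to run linearly from 1.0 down toward 0.0, and the scorer is passed in as a parameter where a higher score means a better matrix.

```python
import random

def swap_rows(matrix, a, b):
    """Swap two whole rows in place."""
    matrix[a], matrix[b] = matrix[b], matrix[a]

def swap_cols(matrix, a, b):
    """Swap two whole columns in place."""
    for row in matrix:
        row[a], row[b] = row[b], row[a]

def should_accept(delta, temperature, rng):
    """delta = new score - old score; temperature goes from 1.0 (hot) to 0.0 (cold)."""
    if delta > 0:
        return True                       # improvements are always kept
    return rng.random() < temperature     # other moves survive only while hot

def anneal(matrix, score, steps=10_000, seed=0):
    """Return (best matrix found, its score); the input matrix is not modified."""
    rng = random.Random(seed)
    current = [row[:] for row in matrix]
    current_score = score(current)
    best, best_score = [row[:] for row in current], current_score
    n_rows, n_cols = len(current), len(current[0])
    for step in range(steps):
        temperature = 1.0 - step / steps  # linear cooling (an assumption)
        candidate = [row[:] for row in current]
        if rng.random() < 0.5:
            swap_rows(candidate, rng.randrange(n_rows), rng.randrange(n_rows))
        else:
            swap_cols(candidate, rng.randrange(n_cols), rng.randrange(n_cols))
        candidate_score = score(candidate)
        if should_accept(candidate_score - current_score, temperature, rng):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = [row[:] for row in current], current_score
    return best, best_score
```

Keeping the best matrix seen so far is a common convenience rather than part of the description above; the number of steps and the cooling schedule would need tuning for real data.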
You should read the excellent information contained in Numerical Recipes for more information on this algorithm.
Long story short, you should learn some of these general purpose algorithms. Doing so will allow you to solve large classes of problems that are hard to solve analytically.
Scoring algorithm
This is probably the trickiest part. You will want to devise a scorer that guides the annealing process toward your goal. The scorer should be a continuous function that results in larger numbers as the matrix approaches the ideal solution.
How do you measure the "ideal solution" - triangleness? Here is a naive and easy scorer: For every point, you know whether it should be 1 or 0. Add +1 to the score if the matrix is right, -1 if it's wrong. Here's some code so I can be explicit (not tested! please review!)
With this scoring algorithm, a random field of 1s and 0s will give a score of 0. An "opposite" triangle will give the most negative score, and the correct solution will give the most positive score. Diffing two scores will give you the cost.
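A minimal sketch of such a naive scorer follows. The function name triangle_score is hypothetical, and the definition of the ideal shape is an assumption: a cell is expected to be 1 when it lies on or to the right of the diagonal (scaled for non-square matrices), and 0 otherwise.

```python
def triangle_score(matrix):
    """+1 for every cell that matches the ideal upper-right triangle, -1 otherwise."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    score = 0
    for r, row in enumerate(matrix):
        for c, cell in enumerate(row):
            # Assumed ideal: 1 on or right of the (scaled) diagonal, else 0.
            expected = 1 if c * n_rows >= r * n_cols else 0
            score += 1 if cell == expected else -1
    return score
```

On the question's 4x4 example this gives -4 for the unsorted matrix and 12 for the sorted one, so the difference between two scores can serve as the cost of a swap.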
If this scorer doesn't work for you, then you will need to "tune" it until it produces the matrices you want.
This algorithm is based on the premise that tuning this scorer is much simpler than devising the optimal algorithm for sorting the matrix.
I came up with the below algorithm, and it seems to work correctly.
Phase 1: move rows with most 1s up and columns with most 1s right.
Sort the rows by their number of 1s. We don't care if 2 rows have the same number of 1s.
Sort the columns by counting their 1s. We don't care if 2 cols have the same number of 1s.
Phase 2: repeat phase 1 but with extra criteria, so that we satisfy the triangular matrix morph.
Criterion for rows: if 2 rows have the same number of 1s, we move up the row that begins with fewer 0s.
Criterion for cols: if 2 cols have the same number of 1s, we move right the col that has fewer 0s at the bottom.
Example:
Phase 1
Phase 2
Edit: it turns out that my algorithm doesn't always give proper triangular matrices.
For example:
Phase 1
Phase 2
(*) Perhaps a phase 3 will increase the good results. In that phase we place the rows that start with fewer 0s in the top.
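The two phases of the count-based answer above can be sketched roughly as follows. The helper names are hypothetical, and the tie-breaks are one reading of the stated criteria: rows with equal counts are ordered by fewer leading 0s, columns with equal counts by fewer 0s at the bottom. As the answer itself notes, this does not always yield a proper triangular matrix.

```python
def leading_zeros(bits):
    """Number of 0s before the first 1 (or the full length if there is no 1)."""
    count = 0
    for bit in bits:
        if bit:
            break
        count += 1
    return count

def two_phase_sort(matrix):
    """Return (row_order, col_order) as lists of original indices."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))

    def ones_in_row(r):
        return sum(matrix[r])

    def ones_in_col(c):
        return sum(row[c] for row in matrix)

    # Phase 1: rows with most 1s go up, columns with most 1s go right.
    rows = sorted(rows, key=ones_in_row, reverse=True)
    cols = sorted(cols, key=ones_in_col)

    # Phase 2: same ordering, with tie-breaks that depend on the current layout.
    rows = sorted(rows, key=lambda r: (-ones_in_row(r),
                                       leading_zeros([matrix[r][c] for c in cols])))
    cols = sorted(cols, key=lambda c: (ones_in_col(c),
                                       -leading_zeros([matrix[r][c] for r in reversed(rows)])))
    return rows, cols
```

For the question's matrix (rows A to D as indices 0 to 3, columns 1 to 4 as indices 0 to 3) this returns the row order B, D, A, C and the column order 4, 1, 3, 2.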
Look for a 1987 paper by Anna Lubiw on "Doubly Lexical Orderings of Matrices".
There is a citation below. The ordering is not identical to what you are looking for, but is pretty close. If nothing else, you should be able to get a pretty good idea from there.
http://dl.acm.org/citation.cfm?id=33385
Here's a starting point:
Convert each row from binary bits into a number
Sort the numbers in descending order.
Then convert each row back to binary.
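These three steps can be sketched as below. The function name is hypothetical, and the leftmost column is assumed to be the most significant bit, which the answer does not specify. Note that sorting bare numbers discards the row labels; to keep track of which competitor each row belongs to, one could sort (number, label) pairs instead.

```python
def sort_rows_as_numbers(matrix):
    """Sort rows in descending order of their value read as a binary number."""
    width = len(matrix[0])

    def to_number(row):
        # Leftmost column is treated as the most significant bit (an assumption).
        return int("".join(str(bit) for bit in row), 2)

    numbers = sorted((to_number(row) for row in matrix), reverse=True)
    # Convert back to fixed-width rows of 0s and 1s.
    return [[int(ch) for ch in format(n, f"0{width}b")] for n in numbers]
```

On the question's matrix the rows have the values 6, 14, 4 and 12, so the sorted order is B, D, A, C. This only reorders rows; the columns are left untouched.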
Basic algorithm:
values. Determine the column sums and store values.
sums in ascending order.
Hopefully, you should have a matrix with as close to an upper-right triangular region as possible.
Treat rows as binary numbers, with the leftmost column as the most significant bit, and sort them in descending order, top to bottom.
Treat the columns as binary numbers with the bottommost row as the most significant bit and sort them in ascending order, left to right.
Repeat until you reach a fixed point. Proof that the algorithm terminates is left as an exercise for the reader.
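A minimal sketch of this fixed-point iteration is given below. The function name and the max_rounds safety cap are assumptions added for illustration. Comparing equal-length lists of bits lexicographically is equivalent to comparing the binary numbers they spell, so no explicit conversion is needed.

```python
def sort_to_fixed_point(matrix, max_rounds=1000):
    """Return (row_order, col_order) as lists of original indices."""
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(max_rounds):
        # Rows: bits read left to right in the current column order, descending.
        new_rows = sorted(rows, key=lambda r: [matrix[r][c] for c in cols],
                          reverse=True)
        # Columns: bits read bottom to top in the new row order, ascending.
        new_cols = sorted(cols, key=lambda c: [matrix[r][c] for r in reversed(new_rows)])
        if new_rows == rows and new_cols == cols:
            break  # nothing moved: fixed point reached
        rows, cols = new_rows, new_cols
    return rows, cols
```

For the question's matrix (rows A to D as indices 0 to 3, columns 1 to 4 as indices 0 to 3) the loop settles after one pass with rows B, D, A, C and columns 4, 1, 3, 2, which has the same shape as the asker's hand-sorted result.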