Algorithm for rating the monotonicity of an array (i.e. judging the "sortedness" of an array)
EDIT: Wow, many great responses. Yes, I am using this as a fitness function for judging the quality of a sort performed by a genetic algorithm. So cost-of-evaluation is important (i.e., it has to be fast, preferably O(n)).
As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:
public double monotonicity(int[] array) {
    if (array.length == 0) return 1d;
    int longestRun = longestSortedRun(array);
    return (double) longestRun / (double) array.length;
}

public int longestSortedRun(int[] array) {
    if (array.length == 0) return 0;
    int longestRun = 1;
    int currentRun = 1;
    for (int i = 1; i < array.length; i++) {
        if (array[i] >= array[i - 1]) {
            currentRun++;
        } else {
            currentRun = 1;
        }
        if (currentRun > longestRun) longestRun = currentRun;
    }
    return longestRun;
}
This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:
{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}
This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?
Comments (11)
How about counting the number of steps with increasing value vs. the number of total steps? That's O(n).
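A minimal Java sketch of the step-counting idea, in the same style as the question's code. The class and method names are made up for illustration, and two choices are assumptions rather than part of the suggestion: equal neighbours count as "increasing" (to match the question's `>=`), and an array with fewer than two elements scores 1.

```java
final class StepRatio {
    /** Fraction of adjacent pairs that are in non-decreasing order; one O(n) pass. */
    static double ascendingStepRatio(int[] array) {
        if (array.length < 2) return 1.0; // assumption: no steps means fully sorted
        int ascending = 0;
        for (int i = 1; i < array.length; i++) {
            if (array[i] >= array[i - 1]) ascending++; // assumption: ties count as ascending
        }
        return (double) ascending / (array.length - 1);
    }
}
```

For the question's array { 4, 5, 6, 0, 1, 2, 3, 7, 8, 9} only the step from 6 to 0 goes down, so 8 of the 9 steps count and the score is 8/9.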
This seems like a good candidate for Damerau–Levenshtein distance: the number of swaps needed to sort the array. This should be proportional to how far each item is from where it should be in a sorted array.
Here's a simple Ruby algorithm that sums the squares of the distances. It seems a good measure of sortedness: the result gets smaller every time two out-of-order elements are swapped.
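A rough Java sketch of the sum-of-squared-distances idea, written to sit next to the question's Java code. It is not the Ruby code from this answer, and the names are hypothetical. One assumption: equal values keep their original relative order (a stable sort), so ties do not add to the distance.

```java
import java.util.Arrays;
import java.util.Comparator;

final class Displacement {
    /** Sum over all elements of (sorted position - original position)^2; 0 when sorted. */
    static long sumSquaredDisplacement(int[] array) {
        Integer[] order = new Integer[array.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Stable sort of the indices by value (assumption: ties keep their order).
        // Afterwards order[p] is the original index of the element at sorted position p.
        Arrays.sort(order, Comparator.comparingInt(i -> array[i]));
        long sum = 0;
        for (int p = 0; p < order.length; p++) {
            long d = p - order[p];
            sum += d * d;
        }
        return sum;
    }
}
```

For { 4, 5, 6, 0, 1, 2, 3, 7, 8, 9} the values 0 to 3 are each three places from home and 4 to 6 are each four places away, giving 4 * 9 + 3 * 16 = 84. The sort makes this O(n log n) rather than O(n).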
I expect that the choice of function to use depends very strongly on what you intend to use it for. Based on your question, I would guess that you are using a genetic system to create a sorting program, and this is to be the ranking function. If that is the case, then speed of execution is crucial. Based on that, I bet your longest-sorted-subsequence algorithm would work pretty well. That sounds like it should define fitness pretty well.
Something like these? http://en.wikipedia.org/wiki/Rank_correlation
Here's one I just made up.
For each pair of adjacent values, calculate the numeric difference between them. If the second is greater than or equal to the first, add that to the sorted total, otherwise add to the unsorted total. When done, take the ratio of the two.
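A small Java sketch of the adjacent-difference idea, with hypothetical names. "Take the ratio of the two" is left open above, so this assumes sorted / (sorted + unsorted), which keeps the score in [0, 1] and avoids dividing by zero when nothing is out of order. Returning 1 when every difference is zero is also an assumption.

```java
final class DifferenceRatio {
    /** Share of the total adjacent movement that goes upward; one O(n) pass. */
    static double differenceRatio(int[] array) {
        long sorted = 0;   // sum of non-negative adjacent differences
        long unsorted = 0; // sum of the magnitudes of negative adjacent differences
        for (int i = 1; i < array.length; i++) {
            long diff = (long) array[i] - array[i - 1]; // widen first to avoid int overflow
            if (diff >= 0) sorted += diff; else unsorted -= diff;
        }
        long total = sorted + unsorted;
        // Assumption: an array with no movement at all counts as fully sorted.
        return total == 0 ? 1.0 : (double) sorted / total;
    }
}
```

For { 4, 5, 6, 0, 1, 2, 3, 7, 8, 9} the upward differences add up to 11 and the single drop is 6, so the score is 11/17. Unlike a plain step count, one large drop weighs more than several small ones.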
Compute the lengths of all sorted sub-sequences, then square them and add them.
If you want to calibrate how much emphasis you put on the largest, use a power different from 2.
I'm not sure what's the best way to normalize this by length, maybe divide it by the length squared?
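A sketch of this in Java, under a few assumptions: "sorted sub-sequences" is read as maximal contiguous non-decreasing runs (the same `>=` test as the question's code), the sum is divided by the length raised to the same power as tentatively suggested above, and the names are made up.

```java
final class RunScore {
    /** Sum of (run length ^ power) over all maximal non-decreasing runs, divided by n ^ power. */
    static double runScore(int[] array, double power) {
        int n = array.length;
        if (n == 0) return 1.0; // assumption: an empty array counts as sorted
        double sum = 0;
        int run = 1;
        for (int i = 1; i <= n; i++) {
            if (i < n && array[i] >= array[i - 1]) {
                run++;                       // current run continues
            } else {
                sum += Math.pow(run, power); // run ended (or end of array reached)
                run = 1;
            }
        }
        return sum / Math.pow(n, power);
    }
}
```

With this normalization and a power above 1, a fully sorted array scores exactly 1 and anything split into several runs scores less. Under the `>=` test, { 4, 5, 6, 0, 1, 2, 3, 7, 8, 9} splits into the runs {4, 5, 6} and {0, 1, 2, 3, 7, 8, 9}, so `runScore(array, 2)` gives (9 + 49) / 100 = 0.58.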
What you're probably looking for is Kendall Tau. It's a one-to-one function of the bubble sort distance between two arrays. To test whether an array is "almost sorted", compute its Kendall Tau against a sorted array.
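One way this could look in Java: the bubble sort distance is the number of inversions (pairs that are out of order), which a merge sort can count in O(n log n). The sketch below uses hypothetical names and assumes ties count as in order, so the result is a tau-like score rather than a textbook tie-corrected Kendall tau.

```java
final class KendallTau {
    /** Number of pairs i < j with array[i] > array[j]; the input is not modified. */
    static long inversions(int[] array) {
        int[] a = array.clone();
        return sortAndCount(a, new int[a.length], 0, a.length);
    }

    // Merge sort on a[lo, hi) that returns how many inversions it removed.
    private static long sortAndCount(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo < 2) return 0;
        int mid = (lo + hi) >>> 1;
        long count = sortAndCount(a, tmp, lo, mid) + sortAndCount(a, tmp, mid, hi);
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            if (a[i] <= a[j]) {
                tmp[k++] = a[i++];
            } else {
                count += mid - i; // the right element precedes every remaining left element
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < hi) tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo);
        return count;
    }

    /** Score in [-1, 1]: 1 for non-decreasing input, -1 for strictly decreasing input. */
    static double tau(int[] array) {
        long n = array.length;
        if (n < 2) return 1.0; // assumption: nothing to compare means sorted
        long pairs = n * (n - 1) / 2;
        return 1.0 - 2.0 * inversions(array) / pairs;
    }
}
```

For { 4, 5, 6, 0, 1, 2, 3, 7, 8, 9} each of 4, 5, 6 precedes each of 0, 1, 2, 3, which is 12 inversions out of 45 pairs, so `tau` returns 1 - 24/45, roughly 0.47.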
I would suggest looking at the Pancake Problem and the reversal distance of the permutations. These algorithms are often used to find the distance between two permutations (the identity and the permuted string). This distance measure should take into account more clumps of in-order values, as well as reversals (monotonically decreasing instead of increasing subsequences). There are also approximations that run in polynomial time [PDF].
It really all depends on what the number means and if this distance function makes sense in your context though.
I have the same problem (monotonicity scoring), and I suggest you try Longest Increasing Subsequence. The most efficient algorithm runs in O(n log n), not so bad.
Taking the example from the question, the longest increasing sequence of {4, 5, 6, 0, 1, 2, 3, 7, 8, 9} is {0, 1, 2, 3, 7, 8, 9} (length of 7). Maybe it rates better (70%) than your longest-sorted-run algorithm.
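A minimal Java sketch of the O(n log n) approach (binary search over the smallest tail value seen for each subsequence length). The names are hypothetical, and it assumes the non-decreasing variant so that ties count as sorted, in line with the `>=` in the question's code.

```java
final class Lis {
    /** Length of the longest non-decreasing subsequence, in O(n log n). */
    static int longestNonDecreasingSubsequence(int[] array) {
        // tails[k] = smallest possible last value of a subsequence of length k + 1
        int[] tails = new int[array.length];
        int size = 0;
        for (int x : array) {
            // Find the first slot whose tail is strictly greater than x.
            int lo = 0, hi = size;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (tails[mid] <= x) lo = mid + 1; else hi = mid;
            }
            tails[lo] = x;           // x extends or improves a subsequence of length lo + 1
            if (lo == size) size++;  // x extended the longest subsequence found so far
        }
        return size;
    }

    /** LIS length divided by array length; 1.0 for an empty array (assumption). */
    static double score(int[] array) {
        if (array.length == 0) return 1.0;
        return (double) longestNonDecreasingSubsequence(array) / array.length;
    }
}
```

`Lis.score(new int[]{4, 5, 6, 0, 1, 2, 3, 7, 8, 9})` returns 0.7, matching the length-7 subsequence worked out above.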
It highly depends on what you're intending to use the measure for, but one easy way to do this is to feed the array into a standard sorting algorithm and measure how many operations (swaps and/or comparisons) need to be done to sort the array.
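A quick way to try this in Java is to sort a boxed copy with a comparator that counts its own calls. This is only a sketch with hypothetical names, and the number it returns depends on the sort implementation behind `Arrays.sort` in the JDK you run it on, so it is comparable between arrays but not between runtimes.

```java
import java.util.Arrays;

final class SortEffort {
    /** Number of comparisons Arrays.sort makes while sorting a copy of the array. */
    static long comparisonsToSort(int[] array) {
        Integer[] boxed = new Integer[array.length];
        for (int i = 0; i < array.length; i++) boxed[i] = array[i];
        long[] count = {0}; // single-element array so the lambda can update it
        Arrays.sort(boxed, (a, b) -> {
            count[0]++;
            return Integer.compare(a, b);
        });
        return count[0];
    }
}
```

Fewer comparisons suggests the input was closer to sorted. One caveat: an adaptive sort may also handle a fully descending array cheaply, so this measure does not necessarily separate "sorted" from "reversed".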
Some experiments with a modified Ratcliff & Obershelp
So kind of does what it needs to. Not too sure how to prove it though.