Algorithm for evaluating the monotonicity of an array (i.e., judging the "sortedness" of an array)

Posted on 2024-08-18 12:41:51



EDIT: Wow, many great responses. Yes, I am using this as a fitness function for judging the quality of a sort performed by a genetic algorithm. So cost-of-evaluation is important (i.e., it has to be fast, preferably O(n).)


As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:

public double monotonicity(int[] array) {
    if (array.length == 0) return 1d;

    int longestRun = longestSortedRun(array);
    return (double) longestRun / (double) array.length;
}

public int longestSortedRun(int[] array) {

    if (array.length == 0) return 0;

    int longestRun = 1;
    int currentRun = 1;

    for (int i = 1; i < array.length; i++) {
        if (array[i] >= array[i - 1]) {
            currentRun++;
        } else {
            currentRun = 1;
        }

        if (currentRun > longestRun) longestRun = currentRun;
    }

    return longestRun;
}

This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:

{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}

This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?

素食主义者 2024-08-25 12:41:52

How about counting the number of steps with an increasing value versus the total number of steps? That's O(n).
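Here is a minimal Python sketch of this idea. It assumes that "increasing" means non-decreasing (>=), which matches the comparison in the question's longestSortedRun. The function name ascending_step_fraction is hypothetical.

```python
def ascending_step_fraction(a):
    """Fraction of adjacent pairs (a[i-1], a[i]) with a[i] >= a[i-1]; runs in O(n)."""
    if len(a) < 2:
        return 1.0  # treat empty and single-element arrays as sorted
    ascending = sum(1 for i in range(1, len(a)) if a[i] >= a[i - 1])
    return ascending / (len(a) - 1)


print(ascending_step_fraction([4, 5, 6, 0, 1, 2, 3, 7, 8, 9]))  # 8 of the 9 steps ascend
```

On the question's example, only the step from 6 to 0 descends, so the score is 8/9. A single break scores the same wherever it occurs in the array.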

七月上 2024-08-25 12:41:51

This seems like a good candidate for Damerau–Levenshtein distance (rather than plain Levenshtein distance): the number of swaps needed to sort the array. This should be proportional to how far each item is from where it should be in a sorted array.

Here's a simple Ruby algorithm that sums the squares of the distances. It seems a good measure of sortedness, because the result gets smaller every time two out-of-order elements are swapped.

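ap = a.sort
sum = 0
a.each_index do |i|
  j = ap.index(a[i]) - i   # distance of a[i] from its position in the sorted array
  sum += j * j             # (with duplicates, index returns the first match)
end
dist = sum.to_f / (a.size * a.size)   # to_f avoids Ruby's integer division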
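Below is the same squared-displacement idea sketched in Python. Two assumptions here are not in the answer:

- Duplicates are handled with a stable sort of indices. Ruby's ap.index always returns the first match, so equal values all map to one slot.
- The sum is normalized by its maximum, n(n^2 - 1)/3, which a strictly decreasing array reaches. This replaces division by n^2, so the score lies in [0, 1], with 1 meaning sorted.

The function names are hypothetical, and the sort makes this O(n log n) rather than O(n).

```python
def squared_displacement(a):
    """Sum of squared distances between each element's index and its index after sorting."""
    # Stable sort of indices by value: order[k] is the original index of the k-th smallest item.
    order = sorted(range(len(a)), key=lambda i: a[i])
    return sum((k - i) ** 2 for k, i in enumerate(order))


def displacement_score(a):
    """1.0 for a sorted array, 0.0 for a strictly decreasing one (hypothetical normalization)."""
    n = len(a)
    if n < 2:
        return 1.0
    worst = n * (n * n - 1) // 3  # squared displacement of a reversed permutation
    return 1.0 - squared_displacement(a) / worst
```

For the question's example, the sum is 84, so the Ruby dist is 84/100 = 0.84. The normalized score is 1 - 84/330, about 0.75.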
水染的天色ゝ 2024-08-25 12:41:51

I expect that the choice of function to use depends very strongly on what you intend to use it for. Based on your question, I would guess that you are using a genetic system to create a sorting program, and this is to be the ranking function. If that is the case, then speed of execution is crucial. Based on that, I bet your longest-sorted-subsequence algorithm would work pretty well. That sounds like it should define fitness pretty well.

初雪 2024-08-25 12:41:51

Here's one I just made up.

For each pair of adjacent values, calculate the numeric difference between them. If the second is greater than or equal to the first, add that to the sorted total, otherwise add to the unsorted total. When done, take the ratio of the two.
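Here is a minimal sketch of this made-up measure. The answer says "take the ratio of the two". To avoid dividing by zero when nothing is out of order, this version assumes the ratio sorted / (sorted + unsorted), using the absolute values of the descending differences, so the result lies in [0, 1]. The name weighted_ascent_ratio is hypothetical.

```python
def weighted_ascent_ratio(a):
    sorted_total = 0    # sum of rises (a[i] >= a[i-1])
    unsorted_total = 0  # sum of absolute drops (a[i] < a[i-1])
    for prev, cur in zip(a, a[1:]):
        diff = cur - prev
        if diff >= 0:
            sorted_total += diff
        else:
            unsorted_total += -diff
    total = sorted_total + unsorted_total
    # An array with no change at all (empty, single item, or all equal) counts as sorted.
    return 1.0 if total == 0 else sorted_total / total
```

On the question's example, the rises add up to 11 and the single drop is 6, giving 11/17 (about 0.65). Unlike counting steps, one large drop costs more than one small drop, so the score depends on the magnitude of the values.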

白昼 2024-08-25 12:41:51

Compute the lengths of all sorted sub-sequences, then square them and add them.
If you want to calibrate how much emphasis you put on the largest ones, use a power other than 2.

I'm not sure what the best way to normalize this by length is; maybe divide it by the length squared?
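Here is a sketch of the squared-run-lengths idea. It assumes that runs are maximal non-decreasing stretches, using the same >= test as longestSortedRun. It applies the normalization the answer suggests: divide by the length raised to the same power, so a fully sorted array scores 1.0. The power parameter is the calibration knob the answer mentions, and the function name is hypothetical.

```python
def run_power_score(a, power=2.0):
    n = len(a)
    if n == 0:
        return 1.0
    runs = []
    current = 1
    for i in range(1, n):
        if a[i] >= a[i - 1]:
            current += 1          # still inside a non-decreasing run
        else:
            runs.append(current)  # run ended; start a new one
            current = 1
    runs.append(current)
    return sum(r ** power for r in runs) / n ** power
```

On the question's example, the >= test finds two runs: [4, 5, 6] and [0, 1, 2, 3, 7, 8, 9]. The second run continues past 3 because 3 <= 7. The score is therefore (3^2 + 7^2) / 10^2 = 0.58.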

耶耶耶 2024-08-25 12:41:51

What you're probably looking for is Kendall Tau. It's a one-to-one function of the bubble sort distance between two arrays. To test whether an array is "almost sorted", compute its Kendall Tau against a sorted array.
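Kendall tau against the sorted order can be computed from the number of inversions: pairs i < j with a[i] > a[j]. Merge sort counts these in O(n log n).

This minimal sketch assumes distinct values, for which tau = 1 - 2 * inversions / (n(n-1)/2). With ties, a tie-corrected variant such as tau-b (for example scipy.stats.kendalltau) would be more appropriate. The function names are hypothetical.

```python
def count_inversions(a):
    """Number of pairs i < j with a[i] > a[j], counted during a merge sort."""
    def sort_count(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_left = sort_count(xs[:mid])
        right, inv_right = sort_count(xs[mid:])
        merged, inversions = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining item of left
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_count(list(a))[1]


def kendall_tau_vs_sorted(a):
    """+1.0 when sorted, -1.0 when reversed (distinct values assumed)."""
    n = len(a)
    if n < 2:
        return 1.0
    pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * count_inversions(a) / pairs
```

The question's example has 12 inversions out of 45 pairs, so tau = 1 - 24/45 (about 0.47). If a score in [0, 1] is preferred, (tau + 1) / 2 is the fraction of pairs that are in order.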

長街聽風 2024-08-25 12:41:51

I would suggest looking at the Pancake Problem and the reversal distance of permutations. These algorithms are often used to find the distance between two permutations (the identity and the permuted string). This distance measure should take into account more clumps of in-order values, as well as reversals (monotonically decreasing instead of increasing subsequences). There are also polynomial-time approximations [PDF].

It really all depends on what the number means, though, and whether this distance function makes sense in your context.
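Finding the exact reversal distance between unsigned permutations is NP-hard, which is why the answer points to approximations. A common, cheap ingredient of those approximations is the breakpoint count:

- Pad the permutation with -1 at the front and n at the back.
- A breakpoint is any adjacent pair whose values differ by more than 1. An ascending or descending step of exactly 1 counts as an adjacency.
- Each reversal removes at most two breakpoints, so half the count (rounded up) is a lower bound on the reversal distance.

This minimal sketch assumes the array is first converted to ranks 0..n-1 with a stable sort. It is a proxy, not the reversal distance itself, and the names are hypothetical.

```python
def breakpoints(a):
    n = len(a)
    # Replace values by their rank in sorted order (ties broken by position).
    order = sorted(range(n), key=lambda i: a[i])
    rank = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    padded = [-1] + rank + [n]
    # A breakpoint is an adjacent pair that is not consecutive in either direction.
    return sum(1 for x, y in zip(padded, padded[1:]) if abs(y - x) != 1)


def reversal_lower_bound(a):
    """Lower bound on the number of reversals needed to sort a."""
    return (breakpoints(a) + 1) // 2
```

The question's example has three breakpoints (-1|4, 6|0 and 3|7), giving a lower bound of 2. A fully reversed array has only two breakpoints, one at each end, which reflects that a single reversal sorts it.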

小忆控 2024-08-25 12:41:51

I have the same problem (monotonicity scoring), and I suggest you try Longest Increasing Subsequence. The most efficient algorithm runs in O(n log n), which is not so bad.

Taking the example from the question, the longest increasing subsequence of {4, 5, 6, 0, 1, 2, 3, 7, 8, 9} is {0, 1, 2, 3, 7, 8, 9} (length 7). Maybe it rates better (70%) than your longest-sorted-run algorithm.
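Here is a minimal sketch of the O(n log n) approach, which keeps the smallest possible tail value for each subsequence length and binary-searches it with Python's bisect module. It uses the non-decreasing variant so that equal neighbours count as sorted, as in the question's >= test. Swap bisect_right for bisect_left to require strictly increasing values. The names are hypothetical.

```python
import bisect


def longest_nondecreasing_subsequence(a):
    """Length of the longest non-decreasing subsequence, in O(n log n)."""
    tails = []  # tails[k] = smallest possible last value of a subsequence of length k + 1
    for x in a:
        k = bisect.bisect_right(tails, x)  # bisect_right lets equal values extend a subsequence
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def lis_score(a):
    return 1.0 if not a else longest_nondecreasing_subsequence(a) / len(a)
```

A useful reading of this score: n minus the subsequence length is the minimum number of elements that have to be moved (removed and reinserted elsewhere) to sort the array.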

挽你眉间 2024-08-25 12:41:51

It highly depends on what you're intending to use the measure for, but one easy way to do this is to feed the array into a standard sorting algorithm and measure how many operations (swaps and/or comparisons) need to be done to sort the array.
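Here is one way to make this concrete, assuming insertion sort is the "standard sorting algorithm": count the adjacent swaps it performs. That count equals the number of inversions, so it tracks the same quantity as Kendall tau. However, insertion sort takes O(n^2) time in the worst case, which may be too slow for a fitness function that is evaluated many times. The function name is hypothetical.

```python
def insertion_sort_swaps(a):
    """Sort a copy of a with insertion sort and return the number of adjacent swaps made."""
    xs = list(a)
    swaps = 0
    for i in range(1, len(xs)):
        j = i
        # Move xs[i] left past every strictly larger element.
        while j > 0 and xs[j - 1] > xs[j]:
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            swaps += 1
            j -= 1
    return swaps
```

Dividing by n(n-1)/2, the swap count of a reversed array of distinct values, gives a value in [0, 1] where 0 means already sorted.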

不气馁 2024-08-25 12:41:51

Some experiments with a modified Ratcliff & Obershelp algorithm (Python's difflib.SequenceMatcher):

>>> from difflib import SequenceMatcher as sm
>>> a = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> c = [ 0, 1, 9, 2, 8, 3, 6, 4, 7, 5 ]
>>> b = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> b.sort()
>>> s = sm(None, a, b)
>>> s.ratio()
0.69999999999999996
>>> s2 = sm(None, c, b)
>>> s2.ratio()
0.29999999999999999

So it kind of does what it needs to, though I'm not too sure how to prove it.
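The experiment above can be wrapped as a scoring function. This sketch assumes Python 3.

- autojunk=False switches off difflib's popularity heuristic. That heuristic applies to sequences of 200 or more items and can treat frequently repeated values as junk.
- SequenceMatcher is quadratic in the worst case, so it may be costly as a fitness function compared with the O(n) measures.

The name difflib_sortedness is hypothetical.

```python
from difflib import SequenceMatcher


def difflib_sortedness(a):
    """Similarity ratio (0.0 to 1.0) between a and its sorted copy."""
    matcher = SequenceMatcher(None, a, sorted(a), autojunk=False)
    return matcher.ratio()
```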
