Algorithm for rating the monotonicity of an array (i.e. judging the "sortedness" of the array)

Posted on 2024-08-18 12:41:51

EDIT: Wow, many great responses. Yes, I am using this as a fitness function for judging the quality of a sort performed by a genetic algorithm. So cost-of-evaluation is important (i.e., it has to be fast, preferably O(n).)

As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:

public double monotonicity(int[] array) {
    if (array.length == 0) return 1d;

    int longestRun = longestSortedRun(array);
    return (double) longestRun / (double) array.length;
}

public int longestSortedRun(int[] array) {

    if (array.length == 0) return 0;

    int longestRun = 1;
    int currentRun = 1;

    for (int i = 1; i < array.length; i++) {
        if (array[i] >= array[i - 1]) {
            currentRun++;
        } else {
            currentRun = 1;
        }

        if (currentRun > longestRun) longestRun = currentRun;
    }

    return longestRun;
}

This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:

{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}

This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?


素食主义者 2024-08-25 12:41:52

How about counting the number of steps where the value increases versus the total number of steps? That's O(n).
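A minimal Python sketch of this suggestion. It assumes that "increases" means non-decreasing (to match the `>=` in the question's code) and that an array with fewer than two elements counts as fully sorted. The name `ordered_step_fraction` is made up for illustration.

```python
def ordered_step_fraction(values):
    """Fraction of adjacent pairs that are in non-decreasing order."""
    steps = len(values) - 1
    if steps <= 0:
        # Assumption: empty and single-element arrays are treated as sorted.
        return 1.0
    in_order = sum(1 for prev, cur in zip(values, values[1:]) if cur >= prev)
    return in_order / steps


if __name__ == "__main__":
    # The question's example has a single descending step (6 -> 0) out of 9.
    print(ordered_step_fraction([4, 5, 6, 0, 1, 2, 3, 7, 8, 9]))
```

This is a single O(n) pass, but it only sees local order: an array made of two long sorted halves in the wrong order still scores close to 1.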


七月上 2024-08-25 12:41:51

This seems like a good candidate for ~~Levenshtein~~ Damerau–Levenshtein distance - the number of swaps needed to sort the array. This should be proportional to how far each item is from where it should be in a sorted array.

Here's a simple ruby algorithm that sums the squares of the distances. It seems a good measure of sortedness - the result gets smaller every time two out-of-order elements are swapped.

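sorted = a.sort
sum = 0
a.each_index do |i|
  # Displacement of a[i] from its (first) position in the sorted array.
  # Note: index is a linear search, so the whole loop is O(n^2).
  j = sorted.index(a[i]) - i
  sum += j * j
end
# to_f avoids integer division, which would truncate the result.
dist = sum.to_f / (a.size * a.size)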


水染的天色ゝ 2024-08-25 12:41:51

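I expect that the choice of function depends very strongly on what you intend to use it for. Based on your question, I would guess that you are using a genetic system to create a sorting program, and that this is to be the ranking function. If that is the case, then speed of execution is crucial. Based on that, I bet your longest-sorted-run algorithm would work pretty well. It sounds like it should define fitness pretty well.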


明天过后 2024-08-25 12:41:51

Something like this? http://en.wikipedia.org/wiki/Rank_correlation


初雪 2024-08-25 12:41:51

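Here's one I just made up.

For each pair of adjacent values, calculate the numerical difference between them. If the second is greater than or equal to the first, add that difference to the sorted total; otherwise add it to the unsorted total. When done, take the ratio of the two.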
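A rough Python sketch of this idea. The answer does not say which ratio to take, so this version assumes `sorted_total / (sorted_total + unsorted_total)`, which stays within 0 to 1 and avoids dividing by zero on a fully sorted array. The function name and the handling of constant arrays are also assumptions.

```python
def sorted_difference_ratio(values):
    """Share of the total adjacent-difference magnitude that points upward."""
    sorted_total = 0
    unsorted_total = 0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            sorted_total += cur - prev
        else:
            unsorted_total += prev - cur
    total = sorted_total + unsorted_total
    if total == 0:
        # Assumption: no movement at all (empty, single, or constant array)
        # is treated as fully sorted.
        return 1.0
    return sorted_total / total


if __name__ == "__main__":
    # Upward differences sum to 11, the single drop (6 -> 0) contributes 6.
    print(sorted_difference_ratio([4, 5, 6, 0, 1, 2, 3, 7, 8, 9]))
```

Unlike a plain count of ordered steps, this weights each step by its size, so one large drop costs more than several small ones.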


白昼 2024-08-25 12:41:51

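Compute the lengths of all sorted sub-sequences, then square them and add them up.
If you want to calibrate how much emphasis you put on the largest one, use a power different from 2.

I'm not sure what the best way to normalize this by length is; maybe divide it by the length squared?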
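One way this could look in Python, as a sketch only. It assumes "sorted sub-sequences" means maximal contiguous non-decreasing runs (the same notion as the question's `longestSortedRun`), and it follows the tentative normalization above by dividing by the length raised to the same power. The names `sorted_run_lengths` and `run_length_score` are illustrative.

```python
def sorted_run_lengths(values):
    """Lengths of the maximal contiguous non-decreasing runs, left to right."""
    if not values:
        return []
    runs = []
    current = 1
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            current += 1
        else:
            runs.append(current)
            current = 1
    runs.append(current)
    return runs


def run_length_score(values, power=2.0):
    """Sum of run lengths raised to `power`, divided by len(values) ** power."""
    n = len(values)
    if n == 0:
        # Assumption: an empty array counts as fully sorted.
        return 1.0
    return sum(run ** power for run in sorted_run_lengths(values)) / n ** power


if __name__ == "__main__":
    example = [4, 5, 6, 0, 1, 2, 3, 7, 8, 9]
    # With >= as the test, 3 -> 7 continues a run, so the runs found here
    # are [4, 5, 6] and [0, 1, 2, 3, 7, 8, 9].
    print(sorted_run_lengths(example), run_length_score(example))
```

With this normalization a fully sorted array scores 1 and a strictly descending one scores 1/n for `power=2`. Larger powers reward a single long run more heavily.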


耶耶耶 2024-08-25 12:41:51

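What you're probably looking for is Kendall Tau. It's a one-to-one function of the bubble sort distance between two arrays. To test whether an array is "almost sorted", compute its Kendall Tau against a sorted array.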
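A sketch of how this might be computed in Python. The bubble sort distance to the sorted array is the number of inversions (pairs that are out of order), which a merge sort can count in O(n log n). The tau formula below assumes there are no duplicate values; with ties it simply treats equal pairs as in order, which is a simplification rather than a tie-corrected statistic. Both function names are made up for illustration.

```python
def count_inversions(values):
    """Number of pairs (i, j) with i < j and values[i] > values[j]."""
    def sort_and_count(seq):
        if len(seq) <= 1:
            return list(seq), 0
        mid = len(seq) // 2
        left, left_inv = sort_and_count(seq[:mid])
        right, right_inv = sort_and_count(seq[mid:])
        merged = []
        inversions = left_inv + right_inv
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] jumps ahead of every element still waiting in left.
                merged.append(right[j])
                j += 1
                inversions += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(list(values))[1]


def kendall_tau_vs_sorted(values):
    """Tau against the sorted order: 1.0 when sorted, -1.0 when strictly reversed."""
    n = len(values)
    if n < 2:
        return 1.0
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * count_inversions(values) / total_pairs


if __name__ == "__main__":
    example = [4, 5, 6, 0, 1, 2, 3, 7, 8, 9]
    # Each of 4, 5, 6 precedes each of 0, 1, 2, 3: 12 inversions out of 45 pairs.
    print(count_inversions(example), kendall_tau_vs_sorted(example))
```

If a score between 0 and 1 is more convenient for a fitness function, `1 - inversions / total_pairs` carries the same information.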


長街聽風 2024-08-25 12:41:51

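I would suggest looking at the Pancake Problem and the reversal distance of permutations. These algorithms are often used to find the distance between two permutations (the identity and the permuted string). This distance measure should take into account more clumps of in-order values, as well as reversals (monotonically decreasing instead of increasing subsequences). There are also approximations that run in polynomial time [PDF].

It really all depends on what the number means and whether this distance function makes sense in your context.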


小忆控 2024-08-25 12:41:51

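I have the same problem (monotonicity scoring), and I suggest you try the Longest Increasing Subsequence. The most efficient algorithm runs in O(n log n), which is not so bad.

Taking the example from the question, the longest increasing sequence of {4, 5, 6, 0, 1, 2, 3, 7, 8, 9} is {0, 1, 2, 3, 7, 8, 9} (length 7). Maybe it rates better (70%) than your longest-sorted-run algorithm.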
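For reference, a compact Python sketch of the O(n log n) approach using the standard `bisect` module. It computes the longest non-decreasing subsequence, on the assumption that equal neighbours should count as sorted (as the `>=` in the question's code suggests); swapping `bisect_right` for `bisect_left` would give the strictly increasing variant. The function names are illustrative.

```python
from bisect import bisect_right


def longest_nondecreasing_subsequence_length(values):
    """Length of the longest (not necessarily contiguous) non-decreasing subsequence."""
    # tails[k] is the smallest possible last value of a non-decreasing
    # subsequence of length k + 1 seen so far; tails stays sorted.
    tails = []
    for value in values:
        pos = bisect_right(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def lis_score(values):
    """Subsequence length divided by array length; empty arrays score 1.0 (an assumption)."""
    if not values:
        return 1.0
    return longest_nondecreasing_subsequence_length(values) / len(values)


if __name__ == "__main__":
    print(lis_score([4, 5, 6, 0, 1, 2, 3, 7, 8, 9]))  # 0.7
```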


挽你眉间 2024-08-25 12:41:51

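It highly depends on what you intend to use the measure for, but one easy way to do this is to feed the array into a standard sorting algorithm and measure how many operations (swaps and/or comparisons) need to be done to sort the array.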
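As one possible illustration in Python, the built-in sort can be instrumented with `functools.cmp_to_key` so that every comparison it makes is counted. This is only a sketch: the count depends on the sorting algorithm in use (this assumes CPython's adaptive built-in sort), so the raw number is best used to compare arrays of the same length rather than as an absolute score. The name `count_sort_comparisons` is made up.

```python
from functools import cmp_to_key


def count_sort_comparisons(values):
    """Number of comparisons the built-in sort performs on `values`."""
    comparisons = 0

    def compare(x, y):
        nonlocal comparisons
        comparisons += 1
        # Standard three-way comparison: negative, zero, or positive.
        return (x > y) - (x < y)

    # The sorted result is discarded; only the comparison count is of interest.
    sorted(values, key=cmp_to_key(compare))
    return comparisons


if __name__ == "__main__":
    already_sorted = list(range(10))
    shuffled = [0, 1, 9, 2, 8, 3, 6, 4, 7, 5]
    print(count_sort_comparisons(already_sorted), count_sort_comparisons(shuffled))
```

One caveat with an adaptive sort like this: a strictly descending array is also cheap to sort, so by this measure it looks about as "sorted" as an ascending one. A plain insertion sort that counts element shifts would not have that property, at O(n^2) cost.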


不气馁 2024-08-25 12:41:51

Some experiments with a modified Ratcliff & Obershelp algorithm (Python's difflib):

>>> from difflib import SequenceMatcher as sm
>>> a = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> c = [ 0, 1, 9, 2, 8, 3, 6, 4, 7, 5 ]
>>> b = [ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9 ]
>>> b.sort()
>>> s = sm(None, a, b)
>>> s.ratio()
0.69999999999999996
>>> s2 = sm(None, c, b)
>>> s2.ratio()
0.29999999999999999

So it kind of does what it needs to. Not too sure how to prove it, though.
