What is stability in sorting algorithms and why is it important?
I'm very curious: why is stability important, or not important, in sorting algorithms?
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.
Background: a "stable" sorting algorithm keeps the items with the same sorting key in order. Suppose we have a list of 5-letter words:
If we sort the list by just the first letter of each word then a stable-sort would produce:
In an unstable sort algorithm, straw or spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).
We could sort the list of words using this algorithm: stable sorting by column 5, then 4, then 3, then 2, then 1.
In the end, it will be correctly sorted. Convince yourself of that. (By the way, that algorithm is called radix sort.)
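The column-by-column idea above can be sketched in Python, whose built-in `sorted` is a stable sort. The word list here is an assumption made up for illustration (only straw and spork come from the text), and `radix_sort_words` is a hypothetical helper name, not something from the answer.

```python
from operator import itemgetter

# Hypothetical word list; only "straw" and "spork" appear in the answer above.
words = ["peach", "straw", "apple", "spork"]

# Stable sort on the first letter only: "straw" stays ahead of "spork"
# because it came first in the input.
by_first_letter = sorted(words, key=itemgetter(0))
print(by_first_letter)  # ['apple', 'peach', 'straw', 'spork']


def radix_sort_words(items, width=5):
    """Sort equal-length words with one stable pass per column, last column first."""
    result = list(items)
    for col in reversed(range(width)):
        # Each pass keeps the order established by the earlier (less
        # significant) columns whenever two words tie on this column.
        result = sorted(result, key=itemgetter(col))
    return result


print(radix_sort_words(words))  # ['apple', 'peach', 'spork', 'straw']
```

If any of the passes were unstable, ties on a more significant column could scramble the order built up by the earlier passes, and the final result would no longer be guaranteed to be sorted.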
Now to answer your question, suppose we have a list of first and last names. We are asked to sort "by last name, then by first". We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by the last name. However, where last names are the same, the first names are sorted.
You can't stack unstable sorts in the same fashion.
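As a rough illustration of the name example, here is a minimal Python sketch. The records and the helper name `sort_by_last_then_first` are assumptions made up for this example; it relies only on the fact that Python's `sorted` is stable.

```python
# Hypothetical (first name, last name) records, invented for illustration.
people = [
    ("Mary", "Smith"),
    ("Alan", "Jones"),
    ("John", "Smith"),
    ("Zoe", "Jones"),
]


def sort_by_last_then_first(records):
    """Two stacked passes: secondary key first, then a stable pass on the primary key."""
    by_first = sorted(records, key=lambda p: p[0])  # pass 1: first name
    return sorted(by_first, key=lambda p: p[1])     # pass 2: last name (must be stable)


for first, last in sort_by_last_then_first(people):
    print(last, first)
# Jones Alan
# Jones Zoe
# Smith John
# Smith Mary
```

The second pass never looks at first names, so the first-name order inside each last name survives only because that pass is stable.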
A stable sorting algorithm is one that keeps identical elements in the same order in which they appear in the input, whereas an unstable sort may not. - I thank my algorithms lecturer Didem Gozupek for providing insight into algorithms.
I needed to edit the question again because of feedback that some people did not follow the logic of the presentation. The illustration shows sorting with respect to the first elements. Alternatively, you can think of the illustration as consisting of key-value pairs.
Stable Sorting Algorithms:
Unstable Sorting Algorithms:
Sorting stability means that records with the same key retain their relative order before and after the sort.
So stability matters if, and only if, the problem you're solving requires retention of that relative order.
If you don't need stability, you can use a fast, memory-sipping algorithm from a library, like heapsort or quicksort, and forget about it.
If you need stability, it's more complicated. Stable algorithms typically have higher big-O CPU and/or memory usage than unstable algorithms (the usual array merge sort, for example, needs O(n) extra memory). So when you have a large data set, you have to pick between beating up the CPU or the memory. If you're constrained on both CPU and memory, you have a problem. A good compromise stable algorithm is a binary tree sort; the Wikipedia article has a very simple C++ implementation based on the STL.
You can make an unstable algorithm into a stable one by adding the original record number as the last-place key for each record.
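A minimal sketch of that record-number trick, assuming Python and its standard `heapq` module: a plain heap-based sort gives no stability guarantee, but adding the original index as the last-place key makes every entry distinct, so ties resolve in input order. The function name `stable_heap_sort` and the sample records are hypothetical.

```python
import heapq


def stable_heap_sort(records, key):
    """Heap-based sort made stable by using the original index as a tie-breaker."""
    # Each heap entry is (sort key, original index, record). The index is
    # unique, so comparison never reaches the record itself.
    heap = [(key(rec), i, rec) for i, rec in enumerate(records)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical records; sort on the number only.
records = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]
print(stable_heap_sort(records, key=lambda r: r[1]))
# [('a', 1), ('b', 1), ('b', 2), ('a', 2)]
```

The cost is one extra integer per record and a slightly longer comparison, which is the memory and CPU trade-off the answer describes.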
It depends on what you do.
Imagine you've got some people records with a first and a last name field. First you sort the list by first name. If you then sort the list by last name with a stable algorithm, you'll have a list sorted by last name and, within each last name, by first name.
There are a few reasons why stability can be important. One is that swapping two records that don't need to be swapped causes a memory update: the page is marked dirty and needs to be rewritten to disk (or another slow medium).
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input unsorted array. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.
However, any sorting algorithm that is not stable can be modified to be stable. There may be algorithm-specific ways to do this, but in general, any comparison-based sorting algorithm that is not stable by nature can be made stable by changing the key comparison so that, for objects with equal keys, it also takes their original positions into account.
References:
http://www.math.uic.edu/~leon/cs-mcs401-s08/handouts/stability.pdf
http://en.wikipedia.org/wiki/Sorting_algorithm#Stability
I know there are many answers for this, but to me, this answer, by Robert Harvey, summarized it much more clearly:
Source
Some more examples of the reason for wanting stable sorts. Databases are a common example. Take the case of a transaction database that includes last|first name, date|time of purchase, item number, price. Say the database is normally sorted by date|time. Then a query is made to produce a copy of the database sorted by last|first name. Since a stable sort preserves the original order, the transactions for each last|first name will be in date|time order, even though the query's comparison only involves last|first name.
A similar example is classic Excel, which limited sorts to 3 columns at a time. To sort 6 columns, a sort is done with the least significant 3 columns, followed by a sort with the most significant 3 columns.
A classic example of a stable radix sort is a card sorter, used to sort by a field of base 10 numeric columns. The cards are sorted from least significant digit to most significant digit. On each pass, a deck of cards is read and separated into 10 different bins according to the digit in that column. Then the 10 bins of cards are put back into the input hopper in order ("0" cards first, "9" cards last). Then another pass is done by the next column, until all columns are sorted. Actual card sorters have more than 10 bins since there are 12 zones on a card, a column can be blank, and there is a mis-read bin. To sort letters, 2 passes per column are needed, 1st pass for digit, 2nd pass for the 12 11 zone.
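The ten-bin passes described above can be sketched in Python as follows. This is only a simplified model of the procedure (non-negative integers, ten bins, no blank or mis-read bin), and `card_sort` is a hypothetical name.

```python
def card_sort(numbers, digits):
    """LSD radix sort of non-negative integers with at most `digits` decimal digits."""
    deck = list(numbers)
    for d in range(digits):                 # least significant column first
        bins = [[] for _ in range(10)]      # one bin per digit 0-9
        for card in deck:
            # Appending keeps cards in deck order within each bin,
            # which is what makes every pass stable.
            bins[(card // 10**d) % 10].append(card)
        # Restack the deck: bin 0 first, bin 9 last.
        deck = [card for b in bins for card in b]
    return deck


print(card_sort([170, 45, 75, 90, 802, 24, 2, 66], digits=3))
# [2, 24, 45, 66, 75, 90, 170, 802]
```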
Later (1937) there were card collating (merging) machines that could merge two decks of cards by comparing fields. The input was two already sorted decks of cards, a master deck and an update deck. The collator merged the two decks into a new master bin and an archive bin, which was optionally used for master duplicates so that the new master bin would only have update cards in case of duplicates. This was probably the basis for the idea behind the original (bottom-up) merge sort.
If you assume that what you are sorting is just numbers, and only their values identify or distinguish them (i.e. elements with the same value are identical), then the question of sort stability is meaningless.
However, objects with the same priority in a sort may be distinct, and sometimes their relative order is meaningful information. In this case, an unstable sort causes problems.
For example, suppose you have a list of data that contains the time cost [T] each player needed to clear a maze of level [L] in a game.
Suppose we need to rank the players by how fast they cleared the maze. However, an additional rule applies: players who cleared a higher-level maze always rank higher, no matter how long it took them.
Of course, you might try to map the paired value [T,L] to a real number [R] with some algorithm that follows the rules, and then rank all players by their [R] value.
However, if stable sorting is feasible, you can simply sort the entire list by [T] (faster players first) and then by [L] (higher levels first). In this case, the relative order of players (by time cost) does not change after you group them by the level of the maze they cleared.
PS: Of course, sorting twice is not the best solution to this particular problem, but it should be enough to answer the poster's question.
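A small Python sketch of the two-pass ranking, assuming made-up player data and a hypothetical helper `rank_players`. Python's `sorted` stays stable even with `reverse=True`, which is what the second pass relies on.

```python
# Hypothetical results: (player, time cost T, maze level L).
results = [
    ("ann", 50, 2),
    ("bob", 30, 1),
    ("cat", 40, 2),
    ("dan", 20, 1),
]


def rank_players(rows):
    """Rank by maze level (higher first); within a level, by time (faster first)."""
    by_time = sorted(rows, key=lambda r: r[1])                # pass 1: T ascending
    return sorted(by_time, key=lambda r: r[2], reverse=True)  # pass 2: L descending, stable


print([name for name, _, _ in rank_players(results)])
# ['cat', 'ann', 'dan', 'bob']
```

A single sort with a composite key such as `(-L, T)` gives the same ranking, which is the kind of [T,L] to [R] mapping mentioned above.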
A stable sort will always return the same solution (permutation) for the same input.
For instance, [2,1,2] will be sorted by a stable sort as the permutation [2,1,3] (index 2 comes first in the sorted output, then index 1, then index 3). That means the output is always shuffled the same way. Another correct but non-stable permutation is [2,3,1].
Quick sort is not a stable sort, and how it permutes equal elements depends on the algorithm used to pick the pivot. Some implementations pick the pivot at random, which can make quick sort yield different permutations for the same input with the same algorithm.
A stable sort algorithm is necessarily deterministic in the permutation it produces.
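The [2,1,2] example can be checked with a short Python sketch. The helper `stable_argsort` is a hypothetical name; it uses 1-based indices to match the notation above and relies on `sorted` being stable.

```python
def stable_argsort(values):
    """Return 1-based input indices in the order a stable sort would emit them."""
    return sorted(range(1, len(values) + 1), key=lambda i: values[i - 1])


print(stable_argsort([2, 1, 2]))  # [2, 1, 3]
```

The two 2s sit at indices 1 and 3; a stable sort can only emit them in that order, so [2,1,3] is the single permutation it can produce, while an unstable sort may also return [2,3,1].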