什么是确定性快速排序?
我一直在阅读有关快速排序的内容,发现有时它被称为“确定性快速排序”。
这是普通快速排序的替代版本吗?普通快速排序和确定性快速排序有什么区别?
I have been reading about Quicksort and found that sometimes it' s referred to as "Deterministic Quicksort".
Is this an alternate version of the normal Quicksort ? What is the difference between a normal Quicksort and a Deterministic Quicksort ?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
普通(“确定性”)快速排序在特定数据集上的行为可能非常差(例如,选择第一个未排序元素的实现在已排序数据上的时间复杂度为 O(n^2))。
随机快速排序(选择随机主元,而不是确定性选择)有时用于在所有数据集上提供更好的预期性能。
The ordinary ("deterministic") Quicksort can have very poor behaviour on particular datasets (as an example, an implementation that picks the first unsorted element has O(n^2) time complexity on already-sorted data).
Randomized Quicksort (which selects a random pivot, rather than choosing deterministically) is sometimes used to give better expected performance over all datasets.
快速排序的预期/平均时间为
O(n log n)
,但最坏情况为O(n^2)
。如果选择的主元始终是最小值或最大值,就会发生这种情况。理想情况下,您希望选择中位数作为基准。如果直接找到中位数的成本太高(如果您尝试使用快速排序,通常就是这种情况),那么通常所做的就是取三个潜在主元元素的中位数,或者只是选择一个随机元素作为主元。
由于主元选择过程固有的随机性,后一种方法使得快速排序具有不确定性。
Quicksort runs in
O(n log n)
expected/average time, butO(n^2)
worst case. This occurs if the pivot chosen is consistently either the minimum or the maximum.Ideally, you want to select the median as your pivot. If finding the median directly is too costly (usually this is the case if you're trying to use quicksort), what's commonly done instead is to either take the median of three potential pivot elements, or else just pick a random element as your pivot.
The latter method renders quicksort nondeterministic because of the randomness inherent to the pivot selection process.
一般来说,如果排序算法每次都以完全相同的顺序对元素进行一致的排序,则该算法是“确定性的”。给定一组要按 id (asc) 排序的记录:
然后排序算法可以返回 Censu、Cikk、Marju、Lonzu 或 Censu、Cikku、Lonzu、Marju 作为正确排序。确定性排序是一种始终返回相同顺序的排序。情况并不总是如此。在快速排序的情况下,如果随机选择主元(理想情况下您会选择中位数,但这可能会很昂贵),则可以获得更快的平均性能。然而,这是有代价的:您的搜索不再是确定性的。
In general, a sort algorithm is "deterministic" if it consistently sorts the elements in the exact same order every time. Given a set of records to sort on id (asc):
then a sorting algorithm could return both Censu, Cikk, Marju, Lonzu, or Censu, Cikku, Lonzu, Marju, as correct sortings. A deterministic sort is one which always returns the same ordering. This needn't always be the case. In the case of quicksort, one can get faster average performance if pivots are chosen randomly (ideally you would choose the median, but this can be costly). However, this comes at a cost: your search is no longer deterministic.
您的来源可以(并且应该)给出自己的定义,但通常确定性快速排序是通过不依赖于随机数的公式选择主元的排序。例如,始终选择中间元素或始终选择第一个元素,或类似的内容。这意味着无论您在相同的输入上运行它多少次,它的性能将始终是相同的(理论上无论如何,尽管实际上差异不应该太大)。随机快速排序意味着您在选择主元时使用随机数,这意味着无法(轻松)预测同一输入上不同运行的性能。
Your source can (and should) give its own definition, but generally a deterministic quicksort is one where the pivot is chosen through a formula that doesn't depend on random numbers. For example, always pick the middle element or always the first, or something like this. This means that its performance will always be the same (in theory anyway, although in practice the difference shouldn't be too big) no matter how many times you run it on the same input. A randomised quicksort means that you are using random numbers when choosing the pivot, meaning the performance cannot be (easily) predicted for different runs on the same input.
它与分区有关(或来自快速排序中使用的著名的分而治之的划分步骤)。如果每次都以最后一个(或第一个元素或任意位置的元素,只是每次划分数据集时必须是相同的位置)作为基准进行分区,那么它就是确定性快速排序。如果枢轴是随机选取的,那么它就是随机快速排序。
这是讲义,其中包含它穿过。
我希望它能帮助
欢呼
It has to do with the partitioning (or the divide step from the famed Divide and Conquer which is used in Quick sort). If every time the last (or first element or element at any position, just that it has to be the same position every time the data set is divided) is used as the pivot to partition then it is Deterministic Quick sort. If the pivot is picked at random then it is Randomized quick sort.
Here is a lecture note which puts it across.
I hope it helps
cheers
快速排序前面的常见形容词是确定性的和随机性的。确定性意味着快速排序将始终以相同的方式对同一组数据进行排序,而随机快速排序使用随机化,并且很少以相同的精确方式对相同的数据进行排序(除非数据集非常小 - 那么它更常见) 。
确定性
这取决于如何选择枢轴。在确定性快速排序中,通过始终选择相同相对索引处的主元(例如第一个、最后一个或中间元素)或通过使用任意数量的预定元素选择的中值来选择主元。例如,一种常见的方法是选择第一个、最后一个和中间元素的中位数作为基准。即使使用我刚刚描述的 3 中位数方法,某些数据集也可以轻松给出 O(N^2) 时间复杂度。一个示例数据集是所谓的风琴管数据集:
随机
随机快速排序可以仅选择一个随机主元或使用一些随机选择的主元的中值。仍然存在 O(N^2) 时间复杂度的可能性,但概率要小得多,并且随着数据集大小的增加而变得更小。
Common adjectives in front of quicksort are deterministic and randomized. Deterministic means that the quicksort will always sort the same set of data in the same fashion while a randomized quicksort uses randomization and will rarely sort the same data in the same exact fashion (unless the data set is very small - then it is more common).
Deterministic
It comes down to how the pivots are chosen. In a deterministic quicksort, the pivots are chosen by either always choosing the pivot at the same relative index such as the first, last, or middle element or by using the median of any number of predetermined element choices. For instance, a common method is to choose the median of first, last, and middle elements as the pivot. Even with the median-of-3 method I just described, certain datasets can easily give O(N^2) time complexity. An example dataset is the so-called organ pipes set of data:
Randomized
Randomizated quicksorts can choose just a random pivot or use the median of some number of randomly chosen pivots. There is still the possibility of O(N^2) time complexity, but the probability is much, much smaller and becomes smaller with increasing dataset size.
除了许多其他人已经告诉过您如何实现确定性快速排序和非确定性快速排序之外,我相信这种排序的一个更重要的方面是,对于确定性快速排序,当键冲突时,记录的顺序始终相同,而使用非确定性快速排序时,每次运行排序时,此类记录的顺序可能不同。
我想当你有非唯一的键时,你不应该使用非确定性快速排序。
Besides what many others have already told you about how a deterministic quick sort is implemented and a non-deterministic one is, I believe one, much more important aspect of such sort, is that, with deterministic quicksort, you always have the same order of records when keys clash, while with non-deterministic quicksorts, the order of such records can be different each time you run the sort.
I guess you should not use non-deterministic quicksorting when you have non-unique keys.