高效的字符串排序算法
通过比较对字符串进行排序(例如标准 QuickSort + strcmp 类函数)可能会有点慢,特别是对于共享公共前缀的长字符串(比较函数需要 O(s) 时间,其中 s 是字符串的长度),因此标准解的复杂度为 O(s * nlog n)。有没有已知的更快的算法?
Sorting strings by comparisons (e.g. standard QuickSort + strcmp-like function) may be a bit slow, especially for long strings sharing a common prefix (the comparison function takes O(s) time, where s is the length of string), thus a standard solution has the complexity of O(s * nlog n). Are there any known faster algorithms?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
如果您知道字符串仅包含某些字符(几乎总是如此),则可以使用 BucketSort 或 RadixSort。
If you know that the string consist only of certain characters (which is almost always the case), you can use a variant of BucketSort or RadixSort.
您可以构建一个trie,它应该是
O(s*n)
,我相信。You could build a trie, which should be
O(s*n)
, I believe.请搜索“Sedgewick多键快速排序”(Sedgewick用C和Java写了著名的算法教科书)。他的算法相对容易实现并且速度相当快。就避免了你上面说的问题。有一种声称更快的突发排序算法,但我不知道有什么实现。
有一篇文章 C# 和 F# 中的快速字符串排序 描述了算法并引用了 Sedgewick 的代码以及 C# 代码。 (披露:这是我根据 Sedgewick 的论文编写的文章和代码)。
Please search for "Sedgewick Multikey quick sort" (Sedgewick wrote famous algorithms textbooks in C and Java). His algorithm is relatively easy to implement and quite fast. It avoids the problem you are talking above. There is the burst sort algorithm which claims to be faster, but I don't know of any implementation.
There is an article Fast String Sort in C# and F# that describes the algorithm and has a reference to Sedgewick's code as well as to C# code. (disclosure: it's an article and code that I wrote based on Sedgewick's paper).
摘要
我发现了 string_sorting
由 Tommi Rantala 提供的 repo 非常全面,它包括许多已知的高效(字符串)排序算法,例如 MSD 基数排序、burstsort 和多键快速排序。此外,它们中的大多数也具有缓存效率。
我的经验
在我看来,三路基数/字符串快速排序是最快的字符串排序算法之一。另外,MSD 基数排序也是一个不错的选择。 Sedgewick 的优秀算法书中介绍了它们。
以下是对 leipzig1M.txt 进行排序的一些结果,取自 此处:
三向基数/字符串快速排序的迷人之处在于它实现起来非常简单,实际上只需要大约十行源代码。
注意:但是如果你像我一样在 Google 上搜索“最快的字符串排序算法”,很可能是 burstsort,一种缓存感知的 MSD 基数排序变体(论文)。我还找到了 Bentley 和 Sedgewick 的这篇论文很有帮助,它使用了多键快速排序。
Summary
I found the string_sorting
repo by Tommi Rantala comprehensive, it includes many known efficient (string) sorting algorithms, e.g. MSD radix sort, burstsort and multi-key-quicksort. In addition, most of them are also cache efficient.
My Experience
It appears to me three-way radix/string quicksort is one of the fastest string sorting algorithms. Also, MSD radix sort is a good one. They are introduced in Sedgewick's excellent Algorithms book.
Here are some results to sort leipzig1M.txt taken from here:
The charming thing about three-way radix/string quicksort is it is really simple to implement, effectively only about ten source lines of code.
NOTE: But if you like me searching Google for "fastest string sorting algorithm", chances are it's burstsort, a cache-aware MSD radix sort variant (paper). I also found this paper by Bentley and Sedgewick helpful, which used a Multikey Quicksort.