Natural Sort Order in C#
Does anyone have a good resource, or can anyone provide a sample, of a natural order sort in C# for a FileInfo array? I am implementing the IComparer interface in my sorts.
We had a need for a natural sort to deal with text with the following pattern:
For some reason, when I first looked on SO I didn't find this post, so we implemented our own. Compared to some of the solutions presented here, it is similar in concept but may have the benefit of being simpler and easier to understand. However, although I did try to look at performance bottlenecks, it is still a much slower implementation than the default OrderBy().
Here is the extension method I implemented:
The idea is to split the original strings into blocks of digits and non-digits ("\d+|\D+"). Since this is a potentially expensive task, it is done only once per entry. We then use a comparer of comparable objects (sorry, I can't find a more proper way to say it). It compares each block to its corresponding block in the other string.
I would like feedback on how this could be improved and what the major flaws are. Note that maintainability is important to us at this point, and we are not currently using this on extremely large data sets.
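The poster's code did not survive on this page, so the following is only a minimal sketch of the idea described above, not the original extension method. The names OrderByNatural and ChunkComparer are made up for illustration, and the choice of ordinal comparison for the non-digit blocks is an assumption. Keys are assumed to be non-null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaturalOrderExtensions
{
    // Matches either a run of digits or a run of non-digits.
    private static readonly Regex ChunkPattern = new Regex(@"\d+|\D+");

    // Hypothetical extension method: orders items by the natural order of selector(item).
    public static IEnumerable<T> OrderByNatural<T>(
        this IEnumerable<T> source, Func<T, string> selector)
    {
        return source
            // Split each key once, up front, rather than on every comparison.
            .Select(item => new
            {
                Item = item,
                Chunks = ChunkPattern.Matches(selector(item))
                    .Cast<Match>().Select(m => m.Value).ToArray()
            })
            .OrderBy(x => x.Chunks, ChunkComparer.Instance)
            .Select(x => x.Item);
    }

    private sealed class ChunkComparer : IComparer<string[]>
    {
        public static readonly ChunkComparer Instance = new ChunkComparer();

        public int Compare(string[] x, string[] y)
        {
            int shared = Math.Min(x.Length, y.Length);
            for (int i = 0; i < shared; i++)
            {
                int result = CompareChunk(x[i], y[i]);
                if (result != 0) return result;
            }
            // All shared chunks are equal: the key with fewer chunks sorts first.
            return x.Length.CompareTo(y.Length);
        }

        private static int CompareChunk(string a, string b)
        {
            bool bothNumeric = char.IsDigit(a[0]) && char.IsDigit(b[0]);
            if (!bothNumeric) return string.CompareOrdinal(a, b);

            // Compare digit runs by value without parsing, so long runs cannot overflow:
            // after dropping leading zeros, the longer run is the larger number.
            string ta = a.TrimStart('0'), tb = b.TrimStart('0');
            if (ta.Length != tb.Length) return ta.Length.CompareTo(tb.Length);
            return string.CompareOrdinal(ta, tb);
        }
    }
}
```

Called as names.OrderByNatural(n => n), this would place "file2.txt" before "file10.txt". The regex work happens once per entry, which matches the description, but each entry still allocates an array of substrings.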
If you want to order by name (naturally), with numbers in the correct numeric order:
Another good natural-sort resource not yet mentioned is the NuGet NaturalSort.Extension package. It's actively maintained and has over one million downloads at the time of this writing. The source code is available on GitHub.
Let me explain my problem and how I was able to solve it.
Problem: Sort files based on FileName from FileInfo objects which are retrieved from a Directory.
Solution: I selected the file names from FileInfo and trimmed the ".png" part of the file name. Now, just do List.Sort(), which sorts the file names in natural sorting order. Based on my testing, I found that having .png messes up the sorting order. Have a look at the code below:
The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer:
Michael Kaplan has some examples of how this function works, and of the changes that were made for Vista to make it work more intuitively. The plus side of this function is that it will have the same behaviour as the version of Windows it runs on. However, this does mean that it differs between versions of Windows, so you need to consider whether this is a problem for you.
So a complete implementation would be something like:
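The code for this answer is missing from this page. The following is a sketch of what such an implementation could look like, assuming the built-in function meant here is StrCmpLogicalW from shlwapi.dll (the function named elsewhere in this thread). The class names are illustrative, the code only works on Windows, and the arguments are assumed to be non-null.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

internal static class NativeMethods
{
    // StrCmpLogicalW is exported by shlwapi.dll, so this call only works on Windows.
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    internal static extern int StrCmpLogicalW(string psz1, string psz2);
}

// Illustrative name: compares strings the way the Windows shell orders them.
public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b) => NativeMethods.StrCmpLogicalW(a, b);
}

// Illustrative name: compares FileInfo objects by their Name property.
public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
    public int Compare(FileInfo a, FileInfo b) => NativeMethods.StrCmpLogicalW(a.Name, b.Name);
}
```

A FileInfo array could then be sorted in place with Array.Sort(files, new NaturalFileInfoNameComparer()).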
Just thought I'd add to this (with the most concise solution I could find):
The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.
The cast to (int?) is to allow for collections of strings without any numbers (.Max() on an empty enumerable throws an InvalidOperationException).
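Since the snippet itself is missing here, this is a sketch of the padding technique as described, not necessarily the poster's exact code. The method name OrderByAlphaNumeric is an assumption, as are the ToList() call (added to avoid enumerating the source twice) and the use of StringComparer.Ordinal (chosen to keep the result independent of the current culture).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlphaNumericOrdering
{
    // Hypothetical name: orders by a key in which every digit run is left-padded with zeros.
    public static IEnumerable<T> OrderByAlphaNumeric<T>(
        this IEnumerable<T> source, Func<T, string> selector)
    {
        // Enumerate once so the two passes below see the same items.
        List<T> items = source.ToList();

        // Longest digit run across all keys. Max() over int? yields null for an
        // empty sequence instead of throwing, and ?? 0 turns that into zero.
        int width = items
            .SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>()
                .Select(m => (int?)m.Value.Length))
            .Max() ?? 0;

        // Sort on the padded key; the original strings are returned unchanged.
        return items.OrderBy(
            i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(width, '0')),
            StringComparer.Ordinal);
    }
}
```

With the keys "a10", "a2" and "a1", the width is 2, so the sort keys become "a10", "a02" and "a01", and a plain string sort then yields the natural order.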
None of the existing implementations looked great, so I wrote my own. The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8). The only differences I've seen are: 1) although Windows used to (e.g. XP) handle numbers of any length, it's now limited to 19 digits, whereas mine is unlimited; 2) Windows gives inconsistent results with certain sets of Unicode digits, whereas mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows); and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (e.g. "e-1é" vs "é1e-", where the sections before and after the number have diacritic and punctuation weight differences).
The signature matches the Comparison<string> delegate:
Here's a wrapper class for use as IComparer<string>:
Example:
Here's a good set of filenames I use for testing:
Matthew Horsley's answer is the fastest method that doesn't change behaviour depending on which version of Windows your program is running on. However, it can be even faster if you create the regex once and use RegexOptions.Compiled. I also added the option of passing in a string comparer so you can ignore case if needed, and improved readability a bit.
Use it like this:
This takes 450 ms to sort 100,000 strings, compared to 300 ms for the default .NET string comparison. Pretty fast!
Pure C# solution for LINQ OrderBy:
http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html
My solution:
Results:
Here's a version for .NET Core 2.1+ / .NET 5.0+, using spans to avoid allocations:
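The span-based code is not reproduced on this page. As a rough sketch of how spans can avoid allocations in this kind of comparer (not the poster's implementation), the class below slices both strings into digit and non-digit runs without creating substrings. The class name and the decisions to treat only ASCII digits as numeric and to compare text runs ordinally are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Illustrative name: a natural comparer that works on spans and allocates nothing.
public sealed class SpanNaturalComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        ReadOnlySpan<char> a = x.AsSpan();
        ReadOnlySpan<char> b = y.AsSpan();

        while (!a.IsEmpty && !b.IsEmpty)
        {
            bool digitsA = IsAsciiDigit(a[0]);
            bool digitsB = IsAsciiDigit(b[0]);
            ReadOnlySpan<char> runA = TakeRun(a, digitsA);
            ReadOnlySpan<char> runB = TakeRun(b, digitsB);

            // Two digit runs compare by value; anything else compares ordinally.
            int result = digitsA && digitsB
                ? CompareDigits(runA, runB)
                : runA.SequenceCompareTo(runB);
            if (result != 0) return result;

            a = a.Slice(runA.Length);
            b = b.Slice(runB.Length);
        }
        // One side is exhausted: the shorter remainder sorts first.
        return a.Length.CompareTo(b.Length);
    }

    // Returns the leading run of characters that are all digits or all non-digits.
    private static ReadOnlySpan<char> TakeRun(ReadOnlySpan<char> s, bool digits)
    {
        int i = 1;
        while (i < s.Length && IsAsciiDigit(s[i]) == digits) i++;
        return s.Slice(0, i);
    }

    // Compares digit runs of any length by value, ignoring leading zeros.
    private static int CompareDigits(ReadOnlySpan<char> a, ReadOnlySpan<char> b)
    {
        a = a.TrimStart('0');
        b = b.TrimStart('0');
        if (a.Length != b.Length) return a.Length.CompareTo(b.Length);
        return a.SequenceCompareTo(b);
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

Because leading zeros are ignored, this sketch reports "01" and "1" as equal, which may or may not be what you want.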
You do need to be careful. I vaguely recall reading that StrCmpLogicalW, or something like it, was not strictly transitive, and I have observed .NET's sort methods sometimes getting stuck in infinite loops if the comparison function breaks that rule.
A transitive comparison will always report that a < c if a < b and b < c. There exists a function that does a natural sort order comparison that does not always meet that criterion, but I can't recall whether it is StrCmpLogicalW or something else.
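If you are unsure whether a particular comparer is transitive, one option is to brute-force the rule over a set of sample values. The helper below is a hypothetical sketch of such a check (the name FindTransitivityViolation is made up); it is cubic in the number of samples, so it is only suitable for small test sets.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerChecks
{
    // Returns the first triple (a, b, c) with a < b and b < c but not a < c,
    // or null if no such triple exists among the samples.
    public static Tuple<T, T, T> FindTransitivityViolation<T>(
        IReadOnlyList<T> samples, IComparer<T> comparer)
    {
        foreach (T a in samples)
        {
            foreach (T b in samples)
            {
                if (comparer.Compare(a, b) >= 0) continue;
                foreach (T c in samples)
                {
                    if (comparer.Compare(b, c) < 0 && comparer.Compare(a, c) >= 0)
                        return Tuple.Create(a, b, c);
                }
            }
        }
        return null;
    }
}
```

A null result only says that no violation was found among the samples supplied; it does not prove the comparer is transitive in general.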
This is my code to sort strings containing both alphabetic and numeric characters.
First, this extension method:
Then, simply use it anywhere in your code like this:
How does it work? By padding with zeros:
It works with multiple numbers:
Hope that helps.
Here's a relatively simple example that doesn't use P/Invoke and avoids any allocation during execution.
Feel free to use the code from here, or if it's easier there's a NuGet package:
https://www.nuget.org/packages/NaturalSort
https://github.com/drewnoakes/natural-sort
It doesn't ignore leading zeroes, so 01 comes after 2.
Corresponding unit test:
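The comparer and its unit test are not shown on this page; see the linked repository for the actual code. The sketch below is an independent illustration of one allocation-free approach that has the same leading-zero behaviour: when both strings reach a digit run, the longer run is treated as the larger number. The class name is made up, and only ASCII digits are treated as numeric.

```csharp
using System;
using System.Collections.Generic;

// Illustrative name: compares digit runs by length first, then digit by digit.
public sealed class LengthFirstNaturalComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsDigit(x[i]) && IsDigit(y[j]))
            {
                // Find the end of the digit run in each string.
                int endX = i, endY = j;
                while (endX < x.Length && IsDigit(x[endX])) endX++;
                while (endY < y.Length && IsDigit(y[endY])) endY++;

                // A longer digit run is treated as a larger number. Leading zeros
                // count toward the length, which is why "01" sorts after "2".
                int lengthDiff = (endX - i) - (endY - j);
                if (lengthDiff != 0) return lengthDiff;

                // Equal lengths: the first differing digit decides.
                for (; i < endX; i++, j++)
                {
                    if (x[i] != y[j]) return x[i] - y[j];
                }
            }
            else
            {
                // Ordinal comparison of single characters outside digit runs.
                if (x[i] != y[j]) return x[i] - y[j];
                i++;
                j++;
            }
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return (x.Length - i) - (y.Length - j);
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```

Sorting "2", "01", "10" and "1" with this sketch gives 1, 2, 01, 10.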
Adding to Greg Beech's answer (because I've just been searching for that), if you want to use this from LINQ you can use the OrderBy overload that takes an IComparer. E.g.:
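The example itself is missing here. As a small sketch of the overload in question, the hypothetical helper below orders FileInfo objects by name using whatever IComparer<string> is passed in; a natural-sort comparer would be supplied in practice.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileOrdering
{
    // Hypothetical helper: orders files by name using the supplied string comparer.
    public static IEnumerable<FileInfo> OrderByName(
        IEnumerable<FileInfo> files, IComparer<string> nameComparer)
    {
        // Enumerable.OrderBy has an overload taking a key selector and an IComparer<TKey>.
        return files.OrderBy(f => f.Name, nameComparer);
    }
}
```

For instance, FileOrdering.OrderByName(new DirectoryInfo(path).GetFiles(), comparer), where path and comparer are placeholders for your own directory and natural-sort comparer.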
I've actually implemented it as an extension method on StringComparer so that you could do, for example, StringComparer.CurrentCulture.WithNaturalSort() or StringComparer.OrdinalIgnoreCase.WithNaturalSort().
The resulting IComparer<string> can be used in all places like OrderBy, OrderByDescending, ThenBy, ThenByDescending, SortedSet<string>, etc. And you can still easily tweak case sensitivity, culture, etc.
The implementation is fairly trivial, and it should perform quite well even on large sequences.
I've also published it as a tiny NuGet package, so you can just do:
The code, including XML documentation comments and a suite of tests, is available in the NaturalSort.Extension GitHub repository.
The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):
Inspired by Michael Parker's solution, here is an IComparer implementation that you can drop into any of the LINQ ordering methods:
Here is a naive one-line, regex-free LINQ way (borrowed from Python):
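The one-liner is missing from this page. One naive regex-free approach in that spirit, offered as a guess at the idea rather than as the poster's code, is to sort strings that consist only of digits by their numeric value and push everything else to the end in ordinal order. The helper names and the nine-digit limit (which keeps int.Parse from overflowing) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveNaturalOrder
{
    // Hypothetical helper: all-digit strings first, by value; everything else after, ordinally.
    public static IEnumerable<string> OrderNumbersFirst(IEnumerable<string> values)
    {
        return values
            .OrderBy(s => IsShortNumber(s) ? int.Parse(s) : int.MaxValue)
            .ThenBy(s => s, StringComparer.Ordinal);
    }

    // True for non-empty strings of at most nine ASCII digits, which always fit in an int.
    private static bool IsShortNumber(string s)
    {
        return s.Length > 0 && s.Length <= 9 && s.All(c => c >= '0' && c <= '9');
    }
}
```

This handles a list such as "10", "2", "a12" sensibly, but it does not look inside mixed strings, so "a12" and "a2" are still compared as plain text.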
Expanding on a couple of the previous answers and making use of extension methods, I came up with the following. It doesn't have the caveats of potential multiple enumeration of the enumerable, of performance issues from using multiple regex objects, or of calling the regex needlessly. That being said, it does use ToList(), which can negate the benefits in larger collections.
The selector supports generic typing to allow any delegate to be assigned. The elements in the source collection are transformed by the selector, then converted to strings with ToString().
A version that's easier to read/maintain.