How should natural sort work when a set of strings contains more than one identifiable number sequence?
So-called natural sort is meant to address the following problem: when users expect
file1.txt
file2.txt
file3.txt
file10.txt
file11.txt
"usual" sort instead produces:
file1.txt
file10.txt
file11.txt
file2.txt
file3.txt
which is inconvenient and isn't "natural".
We recently faced a situation where users complained about this very problem, and we considered employing natural sort. However, the following question arose. Consider the following set of strings:
file1file100.txt
file2file99.txt
...
file99file2.txt
file100file1.txt
in which there is more than one identifiable number sequence and those sequences run in opposite directions. How should natural sort deal with such sets (I mean what the result should be, not how to implement it)?
Answers (4)
The one that comes first wins, surely.
Usual sort lexicographically sorts filenames as sequences of characters (well, perhaps with special treatment of file extensions, although that might be implemented just by ordering '.' first among characters): 'f', 'i', 'l', 'e', '1', 'f', 'i', 'l', 'e', '1', '0', '0'. Natural sort lexicographically sorts filenames as sequences of tokens, where each token is either a character or a number: 'f', 'i', 'l', 'e', 1, 'f', 'i', 'l', 'e', 100. Comparison between characters is normal character order, comparison between numbers is normal integer order, and comparison between a character and a number places numbers before any character (except '.'). Finally you need to break the tie between file1 and file01, so the "numbers" aren't quite just numbers; they do need to "know" their original representation in case it gets that far.
I'd actually sort of advise against asking the users. If they have a really strong opinion about how they want their files sorted then OK, fair enough. Otherwise they might not actually know exactly what they "should" expect, so it makes more sense for an analyst/programmer to figure out what's "normal" than for a user to do so. Of course you can "ask" them indirectly via usability testing, if it's a big enough deal to be worth it. I find that if you ask users the wrong questions, they feel pressured to guess answers, and there's no point coding something arbitrary just because it's what the user representative thought of on the spot.
Whatever users think the rules should be, chances are what they'll actually get on with best is whatever their OS does by default when listing files in its file manager, file dialogs, and that sort of thing. So I'd offer them that (or perhaps the closest to that I can code without wasting a lot of their money on minor edge cases), and if they're still not happy find out why.
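The token-based comparison described in this answer could be sketched roughly as follows in Python. The function name natural_key and the numeric ranks (0 for '.', 1 for numbers, 2 for other characters) are assumptions made for illustration, and the tie-break simply falls back to the raw string, which is one possible reading of numbers "knowing" their original representation.

```python
import re

def natural_key(name):
    """Sort key: a tuple of tokens (single characters or whole numbers),
    followed by the raw name as a tie-breaker."""
    tokens = []
    for match in re.finditer(r"([0-9]+)|([^0-9])", name):
        digits, char = match.groups()
        if digits is not None:
            tokens.append((1, int(digits)))   # numbers compare as integers
        elif char == ".":
            tokens.append((0, char))          # '.' sorts before everything
        else:
            tokens.append((2, char))          # other characters sort after numbers
    # The raw name only matters when the token sequences are equal,
    # e.g. file1 versus file01.
    return (tuple(tokens), name)

names = ["file100file1.txt", "file2file99.txt", "file99file2.txt", "file1file100.txt"]
print(sorted(names, key=natural_key))
```

Under these assumptions the first number decides the order of the set from the question, so the names come out as file1file100.txt, file2file99.txt, file99file2.txt, file100file1.txt.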
I doubt there's a "correct" answer.
To me personally, the "natural" thing to do is to sort by the first embedded number, breaking ties using the second etc.
However, since it's your users' expectations and not mine that matter, it might be worth asking them.
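As a rough sketch of "first number, then second number" ordering, one could extract every embedded number and compare the resulting lists. The helper name numbers_key is hypothetical, and ignoring the non-numeric text except as a final tie-breaker is an assumption of this sketch rather than something the answer spells out.

```python
import re

def numbers_key(name):
    """Sort key: the embedded numbers in left-to-right order,
    with the raw name as a last-resort tie-breaker."""
    numbers = [int(run) for run in re.findall(r"[0-9]+", name)]
    return (numbers, name)

names = ["file99file2.txt", "file1file100.txt", "file100file1.txt", "file2file99.txt"]
print(sorted(names, key=numbers_key))
```

Python compares lists element by element, so the second number is only consulted when the first numbers are equal.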
I would expect a strictly left to right based order with the numbers sorted as if they were prepended with a sufficient prefix of 0's. I would try to argue against/convince users who think otherwise by emphasizing the simplicity/generality of the rule.
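The zero-prefix rule can be illustrated with a short Python sketch. It assumes that "sufficient" means padding every digit run to the length of the longest digit run in the set being sorted; the function name padded_sort is made up for this example.

```python
import re

def padded_sort(names):
    """Sort names as plain strings after left-padding every digit run
    with zeros to a common width."""
    runs = [run for name in names for run in re.findall(r"[0-9]+", name)]
    width = max((len(run) for run in runs), default=0)

    def key(name):
        # Replace each digit run with its zero-padded form, then compare
        # the padded strings strictly left to right.
        return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), name)

    return sorted(names, key=key)

print(padded_sort(["file10.txt", "file2.txt", "file1.txt", "file11.txt", "file3.txt"]))
```

Note that with this rule file1 and file01 produce the same padded key, so their relative order is whatever the input order was.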
With your examples, I would think it natural to think of those file names as a sequence of:
Split each file into those 6 sections and sort the files on all fields, with the leftmost field being most significant.
Note: unlike Steve Jessop's good answer, you should consider each run of non-numeric characters, or of numeric characters, as a whole when sorting.
It seems most natural that the result should be as you show it, with the leftmost numeric field giving the overall order. After all, we are used to the leftmost digit in a numeral being the most significant, and to the leftmost number in a software release being the most significant.
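Treating whole runs as fields might look like the Python sketch below. The exact section boundaries are an assumption: this version simply alternates between digit runs and non-digit runs, so the extension stays attached to the preceding text, and it ranks a numeric field before a text field when the two meet at the same position. The name run_key is hypothetical.

```python
import re

def run_key(name):
    """Sort key: a list of fields, each a whole run of digits (as an int)
    or a whole run of non-digit characters (as a string)."""
    fields = []
    for match in re.finditer(r"([0-9]+)|([^0-9]+)", name):
        digits, text = match.groups()
        if digits is not None:
            fields.append((0, int(digits)))  # numeric field
        else:
            fields.append((1, text))         # text field, compared as a whole
    return fields

names = ["file100file1.txt", "file99file2.txt", "file2file99.txt", "file1file100.txt"]
print(sorted(names, key=run_key))
```

The leftmost field that differs decides the order, which matches the idea of the leftmost field being the most significant.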