Inconsistent sort behavior
I have a sample file containing the characters "aA0_- ", one per line. Sorting it with GNU sort gives the following order:
$ cat /tmp/sample | sort
_
-
0
a
A
After appending another character to each line, we get a different order (non-alphanumeric characters seem to have lower priority):
$ cat /tmp/sample | sed 's/$/x/' | sort
0x
ax
Ax
x
_x
-x
When we insert the same character at the beginning instead, we get the original sort order back:
$ cat /tmp/sample | sed 's/^/x/' | sort
x
x_
x-
x0
xa
xA
What is the explanation for this behavior?
UPDATE
When the 'z' and 'Z' characters are included in the sample, the result seems even stranger:
$ cat /tmp/sample | sed 's/$/x/' | sort
0x
ax
Ax
x
_x
-x
zx
Zx
In light of the correct answer, this happens because ' ', '_' and '-' are all treated as whitespace in the current locale (en_US.UTF-8) and are ignored when sorting.
Comments (1)
Your locale file should contain a definition of LC_COLLATE.
This determines the sort order of characters.
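One way to see how much of the ordering comes from LC_COLLATE is to compare against a plain byte-value sort. The sketch below is only an illustration: the variable names and the inline sample data are assumptions standing in for /tmp/sample, with the "x" suffix already appended.

```shell
# Hypothetical sample mirroring the question's data, with "x" already appended.
sample_lines=$(printf '%s\n' 'ax' 'Ax' '0x' '_x' '-x' ' x')

# LC_ALL=C makes sort compare raw byte values instead of applying the
# locale's LC_COLLATE rules, so spaces and punctuation are compared too.
c_sorted=$(printf '%s\n' "$sample_lines" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

In byte order the space and '-' lines come first, then the digit, the uppercase letter, '_', and finally the lowercase letter. If that differs from what plain `sort` prints on your system, the difference is coming from the locale's collation rules.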
Also check the definition of LC_CTYPE, and which characters are classified as 'space'.
If '-' and '_' are classified as space, you might see the results you have shown.
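A quick way to test that classification is to match the characters against the POSIX `[[:space:]]` class. This is a minimal sketch rather than a full diagnosis: it pins the C locale so the result is predictable, and the variable name is an assumption.

```shell
# Count how many of the three characters the C locale classifies as
# whitespace. Drop LC_ALL=C to test your own locale instead.
space_count=$(printf '%s\n' '_' '-' ' ' | LC_ALL=C grep -c '[[:space:]]')
echo "$space_count"
```

In the C locale only the space line should match. Running `locale` with no arguments prints the LC_COLLATE and LC_CTYPE settings currently in effect, which tells you which locale definition to inspect.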