Bash: sort a text file by the value of its last field
I have a text file containing ~300k rows. Each row has a varying number of comma-delimited fields, the last of which is guaranteed numerical. I want to sort the file by this last numerical field. I can't do:
sort -t, -n -k 2 file.in > file.out
as the number of fields in each row is not constant. I think sed or awk may be the answer, but I'm not sure how. E.g.:
awk -F, '{print $NF}' file.in
gives me the last column value, but how do I use this to sort the file?
Comments (6)
Use awk to put the numeric key up front. $NF is the last field of the current record. Sort numerically on that key, then use sed to remove the duplicate key.
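The snippet that went with this answer did not survive on this page, so here is a minimal sketch of the awk / sort / sed pipeline it describes. The function name `sort_by_last_field` is made up for illustration, and the sketch assumes commas are never quoted inside a field.

```shell
# Hypothetical helper (name not from the original answer).
# Decorate, sort, undecorate:
#   1. awk prepends the last field ($NF) to each line as a sort key.
#   2. sort orders the lines numerically on that first field only.
#   3. sed strips the prepended key (everything up to the first comma).
sort_by_last_field() {
    awk -F, '{ print $NF "," $0 }' "$@" |
        sort -t, -k1,1n |
        sed 's/^[^,]*,//'
}

# Usage: sort_by_last_field file.in > file.out
```

With no file argument the function reads standard input. Lines whose keys are equal come out in whatever order sort's fallback whole-line comparison gives them.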
Maybe reverse the fields of each line in the file before sorting? Something like reversing each line's fields, sorting numerically on the first field, and then reversing the fields again should do it, as long as commas are never quoted in any way. If this is a full-fledged CSV file (in which commas can be quoted with backslash or space) then you need a real CSV parser.
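The code for this answer is missing from the page. One way the reverse, sort, reverse idea could look is sketched below, using awk for the reversal. The tool the answer actually used is unknown, and both function names are invented for this example.

```shell
# Hypothetical helpers (names and the choice of awk are assumptions).
# reverse_fields: print each line with its comma-separated fields in reverse order.
reverse_fields() {
    awk -F, '{
        out = $NF
        for (i = NF - 1; i >= 1; i--) out = out "," $i
        print out
    }' "$@"
}

# Reverse so the numeric field comes first, sort on it, then reverse back.
sort_by_reversed_fields() {
    reverse_fields "$@" | sort -t, -k1,1n | reverse_fields
}

# Usage: sort_by_reversed_fields file.in > file.out
```

As the answer says, this only holds up while no field contains a quoted or escaped comma.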
Perl one-liner:
I'm going to throw mine in here as an alternative (and I couldn't get awk to work) :)
sample file:
code:
Python one-liner:
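The original one-liner is missing from this page, so the following is only a guess at its shape, assuming `python3` is on the PATH and the input arrives on standard input. The wrapper name is made up.

```shell
# Hypothetical wrapper around a Python one-liner (not the original answer's code).
# Sort key: the text after the last comma, parsed as a float.
python_sort_by_last_field() {
    python3 -c 'import sys; sys.stdout.writelines(sorted(sys.stdin, key=lambda line: float(line.rsplit(",", 1)[-1])))'
}

# Usage: python_sort_by_last_field < file.in > file.out
```

`rsplit(",", 1)` splits only at the last comma, so the varying number of fields per line does not matter. Python's `sorted` is stable, so lines with equal keys keep their input order.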