How do I create a sub-array in awk?
Given a list like:
Dog bone
Cat catnip
Human ipad
Dog collar
Dog collar
Cat collar
Human car
Human laptop
Cat catnip
Human ipad
How can I get results like this, using awk:
Dog bone 1
Dog collar 2
Cat catnip 2
Cat collar 1
Human car 1
Human laptop 1
Human ipad 2
Do I need a sub-array? It seems to me like I need an array of "owners" which is populated by arrays of "things."
I'd like to use awk to do this, as this is a sub-script of another program in awk, and for now, I'd rather not create a separate program.
By the way, I can already do it using sort and grep -c, and a few other pipes, but I really won't be able to do that on gigantic data files, as it would be too slow. Awk is generally much faster for this kind of thing, I'm told.
Thanks,
Kevin
EDIT: Be aware that the columns are actually not next to each other like this; in the real file, they are more like columns $8 and $11. I say this because I suppose that if they were next to each other I could incorporate an awk regex such as ~/Dog\ Collar/ or something. But I won't have that option. -thanks!
Comments (2)
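awk does not have multi-dimensional arrays, but you can manage by constructing 2D-ish array keys: concatenate the two fields into a single string and use that string as the array subscript.
From your input, counting on such a key yields the per-pair totals you asked for.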
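A minimal sketch of that idea, assuming the owner and the thing are in $1 and $2 as in the sample list (in the real file they would be $8 and $11). The variable and array names (input, result, count, key) are illustrative, and the final sort is only there because awk's for-in loop visits keys in no particular order:

```shell
# Sample list from the question, one "owner thing" pair per line.
input='Dog bone
Cat catnip
Human ipad
Dog collar
Dog collar
Cat collar
Human car
Human laptop
Cat catnip
Human ipad'

# Join the two fields into one string key and count each distinct key.
# The END block prints every key followed by its count.
result=$(printf '%s\n' "$input" |
  awk '{ count[$1 " " $2]++ }
       END { for (key in count) print key, count[key] }' |
  LC_ALL=C sort)

printf '%s\n' "$result"
```

For the sample list this prints seven lines, one per distinct pair, for example "Dog collar 2" and "Human ipad 2".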
Here, I use a space to separate the key values. If your data contains spaces, you can use some other character that does not appear in your input. I typically use array[$a FS $b] when I have a specific field separator, since that's guaranteed not to appear in the field values.
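As a sketch of the FS variant, assuming a hypothetical comma-separated file with the owner in field 8 and the thing in field 11 (the column numbers come from the question's edit; the sample records and the names data, fs_result, count and part are made up for illustration):

```shell
# Hypothetical records: owner in field 8, thing in field 11.
data='a,b,c,d,e,f,g,Dog,i,j,collar
a,b,c,d,e,f,g,Cat,i,j,catnip
a,b,c,d,e,f,g,Dog,i,j,collar'

fs_result=$(printf '%s\n' "$data" |
  awk -F',' '
    { count[$8 FS $11]++ }                # FS cannot occur inside a field
    END {
      for (key in count) {
        split(key, part, FS)              # recover owner and thing from the key
        print part[1], part[2], count[key]
      }
    }' |
  LC_ALL=C sort)

printf '%s\n' "$fs_result"
```

Because the key is built with FS, calling split() on FS in the END block recovers the two original values without any risk of cutting a field in half.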
GNU Awk has some support for multi-dimensional arrays, but it's really just cleverly concatenating keys to form a sort of compound key.
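That compound key is what standard awk builds whenever a subscript contains a comma: count[$1, $2] is stored under $1 SUBSEP $2, where SUBSEP defaults to a control character unlikely to appear in text data. A small sketch, with illustrative sample data and names (pairs, subsep_result, count, part):

```shell
pairs='Dog bone
Dog collar
Dog collar
Cat catnip'

subsep_result=$(printf '%s\n' "$pairs" |
  awk '{ count[$1, $2]++ }               # stored under the key $1 SUBSEP $2
       END {
         for (key in count) {
           split(key, part, SUBSEP)       # split the compound key back apart
           print part[1], part[2], count[key]
         }
       }' |
  LC_ALL=C sort)

printf '%s\n' "$subsep_result"
```

Separately, gawk 4.0 and later also accept true arrays of arrays, written count[$1][$2]; that form is a gawk extension rather than portable awk.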
I'd recommend learning Perl, which will be fairly familiar to you if you like awk, and which supports true Lists of Lists (http://perldoc.perl.org/perllol.html). In general, Perl will take you much further than awk.
Re your comment:
I'm not trying to be superior. I understand you asked how to accomplish a task with a specific tool, awk. I did give a link to the documentation for simulating multi-dimensional arrays in awk. But awk doesn't do that task well, and it was effectively replaced by Perl nearly 20 years ago.
If you ask how to cross a lake on a bicycle, and I tell you it'll be easier in a boat, I don't think that's unreasonable. If I tell you it'll be easier to first build a bridge, or first invent a Star Trek transporter, then that would be unreasonable.