Searching by column (not field) number with awk
Is there a way to filter lines with awk using the column (not field) number? I want to grab all the lines in a text file in which field 6 equals a value held in a variable. I am using:
awk -v temp="${het}" '{ if ($6 == temp) print $0 }'
But I have noticed that, very occasionally, field 5 is blank, which messes things up. What I really need is something like
if colx-y == temp
but this doesn't appear to exist. Is there a way to do this?
The input format is as described below, and I have just found another variation I have to deal with. I want to extract (in this case) the 602. The fifth field may or may not exist, and it may also run into the sixth (both examples below). The file format has columns 23-26 containing the sixth field, so gawk sounds like it might be the better option:
HETATM 5307 S MOY A 602 14.660 14.666 109.556 1.00 26.41 S
HETATM 5307 S MOY 602 14.660 14.666 109.556 1.00 26.41 S
HETATM 5307 S MOY A1602 14.660 14.666 109.556 1.00 26.41 S
Comments (4)
Please add the sample input to your question, not to a comment. It is still not clear what your input looks like. Given your 'normal' input line:
Which of the following two matches your input when 'field 5 is blank':
In the first case, ghostdog74's answer should work. The -F"[ ]" he uses is a clever way of splitting on single spaces only. -F" " does not work, because then awk uses its default whitespace splitting.
If your data is of the second format, I would use substr() to extract the correct field:
Another option could be using gawk's fixed-width splitting, but it really depends on the exact format of your input.
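As a rough sketch of the two approaches mentioned in this answer (not the answerer's original code, which is not preserved here): the sample lines, the single-space layout in case 1, and the fixed-width layout in case 2 are assumptions made for illustration. Only the position of the value in columns 23-26 comes from the question. The `+ 0` numeric comparison is also an assumption that the value is always a number.

```shell
het=602

# Case 1 (ASSUMED layout): fields are separated by exactly one space,
# so a blank field 5 shows up as two adjacent spaces.
line='HETATM 5307 S MOY  602 14.660 14.666 109.556 1.00 26.41 S'

# -F'[ ]' splits on every single space, so the empty field 5 is kept
# and 602 is still field 6.
case1=$(printf '%s\n' "$line" |
    awk -F'[ ]' -v temp="$het" '$6 == temp { print $6 }')
printf 'case 1: %s\n' "$case1"

# Case 2 (ASSUMED layout): fixed-width records with the value in columns 23-26.
records='HETATM 5307  S   MOY A 602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY   602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY A1602      14.660  14.666 109.556  1.00 26.41           S'

# substr($0, 23, 4) takes columns 23-26 regardless of field splitting.
# Adding 0 makes the comparison numeric, so " 602" matches 602 but "1602" does not.
case2=$(printf '%s\n' "$records" |
    awk -v temp="$het" 'substr($0, 23, 4) + 0 == temp + 0')
printf '%s\n' "$case2"
```

With these assumed records, the substr() filter keeps the first two lines and drops the A1602 line, whether or not the chain letter in front of the number is present.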

Why don't you use if/else? Something like the algorithm below:
It would also be easier to understand if you provided some sample input!

Based on schot's suggestion and your example data:
The final "3" in FIELDWIDTHS represents the field that contains "602". I've omitted field widths for the rest of the line. Some of the field widths could be combined, but I didn't know what was whitespace as delimiters versus whitespace as field contents.
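The FIELDWIDTHS idea could look roughly like the sketch below. The width list here is not the one from this answer (which is not preserved); it is an assumed layout in which the ninth field covers columns 23-26, so its final width is 4 rather than 3 in order to also capture four-digit values such as 1602. The sample records and the numeric `+ 0` comparison are likewise assumptions. FIELDWIDTHS is a gawk extension, so the sketch checks for gawk first.

```shell
het=602

# Sample records in an ASSUMED fixed-width layout (value in columns 23-26).
records='HETATM 5307  S   MOY A 602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY   602      14.660  14.666 109.556  1.00 26.41           S
HETATM 5307  S   MOY A1602      14.660  14.666 109.556  1.00 26.41           S'

if command -v gawk >/dev/null 2>&1; then
    # Assumed widths: record(6) serial(5) blank(1) atom(4) altloc(1)
    # residue(3) blank(1) chain(1) number(4). Field 9 is columns 23-26;
    # the rest of each line is left unsplit.
    fw_matches=$(printf '%s\n' "$records" |
        gawk -v temp="$het" '
            BEGIN { FIELDWIDTHS = "6 5 1 4 1 3 1 1 4" }
            $9 + 0 == temp + 0    # numeric compare: " 602" matches 602
        ')
    printf '%s\n' "$fw_matches"
else
    echo 'gawk not found; FIELDWIDTHS is a gawk extension' >&2
fi
```

Compared with substr(), this names every column up front, which may be easier to extend if other fixed-width fields are needed later.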