Deleting selected lines from a data file
This question is a continuation of my earlier post titled "selecting digits from regular expression".
Below is the sample data as posted in the earlier post.
DONOR ACCEPTORH ACCEPTOR
atom# res@atom atom# res@atom atom# res@atom %occupied distance angle
| 4726 59@O12 | 1487 19@H12 1486 19@O12 | 85.66 2.819 ( 0.18) 21.85 (12.11)
| 1499 19@O15 | 1730 22@H12 1729 22@O12 | 83.15 3.190 ( 0.31) 22.36 (12.73)
| 1216 16@O22 | 1460 19@H22 1459 19@O22 | 75.74 2.757 ( 0.14) 24.55 (13.66)
| 4232 53@O25 | 4143 52@H24 4142 52@O24 | 74.35 2.916 ( 0.25) 28.27 (13.26)
| 3683 46@O16 | 4163 52@H13 4162 52@O13 | 73.78 2.963 ( 0.29) 23.65 (14.14)
| 4162 52@O13 | 4079 51@H12 4078 51@O12 | 73.68 2.841 ( 0.19) 21.25 (11.87)
| 3764 47@O16 | 3825 48@H26 3824 48@O26 | 70.52 2.973 ( 0.28) 26.88 (13.14)
.
.
The file runs to a few thousand lines.
I tried Fredirk's code and it works fine for selecting the lines. Now I would like to extend this idea to my real problem.
The $3 (3rd field) and $6 (6th field) in my data file hold the molecule numbers, which are arranged as below:
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56
57 58 59 60 61 62 63 64
Any pair made from the above numbers corresponds to the pair in the 3rd and 6th fields of a line in the data file.
What I want is to select the pairs made only of numbers that lie on the outermost rows and columns of the above arrangement.
In short, any pair made only of the numbers (1 2 3 4 5 6 7 8 57 58 59 60 61 62 63 64 1 9 17 25 33 41 49 57 8 16 24 32 40 48 56 64) needs to be deleted.
I have no idea how to write a loop in awk to select those pairs and delete the lines straight away.
Many thanks in advance.
Comments (1)
Use an array to hold the set of numbers. Define it in the BEGIN block.
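The code that went with this step is missing from the page, so here is a minimal sketch of what such a BEGIN block might look like. It assumes the set is typed in from the list in the question (duplicates removed). The function name `border_count` and the array names `tmp` and `border` are made up for illustration.

```shell
# Hypothetical helper: build the border set in BEGIN and print how many members it has.
border_count() {
  awk 'BEGIN {
    # Numbers on the outer edge of the 8x8 arrangement, taken from the question
    list = "1 2 3 4 5 6 7 8 57 58 59 60 61 62 63 64 9 17 25 33 41 49 16 24 32 40 48 56"
    n = split(list, tmp, " ")
    # Store each number as an array key so membership becomes a simple (x in border) test
    for (i = 1; i <= n; i++) border[tmp[i]] = 1
    count = 0
    for (k in border) count++
    print count
  }'
}
border_count   # prints 28
```

Storing the numbers as keys rather than values is what makes the later `in` test work, since awk's `in` operator checks array indices.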
Then check whether the numbers in $3 and $6 are both in (or not in) the set.
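The original code for this check is also missing, so the following is only a sketch under a few assumptions: fields are split on whitespace, so $3 and $6 look like `59@O12` and the molecule number is the part before `@`; the arrangement is the 8x8 grid from the question; and "delete" means not printing the line. The function name `drop_border_pairs` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: read the data on stdin, drop lines whose pair lies entirely on the border.
drop_border_pairs() {
  awk '
    BEGIN {
      # Mark every number on the outer edge of the 8x8 grid (rows of 8, numbered 1..64)
      for (n = 1; n <= 64; n++) {
        row = int((n - 1) / 8); col = (n - 1) % 8
        if (row == 0 || row == 7 || col == 0 || col == 7) border[n] = 1
      }
    }
    {
      # $3 and $6 look like "59@O12"; keep only the part before "@"
      split($3, d, "@"); split($6, a, "@")
      # Skip (delete) the line when both numbers are on the border
      if ((d[1] in border) && (a[1] in border)) next
      print
    }
  '
}
```

It could be used as `drop_border_pairs < data.txt > filtered.txt`. To keep only the border pairs instead, invert the condition: print when both numbers are in the set and skip otherwise. Header lines pass through unchanged because their 3rd and 6th fields are not in the set.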