Deleting selected lines from a data file

Posted on 2024-12-03 09:33:46

This question is a continuation of my earlier post titled "selecting digits from regular expression".

Below is the sample data as posted in the earlier post.

          DONOR         ACCEPTORH      ACCEPTOR           
    atom#  res@atom   atom#  res@atom atom#  res@atom %occupied  distance       angle        
  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)        
  |  1499   19@O15 |  1730    22@H12  1729    22@O12 |  83.15  3.190 ( 0.31)  22.36 (12.73)        
  |  1216   16@O22 |  1460    19@H22  1459    19@O22 |  75.74  2.757 ( 0.14)  24.55 (13.66)        
  |  4232   53@O25 |  4143    52@H24  4142    52@O24 |  74.35  2.916 ( 0.25)  28.27 (13.26)        
  |  3683   46@O16 |  4163    52@H13  4162    52@O13 |  73.78  2.963 ( 0.29)  23.65 (14.14)        
  |  4162   52@O13 |  4079    51@H12  4078    51@O12 |  73.68  2.841 ( 0.19)  21.25 (11.87)        
  |  3764   47@O16 |  3825    48@H26  3824    48@O26 |  70.52  2.973 ( 0.28)  26.88 (13.14)        
  .
  .
  (The file continues for a few thousand lines.)

I tried Fredirk's code and it works fine for selecting the lines. Now I would like to extend this idea to my real problem.

The number in front of the @ in $3 (3rd field) and $6 (6th field) of my data file is a molecule number, and the molecules are arranged as follows:

   1    2   3   4   5   6   7       8

   9    10  11  12  13  14  15      16
  17    18  19  20  21  22  23      24
  25    26  27  28  29  30  31      32
  33    34  35  36  37  38  39      40
  41    42  43  44  45  46  47      48
  49    50  51  52  53  54  55      56

  57    58  59  60  61  62  63      64 

Any pair made from the numbers above corresponds to a pair found in the 3rd and 6th fields of a line in the data file.

What I want is to select the pairs in which both numbers lie on the outermost rows and columns of the arrangement above.

 In short, ANY PAIR made up only of the numbers  (1 2 3 4 5 6 7 8   57 58 59 60 61 62 63 64   1 9 17 25 33 41 49 57   8 16 24 32 40 48 56 64) needs to be deleted.

I have no idea how to write a loop in awk to select those pairs and delete the matching lines straight away.

Many thanks in advance.

Comments (1)

安人多梦 2024-12-10 09:33:47

Use an array to hold the set of numbers. Store each number as an array index, because awk's in operator tests indices, not values. Define the array in the BEGIN block:

BEGIN {
  # set[n] exists for every number on the outer rows and columns
  for (n=1; n<=8; n++) set[n] = 1                      # top row
  for (n=57; n<=64; n++) set[n] = 1                    # bottom row
  for (n=9; n<=49; n+=8) {set[n] = 1; set[n+7] = 1}    # left and right columns
}

Then skip every line whose $3 and $6 are both in the set, and print the rest. With default whitespace splitting these fields look like 59@O12, so adding 0 converts each one to its leading number:

(($3+0) in set) && (($6+0) in set) {next}   # both on the edge: drop the line
{print}                                     # otherwise keep it
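
For anyone who wants to try this end to end, here is a minimal sketch wrapped in a shell script. It assumes default whitespace field splitting, so that $3 and $6 look like 59@O12 and 19@H12. The helper name filter_edge_pairs and the file names hbond.dat and filtered.dat are made up for illustration. The first two sample rows are copied from the question; the third is a hypothetical row, not real data, added only so that something gets deleted.

```shell
#!/bin/sh
# filter_edge_pairs is a hypothetical helper name, not something from the question.
# It prints every input line except those whose 3rd and 6th fields both start
# with a number from the outer rows or columns of the 8x8 arrangement.
filter_edge_pairs() {
  awk '
    BEGIN {
      # Store the edge numbers as array indices so that "in" can test them.
      for (n = 1;  n <= 8;  n++)    edge[n] = 1                       # top row
      for (n = 57; n <= 64; n++)    edge[n] = 1                       # bottom row
      for (n = 9;  n <= 49; n += 8) { edge[n] = 1; edge[n + 7] = 1 }  # left and right columns
    }
    {
      donor    = $3 + 0   # "59@O12" -> 59 (leading number only)
      acceptor = $6 + 0   # "19@H12" -> 19
    }
    (donor in edge) && (acceptor in edge) { next }   # both on the edge: drop the line
    { print }                                        # otherwise keep it
  ' "$@"
}

# Sample input: rows 1 and 2 come from the question, row 3 is made up
# (molecules 1 and 64 are both on the edge).
cat > hbond.dat <<'EOF'
  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)
  |  4162   52@O13 |  4079    51@H12  4078    51@O12 |  73.68  2.841 ( 0.19)  21.25 (11.87)
  |    10    1@O12 |  5001    64@H12  5000    64@O12 |  10.00  3.000 ( 0.20)  25.00 (10.00)
EOF

filter_edge_pairs hbond.dat > filtered.dat
cat filtered.dat
```

With this input only the made-up third row is removed. The first row is kept because 59 is on the edge but 19 is not. Header and filler lines pass through unchanged, since their 3rd and 6th fields convert to 0, which is not in the set.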