Sort by two columns and extract the top two rows based on the last column

Posted on 2025-01-26 20:16:33, length 1286, views 4, comments 0

I have a file with three columns. I would like to extract the rows with the top two values in column 3 for each unique value in column 2.

cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179

expected output

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

I feel like some combination of sort, uniq and awk might accomplish this, but I can't get it to work. I can sort by columns:

sort -nk2 -nk3 file.list 

which gives me an output sorted by -k2 and -k3 as follows,

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162

But then I get stuck on how to extract only the rows with the best two scores in the last column for each of 18, 19 and 21.

I would really appreciate any bash solutions.


Comments (1)

万劫不复 2025-02-02 20:16:33

Piping the current sort results to awk:

$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

Where:

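  • field #2 ($2) is used as the index for array a[], so each distinct value in column 2 gets its own counter
  • if the value stored in the array is less than 2, the condition is true and awk prints the current input line (printing is the default action when a pattern has no action block)
  • the post-increment (++) then bumps the counter, after its old value has been used in the comparison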
  • 1st time we see a[18] the count is 0, we print the line, and increment the count by 1
  • 2nd time we see a[18] the count is 1, we print the line, and increment the count by 1
  • 3rd (to nth) time we see a[18] the count is greater than or equal to 2, we do not print the line, and increment the count
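
The steps above can also be spelled out as an explicit awk action. The following is only a sketch of the same logic, not a different method: the array name seen and the temporary file are illustrative choices, and the sample data is copied from the question.

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179
EOF

# Long-hand equivalent of: awk 'a[$2]++ < 2'
# "seen" is an illustrative name for the per-group counter.
out=$(sort -nk2 -nk3 "$tmp" | awk '{
    # An unset array element compares as 0, so the first two rows
    # of each column-2 group pass this check.
    if (seen[$2] < 2) {
        print
    }
    # Count this row for its group whether or not it was printed.
    seen[$2]++
}')

printf '%s\n' "$out"
rm -f "$tmp"
```

The one-liner is shorter, but the expanded form makes it easier to see that the comparison happens before the increment.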

An alternative where we increment the count first:

$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
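
If the number of rows to keep per group needs to vary, one possible generalization is to pass the limit in with awk -v. This is a sketch under a few assumptions: the variable name n and the helper function top_n are hypothetical, and the sort keys are written as -k2,2n -k3,3n so that each key is limited to a single field (for this data it orders the rows the same way as -nk2 -nk3).

```shell
# Sample data from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179
EOF

# top_n FILE N: hypothetical helper that keeps the first N rows of each
# column-2 group after sorting numerically by column 2, then column 3.
top_n() {
    sort -k2,2n -k3,3n "$1" | awk -v n="$2" '++count[$2] <= n'
}

top2=$(top_n "$tmp" 2)   # same rows as the answer above
top1=$(top_n "$tmp" 1)   # only the single lowest score per group

printf '%s\n' "$top2"
rm -f "$tmp"
```

Because the sort is ascending, "first N" here means the N lowest (most negative) scores; changing the second key to -k3,3nr would keep the N highest instead.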