根据两列排序并根据上一列提取前两个

发布于 2025-01-26 20:16:33 字数 1286 浏览 4 评论 0原文

我有一个带有三列的文件。我想在第2列中为每个唯一值的第3列中的排名第两个值提取行。

cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179

期望输出

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

我觉得排序的某种组合，uniq和awk可能会完成，但我无法正确执行它。我可以按列进行排序，

sort -nk2 -nk3 file.list

该列给我一个输出，按以下为-K2和-K3，

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162

但是随后我陷入了如何在18、19和20的最后一列中仅提取最佳两个分数的行。

我真的会感谢任何狂欢解决方案。

原文

I have a file with three columns. I would like to extract rows with top two values in column 3 for each unique value in column 2.

cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179

expected output

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

I feel like some combination of sort, uniq and awk might accomplish but I can't properly execute it. I can sort by columns

sort -nk2 -nk3 file.list

which gives me an output sorted by -k2 and -k3 as follows,

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162

but then I get stuck on how to extract only the rows with best two scores in the last column for 18, 19 and 20.

I would really appreciate any bash solutions.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

万劫不复 2025-02-02 20:16:33

将当前排序结果输送到awk：

$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

其中：

field＃2（$ 2）用作数组a []的索引
如果数组中存储的值小于2，则打印当前输入行，
然后第一次递增计数器（++），
我们第一次看到a [18]计数为0，我们打印行，并将计数增加1
第二次，我们看到a [18]计数为1，我们打印了行，并将计数递增1
3rd（到nth）时间，我们看到a [18]计数大于或等于2，我们做 not 打印行，并将计数增加

一个替代方案，我们会增加首先计数：

$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

Piping the current sort results to awk:

$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

Where:

field #2 ($2) is used as the index for array a[]
if the value stored in the array is less than 2 then print the current input line
then increment the counter (++)
1st time we see a[18] the count is 0, we print the line, and increment the count by 1
2nd time we see a[18] the count is 1, we print the line, and increment the count by 1
3rd (to nth) time we see a[18] the count is greater than or equal to 2, we do not print the line, and increment the count

An alternative where we increment the count first:

$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

回复收藏 0 原文

~没有更多了~