Selecting the first n rows per group using AWK

Posted 2024-10-06 03:45:11

I have the following file with 4 fields. Field 2 defines 3 groups, and field 4 consists of 0s and 1s.

The first field is just an index.

I would like to use AWK to do the following:

Select the first 3 rows of group 1 (note that group 1 has only 2 rows). The number of rows is the number of 1s found in the 4th field for that group, times 3.
Select the first 6 rows of group 2. The number of rows follows the same rule: the number of 1s in the 4th field, times 3.
Select the first 9 rows of group 3. The number of rows follows the same rule: the number of 1s in the 4th field, times 3.

So 17 rows are selected for the output file (2 + 6 + 9).

Thank you for your help.

Input 

1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0

Output 

1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0

雨落星ぅ辰 2024-10-13 03:45:11

The key to this awk program is to pass the input file in twice: once to count how many rows you want from each group, and once to print them.

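awk '
    # First pass (NR == FNR): add 3 to the group quota for every 1 in field 4.
    NR == FNR {wanted_rows[$2] += 3*$4; next}
    # Second pass: print a row only while its group quota is not used up.
    --wanted_rows[$2] >= 0 {print}
' input_file.txt input_file.txt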
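
To make the two passes easier to follow, here is a minimal sketch that runs the same idea against the sample data from the question. The temporary file and the variable names quotas and selected are my own assumptions for illustration, and mktemp is assumed to be available. It first prints what pass one computes for each group (group, quota, actual row count), then the selected rows.

```shell
#!/bin/sh
# Write the sample data from the question to a temporary file. The two-pass
# approach needs a real file because awk reads it twice.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
1   1  TN1148 1
2   1  S52689 0
3   2  TA2081 1
4   2  TA2592 1
5   2  TA4011 0
6   2  TA4246 0
7   2  TA4275 0
8   2  TB0159 0
9   2  TB0392 0
10  3  TB0454 1
11  3  TB0496 1
12  3  TB1181 1
13  3  TC0027 0
14  3  TC1340 0
15  3  TC2247 0
16  3  TC3094 0
17  3  TD0106 0
18  3  TD1146 0
19  3  TD1796 0
20  3  TD3587 0
EOF

# What pass one computes: per group, the quota (3 per "1" in field 4)
# next to the number of rows the group really has.
quotas=$(awk '
    { quota[$2] += 3 * $4; rows[$2]++ }
    END { for (g = 1; g <= 3; g++) print g, quota[g], rows[g] }   # groups 1..3 as in the sample
' "$input_file")
printf '%s\n' "$quotas"

# Both passes together: a row is printed only while its group quota lasts.
selected=$(awk '
    NR == FNR { wanted_rows[$2] += 3 * $4; next }
    --wanted_rows[$2] >= 0
' "$input_file" "$input_file")
printf '%s\n' "$selected"

rm -f "$input_file"
```

Group 1 has a quota of 3 but only 2 rows, so it contributes 2 rows; groups 2 and 3 are cut off at 6 and 9 rows, which gives the 17 rows in the expected output.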

初吻给了烟 2024-10-13 03:45:11

#!/usr/bin/awk -f
# by Dennis Williamson - 2010-12-02
# for http://stackoverflow.com/questions/4334167/selecting-first-nth-rows-by-groups-using-awk
$2 == prev {
    count += $4
    groupcount++
    array[idx++] = $0
}
$2 != prev {
    if (NR > 1) {
        for (i=0; i<count*3; i++) {
            if (i == groupcount) break
            print array[i]
        }
    }
    prev = $2
    count = $4      # number of 1s seen so far in this group
    groupcount = 1
    split("", array) # delete the array
    idx = 0
    array[idx++] = $0
}
END {
    for (i=0; i<count*3; i++) {
        if (i == groupcount) break
        print array[i]
    }
}
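
Since this approach reads the data only once, it can also work on a pipe, where the file cannot be passed in twice. Below is a rough sketch of the same single-pass idea with the print loop factored into a function. The names select_rows and emit_group are hypothetical, chosen just for this illustration, and like the script above it assumes the rows of each group are adjacent in the input.

```shell
#!/bin/sh
# select_rows (hypothetical helper): reads rows on stdin in a single pass and
# prints, per group, the first 3 * (number of 1s in field 4) rows.
select_rows() {
    awk '
        # Print the buffered rows of the group that just ended, capped at
        # 3 * ones and at the number of rows the group actually has.
        function emit_group(    i, limit) {
            limit = (ones * 3 < n) ? ones * 3 : n
            for (i = 0; i < limit; i++) print rows[i]
            n = 0
            ones = 0
        }
        # A change in field 2 means the previous group is complete.
        NR > 1 && $2 != prev { emit_group() }
        # Buffer every row and tally the 1s of the current group.
        { prev = $2; ones += $4; rows[n++] = $0 }
        END { emit_group() }
    '
}

# Usage: group 1 has one 1 (quota 3) and four rows, group 2 has a single row.
printf '%s\n' '1 1 A 0' '2 1 B 1' '3 1 C 0' '4 1 D 0' '5 2 E 1' | select_rows
```

With this small input, the first three rows of group 1 and the single row of group 2 are printed. The trade-off against the two-pass version is memory: one whole group is held in the rows array at a time.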
