Compare every other line, print the following lines but remove duplicates

Posted on 2024-12-19 12:33:31

I have a file in the format of:

id-of-item

description of item

id-of-item

description of item

id-of-item

description of item

id-of-item

description of item

id-of-item

description of item

(only one line between each, just big spaces here)

I need to compare the descriptions of the items and, if they match, remove the duplicate description but keep the id (I need to make a table that references the ids as groups).

I have no idea how to do this. I have tried a couple of awk commands with NR%2, uniq, etc., but all of them matched only one and not the other =/


Comments (4)

猥︴琐丶欲为 2024-12-26 12:33:31

This might be close. The rule of thumb in awk is:
put whatever you want to de-duplicate into the index of an array:

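BEGIN { title = "" }              # no id seen yet
NF == 0 { print; next }           # pass blank lines through unchanged
title == "" {                     # first non-blank line of a pair: the id
    title = $0
    print; next
}
{                                 # second non-blank line: the description
    if (value[$0] == "") print    # print it only the first time it is seen
    value[$0] = $0                # remember the description as an array index
    title = ""                    # the next non-blank line is an id again
}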

Feel the power of associative arrays.
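The same array-index idea is often written with a counter array. Here is a minimal sketch, assuming every description is exactly one line and that blank lines can be dropped from the output; `dedupe_descriptions`, `n` and `seen` are names made up for this illustration, not part of the answer above.

```shell
# dedupe_descriptions: hypothetical helper name used only in this sketch.
# Reads id/description pairs on stdin, skips blank lines, prints every id
# and prints each distinct description only on its first occurrence.
dedupe_descriptions() {
    awk '
        NF == 0     { next }          # ignore blank separator lines
        (++n % 2)   { print; next }   # odd non-blank lines are ids: always print
        !seen[$0]++                   # even lines are descriptions: first occurrence only
    '
}

# Small demonstration with made-up data
printf 'a\n\nfoo\n\nb\n\nbar\n\nc\n\nfoo\n' | dedupe_descriptions
```

Unlike the script above, this version does not echo the blank lines, so an id whose description was removed is not followed by an empty gap.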

落叶缤纷 2024-12-26 12:33:31

I'm going to make two simplifying assumptions:

  1. Descriptions are just one line long.
  2. You can identify a character that doesn't appear in descriptions or IDs. I'll use a tab for this character.

Neither assumption is very strong, so it shouldn't be hard to adapt the following if needed.

With those assumptions, I'll produce sample data with printf "1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n". It looks like this:

1

item 1

2

item 2

3

item 2

4

item 1

To process this data, I'll:

  1. Get rid of the blank lines
  2. Join successive lines, separating the ID and description by a tab
  3. Sort the new lines by the description field
  4. Format the sorted lines into a table

Here's a pipeline that does it:

grep -v '^[[:space:]]*$' | awk 'NR%2 { printf("%s\t", $0) } !(NR%2)' | sort -k2 | awk -F"\t" 'desc != $2 { printf("-----\n%s\n", $2); desc = $2} { print $1 }'

Pipe the sample data through it, and you get

-----
item 1
1
4
-----
item 2
2
3
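If the goal is a table of ids grouped by description, the grouping can also be done in a single awk pass without sorting. This is only a sketch under the same two assumptions (one-line descriptions, no tabs in the data); `group_ids` and the array names are hypothetical, and the output layout (description, a tab, then space-separated ids) is just one possible choice.

```shell
# group_ids: hypothetical helper name used only in this sketch.
# Reads id/description pairs on stdin (blank lines ignored) and prints one
# line per distinct description: description<TAB>id id ...
# Descriptions appear in the order they were first seen.
group_ids() {
    awk '
        NF == 0   { next }               # ignore blank separator lines
        (++n % 2) { id = $0; next }      # odd non-blank lines: remember the id
        {
            if ($0 in ids) {
                ids[$0] = ids[$0] " " id # description seen before: append the id
            } else {
                ids[$0] = id             # new description: start its id list
                order[++count] = $0      # and record first-seen order
            }
        }
        END {
            for (i = 1; i <= count; i++)
                printf "%s\t%s\n", order[i], ids[order[i]]
        }
    '
}

# The sample data from this answer
printf '1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n' | group_ids
```

Because nothing is sorted, the groups come out in input order rather than alphabetically by description.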

这样的小城市 2024-12-26 12:33:31

This might help you(?):

# cat input.txt
id-of-item0
id-of-item0 description of item0
id-of-item1
id-of-item1 description of item1
id-of-item0
id-of-item0 description of item0
id-of-item3
id-of-item3 description of item3
id-of-item4
id-of-item4 description of item4
# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!/\n/'
id-of-item0
id-of-item0 description of item0
id-of-item1
id-of-item1 description of item1
id-of-item3
id-of-item3 description of item3
id-of-item4
id-of-item4 description of item4

If you want to remove the description:

# sed 'N;s/\n/!!!/' input.txt | sort -u | sed 's/!!!.*//'
id-of-item0
id-of-item1
id-of-item3
id-of-item4

Explanation:

Read input.txt 2 lines at a time replacing the newline \n with a delimiter (here it is !!!). Sort and remove duplicates. Replace the delimiter !!! by a newline \n. Or remove the description altogether.
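The join, sort, split idea described here can also be sketched with `paste` and `tr` in place of the two sed calls. Note that this swaps in different tools from the ones the answer uses. It assumes the input has an even number of lines and contains no tab characters; `unique_pairs` is a hypothetical name.

```shell
# unique_pairs: hypothetical helper name used only in this sketch.
# Joins every two input lines into one tab-separated record, removes
# duplicate records, then splits each record back into two lines.
unique_pairs() {
    paste - - |          # join line 1 with line 2, line 3 with line 4, ...
    LC_ALL=C sort -u |   # sort the records and drop exact repeats
    tr '\t' '\n'         # put each half of a record back on its own line
}

# Demonstration with part of the input.txt shown above
printf '%s\n' \
    'id-of-item0' 'id-of-item0 description of item0' \
    'id-of-item1' 'id-of-item1 description of item1' \
    'id-of-item0' 'id-of-item0 description of item0' | unique_pairs
```

A tab is used as the delimiter instead of `!!!` only because `paste` produces it by default; any character that never occurs in the data would do.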

EDIT:

This might work for you(?):

sed '/^$/d' input_file |   # remove empty lines
sed -n 'h;n;G;s/\n/\t/p' | # join each pair as description<TAB>id (order swapped)
sort |                     # sort by description
sed ':a;N;s/^\(\([^\t]*\)\t[^\n]*\)\n\2/\1/;ta;P;D' | # merge lines sharing a description: description<TAB>id<TAB>id...
sed 's/\t/\n/g'            # translate tabs to newlines
愿与i 2024-12-26 12:33:31

Would this work?

awk 'NF' file | sed '{N;s/\n/:/g}' | 
awk -F":" -v OFS="\n\n" -v ORS="\n\n"  '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}'

Your File:

[jaypal:~/Temp] cat file
id-of-item31

description of item4 <--- Duplicate description

id-of-item22

description of item4 <--- Duplicate description

id-of-item34

description of item1 <--- Duplicate description

id-of-item21

description of item3

id-of-item11

description of item1 <--- Duplicate description

Execution:

[jaypal:~/Temp] awk 'NF' file | sed '{N;s/\n/:/g}' | 
awk -F":" -v OFS="\n\n" -v ORS="\n\n"  '{b[$2]++} {if (b[$2]>1) print $1; else print $1,$2}'

id-of-item31

description of item4

id-of-item22

id-of-item34

description of item1

id-of-item21

description of item3

id-of-item11
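
One caveat with the pipeline above: because `:` is the field separator, a description that itself contains a colon would be cut short in `$2`. A possible variant, sketched here on the assumption that tabs never appear in ids or descriptions, joins the pairs with `paste` and splits on a tab instead; `drop_repeated_descriptions` and `seen` are hypothetical names.

```shell
# drop_repeated_descriptions: hypothetical helper name used only in this sketch.
# Same output layout as the pipeline above (items separated by blank lines),
# but ids and descriptions are joined with a tab rather than a colon.
drop_repeated_descriptions() {
    awk 'NF' |           # drop blank lines
    paste - - |          # build id<TAB>description records
    awk -F'\t' -v OFS='\n\n' -v ORS='\n\n' '
        # first time a description is seen: print id and description;
        # afterwards: print only the id
        { if (seen[$2]++) print $1; else print $1, $2 }
    '
}

# Demonstration with made-up data containing a colon in the description
printf 'x\n\nnote: one\n\ny\n\nnote: one\n\nz\n\nnote: two\n' | drop_repeated_descriptions
```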