Find lines in a file that contain only words from a list

Posted on 2025-01-17 11:09:18, 469 words, 4 views, 0 comments

Here is file1.txt:

.apple .ball .cow
.apple .cow .tea .mine.nice
.mine.nice
.tea
.zebra

Here is file2.txt:

.apple
.mine.nice
.cow
.tea

Expected Result:

.apple .cow .tea .mine.nice
.mine.nice
.tea

However, the following command does not give the expected result:

grep -w -F -f file2.txt file1.txt

It gives:

.apple .ball .cow
.apple .cow .tea .mine.nice
.mine.nice
.tea

How can I get the expected result?

Comments (3)

梦在夏天 2025-01-24 11:09:18

I would use the next statement of GNU AWK for this task as follows. Let file1.txt content be

.apple .ball .cow
.apple .cow .tea .mine.nice
.mine.nice
.tea
.zebra

and file2.txt content be

.apple
.mine.nice
.cow
.tea

then

awk 'NR==FNR{arr[$1];next}{for(i=1;i<=NF;i+=1){if(!($i in arr)){next}};print}' file2.txt file1.txt

gives

.apple .cow .tea .mine.nice
.mine.nice
.tea

Explanation: while processing the first file named on the command line (note that this is file2.txt), i.e. while the overall record number equals the record number within the current file (NR==FNR), refer to the first field as a key of the array arr. This creates the key in the array; I do not assign a value because it is never needed. Then go to the next line, i.e. do nothing else while processing the first file. For every line of the second file, iterate over the fields with a for loop; if a field is not one of the keys of arr, go to the next line; if all fields pass, print the whole line as is. Note that this code short-circuits, i.e. it moves to the next line as soon as the first disallowed word is detected. Disclaimer: I assume that file2.txt holds exactly one word per line.

(tested in gawk 4.2.1)
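The one-liner above can also be laid out with comments, which may make the two phases easier to follow. This is only a sketch of the same logic: the function name only_listed_words and the array name allowed are made up for illustration, and it keeps the assumption that the word list holds one word per line.

```shell
# Print the lines of "$2" whose whitespace-separated fields all appear
# in the word list "$1" (assumed: one word per line).
# only_listed_words is a hypothetical name chosen for this sketch.
only_listed_words() {
  awk '
    NR == FNR { allowed[$1]; next }   # first file: store each word as an array key
    {
      for (i = 1; i <= NF; i++)
        if (!($i in allowed)) next    # unknown word found: skip this line
      print                           # every field was in the list
    }
  ' "$1" "$2"
}
```

Calling only_listed_words file2.txt file1.txt with the sample files prints the three expected lines. Two caveats follow from how the script is built: an empty line in file1.txt has no fields and is therefore printed, and an empty file2.txt makes NR==FNR true for file1.txt as well, so nothing is printed.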

一片旧的回忆 2025-01-24 11:09:18

This might work for you (GNU sed):

sed -En '1{x;s/.*/cat file2.txt/e;y/\n/ /;s/$/ /;x}
         s/.*/& \n&/;G
         :a;s/^(\S+ )(.*\n.*\n.*\1)/\2/;ta;s/^\n(.*)\n.*/\1/p' file1.txt

The solution juggles three lines in the pattern space: two copies of the current line and the contents of file2. The first copy of the current line is matched against the strings in file2 and reduced in size until there are no more matches. If the matching leaves an empty line, the matches were successful and the line is printed; otherwise it is discarded. The flow of processing is as follows:

Prime the hold space with the contents of file2, replace newlines by spaces and append a space for pattern matching purposes.

Double the current line, again adding a space to the first copy, separate the copies by a newline and append the hold space.

Iterate through the strings at the front of the first copy of the current line, removing each one if it matches in file2.

When there are no more matches, if all that is left is the newline separating the copies then print the unadulterated copy of the current line.

Otherwise the current line did not match the strings in file2 and no output is produced for that line.
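For readers who find the sed script dense, the same strip-from-the-front idea can be restated as a plain shell loop. This is not the sed solution itself, only a rough sketch of the flow described above; the function name filter_by_stripping and the variable names are hypothetical, and it assumes words are separated by single spaces and that the list holds one word per line.

```shell
# Print each line of "$2" that is fully consumed when listed words
# (from "$1") are stripped from its front one at a time.
# filter_by_stripping is a hypothetical name chosen for this sketch.
filter_by_stripping() {
  # Join the list into one string with a space before and after every word.
  allowed=" $(paste -s -d ' ' "$1") "
  while IFS= read -r line; do
    rest="$line "                      # working copy, padded with a trailing space
    while [ -n "$rest" ]; do
      word=${rest%% *}                 # first word of the working copy
      case $allowed in
        *" $word "*) rest=${rest#"$word "} ;;   # listed: strip it and continue
        *) break ;;                             # not listed: give up on this line
      esac
    done
    if [ -z "$rest" ]; then            # nothing left: every word was listed
      printf '%s\n' "$line"            # print the untouched original line
    fi
  done <"$2"
}
```

The padded working copy plays the role of the first copy in the pattern space, the allowed string plays the role of the hold space, and the untouched line variable is the second copy that gets printed.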

奈何桥上唱咆哮 2025-01-24 11:09:18

If you can accept the comm tool (it is a really simple tool), you can do it like this.

For each line in file1.txt, the comm tool with the -2 -3 options gives the words that exist only in that line and not in file2.txt. If that output is empty, print the line.

cat file1.txt | xargs -I {} bash -c 'if [[ -z $(comm -2 -3 <(echo {} | sed "s/\s\+/\n/g" | sort) <(sort file2.txt)) ]]; then echo {}; fi'
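As a variation, the same comm check can be written with a read loop instead of xargs, which avoids pasting each line into a bash -c string and sorts the word list only once. This is a sketch under the same assumptions as above; the function name filter_with_comm and the temporary file are illustrative choices, not part of the original answer.

```shell
# Print each line of "$2" that has no word missing from the list "$1".
# filter_with_comm is a hypothetical name chosen for this sketch.
filter_with_comm() {
  sorted=$(mktemp) || return 1
  sort -u "$1" >"$sorted"              # comm needs sorted input; sort the list once
  while IFS= read -r line; do
    # Split the line into words, sort them, and keep only the words
    # that are absent from the list (column 1 of comm).
    extra=$(printf '%s\n' "$line" | tr -s ' \t' '\n\n' | sort -u | comm -23 - "$sorted")
    if [ -z "$extra" ]; then           # no unlisted word: keep the line
      printf '%s\n' "$line"
    fi
  done <"$2"
  rm -f "$sorted"
}
```

It is called as filter_with_comm file2.txt file1.txt. It still starts several processes per input line, so for large files the awk approach is likely to be faster.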