Find lines in a file that contain only words from a list
Here is file1.txt:
.apple .ball .cow
.apple .cow .tea .mine.nice
.mine.nice
.tea
.zebra
Here is file2.txt:
.apple
.mine.nice
.cow
.tea
Expected Result:
.apple .cow .tea .mine.nice
.mine.nice
.tea
Using the following does not give the expected result:
grep -w -F -f file2.txt file1.txt
It gives:
.apple .ball .cow
.apple .cow .tea .mine.nice
.mine.nice
.tea
How can I get the expected result?
Comments (3)
I would exploit the next statement of GNU AWK for this task, with file1.txt and file2.txt as given in the question.
Explanation: while processing the first file named on the command line (note that this is file2.txt), i.e. where the overall record number equals the record number within the current file (NR==FNR), reference the first field as a key of the array arr. This creates the key in the array; I do not assign any value, as it is irrelevant later. After doing that, go to the next line, i.e. do not do anything else while processing the first file. For every line of the other file, iterate over the fields using a for loop; if you encounter a field that is not one of the keys of arr, go to the next line; after checking all fields, print the whole line as is. Note that this code short-circuits, i.e. it goes to the next line as soon as the first disallowed word is detected. Disclaimer: I assume that file2.txt holds exactly one word per line. (Tested in gawk 4.2.1.)
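The command from this answer is not preserved above, so what follows is only a sketch reconstructed from the explanation, not necessarily the answerer's exact code. It assumes one word per line in file2.txt and whitespace-separated words in file1.txt; the scratch directory and the variable names workdir and awk_out are hypothetical helpers added so the example runs on its own.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '.apple .ball .cow' '.apple .cow .tea .mine.nice' \
    '.mine.nice' '.tea' '.zebra' > "$workdir/file1.txt"
printf '%s\n' '.apple' '.mine.nice' '.cow' '.tea' > "$workdir/file2.txt"

# First file (NR==FNR, i.e. file2.txt): store each allowed word as a key of arr.
# Second file (file1.txt): skip the line at the first field that is not a key,
# otherwise print the line unchanged.
awk_out=$(awk '
    NR == FNR { arr[$1]; next }
    {
        for (i = 1; i <= NF; i++)
            if (!($i in arr)) next
        print
    }' "$workdir/file2.txt" "$workdir/file1.txt")

printf '%s\n' "$awk_out"
rm -r "$workdir"
```

On the sample data this prints the three expected lines. Note that an empty line in file1.txt has no fields and would therefore also be printed.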
This might work for you (GNU sed):
The solution juggles three lines in the pattern space: two copies of the current line and the contents of file2. The first copy of the current line is matched against the strings in file2 and reduced in size until there are no more matches. If the matching leaves an empty line, every word matched and the line is printed; otherwise it is discarded. The flow of processing is as follows:
Prime the hold space with the contents of file2, replace newlines by spaces and append a space for pattern matching purposes.
Double the current line, again adding a space to the first copy, separate the copies by a newline and append the hold space.
Iterate through the strings at the front of the first copy of the current line, removing each one if it matches in file2.
When there are no more matches, if all that is left of the first copy is the newline separating the copies, print the unadulterated copy of the current line.
Otherwise the current line contains a string that is not in file2 and no output is produced for that line.
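The sed command itself is missing from this answer, so the following is a sketch written to follow the steps listed above rather than the answerer's original script. It assumes GNU sed (it relies on the `e` flag of `s`, `\S`, and `\n` in the replacement), single spaces between words, and one word per line in file2.txt; workdir and sed_out are hypothetical names.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '.apple .ball .cow' '.apple .cow .tea .mine.nice' \
    '.mine.nice' '.tea' '.zebra' > "$workdir/file1.txt"
printf '%s\n' '.apple' '.mine.nice' '.cow' '.tea' > "$workdir/file2.txt"

sed_out=$(cd "$workdir" && sed -E '
    # Line 1 only: prime the hold space with file2.txt as "word word ... ".
    1{x; s/.*/cat file2.txt/e; s/\n/ /g; s/$/ /; x}
    # Double the line: "copy-with-trailing-space", newline, untouched copy.
    s/.*/& \n&/
    # Append the hold space (the allowed list) after a second newline.
    G
    :a
    # Drop the leading word of the first copy if it occurs as a whole
    # word in the allowed list (the part after the second newline).
    s/^(\S+ )(.*\n.*\n(.* )?\1)/\2/
    ta
    # If the first copy is not empty, some word was not allowed: discard.
    /^\n/!d
    # Otherwise keep only the untouched copy and let sed print it.
    s/^\n(.*)\n.*/\1/
' file1.txt)

printf '%s\n' "$sed_out"
rm -r "$workdir"
```

The `(.* )?` before the back-reference is there so that a word only matches at the start of the list or right after a space, which keeps a word such as .nice from matching the tail of .mine.nice.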
If you can accept the comm tool (it is a really simple tool), you can do it like this. For each line in file1.txt, use comm with the -2 -3 options to get the words that exist only in that line and not in file2.txt. If the output is empty, print the line.
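The original command for this answer is not shown, so here is a minimal sketch of the idea under some assumptions: words are separated by spaces, file2.txt holds one word per line, and both inputs to comm are sorted in the same (C) locale, which comm requires. The temporary files and the names workdir, extra and comm_out are hypothetical.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '.apple .ball .cow' '.apple .cow .tea .mine.nice' \
    '.mine.nice' '.tea' '.zebra' > "$workdir/file1.txt"
printf '%s\n' '.apple' '.mine.nice' '.cow' '.tea' > "$workdir/file2.txt"

# comm needs sorted input, so sort the allowed words once up front.
LC_ALL=C sort -u "$workdir/file2.txt" > "$workdir/allowed.sorted"

comm_out=$(
    while IFS= read -r line; do
        # Split the line into one word per line and sort the words.
        printf '%s\n' "$line" | tr -s ' ' '\n' | LC_ALL=C sort -u \
            > "$workdir/words.sorted"
        # -2 -3 suppresses columns 2 and 3, leaving only the words that
        # appear in this line but not in the allowed list.
        extra=$(LC_ALL=C comm -2 -3 "$workdir/words.sorted" "$workdir/allowed.sorted")
        # No leftover words means every word on the line is allowed.
        if [ -z "$extra" ]; then
            printf '%s\n' "$line"
        fi
    done < "$workdir/file1.txt"
)

printf '%s\n' "$comm_out"
rm -r "$workdir"
```

This starts several processes per input line, so it is likely to be much slower than the awk approach on large files; it is mainly useful when only basic tools are available.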