Comparing two text files against each other
If I had two text files, for example:
file1.txt
apple
orange
pear
banana
file2.txt
banana
pear
How would I remove from file1.txt every line that appears in file2.txt,
so that file1.txt is left with:
apple
orange
Comments (3)
-v lists only the lines of file1.txt that do not match a pattern, and -f takes the patterns from a file, in this case file2.txt. -F interprets the patterns as a list of fixed strings, separated by newlines, any of which is to be matched.
The grep command is built in on OS X and Linux. On Windows you'll have to install it, for example via Cygwin.
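The command this answer describes is not preserved above, so here is a minimal sketch that combines the three flags it names. The scratch directory and the `result` variable are my own additions for illustration; the file names and contents come from the question.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two files from the question.
printf '%s\n' apple orange pear banana > file1.txt
printf '%s\n' banana pear > file2.txt

# -v            keep only the lines that do NOT match
# -F            treat each pattern as a fixed string, not a regex
# -f file2.txt  read the patterns, one per line, from file2.txt
# Note: -F alone matches substrings, so a pattern "pear" would also
# drop a line such as "pears". Adding -x (whole-line match) avoids
# that; -x is not mentioned in the answer, it is a suggestion here.
result=$(grep -v -F -f file2.txt file1.txt)
printf '%s\n' "$result"
```

To update file1.txt itself, write the output to a temporary file and move it over the original. Redirecting straight into file1.txt would truncate it before grep reads it.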
On Debian and derivatives, combine can be found in the moreutils package.
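The command itself is missing from this answer. As a hedged sketch, the moreutils man page documents the form `combine file1 not file2`, which prints the lines of the first file that do not appear in the second. The guard, the scratch directory and the `result` variable are assumptions added so the snippet does nothing harmful when combine is not installed.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two files from the question.
printf '%s\n' apple orange pear banana > file1.txt
printf '%s\n' banana pear > file2.txt

# combine ships with moreutils and may not be present, so check first.
if command -v combine >/dev/null 2>&1; then
    # Lines that are in file1.txt but not in file2.txt.
    result=$(combine file1.txt not file2.txt)
    printf '%s\n' "$result"
else
    echo "combine not found; install the moreutils package" >&2
fi
```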
If the files are huge and already sorted (which comm requires), comm may be preferable to the more general grep solution proposed by Ivan, since it operates line by line and thus does not need to load the entirety of file2.txt into memory (or search it for each line).
The sed command is needed to remove a leading tab inserted by comm.
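The original command is not preserved here, so the following is only a sketch of the comm approach, not the answer's exact pipeline. It uses `comm -23`, which suppresses column 2 (lines only in the second file) and column 3 (lines in both). In that form the remaining column is printed without a leading tab, so this variant does not need the sed step the answer mentions. The `.sorted` file names and the `result` variable are hypothetical.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two files from the question.
printf '%s\n' apple orange pear banana > file1.txt
printf '%s\n' banana pear > file2.txt

# comm expects both inputs sorted in the same collation order.
# Sorting discards the original line order of file1.txt.
sort file1.txt > file1.sorted
sort file2.txt > file2.sorted

# -2 drops lines unique to file2, -3 drops lines common to both,
# leaving only the lines unique to file1.
result=$(comm -23 file1.sorted file2.sorted)
printf '%s\n' "$result"
```

Because both inputs are read once in order, memory use stays small even for large files; the cost is the up-front sort if the files are not already sorted.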