Compare two .txt files in Python and save exact and similar matches to a .txt file
What I need is:
text_file_1.txt:
apple
orange
ice
icecream
text_file_2.txt:
apple
pear
ice
When I use "set", the output is:
apple
ice
("equivalent of re.match")
but I want to get:
apple
ice
icecream
("equivalent of re.search")
Is there any way to do this? The files are large, so I can't just iterate over them and use regex.
Comments (2)
You might want to check out difflib.
If all you want is to extract from the files the words where one is a substring of the other (including identical words), you could do:
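One way this might look, as a sketch rather than the answerer's exact code: the helper names `read_words` and `substring_matches` are made up for illustration, and the files are assumed to hold one word per line. Every word from the first file is kept if it equals, contains, or is contained in some word from the second file.

```python
def read_words(path):
    """Read one word per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def substring_matches(words_1, words_2):
    """Return the words from words_1 that equal, contain, or are contained in
    at least one word from words_2. Order of words_1 is preserved."""
    lookup = set(words_2)  # exact matches are found in constant time
    matches = []
    for w1 in words_1:
        if w1 in lookup or any(w2 in w1 or w1 in w2 for w2 in lookup):
            matches.append(w1)
    return matches


def save_matches(matches, path):
    """Write one matched word per line to the output file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(matches) + "\n")
```

With the sample data from the question, `substring_matches(["apple", "orange", "ice", "icecream"], ["apple", "pear", "ice"])` gives `["apple", "ice", "icecream"]`. Note that the substring check still compares each unmatched word against every word of the second file, so on large files it can be slow.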
Alternatively, if you want a similarity based on how closely the strings match in the order of their letters, you could use one of the classes provided by difflib, as Paul suggested in his answer:
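A possible sketch of the difflib approach, assuming `difflib.SequenceMatcher` is the class meant here. The function name `similar_matches` and the `threshold` parameter are illustrative choices, not part of the original answer. A word from the first list is kept as soon as its similarity ratio with any word from the second list reaches the threshold.

```python
from difflib import SequenceMatcher


def similar_matches(words_1, words_2, threshold=0.8):
    """Return the words from words_1 whose SequenceMatcher ratio with at
    least one word from words_2 is >= threshold (hypothetical cutoff)."""
    matches = []
    for w1 in words_1:
        for w2 in words_2:
            # A new matcher object is created for every pair of words.
            if SequenceMatcher(None, w1, w2).ratio() >= threshold:
                matches.append(w1)
                break  # one sufficiently similar word is enough
    return matches
```

The threshold decides how loose "similar" is. With the sample data, "icecream" and "ice" share 3 matching characters out of 11 in total, a ratio of 6/11 (about 0.55), so "icecream" is only returned when the threshold is lowered to around 0.5.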
I have not timed either of the two samples, but I would guess the second will run much slower, since you have to instantiate an object for each pair of words...