Python difflib and regular expressions
Can I use regular expressions in difflib?
Specifically, I'd like to do:
difflib.context_diff(actual, gold)
where actual is:
[master 92a406f] file modified
and gold is:
\[master \w{7}\] file modified
Comments (3)
It looks like you want to ignore the 92a406f part of the actual file. You should write a scrubber that uses regexes to scrub the parts you want to ignore, then store the scrubbed gold file. Then you can use standard difflib to compare the scrubbed actual output to the scrubbed gold.
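A minimal sketch of such a scrubber, assuming the only variable part is a 7-character abbreviated commit hash after "[master ". The names HASH_RE and scrub, the <HASH> placeholder, and the sample gold line are illustrative and do not come from the original posts.

```python
import difflib
import re

# Hypothetical pattern: a 7-character hash between "[master " and "]".
HASH_RE = re.compile(r"(\[master )\w{7}(\])")


def scrub(lines):
    """Replace the variable commit hash in each line with a fixed placeholder."""
    return [HASH_RE.sub(r"\1<HASH>\2", line) for line in lines]


actual = ["[master 92a406f] file modified\n"]
# Illustrative gold line; in practice this is the stored (already scrubbed) gold file.
gold = ["[master 0000000] file modified\n"]

# With the hash scrubbed on both sides, plain difflib sees identical lines,
# so the diff is empty.
diff = list(difflib.context_diff(scrub(actual), scrub(gold)))
print(diff)
```

Scrubbing both sides keeps difflib itself untouched, which is why this route is usually the simplest one to maintain.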
If you really want to pursue a regex-based diff, you can create your own string-like object that defines __eq__ based on regex matching, and use difflib on a sequence of those objects. I wouldn't recommend it, though.
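The __eq__-based idea could look roughly like the sketch below. RegexLine and its is_pattern flag are hypothetical names. The constant __hash__ is an assumption needed because difflib looks lines up in a dict, so lines that should compare equal must also hash equal. Equality defined this way is not transitive, which is one reason the approach is fragile.

```python
import difflib
import re


class RegexLine(str):
    """Hypothetical str subclass whose equality falls back to regex matching."""

    def __new__(cls, text, is_pattern=False):
        obj = super().__new__(cls, text)
        # Compile from a plain str so the re module never sees this subclass.
        obj._regex = re.compile(str(text)) if is_pattern else None
        return obj

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        if str(self) == str(other):
            return True
        if self._regex is not None:
            return self._regex.fullmatch(str(other)) is not None
        other_regex = getattr(other, "_regex", None)
        if other_regex is not None:
            return other_regex.fullmatch(str(self)) is not None
        return False

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Constant hash: forces dict lookups inside difflib to fall back to __eq__.
        return 0


actual = [RegexLine("[master 92a406f] file modified\n")]
gold = [RegexLine(r"\[master \w{7}\] file modified\n", is_pattern=True)]

# The pattern line compares equal to the actual line, so no diff is reported.
diff = list(difflib.context_diff(actual, gold))
print(diff)
```

Both sequences are wrapped in RegexLine here, because a plain str on one side would keep its normal hash and never reach the custom __eq__ during difflib's dict lookup.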
What I just did was replace difflib's find_longest_match function with a copy in which each == comparison is replaced by a call to a check: when the two items are not equal, it tries to interpret the left side as a regexp (and returns true on any error, e.g. when the left side is not a valid regexp).
I am using it to match expected output in unit tests, and so far it is working really well.
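The poster's actual code is not shown, so the following is only a rough sketch of the same idea, not a copy of difflib's implementation. It overrides find_longest_match in a hypothetical subclass, RegexSequenceMatcher, using a simplified quadratic search and a hypothetical helper lines_match. One deliberate difference from the description above: an invalid pattern is treated as a mismatch here rather than as a match.

```python
import difflib
import re


def lines_match(left, right):
    """Equal strings match; otherwise try `left` as a regex for `right`."""
    if left == right:
        return True
    try:
        return re.fullmatch(left, right) is not None
    except re.error:
        # Assumption: an invalid pattern counts as a mismatch in this sketch.
        return False


class RegexSequenceMatcher(difflib.SequenceMatcher):
    """Hypothetical subclass with a simplified find_longest_match."""

    def find_longest_match(self, alo=0, ahi=None, blo=0, bhi=None):
        a, b = self.a, self.b
        ahi = len(a) if ahi is None else ahi
        bhi = len(b) if bhi is None else bhi
        besti, bestj, bestsize = alo, blo, 0
        # run[j] is the length of the matching run ending at a[i-1], b[j].
        run = {}
        for i in range(alo, ahi):
            new_run = {}
            for j in range(blo, bhi):
                if lines_match(a[i], b[j]):
                    k = new_run[j] = run.get(j - 1, 0) + 1
                    if k > bestsize:
                        besti, bestj, bestsize = i - k + 1, j - k + 1, k
            run = new_run
        return difflib.Match(besti, bestj, bestsize)


# The left (first) sequence holds the patterns, matching the "left side" rule.
gold = [r"\[master \w{7}\] file modified"]
actual = ["[master 92a406f] file modified"]

matcher = RegexSequenceMatcher(None, gold, actual)
opcodes = matcher.get_opcodes()
print(opcodes)
```

Because every pair of lines is compared directly, this sketch is slower than difflib's dict-based search on large inputs, but it avoids the hashing problem entirely.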