Generating a view of a text's revision history in Python
I have two versions of a piece of text and I want to produce an HTML view of its revision similar to what Google Docs or Stack Overflow displays. I need to do this in Python. I don't know what this technique is called but I assume that it has a name and hopefully there is a Python library that can do it.
Version 1:
William Henry "Bill" Gates III (born
October 28, 1955)[2] is an American
business magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.
Version 2:
William Henry "Bill" Gates III (born
October 28, 1955)[2] is a business
magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.
He is American.
The desired output:
William Henry "Bill" Gates III (born
October 28, 1955)[2] is <del>an American</del> <ins>a</ins> business
magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.
<ins>He is American.</ins>
Using the diff command doesn't work because it tells me which lines are different but not which columns/words are different.
$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.' > oldfile
$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.' > newfile
$ diff -u oldfile newfile
--- oldfile 2010-04-30 13:32:43.000000000 -0700
+++ newfile 2010-04-30 13:33:09.000000000 -0700
@@ -1 +1 @@
-William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.
+William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.
Comments (3)
Google's Diff Match Patch library has a pretty good diff implementation in pure Python.
You can use wdiff, which compares files word by word instead of line by line. I don't know whether there's a Python implementation.
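For illustration only, here is a rough pure-Python approximation of wdiff's default output, where deleted words are wrapped in [-...-] and inserted words in {+...+}. It is not wdiff itself: the function name wdiff_style is made up for this sketch, words are split on whitespace only, and the matching is done by the standard library's difflib.SequenceMatcher, so results may differ from the real tool.

```python
import difflib


def wdiff_style(old: str, new: str) -> str:
    """Mark word-level changes with wdiff-like [-deleted-] and {+inserted+} markers."""
    old_words, new_words = old.split(), new.split()
    # autojunk=False keeps frequent words from being ignored on long inputs.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if i2 > i1:  # words present only in the old text
            out.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j2 > j1:  # words present only in the new text
            out.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(out)


print(wdiff_style("is an American business", "is a business"))
```

Note that joining on single spaces discards the original line breaks and spacing, which is acceptable for a quick look but not for a faithful reproduction of the input.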
The difflib module in the standard library might help with this problem.
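A minimal sketch of how that could look, assuming a word-level comparison is what you want: split both versions on whitespace, feed the word lists to difflib.SequenceMatcher, and wrap removed words in <del> and added words in <ins>. The function name html_word_diff and the choice of tags are my own assumptions, not something difflib provides out of the box.

```python
import difflib
import html


def html_word_diff(old: str, new: str) -> str:
    """Return HTML with word-level deletions in <del> and insertions in <ins>."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so characters like < or & cannot break the markup.
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            pieces.append(removed)
        if tag in ("delete", "replace"):
            pieces.append(f"<del>{removed}</del>")
        if tag in ("insert", "replace"):
            pieces.append(f"<ins>{added}</ins>")
    return " ".join(pieces)


old = ("William Henry Gates III is an American business magnate "
       "and chairman of Microsoft.")
new = ("William Henry Gates III is a business magnate "
       "and chairman of Microsoft. He is American.")
print(html_word_diff(old, new))
```

Styling the <del> and <ins> elements with CSS (strikethrough, background colour) gets you close to the Stack Overflow revision view. Whitespace is normalised to single spaces here, so if line breaks matter you would need to tokenise differently.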