Generating a revision-history view of text in Python

Posted 2024-08-30 18:43:50

I have two versions of a piece of text, and I want to produce an HTML view of the revision, similar to what Google Docs or Stack Overflow displays. I need to do this in Python. I don't know what this technique is called, but I assume it has a name, and hopefully there is a Python library that can do it.

Version 1:

William Henry "Bill" Gates III (born
October 28, 1955)[2] is an American
business magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.

Version 2:

William Henry "Bill" Gates III (born
October 28, 1955)[2] is a business
magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.
He is American.

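The desired output, with deleted text marked [-like this-] and inserted text marked {+like this+} (in the HTML view these would appear as struck-through and highlighted text):

William Henry "Bill" Gates III (born
October 28, 1955)[2] is [-an American-] {+a+} business
magnate, philanthropist, and
chairman[3] of Microsoft, the software
company he founded with Paul Allen.
{+He is American.+}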

Using the diff command doesn't work because it tells me which lines are different but not which columns/words are different.

$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.' > oldfile
$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.' > newfile
$ diff -u oldfile newfile
--- oldfile 2010-04-30 13:32:43.000000000 -0700
+++ newfile 2010-04-30 13:33:09.000000000 -0700
@@ -1 +1 @@
-William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.
+William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.
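For comparison, the standard library's difflib.ndiff is also line-oriented, but when a changed line is similar enough to its replacement it adds "? " guide lines whose "-", "+" and "^" characters sit under the changed columns. A minimal sketch on the same two sentences as oldfile and newfile; the output is still console text rather than an HTML view:

```python
import difflib

# The same two sentences as oldfile and newfile above (one line each).
OLD = ('William Henry "Bill" Gates III (born October 28, 1955)[2] is an American '
       'business magnate, philanthropist, and chairman[3] of Microsoft, the '
       'software company he founded with Paul Allen.')
NEW = ('William Henry "Bill" Gates III (born October 28, 1955)[2] is a business '
       'magnate, philanthropist, and chairman[3] of Microsoft, the software '
       'company he founded with Paul Allen.  He is American.')

# ndiff compares lists of lines. For a pair of similar changed lines it emits
# "- old", "? markers", "+ new", "? markers", where the markers sit under the
# changed characters.
result = list(difflib.ndiff([OLD], [NEW]))
for line in result:
    print(line.rstrip("\n"))
```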

Comments (3)

同尘 2024-09-06 18:43:50

Google's diff-match-patch library has a pretty good diff implementation in pure Python.

晨曦慕雪 2024-09-06 18:43:50

You can use wdiff. I don't know if there's a Python implementation:

$ wdiff oldfile newfile
William Henry "Bill" Gates III (born October 28, 1955)[2] is [-an American-] {+a+} business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  {+He is American.+}
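A rough Python equivalent of wdiff's markup can be sketched with the standard-library difflib by diffing lists of words instead of lists of lines. The name wdiff_like is made up for this sketch, and it assumes whitespace splitting is good enough, so runs of spaces and line breaks collapse to single spaces in the output:

```python
import difflib


def wdiff_like(old: str, new: str) -> str:
    """Return a word diff marked up as [-deleted-] and {+inserted+}.

    Hypothetical helper, not a port of wdiff: it splits on whitespace and
    joins the result with single spaces.
    """
    a, b = old.split(), new.split()
    out = []
    # get_opcodes() describes how to turn word list a into word list b.
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


OLD = ('William Henry "Bill" Gates III (born October 28, 1955)[2] is an American '
       'business magnate, philanthropist, and chairman[3] of Microsoft, the '
       'software company he founded with Paul Allen.')
NEW = ('William Henry "Bill" Gates III (born October 28, 1955)[2] is a business '
       'magnate, philanthropist, and chairman[3] of Microsoft, the software '
       'company he founded with Paul Allen.  He is American.')

print(wdiff_like(OLD, NEW))
```

On the two sentences from the question this prints the same markup as the wdiff output above, except that the double space before {+He is American.+} becomes a single space.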

岛歌少女 2024-09-06 18:43:50

The difflib module might be of assistance with this problem.
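One way to use difflib for the HTML view asked about in the question is to run difflib.SequenceMatcher over lists of words and wrap the differences in HTML <del> and <ins> elements, which browsers render as struck-through and underlined text by default. A minimal sketch; the function name inline_diff_html is hypothetical, and splitting on whitespace means the original spacing and line breaks are not preserved:

```python
import difflib
import html


def inline_diff_html(old: str, new: str) -> str:
    """Render a word-level diff of two texts as inline HTML.

    Hypothetical helper: deleted words go in <del>, inserted words in <ins>,
    and all text is HTML-escaped. Original whitespace is not preserved.
    """
    a, b = old.split(), new.split()
    parts = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(html.escape(" ".join(a[i1:i2])))
        if tag in ("delete", "replace"):
            parts.append("<del>" + html.escape(" ".join(a[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            parts.append("<ins>" + html.escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(parts)


print(inline_diff_html("the quick brown fox", "the slow brown fox jumps"))
# the <del>quick</del> <ins>slow</ins> brown fox <ins>jumps</ins>
```

A "replace" opcode is rendered as the deleted words followed by the inserted words, the same order wdiff uses; the colors and strikethrough style can then be adjusted with CSS.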
