Generating pretty diff HTML in Python

Posted on 2024-08-07 18:18:13

I have two chunks of text that I would like to compare and see which words/lines have been added/removed/modified in Python (similar to a Wiki's Diff Output).

I have tried difflib.HtmlDiff, but its output is less than pretty.

Is there a way in Python (or an external library) to generate clean-looking HTML for the diff of two chunks of text? (Not just at line level, but also showing word/character modifications within a line.)

Comments (7)

猫性小仙女 2024-08-14 18:18:13

There's diff_prettyHtml() in the diff-match-patch library from Google.
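
A minimal sketch of how that call might be used, assuming the diff-match-patch package from PyPI is installed (pip install diff-match-patch). The helper name pretty_html is made up for illustration; diff_main, diff_cleanupSemantic and diff_prettyHtml are methods of the library's diff_match_patch class.

```python
def pretty_html(old_text, new_text):
    """Return an inline HTML diff of two strings (hypothetical helper)."""
    # Third-party dependency, imported here so a missing package fails at call time.
    from diff_match_patch import diff_match_patch

    dmp = diff_match_patch()
    # Character-level diff as a list of (operation, text) tuples.
    diffs = dmp.diff_main(old_text, new_text)
    # Merge tiny coincidental matches so the result reads more naturally.
    dmp.diff_cleanupSemantic(diffs)
    # Wraps insertions in <ins> and deletions in <del>, with inline styles.
    return dmp.diff_prettyHtml(diffs)
```

The result is an HTML fragment rather than a full page, and the diff is computed per character, so it covers changes inside a line. Embed the fragment in your own page if you want to control the surrounding layout.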

别在捏我脸啦 2024-08-14 18:18:13

Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.

For instance, if you generate the HTML like this:

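import difflib
import sys

fromfile = "xxx"
tofile = "zzz"

# The 'U' mode was removed in Python 3.11; plain text mode already normalizes newlines.
with open(fromfile) as f:
    fromlines = f.readlines()
with open(tofile) as f:
    tolines = f.readlines()

# make_file() returns a complete HTML document as a single string.
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)

sys.stdout.write(diff)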

then you get green backgrounds on added lines, yellow on changed lines and red on deleted lines. If I were doing this, I would take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and either move it to the top or put it in a div so that CSS can position it.

Actually, I would give serious consideration to just fixing up the difflib module (which is written in Python) to generate better HTML, and contributing the change back to the project. If you have a CSS expert to help you, or are one yourself, please consider doing this.
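
One way to sketch the "own HTML plus CSS" idea without parsing the generated page: HtmlDiff.make_table() returns only the diff table (no legend, no surrounding document), so it can be dropped into a handwritten page. The helper name styled_diff_page and the stylesheet are illustrative assumptions; the class names (diff, diff_header, diff_next, diff_add, diff_chg, diff_sub) are the ones difflib emits.

```python
import difflib

# Hypothetical stylesheet; adjust colors and fonts to taste.
CUSTOM_CSS = """
table.diff { font-family: monospace; border-collapse: collapse; width: 100%; }
.diff_header { background: #f0f0f0; color: #888; text-align: right; padding: 0 4px; }
.diff_next { display: none; }
.diff_add { background: #d4f8d4; }
.diff_chg { background: #fff3b0; }
.diff_sub { background: #f8d4d4; }
"""

def styled_diff_page(old_text, new_text, old_label="before", new_label="after"):
    """Build a full HTML page around HtmlDiff's table (hypothetical helper)."""
    # wrapcolumn keeps long lines from stretching the side-by-side table.
    table = difflib.HtmlDiff(wrapcolumn=80).make_table(
        old_text.splitlines(), new_text.splitlines(), old_label, new_label
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>"
        f"<style>{CUSTOM_CSS}</style></head>\n"
        f"<body>{table}</body></html>"
    )
```

Writing the returned string to a .html file and opening it in a browser shows the same side-by-side diff as make_file(), but styled entirely by the CSS above.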

原谅过去的我 2024-08-14 18:18:13

I recently posted a Python script that does just this: diff2HtmlCompare (follow the link for a screenshot). Under the hood it wraps difflib and uses Pygments for syntax highlighting.

花桑 2024-08-14 18:18:13

not just line level, but also word/character modifications within a line

xmldiff seems to be a nice package for this purpose, especially when you have XML/HTML to compare. Read more in its documentation.

纸伞微斜 2024-08-14 18:18:13

Since the .. library from Google no longer seems to be under active development, I suggest using diff_py.

From the GitHub page:

The simple diff tool which is written by Python. The diff result can be printed in console or to html file.

蘑菇王子 2024-08-14 18:18:13

First clean up both HTML documents with lxml.html, then check the differences with difflib.
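
A rough sketch of that two-step approach, assuming the third-party lxml package is installed. The helper name diff_normalized_html is hypothetical, and rendering the result with HtmlDiff.make_table() is just one possible choice of difflib output.

```python
import difflib

def diff_normalized_html(old_html, new_html):
    """Normalize two HTML strings with lxml, then diff them (hypothetical helper)."""
    import lxml.html  # third-party: pip install lxml

    def normalize(markup):
        # Parsing and re-serializing repairs unclosed tags and evens out formatting.
        tree = lxml.html.fromstring(markup)
        text = lxml.html.tostring(tree, pretty_print=True, encoding="unicode")
        return text.splitlines()

    # Compare the normalized markup line by line and render a side-by-side table.
    return difflib.HtmlDiff().make_table(normalize(old_html), normalize(new_html))
```

Normalizing first means that purely cosmetic differences in the markup, such as a missing closing tag, are less likely to show up as changes.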

乞讨 2024-08-14 18:18:13

A copy of my own answer from here.


What about DaisyDiff (Java and PHP versions available)?

The following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized for HTML than XML tree diff tools are. Changing part of a text node will not cause the entire node to be marked as changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.