
Generating and applying diffs in Python

Posted 2024-08-22 00:30:14 | Word count: 243 | Views: 8 | Comments: 0

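Is there an "out-of-the-box" way in Python to generate a list of differences between two texts, and then apply that diff to one file to obtain the other later?

I want to keep the revision history of a text, but I don't want to save the entire text for each revision if only a single line was edited. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to modify one text to obtain the other.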


む无字情书 2024-08-29 00:30:14

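Did you have a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from the older file and the diffs.

A Python version is included.

http://code.google.com/p/google-diff-match-patch/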


回梦 2024-08-29 00:30:14

Does difflib.unified_diff do what you want? There is an example here.

The original link is broken. There is an example here.
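Since the linked examples did not survive, here is a minimal sketch of what a difflib.unified_diff call can look like. The sample strings and the "old.txt" / "new.txt" labels are made up for illustration; n=0 is one possible choice that drops context lines so only the edited lines are kept.

```python
import difflib

old = "alpha\nbeta\ngamma\n"
new = "alpha\nBETA\ngamma\ndelta\n"

# keepends=True keeps each line's "\n", which unified_diff expects by default
diff_lines = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="old.txt",  # hypothetical labels, used only in the header
    tofile="new.txt",
    n=0,                 # no context lines: keep only the edited lines
))
diff_text = "".join(diff_lines)
print(diff_text)
```

The result starts with the "---" / "+++" header lines, followed by one "@@ ... @@" hunk per edited region. Note that difflib only produces the diff; it has no function that applies a unified diff back to a text.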


无法回应 2024-08-29 00:30:14

I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the unified diff format.

import re

# raw string, so that \d and \+ reach the regex engine without invalid-escape warnings
_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s,patch,revert=False):
  """
  Apply unified diff patch to string s to recover newer string.
  If revert is True, treat s as the newer string, recover older string.
  """
  s = s.splitlines(True)
  p = patch.splitlines(True)
  t = ''
  i = sl = 0
  (midx,sign) = (1,'+') if not revert else (3,'-')
  while i < len(p) and p[i].startswith(("---","+++")): i += 1 # skip header lines
  while i < len(p):
    m = _hdr_pat.match(p[i])
    if not m: raise Exception("Cannot process diff")
    i += 1
    l = int(m.group(midx))-1 + (m.group(midx+1) == '0')
    t += ''.join(s[sl:l])
    sl = l
    while i < len(p) and p[i][0] != '@':
      if i+1 < len(p) and p[i+1][0] == '\\': line = p[i][:-1]; i += 2
      else: line = p[i]; i += 1
      if len(line) > 0:
        if line[0] == sign or line[0] == ' ': t += line[1:]
        sl += (line[0] != sign)
  t += ''.join(s[sl:])
  return t

If there are header lines ("--- ...\n", "+++ ...\n"), it skips over them. Given a unified diff string diffstr representing the diff between oldstr and newstr:

# recreate `newstr` from `oldstr`+patch
newstr = apply_patch(oldstr, diffstr)
# recreate `oldstr` from `newstr`+patch
oldstr = apply_patch(newstr, diffstr, True)

In Python you can generate a unified diff of two strings using difflib (part of the standard library):

import difflib
_no_eol = "\\ No newline at end of file"  # escaped backslash: the marker line starts with a literal "\"

def make_patch(a,b):
  """
  Get unified string diff between two strings. Trims top two lines.
  Returns empty string if strings are identical.
  """
  diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0)
  try: _,_ = next(diffs),next(diffs)
  except StopIteration: pass
  return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])

On unix: diff -U0 a.txt b.txt

Code is on GitHub here along with tests using ASCII and random unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc
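For comparison, here is a different route to the same goal that stays entirely inside the standard library. This is not the method from the gist above: it is a minimal sketch that stores the non-equal opcodes from difflib.SequenceMatcher instead of parsing unified diff text. The names make_delta and apply_delta are hypothetical, and the delta only goes forward (old to new); reverting would also require storing the replaced old lines.

```python
import difflib

def make_delta(old_lines, new_lines):
    """Return a compact delta: one (start, end, replacement) entry per edit."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (i1, i2) index the OLD lines; the payload is what replaces them
            delta.append((i1, i2, new_lines[j1:j2]))
    return delta

def apply_delta(old_lines, delta):
    """Rebuild the new lines from the old lines and a delta from make_delta."""
    result = []
    pos = 0
    for i1, i2, replacement in delta:
        result.extend(old_lines[pos:i1])  # unchanged run before this edit
        result.extend(replacement)        # empty list for a pure deletion
        pos = i2                          # skip the old lines that were replaced
    result.extend(old_lines[pos:])        # unchanged tail
    return result

old = "one\ntwo\nthree\nfour\n".splitlines(keepends=True)
new = "one\n2\nthree\nfour\nfive\n".splitlines(keepends=True)
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
```

The delta is a plain list of tuples, so it could be serialized (for example with json) and stored per revision; only the edited lines and their positions are kept.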


冷弦 2024-08-29 00:30:14

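AFAIK most diff algorithms use a simple longest common subsequence match to find the parts that two texts have in common; whatever is left over is considered the difference. It shouldn't be too difficult to code up your own dynamic programming algorithm for this in Python, and the Wikipedia page on the problem provides the algorithm too.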
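To make the idea concrete, here is a minimal sketch of a line-based diff built on the textbook longest common subsequence table. The function name lcs_diff and the (tag, line) output format are my own choices for illustration, and the table needs O(n*m) time and memory, so real diff tools use more refined algorithms.

```python
def lcs_diff(old_lines, new_lines):
    """Return a list of (tag, line) pairs, where tag is ' ', '-' or '+'."""
    n, m = len(old_lines), len(new_lines)
    # lcs[i][j] = length of the LCS of old_lines[i:] and new_lines[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old_lines[i] == new_lines[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left corner to emit the edit script
    result = []
    i = j = 0
    while i < n and j < m:
        if old_lines[i] == new_lines[j]:
            result.append((" ", old_lines[i]))  # line is common to both texts
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            result.append(("-", old_lines[i]))  # line only in the old text
            i += 1
        else:
            result.append(("+", new_lines[j]))  # line only in the new text
            j += 1
    result.extend(("-", line) for line in old_lines[i:])
    result.extend(("+", line) for line in new_lines[j:])
    return result

script = lcs_diff(["a", "b", "c"], ["a", "x", "c", "d"])
```

Dropping the "+" entries from the script gives back the old text, and dropping the "-" entries gives back the new one.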


毁梦 2024-08-29 00:30:14

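Does it need to be a Python solution?
My first thought would be to use either a version control system (Subversion, Git, etc.) or the diff / patch utilities that are standard on Unix systems and are available as part of cygwin on Windows-based systems.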


疧_╮線 2024-08-29 00:30:14

You can probably use unified_diff to generate the list of differences between the files. Only the changed lines are then written to a new text file, which you can keep for future reference.
This code writes only the differences to the new file.
I hope this is what you are asking for!

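import difflib

# Assumption: old_file and new_file are lists of lines without trailing
# newlines (e.g. from str.splitlines()), which is what lineterm='' suggests.
diff = difflib.unified_diff(old_file, new_file, lineterm='')
lines = list(diff)[2:]  # drop the "---" / "+++" header lines
if lines:
    print(lines[0])  # first hunk header; the list is empty if nothing changed

added = [line for line in lines if line.startswith('+')]
removed = [line for line in lines if line.startswith('-')]

# write the added lines first, then the removed lines, one per line
with open("output.txt", "w") as fh:
    for line in added + removed:
        fh.write(line + '\n')
print('+', added)
print('-', removed)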

Use this in your code to save only the difference output!
