Generating and applying diffs in Python

Posted on 2024-08-22 00:30:14

Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then apply this diff to one file to obtain the other later?

I want to keep the revision history of a text, but I don't want to save the entire text for each revision if there is just a single edited line. I looked at difflib, but I couldn't see how to generate a list of just the edited lines that can still be used to modify one text to obtain the other.

Comments (6)

む无字情书 2024-08-29 00:30:14

Did you have a look at diff-match-patch from Google? Apparently Google Docs uses this set of algorithms. It includes not only a diff module but also a patch module, so you can generate the newest file from older files and diffs.

A Python version is included.

http://code.google.com/p/google-diff-match-patch/

回梦 2024-08-29 00:30:14

Does difflib.unified_diff do what you want? There is an example here.

The original link is broken. There is an example here.
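A minimal sketch of what difflib.unified_diff produces, assuming two short in-memory strings. The sample texts and the labels old.txt and new.txt are made up for illustration; only the difflib call itself comes from the standard library.

```python
import difflib

old = "alpha\nbeta\ngamma\n"
new = "alpha\nBETA\ngamma\n"

# unified_diff compares sequences of lines; keepends=True preserves the newlines
diff_lines = list(difflib.unified_diff(
    old.splitlines(keepends=True),
    new.splitlines(keepends=True),
    fromfile="old.txt",  # hypothetical labels, used only in the two header lines
    tofile="new.txt",
    n=0,                 # no context lines: only the edited lines are emitted
))

print("".join(diff_lines), end="")
```

With n=0 the result holds two header lines, one hunk header (@@ -2 +2 @@) and the removed and added line. The standard library generates this format but has no function that applies it, so a separate patch step is still needed.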

无法回应 2024-08-29 00:30:14

I've implemented a pure Python function that applies diff patches to recover either of the input strings; I hope someone finds it useful. It parses the unified diff format.

import re

# raw string so that \d and \+ reach the regex engine without invalid-escape warnings
_hdr_pat = re.compile(r"^@@ -(\d+),?(\d+)? \+(\d+),?(\d+)? @@$")

def apply_patch(s,patch,revert=False):
  """
  Apply unified diff patch to string s to recover newer string.
  If revert is True, treat s as the newer string, recover older string.
  """
  s = s.splitlines(True)
  p = patch.splitlines(True)
  t = ''
  i = sl = 0
  (midx,sign) = (1,'+') if not revert else (3,'-')
  while i < len(p) and p[i].startswith(("---","+++")): i += 1 # skip header lines
  while i < len(p):
    m = _hdr_pat.match(p[i])
    if not m: raise Exception("Cannot process diff")
    i += 1
    l = int(m.group(midx))-1 + (m.group(midx+1) == '0')
    t += ''.join(s[sl:l])
    sl = l
    while i < len(p) and p[i][0] != '@':
      if i+1 < len(p) and p[i+1][0] == '\\': line = p[i][:-1]; i += 2
      else: line = p[i]; i += 1
      if len(line) > 0:
        if line[0] == sign or line[0] == ' ': t += line[1:]
        sl += (line[0] != sign)
  t += ''.join(s[sl:])
  return t

If there are header lines ("--- ...\n","+++ ...\n") it skips over them. If we have a unified diff string diffstr representing the diff between oldstr and newstr:

# recreate `newstr` from `oldstr`+patch
newstr = apply_patch(oldstr, diffstr)
# recreate `oldstr` from `newstr`+patch
oldstr = apply_patch(newstr, diffstr, True)

In Python you can generate a unified diff of two strings using difflib (part of the standard library):

import difflib
_no_eol = r"\ No newline at end of file"  # raw string: the backslash is literal

def make_patch(a,b):
  """
  Get unified string diff between two strings. Trims top two lines.
  Returns empty string if strings are identical.
  """
  diffs = difflib.unified_diff(a.splitlines(True),b.splitlines(True),n=0)
  try: _,_ = next(diffs),next(diffs)
  except StopIteration: pass
  return ''.join([d if d[-1] == '\n' else d+'\n'+_no_eol+'\n' for d in diffs])

On unix: diff -U0 a.txt b.txt

Code is on GitHub here along with tests using ASCII and random unicode characters: https://gist.github.com/noporpoise/16e731849eb1231e86d78f9dfeca3abc
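As an alternative to parsing unified diff text, here is a rough sketch of a revision history built on difflib.SequenceMatcher.get_opcodes, which is part of the standard library. The helper names make_delta and apply_delta and the sample revisions are hypothetical; this is not the code from the gist above, and it assumes deltas are only ever replayed forward.

```python
import difflib

def make_delta(old_lines, new_lines):
    """Keep only the non-matching opcodes, with the replacement lines attached."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

def apply_delta(old_lines, delta):
    """Rebuild the new lines by splicing each stored edit into the old lines."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out.extend(old_lines[pos:i1])  # unchanged lines before this edit
        out.extend(replacement)        # new lines (empty list for a pure deletion)
        pos = i2                       # skip the old lines this edit replaced
    out.extend(old_lines[pos:])        # unchanged tail
    return out

# Store the first revision in full, then one delta per later revision
revisions = [
    "line one\nline two\nline three\n",
    "line one\nline 2\nline three\n",
    "line one\nline 2\nline three\nline four\n",
]
base = revisions[0].splitlines(keepends=True)
deltas = []
prev = base
for text in revisions[1:]:
    cur = text.splitlines(keepends=True)
    deltas.append(make_delta(prev, cur))
    prev = cur

# Replay the deltas in order to recover the latest revision
lines = base
for delta in deltas:
    lines = apply_delta(lines, delta)
latest = "".join(lines)
```

Each delta stores only the edited lines plus the positions they replace, so a one-line edit costs one small tuple. Unlike the unified diff approach, these deltas do not record the removed lines, so they cannot be used to step backwards.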

冷弦 2024-08-29 00:30:14

AFAIK most diff algorithms use a simple Longest Common Subsequence match to find the common part between two texts; whatever is left is considered the difference. It shouldn't be too difficult to code up your own dynamic programming algorithm for that in Python; the Wikipedia page on the longest common subsequence problem gives the algorithm too.
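A minimal sketch of that idea, assuming the texts are already split into lists of lines. The function names lcs_table and lcs_diff are made up for illustration, and this is the textbook dynamic programming formulation rather than what any particular diff tool implements.

```python
def lcs_table(a, b):
    """table[i][j] is the length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table

def lcs_diff(a, b):
    """Return a list of (' ', line), ('-', line) or ('+', line) tuples in order."""
    table = lcs_table(a, b)
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))   # line belongs to the common subsequence
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i]))   # dropping a[i] keeps the LCS at least as long
            i += 1
        else:
            ops.append(("+", b[j]))   # otherwise b[j] is an insertion
            j += 1
    ops.extend(("-", line) for line in a[i:])  # leftover old lines were deleted
    ops.extend(("+", line) for line in b[j:])  # leftover new lines were added
    return ops

old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
ops = lcs_diff(old, new)
```

The table needs time and memory proportional to len(a) * len(b), which is fine for short texts but slow for large files.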

毁梦 2024-08-29 00:30:14

Does it have to be a Python solution?
My first thought would be to use either a version control system (Subversion, Git, etc.) or the diff / patch utilities that come standard with a Unix system, or as part of Cygwin on a Windows-based system.

疧_╮線 2024-08-29 00:30:14

You can probably use unified_diff to generate the list of differences between the files. Only the changed text is then written to a new text file, which you can keep for future reference.
This code writes only the differences to a new file.
I hope this is what you are asking for!

import difflib

# old_file and new_file are lists of lines, e.g. from open(path).readlines()
diff = difflib.unified_diff(old_file, new_file, lineterm='')
lines = list(diff)[2:]  # drop the '---' and '+++' header lines
if lines:
    print(lines[0])     # first hunk header

added = [line for line in lines if line.startswith('+')]
removed = [line for line in lines if line.startswith('-')]

# write the added lines first, then the removed lines, one per line
with open("output.txt", "w") as fh:
    for line in added + removed:
        fh.write(line.rstrip('\n') + '\n')

print('+', added)
print('-', removed)

Use this in your code to save only the diff output!
