修补文本文件

发布于 2024-10-12 14:51:32 字数 3833 浏览 1 评论 0原文

我正在尝试连续建立一个带有差异补丁的文本文件。 从一个空文本文件开始,我需要应用 600 多个补丁才能得到最终文档(我编写的文本 + 使用 Mercurial 跟踪更改)。文件中的每次更改都需要添加额外信息,因此我不能简单地在命令行中使用 diffpatch

我花了一整天的时间编写(并重新编写)一个工具来解析 diff 文件并相应地对文本文件进行更改,但是其中一个 diff 文件使我的程序以一种我无法理解的方式运行的。

每个 diff 文件都会调用此函数:

# filename = name of the diff file
# date = extra information to be added as a prefix to each added line
def process_diff(filename, date):
    # that's the file all the patches will be applied to
    merge_file = open("thesis_merged.txt", "r")
    # map its content to a list to manipulate it in memory
    merge_file_lines = []
    for line in merge_file:
        line = line.rstrip()
        merge_file_lines.append(line)
    merge_file.close()

    # open for writing:
    merge_file = open("thesis_merged.txt", "w")

    # that's the diff file, containing all the changes
    diff_file = open(filename, "r")
    print "-", filename, "-" * 20

    # also map it to a list
    diff_file_lines = []
    for line in diff_file:
        line = line.rstrip()

        if not line.startswith("\\ No newline at end of file"): # useless information ... or not?
        diff_file_lines.append(line)

    # ignore header:
    #--- thesis_words_0.txt 2010-12-04 18:16:26.020000000 +0100
    #+++ thesis_words_1.txt 2010-12-04 18:16:26.197000000 +0100
    diff_file_lines = diff_file_lines[2:]

    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )

    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]
        line_nr_minus = tmp[0].split(",")[0]
        line_nr_minus = int(line_nr_minus[1:]) # 252
        line_nr_plus = tmp[1].split(",")[0]
        line_nr_plus = int(line_nr_plus[1:]) # 251

        for j, line in enumerate(hunk[1:]):
            if line.startswith("-"):
            # delete line from the file in memory
            del merge_file_lines[line_nr_minus-1]

        plus_counter = 0 # counts the number of added lines
        for k, line in enumerate(hunk[1:]):
            if line.startswith("+"):
                # insert line, one after another
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, line[1:])
                plus_counter += 1

    for line in merge_file_lines:
        # write the updated file back to the disk
        merge_file.write(line.rstrip() + "\n")

    merge_file.close()
    diff_file.close()
    print "\n\n"


    def get_hunk(lines, i):
        hunk = []
        hunk.append(lines[i])
        # @@ -252,10 +251,9 @@

        lines = lines[i+1:]

        for line in lines:
            if line.startswith("@@"):
                # next hunk begins, so stop here
                break
            else:
                hunk.append(line)

            return hunk

diff 文件看起来像这样 - 这里是麻烦制造者:

--- thesis_words_12.txt 2011-01-17 20:35:50.804000000 +0100
+++ thesis_words_13.txt 2011-01-17 20:35:51.057000000 +0100
@@ -245 +245,2 @@
-As
+Per
+definition
@@ -248,3 +249 @@
-already
-proposes,
-"generative"
+generative
@@ -252,10 +251,9 @@
-that
-something
-is
-created
-based
-on
-a
-set
-of
-rules.
+"having
+the
+ability
+to
+originate,
+produce,
+or
+procreate."
+<http://www.thefreedictionary.com/generative>

输出:

[...]

Per
definition
the
"generative"
generative
means
"having
the
ability
to
originate,
produce,
or
procreate."
<http://www.thefreedictionary.com/generative>
that

[...]

所有以前的补丁都按预期重现文本。我已经重写了很多次,但是错误行为仍然存在——所以现在我一无所知。

我将非常感谢有关如何以不同方式做到这一点的提示和技巧。预先非常感谢!

编辑: - 最后每一行应该看起来像这样:{date_and_time_of_text_change}word

它基本上是为了跟踪一个单词添加到文本中的日期和时间。

I'm trying to successively build up a text file with diff patches.
starting from an empty text file I need to apply 600+ patches to end up with the final document (a text I have written + tracked the changes with mercurial). to each change in the file extra information needs to be added, so I can't simply use diff and patch in the commandline.

I have spent all day to write (and re-write) a tool that parses the diff files and makes changes to the text file accordingly, but one of the diff files makes my program behave in a way that I can't make any sense of.

this function gets called for each of the diff files:

# filename = name of the diff file
# date = extra information to be added as a prefix to each added line
def process_diff(filename, date):
    # that's the file all the patches will be applied to
    merge_file = open("thesis_merged.txt", "r")
    # map its content to a list to manipulate it in memory
    merge_file_lines = []
    for line in merge_file:
        line = line.rstrip()
        merge_file_lines.append(line)
    merge_file.close()

    # open for writing:
    merge_file = open("thesis_merged.txt", "w")

    # that's the diff file, containing all the changes
    diff_file = open(filename, "r")
    print "-", filename, "-" * 20

    # also map it to a list
    diff_file_lines = []
    for line in diff_file:
        line = line.rstrip()

        if not line.startswith("\\ No newline at end of file"): # useless information ... or not?
        diff_file_lines.append(line)

    # ignore header:
    #--- thesis_words_0.txt 2010-12-04 18:16:26.020000000 +0100
    #+++ thesis_words_1.txt 2010-12-04 18:16:26.197000000 +0100
    diff_file_lines = diff_file_lines[2:]

    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )

    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]
        line_nr_minus = tmp[0].split(",")[0]
        line_nr_minus = int(line_nr_minus[1:]) # 252
        line_nr_plus = tmp[1].split(",")[0]
        line_nr_plus = int(line_nr_plus[1:]) # 251

        for j, line in enumerate(hunk[1:]):
            if line.startswith("-"):
            # delete line from the file in memory
            del merge_file_lines[line_nr_minus-1]

        plus_counter = 0 # counts the number of added lines
        for k, line in enumerate(hunk[1:]):
            if line.startswith("+"):
                # insert line, one after another
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, line[1:])
                plus_counter += 1

    for line in merge_file_lines:
        # write the updated file back to the disk
        merge_file.write(line.rstrip() + "\n")

    merge_file.close()
    diff_file.close()
    print "\n\n"


    def get_hunk(lines, i):
        hunk = []
        hunk.append(lines[i])
        # @@ -252,10 +251,9 @@

        lines = lines[i+1:]

        for line in lines:
            if line.startswith("@@"):
                # next hunk begins, so stop here
                break
            else:
                hunk.append(line)

            return hunk

the diff files look like this -- here the trouble maker:

--- thesis_words_12.txt 2011-01-17 20:35:50.804000000 +0100
+++ thesis_words_13.txt 2011-01-17 20:35:51.057000000 +0100
@@ -245 +245,2 @@
-As
+Per
+definition
@@ -248,3 +249 @@
-already
-proposes,
-"generative"
+generative
@@ -252,10 +251,9 @@
-that
-something
-is
-created
-based
-on
-a
-set
-of
-rules.
+"having
+the
+ability
+to
+originate,
+produce,
+or
+procreate."
+<http://www.thefreedictionary.com/generative>

output:

[...]

Per
definition
the
"generative"
generative
means
"having
the
ability
to
originate,
produce,
or
procreate."
<http://www.thefreedictionary.com/generative>
that

[...]

all the previous patches reproduce the text just as expected. I have rewritten this many times, but that buggy behavior persists -- so right now I am clueless.

I'd be very thankful for hints and tips on how to do this differently. thanks a lot in advance!

EDIT:
- in the end each line is supposed to look like this: {date_and_time_of_text_change}word

it's basicly about keeping track at what date and time a word was added to the text.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

我们的影子 2024-10-19 14:51:32

代码中确实存在一个错误——我没有正确解释 diff 文件(没有意识到当一个 diff 文件中有多个块时需要进行换行)

def process_diff(filename, date, step_nr):
    merge_file = open("thesis_merged.txt", "r")
    merge_file_lines = [line.rstrip() for line in merge_file]
    merge_file.close()

    diff_file = open(filename, "r")
    print "-", filename, "-"*2, step_nr, "-"*2, date

    diff_file_lines = [line.rstrip() for line in diff_file]
    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )
    diff_file.close()

    line_shift = 0
    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]

        line_nr_minus = tmp[0].split(",")[0]
        minusses = 1
        if len( tmp[0].split(",") ) > 1:
            minusses = int( tmp[0].split(",")[1] )
        line_nr_minus = int(line_nr_minus[1:]) # 252

        line_nr_plus = tmp[1].split(",")[0]
        plusses = 1
        if len( tmp[1].split(",") ) > 1:
            plusses = int( tmp[1].split(",")[1] )
        line_nr_plus = int(line_nr_plus[1:]) # 251

        line_nr_minus += line_shift

        #@@ -248,3 +249 @@
        #-already
        #-proposes,
        #-"generative"
        #+generative

        if hunk[1]: # -
            for line in hunk[1]:
                del merge_file_lines[line_nr_minus-1]

        plus_counter = 0
        if hunk[2]: # +
            for line in hunk[2]:
                prefix = ""
                if len(line) > 1:
                    prefix = "{" + date + "}"
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, prefix + line[1:])
                plus_counter += 1

        line_shift += plusses - minusses

there was indeed a bug in the code -- I wasn't interpreting the diff files correctly (didn't realize there needed to be a line shift, when there are multiple hunks in one diff file)

def process_diff(filename, date, step_nr):
    merge_file = open("thesis_merged.txt", "r")
    merge_file_lines = [line.rstrip() for line in merge_file]
    merge_file.close()

    diff_file = open(filename, "r")
    print "-", filename, "-"*2, step_nr, "-"*2, date

    diff_file_lines = [line.rstrip() for line in diff_file]
    hunks = []
    for i, line in enumerate(diff_file_lines):
        if line.startswith("@@"):
            hunks.append( get_hunk(diff_file_lines, i) )
    diff_file.close()

    line_shift = 0
    for hunk in hunks:
        head = hunk[0]
        # @@ -252,10 +251,9 @@
        tmp = head[3:-3].split(" ") # [-252,10] [+251,9]

        line_nr_minus = tmp[0].split(",")[0]
        minusses = 1
        if len( tmp[0].split(",") ) > 1:
            minusses = int( tmp[0].split(",")[1] )
        line_nr_minus = int(line_nr_minus[1:]) # 252

        line_nr_plus = tmp[1].split(",")[0]
        plusses = 1
        if len( tmp[1].split(",") ) > 1:
            plusses = int( tmp[1].split(",")[1] )
        line_nr_plus = int(line_nr_plus[1:]) # 251

        line_nr_minus += line_shift

        #@@ -248,3 +249 @@
        #-already
        #-proposes,
        #-"generative"
        #+generative

        if hunk[1]: # -
            for line in hunk[1]:
                del merge_file_lines[line_nr_minus-1]

        plus_counter = 0
        if hunk[2]: # +
            for line in hunk[2]:
                prefix = ""
                if len(line) > 1:
                    prefix = "{" + date + "}"
                merge_file_lines.insert((line_nr_plus-1)+plus_counter, prefix + line[1:])
                plus_counter += 1

        line_shift += plusses - minusses
他不在意 2024-10-19 14:51:32

尝试使用 python-patch 中的解析器 - 至少你可以申请手动一一进行测试,看看哪一个失败了。 API 不稳定,但解析器稳定,因此您只需将 patch.py​​ 从 trunk/ 复制到您的项目即可。不过,如果能得到一些关于所需 API 的建议那就太好了。

Try to use parser from python-patch - at least you'll be able to apply hunks one by one manually to see which one fails. API is not stable, but parser is, so you can just copy patch.py from trunk/ to your project. It would be nice to get some proposal on desired API, though.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文