Applying a diff patch to a string/file

Posted on 2024-11-02 06:16:25

For an offline-capable smartphone app, I'm creating a one-way text sync for XML files. I'd like my server to send only the delta/difference (e.g. a GNU diff patch) to the target device.

This is the plan:

Time = 0
Server: has version_1 of Xml file (~800 kiB)
Client: has version_1 of Xml file (~800 kiB)

Time = 1
Server: has version_1 and version_2 of Xml file (each ~800 kiB)
        computes delta of these versions (=patch) (~10 kiB) 
        sends patch to Client (~10 kiB transferred)

Client: computes version_2 from version_1 and patch  <= this is the problem =>

Is there a Ruby library that can do this last step to apply a text patch to files/strings? The patch can be formatted as required by the library.

Thanks for your help!

(I'm using the Rhodes cross-platform framework, which uses Ruby as its programming language.)

Comments (2)

自由如风 2024-11-09 06:16:25

Your first task is to choose a patch format. The hardest format for humans to read (IMHO) turns out to be the easiest format for software to apply: the ed(1) script. You can start off with a simple /usr/bin/diff -e old.xml new.xml to generate the patches; diff(1) will produce line-oriented patches but that should be fine to start with. The ed format looks like this:

36a
    <tr><td class="eg" style="background: #182349;"> </td><td><tt>#182349</tt></td></tr>
.
34c
    <tr><td class="eg" style="background: #66ccff;"> </td><td><tt>#xxxxxx</tt></td></tr>
.
20,23d

The numbers are line numbers; a range of lines is written as two numbers separated by a comma. Each address is followed by one of three single-letter commands:

  • a: append the following block of text after this line.
  • c: change the text at this position to the following block. This is equivalent to a d followed by an a command.
  • d: delete these lines.

You'll also notice that the line numbers in the patch go from the bottom up, so you don't have to worry about changes messing up the line numbers in subsequent chunks of the patch. The actual chunks of text to be added or changed follow the commands as a sequence of lines terminated by a line containing a single period (i.e. /^\.$/ or patch_line == '.' depending on your preference). In summary, the format looks like this:

[line-number-range][command]
[optional-argument-lines...]
[dot-terminator-if-there-are-arguments]
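To see why the bottom-up ordering matters, here is a small sketch that applies the same two edits in both orders. The helper name apply_edits and the edit tuples are made up for illustration; they are not part of any library.

```ruby
# Hypothetical helper: applies [operation, line_number, text] edits in the
# order given. Line numbers are 1-based and refer to the ORIGINAL file.
def apply_edits(lines, edits)
  result = lines.dup
  edits.each do |op, line_no, text|
    case op
    when :delete then result.delete_at(line_no - 1)
    when :change then result[line_no - 1] = text
    end
  end
  result
end

original = %w[one two three four five]

# Bottom-up: line 4 is changed first, so line 2 has not moved yet.
bottom_up = apply_edits(original, [[:change, 4, "FOUR"], [:delete, 2, nil]])

# Top-down: deleting line 2 shifts everything below it up by one,
# so "line 4" now points at what used to be line 5.
top_down = apply_edits(original, [[:delete, 2, nil], [:change, 4, "FOUR"]])

p bottom_up # ["one", "three", "FOUR", "five"]
p top_down  # ["one", "three", "four", "FOUR"]
```

Only the bottom-up order gives the intended result without adjusting any line numbers along the way.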

So, to apply an ed patch, all you need to do is load the target file into an array (one element per line), parse the patch using a simple state machine, call Array#insert to add new lines and Array#delete_at to remove them. Shouldn't take more than a couple dozen lines of Ruby to write the patcher and no library is needed.
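A minimal sketch of such a patcher might look like the following. The method name apply_ed_script is hypothetical, and the sketch assumes a script that uses only the a, c and d commands with explicit line numbers, as in the example above; it is not a full ed implementation.

```ruby
# Minimal ed-script applier (illustrative only): supports a, c and d
# with explicit addresses such as "36a", "34c" or "20,23d".
def apply_ed_script(old_text, script)
  lines = old_text.lines.map(&:chomp)
  script_lines = script.lines.map(&:chomp)
  i = 0
  while i < script_lines.length
    m = /\A(\d+)(?:,(\d+))?([acd])\z/.match(script_lines[i])
    raise ArgumentError, "unsupported ed command: #{script_lines[i]}" unless m
    first = m[1].to_i
    last = (m[2] || m[1]).to_i
    command = m[3]
    i += 1

    # c and d both remove the addressed lines (1-based, inclusive).
    if command == "c" || command == "d"
      (last - first + 1).times { lines.delete_at(first - 1) }
    end

    # a and c are followed by a text block terminated by a lone ".".
    if command == "a" || command == "c"
      # "a" inserts after line `first`; "c" inserts where the old lines were.
      at = command == "a" ? first : first - 1
      while i < script_lines.length && script_lines[i] != "."
        lines.insert(at, script_lines[i])
        at += 1
        i += 1
      end
      i += 1 # skip the "." terminator
    end
  end
  lines.join("\n")
end

old_text = "l1\nl2\nl3\nl4\nl5"
script = "4a\nnew\n.\n2,3c\nX\nY\nZ\n.\n1d\n"
puts apply_ed_script(old_text, script)
```

Because the script runs from the highest line number down, each command can use its line numbers directly as array positions.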

If you can arrange your XML to come out like this:

<tag>
blah blah
</tag>
<other-tag x="y">
mumble mumble
</other-tag>

rather than:

<tag>blah blah</tag><other-tag x="y">mumble mumble</other-tag>

then the above simple line-oriented approach will work fine; the extra EOLs aren't going to cost much space so go for easy implementation to start.
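If the XML generator can't be changed, the text could be reflowed before diffing. The sketch below is a rough, regex-based approach with a hypothetical method name (one_tag_per_line). It assumes no "<" or ">" characters appear inside attribute values, comments or CDATA sections, and that text content is not separated from its tags by whitespace; an XML parser would be more robust.

```ruby
# Hypothetical helper: put every tag and every text run on its own line so
# that a line-based diff turns small changes into small patches.
# Assumes "<" and ">" only ever appear as tag delimiters.
def one_tag_per_line(xml)
  xml.gsub(/>\s*</, ">\n<")          # break between adjacent tags
     .gsub(/>(?=[^<\s])/, ">\n")     # break after a tag that is followed by text
     .gsub(/([^>\s])</, "\\1\n<")    # break before a tag that is preceded by text
end

puts one_tag_per_line('<tag>blah blah</tag><other-tag x="y">mumble mumble</other-tag>')
```

The same transformation has to be applied on both the server and the client so that the line numbers in the patch match the file being patched.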

There are Ruby libraries for producing diffs between two arrays (google "ruby algorithm::diff" to start). Combining a diff library with an XML parser will let you produce patches that are tag-based rather than line-based, and this might suit you better. The important thing is the choice of patch format: once you choose the ed format (and realize the wisdom of the patch working from the bottom to the top), everything else pretty much falls into place with little effort.

蒲公英的约定 2024-11-09 06:16:25

I know this question is almost five years old, but I'm going to post an answer anyway. When searching for how to make and apply patches for strings in Ruby, even now, I was unable to find any resources that answer this question satisfactorily. For that reason, I'll show how I solved this problem in my application.

Making Patches

I'm assuming you're using Linux, or else have access to the program diff through Cygwin. In that case, you can use the excellent Diffy gem to create ed script patches:

patch_text = Diffy::Diff.new(old_text, new_text, :diff => "-e").to_s

Applying Patches

Applying patches is not quite as straightforward. I opted to write my own algorithm, ask for improvements in Code Review, and finally settle on using the code below. This code is identical to 200_success's answer except for one change to improve its correctness.

require 'stringio'

# Applies an ed-script patch (e.g. the output of diff -e) to old_text
# and returns the patched text.
def self.apply_patch(old_text, patch)
  text = old_text.split("\n")
  patch = StringIO.new(patch)
  current_line = 1

  while patch_line = patch.gets
    # Grab the command: an optional "first[,last]" address, then a, c, d or s/.//
    m = %r{\A(?:(\d+))?(?:,(\d+))?([acd]|s/\.//)\Z}.match(patch_line)
    raise ArgumentError.new("Invalid ed command: #{patch_line.chomp}") if m.nil?
    # A command without an address applies to the current line
    first_line = (m[1] || current_line).to_i
    last_line = (m[2] || first_line).to_i
    command = m[3]

    case command
    when "s/.//"
      # Strip the first character of each addressed line
      (first_line..last_line).each { |i| text[i - 1].sub!(/./, '') }
    else
      if ['d', 'c'].include?(command)
        # Remove the addressed lines (1-based, inclusive)
        text[first_line - 1 .. last_line - 1] = []
      end
      if ['a', 'c'].include?(command)
        current_line = first_line - (command=='a' ? 0 : 1) # Adds are 0-indexed, but Changes and Deletes are 1-indexed
        # Insert the text block that follows, up to the terminating "." line
        while (patch_line = patch.gets) && (patch_line.chomp! != '.') && (patch_line != '.')
          text.insert(current_line, patch_line)
          current_line += 1
        end
      end
    end
  end
  text.join("\n")
end