diff-match-patch
A JavaScript implementation of Google's diff-match-patch library.
Installation
npm install diff-match-patch
Usage
var DiffMatchPatch = require('diff-match-patch'); // or window.diff_match_patch
var dmp = new DiffMatchPatch();
var diff = dmp.diff_main('Good dog', 'Bad dog');
API
This page describes the API for the public functions. For further examples, see the tests.
Initialization
The first step is to create a new diff_match_patch object. This object contains various properties which set the behaviour of the algorithms, as well as the following functions:
diff_main(text1, text2) => diffs
Computes an array of differences that describes the transformation of text1 into text2. Each difference is a two-element array. The first element specifies whether it is an insertion (1), a deletion (-1) or an equality (0). The second element specifies the affected text.
diff_main('Good dog', 'Bad dog');
>> [[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']]
Despite the large number of optimizations used in this function, a diff can take a while to compute. The diff_match_patch.Diff_Timeout property sets how many seconds any diff's exploration phase may take. The default value is 1.0. A value of 0 disables the timeout and lets the diff run until completion. Should the diff time out, the return value will still be a valid difference, though probably a non-optimal one.
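Because every entry in a diff array carries both an operation and its text, both input texts can be recovered from the diff alone. The sketch below illustrates the format only; the helper names sourceText and targetText are hypothetical and are not among the functions described on this page.

```javascript
// Hypothetical helpers that illustrate the diff array format.
const DIFF_DELETE = -1;
const DIFF_INSERT = 1;

// Rebuild text1: keep equalities and deletions, skip insertions.
function sourceText(diffs) {
  return diffs
    .filter(([op]) => op !== DIFF_INSERT)
    .map(([, text]) => text)
    .join('');
}

// Rebuild text2: keep equalities and insertions, skip deletions.
function targetText(diffs) {
  return diffs
    .filter(([op]) => op !== DIFF_DELETE)
    .map(([, text]) => text)
    .join('');
}

const exampleDiffs = [[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']];
console.log(sourceText(exampleDiffs)); // prints: Good dog
console.log(targetText(exampleDiffs)); // prints: Bad dog
```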
diff_cleanupSemantic(diffs)
A diff of two unrelated texts can be filled with coincidental matches:
diff_main('mouse', 'sofas');
>> [[-1, 'm'], [1, 's'], [0, 'o'], [-1, 'u'], [1, 'fa'], [0, 's'], [-1, 'e']]
While this is the optimum diff, it is difficult for humans to understand. Semantic cleanup rewrites the diff, expanding it into a more intelligible format. The above example would become: [[-1, 'mouse'], [1, 'sofas']]. If a diff is to be human-readable, it should be passed to diff_cleanupSemantic.
diff_cleanupEfficiency(diffs)
This function is similar to diff_cleanupSemantic, except that instead of optimizing a diff to be human-readable, it optimizes the diff to be efficient for machine processing. The results of both cleanup types are often the same.
The efficiency cleanup is based on the observation that a diff made up of a large number of small edits may take longer to process (in downstream applications) or take more capacity to store or transmit than a diff made up of a smaller number of larger edits. The diff_match_patch.Diff_EditCost property sets the cost of handling a new edit, expressed as the number of extra characters in an existing edit that would cost the same. The default value is 4, which means that if expanding the length of a diff by three characters can eliminate one edit, then that optimization will reduce the total cost.
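The trade-off can be made concrete with a little arithmetic. The sketch below uses a deliberately simplified cost model, which is an assumption made for illustration and not the library's implementation: every insertion or deletion costs editCost units, plus one unit per character it contains. The function name estimatedCost is hypothetical.

```javascript
// Simplified cost model (an assumption, not the library's code):
// each insertion or deletion costs `editCost`, plus one unit per character.
function estimatedCost(diffs, editCost = 4) {
  let cost = 0;
  for (const [op, text] of diffs) {
    if (op !== 0) {
      cost += editCost + text.length;
    }
  }
  return cost;
}

// Two diffs describing the same change, 'abxyzcd' -> '12xyz34'.
const fragmented = [[-1, 'ab'], [1, '12'], [0, 'xyz'], [-1, 'cd'], [1, '34']];
const merged = [[-1, 'abxyzcd'], [1, '12xyz34']];

console.log(estimatedCost(fragmented)); // 4 edits * 4 + 8 characters = 24
console.log(estimatedCost(merged));     // 2 edits * 4 + 14 characters = 22
```

Under this model, merging removes two edits at the price of six extra characters, which is three characters per eliminated edit, so the merged form is cheaper with an edit cost of 4.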
diff_levenshtein(diffs) => int
Given a diff, measure its Levenshtein distance in terms of the number of inserted, deleted or substituted characters. The minimum distance is 0, which means equality. The maximum distance is the length of the longer string.
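One way to derive such a distance from a diff array is to treat each run of edits between two equalities as a block of substitutions: a deletion paired with an insertion counts once per character, so each run contributes the larger of its inserted and deleted character counts. The sketch below follows that idea; levenshteinFromDiffs is a hypothetical name, and the code is an illustration rather than the library's own source.

```javascript
// Illustrative sketch: Levenshtein distance derived from a diff array.
function levenshteinFromDiffs(diffs) {
  let distance = 0;
  let insertions = 0;
  let deletions = 0;
  for (const [op, text] of diffs) {
    if (op === 1) {
      insertions += text.length;
    } else if (op === -1) {
      deletions += text.length;
    } else {
      // An equality closes the current run of edits.
      distance += Math.max(insertions, deletions);
      insertions = 0;
      deletions = 0;
    }
  }
  // Account for a run of edits at the very end of the diff.
  return distance + Math.max(insertions, deletions);
}

console.log(levenshteinFromDiffs([[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']])); // 3
console.log(levenshteinFromDiffs([[-1, 'mouse'], [1, 'sofas']]));           // 5
```

The result depends on the diff that is passed in, not only on the two texts: the uncleaned 'mouse' to 'sofas' diff shown earlier yields 4 under this calculation, while the semantically cleaned one yields 5.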
diff_prettyHtml(diffs) => html
Takes a diff array and returns a pretty HTML sequence. This function is mainly intended as an example from which to write one's own display functions.
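A custom display function only needs to walk the diff array and wrap each piece of text according to its operation. The following is a minimal sketch of that idea. The names escapeHtml and renderDiffHtml and the choice of ins, del and span tags are assumptions for illustration; the markup produced by diff_prettyHtml itself may differ.

```javascript
// Escape the characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical renderer: insertions become <ins>, deletions <del>,
// and unchanged text is wrapped in <span>.
function renderDiffHtml(diffs) {
  return diffs
    .map(([op, text]) => {
      const safe = escapeHtml(text);
      if (op === 1) return `<ins>${safe}</ins>`;
      if (op === -1) return `<del>${safe}</del>`;
      return `<span>${safe}</span>`;
    })
    .join('');
}

console.log(renderDiffHtml([[-1, 'Goo'], [1, 'Ba'], [0, 'd dog']]));
// prints: <del>Goo</del><ins>Ba</ins><span>d dog</span>
```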
match_main(text, pattern, loc) => location
Given a text to search, a pattern to search for and an expected location in the text near which to find the pattern, return the location which matches closest. The function will search for the best match based on both the number of character errors between the pattern and the potential match, as well as the distance between the expected location and the potential match.
The following example is a classic dilemma. There are two potential matches: one is close to the expected location but contains a one-character error, while the other is far from the expected location but is exactly the pattern sought:
match_main('abc12345678901234567890abbc', 'abc', 26);
Which result is returned (0 or 24) is determined by the diff_match_patch.Match_Distance property. An exact letter match which is 'distance' characters away from the fuzzy location would score as a complete mismatch. For example, a distance of 0 requires the match to be at the exact location specified, whereas a distance of 1000 would require a perfect match to be within 800 characters of the expected location to be found using a 0.8 threshold (see below). The larger Match_Distance is, the longer match_main may take to compute. This variable defaults to 1000.
Another property is diff_match_patch.Match_Threshold, which determines the cut-off value for a valid match. If Match_Threshold is closer to 0, the requirements for accuracy increase. If Match_Threshold is closer to 1, it is more likely that a match will be found. The larger Match_Threshold is, the longer match_main may take to compute. This variable defaults to 0.5. If no match is found, the function returns -1.
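The interplay between Match_Distance and Match_Threshold can be sketched as a scoring rule in which 0 is a perfect match and 1 a complete mismatch. The function below is a simplified model that is consistent with the description above (a perfect match 800 characters away scores 0.8 at a distance of 1000), but it is an assumption made for illustration, not the library's documented API. The names matchScore and isAcceptable are hypothetical, and the candidate near position 24 is assumed to contain one character error, as the example states.

```javascript
// Simplified scoring model: error rate plus a distance penalty.
function matchScore(errors, patternLength, matchLoc, expectedLoc, matchDistance = 1000) {
  const accuracy = errors / patternLength;
  const proximity = Math.abs(expectedLoc - matchLoc);
  if (matchDistance === 0) {
    // A distance of 0 only accepts matches at the exact expected location.
    return proximity === 0 ? accuracy : 1.0;
  }
  return accuracy + proximity / matchDistance;
}

// A candidate is usable only if its score does not exceed the threshold.
function isAcceptable(score, matchThreshold = 0.5) {
  return score <= matchThreshold;
}

// The two candidates from the example: pattern 'abc', expected location 26.
const exactButFar = matchScore(0, 3, 0, 26);     // 0 errors at position 0
const closeButFuzzy = matchScore(1, 3, 24, 26);  // 1 error at position 24
console.log(exactButFar < closeButFuzzy);        // true: the exact match wins

// With a much smaller distance, the penalty for being far away dominates.
console.log(matchScore(0, 3, 0, 26, 50) > matchScore(1, 3, 24, 26, 50)); // true
console.log(isAcceptable(matchScore(1, 3, 24, 26, 50)));                 // true
```

Under this model, the default distance of 1000 favours the exact match at 0, while a distance of 50 favours the nearby fuzzy match, whose score still falls within the default threshold of 0.5.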
patch_make(text1, text2) => patches
patch_make(diffs) => patches
patch_make(text1, diffs) => patches
Given two texts, or an already computed list of differences, return an array of patch objects. The third form (text1, diffs) is preferred; use it if you happen to have that data available. Otherwise this function will compute the missing pieces.
patch_toText(patches) => string
Reduces an array of patch objects to a block of text which looks extremely similar to the standard GNU diff/patch format. This text may be stored or transmitted.
patch_fromText(text) => patches
Parses a block of text (which was presumably created by the patch_toText function) and returns an array of patch objects.
patch_apply(patches, text1) => [text2, results]
Applies a list of patches to text1. The first element of the return value is the newly patched text. The second element is an array of true/false values indicating which of the patches were successfully applied. Note that this second element is not too useful, since large patches may get broken up internally, resulting in a results list that is longer than the input, with no way to figure out which patch succeeded or failed.
The previously mentioned Match_Distance and Match_Threshold properties are used to evaluate patch application on text which does not match exactly. In addition, the diff_match_patch.Patch_DeleteThreshold property determines how closely the text within a major (~64 character) delete needs to match the expected text. If Patch_DeleteThreshold is closer to 0, then the deleted text must match the expected text more closely. If Patch_DeleteThreshold is closer to 1, then the deleted text may contain anything. In most use cases Patch_DeleteThreshold should just be set to the same value as Match_Threshold.
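Since the results array from patch_apply cannot be mapped back to individual input patches reliably, a practical way to consume it is to check whether everything applied and to count the failures. The sketch below works on a value with the documented [text2, results] shape; the helper name summarizeApply and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper: summarize a [text2, results] pair as returned by patch_apply.
function summarizeApply(applyResult) {
  const [text, results] = applyResult;
  const failed = results.filter((ok) => !ok).length;
  return {
    text,                      // the patched text
    allApplied: failed === 0,  // true only if every entry succeeded
    failed,                    // number of entries that did not apply
    total: results.length,     // may exceed the number of input patches
  };
}

// Literal values standing in for real patch_apply return values.
console.log(summarizeApply(['Bad dog', [true]]).allApplied);          // true
console.log(summarizeApply(['Good dog', [true, false, true]]).failed); // 1
```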
API Reference: https://code.google.com/archive/p/google-diff-match-patch/wikis/API.wiki