Delphi - 比较并标记字符串差异

发布于 2024-08-19 22:27:27 字数 2254 浏览 1 评论 0原文

我需要做的是比较两个字符串并用更改的开始/结束标记标记差异。示例:

this is string number one.
this string is string number two.

输出将是

this [is|string is] string number [one|two].  

一段时间以来我一直在尝试解决这个问题。我发现一些我相信可以帮助我做到这一点的东西,但我无法实现这一点。
http://www.angusj.com/delphi/textdiff.html

我有大约 80% 的人在这里工作,但我不知道如何让它完全按照我想要的方式去做。任何帮助将不胜感激。


uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, Diff, StdCtrls;

type
  TForm2 = class(TForm)
    Edit1: TEdit;
    Edit2: TEdit;
    Button1: TButton;
    Memo1: TMemo;
    Diff: TDiff;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form2: TForm2;
  s1,s2:string;

implementation

{$R *.dfm}

procedure TForm2.Button1Click(Sender: TObject);
var
  i: Integer;
  lastKind: TChangeKind;

  procedure AddCharToStr(var s: string; c: char; kind, lastkind: TChangeKind);
  begin
    if (kind  lastkind) AND (lastkind = ckNone) and (kind  ckNone) then s:=s+'[';
    if (kind  lastkind) AND (lastkind  ckNone) and (kind = ckNone) then s:=s+']';
    case kind of         
      ckNone: s := s  + c;
      ckAdd: s := s + c;         
      ckDelete: s := s  + c;
      ckModify: s := s + '|' + c;
    end;
  end;

begin
  Diff.Execute(pchar(edit1.text), pchar(edit2.text), length(edit1.text), length(edit2.text));

  //now, display the diffs ...
  lastKind := ckNone;
  s1 := ''; s2 := '';
  form2.caption:= inttostr(diff.Count);
  for i := 0 to Diff.count-1 do
  begin

    with Diff.Compares[i] do
    begin
      //show changes to first string (with spaces for adds to align with second string)
      if Kind = ckAdd then 
      begin 
        AddCharToStr(s1,' ',Kind, lastKind); 
      end
      else 
      AddCharToStr(s1,chr1,Kind,lastKind);
      if Kind = ckDelete then 
      begin 
        AddCharToStr(s2,' ',Kind, lastKind)
      end
      else AddCharToStr(s2,chr2,Kind,lastKind);

      lastKind := Kind;
    end;
  end;
    memo1.Lines.Add(s1);
    memo1.Lines.Add(s2);
end;

end.

我从 angusj.com 获取了 basicdemo1 并对其进行了修改以达到目前的效果。

What I need to do is compare two strings and mark the differences with begining/ending marks for changes. Example:

this is string number one.
this string is string number two.

output would be

this [is|string is] string number [one|two].  

I've been trying to figure this out for some time now. And I found something I blieved would help me do this, but I am unable to make this happen.
http://www.angusj.com/delphi/textdiff.html

I have it about 80% working here, but I've got no idea how to get it to do exactly what I want it to do. Any help would be appreciated.


uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, Diff, StdCtrls;

type
  TForm2 = class(TForm)
    Edit1: TEdit;
    Edit2: TEdit;
    Button1: TButton;
    Memo1: TMemo;
    Diff: TDiff;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form2: TForm2;
  s1,s2:string;

implementation

{$R *.dfm}

procedure TForm2.Button1Click(Sender: TObject);
var
  i: Integer;
  lastKind: TChangeKind;

  procedure AddCharToStr(var s: string; c: char; kind, lastkind: TChangeKind);
  begin
    if (kind  lastkind) AND (lastkind = ckNone) and (kind  ckNone) then s:=s+'[';
    if (kind  lastkind) AND (lastkind  ckNone) and (kind = ckNone) then s:=s+']';
    case kind of         
      ckNone: s := s  + c;
      ckAdd: s := s + c;         
      ckDelete: s := s  + c;
      ckModify: s := s + '|' + c;
    end;
  end;

begin
  Diff.Execute(pchar(edit1.text), pchar(edit2.text), length(edit1.text), length(edit2.text));

  //now, display the diffs ...
  lastKind := ckNone;
  s1 := ''; s2 := '';
  form2.caption:= inttostr(diff.Count);
  for i := 0 to Diff.count-1 do
  begin

    with Diff.Compares[i] do
    begin
      //show changes to first string (with spaces for adds to align with second string)
      if Kind = ckAdd then 
      begin 
        AddCharToStr(s1,' ',Kind, lastKind); 
      end
      else 
      AddCharToStr(s1,chr1,Kind,lastKind);
      if Kind = ckDelete then 
      begin 
        AddCharToStr(s2,' ',Kind, lastKind)
      end
      else AddCharToStr(s2,chr2,Kind,lastKind);

      lastKind := Kind;
    end;
  end;
    memo1.Lines.Add(s1);
    memo1.Lines.Add(s2);
end;

end.

I took the basicdemo1 from angusj.com and modified it to get this far.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

初见你 2024-08-26 22:27:27

要解决您描述的问题,您本质上必须执行类似于生物序列比对<中所做的操作DNA 或蛋白质数据。如果只有两个字符串(或一个常量引用字符串),可以通过动态编程来实现基于成对对齐算法,例如 Needleman Wunsch 算法* 及相关算法。 (多序列比对变得更加复杂。)

[*编辑:链接应该是:http://en。 wikipedia.org/wiki/Needleman–Wunsch_algorithm]

编辑 2:

由于您似乎对单词级别而不是字符级别的比较感兴趣,因此您可以 (1) 拆分输入字符串转换为字符串数组,其中每个数组元素代表一个单词,然后 (2) 在这些单词的级别上执行对齐。这样做的好处是对齐的搜索空间变得更小,因此您期望它总体上更快。我相应地改编并“德尔菲化”了维基百科文章中的伪代码示例:


program AlignWords;

{$APPTYPE CONSOLE}

function MaxChoice(C1,C2,C3:整数):整数; 开始 结果:= C1; 如果C2>结果然后结果:= C2; 如果C3>结果然后结果:= C3; 结尾;

函数WordSim(const S1,S2:字符串):整数;超载; //区分大小写! var i, l1, l2, minL: 整数; 开始 l1:= 长度(S1); l2:= 长度(S2); 结果:=l1-l2; 如果结果> 0 那么结果:= -结果; 如果 (S1='') 或 (S2='') 则退出; 最小L:= l1; 如果 l2 < l1 然后 minL:= l2; for i := 1 to minL do if S1[i]<>S2[i] then dec(Result); 结尾;

procedure AlignWordsNW (const A, B: 字符串数组; GapChar: Char; const Delimiter: ShortString; GapPenalty: 整数; out AlignmentA, AlignmentB: 字符串); // Needleman-Wunsch 对齐 // GapPenalty 应该是负值! 变量 F:整数数组的数组; 我,j, 选择1、选择2、选择3、 分数、ScoreDiag、ScoreUp、ScoreLeft :整数; 函数 GapChars (const S: String): String; var i:整数; 开始 断言(长度(S)>0); 结果:=''; for i := 0 to length(S) - 1 do Result:=Result + GapChar; 结尾; 开始 SetLength(F, 长度(A)+1, 长度(B)+1); 对于 i := 0 到 length(A) 执行 F[i,0]:= GapPenaltyi; 对于 j := 0 到 length(B) 执行 F[0,j]:= GapPenaltyj; for i:=1 to length(A) 开始 对于 j:= 1 到 length(B) 开始 选择1:= F[i-1,j-1] + WordSim(A[i-1], B[j-1]); 选择2:= F[i-1, j] + GapPenalty; 选择3:= F[i, j-1] + GapPenalty; F[i,j]:= maxChoice(选择1,选择2,选择3); 结尾; 结尾; 对齐方式A:= ''; 对齐方式B:= ''; 我:=长度(A); j:= 长度(B); 当 (i > 0) 和 (j > 0) 开始时 分数 := F[i,j]; ScoreDiag:= F[i-1,j-1]; ScoreUp:= F[i,j-1]; ScoreLeft:= F[i-1,j]; 如果 Score = ScoreDiag + WordSim(A[i-1], B[j-1]) 则开始 AlignmentA:= A[i-1] + 分隔符 + AlignmentA; AlignmentB:= B[j-1] + 分隔符 + AlignmentB; 十二月(一); 十二月 (j); 结束 否则如果 Score = ScoreLeft + GapPenalty 然后开始 AlignmentA:= A[i-1] + 分隔符 + AlignmentA; AlignmentB:= GapChars (A[i-1]) + 分隔符 + AlignmentB; 十二月(一); 结束否则开始 断言(得分 = ScoreUp + GapPenalty); AlignmentA:= GapChars(B[j-1]) + 分隔符 + AlignmentA; AlignmentB:= B[j-1] + 分隔符 + AlignmentB; 十二月 (j); 结尾; 结尾; 当 (i > 0) 开始时 AlignmentA:= A[i-1] + 分隔符 + AlignmentA; AlignmentB:= GapChars(A[i-1]) + 分隔符 + AlignmentB; 十二月(一); 结尾; while (j > 0) 开始 AlignmentA:= GapChars(B[j-1]) + 分隔符 + AlignmentA; AlignmentB:= B[j-1] + 分隔符 + AlignmentB; 十二月(j); 结尾; 结尾;

类型 TStringArray = 字符串数组;

瓦尔 as1、as2:TStringArray; s1、s2:字符串;

开始 as1:= TStringArray.create('这个','是','字符串','数字','一'); as2:= TStringArray.Create('这个','字符串','是','字符串','数字','两个');

AlignWordsNW (as1, as2, '-',' ',-1, s1,s2);
writeln (s1);
writeln (s2);

结尾。

这个例子的输出是

this ------ is string number ---- one. 
this string is string number two. ---- 

It's not Perfect,但你明白了。从这种输出中,您应该能够做您想做的事情。请注意,您可能需要调整 GapPenalty 和相似度评分函数 WordSim 以满足您的需求。

To solve the problem you describe, you'd essentially have to do something like what is done in biological sequence alignment of DNA or protein data. If you only have two strings (or one constant reference string), it can approached by dynamic programming-based pairwise alignment algorithms such as the Needleman Wunsch algorithm* and related algorithms. (Multiple sequence alignment gets much more complicated.)

[* Edit: link should be: http://en.wikipedia.org/wiki/Needleman–Wunsch_algorithm ]

Edit 2:

Since you seem to be interested in comparing at the level of words rather than characters, you could (1) split the input strings into arrays of strings, where each array element represents a word and then (2) carry out the alignment at the level of these words. This has the benefit of the search space for the alignment becoming smaller and thus you'd expect it to be faster overall. I have adapted and 'Delphified' the pseudo-code example from the wikipedia-article accordingly:


program AlignWords;

{$APPTYPE CONSOLE}

function MaxChoice (C1, C2, C3: integer): integer; begin Result:= C1; if C2 > Result then Result:= C2; if C3 > Result then Result:= C3; end;

function WordSim (const S1, S2: String): integer; overload; //Case-sensitive! var i, l1, l2, minL: integer; begin l1:= length(S1); l2:= length(S2); Result:= l1-l2; if Result > 0 then Result:= -Result; if (S1='') or (S2='') then exit; minL:= l1; if l2 < l1 then minL:= l2; for i := 1 to minL do if S1[i]<>S2[i] then dec(Result); end;

procedure AlignWordsNW (const A, B: Array of String; GapChar: Char; const Delimiter: ShortString; GapPenalty: integer; out AlignmentA, AlignmentB: string); // Needleman-Wunsch alignment // GapPenalty should be a negative value! var F: array of array of integer; i, j, Choice1, Choice2, Choice3, Score, ScoreDiag, ScoreUp, ScoreLeft :integer; function GapChars (const S: String): String; var i: integer; begin assert (length(S)>0); Result:=''; for i := 0 to length(S) - 1 do Result:=Result + GapChar; end; begin SetLength (F, length(A)+1, length(B)+1); for i := 0 to length(A) do F[i,0]:= GapPenaltyi; for j := 0 to length(B) do F[0,j]:= GapPenaltyj; for i:=1 to length(A) do begin for j:= 1 to length(B) do begin Choice1:= F[i-1,j-1] + WordSim(A[i-1], B[j-1]); Choice2:= F[i-1, j] + GapPenalty; Choice3:= F[i, j-1] + GapPenalty; F[i,j]:= maxChoice (Choice1, Choice2, Choice3); end; end; AlignmentA:= ''; AlignmentB:= ''; i:= length(A); j:= length(B); while (i > 0) and (j > 0) do begin Score := F[i,j]; ScoreDiag:= F[i-1,j-1]; ScoreUp:= F[i,j-1]; ScoreLeft:= F[i-1,j]; if Score = ScoreDiag + WordSim(A[i-1], B[j-1]) then begin AlignmentA:= A[i-1] + Delimiter + AlignmentA; AlignmentB:= B[j-1] + Delimiter + AlignmentB; dec (i); dec (j); end else if Score = ScoreLeft + GapPenalty then begin AlignmentA:= A[i-1] + Delimiter + AlignmentA; AlignmentB:= GapChars (A[i-1]) + Delimiter + AlignmentB; dec(i); end else begin assert (Score = ScoreUp + GapPenalty); AlignmentA:= GapChars(B[j-1]) + Delimiter + AlignmentA; AlignmentB:= B[j-1] + Delimiter + AlignmentB; dec (j); end; end; while (i > 0) do begin AlignmentA:= A[i-1] + Delimiter + AlignmentA; AlignmentB:= GapChars(A[i-1]) + Delimiter + AlignmentB; dec(i); end; while (j > 0) do begin AlignmentA:= GapChars(B[j-1]) + Delimiter + AlignmentA; AlignmentB:= B[j-1] + Delimiter + AlignmentB; dec(j); end; end;

Type TStringArray = Array Of String;

Var as1, as2: TStringArray; s1, s2: string;

BEGIN as1:= TStringArray.create ('this','is','string','number','one.'); as2:= TStringArray.Create ('this','string','is','string','number','two.');

AlignWordsNW (as1, as2, '-',' ',-1, s1,s2);
writeln (s1);
writeln (s2);

END.

The output on this example is

this ------ is string number ---- one. 
this string is string number two. ---- 

It's not perfect, but you get the idea. From this sort of output, you should be able to do what you want. Note that you might want to tweak the GapPenalty and the Similarity Scoring Function WordSim to fit your needs.

渡你暖光 2024-08-26 22:27:27

有一个 Object Pascal Diff Engine 可用这可能会有帮助。您可能希望将每个“单词”分成单独的行进行比较,或者修改算法以逐个单词进行比较。

There is an Object Pascal Diff Engine available which might be of assistance. You might want to break each "word" into a separate line for the comparison, or modify the algorithm to compare on a word by word basis.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文