Regular expression named capture groups in Delphi XE

Posted 2024-10-22 02:21:53

I have built a match pattern in RegexBuddy that behaves exactly as I expect. But I cannot get it to work in Delphi XE, at least not with the built-in TRegEx or TPerlRegEx.

My real-world code has 6 capture groups, but I can illustrate the problem with a simpler example. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second dialog.

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;

But if I use only one capture group:

Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');

The first dialog shows "2" and the second dialog shows the time "00:00", as expected.

It would be quite limiting if only one named capture group were allowed, but that is not the case. If I change the first capture group's name to, for example, "atime":

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;

I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I've tried completely random names. I just cannot figure out what causes this behaviour.

Comments (2)

温折酒 2024-10-29 02:21:53

When pcre_get_stringnumber does not find the name, PCRE_ERROR_NOSUBSTRING is returned.

PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7.

Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of "judge". Changing "judge" to something else changes the range.

As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which should raise a proper exception instead of SRegExIndexOutOfBounds.
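Since the failure is in the name lookup, one possible workaround is to skip the lookup and address the groups by number. The following is only a sketch, assuming that named groups are numbered left to right like ordinary groups, that group 0 is the whole match (which is why Groups.Count is 3 for two groups), and that the unit is called RegularExpressions as in Delphi XE (later versions use System.RegularExpressions). It has not been verified against an affected XE installation.

```delphi
program GroupsByIndex;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00  X1 90  55KENNY BENNY');
  // Guard against a failed match before indexing into the groups.
  if M.Success and (M.Groups.Count > 2) then
  begin
    // Group 0 is the whole match; the named groups follow in pattern order.
    WriteLn(M.Groups[1].Value);  // the group named "time"
    WriteLn(M.Groups[2].Value);  // the group named "judge"
  end;
  ReadLn;
end.
```

The trade-off is that numeric indexes have to be updated by hand whenever groups are added to or removed from the pattern, which is exactly what named groups are meant to avoid.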

べ映画 2024-10-29 02:21:53

The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:

program Project1;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

var
  myregexp: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
  Offsets: array[0..300] of Integer;
  OffsetCount, Group: Integer;

begin
  try
    myregexp := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0, @error, @erroroffset, nil);
    if (myregexp <> nil) then begin
      offsetcount := pcre_exec(myregexp, nil, '00:00  X1 90  55KENNY BENNY', Length('00:00  X1 90  55KENNY BENNY'), 0, 0, @offsets[0], High(Offsets));
      if (offsetcount > 0) then begin
        Group := pcre_get_stringnumber(myregexp, 'time');
        WriteLn(Group);
        Group := pcre_get_stringnumber(myregexp, 'judge');
        WriteLn(Group);
      end;
    end;
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.

It prints -7 and 2 instead of 1 and 2.

If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component, then it does correctly print 1 and 2.

The RegularExpressionsAPI in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library that are linked by RegularExpressionsAPI.

I have reported this bug as QC 92497.

I have also created a separate report QC 92498 to request that TGroupCollection.GetItem raise a more sensible exception when requesting a named group that does not exist. (This code is in the RegularExpressions unit which is based on code written by Vincent Parrett, not myself.)
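To illustrate the kind of error reporting that the second report asks for, here is a minimal sketch built on the same raw API calls as the test program above. RequireGroupNumber is a hypothetical helper, not part of any Delphi unit, and the message text is made up for the example. It checks the result of pcre_get_stringnumber and raises an exception that names the missing group instead of passing the negative error code on as an index.

```delphi
program NamedGroupLookup;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

// Hypothetical helper: resolves a group name to its number and raises a
// descriptive error when PCRE reports a failure (any negative result,
// such as PCRE_ERROR_NOSUBSTRING = -7).
function RequireGroupNumber(Pattern: Pointer; const GroupName: AnsiString): Integer;
begin
  Result := pcre_get_stringnumber(Pattern, PAnsiChar(GroupName));
  if Result < 0 then
    raise Exception.CreateFmt('Named group "%s" not found (PCRE error %d)',
      [string(GroupName), Result]);
end;

var
  Pattern: Pointer;
  Error: PAnsiChar;
  ErrorOffset: Integer;
begin
  try
    Pattern := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})', 0,
      @Error, @ErrorOffset, nil);
    if Pattern <> nil then
    begin
      WriteLn(RequireGroupNumber(Pattern, 'judge'));
      WriteLn(RequireGroupNumber(Pattern, 'time'));
    end;
  except
    on E: Exception do
      WriteLn(E.ClassName, ': ', E.Message);
  end;
  ReadLn;
end.
```

On a build affected by the lookup bug, the call for "time" would presumably still fail even though the group exists, but the message would at least point at the name lookup and not at an out-of-bounds index.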
