I have built a match pattern in RegexBuddy that behaves exactly as I expect, but I cannot transfer it to Delphi XE, at least not when using the latest built-in TRegEx or TPerlRegEx.
My real-world code has six capture groups, but I can illustrate the problem with a simpler example. This code shows "3" in the first dialog and then raises an exception (-7 index out of bounds) when executing the second dialog.
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['time'].Value);
end;
But if I use only one capture group:
Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})');
The first dialog shows "2" and the second dialog shows the time "00:00", as expected.
It would be rather limiting if only one named capture group were allowed, but that is not the case. If I change the capture group name to, for example, "atime":
var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<atime>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  ShowMessage(IntToStr(M.Groups.Count));
  ShowMessage(M.Groups['atime'].Value);
end;
I get "3" and "00:00", just as expected. Are there reserved words I cannot use? I don't think so, because in my real example I've tried completely random names. I just cannot figure out what causes this behaviour.
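One possible way to sidestep the name lookup while investigating is to read the groups by number instead of by name. The following is only a sketch: it assumes that numeric indexing in TGroupCollection does not go through the same name resolution that fails above, and the program name GroupsByIndex is made up for illustration.

```pascal
program GroupsByIndex;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

var
  Regex: TRegEx;
  M: TMatch;
begin
  Regex := TRegEx.Create('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
  M := Regex.Match('00:00 X1 90 55KENNY BENNY');
  if M.Success then
  begin
    // Group 0 is the whole match. Capture groups are numbered from 1,
    // left to right, in the order of their opening parentheses.
    WriteLn(M.Groups[1].Value); // first group, named "time" in the pattern
    WriteLn(M.Groups[2].Value); // second group, named "judge" in the pattern
  end;
end.
```

The cost is that the code now depends on group order, so reordering the pattern means renumbering the lookups.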
Comments (2)
When pcre_get_stringnumber does not find the name, PCRE_ERROR_NOSUBSTRING is returned. PCRE_ERROR_NOSUBSTRING is defined in RegularExpressionsAPI as PCRE_ERROR_NOSUBSTRING = -7.
Some testing shows that pcre_get_stringnumber returns PCRE_ERROR_NOSUBSTRING for every name whose first letter is in the range k to z, and that this range depends on the first letter of judge. Changing judge to something else changes the range.
As I see it, there are at least two bugs involved here: one in pcre_get_stringnumber, and one in TGroupCollection.GetItem, which should raise a proper exception instead of SRegExIndexOutOfBounds.
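The kind of testing described above can be approximated through TRegEx itself. This is a rough sketch rather than the test actually used: the program name NameRangeProbe and the generated group names (atime, btime, and so on) are made up for illustration, and it catches the generic Exception class so that it does not depend on which exception type the library raises.

```pascal
program NameRangeProbe;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions;

var
  C: Char;
  GroupName: string;
  Regex: TRegEx;
  M: TMatch;
begin
  for C := 'a' to 'z' do
  begin
    // Vary only the first letter of the first group's name;
    // the second group keeps the name "judge" from the question.
    GroupName := C + 'time';
    Regex := TRegEx.Create(
      '(?P<' + GroupName + '>\d{1,2}:\d{1,2})(?P<judge>.{1,3})');
    M := Regex.Match('00:00 X1 90 55KENNY BENNY');
    try
      WriteLn(GroupName, ': ', M.Groups[GroupName].Value);
    except
      on E: Exception do
        // Report which names fail to resolve instead of stopping.
        WriteLn(GroupName, ': ', E.ClassName, ' - ', E.Message);
    end;
  end;
end.
```

Renaming judge in the pattern and rerunning the loop is a simple way to check whether the failing range moves with it.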
The bug seems to be in the RegularExpressionsAPI unit that wraps the PCRE library, or in the PCRE OBJ files that it links. If I run this code:
It prints -7 and 2 instead of 1 and 2.
If I remove RegularExpressionsAPI from the uses clause and add the pcre unit from my TPerlRegEx component (http://www.regular-expressions.info/delphi.html), then it correctly prints 1 and 2.
The RegularExpressionsAPI unit in Delphi XE is based on my pcre unit, and the RegularExpressionsCore unit is based on my PerlRegEx unit. Embarcadero did make some changes to both units. They also compiled their own OBJ files from the PCRE library, which are linked by RegularExpressionsAPI.
I have reported this bug as QC 92497.
I have also created a separate report, QC 92498, to request that TGroupCollection.GetItem raise a more sensible exception when requesting a named group that does not exist. (This code is in the RegularExpressions unit, which is based on code written by Vincent Parrett, not myself.)
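The test program this answer refers to does not appear in the text above. A comparable check might look like the sketch below, which calls the PCRE wrapper directly. It is not the author's code. It assumes that RegularExpressionsAPI in Delphi XE declares pcre_compile and pcre_get_stringnumber with PAnsiChar parameters, as the pcre unit it is based on does; the program and variable names are made up for illustration.

```pascal
program PcreNameLookup;

{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressionsAPI;

var
  Compiled: Pointer;
  ErrorMsg: PAnsiChar;
  ErrorOffset: Integer;
begin
  // Assumed signature: pcre_compile(pattern, options, errptr, erroffset, tableptr)
  Compiled := pcre_compile('(?P<time>\d{1,2}:\d{1,2})(?P<judge>.{1,3})',
    0, @ErrorMsg, @ErrorOffset, nil);
  if Compiled = nil then
  begin
    WriteLn('Compile failed at offset ', ErrorOffset, ': ', ErrorMsg);
    Exit;
  end;
  // Each call should return the number of the named group,
  // or a negative PCRE error code if the name is not found.
  WriteLn(pcre_get_stringnumber(Compiled, 'time'));
  WriteLn(pcre_get_stringnumber(Compiled, 'judge'));
  // The compiled pattern is not freed here; the process exits right away.
end.
```

Swapping RegularExpressionsAPI for the pcre unit in the uses clause, as the answer describes, would then show whether the two builds of the library disagree.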