I want to accept an arbitrary regular expression from the user and anchor it on both sides to enforce a full match (^<user's-regex>$), but I don't know whether I have to account for the possibility that the user has already anchored his regex.
It looks like Perl, C++, .NET and JavaScript all allow repeated anchors:
"hello" =~ /^h/ # true
"hello" =~ /^^h/ # true
"hello" =~ /^^^h/ # true
"hello" =~ /e/ # true
"hello" =~ /^e/ # false
"hello" =~ /^^e/ # false
Does anyone know if this is specified to work this way? Can I depend on this behaviour or is it an accident that is liable to change in the future?
Edit: The reason we need this is that we're using VBScript's regexes (via COM). We're using match, but this returns all matches, so it's much slower to match the string abc against .*a.* than against ^.*a.*$. By using the anchoring suggested by @Tim, we speed up matches (for long strings) by more than a factor of 12.
You can depend on this behavior. The regex engine doesn't mind asserting the same thing once, twice, or a hundred times in a row.
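As a quick check of this claim, here is a minimal sketch that repeats the question's examples in Python's re module. Python is not one of the flavors named in the question, and the helper name matches is made up for illustration, so treat this as a demonstration in one engine rather than a guarantee for every flavor.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text (hypothetical helper)."""
    return re.search(pattern, text) is not None


# The same patterns as in the question, applied to "hello".
patterns = (r"^h", r"^^h", r"^^^h", r"e", r"^e", r"^^e")
results = {p: matches(p, "hello") for p in patterns}

# Wrapping a pattern the user has already anchored just repeats the assertions.
user_regex = r"^hello$"
wrapped = "^(?:" + user_regex + ")$"
already_anchored_ok = matches(wrapped, "hello")
already_anchored_rejects = not matches(wrapped, "hello world")

for pattern, matched in results.items():
    print(pattern, matched)
```

An anchor is a zero-width assertion, so asserting "start of string" several times at the same position consumes nothing and succeeds or fails exactly as a single anchor would.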
However, instead of simply adding anchors around the regex, you should also add a non-capturing group around it:
^(?:<user's-regex>)$
or preferably, if your regex flavor allows this:
\A(?:<user's-regex>)\Z
Otherwise, you'll trip up if the user uses alternation in his regex. Compare:
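The alternation problem can be sketched as follows, again in Python as an assumed stand-in for the flavors discussed. The function names anchor_naive and anchor_grouped and the sample pattern cat|dog are hypothetical and chosen only for illustration.

```python
import re


def anchor_naive(user_regex: str) -> str:
    """Add anchors without grouping: '^cat|dog$' parses as (^cat) OR (dog$)."""
    return "^" + user_regex + "$"


def anchor_grouped(user_regex: str) -> str:
    """Wrap in a non-capturing group so the anchors apply to every alternative.

    Note: in Python, \\Z matches only at the very end of the string.
    """
    return r"\A(?:" + user_regex + r")\Z"


user_regex = "cat|dog"  # hypothetical user input
naive = re.compile(anchor_naive(user_regex))
grouped = re.compile(anchor_grouped(user_regex))

for text in ("cat", "dog", "catalog", "hotdog"):
    print(text, bool(naive.search(text)), bool(grouped.search(text)))
```

Without the group, "catalog" matches through the first alternative (^cat) and "hotdog" through the second (dog$), so neither anchor enforces a full match. With the group, only "cat" and "dog" are accepted. A non-capturing group is used so that the numbering of the user's own capture groups is not shifted.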