Regular expression using balancing groups
I have a basic text template engine that uses a syntax like this:
foo bar
%IF MY_VAR
some text
%IF OTHER_VAR
some other text
%ENDIF
%ENDIF
bar foo
The regular expression I use to parse it has a problem: it does not account for nested IF/ENDIF blocks.
The current regex I'm using is: %IF (?<Name>[\w_]+)(?<Contents>.*?)%ENDIF
I have been reading up on balancing capture groups (a feature of .NET's regex library), as I understand this is the recommended way of supporting "recursive" regexes in .NET.
I've been playing with balancing groups and have so far come up with the following:
(
    (
        (?'Open'%IF\s(?<Name>[\w_]+))
        (?<Contents>.*?)
    )+
    (
        (?'Close-Open'%ENDIF)(?<Remainder>.*?)
    )+
)*
(?(Open)(?!))
But this does not behave entirely as I would expect. For instance, it captures a lot of empty groups. Help?
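The nesting problem with the simple pattern can be reproduced outside .NET. The following is a minimal sketch in Python rather than .NET, so the named groups are written as (?P<Name>...) and re.DOTALL stands in for RegexOptions.Singleline; both are assumptions about how the original pattern was run. It shows the lazy Contents group stopping at the first %ENDIF, which belongs to the inner block.

```python
import re

# Sample template from the question.
TEMPLATE = """foo bar
%IF MY_VAR
some text
%IF OTHER_VAR
some other text
%ENDIF
%ENDIF
bar foo"""

# The question's simple pattern, with .NET's (?<Name>...) rewritten in
# Python's (?P<Name>...) syntax. re.DOTALL lets "." cross line breaks.
SIMPLE = re.compile(r"%IF (?P<Name>[\w_]+)(?P<Contents>.*?)%ENDIF", re.DOTALL)

match = SIMPLE.search(TEMPLATE)

# The match pairs the outer %IF with the inner %ENDIF, so the matched text
# holds two openers but only one closer.
opened = match.group(0).count("%IF ")
closed = match.group(0).count("%ENDIF")

# The outer block's real %ENDIF is left behind, unmatched.
leftover = TEMPLATE[match.end():]
```

Here opened and closed differ, and leftover still begins with the outer block's %ENDIF, which is the imbalance the balancing-group approach is meant to prevent.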
To capture a whole IF/ENDIF block with balanced IF statements, you can use this regex:
The point here is this: in a single Match you cannot capture more than one value for each named group. For example, you will only get one (?<Name>\w+) group, holding the last captured value. In my regex, I kept the Name and Contents groups of your simple regex, and limited the balancing to the inside of the Contents group - the regex is still wrapped in IF and ENDIF.
It becomes interesting when your data is more complex. For example:
Here, you will get two matches, one for MY_VAR, and one for OTHER_VAR3. If you want to capture the two ifs in MY_VAR's content, you have to rerun the regex on its Contents group (you can get around that with a lookahead if you must, by wrapping the whole regex in (?=...), but you'll then need to put the results into a logical structure somehow, using positions and lengths).
Now, I won't explain too much, because it seems you get the basics, but a short note about the contents group: I've used a possessive (atomic) group to avoid backtracking. Otherwise, it would be possible for the dot to eventually match whole IFs and break the balance. A lazy match on the group would behave similarly (( )+? instead of (?> )+).
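To illustrate the remark about putting nested blocks into a logical structure using positions and lengths, here is a rough sketch in Python. It is not the balancing-group regex discussed above: Python's re module has no balancing groups, so an explicit stack does the same push-on-%IF, pop-on-%ENDIF bookkeeping. The Block class, the parse_blocks function and the SAMPLE text are all hypothetical names and data made up for this illustration; the sample only mimics the shape described above (two top-level blocks, the first containing two nested ifs).

```python
import re
from dataclasses import dataclass, field

# Matches either an opening "%IF NAME" tag or a closing "%ENDIF" tag.
TAG = re.compile(r"%IF\s+(?P<name>\w+)|%ENDIF")


@dataclass
class Block:
    name: str
    start: int               # offset of the "%IF" tag
    contents_start: int      # offset just after "%IF NAME"
    contents_end: int = -1   # offset of the matching "%ENDIF"
    end: int = -1            # offset just after the matching "%ENDIF"
    children: list = field(default_factory=list)


def parse_blocks(text):
    """Return the top-level IF blocks, each carrying its nested blocks."""
    roots, stack = [], []
    for tag in TAG.finditer(text):
        if tag.group("name") is not None:
            # Opening tag: attach to the enclosing block (if any), then push.
            block = Block(tag.group("name"), tag.start(), tag.end())
            (stack[-1].children if stack else roots).append(block)
            stack.append(block)
        else:
            # Closing tag: pop the innermost open block and record its extent.
            if not stack:
                raise ValueError(f"unmatched %ENDIF at offset {tag.start()}")
            block = stack.pop()
            block.contents_end, block.end = tag.start(), tag.end()
    if stack:
        # Same role as the (?(Open)(?!)) check: leftover openers are an error.
        raise ValueError(f"unclosed %IF {stack[-1].name}")
    return roots


# Made-up sample data, not the example from the answer.
SAMPLE = """%IF MY_VAR
a
%IF OTHER_VAR1
b
%ENDIF
%IF OTHER_VAR2
c
%ENDIF
%ENDIF
%IF OTHER_VAR3
d
%ENDIF"""

roots = parse_blocks(SAMPLE)
top_level_names = [block.name for block in roots]
nested_names = [child.name for child in roots[0].children]
```

Each Block keeps offsets into the original text, so a block's contents can be sliced out as text[block.contents_start:block.contents_end] without running any pattern a second time.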