Match altered versions of the first match using only one expression?
I'm writing a brush for Alex Gorbatchev's Syntax Highlighter to get highlighting for Smalltalk code. Now, consider the following Smalltalk code:
aCollection do: [ :each | each shout ]
I want to find the block argument ":each" and then match "each" every time it occurs afterwards (for simplicity, let's say every occurrence, not just those inside the brackets).
Note that the argument can have any name, e.g. ":myArg".
My attempt to match ":each":
\:([\d\w]+)
This seems to work. The problem is for me to match the occurrences of "each". I thought something like this could work:
\:([\d\w]+)|\1
But the right-hand side of the alternation seems to be treated as an independent expression, so the backreference doesn't work.
Is it even possible to accomplish what I want in a single expression? Or would I have to use the backreference within a second expression (via another function call)?
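For illustration, the behaviour described above can be reproduced in JavaScript with a minimal sketch. It assumes an ECMAScript engine with `String.prototype.matchAll`; the variable names are made up for this example. In ECMAScript, a backreference to a group that did not take part in the match matches the empty string, so the `\1` branch never finds "each":

```javascript
// The sample line from the question.
const sample = "aCollection do: [ :each | each shout ]";

// The attempted pattern: the second alternative refers to group 1,
// but group 1 is unset whenever that alternative is tried.
const attempt = /:(\w+)|\1/g;

// Collect all matches and drop the empty ones produced by the \1 branch.
const nonEmpty = [...sample.matchAll(attempt)]
  .map((m) => m[0])
  .filter((text) => text.length > 0);

console.log(nonEmpty);
```

Only the declaration ":each" survives the filter; the later "each" is never matched as a word.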
You could do it in languages that support variable-length lookbehind (AFAIK only the .NET framework languages do; Perl 6 might). There you could highlight a word if it matches
(?<=:(\w+)\b.*)\1
But JavaScript doesn't support lookbehind at all. In any case, this regex would be very inefficient (I just checked a simple example in RegexBuddy, and the regex engine needs over 60 steps for nearly every character in the document to decide between match and non-match), so it is not a good idea if you want to use it for code highlighting.
I'd recommend you use the two-step approach you mentioned: first match
:(\w+)\b
(word boundary inserted for safety; \d is implied in \w), then do a literal search for the match result \1.
I believe the only thing stored by the Regex engine between matches is the position of the last match. Therefore, when looking for the next match, you cannot use a backreference to the match before.
So, no, I do not think that this is possible.
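Since a single expression will not do it, the two-step approach could look roughly like the sketch below. It is only an illustration under stated assumptions: `findBlockArgOccurrences` is a hypothetical helper, not part of SyntaxHighlighter, and it assumes an engine with `String.prototype.matchAll`. How the resulting positions would be fed into a brush is left open.

```javascript
// Hypothetical helper for illustration; not part of SyntaxHighlighter.
function findBlockArgOccurrences(code) {
  // Step 1: collect every block argument name declared as ":name".
  const names = new Set();
  for (const m of code.matchAll(/:(\w+)\b/g)) {
    names.add(m[1]);
  }
  if (names.size === 0) {
    return [];
  }

  // Step 2: search for the collected names as whole words.
  // The names consist only of \w characters, so they need no escaping.
  const usePattern = new RegExp("\\b(?:" + [...names].join("|") + ")\\b", "g");
  return [...code.matchAll(usePattern)].map((m) => ({
    name: m[0],
    index: m.index,
  }));
}

const hits = findBlockArgOccurrences("aCollection do: [ :each | each shout ]");
console.log(hits);
```

Note that this reports every whole-word occurrence, including the "each" inside the ":each" declaration itself, which matches the simplification made in the question.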