Regex to match string contents up to a comment
I'm trying to match expressions of the form [%___%] that appear in a string before a // comment, ignoring any // that is inside quotation marks (that is, inside a string literal).

For example, in

[%tag%] = "a" + "//" + [%tag2%]; //[%tag3%]

the matches should be [%tag%] and [%tag2%].

The closest I can get is

^(?:(?:\[%([^%\]\[]*)%\])|[^"]|"[^"]*")*?(?://)

This has two problems. First, it doesn't match anything on lines that contain no //. In fact, it aggregates lines until it reaches one that does contain //. I tried to fix this by adding ?.*?$ at the end, to signify that the // is optional and that the match should stop at the first end of line, but that doesn't really work.

Second, it only captures the second tag. This isn't because of the "//", since even with [%1%] [%2%] it won't capture the first tag.

I'm using C# and Regex.Matches with the RegexOptions.Multiline option, and this is my escaped string:

"^(?:(?:\\[%([^%\\]\\[]*)%\\])|[^\"]|\"[^\"]*\")*?(?://)"
First off, let me just say that I love regexes. I read Friedl's Mastering Regular Expressions years ago and never looked back. That being said, do not use one giant regex to solve this problem. Use your programming language. You'll end up with more readable and maintainable code. It looks like you're trying to parse a language here where different rules apply in different contexts. Your pattern could appear in a quoted string. Quoted strings might have quotes inside them which need to be escaped. Capturing all the subtleties in one regex would be a nightmare. I recommend iterating through the string character by character, building tokens along the way, looking for the quotes, and keeping track of whether or not you're in a quoted string. When you encounter a token that matches your criteria (you can use a regex for this part), and you're not within a string, add it to your list. When you hit the end of a statement and encounter the beginning of a comment, discard the remaining characters until the end of the comment.
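The character-by-character approach described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the names TagScanner and ExtractTags are hypothetical, the helper handles one line at a time, it assumes a backslash escapes the next character inside a quoted string, and it reuses the tag pattern from the question (anchored with \G so it only matches at the current position).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class TagScanner
{
    // Tag pattern taken from the question; \G anchors it at the start index
    // passed to Match, so it is only tried at the current scan position.
    private static readonly Regex Tag = new Regex(@"\G\[%([^%\]\[]*)%\]");

    // Returns the text inside each [%...%] that occurs outside a quoted
    // string and before a // comment on a single line.
    public static List<string> ExtractTags(string line)
    {
        var tags = new List<string>();
        bool inString = false;
        int i = 0;
        while (i < line.Length)
        {
            char c = line[i];
            if (inString)
            {
                // Assumption: backslash escapes the next character in a string.
                if (c == '\\') { i += 2; continue; }
                if (c == '"') inString = false;
                i++;
                continue;
            }
            if (c == '"') { inString = true; i++; continue; }
            // Outside a string, // starts a comment: discard the rest of the line.
            if (c == '/' && i + 1 < line.Length && line[i + 1] == '/') break;
            if (c == '[')
            {
                Match m = Tag.Match(line, i);
                if (m.Success)
                {
                    tags.Add(m.Groups[1].Value);
                    i += m.Length;
                    continue;
                }
            }
            i++;
        }
        return tags;
    }
}
```

Calling TagScanner.ExtractTags on the example line from the question should return tag and tag2 and skip tag3. Multi-line input would need to be split into lines first, with the helper called once per line.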
I think doing this in a single pass is a little difficult, because matching pairs of double quotes are hard to check inside one pattern. You can do it in two phases:
1. Remove all matched pairs of double quotes, together with the text between them
2. Find your pattern in what remains
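A minimal sketch of the two phases, under some assumptions: the names TwoPhaseTags and ExtractTags are hypothetical, the input is a single line, quoted strings contain no escaped quotes, and everything after the first // that remains once the strings are blanked out is treated as a comment. The tag pattern is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class TwoPhaseTags
{
    public static List<string> ExtractTags(string line)
    {
        // Phase 1: replace every "..." literal with an empty pair of quotes,
        // so a // or a tag inside a string can no longer be seen.
        // Assumption: strings do not contain escaped quotes.
        string stripped = Regex.Replace(line, "\"[^\"]*\"", "\"\"");

        // Any // still present is outside a string, so it starts a comment.
        int comment = stripped.IndexOf("//", StringComparison.Ordinal);
        if (comment >= 0) stripped = stripped.Substring(0, comment);

        // Phase 2: find every tag in what is left.
        var tags = new List<string>();
        foreach (Match m in Regex.Matches(stripped, @"\[%([^%\]\[]*)%\]"))
        {
            tags.Add(m.Groups[1].Value);
        }
        return tags;
    }
}
```

Because each tag is found as its own match, every tag name is available through Groups[1] of its match, rather than only the last one as with a single repeated group.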