Regex to match string contents up to a comment

Posted on 2024-11-13 11:15:49

I'm trying to match expressions enclosed in [%___%] within a line, but only those that appear before a // comment, ignoring any // that sits inside a quoted string.
For example, in
[%tag%] = "a" + "//" + [%tag2%]; //[%tag3%]
the matches should be [%tag%] and [%tag2%].

The closest I have got is ^(?:(?:\[%([^%\]\[]*)%\])|[^"]|"[^"]*")*?(?://)

This has two problems. First, it doesn't match any line that doesn't contain //.
In fact, it keeps consuming lines until it reaches one that contains //.
I tried appending ?.*?$ to signify that the // is optional and that the match should stop at the first end of line, but that doesn't really work.

Second, it only captures the second tag. The "//" is not the cause, since even with [%1%] [%2%] the first tag is not captured.

I'm using C# and Regex.Matches with the RegexOptions.Multiline option, and this is my escaped string:

"^(?:(?:\\[%([^%\\]\\[]*)%\\])|[^\"]|\"[^\"]*\")*?(?://)"


Comments (2)

狠疯拽 2024-11-20 11:15:49

First off, let me just say that I love regexes. I read Friedl's Mastering Regular Expressions years ago and never looked back. That being said, do not use one giant regex to solve this problem. Use your programming language. You'll end up with more readable and maintainable code.

It looks like you're trying to parse a language in which different rules apply in different contexts. Your pattern could appear inside a quoted string. Quoted strings might contain quotes that need to be escaped. Capturing all of these subtleties in one regex would be a nightmare.

I recommend iterating through the string character by character, building tokens along the way, looking for quotes, and keeping track of whether or not you're inside a quoted string. When you encounter a token that matches your criteria (you can use a regex for this part) and you're not inside a string, add it to your list. When you encounter the beginning of a comment outside a string, discard the remaining characters until the end of the comment.
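The scan described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the names TagScanner and ExtractTags are hypothetical, quoted strings are assumed not to span lines, escaped quotes inside strings are not handled, and tag names follow the question's rule of containing no %, [ or ] characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: names and behavior are assumptions for illustration.
public static class TagScanner
{
    // Returns the names of [%name%] tags that occur outside double-quoted
    // strings and before any // comment on each line of the input.
    public static List<string> ExtractTags(string input)
    {
        var tags = new List<string>();
        bool inString = false;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '\n')
            {
                // Assumption: a quoted string never spans lines.
                inString = false;
                i++;
            }
            else if (inString)
            {
                // Inside a string: ignore everything up to the closing quote.
                if (c == '"') inString = false;
                i++;
            }
            else if (c == '"')
            {
                inString = true;
                i++;
            }
            else if (c == '/' && i + 1 < input.Length && input[i + 1] == '/')
            {
                // Comment start outside a string: skip to the end of the line.
                int eol = input.IndexOf('\n', i);
                i = eol < 0 ? input.Length : eol;
            }
            else if (c == '[' && i + 1 < input.Length && input[i + 1] == '%')
            {
                // Candidate tag: read the name up to the first %, [ or ].
                int j = i + 2;
                while (j < input.Length && input[j] != '%' && input[j] != '[' && input[j] != ']')
                {
                    j++;
                }
                if (j + 1 < input.Length && input[j] == '%' && input[j + 1] == ']')
                {
                    tags.Add(input.Substring(i + 2, j - (i + 2)));
                    i = j + 2;
                }
                else
                {
                    // Not a well-formed tag; move on by one character.
                    i++;
                }
            }
            else
            {
                i++;
            }
        }
        return tags;
    }
}
```

Calling TagScanner.ExtractTags on the example line from the question yields the names tag and tag2. The tag inside the comment is skipped, and the // inside the quoted string is not treated as a comment start. Support for escaped quotes would need an extra branch in the inString case, once the escaping convention of the language being parsed is known.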

放手` 2024-11-20 11:15:49

I think doing this in one shot is a little difficult, because matching double quotes is hard to check inside a single pattern. You can do it in two phases:

- Remove all double-quoted strings
- Find your pattern in what remains

using System.Text.RegularExpressions;

// Phase 1: matches a double-quoted string so that it can be stripped out.
Regex re1 = new Regex(@"""[^""]*""", RegexOptions.Multiline);
// Phase 2: matches a [%tag%] that has no // before it on the same line.
Regex re2 = new Regex(@"(?<!//.*)\[%\w+%\]", RegexOptions.Multiline);
string input = @"[%tag%] = ""a"" + ""//"" + [%tag2%]; //[%tag3%]
[%tag%] = ""a"" + ""ii//"" + [%tag2%]; //[%tag3%]";

MatchCollection ms = re2.Matches(re1.Replace(input, ""));