Regex for nested values

Posted 2024-10-25 05:21:25 · 168 words · 4 views · 0 comments

I want a regex that ignores nested matches when parsing.

For example, given:

/*asdasdasd /* asdasdsa */ qweqweqwe */

it should match the first "/*" with the last "*/" instead of stopping at the first "*/".

Thanks...

Comments (5)

Bonjour°[大白 2024-11-01 05:21:25

Regular expressions are greedy by default, so you can just use:

\/\*.*\*\/

If you wanted the behavior you're trying to avoid, where the regex is lazy and stops at the first "*/", you'd have to add a ? like this:

\/\*.*?\*\/
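
As a quick check of the two patterns above, here is a minimal sketch that runs both against the sample string from the question. The variable names (`sample`, `greedy`, `lazy`) are illustrative only and not part of the original answer.

```javascript
// Greedy vs. lazy quantifiers on the sample string from the question.
const sample = "/*asdasdasd /* asdasdsa */ qweqweqwe */";

// Greedy: .* consumes as much as it can, then backtracks to the LAST "*/".
const greedy = sample.match(/\/\*.*\*\//)[0];

// Lazy: .*? consumes as little as it can, so it stops at the FIRST "*/".
const lazy = sample.match(/\/\*.*?\*\//)[0];

console.log(greedy); // /*asdasdasd /* asdasdsa */ qweqweqwe */
console.log(lazy);   // /*asdasdasd /* asdasdsa */
```

Note that `.` does not match line breaks in JavaScript unless the `s` flag is set, so both patterns as written only handle comments that sit on a single line.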

琴流音 2024-11-01 05:21:25

By definition, regular expressions can't count nesting levels (though real implementations go further than the formal computer-science definition).

See http://en.wikipedia.org/wiki/Regular_expression#Expressive_power_and_compactness
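
To illustrate what "counting" means here, the following is a minimal sketch of a hand-written scanner that tracks nesting depth explicitly instead of relying on a regex. The function name `stripByCounting` is hypothetical and this is not the approach of any answer on this page; it assumes unmatched "*/" outside a comment should be kept as ordinary text.

```javascript
// Hypothetical helper: strips nested /* ... */ comments by keeping an
// explicit depth counter, which a classical regular expression cannot do.
function stripByCounting(text) {
    let depth = 0; // number of comments we are currently inside
    let out = "";
    let i = 0;
    while (i < text.length) {
        const pair = text.slice(i, i + 2);
        if (pair === "/*") {
            depth++;          // entering a (possibly nested) comment
            i += 2;
        } else if (pair === "*/" && depth > 0) {
            depth--;          // leaving the current comment level
            i += 2;
        } else {
            if (depth === 0) out += text[i]; // keep text outside comments
            i++;
        }
    }
    return out;
}

console.log(JSON.stringify(stripByCounting("a /* b /* c */ d */ e"))); // "a  e"
```

The counter is the part that has no equivalent in a plain regular expression: the scanner has to remember how many openers are still unclosed.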

还在原地等你 2024-11-01 05:21:25

The solutions presented so far work fine if the text contains only one nested comment. However, as LHMathies noted, if the text has more than one comment, with content you want to keep between them, these solutions fail. For example, here is some test data to verify that the algorithm works correctly:

/* one */
Stuff one
/* two /* three */ two */
Stuff two
/* four */

A correct solution will preserve the two lines with stuff in them. To handle this case correctly in JavaScript, you need a regex that matches an innermost comment (this is the hard part), and then you apply it repeatedly until all the comments are gone. Here is a tested function which does precisely that:

function strip_nested_C_comments(text)
{ // Regex to match innermost "C" style comment.
    var re = /\/\*[^*\/]*(?:(?!\/\*|\*\/)[*\/][^*\/]*)*\*\//i;
    // Iterate stripping comments from inside out.
    while (text.search(re) != -1) {
        text = text.replace(re, '');
    }
    return text;
}

Edit: Improved regex efficiency for non-match cases. (i.e. changed the "special" from [\S\s] to [*\/]).
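
As a sketch of how the function behaves on the test data above, the snippet below repeats the function unchanged so that it runs on its own, then feeds it the five test lines. The names `input` and `output` are illustrative only.

```javascript
// The function from the answer, repeated so the example is self-contained.
function strip_nested_C_comments(text)
{ // Regex to match innermost "C" style comment.
    var re = /\/\*[^*\/]*(?:(?!\/\*|\*\/)[*\/][^*\/]*)*\*\//i;
    // Iterate stripping comments from inside out.
    while (text.search(re) != -1) {
        text = text.replace(re, '');
    }
    return text;
}

// The test data from the answer, joined with newlines.
var input = [
    "/* one */",
    "Stuff one",
    "/* two /* three */ two */",
    "Stuff two",
    "/* four */"
].join("\n");

var output = strip_nested_C_comments(input);
console.log(JSON.stringify(output)); // "\nStuff one\n\nStuff two\n"
```

Only the two "Stuff" lines survive; the lines that held comments are left empty, since the function removes the comment text but not the surrounding line breaks.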

智商已欠费 2024-11-01 05:21:25

Regular expressions aren't good at dealing with nested values, since what you're describing is not a "regular language".

But regular expressions are naturally greedy. That means the * and + quantifiers will, by default, do exactly what you're asking for:

var data = "/*asdasdasd /* asdasdsa */ qweqweqwe */";
data = data.replace( /\/\*.*\*\//, '' );
alert( 'Data: ' + data );

许你一世情深 2024-11-01 05:21:25

I'm guessing that what you're really after is something that will remove or process properly nested comments in a string, even if there's more than one. The answers giving 'greedy' regexes will go from the first /* to the last */: in strings like keep /* comment */ keep /* comment */ keep they will treat the middle keep as part of the comment.

The short answer is that JavaScript RegExps aren't powerful enough to do that; you need recursive patterns. (This is also known as "regexps can't count".)

But if you just want to remove the comments, you can use a loop and remove the innermost ones first (using the non-greedy RegExp from @mVChr, modified to match the last possible starting delimiter instead of the first):

var re = /(.*)\/\*.*?\*\//; while (re.test(string)) string = string.replace(re, '$1');

This moves the counting (of nesting levels) out of the regexp and into the loop, so to speak. (I didn't put a g flag on the regexp because I'm unsure of the side effects when using such a regexp in two places in a loop. And the loop takes care of finding all occurrences anyway.)
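
Here is a minimal sketch of that loop wrapped in a function so it can be tried directly. The function name `stripInnermostFirst` is hypothetical. The result of `replace` is assigned back to the variable, because `String.prototype.replace` returns a new string rather than modifying the original; without the assignment the loop would never terminate.

```javascript
// Hypothetical wrapper around the loop from the answer.
function stripInnermostFirst(string) {
    // (.*) is greedy, so the "/*" that follows it is the LAST opener on the
    // line that still has a "*/" after it, i.e. an innermost comment.
    var re = /(.*)\/\*.*?\*\//;
    while (re.test(string)) {
        // Keep the text before the comment ($1) and drop the comment itself.
        string = string.replace(re, '$1');
    }
    return string;
}

var nested = stripInnermostFirst("keep /* a /* b */ c */ keep");
var sideBySide = stripInnermostFirst("keep /* comment */ keep /* comment */ keep");

console.log(JSON.stringify(nested));     // "keep  keep"
console.log(JSON.stringify(sideBySide)); // "keep  keep  keep"
```

Because `.` does not match line breaks without the `s` flag, this sketch only handles comments that open and close on the same line.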
