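Regex - nested patterns - within outer pattern but exclude inner pattern
I have a file with the following content:
<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>
I want to match "ReplaceMe" if it is inside the td tag, but not if it is inside a ${ ... } expression.
Can I do this with a regular expression?
What I currently have: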
sed '/\${.*?ReplaceMe.*?}/!s/ReplaceMe/REPLACED/g' data.txt
This is not possible.
Regex can be used for Type-3 Chomsky languages (regular language).
Your sample code however is a Type-2 Chomsky language (context-free language).
Pretty much as soon as any kind of nesting (brackets) is involved you're dealing with context free languages, which are not covered by regular expressions.
There is basically no way to define "within a pair of x and y" in a regular expression, as this would require the regular expression to have some kind of stack, which it doesn't have (it is functionally equivalent to a finite state automaton).
Challenged by brandizzi to find a regex that might match at least trivial cases, I actually came up with this (painfully hacky) regex pattern:
It does proper (sic!) matching for these cases:
And fails with this one (nesting is Chomsky Type-2, remember? ;) ):
And it can't replace multiple matches either:
Getting the leading $ covered was the tricky part. That, and keeping Reginald/Reggy from crashing constantly while writing this beast.
AGAIN: EXPERIMENTAL, DO NOT EVER USE THIS IN PRODUCTION CODE!
(…or I'll hunt you down, should I ever have to work with your code/app ;)
Well, for such a simple case, you just need to verify that the line does not match ${.*}.
The ! after the /\${.*}/ sed address negates the criteria. OTOH, if the case is not that simple, I suspect that your problem will grow a lot and regex will not be the best solution.
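To make the effect of the negated address concrete, here is a small sketch. The three input lines are invented for illustration, and the sed command is assembled from the address quoted above plus the substitution from the question, so treat it as an assumption about what was meant. Any line containing a ${...} expression is skipped entirely, including its unprotected occurrences.

```shell
# Three illustrative lines: plain, protected, and mixed.
sed_input='<td> ReplaceMe </td>
<td> ${ ReplaceMe } </td>
<td> ${ keep } ReplaceMe </td>'

# /\${.*}/ selects lines containing ${...}; the ! inverts the selection,
# so the substitution only runs on lines WITHOUT such an expression.
sed_result=$(printf '%s\n' "$sed_input" | sed '/\${.*}/!s/ReplaceMe/REPLACED/g')

printf '%s\n' "$sed_result"
# prints:
# <td> REPLACED </td>
# <td> ${ ReplaceMe } </td>
# <td> ${ keep } ReplaceMe </td>
```

The third line shows the limit of the "simple case": the line from the question mixes both kinds of occurrence, so it would be left untouched as well.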
Usually it is a bad idea to use regex when structured markup is involved. In some special cases it might be OK, but there are better tools to parse HTML, and then you can use regex on the text nodes.
Something like
<td>.*(?<!${).*ReplaceMe(?!.*}).*</td>
should work, if grep supports negative lookbehinds (I don't remember if it does).
Worked for me.
You may consider using -i.bak to back up the old file, in case of a mistake.
Alternatively,
perl -pi -e 's/<td>\sReplaceMe\s<\/td>/<td>Replaced<\/td>/g' temp
also works. Again, consider -pi.bak to keep a backup.