Regex - nested patterns - within outer pattern but exclude inner pattern

Posted 2024-11-14 12:12:45

I have a file with the content below.

<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>

I want to match 'ReplaceMe' if it is in the td tag, but NOT if it is in the ${ ... } expression.

Can I do this with regex?

Currently have:

sed '/\${.*?ReplaceMe.*?}/!s/ReplaceMe/REPLACED/g' data.txt

清风挽心 2024-11-21 12:12:45

This is not possible.

Regex can be used for Type-3 Chomsky languages (regular language).
Your sample code however is a Type-2 Chomsky language (context-free language).

Pretty much as soon as any kind of nesting (brackets) is involved you're dealing with context free languages, which are not covered by regular expressions.

There is basically no way to express "within a pair of x and y" in a regular expression, as this would require the regular expression to have some kind of stack, which it doesn't have (a regular expression is functionally equivalent to a finite state automaton).

Challenged by brandizzi to find a regex that might match at least trivial cases
I actually came up with this (painfully hacky) regex pattern:

perl -pe 's/(?<=<td>)((?:(?:\{.*?\})*[^{]*?)*)(ReplaceMe)(.*)(?=<\/td>)/$1REPLACED$3/g'

It does proper (sic!) matching for these cases:

<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>
<td> ReplaceMe ${dontReplaceMeEither} </td>
<td> ${ dontReplaceMe } ReplaceMe </td>
<td> ReplaceMe </td>

And fails with this one (nesting is Chomsky Type-2, remember? ;) ):

<td>${ ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} }</td>

And it can't replace multiple matches either:

<td> ReplaceMe ReplaceMe </td>
<td> ReplaceMe ${dontReplaceMeEither} ReplaceMe </td>

Getting the leading $ covered was the tricky part.
That, and keeping Reginald/Reggy from crashing constantly while writing this beast.

AGAIN: EXPERIMENTAL, DO NOT EVER USE THIS IN PRODUCTION CODE!

(…or I'll hunt you down, should I ever have to work with your code/app ;)

随心而道 2024-11-21 12:12:45

Well, for such simple case, you just need to verify that the line does not match ${.*}:

$ sed '/\${.*}/!s/ReplaceMe/REPLACED/' input
<td> REPLACED </td>
<td> ${ don't ReplaceMe } </td>

The ! after the /\${.*}/ sed address negates the criteria.

OTOH, if the case is not that simple, I suspect your problem will grow a lot and regex will not be the best solution.
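
The line-level check skips a whole line as soon as it contains any ${ ... }, so a line that mixes both cases is left untouched. One possible way to stay within sed is to hide the protected occurrences behind a placeholder first. This is a sketch that assumes GNU sed, no nested braces, and single-line records (so a newline can serve as the placeholder); the function name replace_outside_expr_sed is made up for illustration.

```shell
# Hypothetical helper: reads lines on stdin, writes the result to stdout.
replace_outside_expr_sed() {
  sed -E '
    # Replace each ReplaceMe inside a ${...} block with a newline,
    # one occurrence per pass, looping until none is left.
    :protect
    s/(\$\{[^}]*)ReplaceMe([^}]*\})/\1\n\2/
    t protect
    # Whatever is still spelled ReplaceMe is outside every ${...} block.
    s/ReplaceMe/REPLACED/g
    # Restore the protected occurrences.
    s/\n/ReplaceMe/g
  '
}

# Demo with the line from the question.
printf '%s\n' '<td> ${ dontReplaceMe } ReplaceMe ${dontReplaceMeEither} </td>' | replace_outside_expr_sed
# prints: <td> ${ dontReplaceMe } REPLACED ${dontReplaceMeEither} </td>
```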

失而复得 2024-11-21 12:12:45

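Using regular expressions on structured markup is usually a bad idea. It may be fine in some special cases, but there are better tools for parsing HTML, and you can then apply a regex to the text nodes.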

踏月而来 2024-11-21 12:12:45

If grep supports negative lookbehind (I don't remember whether it does).

追星践月 2024-11-21 12:12:45

sed -i 's/<td>\sReplaceMe\s<\/td>/<td>Replaced<\/td>/gi' input.file

Worked for me.

You may want to use -i.bak to back up the old file in case of a mistake.

Alternatively,

perl -pi -e 's/<td>\sReplaceMe\s<\/td>/<td>Replaced<\/td>/g' temp

also works. Again, use -pi.bak instead of -pi to keep a backup.
