Need a variable-width negative look-behind replacement

Posted on 2024-08-30 16:35:20

I have looked at many questions here (and many more websites) and some provided hints but none gave me a definitive answer. I know regular expressions but I am far from being a guru. This particular question deals with regex in PHP.

I need to locate words in a text that are not surrounded by a hyperlink of a given class. For example, I might have

This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.

I would need to match the second and third elephants but not the first (identified by the test class "no_check"). Note that there could be more attributes than just href and class within hyperlinks. I came up with

((?<!<a .*class="no_check".*>)\belephant\b)

which works beautifully in regex test software but not in PHP.

Any help is greatly appreciated. If you cannot provide a regular expression but can find some sort of PHP code logic that would circumvent the need for it, I would be equally grateful.
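
For context, the likely reason the pattern fails only in PHP is that PHP's preg functions use PCRE, which cannot compile a look-behind containing an unbounded quantifier such as `.*`, while some other engines (.NET, for example) accept it. A minimal sketch of the symptom, using a shortened version of the sample text; the second pattern is only an illustration of a fixed-length look-behind and is not offered as a fix:

```php
<?php
// Shortened sample text based on the question.
$text = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue.';

// The look-behind contains the unbounded quantifier .*, so PCRE cannot compile it.
// The @ operator hides the compile-time warning for the purpose of this demo.
$variable = @preg_match('/(?<!<a .*class="no_check".*>)\belephant\b/', $text);

// A look-behind of fixed length compiles and runs, but it cannot describe a whole tag
// with arbitrary attributes, so it does not solve the original problem.
$fixed = preg_match('/(?<!class="no_check">)\belephant\b/', $text);

var_dump($variable); // false: the pattern could not be compiled
var_dump($fixed);    // 1: the unlinked "elephant" was found
```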

Comments (3)

梦魇绽荼蘼 2024-09-06 16:35:20

If variable-width negative look-behind is not available, a quick and dirty solution is to reverse the string in memory and use a variable-width negative look-ahead instead, then reverse the string again.

But you may be better off using an HTML parser.
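
The reversal trick might be sketched as follows. This is only an illustration under a few assumptions: the text is single-byte (`strrev` works on bytes, not multibyte characters), the `class="no_check"` attribute is written exactly like that, and `[^<>]*` is used instead of `.*` so the look-ahead stays inside one tag. The variable names are made up for the example.

```php
<?php
$text = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';
$word = 'elephant';

// Reverse the subject so that "what comes before the word" becomes "what comes after it".
$reversed = strrev($text);

// Reversed word, followed by a negative look-ahead for the reversed opening tag
// <a ... class="no_check" ...>. Look-aheads may be variable-width in PCRE.
$pattern = '/\b' . preg_quote(strrev($word), '/') . '\b(?!>[^<>]*"kcehc_on"=ssalc[^<>]* a<)/';

preg_match_all($pattern, $reversed, $matches, PREG_OFFSET_CAPTURE);

// Map each offset in the reversed string back to an offset in the original string.
$offsets = [];
foreach ($matches[0] as [$match, $revOffset]) {
    $offsets[] = strlen($text) - $revOffset - strlen($word);
}
sort($offsets);

foreach ($offsets as $offset) {
    echo $offset, ': ', substr($text, $offset, strlen($word)), PHP_EOL;
}
```

With the sample text this should report the second and third elephants only. As with the original look-behind, only the opening tag immediately before the word is inspected.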

别忘他 2024-09-06 16:35:20

I think the simplest approach would be to match either a complete <a> element with a "no_check" attribute, or the word you're searching for. For example:

<a [^<>]*class="no_check"[^<>]*>.*?</a>|(\belephant\b)

If it was the word you matched, it will be in capture group #1; if not, that group should be empty or null.

Of course, by "simplest approach" I really meant the simplest regex approach. Even simpler would be to use an HTML parser.
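
In PHP, the alternation above could be applied with `preg_replace_callback`, roughly as in the sketch below. The square brackets are just a placeholder replacement chosen for the example, and the `s` modifier is an added assumption so that `.*?` can span line breaks inside a link.

```php
<?php
$text = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';

// Either consume a whole no_check link (no capture), or capture the bare word in group 1.
$pattern = '~<a [^<>]*class="no_check"[^<>]*>.*?</a>|(\belephant\b)~s';

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Group 1 is only present when the word itself was matched.
        if (isset($m[1]) && $m[1] !== '') {
            return '[' . $m[1] . ']';
        }
        // Otherwise a no_check link was matched: put it back unchanged.
        return $m[0];
    },
    $text
);

echo $result, PHP_EOL;
// This <a href="blabblah" class="no_check">elephant</a> is green and this [elephant] is blue while this <a href="blahblah">[elephant]</a> is red.
```

Because the link alternative comes first, the regex engine swallows the whole protected link before it ever gets a chance to try the word inside it.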

携余温的黄昏 2024-09-06 16:35:20

I ended up using a mixed solution. It turns out that I had to parse a text for specific keywords and check if they were already part of a link and if not add them to a hyperlink. The solutions provided here were very interesting but not exactly tailored enough for what I needed.

The idea of using an HTML parser was a good one though and I am currently using one in another project. So hats off to both Alan Moore and Eric Strom for suggesting that solution.
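
For readers wondering what the HTML-parser route might look like, here is a minimal sketch using PHP's bundled DOM extension (assumed to be enabled). It is not the code used in the project mentioned above; the wrapping `<div>`, the XPath expression, and the variable names are choices made for this example. It counts occurrences of the word in text nodes that are not inside an `<a>` carrying the `no_check` class.

```php
<?php
$text = 'This <a href="blabblah" class="no_check">elephant</a> is green and this elephant is blue while this <a href="blahblah">elephant</a> is red.';
$word = 'elephant';

// Parse the fragment; libxml warnings about the partial document are collected, not printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div>' . $text . '</div>');
libxml_clear_errors();

// Select every text node that has no <a> ancestor whose class list contains "no_check".
$xpath = new DOMXPath($doc);
$query = '//text()[not(ancestor::a[contains(concat(" ", normalize-space(@class), " "), " no_check ")])]';

$total = 0;
foreach ($xpath->query($query) as $node) {
    // A plain, look-around-free regex is enough once the markup has been parsed away.
    $total += preg_match_all('/\b' . preg_quote($word, '/') . '\b/', $node->nodeValue);
}

echo $total, PHP_EOL; // 2
```

The same loop could rewrite `$node->nodeValue` or wrap matches in new elements instead of counting them, which is closer to the keyword-linking task described above.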
