PHP regular expression to match outside of HTML tags

Posted 2024-12-11 17:16:39

How can I write a regular expression that does not match anything inside an HTML tag?

I am running preg_replace on an HTML page. My pattern is meant to wrap certain words in the HTML in a surrounding tag. However, sometimes my regular expression modifies the HTML tags themselves. For example, when I try to replace this text:

<a href="example.com" alt="yasar home page">yasar</a>

so that yasar reads <span class="selected-word">yasar</span>, my regular expression also replaces the yasar in the alt attribute of the anchor tag. The preg_replace() call I am currently using looks like this:

preg_replace("/(asf|gfd|oyws)/", '<span class=something>${1}</span>',$target);

How can I write a regular expression so that it doesn't match anything inside an HTML tag?
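
To make the problem concrete, here is a minimal sketch that reproduces it. It assumes the sample anchor from the question as input and uses "yasar" in place of the real word list, purely for illustration.

```php
<?php
// Assumed input, taken from the question; "yasar" stands in for the word list.
$target = '<a href="example.com" alt="yasar home page">yasar</a>';

// Naive replacement: matches "yasar" everywhere, including inside the tag.
$result = preg_replace('/(yasar)/', '<span class="selected-word">${1}</span>', $target);

// Both occurrences are wrapped, so the alt attribute now contains markup.
echo $result, PHP_EOL;
```

The span ends up inside the alt attribute as well as around the link text, which is the behaviour the question wants to avoid.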

Comments (4)

似最初 2024-12-18 17:16:39

You can use an assertion for that, as you just have to ensure that the searched words occur somewhere after a >, or before any <. The latter test is easier to accomplish because lookahead assertions can be variable length:

/(asf|foo|barr)(?=[^>]*(<|$))/

See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
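
As a rough sketch of how that lookahead behaves, the snippet below applies it to the anchor from the question. The word "yasar", the class name, and the non-capturing group inside the lookahead are assumptions made for the example, not part of the pattern above.

```php
<?php
// Assumed sample input from the question; "yasar" stands in for the word list.
$target = '<a href="example.com" alt="yasar home page">yasar</a>';

// The lookahead succeeds only if, scanning forward from the match, a "<" or
// the end of the string is reached before any ">".
$pattern = '/(yasar)(?=[^>]*(?:<|$))/';
$result = preg_replace($pattern, '<span class="selected-word">$1</span>', $target);

echo $result, PHP_EOL;
// <a href="example.com" alt="yasar home page"><span class="selected-word">yasar</span></a>
```

One thing to keep in mind: a literal > in the text content (not part of a tag) would also stop the lookahead, so words before it would be left unwrapped.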

还给你自由 2024-12-18 17:16:39

Yasar, resurrecting this question because it had another solution that wasn't mentioned.

Instead of just checking that the next angle bracket is an opening one, this solution matches every complete <tag> and then skips it.

With all the disclaimers about using regex to parse HTML, here is the regex:

<[^>]*>(*SKIP)(*F)|word1|word2|word3

In code, it looks like this:

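// Sample input: "word2" also appears inside the <a ...> tag, where it must be left alone.
$target = "word1 <a skip this word2 >word2 again</a> word3";
// Match a whole tag and discard it with (*SKIP)(*F); otherwise match one of the words.
$regex = '~<[^>]*>(*SKIP)(*F)|word1|word2|word3~';
$repl = '<span class="">\0</span>'; // \0 is the entire match
$new = preg_replace($regex, $repl, $target);
echo htmlentities($new);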
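
If the word list comes from a variable, the same idea can be wrapped in a small function. This is only a sketch: wrap_words_outside_tags, its parameters, and the added \b word boundaries are hypothetical choices for illustration and are not part of the answer above. It needs PHP 7.4 or later for the arrow functions.

```php
<?php
// Hypothetical helper (not from the answer): wraps each listed word in a span,
// skipping anything that sits inside a complete <...> tag.
function wrap_words_outside_tags(string $html, array $words, string $class): string
{
    if ($words === []) {
        return $html; // nothing to wrap
    }
    // Escape each word so regex metacharacters in it are treated literally.
    $alternation = implode('|', array_map(
        fn(string $w): string => preg_quote($w, '~'),
        $words
    ));
    // A whole tag is matched first and thrown away; otherwise match a word.
    $regex = '~<[^>]*>(*SKIP)(*F)|\b(?:' . $alternation . ')\b~';
    $open = '<span class="' . htmlspecialchars($class, ENT_QUOTES) . '">';

    $result = preg_replace_callback(
        $regex,
        fn(array $m): string => $open . $m[0] . '</span>',
        $html
    );
    return $result ?? $html; // fall back to the input if the regex engine fails
}

// Usage with the anchor from the question:
$html = '<a href="example.com" alt="yasar home page">yasar</a>';
echo wrap_words_outside_tags($html, ['yasar'], 'selected-word'), PHP_EOL;
// <a href="example.com" alt="yasar home page"><span class="selected-word">yasar</span></a>
```

A callback is used instead of a replacement string so that the matched text never has to be interpreted as a backreference template.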

断爱 2024-12-18 17:16:39

This might be the kind of thing that you're after: http://snipplr.com/view/3618/

In general, I'd advise against this approach. A better alternative is to strip out all HTML tags and instead rely on BBCode, such as:

[b]bold text[/b] [i]italic text[/i]

However, I appreciate that this might not fit well with what you're trying to do.

Another option may be HTML Purifier, see: http://htmlpurifier.org/

我做我的改变 2024-12-18 17:16:39

Off the top of my head, this should work:

echo preg_replace('/<([^>]*)>([^<]*)<\/([^>]*)>/', '<$1><span class="some-class">$2</span></$3>', $target);

But I don't know how safe this would be, and note that it wraps the whole text content of each element rather than specific words. I am just presenting a possibility :)
