PHP regular expression to match outside of HTML tags
I am running preg_replace on an HTML page. My pattern is meant to wrap certain words in the HTML with a surrounding tag. However, my regular expression sometimes modifies the HTML tags themselves. For example, when I try to replace this text:
<a href="example.com" alt="yasar home page">yasar</a>
so that yasar reads <span class="selected-word">yasar</span>, my regular expression also replaces the yasar in the alt attribute of the anchor tag. The preg_replace() call I am currently using looks like this:
preg_replace("/(asf|gfd|oyws)/", '<span class=something>${1}</span>', $target);
How can I write a regular expression so that it doesn't match anything inside an HTML tag?
You can use an assertion for that: you just have to ensure that the searched words occur somewhere after a >, or before any <. The latter test is easier to accomplish, as lookahead assertions can be variable length. See also http://www.regular-expressions.info/lookaround.html for a nice explanation of the assertion syntax.
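The answer's own pattern did not survive on this page, so what follows is only a minimal sketch of the lookahead idea it describes, not the author's exact regex. The function name wrap_words_lookahead and the word list are hypothetical; the span class is taken from the question's example. The lookahead requires that the next angle bracket after the word is a < (or that the string ends), which means the word is not sitting inside a tag.

```php
<?php
// Hypothetical helper illustrating the lookahead approach.
function wrap_words_lookahead(string $html): string
{
    // (?=[^>]*(?:<|$)) succeeds only if, scanning forward from the word,
    // a "<" or the end of the string is reached before any ">".
    $pattern = '/(yasar|asf|gfd|oyws)(?=[^>]*(?:<|$))/';
    $replacement = '<span class="selected-word">$1</span>';

    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

$target = '<a href="example.com" alt="yasar home page">yasar</a>';
echo wrap_words_lookahead($target), PHP_EOL;
```

With the question's sample markup, only the link text is wrapped and the alt attribute is left alone. Note the assumption this relies on: a stray > in the text after the word, or a < inside an attribute value, would confuse the check.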
Yasar, resurrecting this question because it had another solution that wasn't mentioned.
Instead of just checking that the next tag character is an opening tag, this solution skips all <full tags>. All the usual disclaimers about using regex to parse HTML apply.
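The regex and demo links from this answer are missing here, so the following is a hedged reconstruction of the "skip full tags" technique rather than the author's original code. It assumes PCRE's backtracking control verbs (*SKIP) and (*F), which PHP's preg functions support; the function name wrap_words_skip_tags and the word list are hypothetical.

```php
<?php
// Hypothetical helper illustrating the "skip whole tags" approach.
function wrap_words_skip_tags(string $html): string
{
    // Left alternative: match a complete tag, then (*SKIP)(*F) forces a
    // failure and tells the engine not to retry inside the text it consumed.
    // Right alternative: the words to wrap, reached only outside tags.
    $pattern = '~<[^>]*>(*SKIP)(*F)|yasar|asf|gfd|oyws~';
    $replacement = '<span class="selected-word">$0</span>';

    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

$target = '<a href="example.com" alt="yasar home page">yasar</a>';
echo wrap_words_skip_tags($target), PHP_EOL;
```

Because whole tags are consumed and discarded before the word alternatives are tried, text after the word cannot affect the result. A > inside a quoted attribute value would still end the tag match early, so this remains a heuristic, not an HTML parser.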
This might be the kind of thing that you're after: http://snipplr.com/view/3618/
In general, I'd advise against doing this. A better alternative is to strip out all HTML tags and rely on BBCode instead. However, I appreciate that this might not work well with what you're trying to do.
Another option may be HTML Purifier; see http://htmlpurifier.org/
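The BBCode example from this answer is not preserved, so here is a rough sketch of what the suggestion could look like, under the assumption that only a [b]...[/b] tag is supported. The function name render_bbcode is hypothetical; strip_tags, htmlspecialchars, and preg_replace are standard PHP functions.

```php
<?php
// Hypothetical helper: drops real HTML, then renders a tiny BBCode subset.
function render_bbcode(string $input): string
{
    // Remove any HTML tags, then escape whatever special characters remain.
    $text = htmlspecialchars(strip_tags($input), ENT_QUOTES, 'UTF-8');

    // Convert [b]...[/b] into <strong>...</strong> (non-greedy, multi-line).
    $pattern = '~\[b\](.*?)\[/b\]~s';

    // preg_replace returns null on error; fall back to the escaped text.
    return preg_replace($pattern, '<strong>$1</strong>', $text) ?? $text;
}

echo render_bbcode('<i>Hello</i> [b]yasar[/b]'), PHP_EOL;
```

Since no user-supplied HTML survives, a later word-wrapping regex has no tags or attributes to damage.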
Off the top of my head, this should work:
However, I don't know how safe this would be. I am just presenting a possibility :)