preg_replace: remove empty tags but keep the closing blockquote tag
I made this expression to remove all empty tags in the page (including tags that contain only whitespace).
$content = preg_replace('/<[^\/>]*>([\s]?)*<\/[^>]*>/', '', $content);
It worked a treat until it had to deal with content like this...
<blockquote>
<p >foo bar</p>
</blockquote>
<p ><a href="image.jpg" rel="lightbox" title=""><img title="image" src="image.jpg" /></a><br /></p>
and it outputs it as...
<blockquote>
<p >foo bar</p>
<p ><a href="image.jpg" rel="lightbox" title=""><img title="image" src="image.jpg" /></a><br /></p>
Thus removing the closing </blockquote>.
I have been scratching my head over this one and can't get it working. Can anyone see an obvious solution other than specifying which tags it should format? I should also say that it is formatting 'the_content' on a WordPress post.
Comments (2)
Regexps and HTML are not a good match, since HTML is not a regular syntax, and there is no end of edge cases and gotchas. You'll be better off using an HTML parser such as this one and inspecting/manipulating the DOM object.
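As a rough illustration of the DOM approach, here is a minimal sketch. It uses PHP's bundled DOMDocument and DOMXPath classes rather than the parser linked above, and the helper name remove_empty_elements() and the list of elements to keep are assumptions made for this example, not part of WordPress or any library.

```php
<?php
// Sketch: strip empty elements with PHP's bundled DOM extension.
// remove_empty_elements() is a hypothetical helper, not a WordPress API.
function remove_empty_elements(string $html): string
{
    // Elements that are legitimately empty and must survive
    // (assumed list: HTML void elements plus iframe; extend as needed).
    $keep = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'param', 'source', 'track', 'wbr', 'iframe'];

    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy markup
    $doc = new DOMDocument();
    // The XML declaration is a common trick to make libxml read the input as UTF-8.
    // The wrapper div gives us a known container to read the fragment back from.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);

    // Leaf elements with no comments whose text is empty or whitespace only.
    $query = './/*[not(*) and not(comment()) and not(normalize-space())]';
    do {
        $removed = 0;
        foreach (iterator_to_array($xpath->query($query, $wrap)) as $node) {
            if (!in_array(strtolower($node->nodeName), $keep, true)) {
                $node->parentNode->removeChild($node);
                $removed++;
            }
        }
    } while ($removed > 0); // repeat: a parent may become empty once its children are gone

    // Serialize the wrapper's children only, so the wrapper itself is not returned.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

In a WordPress setup this could be hooked in with add_filter('the_content', 'remove_empty_elements'). Note that the parser normalizes the markup on the way out (for example, <br /> is written back as <br> and <p > as <p>), and that the wrapper approach assumes the content does not contain a stray closing </div>.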
If you find that Simple HTML DOM doesn't get all the tags, you might also like to take a look at HTML Purifier, which is more advanced.