Removing nested tags with simpleHTML
I'm trying to use simple_html_dom to remove all the spans from a snippet of HTML, and I'm using the following:
$body = "<span class='outer' style='background:red'>x<span class='mid' style='background:purple'>y<span class='inner' style='background:orange'>z</span></span></span>";
$HTML = new simple_html_dom;
$HTML->load($body);
$spans = $HTML->find('span');
foreach($spans as $span_tag) {
    echo "working on ". $span_tag->class . " ... ";
    echo "setting " . $span_tag->outertext . " equal to " . $span_tag->innertext . "<br/>\n";
    $span_tag->outertext = (string)$span_tag->innertext;
}
$text = $HTML->save();
$HTML->clear();
unset($HTML);
echo "<br/>The Cleaned TEXT is: $text<br/>";
And here's the result in my browser:
http://www.pixeloution.com/RAC/clean.gif
So why am I only ending up with the outermost span removed?
Edit
Actually, if there's an easier way to do this, I'm game. The goal is to remove the tags but keep anything inside them, including other tags; otherwise I'd just use $obj->plaintext
Edit #2
Okay ... apparently I got it working, although I'd still like to actually understand the problem, if anyone has run into this before. Knowing it was only removing the outermost span, I did this:
function cleanSpansRecursive(&$body) {
    $HTML = new simple_html_dom;
    $HTML->load($body);
    $spans = $HTML->find('span');
    foreach($spans as $span_tag) {
        $span_tag->outertext = (string)$span_tag->innertext;
    }
    $body = (string)$HTML;
    if($HTML->find('span')) {
        $HTML->clear();
        unset($HTML);
        cleanSpansRecursive($body);
    } else {
        $HTML->clear();
        unset($HTML);
    }
}
And it seems to work.
Comments (1)
I don't have simple_html_dom installed on my machine or dev server so I can't test, but from the looks of it, setting $span_tag->outertext will create new span objects inside the outer span, so the old references will no longer exist in $HTML. Going from innermost to outer should fix it since the references would be kept intact.

EDIT: In your second edit, you are finding the newly-created spans every time you do a replacement, which is why it works.
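
Since the question also asks whether there is an easier way, here is a minimal sketch of an alternative that does not use simple_html_dom at all. It relies on PHP's bundled DOM extension (DOMDocument) and moves each span's children in front of the span before removing it, so stale references are not an issue. The function name unwrapSpans and the wrapper div are assumptions made for illustration, and the sketch assumes ASCII (or entity-encoded) input, since DOMDocument::loadHTML may need an explicit encoding hint for other text.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): removes every <span>
// tag from an HTML fragment while keeping whatever the span contained.
function unwrapSpans(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about fragment markup, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents are easy to find again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list into an array; the list changes as spans are removed.
    $spans = iterator_to_array($doc->getElementsByTagName('span'));
    foreach ($spans as $span) {
        // Move each child in front of the span, then drop the now-empty span.
        while ($span->firstChild !== null) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$body = "<span class='outer' style='background:red'>x<span class='mid' style='background:purple'>y<span class='inner' style='background:orange'>z</span></span></span>";
echo unwrapSpans($body); // xyz
```

Because each span is unwrapped in place in the live tree, the order in which the spans are visited does not matter here, and no recursive re-parse is needed.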