Strip HTML tags and their contents
I'm using DOM to parse a string. I need a function that strips span tags and their contents. For example, if I have:
This is some text that contains photo.
<span class='title'> photobyile</span>
I would like the function to return:
This is some text that contains photo.
This is what I tried:
$dom = new DOMDocument;
$dom->loadHTML($string);
$dom->preserveWhiteSpace = false;
$spans = $dom->getElementsByTagName('span');
foreach ($spans as $span)
{
    $naslov = $span->nodeValue;
    echo $naslov;
    $string = preg_replace("/$naslov/", " ", $string);
}
I'm aware that $span->nodeValue returns the text content of the span tag and not the whole tag, but I don't know how to get the whole tag, including its class name.
Thanks,
Ile
Comments (2)
Try removing the spans directly from the DOM tree.
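A minimal sketch of what that could look like, not necessarily what the commenter had in mind. The function name stripSpans and the wrapper div are hypothetical, introduced only so the fragment can be serialized back without the html/body elements that loadHTML() adds. It assumes ASCII or Latin-1 input, since loadHTML() does not default to UTF-8.

```php
<?php
// Hypothetical helper: removes every <span> element (tag and contents)
// from an HTML fragment and returns what is left.
function stripSpans(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents can be read back later.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $spans = $dom->getElementsByTagName('span');

    // The node list is live, so walk it backwards: removing a node then
    // never shifts the position of the nodes that are still to be visited.
    for ($i = $spans->length - 1; $i >= 0; $i--) {
        $span = $spans->item($i);
        $span->parentNode->removeChild($span);
    }

    // The wrapper is the first <div> in document order.
    $wrapper = $dom->getElementsByTagName('div')->item(0);

    // Serialize the wrapper's children back into a string.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return trim($out);
}

$string = "This is some text that contains photo.\n<span class='title'> photobyile</span>";
echo stripSpans($string), "\n";
```

Removing the node itself avoids having to reconstruct the tag text (class name included) for a regular expression, which is what the preg_replace attempt in the question runs into.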
@ile - I've had that problem. The foreach iterator's index keeps incrementing, while calling removeChild() on the DOM also seems to remove the node from the DOMNodeList ($spans). So for every span you remove, the node list shrinks by one element while the foreach counter still advances by one. Net result: the loop skips the span that follows each removed one.
I'm sure there is a more elegant way, but this is how I did it: I copied the references from the DOMNodeList into a second array, where they are not removed by the removeChild() operation.
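The commenter's own code is not shown, so the following is only a sketch of the "copy to a second array" idea. The function name removeSpansViaArray is hypothetical, and iterator_to_array() is one possible way to take the copy, assumed here rather than taken from the comment.

```php
<?php
// Hypothetical helper: removes all <span> elements from a document by first
// copying the live DOMNodeList into a plain array. Returns the number of
// spans that were detached.
function removeSpansViaArray(DOMDocument $dom): int
{
    // The array holds its own references, so it does not shrink when
    // nodes are removed from the tree.
    $spans = iterator_to_array($dom->getElementsByTagName('span'));

    $removed = 0;
    foreach ($spans as $span) {
        // Guard against a node that has no parent to be removed from.
        if ($span->parentNode !== null) {
            $span->parentNode->removeChild($span);
            $removed++;
        }
    }

    return $removed;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>a<span>1</span>b<span>2</span>c<span>3</span></p>');
$count = removeSpansViaArray($dom);
```

Because the loop runs over the array and not over the live list, every span is visited exactly once, regardless of how many have already been removed.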