Simple HTML DOM: How to remove elements?

Posted on 2024-12-17 08:16:36

I would like to use Simple HTML DOM to remove all images in an article so that I can easily create a small snippet of text for a news ticker, but I haven't figured out how to remove elements with it.

Basically I would do

  1. Get content as HTML string
  2. Remove all image tags from content
  3. Limit content to x words
  4. Output.

Any help?

Comments (11)

场罚期间 2024-12-24 08:16:36

There is no dedicated method for removing elements. Just find all the img elements and then do:

$e->outertext = '';
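
Blanking the images covers step 2 of the question; steps 3 and 4 need no DOM library. A minimal sketch in plain PHP, assuming the image-free HTML is already available as a string (for example from `$html->save()`). The function name `makeSnippet` and the trailing `...` are illustrative choices, not something from this thread:

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): turn an HTML string
// into a plain-text snippet of at most $maxWords words.
function makeSnippet(string $html, int $maxWords): string
{
    // Drop any remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // Split on runs of whitespace, ignoring empty pieces.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    if (count($words) <= $maxWords) {
        return implode(' ', $words);
    }

    // Keep the first $maxWords words and mark the cut.
    return implode(' ', array_slice($words, 0, $maxWords)) . '...';
}

echo makeSnippet('<p>Breaking news: the <b>quick</b> brown fox jumps</p>', 4);
```

Note that `strip_tags` joins the text of adjacent elements without inserting a space, so markup such as `<p>a</p><p>b</p>` may need whitespace added between blocks first.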
依 靠 2024-12-24 08:16:36

When you only clear the outer text, you delete the HTML content itself, but if you perform another find for the same elements, they will still appear in the result. The reason is that the Simple HTML DOM object still holds its internal structure for the element, only without its actual content. To really delete the element, simply reload the HTML as a string into the same variable. That way the object is rebuilt from markup that no longer contains the deleted content.

Here is an example function:

public function removeNode($selector)
{
    foreach ($this->find($selector) as $node)
    {
        $node->outertext = '';
    }

    $this->load($this->save());        
}

Put this function inside the simple_html_dom class and you're good.
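
The stale-node behaviour described above, and the reload that clears it, can also be tried without editing the library. This is a sketch, assuming `simple_html_dom.php` sits next to the script; the path and the helper name `removeBySelector` are assumptions. `str_get_html` is the library's string-loading counterpart to `file_get_html`:

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical standalone version of removeNode(), so the library class
// itself does not need to be modified.
function removeBySelector($html, string $selector): void
{
    foreach ($html->find($selector) as $node) {
        $node->outertext = '';      // blank the element's markup
    }
    $html->load($html->save());     // re-parse so the DOM tree matches the markup
}

$html = str_get_html('<div><img src="a.png"><p>Text</p><img src="b.png"></div>');

// Blank the images only: the output changes, but the nodes stay in the tree.
foreach ($html->find('img') as $img) {
    $img->outertext = '';
}
$staleCount = count($html->find('img'));

// Blank and reload: a later find() no longer sees them.
removeBySelector($html, 'img');
$freshCount = count($html->find('img'));

echo "before reload: $staleCount, after reload: $freshCount\n";
```

Because every call re-parses the whole document, it is cheaper to blank everything first and reload once than to call the helper inside a loop.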

御弟哥哥 2024-12-24 08:16:36

I think you are having difficulties because you forgot to call save(), which dumps the internal DOM tree back into a string.

Try this:

$html = file_get_html("http://example.com");

foreach ($html->find('img') as $item) {
    $item->outertext = '';
}

$html->save();

echo $html;
通知家属抬走 2024-12-24 08:16:36

I could not figure out where to put the function so I just put the following directly in my code:

$html->load($html->save());

It basically commits the changes made in the foreach loop back into the document, as described above.

酒与心事 2024-12-24 08:16:36

The supposed solutions are quite expensive and practically unusable in a big loop or any other kind of repetition.

I prefer to use "soft deletes":

foreach ($html->find('somecondition') as $item) {
    if ($someCheck) { // placeholder condition
        $item->setAttribute('softDelete', true); // set a marker to check in later code
    }
    $item->outertext = '';
}

// Later code skips anything that carries the marker.
foreach ($foo as $bar) {
    if (!$bar->getAttribute('softDelete')) {
        // do something
    }
}
对你的占有欲 2024-12-24 08:16:36

This is working for me:

foreach($html->find('element') as $element){
   $element = NULL;
}
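
It is worth checking the output when using this, because in plain PHP assigning to the loop variable only rebinds that variable. A small sketch with stand-in objects (an assumption for illustration; these are not Simple HTML DOM nodes) shows that the array being iterated is left unchanged:

```php
<?php
// Stand-in objects, not Simple HTML DOM nodes (assumption for illustration).
$nodes = [new stdClass(), new stdClass()];

foreach ($nodes as $node) {
    $node = null; // rebinds the local variable only
}

var_dump(count($nodes));                 // the array still holds both objects
var_dump($nodes[0] instanceof stdClass); // and they are still objects
```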
朕就是辣么酷 2024-12-24 08:16:36

Adding a new answer, since removeNode is definitely a better way of removing elements:

$html->removeNode('img');

This method was probably not available when the accepted answer was marked. You do not need to loop over the HTML to find each element; this call removes them all.

梦里人 2024-12-24 08:16:36

Use outerhtml instead of outertext:

<div id='your_div'>the contents of your div</div>

$your_div->outertext = '';
echo $your_div; // echoes <div id='your_div'></div>

$your_div->outerhtml = '';
echo $your_div; // echoes nothing
岁月染过的梦 2024-12-24 08:16:36

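Try this:

// Assumption: Dom is PHPHtmlParser\Dom (paquettg/php-html-parser); the answer does not name the library.
use PHPHtmlParser\Dom;

$dom = new Dom();
$dom->loadStr($text);
foreach ($dom->find('element') as $element) {
    $element->delete();
}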
白首有我共你 2024-12-24 08:16:36

This works now:

$element->remove();

You can see the documentation for the method here.

雨后彩虹 2024-12-24 08:16:36

Below I remove the header element and all script elements from the incoming URL, using two different forms of find(). Leave out the second parameter to get an array of all matching nodes, then loop through them.

$clean_html = file_get_html($url);

// Find and remove the first instance of a node.
$node = $clean_html->find('header', 0);
if ($node !== null) { // find() with an index returns null when nothing matches
    $node->remove();
}

// Find and remove all instances of a node.
$nodes = $clean_html->find('script');
foreach ($nodes as $node) {
    $node->remove();
}