Simple HTML DOM: how do I remove elements?
I would like to use Simple HTML DOM to remove all images in an article so I can easily create a small text snippet for a news ticker, but I haven't figured out how to remove elements with it.
Basically, I would:
- Get the content as an HTML string
- Remove all image tags from the content
- Limit the content to x words
- Output the result
Any help?
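The word-limiting step in the list above does not depend on Simple HTML DOM at all. Here is a minimal sketch in plain PHP, assuming a plain-text snippet is enough for the ticker; `limit_words` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: reduce an HTML fragment to at most $maxWords words of plain text.
function limit_words(string $html, int $maxWords): string
{
    // Drop any remaining tags, then collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    if ($text === '') {
        return '';
    }

    $words = explode(' ', $text);
    if (count($words) <= $maxWords) {
        return $text;
    }

    // Keep the first $maxWords words and mark the cut.
    return implode(' ', array_slice($words, 0, $maxWords)) . '...';
}

$snippet = limit_words('<p>Storm hits the <b>coast</b> late on Friday night</p>', 4);
echo $snippet, "\n";
```

Note that `strip_tags` simply deletes tags, so adjacent block elements with no whitespace between them would run together; insert a space between such elements first if that matters for your content.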
Comments (11)
There is no dedicated method for removing elements. Just find all the img elements and then do:
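The code that followed this answer did not survive here, so what follows is only a sketch of the approach it describes: find every `img` and blank its `outertext`. The include path and the sample markup are assumptions for illustration.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<p>Breaking news <img src="a.png"> continues here.</p>');

// Blank out the serialized form of every <img> element.
foreach ($html->find('img') as $img) {
    $img->outertext = '';
}

// save() dumps the DOM tree back into a string.
$clean = $html->save();
echo $clean, "\n";
```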
When you only clear the outer text, you remove the element's HTML from the output, but if you run another find for the same elements, they still appear in the result. The reason is that the Simple HTML DOM object still holds the element in its internal structure, only without its actual content. To really delete the element, reload the HTML as a string into the same variable. The object is then recreated from the saved output, so the deleted content is no longer part of the tree.
Here is an example function:
Put this function inside the simple_html_dom class and you're good.
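The reload idea can also be sketched as a standalone helper instead of a method patched into the library class. `remove_nodes` is a hypothetical name, and the include path and sample markup are assumptions.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

// Hypothetical helper: blank every match, then re-parse the document.
function remove_nodes($html, $selector)
{
    foreach ($html->find($selector) as $node) {
        $node->outertext = '';
    }

    // Reloading from the saved string rebuilds the tree without the blanked nodes.
    $html->load($html->save());
}

$html = str_get_html('<div><img src="a.png"><p>Text</p><img src="b.png"></div>');
remove_nodes($html, 'img');

// Count how many <img> nodes a fresh lookup still finds.
$remaining = count($html->find('img'));
echo $remaining, "\n";
```

Keeping the helper outside the class avoids editing the library file, at the cost of passing the document in explicitly.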
I think you are having difficulties because you forgot to save (that is, dump the internal DOM tree back into a string).
Try this:
I could not figure out where to put the function, so I just put the following directly in my code:
It basically locks the changes made in the for loop back into the HTML, as described above.
The proposed solutions are quite expensive, since each reload re-parses the whole document, and they are practically unusable in a big loop or other kinds of repetition.
I prefer to use "soft deletes":
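The original snippet is missing here, so this is only one possible reading of a "soft delete": blank the node, tag it with a marker attribute, and skip tagged nodes in later loops instead of reloading the document. The attribute name `data-soft-deleted`, the include path, and the sample markup are all assumptions.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

$html = str_get_html('<div><img class="ad" src="ad.png"><img src="photo.png"></div>');

// Blank unwanted images and tag them; data-soft-deleted is a made-up marker name.
foreach ($html->find('img.ad') as $img) {
    $img->setAttribute('data-soft-deleted', '1');
    $img->outertext = '';
}

// Later lookups still return blanked nodes, so skip any that carry the marker.
$kept = array();
foreach ($html->find('img') as $img) {
    if ($img->getAttribute('data-soft-deleted')) {
        continue;
    }
    $kept[] = $img->src;
}

print_r($kept);
```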
This is working for me:
Adding a new answer since removeNode is definitely a better way of removing it:
This method was probably not available when the accepted answer was marked. You do not need to loop over the HTML to find each one; this will remove them all.
Use outerhtml instead of outertext.
Try this:
This works now:
You can see the documentation for the method here.
Below I remove the HEADER node and all SCRIPT nodes of the incoming URL using two different forms of the find() function. With the second parameter (an index), find() returns a single node; leave it out to get an array of all matching nodes, then just loop through them.
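The code for this answer is missing, so here is a rough sketch of the two `find()` forms it describes, using an inline string instead of a fetched URL. The include path and the sample markup are assumptions.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
require_once 'simple_html_dom.php';

$html = str_get_html(
    '<html><body><header><h1>Site</h1></header>'
    . '<script>var a = 1;</script><p>Body text</p></body></html>'
);

// Form 1: with an index as second argument, find() returns one node (or null).
$header = $html->find('header', 0);
if ($header) {
    $header->outertext = '';
}

// Form 2: without the index, find() returns an array of all matches.
foreach ($html->find('script') as $script) {
    $script->outertext = '';
}

$result = $html->save();
echo $result, "\n";
```

The null check matters when the page has no matching element, since assigning a property on null would raise an error.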