Remove comments from inner_html

Posted 2024-12-11 09:15:26

I have some code that uses Nokogiri, and I am trying to get the inner_html of each matching cell without the HTML comments it contains.

require 'nokogiri'
require 'open-uri'

html = Nokogiri::HTML(URI.open(@sql_scripts_url[1])) # element at index 1, i.e. the second URL in the array
html.css('td[class="ms-formbody"]').each do |node|
  puts node.inner_html # prints comments
end

Answer by 甜宝宝, 2024-12-18 09:15:26

Since you have not provided any sample HTML or desired output, here's a general solution:

You can select SGML comments in XPath by using the comment() node test; you can strip them out of the document by calling .remove on all comment nodes. Illustrated:

require 'nokogiri'
doc  = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
p doc.root.inner_html                   #=> "<b>hello</b> <!-- foo --> world"
doc.xpath('//comment()').remove
p doc.root.inner_html                   #=> "<b>hello</b>  world"

Note that the above modifies the document destructively to remove the comments. If you wish to keep the original document unmodified, you could alternatively do this:

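class Nokogiri::XML::Node
  # Returns the inner_html of a deep copy with every node matching +xpath+
  # removed; the receiver itself is left unchanged.
  def inner_html_reject(xpath='.//comment()')
    dup.tap{ |shadow| shadow.xpath(xpath).remove }.inner_html
  end
end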

doc = Nokogiri.XML('<r><b>hello</b> <!-- foo --> world</r>')
p doc.inner_html_reject #=> "<r><b>hello</b>  world</r>"
p doc.inner_html        #=> "<r><b>hello</b> <!-- foo --> world</r>"

Finally, note that if you don't need the markup at all, asking for the text alone excludes HTML comments:

p doc.text              #=> "hello  world"

About the author: 清晨说晚安