XmlSlurper/NekoHTML document fragment parsing - no HTML or BODY tags wanted
Dear All, I am trying to parse the following HTML fragment, and I would like to get the same fragment back as output (without the HTML and BODY tags). Is this possible? If so, how?
p.s. I have been reading
http://nekohtml.sourceforge.net/faq.html#fragments
and I believe I have set the correct option in the code below. However, the output is still wrapped in HTML, HEAD and BODY tags :(
Thank you
Misha
import groovy.xml.MarkupBuilder
import groovy.xml.StreamingMarkupBuilder
import groovy.util.XmlNodePrinter
import groovy.util.slurpersupport.NodeChild
def text="""
<div><h2>Test</h2>
<div>Hi</div>
</div>
"""
// Parse
def config=new org.cyberneko.html.HTMLConfiguration()
config.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment",true)
def html=new XmlSlurper(new org.cyberneko.html.parsers.SAXParser()).parseText(text)
// Output
def printNode(NodeChild node) {
    def writer = new StringWriter()
    writer << new StreamingMarkupBuilder().bind {
        mkp.declareNamespace('':node[0].namespaceURI())
        mkp.yield node
    }
    new XmlNodePrinter().print(new XmlParser().parseText(writer.toString()))
}
printNode(html)
Output:
<HTML>
  <tag0:HEAD xmlns:tag0="http://www.w3.org/1999/xhtml"/>
  <BODY>
    <DIV>
      <H2>
        Test
      </H2>
      <DIV>
        Hi
      </DIV>
    </DIV>
  </BODY>
</HTML>
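Answer: call setFeature on the SAXParser object itself. In the question, the HTMLConfiguration object is created and configured but never passed to the parser, so its document-fragment setting has no effect.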
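A minimal sketch of that approach, in Groovy like the question. It assumes Groovy 2.x/3.x (where XmlSlurper is available without an explicit import) and NekoHTML fetched with @Grab; the 1.9.22 version and the use of groovy.xml.XmlUtil for printing are choices made for this sketch, not taken from the original answer. The commented-out names/elems property is optional and only changes element-name case.

```groovy
// Assumption: NekoHTML 1.9.22 from Maven Central; adjust to the version you use.
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.22')
import org.cyberneko.html.parsers.SAXParser
import groovy.xml.XmlUtil

def text = """
<div><h2>Test</h2>
<div>Hi</div>
</div>
"""

// Set the feature on the parser instance that XmlSlurper will actually use.
def parser = new SAXParser()
parser.setFeature('http://cyberneko.org/html/features/balance-tags/document-fragment', true)
// Optional: NekoHTML upper-cases element names by default; uncomment to get lowercase names.
// parser.setProperty('http://cyberneko.org/html/properties/names/elems', 'lower')

def html = new XmlSlurper(parser).parseText(text)

// The root is now the fragment's outer DIV, not a synthesized HTML element.
println html.name()

// Serialize the slurped fragment back to markup (no HTML/HEAD/BODY wrapper).
def fragment = XmlUtil.serialize(html)
println fragment
```

With the feature set on the parser itself, the slurped root is the outer DIV, so serializing it prints only the fragment. The exact whitespace, XML declaration and any namespace prefix in the printed markup may vary with the Groovy and NekoHTML versions.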