How to retrieve an element with mixed child elements as text (JDOM)
I have an XML like the following:
<documentation>
This value must be <i>bigger</i> than the other.
</documentation>
Using JDOM, I can get the following text structures:
Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText: '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim: '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue: '%s'%n", d.getRootElement().getValue() );
which give me the following outputs:
getText: '
This value must be than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim: 'This value must be than the other.'
getValue: '
This value must be bigger than the other.
'
What I really wanted was to get the content of the element as a string, namely "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...
Should I just use an XMLOutputter on the contents? Or is there a better alternative?
Comments (3)
In JDOM pseudocode:
However, as Prashant Bhate wrote, content.getText() gives only the immediate text, which is useful only for leaf elements with text content.
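The "innerHTML for XML" idea amounts to serializing every child node of the element (text and child elements alike) and concatenating the results. The sketch below illustrates that with only the JDK's DOM and Transformer APIs so that it runs without extra dependencies; it is not JDOM code, and the class and method names (InnerXml, innerXml, parseRoot) are made up for this example. In JDOM itself, the XMLOutputter route mentioned in the question would presumably look like new XMLOutputter().outputString(element.getContent()).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, JDK-only; an analogue of the JDOM approach, not JDOM itself.
class InnerXml {

    // Serializes each child node of the element in document order and
    // concatenates the results, so nested tags such as <i> are preserved.
    public static String innerXml(Element element) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Without this, an XML declaration would be written for each serialized node.
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            identity.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }

    // Parses an XML string and returns its root element.
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String s = "<documentation>\n"
                + "This value must be <i>bigger</i> than the other.\n"
                + "</documentation>";
        // trim() drops the newlines that surround the content in the source document.
        System.out.printf("innerXml: '%s'%n", innerXml(parseRoot(s)).trim());
    }
}
```

Serializing the child nodes rather than the element itself is what keeps the enclosing <documentation> tags out of the result. Note that text is re-escaped on output, so a literal & in the content comes back as &amp;.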
Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:
It's also great for working with HTML in general because it doesn't try to force it into being XML; it deals with it much more leniently.
I'd say you should change your document to
in order to adhere to the XML specification. Otherwise, <i> would be considered a child element of <documentation> and not content.