How to retrieve an element's mixed content, child elements included, as text (JDOM)

Posted 2024-11-04 06:31:15

I have an XML like the following:

<documentation>
    This value must be <i>bigger</i> than the other.
</documentation>

Using JDOM, I can get the following text structures:

Document d = new SAXBuilder().build( new StringReader( s ) );
System.out.printf( "getText:          '%s'%n", d.getRootElement().getText() );
System.out.printf( "getTextNormalize: '%s'%n", d.getRootElement().getTextNormalize() );
System.out.printf( "getTextTrim:      '%s'%n", d.getRootElement().getTextTrim() );
System.out.printf( "getValue:         '%s'%n", d.getRootElement().getValue() );

which give me the following outputs:

getText:          '
    This value must be  than the other.
'
getTextNormalize: 'This value must be than the other.'
getTextTrim:      'This value must be  than the other.'
getValue:         '
    This value must be bigger than the other.
'

What I really wanted was to get the content of the element as a string, namely, "This value must be <i>bigger</i> than the other.". getValue() comes close but removes the <i> tag. I guess I wanted something like innerHTML for XML documents...

Should I just use an XMLOutputter on the contents? Or is there a better alternative?


Comments (3)

已下线请稍等 2024-11-11 06:31:15

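In JDOM (a sketch, assuming JDOM 2, where getContent() returns a List<Content>):

for (Content c : d.getRootElement().getContent()) {
    if (c instanceof Element) {          // child element: re-wrap its text in its own tag
        Element e = (Element) c;
        System.out.print("<" + e.getName() + ">" + e.getText() + "</" + e.getName() + ">");
    } else if (c instanceof Text) {      // plain text node
        System.out.print(((Text) c).getText());
    }
}

However, as Prashant Bhate wrote: content.getText() returns only the immediate text, so it is only useful for leaf elements with text content. The loop above therefore handles a single level of nesting and ignores attributes.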
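The loop above covers only one level of nesting. A more general route is to serialize each child node of the element in turn, which is roughly what the question's XMLOutputter idea amounts to. The sketch below is not JDOM: it swaps in the JDK's built-in DOM and Transformer classes so that it runs without extra libraries, and the names InnerXml and innerXml are made up for illustration. In JDOM 2 the analogous call would presumably be new XMLOutputter().outputString(element.getContent()).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of JDOM): an "innerHTML for XML" built on the JDK DOM.
public class InnerXml {

    /** Serializes every child node of the root element, markup included. */
    public static String innerXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            // Identity transform; suppress the <?xml ...?> header for each fragment.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            // Write text nodes and child elements in document order.
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                t.transform(new DOMSource(n), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or serialize XML", e);
        }
    }

    public static void main(String[] args) {
        String s = "<documentation>\n"
                 + "    This value must be <i>bigger</i> than the other.\n"
                 + "</documentation>";
        // trim() drops the leading and trailing whitespace of the element content.
        System.out.println("'" + innerXml(s).trim() + "'");
    }
}
```

Because each child is serialized rather than rebuilt by hand, nested elements and attributes inside the content survive, at the cost of depending on the serializer's escaping and whitespace handling.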

一百个冬季 2024-11-11 06:31:15

Jericho HTML is great for this sort of task. You can accomplish exactly what you're trying to do with a code block like this:

String snippet = new Source(html).getFirstElement().getContent().toString();

It's also great for working with HTML in general because it doesn't try to force it into being XML...it deals with it much more leniently.

小猫一只 2024-11-11 06:31:15

I'd say you should change your document to

<documentation>
  <![CDATA[This value must be <i>bigger</i> than the other.]]>
</documentation>

in order to adhere to the XML specification. Otherwise <i> would be considered a child element of <documentation> and not content.
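To make the difference concrete, here is a small sketch of how the two document shapes behave when only the text is read. It uses the JDK's built-in DOM rather than JDOM so that it runs without extra libraries; CdataText and rootText are hypothetical names, and getTextContent() plays roughly the role that getValue() plays in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of JDOM): reads the root element's text with the JDK DOM.
public class CdataText {

    /** Returns the concatenated text of the root element, CDATA sections included. */
    public static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String withCdata = "<documentation>\n"
                + "  <![CDATA[This value must be <i>bigger</i> than the other.]]>\n"
                + "</documentation>";
        String mixed = "<documentation>This value must be <i>bigger</i> than the other.</documentation>";

        // Inside CDATA the <i> markup is ordinary character data, so it is returned as text.
        System.out.println(rootText(withCdata).trim());
        // Without CDATA, <i> is a child element and only its text survives.
        System.out.println(rootText(mixed));
    }
}
```

The trade-off is that with CDATA the parser no longer sees <i> as an element at all, so any consumer that wants the markup has to parse the string a second time.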
