How to write unescaped XML outside of CDATA

Posted on 2024-09-04 21:45:29

I am trying to write XML data using StAX, where the content itself is HTML.

If I try

xtw.writeStartElement("contents");
xtw.writeCharacters("<b>here</b>");
xtw.writeEndElement();

I get this

<contents>&lt;b&gt;here&lt;/b&gt;</contents>

Then I notice the CDATA method and change my code to:

xtw.writeStartElement("contents");
xtw.writeCData("<b>here</b>");
xtw.writeEndElement();

and this time the result is

<contents><![CDATA[<b>here</b>]]></contents>

which is still not good. What I really want is

<contents><b>here</b></contents>

So is there an XML API/library that allows me to write raw text without wrapping it in a CDATA section? So far I have looked at StAX and JDOM, and they do not seem to offer this.

In the end I might resort to good old StringBuilder but this would not be elegant.

Update:

I mostly agree with the answers so far. However, instead of <b>here</b> I could have a 1MB HTML document that I want to embed in a bigger XML document. What you suggest means that I have to parse this HTML document in order to understand its structure. I would like to avoid this if possible.

Answer:

It is not possible, otherwise you could create invalid XML documents.

Comments (7)

悲喜皆因你 2024-09-11 21:45:30

The issue is that this is not raw text; it is an element, so you should be writing:

xtw.writeStartElement("contents");
xtw.writeStartElement("b");
xtw.writeCData("here");
xtw.writeEndElement();
xtw.writeEndElement();
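
For reference, here is a minimal self-contained sketch of the same idea. It uses writeCharacters for the text so that no CDATA wrapper appears in the output. The class name NestedElementExample and the StringWriter target are assumptions made for illustration; they are not part of the original post.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class name, used only for this illustration.
final class NestedElementExample {

    // Writes <b> as a real child element of <contents> instead of as text.
    static String build() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xtw.writeStartElement("contents");
            xtw.writeStartElement("b");
            xtw.writeCharacters("here"); // plain text, escaped only if needed
            xtw.writeEndElement();       // closes <b>
            xtw.writeEndElement();       // closes <contents>
            xtw.flush();
            xtw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

This only works when the structure of the content is known up front, which is exactly the limitation raised in the question's update.
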
允世 2024-09-11 21:45:30

If you want the XML to be included AS XML and not as character data, then it has to be parsed at some point. If you don't want to manually do the parsing yourself, you have two alternatives:

(1) Use external parsed entities -- in this case the external file will be pulled in and parsed by the XML parser. When the output is again serialized, it will include the contents of the external file.

[ See http://www.javacommerce.com/displaypage.jsp?name=entities.sql&id=18238 ]

(2) Use XInclude -- in that case the file has to be run through an XInclude processor, which will merge the XInclude references into the output. Most XSLT processors, as well as xmllint, will also process XInclude when given the appropriate option.

[ See: http://www.xml.com/pub/a/2002/07/31/xinclude.html ]

( XSLT can also be used to merge documents without using the XInclude syntax. XInclude just provides a standard syntax )

梦中的蝴蝶 2024-09-11 21:45:30

The problem is not "here", it's <b></b>.

Add the <b> element as a child of contents and you'll be able to do it. Any library like JDOM or DOM4J will allow you to do this. The general case is to parse the content into an XML DOM and add the root element as a child of <contents>.

You can't add unescaped markup as text outside of a CDATA section.

花想c 2024-09-11 21:45:30

If you want to embed a large HTML document in an XML document, then CDATA is, imho, the way to go. That way you don't have to understand or process the internal structure, and you can later change the document type from HTML to something else without much hassle. Also, I think you can't embed e.g. a DOCTYPE declaration directly (i.e. as structured data that retains the semantics of the DOCTYPE declaration). It has to be represented as characters.

(This is primarily a response to your update, but alas I don't have enough rep to comment...)
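
One caveat with the CDATA route: a CDATA section cannot contain the sequence ]]>, so an HTML blob that happens to include it (inline scripts sometimes do) has to be split across several sections. The sketch below shows one way to do that with StAX. The names CDataEmbedExample, writeSafeCData, and readBack are hypothetical helpers invented for this illustration, and whether a given StAX implementation already handles this case on its own is not something the sketch relies on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class name, used only for this illustration.
final class CDataEmbedExample {

    // Writes text as CDATA, starting a new section at every "]]>" so the
    // terminator sequence never appears inside a single section.
    static void writeSafeCData(XMLStreamWriter xtw, String text) throws XMLStreamException {
        int start = 0;
        int idx;
        while ((idx = text.indexOf("]]>", start)) >= 0) {
            xtw.writeCData(text.substring(start, idx + 2)); // up to and including "]]"
            start = idx + 2;                                // next section starts at ">"
        }
        xtw.writeCData(text.substring(start));
    }

    // Wraps the given HTML in <contents> using CDATA sections.
    static String embed(String html) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xtw.writeStartElement("contents");
            writeSafeCData(xtw, html);
            xtw.writeEndElement();
            xtw.flush();
            xtw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the XML again and returns the text content of the root element.
    static String readBack(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag();                 // move to <contents>
            return r.getElementText();   // concatenates all text and CDATA pieces
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = embed("<b>here</b> and a tricky ]]> sequence");
        System.out.println(xml);
        System.out.println(readBack(xml));
    }
}
```

A consumer that reads the element text gets the original HTML string back unchanged, without the producer ever having to parse it.
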

独夜无伴 2024-09-11 21:45:30

I don't see what the problem is with parsing the large block of XML you want to insert into your output. Use a StAX parser to parse it, and just write code to forward all of the events to your existing serializer (variable "xtw").
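
A rough sketch of what that forwarding could look like is below. It assumes the blob is well-formed XML (e.g. XHTML) and, as a simplification, it ignores namespaces, comments, and processing instructions. The class and method names (StaxForwardExample, forward, embed) are made up for this example.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class name, used only for this illustration.
final class StaxForwardExample {

    // Reads the XHTML as a stream of events and replays them on the writer.
    // Simplification: namespaces, comments, and PIs are dropped.
    static void forward(Reader xhtml, XMLStreamWriter xtw) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance().createXMLStreamReader(xhtml);
        while (in.hasNext()) {
            switch (in.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    xtw.writeStartElement(in.getLocalName());
                    for (int i = 0; i < in.getAttributeCount(); i++) {
                        xtw.writeAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    xtw.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                case XMLStreamConstants.CDATA:
                    xtw.writeCharacters(in.getText());
                    break;
                default:
                    // START_DOCUMENT, END_DOCUMENT, and everything else is skipped
                    break;
            }
        }
        in.close();
    }

    // Wraps the forwarded fragment in a <contents> element.
    static String embed(String xhtml) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xtw = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xtw.writeStartElement("contents");
            forward(new StringReader(xhtml), xtw);
            xtw.writeEndElement();
            xtw.flush();
            xtw.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(embed("<p class=\"x\">Hello <b>world</b></p>"));
    }
}
```

Because both sides are streaming, the 1MB document is never held as a tree in memory; passing a FileReader instead of a StringReader would avoid holding it as a string as well.
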

一百个冬季 2024-09-11 21:45:30

If the blob of HTML is actually XHTML, then I'd suggest doing something like this (in pseudo-code):

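// Pseudo-code: this use of XMLReader, Dom, and writeElement is illustrative, not a real API
xtw.writeStartElement("contents");
XMLReader xtr = new XMLReader();
xtr.read(blob);              // parse the XHTML blob
Dom dom = xtr.getDom();
for (element e : dom) {
    xtw.writeElement(e);     // copy each parsed element to the output
}
xtw.writeEndElement();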

or something like that. I had to do something similar once but used a different library.

楠木可依 2024-09-11 21:45:30

If your XML and HTML are not too big, you could use a workaround:

xtw.writeStartElement("contents");
xtw.writeCharacters("anUniqueIdentifierForReplace"); // <--
xtw.writeEndElement();

When you have your XML as a String:

// String is immutable, so keep the result of replace()
xmlAsString = xmlAsString.replace("anUniqueIdentifierForReplace", yourHtmlAsString);

I know, it's not so nice, but this could work.

Edit: Of course, you should check if yourHtmlAsString is valid.
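
One possible way to do that check, sketched under the assumption that "valid" here means "well-formed as an XML fragment": wrap the fragment in a throwaway root element and run it through a StAX reader before substituting it. The names PlaceholderInjectExample, isWellFormedFragment, and inject are hypothetical, as is the wrapper element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this illustration.
final class PlaceholderInjectExample {

    // Returns true if the fragment parses cleanly inside a dummy root element.
    static boolean isWellFormedFragment(String fragment) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader r = factory.createXMLStreamReader(
                    new StringReader("<wrapper>" + fragment + "</wrapper>"));
            while (r.hasNext()) {
                r.next(); // throws XMLStreamException on malformed input
            }
            r.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    // Replaces the placeholder only after the fragment has been checked.
    static String inject(String xml, String placeholder, String fragment) {
        if (!isWellFormedFragment(fragment)) {
            throw new IllegalArgumentException("fragment is not well-formed XML");
        }
        return xml.replace(placeholder, fragment);
    }

    public static void main(String[] args) {
        String xml = "<contents>anUniqueIdentifierForReplace</contents>";
        System.out.println(inject(xml, "anUniqueIdentifierForReplace", "<b>here</b>"));
    }
}
```

Note that the check itself parses the fragment, so this workaround does not avoid parsing; it only avoids rebuilding the fragment through the writer.
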
