使用 xsd 将 xhtml 解组为字符串
我正在尝试使用 XSD 和 jaxb 解组一个大型 xhtml 文档。除了包含纯 html 的一部分之外,我已经一切正常。这是我得到的 xhtml 的示例(我能够获取除“内容”之外的每个元素):
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">...</title>
<id>...</id>
<updated>...</updated>
<entry>
<id>...</id>
<title type="text">...</title>
<updated>...</updated>
<author>
<name>...</name>
</author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
<div>{html...}<div>{html...}</div>/<div>/<div>
</content>
</entry>
</feed>
这是 xsd 文件的扩展:
<xsd:complexType name="ApCategoriesJAXB" >
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="entry" type="tns:ApEntryJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApEntryJAXB">
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="name" type="xsd:string" minOccurs="0"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="author" type="tns:ApAuthorJAXB" minOccurs="0"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="category" type="tns:ApCategoryJAXB" minOccurs="0"></xsd:element>
<xsd:element name="content" type="tns:ApContentJAXB" minOccurs="0"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApCategoryJAXB" >
<xsd:sequence></xsd:sequence>
<xsd:attribute name="term" type="xsd:string" />
<xsd:attribute name="label" type="xsd:string" />
<xsd:attribute name="scheme" type="xsd:string" />
</xsd:complexType>
<xsd:complexType name="ApContentJAXB" >
<xsd:sequence>
<xsd:element name="div" type="tns:ApDivJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApDivJAXB" >
<xsd:sequence>
<xsd:any namespace="http://www.w3.org/2005/Atom" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
我已经尝试了嵌套 xsd 元素、complexTypes、xsd:any 等的每种组合等等,无论我如何尝试,似乎都无法获得这个“内容”值。我很高兴将所有 html 作为字符串,或者将其解组为一个对象。
预先感谢您的任何想法。
** 我已经编辑了 xsd 部分以包含相关部分。我已经尝试将“any”元素嵌套在“div”complexType 中(如图所示),以及完全跳过“div”complexType。
再次感谢。
I'm trying to unmarshal a large xhtml document using XSD's and jaxb. I've got everything working except for one part, which contains pure html. Here is an example of the xhtml I'm getting (I am able to grab every element except the "content"):
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">...</title>
<id>...</id>
<updated>...</updated>
<entry>
<id>...</id>
<title type="text">...</title>
<updated>...</updated>
<author>
<name>...</name>
</author>
<content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">
<div>{html...}<div>{html...}</div>/<div>/<div>
</content>
</entry>
</feed>
Here's an expansion of the xsd file:
<xsd:complexType name="ApCategoriesJAXB" >
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="entry" type="tns:ApEntryJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApEntryJAXB">
<xsd:sequence>
<xsd:element name="id" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="name" type="xsd:string" minOccurs="0"></xsd:element>
<xsd:element name="title" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="updated" type="xsd:string" minOccurs="1" maxOccurs="1"></xsd:element>
<xsd:element name="author" type="tns:ApAuthorJAXB" minOccurs="0"></xsd:element>
<xsd:element name="link" type="tns:ApLinkJAXB" minOccurs="0"></xsd:element>
<xsd:element name="category" type="tns:ApCategoryJAXB" minOccurs="0"></xsd:element>
<xsd:element name="content" type="tns:ApContentJAXB" minOccurs="0"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApCategoryJAXB" >
<xsd:sequence></xsd:sequence>
<xsd:attribute name="term" type="xsd:string" />
<xsd:attribute name="label" type="xsd:string" />
<xsd:attribute name="scheme" type="xsd:string" />
</xsd:complexType>
<xsd:complexType name="ApContentJAXB" >
<xsd:sequence>
<xsd:element name="div" type="tns:ApDivJAXB" minOccurs="0" maxOccurs="unbounded"></xsd:element>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ApDivJAXB" >
<xsd:sequence>
<xsd:any namespace="http://www.w3.org/2005/Atom" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
I have tried every combination of nested xsd elements, complexTypes, xsd:any etc etc and cannot seem to get this "content" value no matter what I try. I am happy to take all the html as a string, or unmarshal it into an object.
Thank you in advance for any thoughts.
** I've edited the xsd part to include relevant parts. I've tried both nesting the "any" element in the "div" complexType as seen, as well as skipping the "div" complexType altogether.
Thanks again.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您希望
具有xsd:string
类型,则需要对 HTML 进行编码或转义。您可以使用 CDATA 部分、Base64 编码或转义所有实体(例如<
到<
等)。否则,
xsd:any
应该可以工作。当您尝试此操作时,您能提供更完整的 XSD 示例吗?If you want
<content>
to have a type ofxsd:string
, you need to encode or otherwise escape the HTML. You could use CDATA sections, Base64 encoding, or escape all the entities (e.g.<
to<
, etc.).Otherwise,
xsd:any
should work. Can you provide a more complete example of your XSD when you tried this?