Reading XML with SAX, skipping the nodes that throw org.xml.sax.SAXParseException

Posted 2024-12-17 12:14:23

I am reading an XML document using SAX (javax.xml.parsers.SAXParser). In that XML, some child node values contain special characters such as &, <, >, " and '. SAX reads the XML successfully up to that point, but there it throws an org.xml.sax.SAXParseException.

For example, in the sample XML below, SAX successfully reads the value of <child1>, but it throws this org.xml.sax.SAXParseException at <child2>, because the value of its Name attribute contains a <.

<Parent>
   <child1>
      LS-23541723
   </child1>
   <child2 id="2" Name="T-Shirt And Denim - T<D" Rate="500.00">
   </child2>
   <child3>
      <![CDATA[This is the child 2]]>
   </child3>
   <child4>
      <![CDATA[This is the child 4]]>
   </child4>
</Parent>
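A minimal sketch to reproduce this with the JDK's built-in SAX parser. The class name SaxStopsAtFirstError and the method elementsSeen are illustrative only, and the sample is trimmed to the first two children. It records every start tag SAX reports and shows that parsing ends at the first well-formedness error rather than skipping the offending node:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStopsAtFirstError {
    // Trimmed copy of the sample above; the '<' inside Name makes it malformed.
    static final String SAMPLE =
        "<Parent>\n" +
        "   <child1>\n" +
        "      LS-23541723\n" +
        "   </child1>\n" +
        "   <child2 id=\"2\" Name=\"T-Shirt And Denim - T<D\" Rate=\"500.00\">\n" +
        "   </child2>\n" +
        "</Parent>\n";

    /** Returns the element names SAX reported before it gave up. */
    static List<String> elementsSeen(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add(qName);
                }
            });
        } catch (SAXParseException e) {
            // A fatal well-formedness error ends the parse; there is no "skip and continue".
            System.out.println("Stopped at line " + e.getLineNumber() + ": " + e.getMessage());
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementsSeen(SAMPLE));
    }
}
```

With the JDK's bundled parser this should print [Parent, child1]: child2 is never reported, because the error is raised while its attributes are still being scanned, and nothing after it is delivered either.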

I can't determine beforehand which nodes contain these special characters (it is dynamic). So what I want to do is read the XML with SAX while ignoring the nodes that contain such characters. Put simply, I think I could do this if it were possible to read the XML with SAX while skipping the nodes that raise org.xml.sax.SAXParseException.

Is this possible and if yes how?

Note: I cannot simply replace them with entity references like &amp;, because sometimes the XML nodes themselves arrive with &lt; and &gt; as well (<child1> arrives as &lt;child1&gt;). So, before starting to read it with SAX, I replace all the entity references with the characters they stand for (replaceAll("&gt;", ">"), etc.).
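The effect of that up-front replaceAll can be shown in isolation. If the < inside the attribute is still escaped as &lt;, the input is well-formed and SAX decodes the entity itself, so the handler receives the literal <; running the blanket replacement first is what turns the same input malformed. A sketch, where the class UnescapeBeforeParse, the method firstName and the one-element input are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class UnescapeBeforeParse {
    /** Parses xml and returns the Name attribute of the first element that has one. */
    static String firstName(String xml) throws Exception {
        String[] name = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (name[0] == null) name[0] = atts.getValue("Name");
                }
            });
        return name[0];
    }

    public static void main(String[] args) throws Exception {
        // Well-formed: the '<' inside the attribute value is escaped as &lt;
        String escaped = "<child2 id=\"2\" Name=\"T-Shirt And Denim - T&lt;D\" Rate=\"500.00\"/>";
        // SAX decodes the reference, so this prints: T-Shirt And Denim - T<D
        System.out.println(firstName(escaped));

        // Unescaping before parsing produces the same malformed input as in the question
        String unescaped = escaped.replaceAll("&lt;", "<").replaceAll("&gt;", ">");
        try {
            firstName(unescaped);
        } catch (SAXParseException e) {
            System.out.println("Malformed after replaceAll: " + e.getMessage());
        }
    }
}
```

This only helps where the references escape data; where the tags themselves arrive escaped, some unescaping is still needed before SAX can see them as markup.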

Answer by 爱,才寂寞 (2024-12-24 12:14:23)

I don't think that SAX can handle this: the XML has to be well-formed. So you have to make a number of replacements before the text is submitted to SAX. Look for any ', " or < that is not in the right place: a " inside a "-delimited attribute value, a ' inside a '-delimited one, and a < that is not part of a start tag or end tag. That should be feasible. This would be a second pass, run after your first pass that replaces &lt; and &gt; with their literal characters.
Ideally you should also watch for comments, CDATA sections, etc. to be sure they are well-formed.
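One possible shape for that second pass, written as a heuristic character scan. It is a sketch under stated assumptions, not a complete repair: the class XmlRepair and the method escapeStrayChars are hypothetical names; it escapes a < that appears inside a quoted attribute value or that is not followed by a tag-start character, and a & that does not begin an entity or character reference; comments and CDATA sections are copied unchanged. Stray quotes inside attribute values are not repaired, and a stray < followed by a letter in element content (for example T<D as text) is still taken for a tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlRepair {
    // A well-formed entity or character reference: &amp; &#60; &#x3C; ...
    private static final Pattern REF =
        Pattern.compile("&(?:#[0-9]+|#x[0-9a-fA-F]+|[A-Za-z_][A-Za-z0-9._-]*);");

    /** Heuristic second pass: escapes stray '<' and '&' only (quotes are not repaired). */
    public static String escapeStrayChars(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        Matcher ref = REF.matcher(s);
        boolean inTag = false; // between a markup '<' and its closing '>'
        char quote = 0;        // delimiter of the attribute value we are in, 0 if none
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (!inTag && s.startsWith("<![CDATA[", i)) {
                i = copyThrough(s, i, "]]>", out);   // CDATA copied verbatim
            } else if (!inTag && s.startsWith("<!--", i)) {
                i = copyThrough(s, i, "-->", out);   // comment copied verbatim
            } else if (c == '&') {
                ref.region(i, s.length());
                if (ref.lookingAt()) {               // already a valid reference: keep it
                    out.append(ref.group());
                    i = ref.end();
                } else {                             // bare '&'
                    out.append("&amp;");
                    i++;
                }
            } else if (c == '<' && (quote != 0 || (!inTag && !startsMarkup(s, i + 1)))) {
                out.append("&lt;");                  // '<' in a value, or not starting a tag
                i++;
            } else {
                if (!inTag && c == '<') {
                    inTag = true;
                } else if (inTag && quote == 0 && (c == '"' || c == '\'')) {
                    quote = c;
                } else if (quote != 0 && c == quote) {
                    quote = 0;
                } else if (inTag && quote == 0 && c == '>') {
                    inTag = false;
                }
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the character at i can follow '<' in real markup (name, end tag, <! or <?).
    private static boolean startsMarkup(String s, int i) {
        if (i >= s.length()) return false;
        char n = s.charAt(i);
        return Character.isLetter(n) || n == '_' || n == ':' || n == '/' || n == '!' || n == '?';
    }

    // Copies s[from ..] up to and including the terminator (or to the end if missing).
    private static int copyThrough(String s, int from, String end, StringBuilder out) {
        int stop = s.indexOf(end, from);
        stop = (stop < 0) ? s.length() : stop + end.length();
        out.append(s, from, stop);
        return stop;
    }

    public static void main(String[] args) {
        String line = "<child2 id=\"2\" Name=\"T-Shirt And Denim - T<D\" Rate=\"500.00\">";
        // Prints: <child2 id="2" Name="T-Shirt And Denim - T&lt;D" Rate="500.00">
        System.out.println(escapeStrayChars(line));
    }
}
```

The scan tracks only whether it is inside a tag and which quote opened the current attribute value, which is the minimum state needed to tell a stray < inside Name="..." apart from real markup.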
